Saturday, 17 November 2018

Slimming down 1-node Elastic cluster

If you've ever run Elasticsearch quick and dirty - a single node with the default config - you'll have noticed the cluster health always shows yellow and that it's a proper hog on the system. Well, yes, it will be, especially in the default config, as my good friend Justin Borland pointed out.

I'm a complete newbie when it comes to Elastic - I've deployed a few instances in Docker containers to quickly ingest data and dig in with Kibana, but that was it. Luckily for me, Justin is an absolute beast when it comes to all things Elastic - he just looked at my node and, right on the spot, explained what was wrong with it and how to fix and improve it.

Basically, my default setup was running 5 shards for each of the indices stored in the system, and I already had quite a few daily indices in there - we're talking months of DNS research data and web spider runs across thousands of websites... all repeated daily. This means that for the optimisation to be really effective, it needs to also deal with what's already in there, not just the new data I'll be adding.

Plan:
  1. Change default template to run only 1 shard and 0 replicas - it's a single node deployment, so anything more complex doesn't make much sense.
  2. Use the reindex API to rewrite all of the existing indices as single-shard versions, then delete the old 5-shard ones - there's no other way to do it than through reindexing.
  3. My indices are treated as append-only on the day, then become read-only, so we can merge the segments - leaving technical details aside, this means no random access later, just linear file reads, which is perfectly acceptable in my particular use scenario.

Let's do it!

1. Setting a general template to use 1 shard and no replicas is easy:
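Something along these lines should do it - a minimal sketch, assuming an Elasticsearch 6.x-era node and the legacy `_template` API; the template name `default_single_shard` and the catch-all pattern are my own choices:

```shell
# Hypothetical catch-all template: every newly created index gets
# 1 primary shard and 0 replicas. Template name is arbitrary.
curl -X PUT "localhost:9200/_template/default_single_shard" \
  -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "order": 0,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
```

With `order: 0` this acts as a low-priority default, so any more specific templates you add later can still override it.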

2. Reindexing all of the indices can be done with a script, provided as-is (i.e. it worked for me; use at your own risk). Warning: this step deleted all my visualisations and dashboards!
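For reference, a sketch of what such a script could look like (my own reconstruction, not the exact one I ran): loop over every index, reindex it into a `-reindexed` copy that picks up the new single-shard template, then delete the original. Skipping dot-prefixed system indices should avoid clobbering Kibana's `.kibana` index - the likely cause of my lost dashboards.

```shell
#!/bin/bash
# Sketch: rewrite every index as a single-shard "-reindexed" copy,
# then delete the 5-shard original. Use at your own risk.
ES="localhost:9200"

for idx in $(curl -s "$ES/_cat/indices?h=index"); do
  # Skip system indices (.kibana etc.) and indices we've already processed.
  case "$idx" in .*|*-reindexed) continue ;; esac

  curl -s -X POST "$ES/_reindex?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$idx\"},\"dest\":{\"index\":\"$idx-reindexed\"}}" \
  && curl -s -X DELETE "$ES/$idx"
done
```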

3. Merging segments - this should be run periodically; it accounted for about half of the disk space reclaimed in this exercise.
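The merge itself is a single call to the force merge API - a sketch, assuming the indices being merged are no longer being written to (force-merging an actively written index is a bad idea, and the operation is I/O-heavy):

```shell
# Collapse each read-only index down to a single segment.
curl -X POST "localhost:9200/_all/_forcemerge?max_num_segments=1"
```

In a periodic job you'd more likely target only yesterday's (now read-only) daily index by name rather than `_all`, but the idea is the same.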


Was it worth it? 

What is the actual benefit, if any?

It's actually quite a massive difference; even Justin didn't expect it to work out so well for me:

  • JVM Heap use decreased by 62% (32GB is what I allocated, host has more memory)
  • Disk space used by data decreased by 56%
  • Primary shard count (obviously) decreased by 79.5%, which translates to an almost 89% drop in document count
  • Max response time decreased by 22%
  • Cluster health shows green instead of yellow

Pics or it didn't happen - here's before and after the adjustment:

Before

After


Thanks Justin, that's amazing!