I'm a complete newbie when it comes to Elastic, deployed few in Docker containers to quickly ingest data and dig in with Kibana, but that was it. Luckily for me Justin is absolute beast when it comes to all things Elastic - he just looked at my node and right on the spot explained what's wrong with it and how to fix/improve.
Basically my default setup was running 5 shards for each of the indices stored in the system, and I had quite a few daily indices already there - we're talking months of DNS research data and web spider runs across thousands of websites... all repeated daily. This means the optimisation to be really effective needs to also deal with what's in there, not just new data I will be adding.
Plan:
- Change default template to run only 1 shard and 0 replicas - it's a single node deployment, so anything more complex doesn't make much sense.
- Use reindex API to rewrite all of existing indices as single shard versions, the deleting the old ones using 5 shards - there's no other way to do it than through reindexing.
- My indices are treated append-only on the day, then become read-only, so we can merge the segments - leaving technical details behind, this will mean no random access later, just linear file reads, but that's perfectly acceptable in my particular use scenario.
Let's do it!