for file in /var/lib/mongodb/* ; do vmtouch -t -m 10G "$file"; done
If you are restoring from an EBS snapshot, dd and vmtouch are also a great way to touch all of your data and make sure it has actually been downloaded from S3, since blocks restored from a snapshot are only fetched on first access. Touching your data files only reads them into memory, however, and ideally you should warm up both your data and your indexes. For that you can do a natural sort on all of your collections, run a full table scan, or search for something guaranteed not to be there; any of these will load both your indexes and your data into RAM.
for file in /var/lib/mongodb/* ; do time dd if="$file" of=/dev/null bs=16M; done
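The dd loop works by reading every byte of every data file sequentially and throwing it away; the useful side effect is that the kernel keeps those pages in its page cache as long as there is room. If you would rather drive the warm-up from Ruby, like the scripts described below, a minimal sketch of the same idea might look like this. The method name `touch_files` and the 16 MiB default chunk (mirroring `bs=16M`) are our own illustrative choices, not part of dd, vmtouch, or Parse's tools.

```ruby
# Hypothetical Ruby equivalent of the dd loop: read each regular file
# sequentially in large chunks and discard the bytes. Returns total bytes read.
# Reading the pages is what leaves them in the kernel's page cache (RAM permitting).
def touch_files(paths, chunk: 16 * 1024 * 1024)
  buf = String.new(capacity: chunk) # reused for every read to avoid allocations
  paths.sum do |path|
    next 0 unless File.file?(path) # skip directories such as journal/
    bytes = 0
    File.open(path, "rb") do |f|
      # IO#read(length, buffer) fills the buffer and returns nil at EOF
      bytes += buf.bytesize while f.read(chunk, buf)
    end
    bytes
  end
end
```

Calling `touch_files(Dir.glob("/var/lib/mongodb/*"))` walks the same files as the shell loop and returns how many bytes it read, which is handy for sanity-checking against the size of your data directory.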
This problem gets trickier if you have lots of data. If you have a terabyte of data and only 64 gigs of RAM, how do you choose what to load into memory?
Our answer was to write a pair of utilities: mongo_gatherops.rb and mongo_preheat.rb. The first script runs on the old primary before the switch, samples the current ops every quarter second for a configurable number of samples, then sorts the collections by activity and writes the most active ones to a file. For example, this command samples for 30 minutes (7200 quarter-second samples, or 1800 seconds) and outputs a sorted list of the collections accessed:
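At its core, the gathering step is a frequency count of namespaces across many samples. Here is a minimal sketch of that idea; it is not the source of mongo_gatherops.rb. The method name `gather_top_collections` is hypothetical, and the step that fetches the current ops is passed in as a block so the counting logic can run without a live server. Against a real MongoDB, the block would return the `ns` field of each entry in the current-op list.

```ruby
# Hypothetical sketch of the gathering logic (not mongo_gatherops.rb itself).
# Takes `samples` snapshots, sleeping `interval` seconds between them; each
# snapshot comes from the caller's block as an array of "db.collection" strings.
# Returns [namespace, hit_count] pairs, most active first (ties broken by name).
def gather_top_collections(samples, interval: 0.25)
  counts = Hash.new(0)
  samples.times do |i|
    namespaces = yield
    namespaces.each do |ns|
      next if ns.nil? || ns.empty? # idle ops report no namespace
      counts[ns] += 1
    end
    # no need to sleep after the final sample
    sleep(interval) if interval.positive? && i < samples - 1
  end
  counts.sort_by { |ns, n| [-n, ns] }
end
```

With the default interval, `gather_top_collections(7200) { ... }` runs for roughly 30 minutes, matching the command above. Printing each namespace on its own line gives a simple file you can copy to the secondary.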
$ ruby mongo_gatherops.rb 7200 > top_collections_20130306
You can then copy the list to the secondary and use it as input to the preheat script, which runs a full table scan on each collection in the list and reads its indexes into memory.
$ ruby mongo_preheat.rb top_collections_20130306
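The preheat side is essentially a loop over that list, warming the busiest collections first. The sketch below shows only the driver loop and is not the source of mongo_preheat.rb. It assumes a hypothetical input format of one `db.collection` namespace per line, optionally followed by a count, and leaves the actual warming to a caller-supplied block; `preheat` and its `limit:` option are illustrative names.

```ruby
# Hypothetical driver loop (not mongo_preheat.rb itself).
# `lines`: input lines, each starting with a "db.collection" namespace.
# `limit`: optionally warm only the N most active collections (e.g. to fit RAM).
# Yields (db, collection) for each namespace and returns [namespace, seconds].
def preheat(lines, limit: nil)
  namespaces = lines.map { |l| l.split.first }.compact # drop blank lines
  namespaces = namespaces.first(limit) if limit
  namespaces.map do |ns|
    db, coll = ns.split(".", 2) # collection names may themselves contain dots
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield db, coll
    [ns, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end
end
```

For example, `preheat(File.readlines("top_collections_20130306"), limit: 50) { |db, coll| ... }`. With the Ruby `mongo` driver, the block might iterate every document of `client.use(db)[coll].find` to pull the data through memory and run a query hinted onto each index to walk it; treat that as a starting point to verify rather than a description of what mongo_preheat.rb does.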
This takes quite a while to run, but it’s worth it. Once your working set is loaded into RAM, you can elect a new primary with no site outages or degradation of performance.
We’ve made these tools available on our public GitHub repo at https://github.com/ParsePlatform/Ops. Enjoy!