The Parse platform relies heavily on MongoDB. We use MongoDB for a variety of workloads, such as routing and storing application data, monitoring, real-time API request performance analysis, and billing and backend platform analytics. We love the flexibility and richness of the feature set that MongoDB provides, as well as the rock-solid high availability and primary elections.
So we are incredibly excited about the upcoming MongoDB 2.6.0 production release. You can read the full release notes here, but I’d like to highlight a few of the changes and features we are most excited about.
First and foremost: the index build enhancements. You will now be able to build background indexes on secondaries! For those unfamiliar with how mongo performs indexing, the default behavior is to build indexes in the foreground on both primaries and secondaries. A foreground index build grabs the global lock, and no other database operations can execute until the build has finished (you cannot even kill the index build once it has started). Obviously this is not a reasonable option for those of us who build indexes in the normal request path. You can instruct the primary to build indexes in the background, which lets the indexing operation yield to other ops, but until now there has been no equivalent on the secondaries. Parse makes extensive use of read queries to secondaries for things like our push services, and we have had to implement a complicated set of health checks to verify that secondaries are not locked up indexing while we try to read from them. Background indexes on secondaries will make this process much simpler and more robust.
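For anyone who has not done this from the shell before, here is roughly what a background build looks like (the `users` collection and `email` field are just placeholders for illustration):

```javascript
// Build the index in the background on the primary. In 2.6, secondaries
// replicating this build will also run it in the background, instead of
// blocking all reads and writes while it completes.
db.users.ensureIndex({ email: 1 }, { background: true })

// In-progress index builds show up in the operations list; look for
// entries whose msg field mentions the index build and its progress.
db.currentOp()
```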
Another terrific indexing improvement is the ability to resume interrupted index builds.
We are also very excited about a number of query planner improvements. The query planner has been substantially rewritten in 2.6.0, and we are eager to take it for a test drive. MongoDB has also implemented index intersection, which allows the query planner to use more than one index to satisfy a query.
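To illustrate with a contrived example (collection and field names are hypothetical), index intersection means a query on two fields can be served by two single-field indexes rather than requiring a dedicated compound index:

```javascript
// Two ordinary single-field indexes:
db.orders.ensureIndex({ qty: 1 })
db.orders.ensureIndex({ status: 1 })

// In 2.6, the planner can answer this by intersecting both indexes;
// previously it would have picked one index (or a collection scan)
// unless a compound index on { qty, status } existed.
db.orders.find({ qty: { $gt: 10 }, status: "A" }).explain()
```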
The explain() function has been beefed up, and there is now a whole suite of methods for introspecting your query plan cache. In the past we have often been frustrated trying to infer which query plan is being used and why execution differs between replica set members, so it is great to have these decisions exposed directly.
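The new plan cache interface hangs off each collection; a quick tour (using a hypothetical `orders` collection):

```javascript
// List the query shapes that currently have cached plans:
db.orders.getPlanCache().listQueryShapes()

// Inspect the cached plans for a particular query shape:
db.orders.getPlanCache().getPlansByQuery({ qty: { $gt: 10 } })

// Flush the cache entirely, e.g. if a bad plan has gotten sticky:
db.orders.getPlanCache().clear()
```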
Another interesting change is that powerOf2Sizes is now the default storage allocation strategy. I imagine this is somewhat controversial, but I think it’s the right call. PowerOf2 uses more disk space, but is far more resistant to fragmentation. An ancillary benefit is that padding factors are no longer relevant. One issue we have had at Parse (that no one else in the world seems to have, to be fair) is that we cannot do initial syncs or repairDatabase(), because doing so resets all the padding factors to 1.0. This causes writes and updates to shuffle documents around on disk for weeks to come as the padding factors are relearned, which in turn hoses performance. The inability to do initial sync or repair means we have had no way of reclaiming space from the database.
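The strategy is still controllable per collection via collMod, if you want to opt an existing collection out of (or back into) the new default; the collection name here is just an example:

```javascript
// Opt a collection out of the new default allocation strategy:
db.runCommand({ collMod: "events", usePowerOf2Sizes: false })

// ...and back in:
db.runCommand({ collMod: "events", usePowerOf2Sizes: true })

// stats() shows the collection's current padding and storage numbers:
db.events.stats()
```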
The hard-coded maxConns limit is also being lifted. Previously your connection limit was set to 70% of ulimit or 20k connections, whichever was lower, but the hard cap is now gone. This totally makes sense and I am glad it has been lifted. However, you should still be wary of piling on tens of thousands of connections, because each connection uses about 1 MB of memory, and you do not want to starve your working set of RAM.
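A quick back-of-the-envelope check you can run from the shell, assuming roughly 1 MB of stack per connection:

```javascript
// How many connections are open right now, and roughly how much RAM
// are their stacks eating?
var conns = db.serverStatus().connections
printjson(conns)                          // { current, available, totalCreated }
print("approx MB of RAM on connection stacks: " + conns.current)
```

At 20,000 connections that is on the order of 20 GB of RAM gone before your working set sees any of it, which is why lifting the cap does not make unbounded connection counts a good idea.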
Here’s another thing I missed the first dozen or so times I read through the release notes: QUERY RUNTIME LIMITS! MongoDB now lets you tag a cursor or a command with the maxTimeMS() method to limit the length of time a query is allowed to run. This is a thrilling change. Parse (and basically everyone else who runs mongo at scale) has a cron job that runs every minute and kills certain types of queries that have been running past their useful lifespan (e.g. their querying connection has vanished) or are grabbing a certain type of lock and not yielding. If maxTimeMS() works as advertised, the days of the kill script may be gloriously numbered.
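Usage is pleasantly simple; both cursors and commands accept a limit (collection and field names here are invented for the example):

```javascript
// Kill this query server-side if it runs longer than 100 milliseconds:
db.events.find({ user: "foo" }).maxTimeMS(100)

// Commands take maxTimeMS as a field:
db.runCommand({ count: "events", query: { user: "foo" }, maxTimeMS: 100 })
```

A query that blows past its limit errors out with an "operation exceeded time limit" error rather than grinding on forever, which is exactly the behavior the kill-script crowd has been faking by hand.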
Ok, so those are the delicious goodie bits. Lastly, let’s take a look at a painful but necessary change that I am afraid is going to take a lot of people by surprise: stricter enforcement of index key length. All previous versions of mongo would let you insert a document with an indexed value larger than 1024 bytes, and would simply warn you that the document would not be indexed. In 2.6, those writes and updates are rejected by default. This is unquestionably the correct behavior, but it will probably be very disruptive for a lot of mongo users when previously accepted writes start to break. There is a flag to optionally preserve the old behavior, but all mongo users should be thinking about how to move their data sets to a place where this restriction is acceptable. The right answer here is probably some sort of prefix index or hashed index, depending on the individual workload.
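The escape hatch is the failIndexKeyTooLong server parameter, and a hashed index is one way out for long values if you only need equality lookups (the `pages`/`url` names below are just an example):

```javascript
// Temporarily restore the old 2.4 behavior (warn instead of reject).
// This is a bridge, not a fix:
db.runCommand({ setParameter: 1, failIndexKeyTooLong: false })
// ...or at startup: mongod --setParameter failIndexKeyTooLong=false

// A longer-term fix for long field values: index a hash of the field.
// Hashed indexes support equality matches, but not range queries:
db.pages.ensureIndex({ url: "hashed" })
```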
There is a tremendous amount of rich feature development and operational enhancement in this 2.6 release. We have been smoke testing it on secondaries in production for some time now, and we’re very much looking forward to upgrading our fleet. Be sure to check out the rest of the 2.6 release notes for more great stuff!