Smart Indexing at Parse

We love running databases at Parse! We also believe that mobile app developers shouldn’t have to be DBAs to build beautiful, performant apps.

No matter what your data model looks like, one of the most important things to get right is your indexes. If your data is not indexed at all, every query will have to perform a full table scan to return even a single document. But if your data is indexed appropriately, the number of documents scanned to return a correct query result should be low.

We host over 200k mobile apps at Parse, so obviously we cannot sit down and individually determine the correct indexes for each app. Even if we did, this would be an ineffective way to generate indexes, because developers can change their app schemas or query access patterns at any time. So instead we rely on our algorithmically generated smart indexing strategy.

We perform two types of index creation logic. The first generates simple indexes for each API request, and the second does offline processing to pick good compound indexes based on real API traffic patterns. In this blog post I will cover the strategies we use to determine simple single-column indexes. We will detail our compound indexing strategies in a followup post.

At a high level, our indexing strategy attempts to pick good indexes based on a) the order of operators’ usefulness and b) the expected entropy of the value space for the key. We believe that every query should have an index; we choose to err on the side of too many indexes rather than too few. (Every index causes an additional write, but this is a problem that is easier to solve operationally than unindexed queries.)

An operator’s usefulness is determined primarily by how effectively it narrows the search space for the query. The order of operators’ usefulness is:

  • equals
  • <, <=, >=, >
  • prefix string matches
  • $in
  • $ne
  • $nin
  • everything else

For data types, we heavily demote booleans, which have very low entropy and are not useful to index. We also throw out relations/join tables, since no values are stored for these keys. We heavily promote geopoints since MongoDB won’t run a geo query without a geo index. Other data types are ranked by their expected entropy of the value space for the key. Data types in order of usefulness are:

  • array
  • pointers
  • date
  • string
  • number
  • other

We score each query according to the above metrics, and run ensureIndex() on the three top-scoring fields for each query. If the query includes a $or, we compute the top three indexes for each branch of the $or.
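
To make that concrete, here is a purely hypothetical example (the class and field names are invented). Suppose an app issues this query:

db.Checkin.find({
  venue:     "Ferry Building",                 // equality on a string
  createdAt: { $gte: ISODate("2014-03-01") },  // range comparison on a date
  user:      { $in: [ "u1", "u2", "u3" ] },    // $in over pointer values
  verified:  true                              // equality on a boolean, heavily demoted
})

Equality beats a range, which beats $in, and the boolean loses regardless of its operator, so the three top-scoring fields are venue, createdAt, and user, and the simple indexes we would ensure are:

db.Checkin.ensureIndex({ venue: 1 })
db.Checkin.ensureIndex({ createdAt: 1 })
db.Checkin.ensureIndex({ user: 1 })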

When we first started using smart indexing, all indexes were ensured inline with each API request. This created some unfortunate pile-on behavior any time a large app had schema or query pattern changes. Instead of ensuring the index in the request path, we now drop the indexing job in a queue where it is consumed by an indexer worker. This ensures that no more than one index can be generated at a time per replica set, which protects us from these pile-on effects.

Even the best indexing strategy can be defeated by suboptimal queries. Please see Abhishek’s primer from last week on how to write queries that aren’t terrible.

Charity Majors
April 1, 2014

Writing Efficient Parse Queries

Any time you store objects on Parse, they get stored in a database. Usually, a database stores a bag of data on disk, and when you issue Parse queries it looks up the data and returns the matching objects. Since it doesn’t make sense for the database to look at all the data in a particular Parse class for every query, the database uses something called an index. An index is a sorted list of items matching given criteria. Indexes help because they allow the database to do an efficient search and return matching results without looking at all of the data. For the more algorithmically inclined, this is the difference between an O(log n) lookup and an O(n) scan. Indexes are also typically much smaller than the data itself and fit in memory, resulting in faster lookups.
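
If you are curious what this looks like at the database layer, here is a mongo shell sketch (the collection and field names are invented). Without an index the query plan is a full scan; with one, only the matching documents are examined:

db.GameScore.find({ playerName: "Sean Plott" }).explain()
// "cursor" : "BasicCursor", and nscanned equals the total number of documents

db.GameScore.ensureIndex({ playerName: 1 })

db.GameScore.find({ playerName: "Sean Plott" }).explain()
// "cursor" : "BtreeCursor playerName_1", and nscanned equals the number of matches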

There are a couple of operators, $ne (not equal to) and $nin (not in), that cannot be served by indexes. When you run a query with a “Not Equal To” clause, the database has to look at every object in the particular Parse class. Consider an example: we have a column “Color” in our Parse class. An index on “Color” tells us which objects are blue, which are green, and so on. But when running the query:

 { "Color" : { "$ne" : "red" } }

The index is useless, as it can only tell which objects have a particular color. Instead, the database has to look at all the objects. In database language this is called a “Full Table Scan” and it causes your queries to get slower and slower as your database grows in size.

Luckily, most of the time a $ne query can be rewritten as a $in query. Instead of querying for the absence of values, you ask for values which match the rest of the column values. Doing this allows the database to use an index and your queries will be faster.

For example if the “User” object has a column called “State” which has values “SignedUp”, “Verified”, and “Invited”, the slow way to find all users who have used the app at least once would be to run the query:

 { "State" : { "$ne" : "Invited" } }

It would be faster to use the $in version of the query:

 { "State" : { "$in" : [ "SignedUp", "Verified" ] }

In a similar vein, a $nin query is bad too, as it also can’t use an index. You should always try to use the complementary $in query instead. Building on the above example, if the “State” column had two more values, “Blocked” and “Shadow”, to represent blocked users, a slow query to find active users would be:

 { "State" : { "$nin" : [ "Blocked", "Shadow" ] } }

Using the complementary $in query will always be faster:

 { "State" : { "$in" : [ "Active", "Verified" ] } }

We’re working hard here at Parse so that you don’t have to worry about indexes and databases, and knowing the tradeoffs of the $ne and $nin operators goes a long way toward making your app faster.

Abhishek
March 27, 2014

MongoDB 2.6 Production Release Highlights

The Parse platform relies heavily on MongoDB. We use MongoDB for a variety of workloads, such as routing and storing application data, monitoring, real-time API request performance analysis, and billing and backend platform analytics. We love the flexibility and richness of the feature set that MongoDB provides, as well as the rock-solid high availability and primary elections.

So we are incredibly excited about the upcoming MongoDB 2.6.0 production release. You can read the full release notes here, but I’d like to highlight a few of the changes and features we are most excited about.

First and foremost, the index build enhancements. You will now be able to build background indexes on secondaries! For those who are unfamiliar with the way mongo performs indexing, the default behavior is to build indexes in the foreground on both primaries and secondaries. Foreground indexing means the index op will grab the global lock and no other database operations will be able to execute until the index has finished building (this includes killing the index build). Obviously this is not a reasonable option for those of us who ensure indexes are built in the normal request path. You can instruct the primary to build indexes in the background, which lets the indexing operation yield to other ops, but until now there has been no similar functionality on the secondaries. Parse makes extensive use of read queries to secondaries for things like our push services, and we have had to implement a complicated set of health checks to verify that secondaries are not locked and indexing while we try to read from them. Background indexes on secondaries will make this process much simpler and more robust.
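
For reference, a background build is something you request per index when you create it; with 2.6 that option will finally carry through to the secondaries as well. A minimal example (the collection and field names are invented):

db.PhotoComment.ensureIndex({ photoId: 1 }, { background: true })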

Another terrific indexing improvement is the ability to resume interrupted index builds.

We are also very excited about a number of query planner improvements. The query planner has been substantially rewritten in 2.6.0, and we are eager to take it for a test drive. Mongo has also now implemented index intersection, which allows the query planner to use more than one index when planning a query.

The explain() function has been beefed up, and they have added a whole suite of methods for introspecting your query plan cache. In the past we have often been frustrated trying to infer which query plan is being used and why there are execution differences between replica set members, so it is great to have these decisions exposed directly.
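
If you want to poke at the new plan cache yourself, the 2.6 shell helpers look roughly like this (the collection name is invented; check the 2.6 docs for the full method list):

db.PhotoComment.getPlanCache().listQueryShapes()                       // query shapes with cached plans
db.PhotoComment.getPlanCache().getPlansByQuery({ photoId: "abc123" })  // cached plans for one shape
db.PhotoComment.getPlanCache().clear()                                 // flush the cache and start fresh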

Another interesting change is that PowerOf2Sizes is now the default storage allocation strategy. I imagine this is somewhat controversial, but I think it’s the right call. PowerOf2 uses more disk space, but is far more resistant to fragmentation.  An ancillary benefit is that padding factors are no longer relevant. One issue we have had at Parse (that no one else in the world seems to have, to be fair) is that we cannot do initial syncs or repairDatabase() because it resets all the padding factors to 1.0. This causes all updates or writes to move bits around disk for weeks to come as the padding factors are relearned, which in turn hoses performance. The inability to do initial sync or repair means we have had no way of reclaiming space from the database.
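
If you want to opt an existing collection in (or just check where it stands) before upgrading, the relevant knobs have been around since 2.2; a quick sketch with an invented collection name:

db.runCommand({ collMod: "PhotoComment", usePowerOf2Sizes: true })
db.PhotoComment.stats().userFlags   // 1 means powerOf2Sizes is enabled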

The hard-coded maxConns limit is also being lifted. Previously your connection limit was set to 70% of ulimit or 20k connections, whichever was lower, but the hard cap is now gone. This totally makes sense and I am glad to see it go. However, you should still be wary of piling on tens of thousands of connections, because each connection uses about 1MB of memory (20,000 connections is roughly 20GB of RAM) and you do not want to starve your working set.

Here’s another thing I missed the first dozen or so times I read through the release notes: QUERY RUNTIME LIMITS! MongoDB now lets you tag a cursor or a command with the maxTimeMS() method to limit the length of time a query is allowed to run. This is a thrilling change. Parse (and basically everyone else who runs mongo at scale) has a cron job that runs every minute and kills certain types of queries that have been running past their useful lifespan (e.g. their querying connection has vanished) or are grabbing a certain type of lock and not yielding. If maxTimeMS() works as advertised, the days of the kill script may be gloriously numbered.
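
Usage is about as simple as it sounds. A hypothetical example capping a query and a command at two seconds (collection and field names invented):

db.PhotoComment.find({ photoId: "abc123" }).maxTimeMS(2000)
db.runCommand({ count: "PhotoComment", query: { photoId: "abc123" }, maxTimeMS: 2000 })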

Ok, so those are the delicious goodie bits. Lastly, let’s take a look at a painful but necessary change that I am afraid is going to take a lot of people by surprise: stricter enforcement of index key length. All previous versions of mongo would let you insert a document with an indexed value larger than 1024 bytes, and would simply warn you that the document would not be indexed. In 2.6, those writes and updates will start being rejected by default. This is unquestionably the correct behavior, but it will probably be very disruptive for a lot of mongo users when previously accepted writes start to break. They have added a flag to optionally preserve the old behavior, but all mongo users should be thinking about how to move their data sets to a place where this restriction is acceptable. The right answer here is probably some sort of prefix index or hashed index, depending on the individual workload.
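
If you have an indexed field whose values can blow past 1024 bytes, one option is to index a hash of the value instead; hashed indexes have been around since 2.4, with the caveat that they only support equality lookups. A sketch (the collection and field names are invented, and the server parameter name is taken from the 2.6 release notes):

db.Document.ensureIndex({ content: "hashed" })
db.adminCommand({ setParameter: 1, failIndexKeyTooLong: false })   // preserves the old, permissive behavior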

This 2.6 release is packed with exceptionally rich feature development and operational enhancements. We have been smoke testing the secondaries in production for some time now, and we’re very much looking forward to upgrading our fleet. Be sure to check out the rest of the 2.6 release notes for more great stuff!

Charity Majors
March 25, 2014

Building Scalable Apps on Parse

At the SF Parse Meetup earlier this month, I talked with some of you about tips for making your Parse app faster and more reliable. Here is a recap of that presentation.

At Parse, we are constantly improving our infrastructure to make your app scale auto-magically. Our platform load balances your traffic, elastically provisions your app’s backend resources, and backs up your data on redundant storage. On top of our MongoDB datastore, we built an API layer that seamlessly integrates with our client-side SDKs. Our cloud infrastructure uses online learning algorithms to automatically rewrite inefficient queries and generate database indexes based on your app’s realtime query stream.

You can further improve your app’s performance with a few more simple tips. (Some links here are for Android, but we have analogous docs for other platforms too.)

  1. Cache whenever you can. You can turn on client-side query caching using the CachePolicy option.
  2. Use Cloud Code for post-processing. In Cloud Code, you can run custom JavaScript logic whenever Parse Objects are saved or deleted. This is great for validating data and modifying related objects (see the displaying counts and text search examples below). We also offer Background Jobs for running more time-consuming data migrations.
  3. Write restrictive queries. You should write queries that return only what the client actually needs. Returning a ton of unnecessary objects to the client makes your app slower just from the network traffic. Please review our Querying Guide to see whether you should add any constraints to your existing queries. You should also try to avoid non-restrictive query clauses such as whereNotEqualTo. If you are issuing queries on GeoPoints, make sure you specify a reasonable radius. In addition, you can limit the object fields returned with selectKeys so that the phone client doesn’t have to wait for large unrelated fields to download. (A short JavaScript sketch of this tip follows this list.)
  4. Design your data model for efficient querying. Please check out our Relations Guide for details.
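
To make tip 3 concrete, here is a rough sketch using the JavaScript SDK (the class and field names are invented; select is the JavaScript counterpart of selectKeys on Android):

var query = new Parse.Query("Comment");
query.equalTo("postId", "Xy12AbCd");   // restrict to the one post being displayed
query.select("author", "snippet");     // only fetch the fields the screen needs
query.limit(20);                       // cap the number of objects returned
query.find({
  success: function(comments) { /* render the 20 comments */ },
  error: function(error) { console.error(error); }
});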

We also have a few recommendations for specific scenarios:

  1. Displaying counts on the UI. Suppose you are building a product catalog. You might want to display the count of products in each category on the top-level navigation screen. If you run a count query for each of these UI elements, the counts will not be computed efficiently on large data sets because MongoDB does not use counting B-trees. Instead, we recommend that you use a separate Parse Object to keep track of the count for each category. Whenever a product gets added or deleted, you can increment or decrement the count in an afterSave or afterDelete Cloud Code handler (see the sketch after this list).
  2. Text search. MongoDB is not efficient for doing partial string matching except for the special case where you only want a prefix match. For adding efficient text search to your app, please see our previous post.
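
Here is the rough shape of the counter pattern from the first scenario as a Cloud Code sketch (the class and field names are invented, and it assumes a CategoryCount object already exists for each category):

Parse.Cloud.afterSave("Product", function(request) {
  // In a real handler you would check request.object.existed() so that
  // updates to an existing product don't inflate the count.
  var query = new Parse.Query("CategoryCount");
  query.equalTo("category", request.object.get("category"));
  query.first({
    success: function(counter) {
      counter.increment("count");   // atomic, so concurrent saves don't clobber each other
      counter.save();
    },
    error: function(error) {
      console.error("Could not update category count: " + error.message);
    }
  });
});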

We hope that these tips will help you build apps that are responsive and delightful to use. I also want to thank everyone who submitted feedback or questions because this talk was mainly based on the feedback from our wonderful developer community. So keep building awesome apps, and send us your thoughts!

Stanley Wang
December 19, 2013

MongoNYC and Masters Summit Recap

As many of you may already know, the Parse backend makes extensive use of a nonrelational document database called MongoDB. On June 20-21 I was thrilled to attend MongoDB Days in NYC, as well as my first MongoDB Masters’ summit.

The masters’ summit was phenomenal. The event drew 20 or 30 of the world’s most experienced MongoDB engineers of all sorts — CTOs, developers, evangelists, devops, and third-party contributors — to share their experiences and provide feedback on the product. There were breakout sessions on everything from driver APIs to indexing to deployment best practices. We also got to preview some exciting new features.

The next day was the MongoNYC conference. This was actually my third MongoDB conference, and each one seems to get bigger and better than the last. I particularly enjoyed “High Performance, High Scale MongoDB on AWS: A Hands On Guide” from Miles Ward, a solutions architect at Amazon Web Services, and “MongoDB Hacks of Frustration”, from Leo Kim of Foursquare.

I also gave a talk on “Managing a Maturing MongoDB Ecosystem”, based on some of the lessons we have learned scaling MongoDB at Parse over the past year and a half. My talk covered things like managing multiple clusters with Chef, dealing with fragmentation and compaction on aging data, configuring for high availability on AWS, and other operational issues.
 
It was a privilege and a treat to meet so many of the fine folks at 10gen, and spend a few days hanging out and talking shop with other engineers who are running MongoDB at scale. We also came away with some ideas for holding great developer events that we are eager to put into practice for the Parse developer community. Can’t wait for the next one!
 
Charity Majors
July 10, 2013

Summary of the June 11th Service Disruption

We would like to share some information about the service disruption on Tuesday, June 11th. Around 8:00 A.M. PDT, one of our database clusters triggered an indexing bug in the underlying database layer. This data was replicated across four different machines; however, the bug triggered on three of the machines simultaneously. Service was unaffected at this point because our configuration allowed the primary to continue serving traffic.

However, this left the data in a dangerous state, below our typical replication standards, where continued operation would be too risky. In order to restore acceptable functionality, we began to take the live copy down and perform a snapshot to restore our working replica set members to health. We throttled access to the cluster at 11:30 A.M., which disabled service for a small percentage of our users. Unfortunately, our snapshotting mechanism failed due to a low-level hardware issue that we are still investigating, which caused the recovery procedure to take more time than initially estimated. As a fallback recovery mechanism, we provisioned new volumes, performed an rsync of the data, and then reattached those volumes to a secondary node. Once data redundancy was restored, we turned access to the cluster back on. Full access was restored to all apps around 4:45 P.M.

In response to this incident, we are planning several improvements to how we handle similar incidents in the future:

  • We are working to improve our process to allow recovery from multiple database failures without downtime by isolating failed operations.

  • We are developing a mechanism for notifying individual developers of downtime incidents that impact their applications.

  • We are investigating the source of the initial database bug that led to failures in secondaries.

  • We are investigating the source of the underlying snapshot failure that delayed our first recovery efforts.

  • We are working on a monitoring tool to observe recovery progress to enable faster decision making when multiple recovery methods are possible.

We apologize for the outage. We take application reliability very seriously and will strive to apply the lessons learned from this incident going forward.

 

Charity Majors
June 12, 2013

Summary of the May 23rd Parse Service Disruption

We would like to share some more details about the service disruption we experienced on Thursday, May 23rd, and the steps we are taking to mitigate this sort of issue in the future.

At around 4:46 P.M. PDT we experienced a surge of elevated error rates. We tracked this down to an underlying hardware event on the primary node for our application routing data cluster, which caused the kernel to kill the software RAID controller and left the database primary in an indeterminate state. This also triggered a rare client-side driver bug that caused all our application layer services to hang. We failed over to a secondary node and restarted all of the services, and full service was restored by 5:29 P.M.

We are taking the following steps to ensure that we can recover more quickly from similar hardware events in the future:

  • Let the application services read routing data from secondaries, so they do not depend on the availability of the primary
  • Better detection for when the mongo client drivers need to be reinitialized in the application code
  • Continuous warmup on all secondary nodes, so that failover is consistently fast

We apologize for this outage. We are working very hard to improve the resiliency of our platform, and we know that outages are extremely disruptive to our customers. If you have any questions, please reach out to us through our Help & Community portal.

Charity Majors
May 24, 2013

Versioning and Verification for your AWS Security Group Configuration

We strive to keep all aspects of our infrastructure managed by code instead of people; all of our servers are managed by Chef, Jenkins builds and tests our code, and so on. Additionally, we work to verify that all these things are behaving as expected – though Chef configures a service to run on a server, our monitoring server will verify that that service is, in fact, running. We wanted the same verification to make sure our AWS Security Groups are configured correctly and no unexpected changes have occurred.

This verification process is managed by the same service that watches all our other services for problems: Nagios. We check a copy of our Security Group configuration into git and ask Nagios to verify the state of the live Security Groups against the checked-in version. If there is any discrepancy, Nagios pages us and alerts us to the rules that have been added or removed. The extra content field in the alert contains an easy-to-read statement about what changed. For example, “ec2 has an extra rule in the web security group: tcp/80-80 is allowed from 0.0.0.0.”

Beyond allowing us to use our existing code review processes for peer review, keeping the configuration in git provides us with an added benefit: we can add comments to the file with more detail about the reason behind a specific rule. No more will you have to remember that port 5666 is for NRPE; you can write a comment to that effect right there in the configuration file. While this isn’t too big a deal for standard services, it’s a huge benefit when adding custom or obscure services.

Since the AWS API will give you a dump of your Security Groups in JSON, it seemed like the easiest format to use for the configuration file in git. Unfortunately, JSON does not allow comments (though it is lenient about whitespace). So that we can have comments anyway, the Nagios check strips any line that matches /^ *#/ (any line whose first non-space character is #) before running the file through the JSON parser.
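
The actual check is a Python script, but the comment-stripping idea is small enough to sketch in a few lines of JavaScript (the file path here is just an example):

var fs = require('fs');
var raw = fs.readFileSync('/etc/nagios3/conf.d/serialized_rules.json', 'utf8');
var stripped = raw.split('\n').filter(function (line) {
  return !/^ *#/.test(line);   // drop whole-line comments before parsing
}).join('\n');
var expectedGroups = JSON.parse(stripped);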

To get started using this check,

  1. Grab a copy of check_aws_secgroups from our Ops repo on github. The check requires Python and the Boto library to talk to AWS. Run it once with -h to see the various flags to configure file locations.
  2. Set up an automated git checkout on your Nagios server. You’ll need to create an ssh key and a user on github that has read-only access.
  3. Configure it by putting your AWS credentials in /etc/nagios3/conf.d/aws.yml.

    :access_key_id: ABCDEFABCDEF0123456
    :secret_access_key: abcdef0123456789ABCDEF0123456789

  4. Get an initial dump of your Security Groups from EC2: check_aws_secgroups --dump-ec2 > /tmp/serialized_rules.json. You’ll probably want to rearrange and reformat the file to maximize readability, as well as add initial comments before checking this into git.
  5. Configure the Nagios check appropriately.

Getting an automated git checkout was a little harder than expected; a few environment variables make it easier.

Here is the crontab entry:

*/10 * * * * export GIT_SSH=/var/lib/git_checkout/.ssh/git_ssh_wrapper.sh; export GIT_DIR=/var/lib/git_checkout/chef-repo/.git; export GIT_WORK_TREE=/var/lib/git_checkout/chef-repo; git fetch -q && git fetch --tags -q && git reset -q --hard origin/master

The git_ssh_wrapper.sh makes sure ssh won’t balk at github’s host key and gives the path to your read-only user’s ssh key.

#!/usr/bin/env bash
/usr/bin/env ssh -o "StrictHostKeyChecking=no" -i "/var/lib/git_checkout/.ssh/id_rsa" "$@"

Enjoy your deeper sleep, as you rest assured that your Security Groups really are what you know they should be!

Ben Hartshorne
April 29, 2013

Always Be Compacting

Running a big MongoDB installation requires a certain amount of routine maintenance. Over time, collections in a MongoDB database can become fragmented. This can be a particularly serious problem if your data usage patterns are relatively unstructured. In the long run, this can result in your databases taking up more space on disk and in RAM to hold the same amount of data, it can make many database operations noticeably slower, and it can reduce your overall query capacity significantly.

Conveniently, MongoDB provides two different ways to compact your data and restore optimal performance: repairDatabase and compact. Running repairDatabase is appropriate if your databases are relatively small, or if you can afford to take a node out of rotation for quite a long time. For our database sizes and query workload, it made more sense to run continuous compaction over all our collections.

To do this we wrote a small utility script to help us compact all our databases incrementally. We run this utility on a secondary node in our replica set, and once it has compacted everything, we can rotate that node in to be the primary node with minimal downtime. We also have this secondary node configured as our snapshot backup host, so if we ever need to reconstruct nodes from snapshots, the new nodes are as freshly compacted as possible.

Here’s how it works: it first fetches the list of all the databases in your replica set, and then the list of all the collections in each database. It then goes through all of these collections and runs the compact command on each one. This is a blocking operation that puts the database into RECOVERING mode, so after each collection it checks whether replication has fallen too far behind, and if so, it waits for replication to catch up before resuming. If it’s interrupted or encounters an error, it saves the list of remaining collections to a file and prints out instructions for how to resume.
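
To give a flavor of what that loop looks like, here is a stripped-down mongo shell sketch of the same idea, meant to be run against a secondary (the real utility is a Ruby script with resume support; the 60-second lag threshold here is just an example):

var maxLagSeconds = 60;

function secondsBehindPrimary() {
  var primary, me;
  rs.status().members.forEach(function (m) {
    if (m.state === 1) { primary = m; }
    if (m.self)        { me = m; }
  });
  return (primary.optimeDate - me.optimeDate) / 1000;
}

db.getMongo().getDBNames().forEach(function (dbName) {
  if (dbName === "local" || dbName === "admin") { return; }
  var database = db.getSiblingDB(dbName);
  database.getCollectionNames().forEach(function (collName) {
    if (collName.indexOf("system.") === 0) { return; }
    // compact blocks, and puts a secondary into RECOVERING while it runs
    printjson(database.runCommand({ compact: collName }));
    // let replication catch up before moving on to the next collection
    while (secondsBehindPrimary() > maxLagSeconds) { sleep(10 * 1000); }
  });
});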

Here’s how to use it. To compact everything on the localhost mongo instance, you run it with no arguments (note that if you run this on the primary node, it will silently do nothing):

./mongo_compact.rb

To run it on a particular set of your databases (comma separated) you specify them with the -d option:

./mongo_compact.rb -d userdata1,userdata2,userdata3

To run it from cron, you use the -c option, and it will automatically save and resume its collection list in /var/run/mongo_compact/ and check if it’s already running using a .pid file in the same directory.

./mongo_compact.rb -c

You can see a full list of options with --help:

./mongo_compact.rb --help

You can find it in our public git repo for ops utilities: mongo_compact.rb

We hope you find this useful! If you want to hear more tips like these, we’ll be sharing more of our tools and best practices at MongoSF on May 10th.

Brad Kittenbrink
March 26, 2013

Summary of the March 21st Parse Service Disruption

We would like to share more information about the service disruption on Thursday, March 21st, and the steps we are taking to prevent this sort of issue from happening again. We take reliability for our customers very seriously, and are always working to improve stability and uptime.

The outage began at 1:22 P.M. PDT, when all of the secondaries for one of our database clusters went into a crash loop. Upon investigation, we found that we had triggered a rare database bug that corrupted the operation log used to synchronize from the primary to secondary nodes. Our primary node was still alive, but we could not form a quorum to serve traffic from it. Unfortunately, our application routing logic had an incorrect dependency on the data stored in this database, which led to service disruption even for applications on separate clusters.

We were able to restore secondary nodes from snapshot to serve as arbiters and bring the cluster back up by 2:15 P.M., but we were then forced to snapshot off the primary to restore the cluster to a stable state. Service was intermittent for most users until approximately 5:00 P.M. The last few apps were restored to full functionality at 5:40 P.M.

In response to this incident, we have taken the following steps:

  • We have deployed a fix that will prevent us from triggering this database bug.
  • We are adding arbiters across multiple AWS availability zones so we do not need to rely on secondaries to elect a primary.
  • We are building increased isolation of services into our stack, so that one failing cluster will not affect our enterprise customers or customers hosted on other clusters.
  • We are developing custom database utilities that will allow us to manually skip or override corrupt operations.

We apologize for the outage. We know that platform downtime is highly disruptive to our customers. Our entire team is working very hard to increase reliability and prevent such an outage from happening again. If you have any questions, please reach out to us through our Help & Community portal.

Charity Majors
March 22, 2013