MongoDB queries don’t always return all matching documents!

When I query a database, I generally expect that it will return all the results that match my query. Recently, I was surprised to find that this isn’t always the case for MongoDB. Specifically, if a document is updated while the query is running, MongoDB may not return it from the query — even if it matches both before and after the update! If you use MongoDB, you should be aware of this subtle edge case and make sure your queries don’t fall victim to it.

Discovering the problem

These days, my main job is building the backend of the Meteor Galaxy hosting service. We store a lot of data in our MongoDB databases, including the state of all of the containers we’ve run. Containers have a variety of states, like “starting”, “healthy”, “unhealthy”, and “stopped”.

One of our services periodically polls the database and reads the list of running containers with the query

containers.find({state: {$in: ['healthy', 'unhealthy']}})

Running containers can flap back and forth between “healthy” and “unhealthy”, but once they get changed to some other state like “stopped”, they should never return to “healthy” or “unhealthy”. So, if a container that was returned from one iteration of this query later disappears from the query’s results, we should never see it re-appear again.

While investigating a bug in the service, I saw in the logs that occasionally (a few times a day), the service saw a container appear in the query’s results, disappear from the results when the query was run again, and then reappear on a third run. This was really surprising! I figured maybe a bug in the state-writing code broke my assumptions about legitimate state transitions.

One nice thing about MongoDB is that you can actually see the history of your database by running a query on the oplog. I looked for changes to this container’s document in the oplog. First I figured out the approximate time I cared about (when the container vanished from the repeated query), in seconds since the January 1970 Unix epoch:

node> new Date("2016-03-11 07:22:53 GMT-0800").valueOf() / 1000
1457709773

Then I logged in to the “local” database on my MongoDB cluster and ran a query against the oplog, starting a minute earlier. (The addOption(8) is magic meaning “this is the oplog collection and my query constrains ‘ts’, so do a binary search for the starting point even though there is no index”; it really is specific to a field named ‘ts’! batchSize(1) means that the server will send each entry to the client as soon as it is found, which is helpful when running a slow query with few results.)
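The original post’s query snippet didn’t survive the page conversion, but from the description it would have looked roughly like this (the namespace in “ns” is an illustrative guess; only the shape, a ‘ts’ lower bound plus addOption(8) and batchSize(1), comes from the text above):

```javascript
// Run from the mongo shell, connected to the "local" database.
db.oplog.rs.find({
  ts: {$gte: Timestamp(1457709773 - 60, 0)},  // start a minute earlier
  ns: 'mydb.containers',                      // hypothetical namespace
}).addOption(8).batchSize(1)
```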

I saw only reasonable changes: the container went from “starting” to “healthy” and then occasionally flipped back and forth between “unhealthy” and “healthy”. So at every point after this document started matching the query, it should have continued to match the query! But… the logs showed that it did not. Specifically, around the time that the container flipped from “unhealthy” to “healthy” (1457709773), it failed to match the {state: {$in: ['healthy', 'unhealthy']}} query.

But why would MongoDB fail to return this document, which matched the query both before and after the update?

MongoDB: neither fish nor fowl

MongoDB occupies an interesting middle ground between systems like SQL databases and systems like key-value stores or Bigtable.

SQL databases offer powerful transactional guarantees and a query planner that can run queries against various user-defined indexes, but you tend to lose these guarantees when you shard data in order to scale.

In the pursuit of scalability, key-value stores and Bigtable don’t let you change arbitrary data in a single transaction. This generally means that they don’t have built-in indexes and query planning, and it’s your job to structure data in a way that’s efficient for the queries you need to make.

MongoDB is somewhere in the middle. On the one hand, the basic unit of atomicity is the single document: you can make transactional writes to a document, but not across documents. On the other hand, MongoDB does support indexes, and has a query planner that knows how to use them.

MongoDB has a long document describing its concurrency properties. The basic gist is that you should only expect consistency at the single-document level. So it’s not surprising that it provides “Non-point-in-time read operations”: if you modify a few documents while executing a slow query on their collection, you might see some of them in their pre-modification state and others as already modified.

What’s a little more surprising is this caveat:

Reads may miss matching documents that are updated during the course of the read operation.

Well, that seemed to be exactly what I saw. So what’s going on?

How MongoDB queries actually work

The MongoDB query planner is relatively straightforward. (I’m going to ignore things like geospatial and full-text indexes.) Most queries are handled by a single scan either over an entire collection or over a subset of an index. There’s no big lock taken out for the scan; it’s possible that during the scan, writes occur in the collection. Writes won’t happen while looking at a single document, however.

If we’re scanning over the whole collection, writes may change a document before we get to it, or they may not, but they’re not going to re-order the documents of the collection.

But scanning over an index works differently! The index is essentially a list of document IDs, sorted first by the actual index keys and then by the ID itself. If a document is updated in a way that affects an index key, it actually moves around in the index — that’s the whole point!

Let’s make this concrete. Our containers collection had an index on the “state” field. What this means is that MongoDB maintains a list of all containers, sorted first by the “state” field and then by the container ID. To run the query:

containers.find({state: {$in: ['healthy', 'unhealthy']}})

MongoDB uses binary search on the index to find the beginning of the “healthy” section and walks down it, reading each healthy container ID and looking up the full container document in the main collection storage. When it gets to the end of the “healthy” section, it searches for “unhealthy” in the index and walks down that section as well.

But while these scans are occurring, writes can happen too! Let’s say we’ve made it partway through the “healthy” section and we’re looking at a healthy container with ID “XYZ”, when a write comes in changing the status of container “ABC” from “unhealthy” to “healthy”. This container matches the query both before and after the write. But when this change gets written, its index entry is moved from the “unhealthy” section to the “healthy” section… and in fact, to a part of that section that our scan has already passed!


So the scan won’t see “ABC” when it gets to the “unhealthy” section, because it’s not there any more, but it also won’t see it in the “healthy” section, because we’ve already passed its location in the index.
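The failure sequence just described (a scan partway through the “healthy” section, plus a concurrent write moving “ABC” behind it) can be reproduced with a toy model. This is only a sketch, not MongoDB’s storage engine: the index is an array of (state, id) pairs, and the cursor, like a real index scan, remembers only the last key it returned and seeks to the next greater one.

```javascript
// Compare (state, id) index entries: by state, then by id.
const cmp = (a, b) =>
  a[0] !== b[0] ? (a[0] < b[0] ? -1 : 1) :
  a[1] !== b[1] ? (a[1] < b[1] ? -1 : 1) : 0;

const index = {
  entries: [['healthy', 'AAA'], ['healthy', 'XYZ'], ['unhealthy', 'ABC']],
  // Move a document's entry, as an update to its "state" field would.
  update(id, oldState, newState) {
    this.entries = this.entries
      .filter(e => !(e[0] === oldState && e[1] === id))
      .concat([[newState, id]]);
  },
  // Scan each requested section in key order; like a B-tree cursor, we
  // only remember the last key returned and seek to the next greater one.
  *scan(states) {
    for (const state of [...states].sort()) {
      let last = null;
      for (;;) {
        const next = this.entries
          .filter(e => e[0] === state && (last === null || cmp(e, last) > 0))
          .sort(cmp)[0];
        if (!next) break;
        last = next;
        yield next[1];
      }
    }
  },
};

const results = [];
const cursor = index.scan(['healthy', 'unhealthy']);
results.push(cursor.next().value); // 'AAA'
results.push(cursor.next().value); // 'XYZ'; the scan is now past key ('healthy', 'ABC')
// Concurrent write: ABC flips unhealthy -> healthy. It matches the query
// before and after, but its entry moves to a spot the scan already passed.
index.update('ABC', 'unhealthy', 'healthy');
for (const id of cursor) results.push(id); // drains the rest: nothing left
console.log(results); // [ 'AAA', 'XYZ' ]; ABC never appears
```

Re-running the same scan from scratch after the write finds “ABC” again, which matches the appear/disappear/reappear pattern I saw in the logs.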

For our particular case, there was a relatively easy workaround. We denormalized by adding a second indexed boolean field, “up”, which is true if “state” is either “healthy” or “unhealthy”, and false otherwise. Instead of making this particular query look at “state”, it now looks at “up”. The writes that bounce a container between “healthy” and “unhealthy” don’t touch “up”, so they can’t cause this problem.
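Concretely, the denormalization looks something like this (a sketch; the field names “state” and “up” are from the workaround above, everything else is illustrative):

```javascript
// A write that moves a container to a terminal state also clears "up".
// Both fields live in the same document, so the update is atomic:
containers.update({_id: containerId},
                  {$set: {state: 'stopped', up: false}});

// Health flapping touches only "state"; the "up" index entry never moves:
containers.update({_id: containerId}, {$set: {state: 'unhealthy'}});

// The polling query becomes an exact-match scan on the "up" index:
containers.find({up: true})
```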

We then went and audited our entire system to check every query that used an index to make sure it couldn’t fall into this particular trap. Fortunately, this was the only instance in our backend service.

Problems in the middle ground

I think this problem comes from MongoDB’s “middle ground” approach to being a database. If we used something like Bigtable that didn’t claim to have indexes or global transactions, we’d have to set up our own indexes for every query we care about optimizing. The fact that these writes could move the container around in the index table while another query was scanning it would be evident in our code, rather than hidden inside the database engine. If we used a more transactional database like a traditional SQL database, we wouldn’t have this problem — it would be solved by the “Isolation” part of the ACID guarantees.

You could certainly fix this issue within the current MongoDB model by creating dependencies between in-progress index scans and writes to that index: any time a document moves backwards in an index, check it against in-progress scans to see whether they have already passed its new position. That said, if some documents have already been returned to the client and the index also provides the client’s requested sort order, the scan may not be able to return the moved document’s new value in the correct position; returning its old value may still be satisfactory, though.

This issue can also affect you even if your query matches only a single value of a field, if the index it uses covers multiple fields. For example, if your “people” collection has a compound index on “(country, city)” (and no index on just “country”) and you run:

people.find({country: "France"})

Then a write that changes a document from “country France, city Paris” to “country France, city Bordeaux” while the query is scanning at “country France, city Nice” will cause the query to miss that person.
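The compound-index case can be sketched the same way. Here the names are hypothetical, and each index entry is encoded as a 'country\0city\0id' string so that plain string ordering matches compound-key ordering:

```javascript
const key = (country, city, id) => [country, city, id].join('\u0000');

// A (country, city) compound index over three people in France.
let index = [
  key('France', 'Lyon', 'p3'),
  key('France', 'Nice', 'p2'),
  key('France', 'Paris', 'p1'),
].sort();

// Scan everyone in a country; the cursor remembers only the last key
// returned and seeks to the next greater entry still in that country.
function* scanCountry(country) {
  let last = '';
  for (;;) {
    const next = index
      .filter(k => k.startsWith(country + '\u0000') && k > last)
      .sort()[0];
    if (!next) return;
    last = next;
    yield next.split('\u0000')[2];
  }
}

const results = [];
const cursor = scanCountry('France');
results.push(cursor.next().value); // 'p3' (Lyon)
results.push(cursor.next().value); // 'p2' (Nice); scan is past 'Bordeaux'
// Concurrent write: p1 moves Paris -> Bordeaux, i.e. behind the cursor.
index = index
  .filter(k => !k.endsWith('\u0000p1'))
  .concat([key('France', 'Bordeaux', 'p1')])
  .sort();
for (const id of cursor) results.push(id);
console.log(results); // [ 'p3', 'p2' ]; p1 matched throughout but is missing
```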

Long story short…

  • This issue doesn’t affect queries that don’t use an index, such as queries that just look up a document by ID.
  • It doesn’t affect queries that do a single-value equality match on every field used in the index key.
  • It doesn’t affect queries that use indexes whose fields are never modified after the document is originally inserted.
  • But any other kind of MongoDB query can fail to include all the matching documents!
