Sunday, July 2, 2017

Lucene gets concurrent deletes and updates!

Long ago, Lucene could only use a single thread to write new segments to disk. The actual indexing of documents, which is the costly process of inverting incoming documents into in-memory segment data structures, could run with multiple threads, but back then, the process of writing those in-memory indices to Lucene segments was single threaded.

We fixed that, more than 6 years ago now, yielding big indexing throughput gains on concurrent hardware.

Today, hardware has only become even more concurrent, and we've finally done the same thing for processing deleted documents and updating doc values!

This change, in time for Lucene's next major release (7.0), shows a 53% indexing throughput speedup when updating whole documents, and a 7.4X - 8.6X speedup when updating doc values, on a private test corpus using highly concurrent hardware (an i3.16xlarge EC2 instance).

Buffering versus applying


When you ask Lucene's IndexWriter to delete a document, or update a document (which is an atomic delete and then add), or to update a doc-values field for a document, you pass it a Term, typically against a primary key field like id, that identifies which document to update. But IndexWriter does not perform the deletion right away. Instead, it buffers up all such deletions and updates, and only finally applies them in bulk once they are using too much RAM, or you refresh your near-real-time reader, or call commit, or a merge needs to kick off.

The process of resolving those terms to actual Lucene document ids is quite costly as Lucene must visit all segments and perform a primary key lookup for each term. Performing lookups in batches gains some efficiency because we sort the terms in unicode order so we can do a single sequential scan through each segment's terms dictionary and postings.

We have also optimized primary key lookups and the buffering of deletes and updates quite a bit over time, with issues like LUCENE-6161, LUCENE-2897, LUCENE-2680, LUCENE-3342. Our fast BlockTree terms dictionary can sometimes save a disk seek for each segment if it can tell from the finite state transducer terms index that the requested term cannot possibly exist in this segment.

Still, as fast as we have made this code, only one thread is allowed to run it at a time, and for update-heavy workloads, that one thread can become a major bottleneck. We've seen users asking about this in the past, because while the deletes are being resolved it looks as if IndexWriter is hung since nothing else is happening. The larger your indexing buffer the longer the hang.

Of course, if you are simply appending new documents to your Lucene index, never updating previously indexed documents, a common use-case these days with the broad adoption of Lucene for log analytics, then none of this matters to you!

Concurrency is hard


With this change, IndexWriter still buffers deletes and updates into packets, but whereas before, when each packet was also buffered for later single-threaded application, instead IndexWriter now immediately resolves the deletes and updates in that packet to the affected documents using the current indexing thread. So you gain as much concurrency as indexing threads you are sending through IndexWriter.

The change was overly difficult because of IndexWriter's terribly complex concurrency, a technical debt I am now convinced we need to address head-on by somehow refactoring IndexWriter. This class is challenging to implement since it must handle so many complex and costly concurrent operations: ongoing indexing, deletes and updates; refreshing new readers; writing new segment files; committing changes to disk; merging segments and adding indexes. There are numerous locks, not just IndexWriter's monitor lock, but also many other internal classes, that make it easy to accidentally trigger a deadlock today. Patches welcome!

The original change also led to some cryptic test failures thanks to our extensive randomized tests, which we are working through for 7.0.

That complex concurrency unfortunately prevented me from making the final step of deletes and updates fully concurent: writing the new segment files. This file writing takes the in-memory resolved doc ids and writes a new per-segment bitset, for deleted documents, or a whole new doc values column per field, for doc values updates.

This is typically a fast operation, except for large indices where a whole column of doc-values updates could be sizable. But since we must do this for every segment that has affected documents, doing this single threaded is definitely still a small bottleneck, so it would be nice, once we succeed in simplifying IndexWriter's concurrency, to also make our file writes concurrent.

[I work at Amazon and the postings on this site are my own and don't necessarily represent Amazon's position]

No comments:

Post a Comment