Changing Bits: 265% indexing speedup with Lucene's concurrent flushing

Saturday, May 7, 2011

A week ago, I described the nightly benchmarks we use to catch any unexpected slowdowns in Lucene's performance. Back then the graphs were rather boring (a good thing), but not anymore! Have a look at the stunning jumps in Lucene's indexing rate:

(Click through the image to see details about what changed on dates A, B, C and D).

Previously we were around 102 GB of plain text per hour, and now it's about 270 GB/hour. That's roughly 265% of the previous rate, a 2.65x speedup! Lucene now indexes all of Wikipedia's 23.2 GB (English) export in 5 minutes and 10 seconds.
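As a quick sanity check of these figures, here is a small Java sketch of the arithmetic. It involves no Lucene code; the class and method names are made up for illustration. It converts the Wikipedia run (23.2 GB in 5 minutes 10 seconds) to GB/hour and compares the old and new rates. Note that 270 / 102 is about 2.65, so the new rate is about 265% of the old one, which is the same thing as an increase of about 165% over it.

```java
// Sanity check of the figures quoted in the post; no Lucene code is involved.
// IndexingRateCheck and its methods are hypothetical names for illustration.
final class IndexingRateCheck {

    /** Converts a size in GB and an elapsed time in seconds to GB per hour. */
    static double gbPerHour(double gigabytes, double seconds) {
        return gigabytes / seconds * 3600.0;
    }

    /** Ratio of the new rate to the old rate (2.0 means twice as fast). */
    static double speedupFactor(double oldRate, double newRate) {
        return newRate / oldRate;
    }

    public static void main(String[] args) {
        double wikipediaGb = 23.2;            // export size quoted in the post
        double elapsedSeconds = 5 * 60 + 10;  // 5 minutes 10 seconds

        double rate = gbPerHour(wikipediaGb, elapsedSeconds);
        double factor = speedupFactor(102.0, 270.0);

        System.out.printf("rate: %.1f GB/hour%n", rate);
        System.out.printf("new rate is %.2fx the old rate%n", factor);
        System.out.printf("increase over the old rate: %.0f%%%n", (factor - 1.0) * 100.0);
    }
}
```

The computed rate for the Wikipedia run comes out just under 270 GB/hour, which agrees with the number read off the graph.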

How did this happen? Concurrent flushing.

That new feature, having lived on a branch for quite some time, undergoing many fun iterations, was finally merged back to trunk about a week ago.

Before concurrent flushing, whenever IndexWriter needed to flush a new segment, it would stop all indexing threads and hijack one thread to perform the rather compute-intensive flush. This was a nasty bottleneck on computers with highly concurrent hardware; flushing was inherently single-threaded. I previously described the problem here.

But with concurrent flushing, each thread freely flushes its own segment even while other threads continue indexing. No more bottleneck!
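To make the idea concrete, here is a toy model in plain Java. It is only a sketch of the general pattern (each thread owns a private buffer and flushes it without pausing the others), not Lucene's actual implementation; ToyConcurrentFlush, Segment and the document-count flush trigger are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: these names are hypothetical and are not Lucene classes.
final class ToyConcurrentFlush {

    /** A flushed "segment": here just the number of documents it holds. */
    static final class Segment {
        final int docCount;
        Segment(int docCount) { this.docCount = docCount; }
    }

    /**
     * Runs `threads` indexing threads. Each one buffers documents privately and
     * flushes its own segment every `docsPerFlush` documents, without pausing
     * the other threads.
     */
    static List<Segment> index(int threads, int docsPerThread, int docsPerFlush) {
        ConcurrentLinkedQueue<Segment> flushed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                pending.add(pool.submit(() -> {
                    int buffered = 0; // private to this thread, so no locking
                    for (int d = 0; d < docsPerThread; d++) {
                        buffered++;   // "index" one document into the buffer
                        if (buffered == docsPerFlush) {
                            flushed.add(new Segment(buffered)); // flush own segment
                            buffered = 0;
                        }
                    }
                    if (buffered > 0) {
                        flushed.add(new Segment(buffered)); // final partial flush
                    }
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // wait for the thread and surface any failure
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("indexing thread failed", e);
                }
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(flushed);
    }

    /** Sums the document counts of all flushed segments. */
    static int totalDocs(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            total += s.docCount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = index(4, 1000, 300);
        System.out.println(segments.size() + " segments, "
                + totalDocs(segments) + " documents");
    }
}
```

With 4 threads, 1000 documents per thread and a flush every 300 documents, each thread writes three full segments and one partial one. The only shared state is the queue of finished segments, so no thread ever has to wait for another thread's flush.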

Note that there are two separate jumps in the graph. The first jump, the day concurrent flushing landed (labelled as B on the graph), shows the improvement while using only 6 threads and a 512 MB RAM buffer during indexing. Those settings resulted in the fastest indexing rate before concurrent flushing.

The second jump (labelled as D on the graph) happened when I increased the indexing threads to 20 and dropped the RAM buffer to 350 MB, giving the fastest indexing rate after concurrent flushing.

One nice side effect of concurrent flushing is that you can now use RAM buffers well over 2.1 GB, as long as you use multiple threads. Curiously, I found that larger RAM buffers slow down the overall indexing rate. This might be because of the discontinuity when closing IndexWriter, when we must wait for all the RAM buffers to be written to disk. It would be better to measure the steady-state indexing rate, while indexing an effectively infinite content source and ignoring the startup and ending transients; I suspect that if I measured that instead, we'd see gains from larger RAM buffers, but this is just speculation at this point.
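A steady-state measurement of that kind could look something like the following Java sketch. It is a hypothetical helper, not part of the nightly benchmark: it takes periodic samples of cumulative GB indexed and computes the rate over a window that skips the start and end of the run. The sample values in main are made up purely to show the effect.

```java
// Hypothetical helper for illustration; it is not part of the nightly benchmark.
final class SteadyStateRate {

    /**
     * Computes GB/hour over the window that excludes the first `skipStartSec`
     * and the last `skipEndSec` seconds of a run.
     *
     * @param timesSec     sample times in seconds, strictly increasing
     * @param cumulativeGb total GB indexed at each sample time
     */
    static double gbPerHour(double[] timesSec, double[] cumulativeGb,
                            double skipStartSec, double skipEndSec) {
        if (timesSec.length != cumulativeGb.length || timesSec.length < 2) {
            throw new IllegalArgumentException("need two or more paired samples");
        }
        double windowStart = timesSec[0] + skipStartSec;
        double windowEnd = timesSec[timesSec.length - 1] - skipEndSec;

        int first = -1;
        int last = -1;
        for (int i = 0; i < timesSec.length; i++) {
            if (timesSec[i] >= windowStart && first < 0) {
                first = i; // first sample inside the window
            }
            if (timesSec[i] <= windowEnd) {
                last = i;  // latest sample inside the window so far
            }
        }
        if (first < 0 || last <= first) {
            throw new IllegalArgumentException("window contains fewer than two samples");
        }
        double gb = cumulativeGb[last] - cumulativeGb[first];
        double seconds = timesSec[last] - timesSec[first];
        return gb / seconds * 3600.0;
    }

    public static void main(String[] args) {
        // Made-up samples: slow start, steady middle, slow finish while buffers drain.
        double[] t  = {0, 60, 120, 180, 240, 300};
        double[] gb = {0, 3, 8, 13, 18, 20};
        System.out.printf("whole run:    %.1f GB/hour%n", gbPerHour(t, gb, 0, 0));
        System.out.printf("steady state: %.1f GB/hour%n", gbPerHour(t, gb, 60, 60));
    }
}
```

In this invented data set the whole-run rate is lower than the steady-state rate, because the slow first and last minutes drag the average down. Whether real indexing runs with large RAM buffers behave this way is exactly the open question above.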

There were some very challenging changes required to make concurrent flushing work, especially around how IndexWriter handles buffered deletes. Simon Willnauer does a great job describing these changes here and here. Concurrency is tricky!

Remember this change only helps you if you have concurrent hardware, you use enough threads for indexing and there's no other bottleneck (for example, in the content source that provides the documents). Also, if your IO system can't keep up then it will bottleneck your CPU concurrency. The nightly benchmark runs on a computer with 12 real (24 with hyperthreading) cores and a fast (OCZ Vertex 3) solid-state disk. Finally, this feature is not yet released: it was committed to Lucene's trunk, which will eventually be released as 4.0.

13 comments:

Kristian, May 28, 2011 at 9:22 AM
Wow! Amazing job on this one. I once had to index 6MM documents and had a goal to make it happen in less than 10 minutes for 14 GB of data. While running Solr, I saw the same problem, and it was the single thing that prevented me from having a single process hit my goal.

I'm thrilled to check this out - thanks.
Elisha, June 28, 2012 at 10:14 AM
That sounds great - in which Lucene version was this feature developed?
Michael McCandless, June 30, 2012 at 12:36 PM
Hi Elisha,

This is in the upcoming Lucene 4.0; the alpha release should be out any day now!
Anonymous, August 1, 2012 at 4:37 AM
Hi, I wonder if we can configure the number of indexing threads through Solr 4?
Also, would you mind explaining more about how the RAM buffer affects the indexing rate? Many thanks!
Michael McCandless, August 1, 2012 at 5:53 AM
Hi, please ask those questions on the solr-user@lucene.apache.org list. Thanks.
Unknown, June 27, 2014 at 1:13 AM
Do you mean multiple IndexWriters writing to the same index path concurrently?
BDD_1970, June 11, 2015 at 11:22 AM
Thanks for the work Michael, this was very good to know since I am now working in Petabytes of data.....