Changing Bits: 2X faster PhraseQuery with Lucene using C++ via JNI

Saturday, June 22, 2013

I recently described the new lucene-c-boost GitHub project, which provides amazing speedups (up to 7.8X faster) for common Lucene query types using specialized C++ implementations via JNI.

The code works with a stock Lucene 4.3.0 JAR and the default codec, and has a trivial API: just call NativeSearch.search instead of IndexSearcher.search.

A quick update: I've now optimized PhraseQuery as well:

Task         QPS base   StdDev base   QPS opt   StdDev opt   Speedup
HighPhrase        3.5        (2.7%)       6.5       (0.4%)      1.9X
MedPhrase        27.1        (1.4%)      51.9       (0.3%)      1.9X
LowPhrase         7.6        (1.7%)      16.4       (0.3%)      2.2X

A ~2X speedup (~90% to ~119% faster) is nice!
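For context on what is being sped up: an exact PhraseQuery (slop 0) must first find documents that contain every term of the phrase, then check that the terms occur at consecutive positions in each such document. The C++ below is a minimal sketch of that two-step logic, assuming a hypothetical in-memory postings layout; `Posting`, `PostingsList` and `exactPhraseMatches` are illustrative names, not part of Lucene's or lucene-c-boost's API, and this is not the lucene-c-boost implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory positional postings (NOT Lucene's on-disk format):
// one entry per document containing the term, sorted by docID, holding
// that term's positions within the document in ascending order.
struct Posting {
    int docID;
    std::vector<int> positions;
};
using PostingsList = std::vector<Posting>;

// Returns (docID, phraseFreq) for each document where the terms appear
// at consecutive positions, i.e. an exact phrase match (slop = 0).
std::vector<std::pair<int, int>> exactPhraseMatches(
        const std::vector<PostingsList>& terms) {
    assert(!terms.empty());
    std::vector<std::pair<int, int>> hits;
    std::vector<std::size_t> cursor(terms.size(), 0);

    while (true) {
        // Step 1: leapfrog intersection. Find the largest current docID...
        int target = -1;
        for (std::size_t t = 0; t < terms.size(); ++t) {
            if (cursor[t] == terms[t].size()) return hits;  // a list is exhausted
            target = std::max(target, terms[t][cursor[t]].docID);
        }
        // ...and advance every list to the first docID >= target.
        bool allOnTarget = true;
        for (std::size_t t = 0; t < terms.size(); ++t) {
            while (cursor[t] < terms[t].size() &&
                   terms[t][cursor[t]].docID < target) {
                ++cursor[t];
            }
            if (cursor[t] == terms[t].size()) return hits;
            if (terms[t][cursor[t]].docID != target) allOnTarget = false;
        }
        if (!allOnTarget) continue;  // some list overshot; retry with a larger target

        // Step 2: position check. The phrase starts at position p if
        // term i occurs at position p + i for every i.
        int freq = 0;
        for (int p : terms[0][cursor[0]].positions) {
            bool match = true;
            for (std::size_t t = 1; t < terms.size() && match; ++t) {
                const std::vector<int>& pos = terms[t][cursor[t]].positions;
                match = std::binary_search(pos.begin(), pos.end(),
                                           p + static_cast<int>(t));
            }
            if (match) ++freq;
        }
        if (freq > 0) hits.emplace_back(target, freq);

        // Move every list past the document we just examined.
        for (std::size_t t = 0; t < terms.size(); ++t) ++cursor[t];
    }
}
```

For example, with postings for "new" of `{{1, {0, 5}}, {2, {3}}, {4, {7}}}` and for "york" of `{{1, {1, 9}}, {2, {0}}, {4, {8}}}`, the phrase "new york" matches documents 1 and 4, each once; document 2 contains both terms but not adjacently. The per-document position check is the work that ordinary conjunctive (AND) queries don't need to do.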

Again, it's great to see reduced variance in the runtimes, since with the hot loops running as native code, HotSpot is mostly not an issue. It's odd that LowPhrase gets lower QPS than MedPhrase: these queries look mislabeled (I see the LowPhrase queries getting more hits than MedPhrase!).

All changes have been pushed to lucene-c-boost; next I'd like to figure out how to get facets working.

4 comments:

Simon Reavely, July 22, 2013 at 9:56 PM
Hey Mike, interesting

Out of interest, do you have any theories about why the Java code is so much slower?
Have you learnt anything about C++ versus Java optimization here?

Cheers,
Simon