The code works with a stock Lucene 4.3.0 JAR and default codec, and has a trivial API: just call NativeSearch.search instead of IndexSearcher.search.
Now, a quick update: I've optimized PhraseQuery as well:
Task | QPS base | StdDev base | QPS opt | StdDev opt | % change |
---|---|---|---|---|---|
HighPhrase | 3.5 | (2.7%) | 6.5 | (0.4%) | 1.9 X |
MedPhrase | 27.1 | (1.4%) | 51.9 | (0.3%) | 1.9 X |
LowPhrase | 7.6 | (1.7%) | 16.4 | (0.3%) | 2.2 X |
~2X speedup (~90% - ~119%) is nice!
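As a quick sanity check on the arithmetic, here is a minimal sketch that recomputes the speedup from the rounded QPS values in the table. The names `results`, `speedup` and `summary` are just illustrative. Because the table is rounded to one decimal place, the recomputed percentages (roughly 86% to 116%) come out slightly lower than the range quoted above, which was presumably computed from the unrounded measurements.

```python
# QPS figures copied from the table above: (baseline, optimized).
results = {
    "HighPhrase": (3.5, 6.5),
    "MedPhrase": (27.1, 51.9),
    "LowPhrase": (7.6, 16.4),
}


def speedup(base_qps, opt_qps):
    """Return (ratio, percent_change) of optimized QPS over baseline QPS."""
    ratio = opt_qps / base_qps
    return ratio, (ratio - 1.0) * 100.0


# Map each task name to its (ratio, percent_change) pair.
summary = {task: speedup(base, opt) for task, (base, opt) in results.items()}

for task, (ratio, pct) in summary.items():
    print(f"{task}: {ratio:.1f} X ({pct:+.0f}%)")
```

The ratio is what the last column of the table reports (1.9 X, 1.9 X, 2.2 X); the percent change is simply the ratio minus one, times 100.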
Again, it's great to see reduced variance in the runtimes, since HotSpot is mostly not an issue. It's odd that LowPhrase gets lower QPS than MedPhrase: these queries look mis-labelled (I see the LowPhrase queries getting more hits than MedPhrase!).
All changes have been pushed to lucene-c-boost; next I'd like to figure out how to get facets working.
Hey Mike, interesting.
Out of interest, do you have any theories about why the Java code is so much slower?
Have you learnt anything about C++ versus Java optimization here?
Cheers,
Simon
I suspect most of the gains are from specializing/hardwiring the code to a specific query, collector, etc., but I haven't done the obvious test (create the same specialized code in Java instead of C)...
This is not surprising. Even though Java's performance is close to that of C++, it seems that there is still about a 1.5-2x difference.
See, e.g.:
http://benchmarksgame.alioth.debian.org/u32/java.php
Thanks for sharing that link Itman.