By far the hottest spot in Lucene during searching is the method (DataInput.readVInt) that decodes Lucene's variable-length integer representation (vInt). This method is called an insane number of times, while iterating the postings lists (docs, freqs, positions), during query execution.
This representation is not CPU friendly: for every single byte read, the CPU hits a hard-to-predict if statement (testing the high bit). Alternative block-based formats do a good job reducing branches while decoding, such as these encodings (PFOR-DELTA is currently an experimental codec attached as a patch on LUCENE-1410) or Google's Group Varint.
Anyway, I decided to test whether porting readVInt to C would gives us a performance boost. I had known the overhead of JNI was highish in the past, but I was hoping in modern JREs this had been improved.
Unfortunately, it wasn't: I see a ~10%-~40% slowdown, depending on the query. Likely hotspot's inability to inline native methods is also a factor here. Perhaps, if/when we switch to a block-based codec, we'll see a gain with a native implementation, since the amortized overhead of invoking JNI will be lower.