RAM has become very affordable recently, so for high-traffic sites the performance gains from holding the entire index in RAM should quickly pay for the up-front hardware cost.
The obvious approach is to load the index into Lucene's RAMDirectory, right?
Unfortunately, this class is known to put a heavy load on the garbage collector (GC): each file is naively held as a List of byte[1024] fragments (there are open Jira issues to address this but they haven't been committed yet). It also has unnecessary synchronization. If the application is updating the index (not just searching), another challenge is how to persist ongoing changes from RAMDirectory back to disk. Startup is much slower as the index must first be loaded into RAM. Given these problems, Lucene developers generally recommend using RAMDirectory only for small indices or for testing purposes, and otherwise trusting the operating system to manage RAM by using MMapDirectory (see Uwe's excellent post for more details).
While there are open issues to improve RAMDirectory (LUCENE-4123 and LUCENE-3659), they haven't been committed and many users simply use RAMDirectory anyway.
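For reference, here is a minimal sketch of that recommended MMapDirectory setup (my own example against the Lucene 4.x API; the index path is hypothetical):

import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class MMapSearchExample {
  public static void main(String[] args) throws Exception {
    // Memory-map the index files; the OS keeps the hot pages in free RAM,
    // so nothing is copied onto the Java heap and the GC never sees the index.
    Directory dir = new MMapDirectory(new File("/path/to/index"));  // hypothetical path
    DirectoryReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    // ... run searches with searcher ...
    reader.close();
    dir.close();
  }
}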
Recently I heard about the Zing JVM, from Azul, which provides a pauseless garbage collector even for very large heaps. In theory the high GC load of RAMDirectory should not be a problem for Zing. Let's test it! But first, a quick digression on the importance of measuring the response time of all search requests.
Search response time percentiles
Normally our performance testing scripts (luceneutil) measure the average response time, discarding outliers. We do this because we are typically testing an algorithmic change and want to see the effect of that change, ignoring confounding effects from unrelated pauses due to the OS, IO systems, GC, etc.
But for a real search application what matters is the total response time of every search request. Search is a fundamentally interactive process: the user sits and waits for the results to come back and then iterates from there. If even 1% of the searches take too long to respond that's a serious problem! Users are impatient and will quickly move on to your competitors.
So I made some improvements to luceneutil, including separating out a load testing client (sendTasks.py) that records the response time of all queries, as well as scripts (loadGraph.py and responseTimeGraph.py) to generate the resulting response-time graphs, and a top-level responseTimeTests.py to run a series of tests at increasing loads (queries/sec), automatically stopping once the load is clearly beyond the server's total capacity. As a nice side effect, this also determines the true capacity (max throughput) of the server rather than requiring an approximate measure by extrapolating from the average search latency.
Queries are sent according to a Poisson distribution, to better model the arrival times of real searches, and the client is thread-less so that if you are testing at 200 queries/sec and the server suddenly pauses for 5 seconds then there will be 1000 queries queued up once it wakes up again (this fixes an unfortunately common bug in load testers that dedicate one thread per simulated client).
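To make that open-loop design concrete, here is a rough Java sketch of the idea (the real client, sendTasks.py, is Python and thread-less; the sendQuery stub and the thread pool here are just placeholders to keep the example short). The key property is that the send schedule is drawn from exponential inter-arrival gaps and never waits for responses:

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoissonLoadSketch {

  public static void main(String[] args) throws Exception {
    double targetQPS = 200.0;
    Random rng = new Random(17);
    ExecutorService pool = Executors.newCachedThreadPool();
    long sendTimeNS = System.nanoTime();

    for (int i = 0; i < 10000; i++) {
      // Exponentially distributed gaps give Poisson arrivals averaging targetQPS:
      double gapSec = -Math.log(1.0 - rng.nextDouble()) / targetQPS;
      sendTimeNS += (long) (gapSec * 1e9);

      long sleepNS = sendTimeNS - System.nanoTime();
      if (sleepNS > 0) {
        Thread.sleep(sleepNS / 1000000, (int) (sleepNS % 1000000));
      }

      // Fire and forget: if the server pauses, queries keep arriving on schedule
      // and pile up, just as they would with real, independent users.
      final int queryID = i;
      pool.submit(new Runnable() {
        public void run() {
          sendQuery(queryID);  // hypothetical stub: send the query, record its response time
        }
      });
    }
    pool.shutdown();
  }

  private static void sendQuery(int queryID) {
    // hypothetical: issue the query against the search server and log its latency
  }
}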
The client can run (via password-less ssh) on a separate machine; this is important because if the server machine itself (not just the JVM) is experiencing system-wide pauses (e.g. due to heavy swapping) it can cause pauses in the load testing client which will skew the results. Ideally the client runs on an otherwise idle machine and experiences no pauses. The client even disables Python's cyclic garbage collector to further reduce the chance of pauses.
Wikipedia tests
To test Zing, I first indexed the full Wikipedia English database (as of 5/2/2012), totalling 28.78 GB plain text across 33.3 M 1 KB sized documents, including stored fields and term vectors, so I could highlight the hits using FastVectorHighlighter. The resulting index was 78 GB. For each test, the server loads the entire index into RAMDirectory and then the client sends the top 500 hardest (highest document frequency) TermQuerys, including stop words, to the server. At the same time, the server re-indexes documents (updateDocument) at the rate of 100/sec (~100 KB/sec), and reopens the searcher once per second.
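Sketched in code, the server side of that setup looks roughly like this (my own reconstruction, not the actual luceneutil server; it assumes Lucene 4.2, and the index path and the "id"/"body" fields are hypothetical):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class RAMDirServerSketch {
  public static void main(String[] args) throws Exception {
    // Copy the on-disk index entirely into RAM (this is the slow-startup part):
    Directory diskDir = new MMapDirectory(new File("/path/to/wikipedia/index"));  // hypothetical path
    Directory dir = new RAMDirectory(diskDir, IOContext.DEFAULT);

    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42,
        new StandardAnalyzer(Version.LUCENE_42));
    IndexWriter writer = new IndexWriter(dir, iwc);

    // SearcherManager safely shares near-real-time searchers across the server threads:
    SearcherManager mgr = new SearcherManager(writer, true, null);

    // In the real server an indexing thread does ~100 of these per second:
    Document doc = new Document();
    doc.add(new StringField("id", "12345", Field.Store.NO));            // hypothetical id field
    doc.add(new TextField("body", "updated article text", Field.Store.YES));
    writer.updateDocument(new Term("id", "12345"), doc);

    // ... and a background thread calls this once per second so updates become visible:
    mgr.maybeRefresh();

    // Each query thread acquires and releases a searcher per request:
    IndexSearcher searcher = mgr.acquire();
    try {
      TopDocs hits = searcher.search(new TermQuery(new Term("body", "the")), 10);
      System.out.println("hits: " + hits.totalHits);
    } finally {
      mgr.release(searcher);
    }

    mgr.close();
    writer.close();
    dir.close();
    diskDir.close();
  }
}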
Each test ran for an hour, discarding results for the first 5 minutes to allow for warmup. Max heap is 140 GB (-Xmx140g). I also tested MMapDirectory, with a 4 GB max heap, as a baseline. The machine has 32 cores (64 with hyper-threading) and 512 GB of RAM, and the server ran with 40 threads.
I tested at varying load rates (queries/sec), from 50 (very easy) to 275 (too hard), and then plotted the resulting response time percentiles for different configurations. The default Oracle GC (Parallel) was clearly horribly slow (10s of seconds collection time) so I didn't include it. The experimental garbage first (G1) collector was even slower starting up (it took 6 hours to load the index into RAMDirectory, vs 900 seconds for Concurrent Mark/Sweep (CMS)), and then the queries hit > 100 second latencies, so I also left it out (this was surprising as G1 is targeted towards large heaps). The three configurations I did test were CMS at its default settings with MMapDirectory as the baseline, and both CMS and Zing with RAMDirectory.
At the lightest load (50 QPS), Zing does a good job maintaining low worst-case response times, while CMS shows long worst-case response times, even with MMapDirectory:
To see the net capacity of each configuration, I plotted the 99% response time across different load rates:
From this it's clear that the peak throughput for CMS + MMap was somewhere between 100 and 150 queries/sec, while the RAMDirectory-based indices were somewhere between 225 and 250 queries/sec. This is an impressive performance gain! It's also interesting because in separately testing RAMDirectory vs MMapDirectory I usually only see minor gains when measuring average query latency.
Plotting the same graph, without CMS + MMapDirectory and removing the 275 queries/sec point (since it's over capacity):
Zing remains incredibly flat at the 99% percentile, while CMS has high response times already at 100 QPS. At 225 queries/sec, the highest near-capacity load, here are just CMS and Zing on RAMDirectory:
The pause times for CMS are worse than they were at 50 QPS: already at the 95% percentile the response times are too slow (4479 milliseconds).
Zing works!
It's clear from these tests that Zing really has a very low-pause garbage collector, even at high loads, while managing a 140 GB max heap with a 78 GB Lucene index loaded into RAMDirectory.
Furthermore, applications can expect a substantial increase in max throughput (around 2X faster in this case) and need not fear using RAMDirectory even for large indices, if they can run under Zing.
Note that Azul has just made the Zing JVM freely available to open-source developers, so now we all can run our own experiments, add Zing into the JVM rotation for our builds, etc.
Next I will test the new DirectPostingsFormat, which holds all postings in RAM in simple arrays (no compression) for fast search performance. It requires even more RAM than this test, but gives even faster search performance!
Thank you to Azul for providing a beefy server machine and the Zing JVM to run these tests!
Stats for Zing + MMapDirectory would be very useful to those of us who need a persistent index.
I second Steve's request. The Zing looks mighty impressive but if it also speeds up MMapDirectory, then RAMDirectory stays a non-viable option for most large setups.
Hi Steve, Toke,
I agree it'd be good to see results for Zing on MMapDir as well
... but I'm not sure I'll have time to run this any time soon! Feel
free to run your own tests too: Zing is free for open-source
development now.
Zing JVM uses too much RAM :(
Kunnu,
It uses as much RAM as you tell it to... like all other JVMs.
There is one difference though: Zing has a kernel module, and that kernel module grabs the RAM as soon as it's started, and then the RAM is held for Zing even if no Zing JVMs are running. But if that's a problem you can manually stop the kernel module (service zing-memory stop) when you're not using Zing.
Mike, I'm doing my best to figure out how/where to inject DirectPostingsFormat into my source code, but I'm new to Lucene. I've been unable to come up with a working sample online that demonstrates how to use this technique to improve search times. Here is where I'm at on my indexer code:
var analyzer = new NGramAnalyzer();
var index = new MMapDirectory(new File(@"..\..\..\PaperAuthors")); // var index = new RAMDirectory();
var config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
config.setCodec(new org.apache.lucene.codecs.memory.DirectPostingsFormat());
var indexWriter = new IndexWriter(index, config);
which does not compile because DirectPostingsFormat is not an acceptable codec.
Similarly, how would I inject DirectPostingsFormat into my searcher code which starts with:
var analyzer = new NGramAnalyzer();
var index = new MMapDirectory(new File(@"..\..\..\PaperAuthors")); // var index = new RAMDirectory();
var directoryReader = DirectoryReader.open(index);
var indexSearcher = new IndexSearcher(directoryReader);
A quick simple example would be greatly appreciated!
Hi Jay,
A PostingsFormat is just one part of a Codec; the simplest way to use DirectPF (or any other PF) for a given field is something like this:
Codec myCodec = new Lucene42Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    return PostingsFormat.forName("Direct");
  }
};
You can customize that method to e.g. return a different format for each field ...
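To tie it back to Jay's snippet, a sketch of the indexer wiring under those assumptions (reusing his analyzer and index variables, and the myCodec defined above) would be:

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
config.setCodec(myCodec);  // new segments are written with the "Direct" postings format per field
IndexWriter indexWriter = new IndexWriter(index, config);

Nothing changes on the search side: the postings format is recorded per segment, so the existing DirectoryReader.open(index) call picks it up automatically.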
Thanks for the post Michael. Can you share all the JVM options you used for the various runs? Also, what are your thoughts on using a Linux RAM disk as an alternative to RAMDirectory or MMapDirectory?
Hi Johnny,
Besides setting max heap, and picking which GC (G1, CMS) to use, I left all other JVM options at their defaults.
I think Linux RAM disk doesn't make much sense, because if you use MMapDir, the OS will soak up any free RAM and act just like a RAM disk, and a RAM disk of course isn't persisted so you lose the index on reboot/unmount.
I have often wondered if some really bright person is going to create SmartRAM whereby you throw your search terms at a pipeline into the memory hardware.
ReplyDeleteI went over this site and I believe you have a lot of fantastic information, saved to favorites (:.
Hi Michael,
I am involved in a project where we are trying to solve a problem very similar to the one you are describing in your blog post.
I am wondering if you would be available for a call/hangout/GTM to help us out.
Not sure what the proper way is for asking for help, but we really would appreciate your guidance.
Not sure how else to contact you. Will also hunt for an email address.
Cheers, Helmut
Hi, sorry, I'm not available now; maybe try emailing the Lucene user's list (java-user@lucene.apache.org)?