Changing Bits: Lucene's RAM usage for searching

Monday, July 5, 2010

For fast searching, Lucene loads certain data structures entirely into RAM:

The terms dict index requires substantial RAM per indexed term (by default, every 128th unique term is indexed), and is loaded when the IndexReader is created. This can be a very large amount of RAM for indexes with an unusually high number of unique terms; to reduce it, you can pass a terms index divisor when opening the reader. For example, passing 2 loads only every other indexed term and so halves the RAM required. The tradeoff is that seeking to a given term, which every TermQuery must do once, becomes slower, because Lucene must do twice as much scanning (on average) to find the term.
The field cache, which is used under the hood when you sort by a field, takes some per-document RAM that depends on the field type (String is by far the worst). It is loaded the first time you sort on that field.
Norms, which encode the a-priori document boost computed at indexing time (including length normalization and any boosting the app does), consume 1 byte per document for each field used in searching. For example, if your app searches 3 different fields, such as body, title and abstract, then norms require 3 bytes of RAM per document. They are loaded on demand, the first time each field is searched.
Deletions, if present, consume 1 bit per document and are loaded when the IndexReader is constructed.
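The per-structure costs in the list above can be turned into a rough estimate. The Java sketch below is hypothetical (the class and method names are not Lucene APIs): it uses the figures stated above for norms (1 byte per document per searched field), deletions (1 bit per document) and the terms index (every 128th unique term, thinned further by the terms index divisor), while the per-indexed-term cost is an assumed input, since the post only says it is "substantial". The field cache is left out because its cost depends on the field type.

```java
/**
 * Back-of-envelope estimate of the RAM-resident structures described above.
 * Hypothetical helper, not a Lucene API; bytesPerIndexedTerm is an assumed input.
 */
class LuceneRamEstimate {

    /** Default terms index interval: every 128th unique term is indexed. */
    static final int TERM_INDEX_INTERVAL = 128;

    /** Norms: 1 byte per document for each searched field. */
    static long normsBytes(long maxDoc, int searchedFields) {
        return maxDoc * searchedFields;
    }

    /** Deletions: 1 bit per document, rounded up to whole bytes. */
    static long deletionsBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    /** Terms index: only every (interval * divisor)-th unique term is held in RAM. */
    static long termsIndexBytes(long uniqueTerms, int divisor, long bytesPerIndexedTerm) {
        long step = (long) TERM_INDEX_INTERVAL * divisor;
        long indexedTerms = (uniqueTerms + step - 1) / step; // round up
        return indexedTerms * bytesPerIndexedTerm;
    }

    /** Roughly how many terms a seek scans past the nearest indexed term: half the gap. */
    static double avgTermsScanned(int divisor) {
        return (TERM_INDEX_INTERVAL * divisor) / 2.0;
    }

    public static void main(String[] args) {
        long maxDoc = 5_000_000L;
        System.out.println("norms, 3 fields: " + normsBytes(maxDoc, 3) + " bytes");
        System.out.println("deletions: " + deletionsBytes(maxDoc) + " bytes");
        // 50M unique terms and 100 bytes per indexed term are illustrative assumptions only.
        for (int divisor : new int[] {1, 2}) {
            System.out.println("divisor " + divisor
                + ": terms index " + termsIndexBytes(50_000_000L, divisor, 100L)
                + " bytes, ~" + avgTermsScanned(divisor) + " terms scanned per seek");
        }
    }
}
```

Doubling the divisor halves the terms index estimate and doubles the average scan length, which is the RAM-versus-seek-time tradeoff described for TermQuery.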

Warming a reader is necessary because some of these data structures (norms, FieldCache) are initialized lazily. It's also useful for pre-populating the OS's IO cache with the pages that cover the frequent terms you search on.
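The warming idea can be sketched without Lucene itself. In the hypothetical Java example below, LazyFieldReader stands in for a reader whose per-field structures (like norms) are built on first access, and warm() touches every field that live queries will use before the reader serves traffic, so no user query pays the load cost. Both class names are illustrative assumptions; with a real Lucene reader, the equivalent is to run a few representative searches, including sorted ones, against the new reader first.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Warmer {
    /** Touch every field that live queries will use, before the reader serves traffic. */
    static void warm(LazyFieldReader reader, List<String> fields) {
        for (String field : fields) {
            reader.norms(field);
        }
    }

    public static void main(String[] args) {
        LazyFieldReader reader = new LazyFieldReader(1_000);
        warm(reader, Arrays.asList("body", "title"));
        // A later "user" query finds the structure already loaded: no new load happens.
        reader.norms("body");
        System.out.println("expensive loads so far: " + reader.loads);
    }
}

/**
 * Hypothetical stand-in for a reader with lazily built per-field structures.
 * Single-threaded sketch; not a Lucene API.
 */
class LazyFieldReader {
    private final int maxDoc;
    private final Map<String, byte[]> perField = new HashMap<>();
    int loads = 0; // counts how many expensive per-field loads happened

    LazyFieldReader(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** The first call for a field pays the load cost; later calls hit the cache. */
    byte[] norms(String field) {
        return perField.computeIfAbsent(field, f -> {
            loads++;
            return new byte[maxDoc]; // 1 byte per document, like norms
        });
    }
}
```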

With flexible indexing, available in Lucene's trunk (4.0-dev), we've made great progress on reducing the RAM required for both the terms dict index and the String index field cache. We have substantially reduced the number of objects created for these RAM-resident data structures, and switched to representing all character data as UTF-8 instead of Java's char, which halves the RAM required when the character data is simple ASCII.
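The halving for ASCII follows from the two encodings: a Java char is a 2-byte UTF-16 code unit, while UTF-8 needs only 1 byte per ASCII character. The small Java comparison below counts raw character bytes only (it ignores per-object overhead, which the change above also reduced); it also shows that non-ASCII text saves less, and CJK text can even grow, since those characters take 3 bytes each in UTF-8.

```java
import java.nio.charset.StandardCharsets;

class Utf8VsChar {
    /** Bytes to hold the text as Java chars (UTF-16 code units, 2 bytes each). */
    static long charBytes(String s) {
        return 2L * s.length();
    }

    /** Bytes to hold the same text encoded as UTF-8. */
    static long utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, Latin with one accented char, and two CJK characters.
        for (String term : new String[] {"lucene", "caf\u00e9", "\u65e5\u672c"}) {
            System.out.println(term + ": char[] " + charBytes(term)
                + " bytes, UTF-8 " + utf8Bytes(term) + " bytes");
        }
    }
}
```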

So I ran a quick check against a real index, created from the first 5 million documents of the Wikipedia database export. The index has a single segment with no deletions. I initialize a searcher, then load norms for the body field and populate the FieldCache for sorting by the title field, using JRE 1.6, 64-bit:

3.1-dev requires 674 MB of RAM
4.0-dev requires 179 MB of RAM

That's a 73% reduction in RAM!
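The percentage follows directly from the two measurements:

```latex
\text{reduction} = \frac{674 - 179}{674} = \frac{495}{674} \approx 0.734 \approx 73\%
```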

However, there seems to be some performance loss when sorting by a String field, which we are still tracking down.

Note that modern OSs will happily swap out RAM from a process, in order to increase the IO cache. This is rather silly: Lucene loads these specific structures into RAM because we know we will need to randomly access them, a great many times. Other structures, like the postings data, we know we will sweep sequentially once per search, so it's less important that these structures be in RAM. When the OS swaps our RAM out in favor of IO cache, it's reversing this careful separation!

This will of course cause disastrous search latency for Lucene, since running a given search may incur many page faults. On Linux, you can fix this by tuning swappiness down to 0, which I try to do on every Linux computer I touch (most Linux distros default it to a fairly high number). Windows also has a setting, under My Computer -> Properties -> Advanced -> Performance Settings -> Advanced -> Memory Usage, that lets you favor Programs or System Cache; it is likely doing something similar.
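On Linux the current value is exposed in /proc/sys/vm/swappiness; it can be changed with `sysctl vm.swappiness=0`, and adding `vm.swappiness = 0` to /etc/sysctl.conf keeps it across reboots. As a hypothetical startup check (the class name is illustrative), a Java search server could read that file and warn when swapping is allowed:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SwappinessCheck {
    /** Linux exposes the current setting here; the file does not exist on other OSes. */
    static final Path SWAPPINESS = Paths.get("/proc/sys/vm/swappiness");

    /** Parse the file's contents, e.g. "60\n", into an int. */
    static int parse(String contents) {
        return Integer.parseInt(contents.trim());
    }

    public static void main(String[] args) throws IOException {
        if (!Files.exists(SWAPPINESS)) {
            System.out.println("No " + SWAPPINESS + " (not Linux?); nothing to check.");
            return;
        }
        int value = parse(new String(Files.readAllBytes(SWAPPINESS), StandardCharsets.US_ASCII));
        if (value > 0) {
            System.out.println("vm.swappiness=" + value
                + ": the OS may swap out Lucene's RAM-resident structures");
        } else {
            System.out.println("vm.swappiness=0");
        }
    }
}
```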

12 comments:

Gili Nachum, March 11, 2013 at 11:51 AM
FS cache vs. Java heap: it would have been nice if the swappiness factor could be set per process. Though if you're running a dedicated search server, this is less of a problem.
Michael McCandless, March 11, 2013 at 2:16 PM
Hi Gili,

Maybe there is some way, but I don't know about it! That sure would be nice.
kbros, April 27, 2013 at 3:21 PM
If I use MMapDirectory, should I worry about OS swappiness?
Michael McCandless, April 28, 2013 at 8:10 AM
Hi kbros,

You should still worry about OS swappiness even when using MMapDir, if you care about search latency.

Lucene loads certain structures into RAM (deleted docs, norms, terms index, field cache / doc values) and if the OS swaps those out it will cause latency spikes in your searching.

I turn it off (set swappiness to 0) on every Linux box I touch... swapping is a poor abstraction.
Ashish, May 17, 2013 at 7:42 PM
Hi Mike, when you say "The terms dict index requires substantial RAM per indexed term", does RAM mean heap memory, or are you referring to non-heap memory?
Swami, February 13, 2014 at 7:25 AM
Hi Michael, the method you describe for turning down the OS's IO caching on Windows doesn't seem to exist from Windows 2008 onward. The only option there is System Properties -> Advanced -> Performance -> Advanced -> Adjust for Performance of "Programs / Background Services", which actually only controls processor scheduling.

However, Microsoft does seem to provide a Dynamic Cache Service, available for download for Windows 2008 (http://support.microsoft.com/kb/976618); for Windows 2008 R2 it can only be obtained via an MSDN ticket.
Michael McCandless, February 13, 2014 at 8:10 AM
Thanks Swami; it's spooky that it's getting harder in Windows to have it NOT swap out your process's RAM.
Murali Krishna P, October 25, 2016 at 7:17 AM
Hi Michael, looks like terms index divisor is no longer supported. Is there some other way of controlling what is loaded into memory?