Monday, September 26, 2011

Lucene's SearcherManager simplifies reopen with threads

Modern computers have wonderful hardware concurrency, within and across CPU cores, RAM and IO resources, which means your typical server-based search application should use multiple threads to fully utilize all resources.

For searching, this usually means you'll have one thread handle each search request, sharing a single IndexSearcher instance. This model is effective: the Lucene developers work hard to minimize internal locking in all Lucene classes. In fact, we recently removed thread contention during indexing (specifically, flushing), resulting in massive gains in indexing throughput on highly concurrent hardware.

Since IndexSearcher exposes a fixed, point-in-time view of the index, when you make changes to the index you'll need to reopen it. Fortunately, since version 2.9, Lucene has provided the IndexReader.reopen method to get a new reader reflecting the changes.

This operation is efficient: the new reader shares already warmed sub-readers in common with the old reader, so it only opens sub-readers for any newly created segments. This means reopen time is generally proportional to how many changes you made; however, when a large merge has just completed, the reopen will take longer. It's best to warm the new reader before putting it into production by running a set of "typical" searches for your application, so that Lucene performs one-time initialization of internal data structures (norms, field cache, etc.).

But how should you properly reopen, while search threads are still running and new searches are forever arriving? Your search application is popular, users are always searching and there's never a good time to switch! The core issue is that you must never close your old IndexReader while other threads are still using it for searching, otherwise those threads can easily hit cryptic exceptions that often mimic index corruption.

Lucene tries to detect that you've done this, and will throw a nice AlreadyClosedException, but we cannot guarantee that exception is thrown, since we only check up front, when the search kicks off: if you close the reader while a search is already underway then all bets are off.

One simple approach would be to temporarily block all new searches and wait for all running searches to complete, and then close the old reader and switch to the new one. This is how janitors often clean a bathroom: they wait for all current users to finish and block new users with the all-too-familiar plastic yellow sign.

While the bathroom cleaning approach will work, it has an obvious and serious drawback: during the cutover you force your users to wait, and that wait could be long (the time for the slowest currently running search to finish).
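This blocking cutover can be sketched with a read/write lock: searches pin the current searcher by holding the read lock, and the swap takes the write lock, blocking new searches until every in-flight one finishes. This is a generic, self-contained sketch (the class and method names are invented for illustration), not what SearcherManager does:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the naive "bathroom cleaning" cutover, NOT
// what SearcherManager does.  S is a stand-in for IndexSearcher.
public class BlockingCutover<S> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private S current;

  public BlockingCutover(S initial) { current = initial; }

  // Each search pins the searcher by holding the read lock; many
  // searches can still run concurrently with each other.
  public S beginSearch() {
    lock.readLock().lock();
    return current;
  }

  public void endSearch() {
    lock.readLock().unlock();
  }

  // The cutover takes the write lock: new searches block (the "yellow
  // sign") and we wait for every running search to release its read lock.
  public S cutover(S newSearcher) {
    lock.writeLock().lock();
    try {
      S old = current;
      current = newSearcher;
      return old;  // safe for the caller to close now
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    BlockingCutover<String> mgr = new BlockingCutover<>("reader-v1");
    String s = mgr.beginSearch();
    System.out.println("searching with " + s);
    mgr.endSearch();
    String old = mgr.cutover("reader-v2");
    System.out.println("closed " + old + ", now on " + mgr.beginSearch());
    mgr.endSearch();
  }
}
```

The drawback described above shows up directly in `cutover`: acquiring the write lock waits out the slowest running search while all new searches queue behind it.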

A much better solution is to immediately direct new searches to the new reader, as soon as it's done warming, and then separately wait for the still-running searches against the old reader to complete. Once the very last search has finished with the old reader, close it.

This solution is fully concurrent: it has no locking whatsoever so searches are never blocked, as long as you use a separate thread to perform the reopen and warming. The time to reopen and warm the new reader has no impact on ongoing searches, except to the extent that reopen consumes CPU, RAM and IO resources to do its job (and, sometimes, this can in fact interfere with ongoing searches).

So how exactly do you implement this approach? The simplest way is to use the reference counting APIs already provided by IndexReader to track how many threads are currently using each searcher. Fortunately, as of Lucene 3.5.0, there will be a new contrib/misc utility class, SearcherManager, originally created as an example for Lucene in Action, 2nd edition, that does this for you! (LUCENE-3445 has the details.)
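Under the hood, the idea can be modeled without Lucene at all: each searcher carries a reference count; acquire pins it, release unpins it, and the retired searcher is closed only when its count drops to zero. The following is a simplified, self-contained sketch with invented class and method names; in Lucene itself, IndexReader provides the incRef/decRef/tryIncRef calls and SearcherManager does this bookkeeping for you:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of a reference-counted searcher.  The names here
// are invented for illustration; in Lucene, IndexReader has
// incRef()/decRef()/tryIncRef() built in.
class RefCounted {
  private final AtomicInteger refCount = new AtomicInteger(1); // manager's ref
  final String name;
  volatile boolean closed;

  RefCounted(String name) { this.name = name; }

  // Pin the searcher unless its count already hit zero (mirrors
  // IndexReader.tryIncRef): the caller must retry on false.
  boolean tryIncRef() {
    int count;
    do {
      count = refCount.get();
      if (count == 0) return false;
    } while (!refCount.compareAndSet(count, count + 1));
    return true;
  }

  void decRef() {
    if (refCount.decrementAndGet() == 0) {
      closed = true;  // a real implementation closes files here
    }
  }
}

public class SimpleManager {
  private final AtomicReference<RefCounted> current;

  public SimpleManager(RefCounted initial) {
    current = new AtomicReference<>(initial);
  }

  // Loop until we pin a live searcher: between reading `current` and
  // pinning it, a concurrent swap may have closed that searcher.
  public RefCounted acquire() {
    RefCounted s;
    do {
      s = current.get();
    } while (!s.tryIncRef());
    return s;
  }

  public void release(RefCounted s) {
    s.decRef();
  }

  // Publish the new (already warmed) searcher immediately; the old one
  // is closed only when its last in-flight search releases it.
  public void swap(RefCounted newSearcher) {
    current.getAndSet(newSearcher).decRef();
  }
}
```

Note how `swap` never blocks: new searches immediately see the new searcher, while the old one lingers, still open, exactly as long as some search holds a reference to it.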

The class is easy to use. You first create it by providing the Directory holding your index and a SearcherWarmer instance:

  class MySearchWarmer implements SearcherWarmer {
    @Override
    public void warm(IndexSearcher searcher) throws IOException {
      // Run some diverse searches, searching and sorting against all
      // fields that are used by your application
    }
  }

  Directory dir = FSDirectory.open(new File("/path/to/index"));
  SearcherManager mgr = new SearcherManager(dir,
                                            new MySearchWarmer());

Then, for each search request:

  IndexSearcher searcher = mgr.acquire();
  try {
    // Do your search, including loading any documents, etc.
  } finally {
    mgr.release(searcher);

    // Set to null to ensure we never again try to use
    // this searcher instance after releasing:
    searcher = null;
  }

Be sure you fully consume the searcher before releasing it! A common mistake is to release it, yet later accidentally use it again, for example to load stored documents when rendering the search results for the current page.

Finally, you'll need to periodically call the maybeReopen method from a separate (i.e., non-searching) thread. This method reopens the reader, and cuts over only if there was actually a change. If your application knows when changes have been committed to the index, you can reopen right after that; otherwise, you can simply call maybeReopen every X seconds. When there has been no change to the index, the cost of maybeReopen is negligible, so calling it frequently is fine.
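One simple way to drive this from a separate thread is a scheduled task. In this self-contained sketch the task body is just a stand-in counter so the example runs without an index; in a real application it would call mgr.maybeReopen() and handle the IOException that call may throw:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReopenThread {
  public static void main(String[] args) throws InterruptedException {
    // Stand-in for SearcherManager.maybeReopen(): we only count the
    // calls so this sketch runs on its own.
    final AtomicInteger reopenAttempts = new AtomicInteger();

    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // In production you'd reopen every few seconds; 50 ms here just
    // keeps the demo short.  scheduleWithFixedDelay ensures one
    // reopen never overlaps the next.
    scheduler.scheduleWithFixedDelay(
        reopenAttempts::incrementAndGet, 0, 50, TimeUnit.MILLISECONDS);

    Thread.sleep(300);
    scheduler.shutdown();
    scheduler.awaitTermination(1, TimeUnit.SECONDS);

    System.out.println("reopen attempts: " + reopenAttempts.get());
  }
}
```

scheduleWithFixedDelay (rather than scheduleAtFixedRate) is the safer choice here, since a reopen that runs long after a big merge should delay the next attempt, not pile up behind it.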

Beware the potentially high transient cost of reopen and warm! During reopen you must have two readers open until the old one can be closed, so budget plenty of RAM in the computer, and heap for the JVM, to comfortably handle the worst case where the two readers share no sub-readers (for example, after a full optimize) and thus consume 2X the RAM of a single reader. Otherwise you might hit a swap storm or OutOfMemoryError, effectively taking down your entire search application. Worse, you won't see this problem early on: your first few hundred reopens could easily use only small amounts of added heap, but then suddenly on some unexpected reopen the cost is far higher. Reopening and warming is also generally IO intensive, as the reader must load certain index data structures into memory.

Next time I'll describe another utility class, NRTManager, available since version 3.3.0, that you should use instead if your application uses Lucene's fast-turnaround near-real-time (NRT) search. This class solves the same problem (thread-safety during reopening) as SearcherManager but adds a fun twist as it gives you more specific control over which changes must be visible in the newly opened reader.

12 comments:

  1. Hi Mike,

    Nice blog! Is there an e-mail address where I can contact you in private?

  2. I wish there was something similar done about IndexReaders. In our app, we have noticed that after heavy usage of an IndexReader, it starts to bloat memory and leaves tens of thousands of open files on the disk (although those files no longer exist, their handles are still there), leading to the infamous TooManyOpenFiles exception.

    The only solution for us is to close those readers and replace them with new ones, and we are facing exactly the same problem you are describing in your article.

    The same might be necessary if, for example, someone wanted to control memory usage. There can be situations where caching all of the terms is not desirable, and the API doesn't allow adjusting the cache capacity/ratio/etc.

    Unfortunately, the Lucene API gives no public contract telling whether a reader is actively in use, hence we have to rely on closing it inside the finalizer :( And, needless to say, finalizers are the purest form of evil, especially when used for things like closing file handles. This is *explicitly* mentioned in Josh Bloch's "Effective Java".

    Is there anything preventing this problem from being solved the way soft/weak references are handled, that is, with some sort of stale-IndexReader queue? Or maybe some other, better solution exists?

  3. Hi mindas,

    Something sounds very very wrong in your setup: IndexReader doesn't hold open so many deleted files normally.

    I suspect you're not actually closing them (relying on finalizers to do so is very dangerous). SearcherManager should in fact solve your problem of tracking whether an IndexReader is still in use, I think?

    Can you describe your problem on the Lucene user's list? java-user@lucene.apache.org

  4. Hi Mike,

    The issue with open deleted file handles is virtually impossible to reproduce. We have literally spent days trying to create a self-consistent test but unfortunately couldn't produce anything. If this helps, I can testify that lsof was showing an ever-growing number of handles (I think it went up to tens of thousands) for an index which had very frequent CRUD document rates and was only a few thousand documents in size. No one else would be CRUD-ing files in the Lucene index directory.

    You might not remember, but I have described parts of this problem about a year ago, and you even replied to my post :), see

    http://www.gossamer-threads.com/lists/lucene/java-user/107945#107945

    From your last reply I got the impression that this is not a significant problem, so I simply gave up.

  5. Aha, now I remember that email!

    Unfortunately, the only real solution I see is to explicitly close your IndexReaders (the try/finally approach). Relying on finalizers to do so is dangerous.

    If you don't close IndexReaders in a timely way yourself then you'll see deleted-but-still-open files accumulate.

    If you do close your IndexReaders in a timely way yet still see deleted-but-open files accumulate, then it's possible there's a bug in Lucene's file handling, in which case you should re-raise this on the list / open an issue / etc.

  6. In a non-realtime setting, we open a thread to close our obsolete index readers almost exactly the way described here for searchers.

    We have a TimeLimitingCollector based implementation of search, and disallow rogue searches that take more than a few seconds.

    So for us things are easy: open a new index reader whenever we are "told" to reload an index, forget (close references to) any Searchers that are composed of the old index reader, and immediately direct all new requests to the new reader (for which a new searcher is created since now none exists in the cache).

    In the meantime, we remember the readers that have to be closed, and start a "GrimReaper" thread that simply waits for a specified time (about equal to the search timeout defined via the time-limiting collector), and then closes them.

    Of course, finalize should not be used for this, because of the indeterminacy of GC.

  7. This comment has been removed by the author.

  8. Hey Mike,

    First of all, my apologies for posting a question here, but I could not find another way to contact you.

    I have posted one question in the past related to Zoie and Lucene. Finally I found something to share. It seems the problem is in Lucene. I am using Lucene 3.5.

    Here is my problem:
    We have a Quartz scheduler job to create the index. The first time the Quartz job runs, it deletes the existing index files first, and the IndexWriter writes files successfully.

    The next time the Quartz job runs, it does not delete the content of the index files. As the stack below shows, the IndexWriter acquires a lock on the index directory.
    However, IndexWriter.copySegmentAsIs generates an identical file name to add to the index directory, which it then tries to delete via FSDirectory's ensureCanWrite(String name) method; this is not allowed since the current IndexWriter has already put the write.lock file in this directory.

    IndexWriter.copySegmentAsIs(SegmentInfo, String, Map, Set) line: 3320
    IndexWriter.addIndexes(Directory...) line: 3159
    DiskSearchIndex(BaseSearchIndex).loadFromIndex(BaseSearchIndex) line: 253
    DiskLuceneIndexDataLoader(LuceneIndexDataLoader).loadFromIndex(RAMSearchIndex) line: 251
    DiskLuceneIndexDataLoader.loadFromIndex(RAMSearchIndex) line: 140
    RealtimeIndexDataLoader.processBatch() line: 182
    BatchedIndexDataLoader$LoaderThread.run() line: 394


    FSDirectory.java (Lucene package)

    protected void ensureCanWrite(String name) throws IOException

    throws the exception. It seems that it is failing at file.delete() because of the write.lock file present in that index directory.

    if (file.exists() && !file.delete()) // delete existing, if any
    throw new IOException("Cannot overwrite: " + file);

    Note: We have multiple threads hitting the ZoieSystem.consume(...) method.
    We execute the query in a multi-threaded environment, which in turn creates a DataEvent object and passes it to this consume method, so essentially we keep passing DataEvent objects to consume until we finish our index process.


    Any pointer will be greatly appreciated.

    Best Wishes,
    Brij

  9. Hi In Search Of,

    Can you send an email to the java-user@lucene.apache.org list? (You should also subscribe to it so you see your response) ... that's the best way to ask Lucene questions.

    1. This comment has been removed by the author.

    2. Thanks Mike for pointer.

      Best Regards,
      Brij
