Monday, September 26, 2011

Lucene's SearcherManager simplifies reopen with threads

Modern computers have wonderful hardware concurrency, within and across CPU cores, RAM and IO resources, which means your typical server-based search application should use multiple threads to fully utilize all resources.

For searching, this usually means you'll have one thread handle each search request, sharing a single IndexSearcher instance. This model is effective: the Lucene developers work hard to minimize internal locking in all Lucene classes. In fact, we recently removed thread contention during indexing (specifically, flushing), resulting in massive gains in indexing throughput on highly concurrent hardware.

Since IndexSearcher exposes a fixed, point-in-time view of the index, when you make changes to the index you'll need to reopen it. Fortunately, since version 2.9, Lucene has provided the IndexReader.reopen method to get a new reader reflecting the changes.

This operation is efficient: the new reader shares already warmed sub-readers in common with the old reader, so it only opens sub-readers for any newly created segments. This means reopen time is generally in proportion to how many changes you made; however, it will take longer when a large merge has completed. It's best to warm the new reader before putting it into production by running a set of "typical" searches for your application, so that Lucene performs one-time initialization for internal data structures (norms, field cache, etc.).

But how should you properly reopen, while search threads are still running and new searches are forever arriving? Your search application is popular, users are always searching and there's never a good time to switch! The core issue is that you must never close your old IndexReader while other threads are still using it for searching; otherwise, those threads can easily hit cryptic exceptions that often mimic index corruption.

Lucene tries to detect that you've done this, and will throw a nice AlreadyClosedException, but we cannot guarantee that exception is thrown since we only check up front, when the search kicks off: if you close the reader when a search is already underway then all bets are off.

One simple approach would be to temporarily block all new searches and wait for all running searches to complete, and then close the old reader and switch to the new one. This is how janitors often clean a bathroom: they wait for all current users to finish and block new users with the all-too-familiar plastic yellow sign.

While the bathroom cleaning approach will work, it has an obviously serious drawback: during the cutover you are now forcing your users to wait, and that wait time could be long (the time for the slowest currently running search to finish).
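To make the drawback concrete, here is a rough sketch of the bathroom cleaning approach using a read/write lock. BlockingSwapper is a hypothetical name of my own, not a Lucene class, and the generic type S simply stands in for whatever searcher object your application shares:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical illustration only: S stands in for a searcher/reader.
final class BlockingSwapper<S> {
  // Fair mode: once a swap is waiting, newly arriving searches queue
  // behind it (the "yellow sign").
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private S current;

  BlockingSwapper(S initial) {
    current = initial;
  }

  // Every search holds the read lock for its full duration.
  <R> R search(Function<S, R> body) {
    lock.readLock().lock();
    try {
      return body.apply(current);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Waits for all running searches to finish, then switches.
  // Returns the old instance so the caller can close it safely.
  S swap(S fresh) {
    lock.writeLock().lock();
    try {
      S old = current;
      current = fresh;
      return old;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

The call to swap cannot proceed until the slowest running search releases its read lock, and in the meantime every new search is stuck waiting: exactly the stall described above.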

A much better solution is to immediately direct new searches to the new reader, as soon as it's done warming, and then separately wait for the still-running searches against the old reader to complete. Once the very last search has finished with the old reader, close it.

This solution is fully concurrent: it has no locking whatsoever so searches are never blocked, as long as you use a separate thread to perform the reopen and warming. The time to reopen and warm the new reader has no impact on ongoing searches, except to the extent that reopen consumes CPU, RAM and IO resources to do its job (and, sometimes, this can in fact interfere with ongoing searches).

So how exactly do you implement this approach? The simplest way is to use the reference counting APIs already provided by IndexReader to track how many threads are currently using each searcher. Fortunately, as of Lucene 3.5.0, there will be a new contrib/misc utility class, SearcherManager, originally created as an example for Lucene in Action, 2nd edition, that does this for you! (LUCENE-3445 has the details.)
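If you're curious what reference counting looks like in general, here is a minimal sketch using only the JDK. It is not the SearcherManager source: RefCountedResource and SimpleManager are hypothetical stand-ins I made up for illustration, and a real reader would release files and memory where this one just flips a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reader/searcher that must not be closed
// while any search thread is still using it.
final class RefCountedResource {
  private final String name;
  // Starts at 1: that first reference belongs to the manager.
  private final AtomicInteger refCount = new AtomicInteger(1);
  private volatile boolean closed;

  RefCountedResource(String name) {
    this.name = name;
  }

  String name() {
    return name;
  }

  boolean isClosed() {
    return closed;
  }

  // Returns false if the last reference was already dropped.
  boolean tryIncRef() {
    int count;
    while ((count = refCount.get()) > 0) {
      if (refCount.compareAndSet(count, count + 1)) {
        return true;
      }
    }
    return false;
  }

  // Whoever drops the last reference closes the resource.
  void decRef() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      closed = true; // a real reader would free its resources here
    } else if (remaining < 0) {
      throw new IllegalStateException("too many decRef calls");
    }
  }
}

final class SimpleManager {
  private volatile RefCountedResource current;

  SimpleManager(RefCountedResource initial) {
    current = initial;
  }

  // Lock-free: search threads never wait on the reopen thread.
  RefCountedResource acquire() {
    while (true) {
      RefCountedResource candidate = current;
      if (candidate.tryIncRef()) {
        return candidate;
      }
      // Lost a race with swap(); loop and pick up the new instance.
    }
  }

  void release(RefCountedResource resource) {
    resource.decRef();
  }

  // Called by the reopen thread once the replacement is warmed.
  synchronized void swap(RefCountedResource fresh) {
    RefCountedResource old = current;
    current = fresh;
    old.decRef(); // drop the manager's own reference
  }
}
```

New searches see the fresh instance as soon as swap returns, while the old one stays open until the last in-flight search calls release. Only concurrent swap calls contend with each other; acquire and release never block.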

The class is easy to use. You first create it, by providing the Directory holding your index and a SearchWarmer instance:

  class MySearchWarmer implements SearchWarmer {
    @Override
    public void warm(IndexSearcher searcher) throws IOException {
      // Run some diverse searches, searching and sorting against all
      // fields that are used by your application
    }
  }

  Directory dir = FSDirectory.open(new File("/path/to/index"));
  SearcherManager mgr = new SearcherManager(dir,
                                            new MySearchWarmer());

Then, for each search request:

  IndexSearcher searcher = mgr.acquire();
  try {
    // Do your search, including loading any documents, etc.
  } finally {
    mgr.release(searcher);

    // Set to null to ensure we never again try to use
    // this searcher instance after releasing:
    searcher = null;
  }

Be sure you fully consume searcher before releasing it! A common mistake is to release it yet later accidentally use it again to load stored documents, for rendering the search results for the current page.
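One way to make that mistake harder is to wrap the acquire/release pair in a small helper, so the searcher is only visible inside a callback. The sketch below is just one possible shape: SearcherSource and Searches are hypothetical names of mine, and the generic type S stands in for IndexSearcher so the example needs nothing beyond the JDK.

```java
import java.util.function.Function;

// Hypothetical minimal view of a manager: just acquire() and release().
interface SearcherSource<S> {
  S acquire();

  void release(S searcher);
}

final class Searches {
  private Searches() {}

  // Runs body with an acquired searcher and always releases it afterwards,
  // even if body throws. body should return plain data (ids, titles,
  // snippets), never the searcher itself or anything that reads from it
  // lazily after this method returns.
  static <S, R> R withSearcher(SearcherSource<S> source, Function<S, R> body) {
    S searcher = source.acquire();
    try {
      return body.apply(searcher);
    } finally {
      source.release(searcher);
    }
  }
}
```

The idea is to load the stored documents you need for the current page inside the callback and return them as ordinary objects, so nothing outside it ever holds the released searcher.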

Finally, you'll need to periodically call the maybeReopen method from a separate (i.e., non-searching) thread. This method will reopen the reader, and will cut over to the new one only if there was actually a change. If your application knows when changes have been committed to the index, you can reopen right after that. Otherwise, you can simply call maybeReopen every X seconds. When there has been no change to the index, the cost of maybeReopen is negligible, so calling it frequently is fine.
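For the "every X seconds" case, a scheduled executor is probably the least fuss. Here is a sketch, assuming a hypothetical Reopenable interface as a stand-in for the manager (I've declared it with a broad throws clause so any implementation fits); PeriodicReopener is likewise my own name, not part of Lucene:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for whatever object exposes maybeReopen().
interface Reopenable {
  void maybeReopen() throws Exception;
}

final class PeriodicReopener {
  // One dedicated daemon thread, so reopen and warming never run on a
  // search thread and never keep the JVM alive on shutdown.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread thread = new Thread(runnable, "reopen-thread");
        thread.setDaemon(true);
        return thread;
      });

  PeriodicReopener(Reopenable target, long period, TimeUnit unit) {
    // Fixed delay (not fixed rate): a slow reopen simply pushes the next
    // run back instead of letting runs pile up.
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        target.maybeReopen();
      } catch (Exception e) {
        // Catch everything: an uncaught exception would silently cancel
        // all future runs of this task.
        System.err.println("reopen failed: " + e);
      }
    }, period, period, unit);
  }

  void shutdown() {
    scheduler.shutdownNow();
  }
}
```

With a real manager you would pass something like `mgr::maybeReopen` as the target, pick a period that matches how stale your results are allowed to be, and call shutdown when the application stops.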

Beware the potentially high transient cost of reopen and warm! During reopen, as you must have two readers open until the old one can be closed, you should budget plenty of RAM in the computer and heap for the JVM, to comfortably handle the worst case when the two readers share no sub-readers (for example, after a full optimize) and thus consume 2X the RAM of a single reader. Otherwise you might hit a swap storm or OutOfMemoryError, effectively taking down your entire search application. Worse, you won't see this problem early on: your first few hundred reopens could easily use only small amounts of added heap, but then suddenly on some unexpected reopen the cost is far higher. Reopening and warming is also generally IO intensive as the reader must load certain index data structures into memory.

Next time I'll describe another utility class, NRTManager, available since version 3.3.0, that you should use instead if your application uses Lucene's fast-turnaround near-real-time (NRT) search. This class solves the same problem (thread-safety during reopening) as SearcherManager but adds a fun twist as it gives you more specific control over which changes must be visible in the newly opened reader.