But what if you sometimes need even better than near-real-time? What if you need to look up truly live or real-time values, so for any document id you can retrieve the very last value indexed?
Just use the newly committed LiveFieldValues class!
It's simple to use: when you instantiate it you provide it with your SearcherManager or NRTManager, so that it can register as a RefreshListener and be notified when new searchers are opened. Then, whenever you add, update or delete a document, you notify the LiveFieldValues instance. Finally, call the get method to get the last indexed value for a given document id.
This class is simple inside: it keeps the values of documents that were just indexed, but are not yet visible through the near-real-time searcher, in a ConcurrentHashMap keyed by the document id. Whenever a new near-real-time searcher is successfully opened, it clears the map of all entries that are now included in that searcher. It carefully handles the transition time from when the reopen started to when it finished by checking two maps for the possible value, and failing that, it falls back to the current searcher.
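To make the two-map handoff concrete, here is a minimal stand-alone sketch of the idea in plain Java. It is not Lucene's source: the class name TwoMapLiveValues, the beforeRefresh / afterRefresh hooks, the bufferedCount helper and the searcherLookup function (a stand-in for looking a value up in the current searcher) are all assumptions made up for illustration, and the sketch does not try to cover every concurrency corner case.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative model of the two-map idea; names are hypothetical, not Lucene's API. */
class TwoMapLiveValues<T> {
    // Entries indexed since the most recent reopen started.
    private volatile Map<String, Optional<T>> current = new ConcurrentHashMap<>();
    // Entries indexed before the in-flight reopen started; empty when no reopen is running.
    private volatile Map<String, Optional<T>> old = new ConcurrentHashMap<>();
    // Hypothetical stand-in for reading a value from the current searcher.
    private final Function<String, T> searcherLookup;

    TwoMapLiveValues(Function<String, T> searcherLookup) {
        this.searcherLookup = searcherLookup;
    }

    /** Call after adding or updating a document. */
    void add(String id, T value) {
        current.put(id, Optional.of(value));
    }

    /** Call after deleting a document; an empty Optional acts as a tombstone. */
    void delete(String id) {
        current.put(id, Optional.empty());
    }

    /** Reopen is starting: freeze what we have, start a fresh map for new writes. */
    void beforeRefresh() {
        old = current;
        current = new ConcurrentHashMap<>();
    }

    /** Reopen finished: everything in the frozen map is now visible in the searcher. */
    void afterRefresh() {
        old = new ConcurrentHashMap<>();
    }

    /** Checks the newest map, then the frozen map, then falls back to the searcher. */
    T get(String id) {
        Optional<T> buffered = current.get(id);
        if (buffered == null) {
            buffered = old.get(id);
        }
        if (buffered != null) {
            return buffered.orElse(null); // null means "deleted"
        }
        return searcherLookup.apply(id);
    }

    /** Number of entries still held in RAM (only shrinks after a reopen). */
    int bufferedCount() {
        return current.size() + old.size();
    }
}
```

The point of the second map is that a value indexed just before the reopen began stays readable while the reopen is in flight, and values indexed during the reopen survive it, since they may not be in the new searcher yet.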
LiveFieldValues is abstract: you must subclass it and implement the lookupFromSearcher method to retrieve a document's value from an IndexSearcher, since how your application stores the values in the searcher is application dependent (stored fields, doc values or even postings, payloads or term vectors).
Note that this class only offers "live get", i.e. you can get the last indexed value for any document, but it does not offer "live search", i.e. you cannot search against the value until the searcher is reopened. Also, the internal maps are only pruned after a new searcher is opened, so RAM usage will grow unbounded if you never reopen! It's up to your application to ensure that the same document id is never updated simultaneously (in different threads) because in that case you cannot know which update "won" (Lucene does not expose this information, although LUCENE-3424 is one possible solution for this).
An example use-case is to store a version field per document so that you know the last version indexed for a given id; you can then use this to reject a later but out-of-order update for that same document whose version is older than the version already indexed.
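Here is a rough sketch of that version check in plain Java. The VersionGate class and its tryUpdate and lastVersion methods are hypothetical names, and the ConcurrentHashMap is only a stand-in for the live lookup; a real application would index the document and notify its live-values instance where the comment indicates. It also assumes, as noted above, that the application never updates the same id from two threads at once.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative version gate; names are hypothetical and no Lucene classes are used. */
class VersionGate {
    // Stand-in for a live lookup of the last indexed version per document id.
    private final ConcurrentMap<String, Long> lastIndexedVersion = new ConcurrentHashMap<>();

    /**
     * Returns true if the update was accepted, false if it is older than what
     * is already indexed. Assumes the caller serializes updates for a given id.
     */
    boolean tryUpdate(String id, long version) {
        Long last = lastIndexedVersion.get(id);
        if (last != null && version < last) {
            return false; // out-of-order update: an newer version is already indexed
        }
        // A real application would index the document here, then record the
        // new version with its live-values instance.
        lastIndexedVersion.put(id, version);
        return true;
    }

    /** Last accepted version for the id, or null if the id was never indexed. */
    Long lastVersion(String id) {
        return lastIndexedVersion.get(id);
    }
}
```

This sketch accepts an update whose version equals the indexed one; whether to treat that as a duplicate and reject it is an application choice.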
LiveFieldValues will be available in the next Lucene release (4.2).
Mike, this reminds me of Solr's realtime get and transaction log, especially the ConcurrentHashMap and the lookup by id. Is it a kind of dupe?
Hi Mikhail,
I think the idea is the same as Solr's realtime get, but I don't know the full details of how Solr's implementation works ... so sort of a dup?
absolutely.
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/RealTimeGetComponent.java#L145
http://yonik.com/solr/realtime-get/
excuse my late reply
Hi Mike, I would like to get in touch with you on mail regarding a consulting / training on Lucene.
Respond to johnsindhu at gmail dot com.
Hi John,
Sorry, I'm not available for Lucene consulting / training... maybe try sending an email to the java-user@lucene.apache.org list?