Comments on Changing Bits: Near-real-time readers with Lucene's SearcherManager and NRTManager
Michael McCandless (http://www.blogger.com/profile/04277432937861334672)
Updated 2023-09-01T03:38:08.236-04:00

2018-12-13T03:34:56.914-05:00
Thanks a lot, Mike. I am using SearcherManager to refresh the documents and am getting the error. I will reach out to the Lucene user's list.

Regards,
Pavan
Posted by: Pavan

2018-12-12T09:54:02.875-05:00
Hi Pavan,

You should use SearcherManager -- it makes it really simple to refresh the searcher while queries are still in flight across multiple threads.

It's best to ask on the Lucene user's list -- java-user@lucene.apache.org
Posted by: Michael McCandless

2018-12-12T01:29:09.460-05:00
Hello Mike,

Need your help to address the below error while refreshing the Lucene index:

java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code

We have a batch process which, on a daily basis, creates the index and refreshes with old indexes, and we have an API which consumes the indexes in the meantime.

We are getting an error from the API while this refresh happens.
Can you help us to know what the best practice is to refresh the Lucene indexes without affecting any existing components which are using them?

Your suggestions are highly appreciated.

Regards,
Pavan
Posted by: Pavan

2014-12-19T06:23:37.832-05:00
You should not need to call Searcher.IndexReader.Reopen like that, assuming the C# port is like Lucene's. A single call to .maybeRefresh will open a new NRT reader, if there are any changes.

Also, NRTManager (renamed / factored out a while back to ControlledRealTimeReopenThread in Lucene) is only needed when you have some threads that want a "real-time" reader and other threads that are OK with the current near-real-time reader.

Maybe you should simplify your test to just use an "ordinary" SearcherManager and see if the problem still happens?

If so, there must be a bug somewhere in tracking of changes in the C# IndexWriter...
Posted by: Michael McCandless

2014-10-17T05:27:02.457-04:00
Hi Mike,
I'm implementing NRTManager in C# using Lucene.Net.Contrib.Management.dll. I load all documents using an IndexWriter:
<<
Directory d = new RAMDirectory();
indexWriter = new IndexWriter(d, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_CURRENT), !IndexReader.IndexExists(d), new Lucene.Net.Index.IndexWriter.MaxFieldLength(IndexWriter.DEFAULT_MAX_FIELD_LENGTH));

doc = new Document();
doc.Add(new Field(
    "info",
    info,
    Field.Store.YES,
    Field.Index.ANALYZED));
// Write the Document to the catalog
indexWriter.AddDocument(doc);
>>

and then initialize the NRTManager with it.
<<
static NrtManager man = new NrtManager(indexWriter);
>>

When I need to add a new entry to the manager I do this:
<<
doc = new Document();
doc.Add(new Field(
    "Info",
    newInfo,
    //dr["NickName"].ToString(),
    Field.Store.YES,
    Field.Index.ANALYZED));
// Write the Document to the catalog
man.AddDocument(doc);
>>

At search I ALWAYS do this:
<<
if (man.GetSearcherManager().MaybeReopen())
    man.GetSearcherManager().Acquire().Searcher.IndexReader.Reopen();

var hits = man.GetSearcherManager().Acquire().Searcher.Search(query, 50);
>>

My problem is that I'm only able to get one new entry after the initial load. When I add a second entry, the search does not return it.

Can you help me with this, please?

Thanks,
Galder.
Posted by: Anonymous

2014-07-06T06:00:29.391-04:00
That code looks correct!

Then, for each query, you determine, when acquiring the searcher, whether it needs the "current" reader or must wait for a specific indexing generation (because you want to ensure a certain indexing change is visible).
Posted by: Michael McCandless

2014-06-26T09:46:32.908-04:00
The Spring container was initializing the bean twice. I fixed the above issue. Could you please tell me whether the above implementation is correct for NRT using Lucene 4.7.2?
Posted by: Arun BC

2014-06-26T09:15:30.512-04:00
I tried the following code,

    Directory fsDirectory = FSDirectory.open(new File(location));
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
    IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_47, analyzer);
    indexWriterConfig.setRAMBufferSizeMB(16);
    indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);

    indexWriter = new IndexWriter(fsDirectory, indexWriterConfig);
    trackingIndexWriter = new TrackingIndexWriter(indexWriter);

    referenceManager = new SearcherManager(indexWriter, true, null);

    controlledRealTimeReopenThread = new ControlledRealTimeReopenThread(trackingIndexWriter,
            referenceManager, 60, 0.1);
    controlledRealTimeReopenThread.setDaemon(true);
    controlledRealTimeReopenThread.start();

While calling this during application initialization, I am getting the exception below.

    org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@C:\Users\arun.bc\lucene-home\contractrate\write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:89) ~[lucene-core-4.7.2.jar:4.7.2 1586229 - rmuir - 2014-04-10 09:00:35]
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:707) ~[lucene-core-4.7.2.jar:4.7.2 1586229 - rmuir - 2014-04-10 09:00:35]

Please suggest...
Posted by: Arun BC

2014-06-25T08:52:36.251-04:00
It's definitely for use with NRT search, and then for real-time search for queries that require it; have a look at its unit tests in a Lucene source installation / svn checkout?
Posted by: Michael McCandless

2014-06-25T05:53:33.061-04:00
In Lucene 4.7.2, I think NRTManager is replaced with ControlledRealTimeReopenThread. As NRTManager is not available in the current release, I am kind of confused. I am trying out ControlledRealTimeReopenThread but am not sure whether it will be near real-time. Can you provide an example of near-real-time search using ControlledRealTimeReopenThread, or should it not be used for near real-time?
Posted by: Arun BC

2014-05-28T10:37:37.980-04:00
Hi Mike,

I am implementing NRT and found that from the 4.4.0 release onwards the near-real-time manager (org.apache.lucene.search.NRTManager) has been replaced by ControlledRealTimeReopenThread.

Please advise whether I should use ControlledRealTimeReopenThread as described at http://stackoverflow.com/questions/17993960/lucene-4-4-0-new-controlledrealtimereopenthread-sample-usage?answertab=votes#tab-top.

Thanks
Gaurav Gupta
Posted by: Gaurav Gupta

2014-05-28T10:35:44.326-04:00
This comment has been removed by the author.
Posted by: Gaurav Gupta

2014-02-03T06:52:50.634-05:00
Hi Jigar,

This is useful if you have some requests that must show all deletions (such as incoming user searches) and other requests where it doesn't matter (e.g. if you have some automation scripts that run searches looking for specific SKUs or something)... in that case you can simply make two NRTManager instances. This is a fairly esoteric use case, though, and I would start by just making a single instance that always applies deletes and sharing that across both use cases, until/unless you hit performance issues.
Posted by: Michael McCandless

2014-02-03T05:23:16.164-05:00
Hello Michael,

First of all, thanks: your blog posts are very useful when I hit a problem that is internal to Lucene.

Could you please help me understand the following lines, which I took from the NRTManager class comment:

"You may want to create two NRTManagers, once
that always applies deletes on refresh and one that does
not. In this case you should use a single {@link
NRTManager.TrackingIndexWriter} instance for both."

Does this mean that the one with applyDeletes=true should be used by the application code that mostly creates/updates the index, and the other one, with applyDeletes=false, should be used mainly to acquire() searchers by the search threads in the application?
Posted by: Jigar

2014-02-03T05:17:31.115-05:00
This comment has been removed by the author.
Posted by: Jigar

2013-11-20T15:55:51.699-05:00
OK, that seems like a good solution (IndexWriter.deleteAll); this way the file names will never be reused (Lucene is write-once).
Posted by: Michael McCandless

2013-11-20T14:22:51.209-05:00
Hi Michael,

We were finally able to fix this issue. This is our understanding of the problem and how we fixed it:
- At the code level we were deleting all (previous) documents and adding a new set.
- But at the OS level we were deleting all files beforehand, thinking this was actually a safe approach.

It turns out that because of this the new index files ended up with the exact same names as the old ones. When we copied over the files and the SearcherManager loaded them up, we saw that although a new IndexReader instance was being created, the underlying readers were still pointing to the 'old' index.

The way we fixed this is that we stopped deleting files and instead let Lucene take care of the whole thing. After that we started to see new files being created, and while the old files were still there, the SearcherManager was now able to fetch the new set of documents.

Regards,
MV
Posted by: Anonymous

2013-11-08T05:58:28.386-05:00
Hmm, this (that you said above) is particularly troubling: "When debugging we see a new IndexReader being created. We just don't see the new documents added by the Indexer."

If the index was newly built and copied over, and then the old IndexReader is reopened, and indeed a new instance was opened, yet you are still missing documents ... I think it must be that the documents you expected were not in fact indexed? Or, are you certain a new IndexReader was actually opened?
Posted by: Michael McCandless

2013-11-07T10:34:12.073-05:00
Well, actually that was somebody else from my team. What I put in my original comment can shed light on the big picture of the app we built. As I mentioned before, reading and updating the indexes are done by 2 different apps running on different servers. This seems to be an atypical use case, as everywhere in Lucene's docs and forums the 'normal' usage seems to be the same code handling both operations. As my coworker explains, the issue seems to be that the new index has the same filenames as the old one. This seems to cause the IndexReaders to point to the old segments.
Posted by: Anonymous

2013-11-07T06:00:27.746-05:00
Hi MV, it looks like you also asked on the Lucene user's list... so I replied there.
Posted by: Michael McCandless

2013-11-06T17:54:12.930-05:00
Hi Michael,

We're having an issue where updates are not being picked up by the IndexReader, but I'm starting to think it might be related to our particular architecture. The reading is done by a web app, but index updates are done by a completely separate process (Indexer). Once that process is done, we have a Unix script that cleans up the index directory (used by the web app) and copies over the new set of files generated by the indexer.

The way we're trying to handle this on the web app side is to have a scheduled thread that wakes up every 5 minutes, grabs a reference to the SearcherManager (the same SearcherManager used during reading) and then calls manager.maybeReopen().

When debugging we see a new IndexReader being created. We just don't see the new documents added by the Indexer.

I hope this explanation is clear enough. Any pointers will be greatly appreciated.

Thanks,
MV
Posted by: MV

2013-05-15T07:16:04.178-04:00
Hi Anonymous,

I would definitely start with Lucene 4 at this point.
Posted by: Michael McCandless

2013-05-15T02:40:55.653-04:00
Mike, I am a beginner in Lucene. Would you suggest that I start with Lucene 4 or Lucene 3?

Going through the APIs, I can see a lot of changes in Lucene 4 compared to Lucene 3.

Please suggest.

Regards,
Ronald
Posted by: Anonymous

2013-01-27T06:37:09.401-05:00
Thanks Mike. Sorry, I edited the path of the index file, so the directory name is merged there.

Sure, I will contact the bobo browse author. I have created a JIRA ticket in the zoie project.
Posted by: BrijrajInSearchOf

2013-01-27T05:23:51.191-05:00
Hi In Search Of,

It seems likely that an IndexReader has this file open, and that causes the "Cannot overwrite" error?

However, event_2.fdt isn't a normal Lucene index filename.

I think you have to ask the bobo browse authors for help... I'm not sure what this code is doing.
Posted by: Michael McCandless
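
Editorial note: several replies in this thread recommend SearcherManager because it lets a refresh happen while queries are still in flight on other threads. The underlying idea is reference counting: a query acquires the current searcher, a refresh swaps in a new one, and the old one is only closed once every in-flight query has released it. The following is a minimal sketch of that idea in plain Java, with no Lucene dependency. The names `RefCountedSwapDemo`, `Snapshot`, and `Manager` and all of their methods are hypothetical stand-ins chosen for illustration; this is not Lucene's API or its implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedSwapDemo {

    /** Hypothetical stand-in for an open index reader/searcher. */
    static final class Snapshot {
        final int generation;
        // Starts at 1: the manager itself holds one reference.
        private final AtomicInteger refCount = new AtomicInteger(1);
        volatile boolean closed = false;

        Snapshot(int generation) {
            this.generation = generation;
        }

        /** Takes a reference unless the snapshot has already been closed. */
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        /** Drops a reference; the last one out closes the snapshot. */
        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true;
            }
        }
    }

    /** Hypothetical stand-in for the manager that hands out snapshots. */
    static final class Manager {
        private volatile Snapshot current = new Snapshot(1);

        /** Returns the current snapshot with one extra reference held. */
        Snapshot acquire() {
            Snapshot snapshot;
            do {
                snapshot = current;
            } while (!snapshot.tryIncRef()); // retry if it was swapped and closed
            return snapshot;
        }

        /** Must be called exactly once for every acquire(). */
        void release(Snapshot snapshot) {
            snapshot.decRef();
        }

        /** Swaps in a new snapshot; the old one closes when its users release it. */
        synchronized void refresh() {
            Snapshot old = current;
            current = new Snapshot(old.generation + 1);
            old.decRef(); // drop the manager's own reference to the old snapshot
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();

        Snapshot inFlight = manager.acquire();   // a query starts on generation 1
        manager.refresh();                       // a refresh happens mid-query
        System.out.println("in-flight generation " + inFlight.generation
                + ", closed=" + inFlight.closed); // still open for the running query

        Snapshot fresh = manager.acquire();      // a new query sees generation 2
        System.out.println("new query generation " + fresh.generation);

        manager.release(inFlight);               // the old query finishes
        System.out.println("old snapshot closed=" + inFlight.closed);
        manager.release(fresh);
    }
}
```

In this sketch the pairing of `acquire()` and `release()` is what keeps an old snapshot usable during a swap, which is why code that acquires without releasing (or reopens a reader behind the manager's back) tends to misbehave.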