Comments on Changing Bits: Searching relational content with Lucene's BlockJoinQuery
Michael McCandless (http://www.blogger.com/profile/04277432937861334672)
Updated 2023-09-01T03:38:08.236-04:00

2018-04-30T12:35:42.157-04:00
Hi Mike,
I was looking at the code for ToParentBlockJoinCollector. I am trying to use this feature for a related use case and saw that it uses a FixedBitSet as a parent filter. Why use a bit set at all? Doesn't it take the same time to navigate a bit set as it does to navigate a posting list? Wouldn't a scorer achieve the same purpose?
Posted by Ba (https://www.blogger.com/profile/07453440508992406057)

2018-04-25T19:52:59.991-04:00
This comment has been removed by the author.
Posted by Ba (https://www.blogger.com/profile/07453440508992406057)

2017-07-14T09:05:21.900-04:00
Looks like you asked this on the Lucene user's list and got some replies already!
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2017-07-13T12:47:17.328-04:00
Hi Michael,
I have a use case: there are two .txt files containing millions of records, with a primary key and foreign key relationship in the data. I want to do an inner join and produce a single JSON file. Please provide a good solution ASAP. Waiting for your reply.
Posted by Anonymous (https://www.blogger.com/profile/04559108221070791664)

2016-05-12T05:58:44.712-04:00
KSV,

I believe you can only choose between basic aggregations of the children's scores up to the parent's score (min, max, avg, total). Better to ask on Lucene's users list for more details: java-user@lucene.apache.org.
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2016-05-12T02:12:56.863-04:00
Hi Michael,

Does this support custom scoring based on parent and child fields? For example, I want to rank the parent documents using some mathematical function that combines child and parent fields. Is that supported or possible?
Posted by Anonymous

2015-06-04T03:03:05.350-04:00
Hi Michael,
I want to search text columns from tables that live in different databases on the same server. For example, I want to fetch brand, price, and color (which are stored in different databases) for a particular mobile number. We have to implement a Lucene search engine in our second-hand mobile sale online project, so please help me out with code. Thank you so much.
Posted by Anonymous (https://www.blogger.com/profile/07581737938547662435)

2015-01-06T03:42:38.403-05:00
Hi Mike,

I'm trying to search for a parent using two child documents. You mentioned in the blog that we can also do parallel joins. I tried this using two ToParentBlockJoinQuery instances for two different children and ANDed them using a boolean query, but that won't return any results. Currently I am using Lucene 4.10.2.
Posted by Nitin Kothwal (https://www.blogger.com/profile/01744294229558832931)

2014-03-10T01:18:27.475-04:00
Hi Mike,

Thanks for the reply. I can reproduce this issue with 4.7.0. I'm going to file the bug.
Posted by Anonymous (https://www.blogger.com/profile/07998338529682760796)

2014-03-07T06:11:34.612-05:00
Hi Sally,

Hmm, maybe first verify this is still a problem on the latest (4.7.0) release? And if so, open a Jira issue (https://issues.apache.org/jira/browse/LUCENE) with the details? Just commenting out those lines is not right, because then we're not checking whether the document was deleted.
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2014-03-07T04:14:41.607-05:00
Hi Mike,

I'm using ToChildBlockJoinQuery in Lucene 4.2.0 and encountered this problem:
When I have a parent document with no children, I get an ArrayIndexOutOfBoundsException during search. The cause is in ToChildBlockJoinQuery.java:241.

Debugging this shows that when I have a parent document without children, the assert on line 239 doesn't hold, and it keeps incrementing the child until it hits the exception:

    assert childDoc < parentDoc: "childDoc=" + childDoc + " parentDoc=" + parentDoc;

To fix this I tried commenting out lines 223 to 225:

    if (acceptDocs != null && !acceptDocs.get(childDoc)) {
      continue nextChildDoc;
    }

So far it seems to work for me.
Is this a bug in Lucene? Will my fix break something else?
Posted by Anonymous (https://www.blogger.com/profile/07998338529682760796)

2013-10-07T18:07:38.217-04:00
Nice post, man.
It will save me a few days of code.
Posted by Anonymous (https://www.blogger.com/profile/00441530981224031521)

2013-06-03T10:13:21.684-04:00
Hi bobbytech,

That's too bad ElasticSearch doesn't let you get at the specific child hits: this is an important capability of block join. E.g. I use this at http://jirasearch.mikemccandless.com (each comment on an issue is a child doc).

The API certainly allows for this so I'm not sure why ElasticSearch doesn't return it ...

There was https://issues.apache.org/jira/browse/LUCENE-4774 which Martijn fixed for Lucene 4.3, which seems relevant? It lets you sort the parents by the largest or smallest value for a given child field for all children under that parent. I would guess that Martijn did this in order to expose it in ElasticSearch ...
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2013-05-29T11:35:41.799-04:00
Hey Mike,

Actually, no, ES currently doesn't support this. There is an open ticket for it here:

https://github.com/elasticsearch/elasticsearch/issues/3022

I'm not sure whether there have been any advancements in the new Lucene 4.x line that allow for nested sorting as described there. Do you know if that is currently possible?

Thanks again!
Posted by bobbytech (https://www.blogger.com/profile/04011838989225023468)

2013-05-07T15:58:06.298-04:00
Hi bobbytech,

I think ElasticSearch's nested documents already provide this capability? Don't you only get back the children that matched?
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2013-05-06T07:51:04.528-04:00
Wow, great post! Do you know if it is possible to get the BlockJoinCollector.getTopGroups semantics using ElasticSearch? We have some really large documents that have several children. If we could return only those children that matched our query, that would save on post-processing and I/O for ES.

Thanks for the great article!
Posted by bobbytech (https://www.blogger.com/profile/04011838989225023468)

2012-11-26T07:36:58.724-05:00
Super, I'm glad that worked.

It's actually possible to make a single Term from your searchPerson.Id integer and then delete by Term instead; it should be faster (to apply the deletes) and use less RAM ... I'm not sure of the details, can you email the user's list (java-user@lucene.apache.org) if you want to explore this?
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2012-11-26T04:53:43.255-05:00
OK, deleting the child documents works. I added an extra field named something like PersonId to all my child documents, with the same value in every child. On my delete request I do the following:

    Query lQuery = NumericRangeQuery.newIntRange("PersonId", new java.lang.Integer(searchPerson.Id),
        new java.lang.Integer(searchPerson.Id), true, true);
    _writer.deleteDocuments(lQuery);

And that cleans it up nicely.
Posted by Anonymous (https://www.blogger.com/profile/07793566547641011975)

2012-11-25T09:34:57.047-05:00
Haven't tried yet, maybe it's best if I create a shared property like "personId: the_id" which I place on all my child documents, and when I delete, search on that field with the person id. I'll try it tomorrow and share the result.
Posted by Anonymous (https://www.blogger.com/profile/07793566547641011975)

2012-11-22T12:40:28.550-05:00
Hi Mikhail,

"Reserving" docID gaps could in theory work, but in order to efficiently overlay the updated segment (with doc#5 in your example) I think you'd need something like stacked segments (LUCENE-4258)?

Even with LUCENE-4258, which will entail a perf hit at search time, it's not clear how often that's a good tradeoff (vs the cost of re-indexing all children + parents for that one doc block)...
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2012-11-21T14:15:31.992-05:00
Mike,
I have one obsession about updating a single isolated child in a block. Let's imagine that we have many small segments in the index (as we recently moved to Lucene 4.0 with its fancy concurrent flush).
- When we add a 10-doc block, after we have inserted 9 children, let's just spin the current docnum counter forward by 10 docs (assuming DocInvertorPerThread has such a counter).
- What we have afterwards: docnums 0-8 are child docs, then a gap (there will be no docs with docnums 9...19), and then the parent goes in with docnum 20.
- Everything should work as-is on such an index with gaps.
- When we need to update child 5, we can prepare a segment with a single document, which will substitute for doc#5, and then
- we can merge those two segments in an overlapping manner: the new doc will get one of the free docnums from the gap.

Do you think it's feasible?
Posted by Anonymous (https://www.blogger.com/profile/03731629466352186647)

2012-11-21T08:25:27.388-05:00
If you also delete the child docs, does it work?
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2012-11-21T02:38:48.745-05:00
Any workarounds for this? For me it's impossible to index my whole table again.
The rest seems to work perfectly (searching, indexing, joining, ...).
Posted by Anonymous (https://www.blogger.com/profile/07793566547641011975)

2012-11-20T13:50:17.488-05:00
Hmmm that sounds like a bug, that searching on a field of the child and then joining up to the deleted parent will return that parent document. Can you boil that down to a small test case and open an issue?

That query should be fine for deletion, and it sounds like it's clearly succeeding in deleting the parent document...
Posted by Michael McCandless (https://www.blogger.com/profile/04277432937861334672)

2012-11-20T08:32:24.944-05:00
Deleting the parent document works, but the child docs are still present. And if I search on a field of the parent it won't find anything, but if I search on a field of the child it will find something. Even fields that were stored in the parent document.

I'm deleting like this:

    Query lQuery = NumericRangeQuery.newIntRange("Id", new java.lang.Integer(searchPerson.Id),
        new java.lang.Integer(searchPerson.Id), true, true);
    _writer.deleteDocuments(lQuery);
Posted by Anonymous (https://www.blogger.com/profile/07793566547641011975)