Comments on Changing Bits: A new proximity query for Lucene, using automatons
Michael McCandless (http://www.blogger.com/profile/04277432937861334672)
Feed updated 2023-09-01T03:38:08.236-04:00, 7 comments

Michael McCandless, 2015-03-05T09:04:57.052-05:00
The scoring of this query is "like" a PhraseQuery in that it counts how many matches occurred in the document and uses that as the "term freq" for scoring. So I'm not sure why you always see a score of 1.0 ... can you make up a test case and post it to the Lucene user's list?

Anonymous, 2015-02-17T07:53:32.087-05:00
This is a great tool, especially because it is so flexible.
I use it to combine fuzzy and wildcard queries
(for example, a query on "levy" should also return "levinshtein").

However, the scoring of documents is lost. I want the top results to be those that are most similar to the query, but I get a score of 1.0 for every document.
Any ideas?

Anonymous (https://www.blogger.com/profile/09949610284220306372), 2014-09-07T20:05:21.700-04:00
It's great to hear that this is coming in Lucene 4.10!
Your post "Lucene's TokenStreams are actually graphs" encouraged us at JusBrasil to start working on a custom TokenFilter/QueryParser to improve Lucene's WordDelimiter.
We deployed it recently and hope to open source it soon, but we are using it as an ElasticSearch plugin, so we still need to find the best way to decouple and release it.

We don't actually use a graph of tokens; we only label the tokens into groups and build a boolean query from the phrase query of each group.
This may make it not generic enough to replace the default TokenFilter, but it was a nice improvement for us, because most of the data we index and query doesn't follow a formal pattern, or the users aren't aware of it (e.g. '2014.09-PE', '201409 PE', '201409/PE').

Well, it's awesome to hear that Lucene is improving in this area as well. Cheers

Michael McCandless, 2014-09-01T05:56:07.569-04:00
I don't have any plans ... patches welcome! In theory MTQ could just rewrite to a set of transitions from one state to another, one per term that the MTQ enumerated. Doing something like SpanNot would be trickier ... it may require a separate "special" label (like "any") that matches all but certain specified terms?

Anonymous, 2014-08-30T15:42:00.617-04:00
Mike, this is fantastic! Any plans to add multi-term queries and/or SpanNot? Apologies if I missed these ... I've only had a chance to skim the test cases.
Thank you, again!

Michael McCandless, 2014-08-27T08:47:42.830-04:00
You could use an automaton to hold trending phrases...

Anonymous, 2014-08-12T09:52:22.102-04:00
Very nice article. I had a small question: is it correct to use an automaton to create something like trending phrases, for example: Malaysia airline, Malaysian airline crash, Malaysian airline crash blackbox etc., or is the use case totally misconstrued?

Thanks

Ravi Kiran Bhaskar
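
Editor's sketch: the thread above suggests holding trending phrases in an automaton and scoring by how many matches occur in a document. The plain-Python toy below illustrates that idea only. It is not Lucene's API; the class `TokenAutomaton` and its methods are hypothetical names made up for this example, and whitespace tokenization is an assumption.

```python
class TokenAutomaton:
    """Toy word-level automaton: states are ints, transitions are labeled by terms.

    Hypothetical illustration only; this is not how Lucene implements its query.
    """

    def __init__(self):
        self.transitions = {0: {}}  # state -> {term: next_state}; state 0 is initial
        self.accept = set()         # states where a complete phrase ends

    def _create_state(self):
        state = len(self.transitions)
        self.transitions[state] = {}
        return state

    def add_phrase(self, phrase):
        """Add one phrase, sharing states with common prefixes (a trie)."""
        state = 0
        for term in phrase.lower().split():
            nxt = self.transitions[state].get(term)
            if nxt is None:
                nxt = self._create_state()
                self.transitions[state][term] = nxt
            state = nxt
        self.accept.add(state)

    def count_matches(self, tokens):
        """Count every token span that spells out an accepted phrase."""
        matches = 0
        for start in range(len(tokens)):
            state = 0
            for term in tokens[start:]:
                state = self.transitions[state].get(term)
                if state is None:
                    break  # no transition for this term: this start position is done
                if state in self.accept:
                    matches += 1
        return matches


# The phrases from the question in the thread.
trending = TokenAutomaton()
for phrase in ("Malaysia airline",
               "Malaysian airline crash",
               "Malaysian airline crash blackbox"):
    trending.add_phrase(phrase)

doc = "the malaysian airline crash blackbox was found".split()
freq = trending.count_matches(doc)  # plays the role of the "term freq" for scoring
```

Here `freq` counts both "malaysian airline crash" and "malaysian airline crash blackbox", since each ends in an accept state. A real index-backed query would walk postings with positions rather than rescanning tokens from every start offset.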