AnalyzingSuggester, FuzzySuggester and AnalyzingInfixSuggester.
Using an analyzer is powerful because it lets you customize exactly
how suggestions are matched: you can normalize case, apply stemming, match across
different synonym forms, etc.
One of the most common things you'll do with your analyzer is to remove stop words using StopFilter. Unfortunately, if
you try this, you'll quickly notice that the stop filter is too
aggressive because it happily removes the last token even if the user
isn't done typing it yet. For example, if the user has typed "a",
you'd expect suggestions like apple, aardvark, etc., but you won't get
them because StopFilter removed the "a" token.
You could try using StopFilter only while indexing, which
was my first attempt with the suggestions
at jirasearch.mikemccandless.com, but then, at least
for AnalyzingInfixSuggester, you'll fail to get matches when you
pass allTermsRequired=true because the suggester then requires
that even stop words find matches.
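To make that failure concrete, here is a toy model in plain Python. It is not the Lucene API: the stop list, the `analyze` helper and the `matches` function are all made up for illustration, and the matching is only a rough stand-in for what an infix suggester does (exact match on every token except the last, prefix match on the last one).

```python
# Illustrative stop list; an assumption, not Lucene's default set.
STOP_WORDS = {"a", "an", "the", "of", "to"}

def analyze(text, remove_stops):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stops:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def matches(doc_tokens, query_tokens, all_terms_required):
    """Toy infix matching: the last query token is a prefix, the rest are exact."""
    if not query_tokens:
        return False
    last = len(query_tokens) - 1
    hits = []
    for i, q in enumerate(query_tokens):
        if i == last:
            hits.append(any(d.startswith(q) for d in doc_tokens))
        else:
            hits.append(q in doc_tokens)
    return all(hits) if all_terms_required else any(hits)

# Stop words removed at index time only, kept at lookup time.
indexed = analyze("The Art of Search", remove_stops=True)
query = analyze("the art", remove_stops=False)

print(matches(indexed, query, all_terms_required=True))
print(matches(indexed, query, all_terms_required=False))
```

In this model the query token "the" has nothing to match against because it was stripped from the indexed text, so requiring all terms rejects a suggestion the user would clearly expect.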
Finally, you could use the new SuggestStopFilter at lookup time. This filter is just like StopFilter, except that
when the token is the very last token, it checks the offset for that
token, and if the offset indicates that the token has ended without any
further non-token characters, then the token is preserved. The token
is also marked as a keyword, so that any later stem filters won't change
it. This way a query "a" can find "apple", but a query "a " (with a
trailing space) will find nothing because the "a" will be removed.
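The rule is easier to see in a small sketch. The following is plain Python that models the behavior described above; it is not the Lucene implementation, and the stop list, the `\w+` tokenization and the function name `suggest_stop_filter` are assumptions for illustration.

```python
import re

# Illustrative stop list; an assumption, not Lucene's default set.
STOP_WORDS = {"a", "an", "the", "of", "to"}

def suggest_stop_filter(text):
    """Return (token, is_keyword) pairs for the lookup text.

    Stop words are dropped, except when the stop word is the very last
    token and its end offset equals the end of the input, meaning the
    user may still be typing it. That token is kept and flagged as a
    keyword so a later stemmer would leave it alone.
    """
    lowered = text.lower()
    found = list(re.finditer(r"\w+", lowered))
    out = []
    for i, m in enumerate(found):
        token = m.group()
        if token in STOP_WORDS:
            is_last = i == len(found) - 1
            ends_at_input_end = m.end() == len(lowered)
            if is_last and ends_at_input_end:
                out.append((token, True))
            continue
        out.append((token, False))
    return out

print(suggest_stop_filter("a"))
print(suggest_stop_filter("a "))
print(suggest_stop_filter("the art of"))
```

The only information needed beyond the stop list is the end offset of the last token compared with the length of the input, which is why the trailing space changes the result.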
I've pushed SuggestStopFilter to jirasearch.mikemccandless.com
and it seems to be working well so far!
Great! I've created a similar filter, but without offsets (so I don't know whether there is a space after the word). The problem was that there are some expressions that have a lot of stop words before the normal words, like "think about to be or not to be a Solr committer". If there is such a document and you try to type in this expression from beginning to end, you get very strange suggestions until you type "Solr". I tested it with an AnalyzingSuggester in Solr. I hope I can test your new filter when there is a factory for AnalyzingInfixSuggester in Solr.
This sounds like a great addition, Mike!
I have long wondered whether it is possible to keep stop words inside a token n-gram sequence and remove them only at the boundaries. Would you have any ideas in this area too?
For reference, the question on stackoverflow (suggestions done with other means, but the principle remains):
http://stackoverflow.com/questions/4954735/autocomplete-via-shingles-and-termvector-component
Hi Dmitry,
Couldn't you make a custom stop filter that only removes stop words at the start (first token(s) seen) or at the end of the input (no non-stop-word tokens seen after)? It'd require some buffering / state keeping (captureState/restoreState), but it seems doable?
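Roughly, the boundary-only idea could look like the sketch below. This is plain Python operating on an already-tokenized list rather than a Lucene TokenFilter, so it sidesteps the buffering question entirely; the stop list and the name `trim_boundary_stop_words` are made up for illustration.

```python
# Illustrative stop list; an assumption, not Lucene's default set.
STOP_WORDS = {"a", "an", "and", "at", "the", "of", "to"}

def trim_boundary_stop_words(tokens):
    """Drop stop words at the start and end, keep the ones in between."""
    start = 0
    while start < len(tokens) and tokens[start] in STOP_WORDS:
        start += 1
    end = len(tokens)
    while end > start and tokens[end - 1] in STOP_WORDS:
        end -= 1
    return tokens[start:end]

print(trim_boundary_stop_words("the state of the art".split()))
print(trim_boundary_stop_words("jazz at lincoln center".split()))
print(trim_boundary_stop_words("jazz at".split()))
```

In a streaming filter you can't know a stop word is trailing until you've seen what follows it, which is where the buffering comes in.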
Hi Mike,
That sounds like a reasonable idea. Thanks!
I assumed this was in the standard distribution of Solr 4.5+, but no? I tried
<filter class="org.apache.lucene.search.suggest.analyzing.SuggestStopFilter"
but I get a plugin init failure. The filter looks quite helpful. Thanks.
Hmm, unfortunately it looks like this hasn't been exposed through Solr as a factory. Maybe open an issue?
Hi Michael,
Thanks for the very informative post.
Do we have any analyzer or filter config in Solr to stop suggesting phrases that end with stop words?
For example, suppose the suggestions for the query http://localhost/solr/suggest?q=jazz+a are as below:
"suggestion": [
"jazz and",
"jazz at",
"jazz at lincoln",
"jazz at lincoln center",
"jazz artists",
"jazz and classic"
]
Is there any config or solution to remove only the "jazz at" and "jazz and" phrases, so that the final suggestion response looks more sensible?
"suggestion": [
"jazz at lincoln",
"jazz at lincoln center",
"jazz artists",
"jazz and classic"
]
Google does this intelligently :)
I have tested with StopFilterFactory and SuggestStopFilter, neither of which does this.
Do I have to come up with a custom plugin to do this in Solr?
Thanks,
Rajesh.
Which suggester are you using? Free text?
Hello Michael, Rajesh. I had a similar requirement to remove trailing stopwords from a shingle-based suggester, so I implemented a filter that does just that. You might want to have a look at https://github.com/spyk/shingle-stop-filter. Thanks!