This is a great showcase of a number of Lucene's important features:
- Using block join queries to model parent (the original Jira issue) and children (each comment) documents. This basic relational structure is also common in e-commerce applications, where you have a product (e.g. a specific shirt) and then individual SKUs (size/color combinations) under it
- Faceting, with flat, hierarchical, and dynamic numeric range
fields. Remember you can pick multiple facet values (multi-select) with shift+click!
- DrillSideways facet counts, so you don't lose facet counts of other labels just because you drilled down to one of them
- AnalyzingInfixSuggester for auto-suggest, including near-real-time updates. Suggestions are project specific: if you have drilled down to specific projects, then the suggestions will only be from those projects, thanks to AnalyzingInfixSuggester now supporting contexts
- Near real time indexing and searching
- WordDelimiterFilter so camel case tokens are split (try searching for infix)
- Using expressions to dynamically compute a blend of recency and relevance for the sort order score for hits
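To make the DrillSideways bullet above concrete, here is a minimal pure-Python sketch of the counting rule. It is not how Lucene implements DrillSideways internally; the toy documents, field names, and the helpers `matches` and `drill_sideways_counts` are all hypothetical. The idea it illustrates: when counting values for a dimension you have drilled down on, that dimension's own filter is ignored, while filters on every other dimension still apply.

```python
from collections import Counter

# Hypothetical toy documents; the field names and values are made up.
ISSUES = [
    {"project": "Lucene", "status": "Open"},
    {"project": "Lucene", "status": "Closed"},
    {"project": "Solr", "status": "Open"},
    {"project": "Solr", "status": "Open"},
    {"project": "Infrastructure", "status": "Closed"},
]


def matches(doc, drill_downs, skip_dim=None):
    """True if doc satisfies every drill-down, optionally ignoring one dimension."""
    return all(
        doc[dim] in allowed
        for dim, allowed in drill_downs.items()
        if dim != skip_dim
    )


def drill_sideways_counts(docs, drill_downs, dims):
    """Count facet values per dimension.

    For each dimension, that dimension's own drill-down (if any) is ignored,
    so sibling values keep their counts. Drill-downs on all other dimensions
    still apply.
    """
    counts = {}
    for dim in dims:
        counts[dim] = Counter(
            doc[dim] for doc in docs if matches(doc, drill_downs, skip_dim=dim)
        )
    return counts


# The user drilled down to project = Lucene.
selected = {"project": {"Lucene"}}
hits = [doc for doc in ISSUES if matches(doc, selected)]
counts = drill_sideways_counts(ISSUES, selected, ["project", "status"])
```

With this selection, the hits and the status counts reflect only the Lucene issues, while the project counts still show Solr and Infrastructure, which is what makes multi-select possible.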
Since the initial release of Jirasearch it has seen substantial usage and interest from users and developers. Building this and keeping it running all this time has been an awesome and humbling exercise for me because I get to experience life as a "production" user of our software. At the same time, we all get a nice search UI for finding issues.
Upgrading from Lucene 4.6.x to 6.x
For the past week or so I had another similarly humbling experience,
upgrading Jirasearch from the very old Lucene 4.6.x release to the latest 6.x release.
Small (yet vital!) things changed, such as the
requirement to use a special index searcher with
ToParentBlockJoinQuery, which conflicts with how you
I hit this
bug in the infix suggester. Something changed about pure negative
boolean queries, but I am still not sure what (I have worked around it).
I had already previously upgraded Lucene server to dimensional points so I got that "for free" for the existing numeric fields in Jirasearch.
New Jirasearch features
Besides "merely" upgrading from Lucene 4.6.x to 6.x, and switching all numeric fields to the new dimensional points, I also added some compelling user-visible improvements (thank you to Alexandre Rafalovitch for suggesting some of these, thus kick-starting my unexpectedly challenging upgrade-and-improve effort):
- email@example.com is finally presented as Doug Cutting! Plus, the auto-suggest now works if you type "Doug".
- The new Updated ago facet dimension lets you drill down to issues that have not been updated for some time.
- The new Last comment user facet dimension is the user who last commented on an issue.
- The new Committed by facet dimension lets you drill down to those issues a given developer has committed changes for.
- The Committed paths hierarchical facet dimension, letting you find issues according to which paths in the source tree were changed for that issue, had been broken since we switched from Subversion to Git and now works again.
- The Infrastructure project issues are now included as well.
- The per-comment text processing sees some minor improvements, e.g. expanding a referenced user name to their display name, making each commitbot comment link directly to the change set and include the branch name, plus a few new synonyms (try pnp!)
The new facet fields are especially fun: you can now find issues that you perhaps killed, by drilling down on Updated ago > 1 month ago and Last comment user = you (this was the use case suggested by Alexandre).
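As a rough illustration of that drill-down, the sketch below buckets issues by how long ago they were updated and then filters by last commenter. The range labels and boundaries, the `updated` and `last_comment_user` field names, and the sample issues are all assumptions made for the example, not what Jirasearch actually indexes. The ranges are "dynamic" in the sense that they are computed relative to the current time at query time rather than stored in the index.

```python
from datetime import datetime, timedelta

# Hypothetical range labels and boundaries for an "Updated ago" dimension.
RANGES = [
    ("> 1 day ago", timedelta(days=1)),
    ("> 1 week ago", timedelta(days=7)),
    ("> 1 month ago", timedelta(days=30)),
]

# Made-up issues; keys, dates, and user names are illustrative only.
ISSUES = [
    {"key": "DEMO-1", "updated": datetime(2016, 6, 30, 12), "last_comment_user": "alice"},
    {"key": "DEMO-2", "updated": datetime(2016, 6, 20), "last_comment_user": "bob"},
    {"key": "DEMO-3", "updated": datetime(2016, 3, 1), "last_comment_user": "alice"},
    {"key": "DEMO-4", "updated": datetime(2015, 12, 25), "last_comment_user": "carol"},
]


def updated_ago_counts(issues, now):
    """Count issues per range; ranges overlap, so an issue may fall into several."""
    counts = {label: 0 for label, _ in RANGES}
    for issue in issues:
        age = now - issue["updated"]
        for label, min_age in RANGES:
            if age > min_age:
                counts[label] += 1
    return counts


def possibly_killed(issues, now, user, min_age=timedelta(days=30)):
    """Issues where `user` commented last and nothing has happened since."""
    return [
        issue["key"]
        for issue in issues
        if issue["last_comment_user"] == user and now - issue["updated"] > min_age
    ]


now = datetime(2016, 7, 1)
range_counts = updated_ago_counts(ISSUES, now)
stale_for_alice = possibly_killed(ISSUES, now, "alice")
```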
Another fun one is to see issues a given developer committed (Committed by) to an unusual part of the source tree (Committed paths), e.g. the issues where I committed changes to Solr for a Lucene Jira issue.
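A hierarchical facet like Committed paths can be pictured as counting issues under each child of the node you are currently viewing. The sketch below is a simplified stand-in for that idea in plain Python, not Lucene's taxonomy facets; the issue keys, file paths, and the helpers `path_prefixes` and `child_counts` are hypothetical.

```python
from collections import Counter

# Made-up mapping from issue key to the paths changed for that issue.
COMMITTED_PATHS = {
    "DEMO-1": ["lucene/core/src/A.java", "lucene/facet/src/B.java"],
    "DEMO-2": ["lucene/core/src/C.java", "solr/core/src/D.java"],
    "DEMO-3": ["solr/core/src/E.java"],
}


def path_prefixes(path):
    """Return every ancestor of a path, e.g. ('lucene',), ('lucene', 'core'), ..."""
    parts = path.strip("/").split("/")
    return [tuple(parts[:i]) for i in range(1, len(parts) + 1)]


def child_counts(issues, parent=()):
    """Count issues under each immediate child of `parent`.

    An issue is counted at most once per child, even if it touched
    several files below that child.
    """
    counts = Counter()
    depth = len(parent) + 1
    for paths in issues.values():
        children = set()
        for path in paths:
            for prefix in path_prefixes(path):
                if len(prefix) == depth and prefix[:-1] == parent:
                    children.add(prefix[-1])
        counts.update(children)
    return counts


top_level = child_counts(COMMITTED_PATHS)
under_lucene = child_counts(COMMITTED_PATHS, parent=("lucene",))
```

Here DEMO-2 is counted under both `lucene` and `solr` at the top level, which is the kind of cross-tree change the paragraph above describes.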
Open source Jirasearch
With this update I am also making all the sources behind Jirasearch
open source under the Apache 2 license, as part
of the luceneserver GitHub project.
While Luceneserver itself is entirely Java, the sources for the Jirasearch application are entirely Python: they extract details of all issues from the Apache Jira instance, convert them into Lucene server documents, do full and near-real-time indexing, build the suggesters, and serve the search UI.
Please note the Python sources are not particularly pretty. Yet, they are functional, and as always: patches welcome!
It's likely I broke things during this upgrade process; please let me know (add a comment here, or shoot me an email) if so.
Thanks for maintaining Luceneserver & jirasearch, Mike!
You're welcome, David!
Lots of helpful resources in this article. Thanks a lot.
"Faceting, with flat, hierarchical, and dynamic numeric range fields."
Where did you use a hierarchical facet on this page? Committed paths? Do you use the taxonomy index?
If yes, is it replaceable with flat facets (using SortedSetDocValuesFacetField) by applying "path traversed terms" in a filter?
Yes, the Committed Paths is the only hierarchical facet field here, using Lucene's taxonomy facets.
I'm not sure if you could emulate the hierarchy on top of SSDVFacets, but I agree it would be wonderful if we could improve SSDVFacets to support a hierarchy: patches welcome! Maybe open an issue so we could discuss options?
Trying to build Lucene Server. The process stops at:
init: cloning lucene branch_6x to ./lucene6x...
Where can I report this issue?
Which Lucene Server sources are you using?
I already made a pull request in GitHub's mikemccand/luceneserver.
Please take a look!
Aha, great, I just merged it! Thank you.
You are welcome!
How did you combine relevance calculated by Lucene similarity and recency?
How much weight did you give to the Lucene relevance score versus recency?