This is a great showcase of a number of Lucene's important features:
- Using block join queries to model parent (the original Jira issue) and children (each comment) documents. This basic relational structure is also common in e-commerce applications, where you have a product (e.g. a specific shirt) and then individual SKUs (size/color combinations) under that shirt
- Highlighting with PostingsHighlighter
- Faceting, with flat, hierarchical, and dynamic numeric range fields. Remember you can pick multiple facet values (multi-select) with shift+click!
- DrillSideways facet counts, so you don't lose facet counts of other labels just because you drilled down to one of them
- AnalyzingInfixSuggester for auto-suggest, including near-real-time updates. Suggestions are project-specific: if you have drilled down to one or more projects, the suggestions will only come from those projects, thanks to AnalyzingInfixSuggester now supporting contexts
- Near-real-time indexing and searching
- WordDelimiterFilter, so camel case tokens are split (try searching for infix)
- Synonyms
- Using expressions to dynamically compute each hit's sort score as a blend of recency and relevance
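Block join works because each parent is indexed together with its children as one contiguous block of documents, with the parent last. Here is a minimal stdlib Python sketch of that idea, not Lucene's implementation: the helpers `index_blocks` and `join_to_parents` and the toy issues are hypothetical, and where Lucene tracks parents with a bitset of parent documents, this sketch uses a sorted list plus `bisect`.

```python
from bisect import bisect_left
from collections import defaultdict


def index_blocks(blocks):
    """Flatten (parent, children) pairs into doc IDs: each block's children first, its parent last."""
    docs, parent_ids = [], []
    for parent, children in blocks:
        docs.extend(children)
        parent_ids.append(len(docs))  # parent gets the doc ID right after its children
        docs.append(parent)
    return docs, parent_ids


def join_to_parents(child_hits, parent_ids):
    """Map each matching child doc ID to its parent: the first parent doc ID after it."""
    grouped = defaultdict(list)
    for child in sorted(child_hits):
        parent = parent_ids[bisect_left(parent_ids, child)]
        grouped[parent].append(child)
    return dict(grouped)


# Hypothetical toy data: two issues, each with its comments.
blocks = [
    ("ISSUE-1: infix suggester bug", ["I see this too", "patch attached"]),
    ("ISSUE-2: switch to points", ["patch looks good"]),
]
docs, parent_ids = index_blocks(blocks)
parent_set = set(parent_ids)

# Child query: comments mentioning "patch"; the join returns their parent issues.
child_hits = [i for i, text in enumerate(docs) if i not in parent_set and "patch" in text]
for parent, kids in join_to_parents(child_hits, parent_ids).items():
    print(docs[parent], "<-", [docs[k] for k in kids])
```

Because a parent always follows its children, finding a comment's issue is just a search for the next parent doc ID, which is also why the whole block has to be reindexed together when an issue or any of its comments changes.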
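An infix suggester matches the typed text against any token of a suggestion, not just its beginning, and contexts restrict which suggestions are eligible. Below is a rough linear-scan sketch of those two ideas, assuming simple whitespace tokenization; `Suggestion`, `infix_lookup` and the entries are hypothetical, and AnalyzingInfixSuggester itself runs the lookup against a Lucene index using an analyzer.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    text: str
    weight: int
    contexts: frozenset = field(default_factory=frozenset)  # e.g. project keys


def infix_lookup(suggestions, typed, contexts=None, num=5):
    """Return up to `num` suggestion texts matching `typed` anywhere, highest weight first.

    Every typed token except the last must equal some suggestion token; the last
    one only has to be a prefix of one. If `contexts` is given, a suggestion must
    carry at least one of those contexts.
    """
    query = typed.lower().split()
    if not query:
        return []
    *whole, prefix = query
    results = []
    for s in suggestions:
        if contexts is not None and not (s.contexts & contexts):
            continue  # not in any of the drilled-down projects
        tokens = s.text.lower().split()
        if all(t in tokens for t in whole) and any(tok.startswith(prefix) for tok in tokens):
            results.append(s)
    results.sort(key=lambda s: -s.weight)
    return [s.text for s in results[:num]]


# Made-up suggestions tagged with project contexts.
suggestions = [
    Suggestion("Doug Cutting", 10, frozenset({"LUCENE", "SOLR"})),
    Suggestion("Infix suggester contexts", 5, frozenset({"LUCENE"})),
    Suggestion("Solr infix handler", 3, frozenset({"SOLR"})),
]
print(infix_lookup(suggestions, "inf"))
print(infix_lookup(suggestions, "inf", contexts={"SOLR"}))
```

Passing the current project drill-down as `contexts` is what keeps the suggestions limited to those projects.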
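Splitting on case changes is what lets a search for infix match a camel case identifier such as AnalyzingInfixSuggester. The sketch below approximates only a few of WordDelimiterFilter's rules (split on case changes and letter/digit boundaries, optionally keep the original token) with a regular expression; `word_delimiter` is a hypothetical helper, not the filter's actual algorithm or options.

```python
import re

# A run of capitals not followed by lowercase ("HTTP" in "HTTPServer"),
# an optionally capitalized lowercase run ("Infix"), or a run of digits.
# Anything else (punctuation, underscores) acts as a delimiter.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def word_delimiter(token, preserve_original=True):
    """Split a token into lowercased word parts, optionally keeping the whole token first."""
    parts = [p.lower() for p in _PART.findall(token)]
    if preserve_original and len(parts) > 1:
        parts.insert(0, token.lower())
    return parts


print(word_delimiter("AnalyzingInfixSuggester"))
print(word_delimiter("HTTPServer", preserve_original=False))
```

Keeping the original token means an exact search for the full identifier still matches, while the parts make infix findable.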
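Lucene's expressions module lets you write a sort formula over a hit's relevance score and per-document numeric fields. The formula Jirasearch actually uses is not given here, so the Python function below is a hypothetical example of one such blend: relevance is boosted by up to a factor of `1 + alpha` for a brand-new issue, and the boost decays exponentially with age. `alpha`, `tau_days` and the hits are made up.

```python
import math


def blended_score(relevance, age_days, alpha=1.0, tau_days=30.0):
    """Hypothetical blend: up to (1 + alpha) times the relevance for a brand-new issue,
    with the recency boost decaying exponentially (time constant tau_days)."""
    recency = math.exp(-age_days / tau_days)
    return relevance * (1.0 + alpha * recency)


# (title, relevance score, days since last update): made-up hits.
hits = [("old but very relevant", 3.0, 365), ("new and fairly relevant", 2.0, 1)]
ranked = sorted(hits, key=lambda h: blended_score(h[1], h[2]), reverse=True)
print([title for title, _, _ in ranked])
```

Tuning `alpha` trades relevance against freshness: with `alpha=0` the blend is just the relevance score.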
Since the initial release of Jirasearch, it has seen substantial usage and interest from users and developers. Building this and keeping it running all this time has been an awesome and humbling exercise for me, because I get to experience life as a "production" user of our software. At the same time, we all get a nice search UI for finding issues.
Upgrading from Lucene 4.6.x to 6.x
For the past week or so I had another similarly humbling experience, this time upgrading Jirasearch from the very old Lucene 4.6.x release to the latest 6.x release.
Small (yet vital!) things changed, such as the new requirement to use a special index searcher with ToParentBlockJoinQuery, which conflicts with how you must use DrillSideways.
I also hit a bug in the infix suggester. Something changed about pure negative boolean queries, but I am still not sure what (I have worked around it for now)!
I had previously upgraded Lucene server to dimensional points, so I got that "for free" for the existing numeric fields in Jirasearch.
New Jirasearch features
Besides "merely" upgrading from Lucene 4.6.x to 6.x, and switching all numeric fields to the new dimensional points, I also added some compelling user-visible improvements (thank you to Alexandre Rafalovitch for suggesting some of these, thus kick-starting my unexpectedly challenging upgrade-and-improve effort):
- cutting@apache.org is finally presented as Doug Cutting! Plus, the auto-suggest now works if you type "Doug".
- The new Updated ago facet dimension lets you drill down to issues that have not been updated for some time.
- The new Last comment user facet dimension lets you drill down by the user who last commented on an issue.
- The new Committed by facet dimension lets you drill down to those issues a given developer has committed changes for.
- The Committed paths hierarchical facet dimension, which lets you find issues according to which paths in the source tree were changed for that issue, had been broken ever since we switched from Subversion to Git, and now works again.
- The Infrastructure project issues are now included as well.
- The per-comment text processing has some minor improvements, e.g. expanding a referenced user name to that user's display name, linking each commitbot comment directly to its change set and including the branch name, plus a few new synonyms (try pnp!)
The new facet fields are especially fun: you can now find issues that you perhaps killed, by drilling down on Updated ago > 1 month ago and Last comment user = you (this was the use case suggested by Alexandre).
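That drill-down is where DrillSideways matters: after you pick Updated ago > 1 month ago, the Updated ago dimension should still show counts for its other values, computed as if only the Last comment user drill-down were applied, and vice versa. Below is a naive stdlib sketch of those semantics, assuming every doc already matches the base query; the helpers, range labels, and users are hypothetical, and Lucene's DrillSideways is far more efficient than recomputing each dimension like this.

```python
from collections import Counter


def matches(doc, drilldowns, skip_dim=None):
    """True if, for every drilled-down dimension except skip_dim, the doc has a selected value."""
    return all(doc[dim] & selected for dim, selected in drilldowns.items() if dim != skip_dim)


def drill_sideways(docs, drilldowns, dims):
    """Return (hits, per-dimension value counts); assumes all docs match the base query."""
    hits = [d for d in docs if matches(d, drilldowns)]
    facets = {}
    for dim in dims:
        # Count this dimension as if its own drill-down were not applied; for a
        # dimension with no drill-down this is simply the counts over the hits.
        sideways = [d for d in docs if matches(d, drilldowns, skip_dim=dim)]
        facets[dim] = Counter(v for d in sideways for v in d[dim])
    return hits, facets


# Made-up issues, range labels and users.
docs = [
    {"id": 1, "Updated ago": {"> 1 month ago"}, "Last comment user": {"alice"}},
    {"id": 2, "Updated ago": {"> 1 month ago"}, "Last comment user": {"bob"}},
    {"id": 3, "Updated ago": {"< 1 month ago"}, "Last comment user": {"alice"}},
    {"id": 4, "Updated ago": {"> 1 month ago"}, "Last comment user": {"alice"}},
]
drilldowns = {"Updated ago": {"> 1 month ago"}, "Last comment user": {"alice"}}
hits, facets = drill_sideways(docs, drilldowns, ["Updated ago", "Last comment user"])
print([d["id"] for d in hits], dict(facets["Last comment user"]))
```

With a plain drill-down, Last comment user would show only alice; the sideways counts still show bob, so you can switch users without first undoing the drill-down.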
Another fun one is to see issues a given developer committed (Committed by) to an unusual part of the source tree (Committed paths), e.g. the issues where I committed changes to Solr for a Lucene Jira issue.
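Committed paths is a hierarchical dimension: an issue that changed a file under solr/core counts toward both solr and solr/core, so you can drill down one directory at a time. Here is a stdlib sketch of that counting with made-up paths and hypothetical helpers (`path_facet_counts`, `children`); each directory prefix counts an issue at most once, however many files the issue touched under it.

```python
from collections import Counter


def path_facet_counts(issue_paths):
    """Count issues under every directory prefix of the paths each issue changed."""
    counts = Counter()
    for paths in issue_paths:
        prefixes = set()  # de-duplicate so each issue counts once per directory
        for path in paths:
            parts = path.strip("/").split("/")
            for depth in range(1, len(parts)):  # directories only, not the file itself
                prefixes.add("/".join(parts[:depth]))
        counts.update(prefixes)
    return counts


def children(counts, parent):
    """Counts for the immediate children of `parent` (use "" for the top level)."""
    depth = 0 if parent == "" else parent.count("/") + 1
    return {p: c for p, c in counts.items()
            if p.count("/") == depth and (parent == "" or p.startswith(parent + "/"))}


# Made-up changed paths, one list per issue.
issues = [
    ["lucene/core/src/java/A.java", "lucene/core/src/test/TestA.java"],
    ["solr/core/src/java/B.java"],
    ["lucene/suggest/src/java/C.java", "solr/core/src/java/D.java"],
]
counts = path_facet_counts(issues)
print(children(counts, ""), children(counts, "lucene"))
```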
Open source Jirasearch
With this update, I am also releasing all the sources behind Jirasearch as open source under the Apache 2 license, in the examples/jirasearch sub-directory of the luceneserver GitHub project.
While Lucene server itself is entirely Java, the sources for the Jirasearch application are entirely Python: they extract details of all issues from the Apache Jira instance, convert those issues into Lucene server documents, run full and near-real-time indexing, build the suggester, and serve the search UI.
Please note the Python sources are not particularly pretty. Yet, they are functional, and as always: patches welcome!
It's likely I broke things during this upgrade process; please let me know (add a comment here, or shoot me an email) if so.