Saturday, January 14, 2012

ToChildBlockJoinQuery in Lucene

In my last post I described a known limitation of BlockJoinQuery: it joins in only one direction (from child to parent documents). This can be a problem because some applications need to join in reverse (from parent to child documents) instead.

This is now fixed! I just committed a new query, ToChildBlockJoinQuery, to perform the join in the opposite direction. I also renamed the previous query to ToParentBlockJoinQuery.

You use it just like BlockJoinQuery, except in reverse: it wraps any other Query matching parent documents and translates it into a Query matching child documents. The resulting Query can then be combined with other queries against fields in the child documents, and you can then sort by child fields as well.

Using songs and albums as an example: imagine you index each song (child) and album (parent) as separate documents in a single document block. With ToChildBlockJoinQuery, you can now run queries like:
  albumName:thunder AND songName:numb
or
  albumName:thunder, sort by songName
Any query with constraints against album and/or song fields will work, and the returned hits will be individual songs (not grouped).
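
To make the direction of the join concrete, here is a toy sketch in plain Java of what a parent-to-child join over document blocks does conceptually. It is not Lucene code and does not use the Lucene API: the Doc class and the block and joinToChildren helpers are hypothetical names invented for this illustration, and the child constraint is folded into the same loop for brevity rather than combined as a separate query. It assumes each block is indexed as the songs followed by their album.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ToChildJoinSketch {
    /** Hypothetical stand-in for an indexed document; not a Lucene class. */
    static final class Doc {
        final boolean isParent;
        final String albumName; // set on album (parent) docs only
        final String songName;  // set on song (child) docs only

        Doc(boolean isParent, String albumName, String songName) {
            this.isParent = isParent;
            this.albumName = albumName;
            this.songName = songName;
        }
    }

    /** Builds one document block: all songs first, then the album last. */
    static List<Doc> block(String albumName, String... songNames) {
        List<Doc> docs = new ArrayList<>();
        for (String song : songNames) {
            docs.add(new Doc(false, null, song));
        }
        docs.add(new Doc(true, albumName, null));
        return docs;
    }

    /**
     * Walks the index in order. Children are buffered until their parent
     * arrives; if the parent matches, the buffered children that also pass
     * the child constraint are returned as individual hits.
     */
    static List<String> joinToChildren(List<Doc> index,
                                       Predicate<Doc> parentMatch,
                                       Predicate<Doc> childMatch) {
        List<String> hits = new ArrayList<>();
        List<Doc> pending = new ArrayList<>();
        for (Doc doc : index) {
            if (!doc.isParent) {
                pending.add(doc);
                continue;
            }
            if (parentMatch.test(doc)) {
                for (Doc child : pending) {
                    if (childMatch.test(child)) {
                        hits.add(child.songName);
                    }
                }
            }
            pending.clear(); // the next block starts after each parent
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>();
        index.addAll(block("thunder", "numb", "storm"));
        index.addAll(block("sunshine", "numb"));

        // Roughly: albumName:thunder AND songName:numb
        List<String> hits = joinToChildren(
            index,
            album -> album.albumName.equals("thunder"),
            song -> song.songName.equals("numb"));
        System.out.println(hits); // prints [numb]
    }
}
```

Only the "numb" song from the "thunder" album comes back; the song with the same name on the other album is skipped because its parent does not match. The hits are individual songs, which is why sorting by a child field becomes possible.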

ToChildBlockJoinQuery will be available in Lucene 3.6.0 and 4.0.

4 comments:

  1. Michael,
    I made some amendments to the tests for ToChildBJQ; could you have a look at https://issues.apache.org/jira/browse/SOLR-3076?focusedCommentId=13226820&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13226820

    --
    Mikhail

  2. Thanks Mikhail! I'll have a look...

  3. It might be worth noting that you have to add the child documents *before* the parent document, otherwise an assertion will fail in the BlockJoinQuery.

  4. Right, that's very important! I believe the Javadocs state this?

    I think it would help if we created a DocumentBlock (DocumentGroup? ParentChildDocuments?) that held the parent and children... this would be much less error-prone, because when you add it to the index it would iterate in the proper order (children, then parent).
