Tuesday, June 29, 2010

Lucene in Action 2nd Edition is done!

Lucene in Action, 2nd Edition, is finally done: the eBook is available now, and the print book should be released on July 8th!

The source code that goes along with the book is freely available and free to use (Apache Software License 2.0), and there are two free chapters (Chapter 1 and Chapter 3). There is also a free green paper excerpted from the book, Hot Backups with Lucene, as well as the section describing CLucene, the C/C++ port of Lucene.

Writing is the best way to learn something -- because of this book I've discovered all sorts of nooks and crannies in Lucene that I otherwise would not have explored for quite some time.

5 comments:

  1. Thanks for your wonderful work!

    I have been waiting for the print book of the second edition for the past year. I am very glad that I have it in my hands today.

    I am currently working on a biological full-text search engine called Textpresso and implementing a Lucene-based system for it. I am halfway done with the project. I believe the book will help add many more features and make the searches faster.

    Thanks for all your hard work on the book and contributions to Lucene! Long live the open source community and its developers! :-)

  2. Arun: thanks! And, thank you for using Lucene in such interesting ways -- Textpresso looks very nice. It's the interesting applications/users of Lucene that drive its relentless progress!

  3. Hi Mike,

    I have a question about multi-word synonym search with Lucene. I am not sure whether this is the right place to ask, so please let me know if I should post it elsewhere.

    In Textpresso, we have so-called "categories" or "concepts". Each category is a bag of words and phrases. Textpresso allows users to search for categories separately or along with keywords. Since many of the categories are huge (the "gene" category for one species has 533K entries in it), doing search-time query expansion for them appears inefficient or infeasible, so the synonym expansion has to happen at indexing time. While Lucene's SynonymAnalyzer seems to handle single-word synonyms, it looks to me like Lucene does not handle multi-word/phrase synonyms. I read your comment at
    https://issues.apache.org/jira/browse/LUCENE-1622
    (though I could not understand it fully) and it appears that this is an unsolved issue.

    Can you point me to any resource that can provide help with this?

    Thanks,
    Arun.

  4. Actually, Solr's SynonymFilter can handle multi-word synonyms; I would start with that?

    Also, it's better to ask questions on the solr/lucene user lists (solr-user@lucene.apache.org, java-user@lucene.apache.org). You'll get more responses...

  5. Thanks for your reply. I will look into Solr.

    I will ask on the mailing lists in the future.

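The index-time expansion discussed in the comments above can be illustrated with a small, library-free sketch. It is not Lucene's or Solr's API, and it is not how Textpresso works; it only shows the general idea of emitting a category marker at the same position as the first token of a matched word or phrase. The function names, the `CAT_` prefix, and the category entries are all made up for illustration.

```python
from typing import Dict, List, Tuple

PhraseIndex = Dict[Tuple[str, ...], str]


def build_phrase_index(categories: Dict[str, List[str]]) -> PhraseIndex:
    """Map each category entry (a tuple of lowercased tokens) to its label."""
    index: PhraseIndex = {}
    for label, entries in categories.items():
        for entry in entries:
            index[tuple(entry.lower().split())] = label
    return index


def tag_categories(tokens: List[str], index: PhraseIndex) -> List[Tuple[int, str]]:
    """Return (position, term) pairs: every token, plus a category marker
    at the start position of the longest entry that begins there."""
    max_len = max((len(key) for key in index), default=0)
    lowered = [t.lower() for t in tokens]
    out: List[Tuple[int, str]] = []
    for pos, tok in enumerate(lowered):
        out.append((pos, tok))
        # Try the longest candidate phrase first so longer entries win.
        for length in range(min(max_len, len(lowered) - pos), 0, -1):
            label = index.get(tuple(lowered[pos:pos + length]))
            if label is not None:
                # Hypothetical marker term; shares the token's position.
                out.append((pos, "CAT_" + label))
                break
    return out


# Made-up category entries, for illustration only.
categories = {"gene": ["abc-1"], "process": ["cell division", "cell"]}
index = build_phrase_index(categories)
postings = tag_categories("abc-1 regulates cell division".split(), index)
```

Because the lookup is a dictionary probe per start position, the cost at indexing time depends on the longest entry rather than on the number of entries in a category. One limitation of this sketch: a single marker at the start position does not record how many tokens the matched phrase spanned, so phrase queries that mix a category marker with the keywords following it may not line up.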