Comments on Changing Bits: Finite State Transducers, Part 2

Michael McCandless (2012-11-18 18:33:08 -05:00):

Hi KP,

I would expect FSTs to be a great fit for key/value stores.

We saw a big boost in performance when we switched to using an FST for primary key lookups in our nightly benchmark ( http://people.apache.org/~mikemccand/lucenebench/PKLookup.html ) -- the H annotation there marks the switch to MemoryPostingsFormat, which stores all terms + postings as an FST.

Anonymous (2012-11-18 17:21:25 -05:00):

@Mike: Thank you for the intro to FSTs. =) We didn't learn about them in school, heh.

I'd love to hear your thoughts on using FSTs as more general in-memory indices in areas such as key/value stores, databases, etc., especially in ones where the keys are stored sorted on disk.

Michael McCandless (2011-08-01 08:29:22 -04:00):

Hi Denis,

That sounds like an exciting use for FSTs!

Unfortunately, the FST APIs are built as part of Lucene's JAR; they are not split out into a separate library with stable APIs, and likely won't be any time soon because the APIs are still very much in flux.

Anonymous (2011-07-30 23:47:49 -04:00):

It's very interesting. Is there some way to use your FSM implementation without Lucene itself? I'm interested because I am writing a dictionary-based lemmatizer for the Russian language. Stemmers do not work so well for Russian, because it's a very complicated language with a very rich inflection model. So I need a memory-efficient data structure that lets me map char sequences to their ordinal lemma numbers. I think an FST would help me a lot.