Have you always wanted
your very
own Lucene
finite state transducer (FST) but you couldn't figure out how to
use Lucene's crazy APIs?
Then today is your lucky day! I just built
a simple web
application that creates an FST from the input/output strings that
you enter.
If you just want a finite state automaton (no outputs) then enter only
inputs, such
as this example:
If all of your outputs
are non-negative
integers then the FST will use numeric outputs, where you sum up
the outputs as you traverse a path to get the final output:
Finally, if the outputs are non-numeric then they are treated as
strings, in which case you concatenate as you traverse the path:
The red arcs are the ones with the NEXT optimization: these arcs do
not store a pointer to a node because their to-node is the very next
node in the FST. This is a good optimization: it generally results in
a
large reduction of the FST size. The bolded arcs tell you the
next node is final; this is most interesting when a prefix of another
input is accepted, such as this example:
Here the "r" arc is bolded, telling you that "star" is accepted.
Furthermore, that node following the "r" arc has a final output,
telling you the overall output for "star" is "abc".
The web app is a simple Python WSGI app; source code
is here.
It invokes a simple Java tool as a subprocess; source code (including
generics violations!)
is here.
Hello Mr McCandless,
ReplyDeleteNice blog! Is there an email address I can contact you in private?
Thanks,
Eleftheria Kiourtzoglou
Head of Editorial Team
Java Code Geeks
email: eleftheria[dot]kiourtzoglou[at]javacodegeeks[dot]com
is it possible to optimize the chains "cde" and "xcde" of the entries "abxcdey abicdej" into one "cde"?
ReplyDeletelook: http://examples.mikemccandless.com/fst.py?terms=abxcdey%0D%0Aabicdej&cmd=Build+it%21
No, FSTs cannot do that, because those two "cde" carry different state since they complete to different suffixes.
DeleteIf they both completed to the same set of suffixes then "cde" is shared, e.g.:
http://examples.mikemccandless.com/fst.py?terms=abxcdey%0D%0Aabicdej%0D%0Aabxcdej%0D%0Aabicdey&cmd=Build+it%21
Excellent article. Any thoughts how this can be used for key/value store kind of application
ReplyDeleteFSTs can easily store key/value pairs, since they really act like a SortedMap, however this is only "useful" if 1) you can fit the entire map into RAM, and 2) the keys have commonalities to them (e.g. words from natural language) in which case they compress well.
DeleteHello Michael, how are you?
ReplyDeleteHope you can help me. In the Lucene API have the ability to change a document, removes and adds the document. I have the need to add/remove a term by docId/field. There is the possibility to perform the link between a term with a field and its existing document? (field -> terms -> term -> DocIds)
writer.removeTerm (docId, field, term);
writer.addTerm (docId, field, term);
thank you,
Marcio
Hi Marcio,
DeleteCan you send this question to java-user@lucene.apache.org instead?
Thanks.
Thanks too!
DeleteHello Mr. McCandless,
ReplyDeleteJava Tool link now points to a 404 not found page. Can you give me a valid link?
Your posts about FST really help me a lot. Thank you.
Hi Vicky,
DeleteIt seems to be working now; try again?
Hi,
ReplyDeletevery interesting article. While the web app is still working, the source code is no longer accessible (404-error). Any chance to fix that?
Thanks, Daniel
Hi Anonymous,
DeleteWoops, thanks for notifying me ... I just fixed those links so they should now work again!
nice article...btw, could you please guide me how to run the web app and java tool (both source codes are attached by you) in my Windows 7 system?...pls mail me at manabs21@gmail.com
ReplyDeleteHi manao',
ReplyDeleteI don't use Windows so I really can't help much here, but the code (Java, Python) should be portable... or you can just use the instance that I keep running at http://examples.mikemccandless.com/fst.py
then please guide me how to run that on Ubuntu(I think you use that platform) ..I would be grateful to you...btw, http://examples.mikemccandless.com/fst.py link is not working.
DeleteHave a look at fstApp.py: this is a standard Python WSGI app. You can run it directly with python for testing, or you can run it e.g. with mod_wsgi using Apache (this is how I run it on the public site).
Delete