For some reason having to do
with the medieval
French,
the scoring
system for tennis is very strange.
In actuality, the game is easy to explain: to win, you must score at
least 4 points and win by at least 2. Yet in practice, you are
supposed to use strange labels like "love" (0 points), "15" (1 point),
"30" (2 points), "40" (3 points), "deuce" (3 or more points each, and
the players are tied), "all" (players are tied) instead of simply
tracking points as numbers, as other sports do.
This is of course wildly confusing to newcomers. Fortunately, the
convoluted logic is easy to express as
a finite-state
automaton (FSA):
The game begins in the left-most (unlabeled) state, and then each time
either player 1 (red) or player 2
(blue) scores, you advance to the
corresponding state to know how to say the score properly in
tennis-speak. In each state, player 1's score is first followed by
player 2's; for example "40 30" means player 1 has scored 3 points and
player 2 has scored 2 and "15 all" means both players have scored
once. "adv 2" means player 2 is ahead by 1 point and will win if s/he
scores again.
There are only 20 states, and there are cycles
which means a tennis game can in fact go on indefinitely, if the
players pass back and forth through the "deuce" (translation: game is
tied) state.
This FSA is correct, and if you watch
a Wimbledon match,
for example, you'll see the game advance through precisely these
states.
Minimization
Yet for an FSA, merely being correct is not good enough!
It should
also strive to be minimal, and surprisingly this FSA is not: if you
build this Automaton in Lucene
and minimize it, you'll discover that there are some wasted states!
This means 20 states is overkill when deciding who won the game.
Specifically, there is no difference between the "30 all" and "deuce"
states, nor between the "30 40" and "adv 2" states, nor between the
"40 30" and "adv 1" states. From either state in each of these pairs,
there is no sequence of player 1 / player 2 scoring that will result
in a different final outcome (this is in principle how the minimization
process identifies indistinguishable states).
Therefore, there's no point in keeping those states, and you can
safely use this smaller 17-state FSA (15% smaller!) to score your
tennis games instead:
For example, from "15 30", if player 1 scores, you go straight to
"deuce" and don't bother with the redundant "30 30" state.
Another (simpler?) way to understand why these states are wasted is to
recognize that the finite state machine is tracking two different pieces
of information: first, how many points ahead player 1 is (since a
player must win by 2 points) and second, how many points have been
scored (since a player must score at least 4 points to win).
Once enough points (4 or more) have been scored by either player,
their absolute scores no longer matter. All that matters is the
relative score: whether player 1 is ahead by 1, equal, or behind by 1.
For example, we don't care if the score is 197 to 196 or 6 to 5: they
are the same thing.
Yet, early on, the FSA must also track the absolute scores, to ensure
at least 4 points were scored by the winner. With the original
20-state FSA, the crossover between these two phases was what would
have been "40 40" (each player scored 3 points). But in the minimal
machine, the crossover became "30 30" (each player scored 2 points),
which is safe since each player must still "win by 2" so if player 1
scores 2 points from "30 30", that means player 1 scored 4 points
overall.
FSA minimization saved only 3 states for the game of tennis, resulting
in a 15% smaller automaton, and maybe this simplifies keeping track of
scores in your games by a bit, but in
other FSA
applications in Lucene, such
as the
analyzing
suggester, MemoryPostingsFormat
and the terms index, minimization is vital since it saves substantial disk and RAM for
Lucene applications!
This was brilliant, I really liked this post! Awesome reduction.
ReplyDeleteVery interesting article, and nice state diagrams -- do I detect the heady aroma of AT&T Graphviz?
ReplyDeleteHi Anonymous,
DeleteIndeed, I LOVE Graphviz ... I use "dot" all the time when debugging tricky Lucene FSA/FST issues.
Awesome..MashaALLAH Nice work
ReplyDeleteHelped a lot in my assignment
beautiful article really appreciated
ReplyDeletehey can you provide me a turing machine state diagram for the same .thanks
ReplyDeleteI suspect it would look very similar to the FST above?
Delete