In actuality, the game is easy to explain: to win, you must score at least 4 points and win by at least 2. Yet in practice, you are supposed to use strange labels like "love" (0 points), "15" (1 point), "30" (2 points), "40" (3 points), "deuce" (3 or more points each, and the players are tied), "all" (players are tied) instead of simply tracking points as numbers, as other sports do.

This is of course wildly confusing to newcomers. Fortunately, the convoluted logic is easy to express as a finite-state automaton (FSA):

The game begins in the left-most (unlabeled) state, and then each time either player 1 (red) or player 2 (blue) scores, you advance to the corresponding state to know how to say the score properly in tennis-speak. In each state, player 1's score is first followed by player 2's; for example "40 30" means player 1 has scored 3 points and player 2 has scored 2 and "15 all" means both players have scored once. "adv 2" means player 2 is ahead by 1 point and will win if s/he scores again.

There are only 20 states, and there are cycles which means a tennis game can in fact go on indefinitely, if the players pass back and forth through the "deuce" (translation: game is tied) state.

This FSA is correct, and if you watch a Wimbledon match, for example, you'll see the game advance through precisely these states.

**Minimization**

Yet for an FSA, merely being correct is not good enough!

It should also strive to be minimal, and surprisingly this FSA is not: if you build this Automaton in Lucene and minimize it, you'll discover that there are some wasted states! This means 20 states is overkill when deciding who won the game.

Specifically, there is no difference between the "30 all" and "deuce" states, nor between the "30 40" and "adv 2" states, nor between the "40 30" and "adv 1" states. From either state in each of these pairs, there is no sequence of player 1 / player 2 scoring that will result in a different final outcome (this is in principle how the minimization process identifies indistinguishable states).

Therefore, there's no point in keeping those states, and you can safely use this smaller 17-state FSA (15% smaller!) to score your tennis games instead:

For example, from "15 30", if player 1 scores, you go straight to "deuce" and don't bother with the redundant "30 30" state.

Another (simpler?) way to understand why these states are wasted is to recognize that the finite state machine is tracking two different pieces of information: first, how many points ahead player 1 is (since a player must win by 2 points) and second, how many points have been scored (since a player must score at least 4 points to win).

Once enough points (4 or more) have been scored by either player, their absolute scores no longer matter. All that matters is the relative score: whether player 1 is ahead by 1, equal, or behind by 1. For example, we don't care if the score is 197 to 196 or 6 to 5: they are the same thing.

Yet, early on, the FSA must also track the absolute scores, to ensure at least 4 points were scored by the winner. With the original 20-state FSA, the crossover between these two phases was what would have been "40 40" (each player scored 3 points). But in the minimal machine, the crossover became "30 30" (each player scored 2 points), which is safe since each player must still "win by 2" so if player 1 scores 2 points from "30 30", that means player 1 scored 4 points overall.

FSA minimization saved only 3 states for the game of tennis, resulting in a 15% smaller automaton, and maybe this simplifies keeping track of scores in your games by a bit, but in other FSA applications in Lucene, such as the analyzing suggester, MemoryPostingsFormat and the terms index, minimization is vital since it saves substantial disk and RAM for Lucene applications!

This was brilliant, I really liked this post! Awesome reduction.

ReplyDeleteVery interesting article, and nice state diagrams -- do I detect the heady aroma of AT&T Graphviz?

ReplyDeleteHi Anonymous,

DeleteIndeed, I LOVE Graphviz ... I use "dot" all the time when debugging tricky Lucene FSA/FST issues.

Awesome..MashaALLAH Nice work

ReplyDeleteHelped a lot in my assignment

beautiful article really appreciated

ReplyDeletehey can you provide me a turing machine state diagram for the same .thanks

ReplyDeleteI suspect it would look very similar to the FST above?

DeleteThank you for another excellent post. Where else could anybody get that kind of info in such a perfect way of writing? I have a presentation next week, and I am on the look for such information.

ReplyDelete