Wednesday, September 30, 2009

A better grass?

I came across this article about a newly developed grass that does not require the intense life-support we have all come to assume is "normal".

You don't have to water it (after the initial seeding), nor apply pesticides nor fertilizer. And it only requires mowing once per month instead of the typical once per week schedule for life-support grass. It was designed to simply survive, naturally, in our challenging northeast climate.

I've always felt that such a grass must exist, but that the existing grass seed companies would not be interested in pursuing it. See, if the grass simply takes care of itself, we all will buy much less grass seed over time. The lawn care service industry will see much less business, mowing our lawns monthly instead of weekly. Manufacturers of pesticides, fertilizers and lawn care equipment will see less demand, etc. It's quite clearly not in the interest of the lawn care industry to pursue or allow such innovation.

I sure hope this grass is successful, but the pessimist in me expects that in a few years' time, either this company will have been sued out of existence, or the rights to this grass will have been purchased for a princely sum, and then promptly shelved, by one of the big established players in the grass seed industry.

For better or worse, capitalism favors waste in mature markets.

Thursday, September 24, 2009

Today, on my morning run, I saw a student walking, late for the bus. The bus driver saw her walking, way down the road, and so stopped and waited for probably two minutes or so for her to catch up and get on.

This might seem like only reasonable behavior, on the bus driver's part. S/he was being nice, right?

As crazy as it sounds, while it was a nice thing to do, I don't think the bus should have stopped. Here's why.

It sends the message that one student's inability to be on time is allowed to cut into the time the rest of the students get at school. The needs of the one outweigh the needs of the many (thank you Spock). It's only two minutes, but if this happens a few times on the route, day in and day out, that adds up to net/net less time at school for all the kids.

The rest of the students, who made the bus on time, probably having rushed through their morning at home to do so, pay the price for those students who can't make the bus on time. They will conclude that they, too, can be a bit late and the bus will wait. Why bother rushing to be on time? Rather than being taught that they should try hard to make the bus on time, to take responsibility for not making others wait, they are taught the reverse.

Finally, seeing the bigger picture, this teaches kids that the world will stop and wait for them. Make up for their faults. Be forgiving. That you need not try very hard for things because the rest of the world will compensate. You need not take responsibility. It ties right into the dangerous sense of entitlement that many kids seem to have now. For better or worse, the world simply is not like that once you grow up.

She should have simply missed the bus and learned a good lesson.

Saturday, September 5, 2009

Fun questions

Here are two fun questions I've [temporarily] stumped my kids on:
  • How can gravity make something go up?
  • How can the moon get you wet?

Tuesday, September 1, 2009

Spell correction

Spell correction is a challenging feature for search engines. Unfortunately, it's also crucial: mis-spelling is rampant when users run searches. In part this is because we all can't remember how to spell, and that's no wonder: the number of English words today is 5X what it was in Shakespeare's time! But it's also because we are simply in a hurry, or lazy, and make many typos.

I rely on aspell when I'm using emacs. Modern web browsers and word processors check the spelling of all text you enter. Web-side search engines have excellent spell correction; in fact, I no longer bother to correct my typos when entering a search. I've often wondered whether such "crutches" of our modern world are in fact weakening our minds and perhaps causing our language to further evolve. For example, I wonder how Microsoft Word's often wrong (in my experience) grammar checker has crimped "modern" writing.

My Chemistry teacher in high school refused to allow us to use calculators during our tests, for fear that we would lose our ability to do math with only basic tools (paper, pencil, brain, hands). My Physics teacher did the opposite, for the reverse fear that the distraction of doing basic math would take precious time and thought away from actually thinking about how to solve the problems. Who's right?

Google clearly sets the gold standard for respelling, one that any search engine is now expected to live up to. If you don't match that high bar, users are automatically disappointed. And you really don't want to disappoint your users: it's nearly impossible to get them to try out your new application, and they often don't give second chances.

For most approaches to spell correction, the more data you throw at them the better they perform. If you have lots of queries coming in, you can use that as your sole source. Google of course has tons of queries to tap into. If you are less fortunate, you can use your index/documents as your source. Both of these approaches assume most people know how to spell well! The assumption seems to hold, for now, but I have to wonder, as we all lean on this crutch and become worse at spelling with time, won't this eventually undermine Google's approach? No worries; Google will adapt. This is not unlike investing in index funds: that approach only works well if relatively few people do it.

Lucene's basic spellchecker package, under contrib, which requires you to provide a Dictionary of "known words", allows you to derive these words from your search index. It has some limitations: it can only do context-free correction (one word at a time, independent of all other words in the query); it doesn't take word frequency in the index into account when deriving the dictionary (so if a typo gets into your index, which can easily happen, you could end up suggesting that typo!); etc. But it does provide a pluggable distance measure for picking the best candidate. It's a good start.
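To make the frequency limitation concrete, here is a minimal sketch in Python of context-free correction against a lexicon derived from documents. It is not Lucene's implementation or API: the names build_lexicon, suggest, min_freq and max_distance are all hypothetical, and Levenshtein distance simply stands in for whatever pluggable distance you prefer.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_lexicon(documents: Iterable[str], min_freq: int = 2) -> Counter:
    """Count terms across documents; drop rare terms, which may be typos.

    min_freq is an assumed knob, not something Lucene's spellchecker offers.
    """
    counts = Counter(w for doc in documents for w in doc.lower().split())
    return Counter({w: c for w, c in counts.items() if c >= min_freq})


def suggest(
    word: str,
    lexicon: Counter,
    distance: Callable[[str, str], int] = levenshtein,  # pluggable measure
    max_distance: int = 2,
) -> Optional[str]:
    """Context-free: looks at one word only, ignoring the rest of the query."""
    word = word.lower()
    if word in lexicon:
        return word  # already a known word
    scored = [(distance(word, known), -freq, known)
              for known, freq in lexicon.items()]
    scored = [s for s in scored if s[0] <= max_distance]
    # Closest candidate wins; ties go to the more frequent term.
    return min(scored)[2] if scored else None


docs = [
    "lucene search index",
    "lucene index filter",
    "search filter lucene",
    "lucen typo",  # a typo that slipped into the index
]
```

With min_freq=1 the typo "lucen" is itself a known word, so suggest("lucen", ...) happily returns it unchanged; with min_freq=2 it is dropped from the lexicon and the suggestion becomes "lucene".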

One particularly sneaky feature to get right is spell correction in the context of entitlements; my post this morning on Lucene's user list raises this problem in a real use case (single index to search multiple users' emails). Entitlements means restricting access for certain users to certain documents. For example, you could have a large search index containing all documents from your large intranet, but because of security on the intranet, only certain users are allowed to access certain documents.

Lucene makes it easy to implement entitlements during searching, by using static (based solely on what's indexed) or dynamic (based on some "live" external source at search time) filtering.
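The difference between the two kinds of filtering can be sketched in a few lines of Python. This is only an illustration of the idea, not Lucene's Filter API: the Doc class, the allowed_groups field and the can_read callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]  # assumed to be indexed with the text


DocFilter = Callable[[Doc], bool]


def static_filter(user_groups: FrozenSet[str]) -> DocFilter:
    """Static: decide using only what was stored in the index."""
    return lambda doc: bool(user_groups & doc.allowed_groups)


def dynamic_filter(user: str, can_read: Callable[[str, str], bool]) -> DocFilter:
    """Dynamic: ask a "live" external source at search time."""
    return lambda doc: can_read(user, doc.doc_id)


def search(docs: List[Doc], term: str, doc_filter: DocFilter) -> List[str]:
    """Return ids of documents that contain the term and pass the filter."""
    term = term.lower()
    return [d.doc_id for d in docs
            if term in d.text.lower().split() and doc_filter(d)]


docs = [
    Doc("1", "Quarterly budget draft", frozenset({"finance"})),
    Doc("2", "Budget summary for everyone", frozenset({"finance", "staff"})),
]
staff_hits = search(docs, "budget", static_filter(frozenset({"staff"})))
live_hits = search(docs, "budget",
                   dynamic_filter("alice", lambda user, doc_id: doc_id == "1"))
```

Here staff_hits contains only document 2, while live_hits depends entirely on what the external lookup says at the moment of the search.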

However, properly doing spell correction in the presence of entitlements is dangerous. If you build a global lexicon based on your index, that lexicon can easily "bleed" entitlements when there are terms that only occur in documents from one entitlement class. This might be acceptable for context-free spell correction, but if your spell correction has context (can suggest whole phrases at a time) you could easily bleed a very dangerous phrase (e.g., "Bob was fired") by accident.

So, you might choose to splinter your spell correction dictionary by user class, but that could result in far too little data per user class. I'm not sure how to solve it well; it's a challenging problem.
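As a rough Python sketch of the splintering idea (not a solution, and not anything Lucene provides): keep one lexicon per entitlement class and only suggest from the classes a user may see. The build_lexicons and suggest names, the class labels and the 0.8 cutoff are all assumptions for illustration; the standard library's difflib.get_close_matches stands in for a real distance measure.

```python
from collections import defaultdict
from difflib import get_close_matches
from typing import Dict, Iterable, List, Set, Tuple


def build_lexicons(docs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each entitlement class to the set of terms in its documents."""
    lexicons: Dict[str, Set[str]] = defaultdict(set)
    for entitlement_class, text in docs:
        lexicons[entitlement_class].update(text.lower().split())
    return lexicons


def suggest(word: str, lexicons: Dict[str, Set[str]],
            user_classes: Set[str], cutoff: float = 0.8) -> List[str]:
    """Suggest only from terms in classes the user is entitled to see."""
    visible: Set[str] = set()
    for cls in user_classes:
        visible |= lexicons.get(cls, set())
    return get_close_matches(word.lower(), sorted(visible), n=3, cutoff=cutoff)


docs = [
    ("hr", "severance package for bob"),
    ("eng", "deploy the search server"),
    ("eng", "rebuild the index"),
]
lexicons = build_lexicons(docs)

# A user entitled to everything behaves like a single global lexicon.
everyone = suggest("severence", lexicons, {"hr", "eng"})
# An engineering-only user never sees the HR-only term.
eng_only = suggest("severence", lexicons, {"eng"})
```

The global view suggests "severance", which would bleed an HR-only term to anyone; the engineering-only view suggests nothing. It also shows the cost: each class has only a handful of terms to draw on, which is exactly the too-little-data problem.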

I hope I haven't mis-spelled any words here!