Changing Bits: October 2010

Monday, October 25, 2010

Our medical system is a house of cards

I just came across this great article about meta-researcher Dr. John Ioannidis. Here's the summary:

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

The gist is that modern medical research is deeply flawed and biased such that the "conclusions" that you and I eventually read in the news headlines are often false. I especially love his advice for us all:

Ioannidis suggests a simple approach: ignore them all

This is in fact my approach! I have a simple rule: if it tastes good, it's good for you. So I eat plenty of fat, salt, sugar, cholesterol, carbs, etc. I love eggs and cheese and I always avoid low-fat or low-cholesterol foods. I get lots of sun and never use sunscreen. I drink coffee and beer, daily. I drink lots of water. I get daily exercise, running and walking. And I avoid hand sanitizers like Purell (I believe commonplace dirt/germs are in fact natural and good for you). I strongly believe humans do not need pills to stay healthy. I don't take a daily vitamin. And I'm very healthy!

This short interview between Discover Magazine and Harvard clinician John Abramson echoes the same core problem. Here's a choice quote:

When you look at the highest quality medical studies, the odds that a study will favor the use of a new drug are 5.3 times higher for commercially funded studies than for noncommercially funded studies.

Unfortunately, the medical world has a deep, deep conflict of interest: healthy people do not generate profits. Capitalism is a horrible match to health care.

So, next time your doctor prescribes a fancy new cool-sounding powerful drug like Alevia or Omosia or Nanotomopia or whatever, try to remember that our medical system is really built on a house of cards. Your doctor, let alone you, cannot possibly differentiate what's true from what's false. Don't trust that large triple-blind randomized controlled trial that supposedly validated this cool new drug. You are the guinea pig! And it's only when these drugs cause all sorts of problems, once they are really tested on the population at large, that their true colors are revealed.

Sunday, October 17, 2010

Pics from BBQ after Lucene Revolution

I finally pulled the pics off my camera from last week's BBQ after Lucene Revolution in Boston, where much fun was had! See them here. It was awesome to finally meet everyone!

Saturday, October 9, 2010

Fun with flexible indexing

The Lucene Revolution conference wrapped up yesterday. It was well attended (300 or so people). It was great fun to hear about all the diverse ways that Lucene and Solr are being used in the real world.

I gave a talk about flexible indexing, coming in the next major release of Lucene (4.0). Slides are here.

Tuesday, October 5, 2010

Lucene's SimpleText codec

Inspired by this question on the Lucene users' list, I created a new codec in Lucene called the SimpleText codec. The best ideas come from the users' lists!

This is of course only available in Lucene's current trunk, which will eventually become the next major release (4.0). Flexible indexing makes it easy to swap in different codecs to do the actual writing and reading of postings data to/from the index, and we have several fun codecs already available and more on the way...

Unlike all other codecs, which save the postings data in compact binary files, this codec writes all postings to a single human-readable text file, like this:


field contents
  term file
    doc 0
      pos 5
  term is
    doc 0
      pos 1
  term second
    doc 0
      pos 3
  term test
    doc 0
      pos 4
  term the
    doc 0
      pos 2
  term this
    doc 0
      pos 0
END
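
To make the layout above concrete, here is a minimal Python sketch that builds a tiny inverted index and prints it in the same shape. It is not Lucene's code and does not use Lucene's API; the function name `write_simple_text`, the whitespace tokenizer, and the sample sentence are all assumptions chosen so the result lines up with the listing above.

```python
from collections import defaultdict


def write_simple_text(docs, field="contents"):
    """Render a tiny inverted index in a SimpleText-like layout.

    Hypothetical helper for illustration only: tokens are split on
    whitespace and lowercased, which is much simpler than a real analyzer.
    """
    # term -> doc id -> list of positions where the term occurs
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            postings[token][doc_id].append(pos)

    lines = [f"field {field}"]
    for term in sorted(postings):  # terms appear in sorted order
        lines.append(f"  term {term}")
        for doc_id in sorted(postings[term]):
            lines.append(f"    doc {doc_id}")
            for pos in postings[term][doc_id]:
                lines.append(f"      pos {pos}")
    lines.append("END")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed sample document; its token positions match the listing above.
    print(write_simple_text(["this is the second test file"]))
```

Each term lists the documents containing it, and each document lists the positions where the term occurs, which is all a postings list really is.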

The codec is read/write, and fully functional. All of Lucene's unit tests pass (slowly) with this codec (which, by the way, is an awesome way to test your own codecs).

Note that the performance of SimpleText is quite poor, as expected! For example, there is no terms index for fast seeking to a specific term, no skipping data for fast seeking within a posting list, some operations require linear scanning, etc. So don't use this one in production!
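
The cost of having no terms index can be sketched in a few lines of Python. This is an illustration under assumptions, not Lucene's actual reader: `find_postings` is a hypothetical helper, and it assumes the file looks exactly like the listing shown earlier (`field`, `term`, `doc`, `pos` lines, then `END`). To find one term it has to walk every line that comes before that term.

```python
def find_postings(lines, field, term):
    """Linearly scan SimpleText-like lines for one term's postings.

    Returns {doc_id: [positions]}, or an empty dict if the term is absent.
    Hypothetical helper: there is no index, so lookup cost grows with
    the number of lines that precede the term.
    """
    result = {}
    in_field = False
    in_term = False
    doc_id = None
    for raw in lines:
        line = raw.strip()
        if line == "END":
            break
        kind, _, value = line.partition(" ")
        if kind in ("field", "term") and in_term:
            break  # we already read the wanted term's postings
        if kind == "field":
            in_field = value == field
        elif kind == "term":
            in_term = in_field and value == term
        elif in_term and kind == "doc":
            doc_id = int(value)
            result[doc_id] = []
        elif in_term and kind == "pos":
            result[doc_id].append(int(value))
    return result


if __name__ == "__main__":
    sample = """field contents
  term file
    doc 0
      pos 5
  term is
    doc 0
      pos 1
END""".splitlines()
    print(find_postings(sample, "contents", "is"))  # prints {0: [1]}
```

A binary codec avoids this by keeping a terms index and skip data, so it can jump close to the term (or document) it wants instead of reading everything in front of it.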

But it should be useful for transparency, debugging, learning and teaching, or for anyone who is simply curious about what exactly Lucene stores in its inverted index.