Wednesday, July 14, 2010

Moving readVInt to C

By far the hottest spot in Lucene during searching is the method (DataInput.readVInt) that decodes Lucene's variable-length integer representation (vInt). This method is called an insane number of times while iterating the postings lists (docs, freqs, positions) during query execution.

This representation is not CPU friendly: for every single byte read, the CPU hits a hard-to-predict if statement (testing the high bit). Alternative block-based formats, such as PFOR-DELTA (currently an experimental codec attached as a patch on LUCENE-1410) or Google's Group Varint, do a good job of reducing branches during decoding.
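To make the branchiness concrete, here is a minimal sketch of vInt decoding against a plain byte array (illustrative only, not Lucene's actual DataInput source): every byte read re-tests the high bit to decide whether another byte follows.

// VIntSketch.java -- illustrative sketch of vInt decoding, not Lucene's code.
public class VIntSketch {

  // Decodes one vInt starting at offset 0 of buf.
  static int readVInt(byte[] buf) {
    int pos = 0;
    byte b = buf[pos++];
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {  // per-byte branch on the high bit
      b = buf[pos++];
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) {
    // 300 encoded as a vInt is 0xAC 0x02 (low 7 bits first; high bit set means "more bytes")
    System.out.println(readVInt(new byte[] {(byte) 0xAC, 0x02}));  // prints 300
  }
}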

Anyway, I decided to test whether porting readVInt to C would give us a performance boost. I knew the overhead of JNI had been highish in the past, but I was hoping modern JREs had improved this.

Unfortunately, they haven't: I see a ~10% to ~40% slowdown, depending on the query. Hotspot's inability to inline native methods is likely also a factor here. Perhaps, if/when we switch to a block-based codec, we'll see a gain from a native implementation, since the JNI overhead would be amortized across a whole block of decoded values.

Tuesday, July 13, 2010

Apple censorship, again

It's sickening that Apple thinks it's OK to censor (remove) posts from their forums. It's not the first time they've done this, either.

I consider it entrapment -- Apple hosts these forums as the obvious place where we all can go to discuss any issues we have. As with any forum, there's a natural implicit assumption of non-interference, much like I expect my ISP not to block my visits to random web sites, my email provider not to block certain emails, and the local coffee shop not to forbid discussions of certain topics.

But then, suddenly, Apple acts like China: they censor that which they disagree with.

C'mon Apple!

You should be above such draconian behavior. If you disagree with what's being said, play fair! Come out and discuss the topic, explain your viewpoint, convince us.

Saturday, July 10, 2010

What motivates us

Great video from Daniel Pink, animated by RSA, summarizing his book digging into what motivates people to do things. I find it very accurate about myself, and I expect people working in open-source projects, especially, will resonate strongly with its surprising findings.

Encoding videos for Apple iPad/iPod, using mencoder

After much searching, poking and iterating, I found a magical mencoder command-line that encodes videos in the right format for the iPad. Debugging this is hard because, if you get something wrong, iTunes simply silently fails to open the video file, giving no specific feedback as to what's wrong. I felt like a monkey on a typewriter...

So here are the options, broken out by group. Set the output format to MP4:

-of lavf -lavfopts format=mp4

Scale the video to width 1280, with height set to match the aspect ratio (if your source video is low resolution, e.g. DVD, leave this off; you can also crop (-vf crop=W:H:X:Y; run mplayer -vf cropdetect,scale to have it guess the crop for you), de-interlace (I like -vf pp=lb), or apply any other video filter):

-vf scale=1280:-3

Use the x264 codec for video, with a zillion options (tweak the crf=XX value to trade quality for size; lower values of XX give higher quality, but take more space):

-ovc x264 -x264encopts crf=28:vbv_maxrate=1500:nocabac:global_header:frameref=3:threads=auto:bframes=0:subq=6:mixed-refs=0:weightb=0:8x8dct=1:me=umh:partitions=all:qp_step=4:qcomp=0.7:trellis=1:direct_pred=auto

Use the faac codec for audio:

-oac faac -faacopts br=160:mpeg=4:object=2:raw -channels 2 -srate 48000
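Putting the groups together, the full command line looks something like this (input.avi and output.mp4 are placeholder file names; everything else is exactly the options above):

mencoder input.avi -o output.mp4 -of lavf -lavfopts format=mp4 -vf scale=1280:-3 -ovc x264 -x264encopts crf=28:vbv_maxrate=1500:nocabac:global_header:frameref=3:threads=auto:bframes=0:subq=6:mixed-refs=0:weightb=0:8x8dct=1:me=umh:partitions=all:qp_step=4:qcomp=0.7:trellis=1:direct_pred=auto -oac faac -faacopts br=160:mpeg=4:object=2:raw -channels 2 -srate 48000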

I'm sure many other command-line combinations would work; this was just the first combo I found that worked.

If you need to encode for the iPod Touch, just change the -vf scale=1280:-3 to -vf scale=640:-3 (at least this works for recent iPod Touch models).

You can also use the nifty mp4art tool to insert cover art after the encode finishes; this lets you set the image for the movie when browsing on the iPad/iPod or in iTunes, which is vital if young kids will be launching the movies! I use PNG images, which work fine.

Friday, July 9, 2010

Old Orchard Beach

We spent the past half week in Old Orchard Beach, Maine (US), at the Crest Motel. It's a great place! The rooms are clean, the beach is right outside, and it's a short walk to Palace Playland. They have an indoor pool & hot tub, and I'm happy to report that they do not discriminate against kids. The whole area is very family friendly, and the kids had a blast!

But what I loved most was... while we were staying there, they happened to be installing solar hot water panels! At first we saw random holes drilled in the floor and ceiling, which made all of us very curious. Then the next day these were filled in with insulated pipes. Finally, this morning, we saw the solar panels themselves being carried up and installed on the roof. The location is excellent -- they have a large roof, and no tall trees anywhere nearby. It looked like a high-capacity installation, maybe ~250 square feet. They expect it to generate all of their hot water in the summer, and also a sizable portion of the (air) heat in the winter.

I have a small (600W nominal, off-grid) solar electric installation in my house, self-installed. In the summer, it easily offsets the electricity used by our home theater. I involved the kids all throughout the installation, to teach them how important it is to find renewable sources for everything. And whenever they or their friends watch movies, I remind them that it's all powered by sunshine electricity, which of course leads to a great curious discussion and more teachable moments.

So I took this chance to (again) hammer home this vital lesson. And, I'm very happy to see that others are adopting solar energy!

Communism vs capitalism/democracy

While we in the "free" world like to believe our capitalistic, democratic way is better than the tight-fisted communist approach in, say, China, the situation is not really so clear cut.

The Chinese government has full control to make changes to nearly everything in the country. Yes, sometimes this power is used to awful ends, such as human rights violations and the Great Firewall of China.

But then, this same unilateral power can lead to great progress, overnight. For example, China plans to add a 5% tax on oil and gas consumption (plus other "raw materials"). The US, in contrast, has for a very long time offered massive tax breaks to the oil companies (this is why the price of our gasoline is horribly low compared to nearly every other country, encouraging us to buy massive, awfully fuel-inefficient cars and trucks).

Another example: a few years back, China suddenly required that all cell phone chargers adopt the same standard, so that when people buy new phones they do not need a new charger, thus eliminating a big source of electronics waste. In contrast, in the US with our "new every 2", we have to throw away our chargers every time we upgrade, since they rarely interoperate.

A third example is fuel economy standards: China has for a long time had far more stringent requirements than the US.

In contrast, accomplishing these excellent improvements in the US is nearly impossible: each small change requires a tremendous battle through our congress, many members of which are rather blatantly corrupt, accepting all sorts of creative bribes (campaign contributions) from corporations to buy their votes. And corporate influence got even stronger thanks to the recent landmark Supreme Court decision giving corporations much more freedom to influence elections. Finally, congress members, especially these days, seem to vote almost always along party lines rather than for what's actually best for our country's future.

Don't get me wrong: net/net I'm quite happy I live in the US, given all the tradeoffs. But, still, we can and should do better, and copying some of China's recent changes would be a great start.

Monday, July 5, 2010

A better grass

This Pearl's Premium grass looks awesome: mow only once per month; extremely drought tolerant (seldom or never water); thrives without chemicals. It's the polar opposite of the life-support grass that's the gold standard in today's yards.

Maybe, once our yard is in really bad shape, I'll try this seed, mixed with clover. I want our yard to have zero environmental cost, zero carbon footprint. We currently have just that: we haven't watered our lawn in 2 years; no fertilizer or pesticides; no dethatching or mechanical aeration; very rarely raked or mowed; etc. Except it's slowly dying, and weeds are moving in, because we inherited the prior life-support grass.

I'd also like near-zero effort on our part (we don't want to hire a lawn care service because we feel that sets a bad example for our kids), and of course to have decent curb appeal. Surely this is not too much to expect?

I wish there were more companies/people developing environmentally friendly alternatives to the standard life-support grass. I'd do a bake-off with these alternatives on different spots in our challenging yard!

Ethernet lightning protection

I'm going to try using this APC ethernet surge suppressor to protect the important devices (desktop, laptop, filer, backups, FIOS) attached to our ethernet network.

Lucene's RAM usage for searching

For fast searching, Lucene loads certain data structures entirely into RAM:

  • The terms dict index requires substantial RAM per indexed term (by default, every 128th unique term), and is loaded when IndexReader is created. This can be a very large amount of RAM for indexes that have an unusually high number of unique terms; to reduce it, you can pass a terms index divisor when opening the reader. For example, passing 2, which loads only every other indexed term, halves the RAM required. But, as a tradeoff, seeking to a given term, which is required once for every TermQuery, will become slower, since Lucene must do twice as much scanning (on average) to find the term.

  • Field cache, which is used under-the-hood when you sort by a field, takes some amount of per-document RAM depending on the field type (String is by far the worst). This is loaded the first time you sort on that field.

  • Norms, which encode the a-priori document boost computed at indexing time, including length normalization and any boosting the app does, consume 1 byte per field per document used for searching. For example, if your app searches 3 different fields, such as body, title and abstract, then that requires 3 bytes of RAM per document. These are loaded on demand the first time each field is searched.

  • Deletions, if present, consume 1 bit per doc, created during IndexReader construction.

Warming a reader is necessary because some of these data structures (norms, FieldCache) are initialized lazily. It's also useful to pre-populate the OS's IO cache with the pages that cover the frequent terms you're searching on.
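For example, here's a hypothetical warming sketch against the Lucene 3.x APIs described above; the index path and the field names (body, title) are placeholders:

// WarmReader.java -- illustrative warming sketch, not production code.
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.FSDirectory;

public class WarmReader {
  public static void main(String[] args) throws Exception {
    // Terms index divisor = 2: load every other indexed term, halving the
    // RAM taken by the terms dict index at the cost of slower term seeks.
    IndexReader reader = IndexReader.open(
        FSDirectory.open(new File("/path/to/index")),
        null,   // no custom deletion policy
        true,   // read-only
        2);     // terms index divisor
    IndexSearcher searcher = new IndexSearcher(reader);

    // Load norms for the body field (1 byte per document).
    reader.norms("body");

    // Sorting by title populates the String FieldCache for that field.
    searcher.search(new MatchAllDocsQuery(), null, 10,
                    new Sort(new SortField("title", SortField.STRING)));
  }
}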

With flexible indexing, available in Lucene's trunk (4.0-dev), we've made great progress on reducing the RAM required for both the terms dict index and the String index field cache (some details here). We have substantially reduced the number of objects created for these RAM-resident data structures, and switched to representing all character data as UTF8 instead of Java's char, which halves the RAM required when the character data is simple ASCII (each char takes 2 bytes, while ASCII encodes to 1 byte per character in UTF8).

So, I ran a quick check against a real index, created from the first 5 million documents of the Wikipedia database export. The index has a single segment with no deletions. I initialized a searcher, loaded norms for the body field, and populated the FieldCache for sorting by the title field, using a 64-bit 1.6 JRE:

  • 3.1-dev requires 674 MB of RAM

  • 4.0-dev requires 179 MB of RAM

That's a 73% reduction in the RAM required!

However, there seems to be some performance loss when sorting by a String field, which we are still tracking down.

Note that modern OSs will happily swap out RAM from a process, in order to increase the IO cache. This is rather silly: Lucene loads these specific structures into RAM because we know we will need to randomly access them, a great many times. Other structures, like the postings data, we know we will sweep sequentially once per search, so it's less important that these structures be in RAM. When the OS swaps our RAM out in favor of IO cache, it's reversing this careful separation!

This will of course cause disastrous search latency for Lucene, since many page faults may be incurred while running a given search. On Linux, you can fix this by tuning swappiness down to 0, which I try to do on every Linux computer I touch (most Linux distros default it to a highish number). Windows also has a checkbox, under My Computer -> Properties -> Advanced -> Performance Settings -> Advanced -> Memory Usage, that lets you favor Programs or System Cache; that's likely doing something similar.
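On most Linux distros, something like the following adjusts swappiness (the exact mechanism can vary by distro; /etc/sysctl.conf is the usual place to make the change permanent):

cat /proc/sys/vm/swappiness      # show the current value
sudo sysctl -w vm.swappiness=0   # set it to 0 until the next reboot
echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf   # persist across reboots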