Tuesday, September 28, 2010

Track your home's live electricity usage with Python

Modern electric meters (the kind attached to the outside of your house) emit an IR flash through a port on top of the meter for each watt-hour consumed. It turns out it's easy to detect this flash, decode it into "live" power usage, and make cool charts like this:

That's live kW on the Y axis and time on the X axis.

The flash, at least for my meter, appears to have high temporal accuracy, meaning it flashes precisely as you cross 1.000 Wh. This is great, since it enables accurate, real-time power measurement. For example, when I turn on the lights in my home office, I quickly see the power jump by ~65 watts, and then drop again when I turn the lights off.
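
To make the arithmetic concrete (my numbers here are purely illustrative): at a steady 1 kW draw the meter emits 1,000 flashes per hour, or one every 3.6 seconds; adding a ~65 watt load shortens that interval to 3600/1065 ≈ 3.38 seconds -- an easily measurable change.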

This is a fun way to track down which appliances, crazy computers, or parasitic drains are using up so much electricity in your house!

I've gone through several designs for this over the years. My last attempt was destroyed when lightning hit my house (there's always a silver lining!), so I decided to try something new this time, and I'm very happy with the approach: it's much simpler, and it moves the pulse detection into Python.

I use a trivial analog circuit to detect the IR pulse, consisting of two 100K resistors in series and an IR photodiode in parallel with one of the resistors. Make sure you get the polarity right on the photodiode, otherwise it won't work! Mount the photodiode on your power meter, aligned so that it "sees" each IR pulse. I also added a small (0.01 µF) ceramic capacitor in parallel with the same resistor, to suppress transient EM fields otherwise picked up by the relatively long wires out to my meter.

Finally, I use a simple USB audio adapter to move the problem into the digital domain, connecting the ends of the two series resistors to the mic input, which handles the A/D conversion. This USB audio adapter drives the mic input with a ~4.0 V bias voltage, which is great since otherwise you'd need an external source.

When there's no pulse, the photodiode acts roughly like an open switch, meaning there is a fixed 200K resistive load on the mic input. When a pulse happens (mine seem to last for ~10 msec), the photodiode acts roughly like a closed switch, suddenly dropping the series resistance to 100K. This drop causes a very detectable pulse (strongly negative, then strongly positive) on the mic input's voltage, I suspect because there's a capacitor behind the bias voltage (but: I am no analog circuit engineer, so this is speculation!).
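
Some back-of-the-envelope numbers, in the same speculative spirit: treating the ~4.0 V bias as an ideal source, the divider draws 4.0 V / 200 kΩ = 20 µA with no pulse, and roughly double that, 40 µA, while the photodiode conducts. An AC-coupled input would see that step as exactly the kind of sharp swing described above.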

You can plug this USB audio adapter into any computer; I use a SheevaPlug computer (delightful device, very low power -- I have three!). I record the digital samples (arecord works well, at a 2 kHz rate) and decode the samples in Python, detecting a pulse whenever the value drops below -1000. You can easily compute the live kW from the time between two adjacent pulses, push this into a database, and build graphs on top of it using Google's visualization APIs (I use dygraphs).
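
Here's a minimal sketch of the decoding step (illustrative, not my exact script; the names and framing are my assumptions). It reads raw signed 16-bit little-endian mono samples at 2 kHz from stdin and prints the live kW after each pulse:

import struct
import sys

SAMPLE_RATE = 2000   # Hz; matches the arecord rate
THRESHOLD = -1000    # a pulse is any sample dropping below this value
WH_PER_PULSE = 1.0   # the meter flashes once per watt-hour

def pulse_times(stream):
    """Yield the time, in seconds, of each detected pulse."""
    sample_index = 0
    armed = True
    while True:
        data = stream.read(2)
        if len(data) < 2:
            return
        (value,) = struct.unpack('<h', data)  # one signed 16-bit sample
        if armed and value < THRESHOLD:
            armed = False
            yield sample_index / SAMPLE_RATE
        elif not armed and value >= 0:
            armed = True  # signal has recovered; re-arm for the next pulse
        sample_index += 1

last = None
for t in pulse_times(sys.stdin.buffer):
    if last is not None:
        # 1 Wh consumed over (t - last) seconds -> kW = 3.6 / seconds
        print('%.3f kW' % (WH_PER_PULSE * 3.6 / (t - last)))
    last = t

You'd run it with something like arecord -t raw -f S16_LE -r 2000 -c 1 | python watts.py (watts.py being a hypothetical name for the script).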

My last approach didn't have nearly this temporal accuracy (i.e., it smoothed heavily across time), which masked interesting things. For example, now I can tell the difference between a resistive load (the coffee maker, oven, crockpot) and an inductive load (refrigerator compressor, vacuum cleaner), because the inductive load has a huge spike at the beginning, as the motor consumes lots of power trying to spin up.

Friday, September 17, 2010

Lucene's indexing is fast!


Wikipedia periodically exports all of the content on their site, providing a nice corpus for performance testing. I downloaded their most recent English XML export: it uncompresses to a healthy 21 GB of plain text! Then I fully indexed this with Lucene's current trunk (to be 4.0): it took 13 minutes and 9 seconds (789 seconds), or 21 GB / 789 s ≈ 95.8 GB/hour -- not bad!

Here are the details: I first pre-process the XML file into a single-line file, whereby each doc's title, date, and body are written to a single line, and then index from this file, so that I measure "pure" indexing cost. Note that a real app would likely have a higher document creation cost here, perhaps having to pull documents from a remote database or from separate files, run filters to extract text from PDFs or MS Office docs, etc. I use Lucene's contrib/benchmark package to do the indexing; here's the alg I used (a rough sketch of the pre-processing step follows it):

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

content.source = org.apache.lucene.benchmark.byTask.feeds.LineDocSource
docs.file = /lucene/enwiki-20100904-pages-articles.txt

doc.stored = true
doc.term.vector = false
doc.tokenized = false
doc.body.stored = false
doc.body.tokenized = true

log.step.AddDoc=10000

directory=FSDirectory
compound=false
ram.flush.mb = 256

work.dir=/lucene/indices/enwiki

content.source.forever = false

CreateIndex

{ "BuildIndex"
[ { "AddDocs" AddDoc > : * ] : 6
- CloseIndex
}

RepSumByPrefRound BuildIndex
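
For reference, here's a rough Python sketch of that pre-processing step (illustrative only: contrib/benchmark ships its own tasks for building line files, and the exact tab-separated title/date/body layout below is my assumption about what LineDocSource expects):

import xml.etree.ElementTree as ET

def local(tag):
    # Strip the MediaWiki export namespace: '{http://...}title' -> 'title'
    return tag.rsplit('}', 1)[-1]

def pages(xml_path):
    """Stream (title, date, body) tuples without loading the whole dump."""
    title = date = body = None
    for _, elem in ET.iterparse(xml_path):
        tag = local(elem.tag)
        if tag == 'title':
            title = elem.text or ''
        elif tag == 'timestamp':
            date = elem.text or ''
        elif tag == 'text':
            body = elem.text or ''
        elif tag == 'page':
            yield title, date, body
            elem.clear()  # release the finished page's memory

with open('/lucene/enwiki-20100904-pages-articles.txt', 'w') as out:
    for title, date, body in pages('enwiki-20100904-pages-articles.xml'):
        # One doc per line; collapse embedded tabs/newlines so each doc
        # really does stay on a single line
        fields = [' '.join(f.split()) for f in (title, date, body)]
        out.write('\t'.join(fields) + '\n')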

There is no field truncation taking place, since this is now disabled by default -- every token in every Wikipedia article is being indexed. The body field is tokenized but not stored, while the title and date fields are stored but not tokenized. I use StandardAnalyzer, and I include the time to close the index, which means IndexWriter waits for any running background merges to complete. The index has only 4 fields -- title, date, body, and docid.

I've done a few things to speed up the indexing:
  • Increase IndexWriter's RAM buffer from the default 16 MB to 256 MB
  • Run with 6 threads
  • Disable compound file format
  • Reuse document/field instances (contrib/benchmark does this by default)
Lucene's wiki describes additional steps you can take to speed up indexing.

Both the source lines file and the index are on an Intel X25-M SSD, and I'm running on a modern machine with dual Xeon X5680s, overclocked to 4.0 GHz, with 12 GB RAM, running Fedora Linux. Java is 64-bit 1.6.0_21-b06, and I run with java -server -Xmx2g -Xms2g. I could certainly give it more RAM, but it's not really needed. The resulting index is 6.9 GB.

Out of curiosity, I made a small change to contrib/benchmark, to print the ingest rate over time. It looks like this (over a 100-second window):



Note that a large part (slightly over half!) of the time, the ingest rate is 0; this is not good! This happens because the flushing process, which writes a new segment when the RAM buffer is full, is single-threaded and blocks all indexing while it runs. This is a known issue, and is actively being addressed under LUCENE-2324.

Flushing is CPU intensive -- the decoding and re-encoding of the great many vInts is costly. Computers usually have big write caches these days, so the IO shouldn't be a bottleneck. With LUCENE-2324, each indexing thread state will flush its own segment, privately, which will let us make full use of concurrency across both the CPUs and the IO system. Once this is fixed, Lucene should be able to fully saturate the hardware, such that whichever of CPU or IO is the bottleneck in your context gates your ingest rate. Then maybe we can double this already fast ingest rate!

Monday, September 13, 2010

Fast search filters using flex

A filter in Lucene is a bit set that restricts the search space for any query; you pass it into IndexSearcher's search method. It's effective for a number of use cases, such as document security, index partitions, facet drill-down, etc.

To apply a filter, Lucene must compute the intersection of the documents matching the query against the documents allowed by the filter. Today, we do that in IndexSearcher like this:

while (true) {
  if (scorerDoc == filterDoc) {
    // Check if scorer has exhausted, only before collecting.
    if (scorerDoc == DocIdSetIterator.NO_MORE_DOCS) {
      break;
    }
    // Both landed on the same doc: collect it, then advance both.
    collector.collect(scorerDoc);
    filterDoc = filterIter.nextDoc();
    scorerDoc = scorer.advance(filterDoc);
  } else if (scorerDoc > filterDoc) {
    // Filter is behind: leap it forward to the scorer's doc.
    filterDoc = filterIter.advance(scorerDoc);
  } else {
    // Scorer is behind: leap it forward to the filter's doc.
    scorerDoc = scorer.advance(filterDoc);
  }
}

We call this the "leapfrog approach": the query and the filter take turns trying to advance to each other's next matching document, often jumping past the target document. When both land on the same document, it's collected.

Unfortunately, for various reasons this implementation is inefficient (these are spelled out more in LUCENE-1536):
  1. The advance method for most queries is costly.
  2. The advance method for most filters is usually cheap.
  3. If the number of documents matching the query is far higher than
    the number matching the filter, or vice versa, it's better to drive
    the matching by whichever is more restrictive.
  4. If the filter supports fast random access, and is not super
    sparse, it's better to apply it during postings enumeration, like
    deleted docs.
  5. Query scorers don't have a random access API, only .advance(),
    which does unnecessary extra work .next()'ing to the next matching
    document.
To fix this correctly, Lucene really needs an optimization stage, much like a modern database, which looks at the type and structure of the query and filter, as well as estimates of how many documents will match, and then picks an appropriate execution path. Likely this will be coupled with source code specialization/generation (LUCENE-1594) to write dedicated Java code to execute the chosen path. Some day we'll get to that point!

Until then, there is a simple way to get a large speedup in many cases, addressing the 4th issue above. Prior to flexible indexing, when you obtained the postings enumeration for documents matching a given term, Lucene would silently filter out deleted documents. With flexible indexing, the API now allows you to pass in a bit set marking the documents to skip. Normally you'd pass in the IndexReader's deleted docs. But, with a simple subclass of FilterIndexReader, it's possible to use any filter as the documents to skip.

To test this, I created a simple class, CachedFilterIndexReader (I'll attach it to LUCENE-1536). You pass it an existing IndexReader, plus a Filter, and it creates an IndexReader that filters out both deleted documents and documents that don't match the provided filter. Basically, it compiles the IndexReader's deleted docs (if any), and the negation of the incoming filter, into a single cached bit set, and then passes that bit set as the skipDocs whenever postings are requested. You can then create an IndexSearcher from this reader, and all searches against it will be filtered according to the filter you provided.
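
In pseudocode terms, that cached bit set is just the union of the deletions and the filter's complement. A tiny Python sketch of the idea (names hypothetical; the real class is of course Java):

def build_skip_docs(max_doc, is_deleted, filter_accepts):
    """is_deleted and filter_accepts map a docid to a bool; the result
    marks every document the postings enumeration should skip."""
    return [is_deleted(doc) or not filter_accepts(doc)
            for doc in range(max_doc)]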

This is just a prototype, and it has certain limitations, e.g. it doesn't implement reopen, and it's slow to build up its cached filter.

Still, it works very well! I tested it on a 10M Wikipedia index, with a random filter accepting 50% of the documents:

Query               QPS Default   QPS Flex   % change
united~0.7                19.95      19.25      -3.5%
un*d                      43.19      52.21      20.9%
unit*                     21.53      30.52      41.8%
"united states"            6.12       8.74      42.9%
+united +states            9.68      14.23      47.0%
united states              7.71      14.56      88.9%
states                    15.73      36.05     129.2%

I'm not sure why the fuzzy query got a bit slower, but the speedups on the other queries are awesome. However, this approach is actually slower if the filter is very sparse. To test this, I ran just the TermQuery ("states") against different filter densities:

The cutover, for TermQuery at least, is somewhere around 1.1%, meaning if the filter accepts more than 1.1% of the index, it's best to use the CachedFilterIndexReader class; otherwise it's best to use Lucene's current implementation.

Thanks to this new flex API, until we can fix Lucene to properly optimize the filter and query intersection, this class gives you a viable, fully external means of getting massive speedups for non-sparse filters!