Tuesday, July 14, 2009

Sun's ZFS filesystem

I've been test-driving Sun's (now Oracle's!) OpenSolaris (2009.06) and its ZFS filesystem as my home filer and general development machine.

I'm impressed!

ZFS provides some incredible features. For example, taking a snapshot of your entire filesystem is wicked fast. This gives you a "point in time" copy of all files that you can keep around for as long as you want. It's very space efficient because only when a file is changed does the snapshot actually consume disk space (preserving the old copy).
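To see why a snapshot costs almost nothing until files change, here is a toy copy-on-write model in Python. It is only a sketch of the idea, not how ZFS is implemented: the ToyFilesystem class and its methods are invented for illustration, and each file is treated as a single block.

```python
# Toy model of copy-on-write snapshots. Hypothetical illustration only;
# real ZFS works on a tree of blocks, not whole files.
class ToyFilesystem:
    def __init__(self):
        self.blocks = {}      # block id -> bytes actually stored on "disk"
        self.live = {}        # file name -> block id for the live filesystem
        self.snapshots = {}   # snapshot name -> frozen {file name: block id}
        self._next_id = 0

    def write(self, name, data):
        # Never overwrite in place: always allocate a fresh block.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id
        self._free_unreferenced()

    def snapshot(self, snap_name):
        # Copies only the small name -> block map, never the file data.
        self.snapshots[snap_name] = dict(self.live)

    def _free_unreferenced(self):
        # A block is freed only when neither the live filesystem
        # nor any snapshot still points at it.
        referenced = set(self.live.values())
        for snap in self.snapshots.values():
            referenced.update(snap.values())
        for block_id in list(self.blocks):
            if block_id not in referenced:
                del self.blocks[block_id]

    def used_bytes(self):
        return sum(len(data) for data in self.blocks.values())


fs = ToyFilesystem()
fs.write("a.txt", b"x" * 100)
fs.write("b.txt", b"y" * 100)
before = fs.used_bytes()

fs.snapshot("tuesday")
after_snapshot = fs.used_bytes()   # unchanged: the snapshot shares every block

fs.write("a.txt", b"z" * 100)
after_change = fs.used_bytes()     # grows: the old a.txt block is kept for the snapshot

print(before, after_snapshot, after_change)
```

Taking the snapshot adds no data; only the later change to a.txt makes the snapshot hold on to an extra block.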

From the snapshot, which is read-only, you can then make a clone that's read-write. This effectively lets you fork your filesystem, which is amazing. Sun builds on this by providing "boot environments", which let you clone your world, boot to it, do all kinds of reckless things, and if you don't like the results, switch back to your current safe world again, no harm done. I used to leave my home filers pretty much untouched once I started using them for fear of screwing something up. Now with boot environments I can freely experiment away.

I have a great many Lucene source code checkouts, to try out ideas, apply patches, etc., and by using ZFS's cloning I can now create a new checkout and apply a patch in only a few seconds. And it's very space efficient because only the changed files in the new checkout consume disk space. Since I'm using an Intel X25 SSD as my primary storage, space efficiency is important. The machine uses Intel's Core i7 920 CPU, which has fabulous concurrency and can run the Lucene unit tests 3X faster than my old machine. This all nets out to wonderful productivity gains.
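A workflow like this might be scripted roughly as follows. This is a sketch under assumptions: the dataset names (tank/src/lucene-trunk and so on) are made up, and the helper functions are hypothetical. Only the `zfs snapshot` and `zfs clone` subcommands themselves are real. By default it just prints the commands rather than running them.

```python
import subprocess


def clone_checkout_commands(source, snapshot, target):
    """Build the zfs commands that fork `source` into a writable `target`.

    A clone is always made from a snapshot, and it must live in the
    same pool as that snapshot.
    """
    snap = f"{source}@{snapshot}"
    return [
        ["zfs", "snapshot", snap],       # read-only point-in-time copy
        ["zfs", "clone", snap, target],  # writable filesystem backed by the snapshot
    ]


def run(commands, dry_run=True):
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Hypothetical dataset names, for illustration only.
cmds = clone_checkout_commands(
    "tank/src/lucene-trunk", "clean", "tank/src/lucene-patch"
)
run(cmds)
```

After the clone exists, applying a patch inside it touches only the patched files, so only those consume new space.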

ZFS also nicely decouples the raw storage (the "pool") from the filesystems that draw on it. For secondary storage I set up a RAID-Z pool (like RAID 5, but without the "write hole" problem) using six Western Digital Caviar Green 2TB drives. Be sure to use the WDTLER utility if you use these drives in a RAID array. This gives me about 9TB of usable space to play with; from there I've created many filesystems that all share the pool.
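The capacity figure can be sanity-checked with a little arithmetic. The sketch below assumes single-parity RAID-Z (one drive's worth of space goes to parity), assumes drive sizes are quoted in decimal terabytes, and ignores metadata and padding overhead; the function name is invented for illustration.

```python
def raidz_usable_bytes(drives, drive_bytes, parity=1):
    # Rough estimate only: total capacity minus the parity drives' worth.
    return (drives - parity) * drive_bytes


TB = 10**12    # decimal terabyte, as drive vendors count
TiB = 2**40    # binary tebibyte, as many tools report

usable = raidz_usable_bytes(6, 2 * TB)
usable_tb = usable / TB
usable_tib = round(usable / TiB, 2)

print(usable_tb)    # 10.0
print(usable_tib)   # 9.09
```

Under those assumptions, six 2TB drives yield about 10TB in decimal units, or roughly 9.09 in binary units, which is in line with the figure above.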

Performance is excellent: copying a 1TB directory on the RAID-Z pool to another directory on the same pool averages 100 MB/sec.

I also just read this morning that ZFS will add de-duping at the block level, thus making it even more space efficient.

ZFS can provide these features because it has a write-once (copy-on-write) core: no block is ever overwritten (unless it was already freed). Lucene takes the same approach: no file in the index is ever overwritten. Lucene's transactional semantics derive directly from this as well (though Lucene can't "fork" an index... maybe someday!).

Bye bye Linux, hello Solaris! I only hope this innovation continues now that Oracle has acquired Sun.
