bzip2 is hideously slow; while looking around for ways to possibly speed it up I stumbled on pbzip2, which exploits the block-based nature of bzip2 compression to parallelize it for potentially linear speedups with an increased number of CPUs.
There are two downsides: first, there are some compatibility issues with third-party decompressor software (probably resolvable), and second it only scales until you run out of local CPUs.
For kicks I'm prototyping a distributed version, dbzip2. In theory, a lot of fast machines on a LAN should be able to speed things up even further.
I've been testing with a Wikipedia dump file that's about a gigabyte, compressing down to about a quarter that.
On my dual-core G5 Power Mac, regular single-threaded bzip2 compresses the file in about 356 seconds (3.122 MB/s input processing).
pbzip2 set for two threads runs about 50% faster; I clocked it at 234 seconds (4.750 MB/s).
My dbzip2 prototype does similar with two local threads, though a touch less efficiently:
- Wrote 1295 blocks in 250.9 seconds (4.430 MB/s in, 0.988 MB/s out)
And I can squeeze a few more percentage points out by using remote compressors on the other machines I had lying around:
- Wrote 1295 blocks in 188.8 seconds (5.887 MB/s in, 1.313 MB/s out)
Most of the data went to the two local threads on my Power Mac:
- local thread: processed 447 blocks in 188.7 seconds (2.033 MB/s in, 0.458 MB/s out)
- local thread: processed 444 blocks in 188.7 seconds (2.019 MB/s in, 0.451 MB/s out)
My old Linux box, an Athlon XP 2400+ on 100 Mbit ethernet, took a respectable share:
- rdaneel:12345: processed 237 blocks in 188.7 seconds (1.078 MB/s in, 0.238 MB/s out)
CPU usage was pretty high though not maxed out (80-90%), likely due to the tenth of a second spent transferring each 900 KB input block and its compressed output over a 100 Mbit network.
Running the whole file locally it can process 1.344 MB/s.
My old Powerbook (1 GHz G4) and newer Mini (1.5 GHz Core Solo) sit on wireless, which is a lot slower:
- verda-majo.local:12345: processed 61 blocks in 188.7 seconds (0.277 MB/s in, 0.060 MB/s out)
- philo.local:12345: processed 106 blocks in 188.7 seconds (0.482 MB/s in, 0.106 MB/s out)
These two were clearly limited by the slower network, with CPU usage around just 20-30%. Still, every bit helps!
This shows promise for quickly compressing large files when fast machines are available on a fast network. Slow networks strongly reduce the benefits, but on switched gigabit ethernet things should be nicely CPU-limited even with several fast machines.
The main remaining issues: