Compression & Decompression Of A Stream
Until now I had not found a good Python module for compressing and decompressing data as streams. Most tools I tried required the data to be in a file, which has some obvious limitations. Then I saw a mention of pyLZMA roll by. It supports stream compression and decompression using the Lempel–Ziv–Markov chain algorithm (LZMA). The module is licensed under LGPL-2.1: not MIT, but at least it is the "Lesser" GPL. I've taken it for a spin, and it has successfully compressed and decompressed all the data I've thrown at it (remember to always checksum your data).
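On Python 3.3 or later, the standard library's lzma module can also work on streams, through its incremental LZMACompressor and LZMADecompressor objects. The sketch below is a swapped-in alternative, not pyLZMA's API: it pushes data between file-like objects (io.BytesIO here) in chunks and verifies the round trip with SHA-1. The helper names and the 64 KiB chunk size are hypothetical choices for illustration. The output uses the .xz container, and I'm not assuming it is interchangeable with pyLZMA's output.

```python
import hashlib
import io
import lzma

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen only for this sketch


def compress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Read src in chunks and write .xz-compressed data to dst (stdlib lzma, not pyLZMA)."""
    comp = lzma.LZMACompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())  # emit buffered data plus the end-of-stream markers


def decompress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Feed compressed chunks from src to a decompressor and write the output to dst."""
    decomp = lzma.LZMADecompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decomp.decompress(chunk))
    if not decomp.eof:
        # Input ran out before the end-of-stream marker: the data is truncated
        raise ValueError("compressed stream ended early")


def sha1_of(stream, chunk_size=CHUNK_SIZE):
    """Checksum a stream chunk by chunk, so it never has to fit in memory."""
    h = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# Usage: round-trip an in-memory stream and compare checksums
original = io.BytesIO(b"stream me " * 10000)
packed = io.BytesIO()
compress_stream(original, packed)
packed.seek(0)
restored = io.BytesIO()
decompress_stream(packed, restored)
original.seek(0)
restored.seek(0)
print(sha1_of(original) == sha1_of(restored))  # True
```

Because both helpers only call read() and write(), the same functions should work unchanged on real files opened in binary mode.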
import hashlib
import pylzma


def sha1_of_file(path):
    # Hash the file in 1 KiB chunks so it never has to fit in memory
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            tmp = f.read(1024)
            if not tmp:
                break
            h.update(tmp)
    return h.hexdigest()


# Calculate the SHA-1 checksum for our input file
h1 = sha1_of_file('Brighton.jpg')
print('Input SHA-1 checksum: {0}'.format(h1))

# Compress the input file (as a stream) to a file (as a stream)
with open('Brighton.jpg', 'rb') as i, open('compressed.lzma', 'wb') as o:
    s = pylzma.compressfile(i)
    while True:
        tmp = s.read(1)
        if not tmp:
            break
        o.write(tmp)

# Decompress the file (as a stream) to a file (as a stream)
with open('compressed.lzma', 'rb') as i, open('decompressed.raw', 'wb') as o:
    s = pylzma.decompressobj()
    while True:
        tmp = i.read(1)
        if not tmp:
            break
        o.write(s.decompress(tmp))

# Check the decompressed file against the original checksum
h2 = sha1_of_file('decompressed.raw')
print('Result SHA-1 checksum: {0}'.format(h2))
if h1 == h2:
    print('OK!')
else:
    print('Checksum mismatch!')
Of course, a JPEG file doesn't compress much, because its image data is already compressed, but that makes it an even better test case.
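The effect is easy to reproduce without a JPEG at hand. This minimal sketch uses the standard library's lzma module (Python 3.3+, a stand-in for pyLZMA) and compares highly redundant input with random bytes from os.urandom. The random bytes are an assumption: they serve as a rough stand-in for the already entropy-coded data inside a JPEG.

```python
import lzma
import os

size = 256 * 1024
repetitive = b"Brighton " * (size // 9)  # highly redundant input
random_like = os.urandom(size)           # rough stand-in for already-compressed data

for label, data in (("repetitive", repetitive), ("random", random_like)):
    packed = lzma.compress(data)
    ratio = len(packed) / len(data)
    print("{0}: {1} -> {2} bytes (ratio {3:.3f})".format(label, len(data), len(packed), ratio))
```

The random input should come out slightly larger than it went in: LZMA finds no redundancy to remove but still adds its own container overhead.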