LZP = String match compression using some predictive context to reduce the set of strings to match
LZP1 = variant of LZP without any entropy coding
I've just done a bunch of LZP1 variants and I want to quickly describe them for my reference.
In general LZP works thusly :
Make some context from previous bytes
Use context to look in a table to see a set of previously seen pointers in that context
(often only one, but maybe more)
Encode a flag for whether any match, which one, and the length
If no match, send a literal
Typically the context is made by hashing some previous bytes, usually with some kind of shift-xor hash.
As always, larger hashes generally mean more compression at the cost of more memory. I usually use a 15 bit
hash, which means 64k memory use if the table stores 16 bit offsets rather than pointers.
Because there's no entropy coding in LZP1, literals are always sent in 8 bits.
Generally in LZP the hash table of strings is only updated at literal/match decision points - not for all bytes
inside the match. This helps speed and doesn't hurt compression much at all.
Most LZP variants benefit slightly from "lazy parsing" (that is, when you find a match in the encoder, see if it's better
to instead send a literal and take the match at the next byte), but this hurts encoder speed.
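To make the general scheme concrete, here's a minimal sketch of one encoder-side decision point. The context order, the multiplicative fold, and the table layout are my own illustrative choices, not any particular LZP1 variant's:

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_BITS 15
#define HASH_SIZE (1 << HASH_BITS)

/* shift-xor style hash of the previous 3 bytes, folded to HASH_BITS */
static uint32_t lzp_hash(const uint8_t *p)
{
    uint32_t h = (uint32_t)p[-1] ^ ((uint32_t)p[-2] << 4) ^ ((uint32_t)p[-3] << 8);
    return (h * 2654435761u) >> (32 - HASH_BITS);
}

/* One decision point : look up the context, remember the current position,
   and return the predicted match length (0 = send a literal).
   "table" stores 16-bit positions and must be zero-initialized;
   pos must be >= 3 so the context bytes exist. */
static size_t lzp_try_match(const uint8_t *src, size_t pos, size_t len,
                            uint16_t *table)
{
    uint32_t h = lzp_hash(src + pos);
    size_t ref = table[h];
    table[h] = (uint16_t)pos;   /* update only at decision points */
    size_t n = 0;
    while (pos + n < len && src[ref + n] == src[pos + n])
        ++n;
    return n;
}
```

The decoder maintains the identical table, which is why no offset ever has to be transmitted -- only the flag and the length.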
LZP1a : Match/Literal flag is 1 bit (eight of them are sent in a byte). Single match option only. 4 bit match length, if match length is >= 16
then send full bytes for additional match length. This is the variant of LZP1 that I did for Clariion/Data General
for the Pentium Pro.
LZP1b : Match/Literal is encoded as 0 = LL, 10 = LM, 11 = M (this is the ideal encoding if literals are twice
as likely as matches) ; match length is encoded as 2 bits, then if it's >= 4 , 3 more bits, then 5 more bits,
then 8 bits (and after that 8 more bits as needed). This variant of LZP1 was the one published back in 1995.
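That staged length code can be sketched as a bit-cost function. The exact escape rule here (a field holding its all-ones value means "more follows") is my reading of the description, not necessarily what the 1995 code did:

```c
/* Bits used to send a match length under an LZP1b-style staged code :
   a 2-bit field, escaping to 3 more bits, then 5, then 8, then 8 at a
   time.  A field holding its maximum value means "more follows". */
static unsigned lzp1b_len_bits(unsigned len)
{
    static const unsigned stage[4] = { 2, 3, 5, 8 };
    unsigned bits = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned cap = (1u << stage[i]) - 1;  /* escape value */
        bits += stage[i];
        if (len < cap) return bits;
        len -= cap;
    }
    for (;;) {                                /* 8 more bits as needed */
        bits += 8;
        if (len < 255) return bits;
        len -= 255;
    }
}
```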
LZP1c : Hash table index is made from 10 bits of backwards hash and 5 bits of forward hash (on the byte to be
compressed). Match/Literal is a single bit. If a match is made, a full byte is sent, containing the 5 bits
of forward hash and 3 bits of length (4 bits of forward hash and 4 bits of length is another option, but is
generally slightly worse). As usual if match length exceeds 3 bits, another 8 bits is sent. (this is a bit like LZRW3,
except that we use some backward context to reduce the size of the forward hash that needs to be sent).
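The index construction might look like this; both hash functions are stand-ins I made up, and only the 10+5 bit split and the match byte layout follow the text:

```c
#include <stdint.h>

/* 10 bits hashed from the previous 3 bytes (illustrative hash) */
static uint32_t back_hash10(const uint8_t *p)
{
    uint32_t h = (uint32_t)p[-1] ^ ((uint32_t)p[-2] << 3) ^ ((uint32_t)p[-3] << 6);
    return (h * 2654435761u) >> 22;
}

/* 5 bits hashed from the byte about to be coded (illustrative hash) */
static uint32_t fwd_hash5(uint8_t next)
{
    return ((uint32_t)(next * 167u) >> 3) & 31u;
}

/* LZP1c-style 15-bit table index : backward context plus forward hash */
static uint32_t lzp1c_index(const uint8_t *p)
{
    return (back_hash10(p) << 5) | fwd_hash5(p[0]);
}

/* On a match the decoder needs the 5 forward-hash bits to reconstruct
   the index, so they ride in the match byte with 3 bits of length. */
static uint8_t lzp1c_match_byte(uint32_t fwd5, uint32_t len3)
{
    return (uint8_t)((fwd5 << 3) | (len3 & 7u));
}
```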
LZP1d : string table contains 2 pointers per hash (basically a hash with two "ways").
Encoder selects the best match of the two and sends a 4 bit match nibble consisting of 1 selection bit and
3 bits of length. Match flag is one bit. Hash way is the bottom bit of the position, except that when a
match is made the matched-from pointer is not replaced. More hash "ways" provide more compression at the
cost of more memory use and more encoder time (most LZP's are symmetric, encoder and decoder time is the same,
but this one has a slower encoder) (nowadays this is called ROLZ).
LZP1e : literal/match is sent as run len; 4 bit nibble is divided as 0-4 = literal run length, 5-15 = match length.
(literal run length can be zero, but match length is always >= 1, so if match length >= 11 additional bytes are
sent). This variant benefits a lot from "Literal after match" - after a match a literal is always written without
flagging it.
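The nibble split might look like this in code; mapping nibbles 5-15 onto match lengths 1-11 is my assumption:

```c
/* LZP1e-style control nibble : 0..4 = literal run length,
   5..15 = match length 1..11 (15 escapes to extra length bytes). */
static int      lzp1e_is_match(unsigned nib)     { return nib >= 5; }
static unsigned lzp1e_match_len(unsigned nib)    { return nib - 4; }
static unsigned lzp1e_match_nibble(unsigned len) { return len + 4; } /* len 1..11 */
```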
LZP1f is the same as LZP1c.
LZP1g : like LZP1a except maximum match length is 1, so you only flag literal/match, you don't send a length. This is
"Predictor" or "Finnish" from the ancient days. Hash table stores chars instead of pointers or offsets.
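A sketch of that Predictor-style table-of-chars step; the order-2 context hash is an arbitrary choice of mine:

```c
#include <stdint.h>

#define PRED_BITS 12

static uint8_t pred_table[1 << PRED_BITS];  /* one predicted char per context */

static unsigned pred_ctx(uint8_t a, uint8_t b)
{
    return (((unsigned)a << 4) ^ b) & ((1u << PRED_BITS) - 1);
}

/* One step at p (needs 2 preceding bytes) : returns 1 if the stored char
   predicted p[0] (only a flag is coded), else 0 (flag + literal is coded)
   and the prediction is updated. */
static int pred_step(const uint8_t *p)
{
    unsigned c = pred_ctx(p[-2], p[-1]);
    if (pred_table[c] == p[0]) return 1;
    pred_table[c] = p[0];
    return 0;
}
```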
Obviously there are a lot of ways that these could all be modified to get more compression (*), but it's rather pointless to go
down that path because then you should just use entropy coding.
(* a few ways : combine the forward hash of lzp1c with the "ways" of lzp1d ; if the first
hash fails to match escape down to a lower order hash (such as maybe just order-1 plus 2 bits of position) before
outputting a literal ; output literals in 7 bits instead of 8 by using something like an MTF code ; write match lengths
and flags with a tuned variable-bit code like lzp1b's ; etc. )
Side note : while writing this I stumbled on LZ4. LZ4 is almost exactly "LZRW1". It uses a hash table (hashing the
bytes to match, not the previous bytes like LZP does) to find matches, then sends the offset
(it's a normal LZ77, not an LZP). It encodes as 4 bit literal run lens and 4 bit match lengths.
There is some weird/complex stuff in the LZ4 literal run len code which is designed to prevent it from getting super slow on random
data - basically if it is sending tons of literals (more than 128) it starts stepping by multiple bytes in the encoder
rather than stepping one byte at a time. If you never/rarely compress random data then it's probably better to
remove all that because it does add a lot of complexity.
REVISED : Yann has clarified that LZ4 is BSD, so you can use it.
Also, the code is PC only because he makes heavy use of unaligned dword access. It's a nice little simple coder, and the speed/compression
tradeoff is good. It only works well on reasonably large data chunks though (at least 64k).
If you don't care so much about encode time then something that spends more time on finding good matches would be
a better choice. (like LZ4-HC, but it seems the LZ4-HC code is not in the free distribution).
He has a clever way of handling the decoder string copy issue where you can have overlap when the offset is less than
the length :
U32 dec[4]={0, 3, 2, 3}; // dec[off] steps ref back by a multiple of off so op-ref becomes >= 4
// copy repeated sequence
cpy = op + length;
if (op-ref < 4)
{
    // overlapping case : copy the first 4 bytes one at a time,
    // then rewind ref so the dword loop below reads only valid data
    *op++ = *ref++;
    *op++ = *ref++;
    *op++ = *ref++;
    *op++ = *ref++;
    ref -= dec[op-ref];
}
while(op < cpy) { *(U32*)op=*(U32*)ref; op+=4; ref+=4; }
op=cpy; // correction : the dword loop can overshoot cpy
This is something I realized as well when doing my LZH decoder optimization for SPU : basically a string copy
with length > offset is really a repeating pattern, repeating with period "offset". So offset=1 is AAAA,
offset=2 is ABAB, offset=3 is ABCABC. What that means is once you have copied the pattern a few times the
slow way (one byte at a time), then you can step back your source pointer by any multiple of the offset that
you want. Your goal is to step it back enough so that the separation between dest and source is bigger than
your copy quantum size. Though I should note that there are several faster ways to handle this issue (the key
points are these : 1. you're already eating a branch to identify the overlap case, you may as well have custom
code for it, and 2. the single repeating char situation (AAAA) is by far more likely than any other).
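The step-back trick can be sketched as a safe overlap copy; this is my illustration of the idea, not LZ4's or the SPU decoder's actual code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes from src to dst where the offset (dst - src) may be as
   small as 1, i.e. the match is a repeating pattern of period dst-src.
   Copy one byte at a time until 8 bytes of pattern exist, then rewind
   src by a multiple of the period so the gap is >= 8 and plain 8-byte
   block copies never read a byte they haven't written yet. */
static void overlap_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t off = (size_t)(dst - src);   /* must be >= 1 */
    if (off < 8) {
        size_t done = 0;
        while (done < 8 && done < len) { dst[done] = src[done]; ++done; }
        if (done == len) return;
        dst += done;
        len -= done;
        src += done - (8 / off) * off;  /* rewind by a multiple of off */
    }
    while (len >= 8) { memcpy(dst, src, 8); dst += 8; src += 8; len -= 8; }
    while (len) { *dst++ = *src++; --len; }
}
```

For example with offset 3 the rewind is 6, leaving a gap of 9 between source and dest, so the 8-byte copy quantum is safe.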
ADDENDUM : I just found the LZ4 guy's blog (Yann Collet, who also did the fast "LZP2"), there's some good stuff on there. One I like is his
compressor ranking.
He does the right thing (which I wrote about here) : measure the total time to encode, transmit, and decode over a
limited channel. Then you look at various channel speeds and you can see in what domain a compressor might be best.
But he does it with nice graphs, which is totally the win.
"05-20-11 - LZP1 Variants"
5 Comments
I'm glad you found LZ4 guy's blog -- the both of you have been an inspiration to me (I've started up my fastest-decompression-on-a-4MHz-8088 hobby again).
May 20, 2011 at 11:26 PM
Hi Charles
Just to clarify the license situation : LZ4 code is indeed BSD, and this should be clearly written at the top of each source file (LZ4.h and LZ4.c).
Alongside with LZ4 is provided an "example demo program" (main.c), which is not part of LZ4. It just sits there to provide a quick example to any programmer willing to experiment.
The demo program license is GPL.
May 21, 2011 at 12:34 AM
Cool, thanks for clarifying, BSD is a good license.
May 21, 2011 at 10:27 AM
Anonymous said...
"Anyway, I don't think you actually want to use it because it's hard to imagine a scenario where it's the right choice; if you want fast symmetric coding, LZP is probably better; if you don't care so much about encode time then something that spends more time on finding good matches would be a better choice."
Good comparison with other fast codecs :
LZ4 has wicked fast decompression and OK strength.
May 24, 2011 at 3:02 AM
Yeah, you're right, I'll amend that, it actually is on the "pareto frontier" for space/speed.
May 24, 2011 at 12:20 PM