Googles appar
Huvudmeny

Post a Comment On: cbloom rants

"08-11-10 - ROLZ and Links"

4 Comments -

1 – 4 of 4
Blogger ryg said...

The blended probability estimator is really neat and simple. Never thought about that either.

One note about this type of estimator in general: there's a somewhat slicker way to write the update rule that usually leads to better code.

The general form for the usual expression is:

if (bit)
p += (max - p) >> n;
else
p -= p >> n;

You can munge the second expression a bit to make it look more like the first:

p -= p >> n;
<=> p += -(p >> n);
<=> p += (bias - p) >> n; (note this is an arithmetic right shift now!)

where bias = (1 << n) - 1 is a constant. So you write the whole update rule as:

p += ((bit ? max : bias) - p) >> n;

and compilers will usually be able to transform that into branchless code (either with cmovs or bit-fu). Fairly easy, but I never saw this mentioned anywhere and most code you find has the other expression (which is admittedly way easier to understand). I used this when size-optimizing the depacker for my executable compressor "kkrunchy".

Some bit of fairly useless trivia: When you're writing depacker code in x86 ASM and end up reading individual bits, the awesome thing for your GetBit function/macro to do is to just return the bit you just read in the carry flag. It's usually easy enough to arrange the code so this happens more or less automatically, and it's very useful to have:
- Carry is untouched by most non-arithmetic ops and inc/dec, so it's a fairly decent flag to return results in (unlike e.g. the zero flag)
- You can jump directly based on the carry flag (jc / jnc)
- You can materialize the bit as integer in an arbitrary register if you want: setc (for byte regs), xor reg, reg / adc reg, reg otherwise.
- You can turn it into a mask with one instruction (sbb reg, reg)

Another staple of tiny LZ decoders is the reordered gamma / Exp-Golomb code: instead of first encoding the length then the "value" bits, you interleave the two. The decoding loop is just:

int val = 1;
while (getbit()) val = val * 2 + getbit();
val--;

or the variant

val = 1;
do
val = val * 2 + getbit();
while (getbit());
val -= 2; // or just skip it

Complete x86 ASM code for this: (with getbit-returns-in-carry)

xor ecx, ecx
inc ecx
nextbit:
call getbit
adc ecx, ecx
call getbit
jc nextbit

which is pretty much the perfect way to encode match lens (and offsets for that matter) for tiny simple LZs. Both APack (DOS EXE-packer in the mid-90s) and NRV/UCL (the UPX back end) are based on this.

August 16, 2010 at 12:34 AM

Blogger ryg said...

Holy shit I want a preformatted/code tag...

August 16, 2010 at 12:35 AM

Blogger cbloom said...

"One note about this type of estimator in general: there's a somewhat slicker way to write the update rule that usually leads to better code. ...
and compilers will usually be able to transform that into branchless code "

Yeah, but in real code you almost always put that probability update right inside your arithmetic decode.

I guess you could actually make the whole arithmetic decode + update branchless, but then you often branch on the result of the decode anyway.

Certainly you should only have one branch for decode + prob update + act on decode result. People often have two or three or four branches to do that.

August 16, 2010 at 9:59 AM

Blogger cbloom said...

"Some bit of fairly useless trivia: When you're writing depacker code in x86 ASM and end up reading individual bits, the awesome thing for your GetBit function/macro to do is to just return the bit you just read in the carry flag"

Oh yeah, I've always wanted that. Unfortunately it's pretty inexpressible in C :(

I'm finding your tidbits really interesting, you should write this stuff up.

August 16, 2010 at 10:00 AM

You can use some HTML tags, such as <b>, <i>, <a>

This blog does not allow anonymous comments.

Comment moderation has been enabled. All comments must be approved by the blog author.

You will be asked to sign in after submitting your comment.