Even most people who work in compression don't realize this, but in most cases Adaptive Models and Static Models can achieve
essentially the same amount of compression.
Let me now try to make that claim more precise :
With an adaptive model, to really do things right you must :
1. Initialize to a smart/tuned initial condition (not just 50/50 probabilities or an empty model)
2. Train the model with carefully trained rates; perhaps faster learning at first then slowing down;
perhaps different rates in different parts of the model
3. Reset the model smartly at data-change boundaries, or perhaps have multiple learning scales
4. Be careful of making the adaptive model too big for your data; eg. don't use a huge model space
that will be overly sparse on small files, but also don't use a tiny model that can't learn about
big files
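The adaptive-model checklist above can be sketched as a single toy binary model. This is a hypothetical illustration (all names and constants are made up, not any particular coder), showing a tuned prior, a learning rate that starts fast and slows down, and a reset for data-change boundaries :

```python
class AdaptiveBitModel:
    """Toy adaptive model for a single bit -- a sketch, not a real coder."""

    def __init__(self, p_one=0.2, fast_rate=0.5, slow_rate=0.02, warmup=32):
        # 1. smart initial condition : a tuned prior, not just 50/50
        self.prior = p_one
        self.p_one = p_one
        self.fast_rate = fast_rate   # 2. learn fast at first...
        self.slow_rate = slow_rate   # ...then slow down
        self.warmup = warmup
        self.count = 0

    def rate(self):
        # faster learning while the model is young
        return self.fast_rate if self.count < self.warmup else self.slow_rate

    def update(self, bit):
        # exponential update of P(bit == 1) toward the observed bit
        target = 1.0 if bit else 0.0
        self.p_one += self.rate() * (target - self.p_one)
        self.count += 1

    def reset(self, p_one=None):
        # 3. reset at a data-change boundary, optionally to a new prior
        self.p_one = self.prior if p_one is None else p_one
        self.count = 0
```

Point 4 corresponds to how many of these models you instantiate (one per context), which isn't shown here.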
With a static model to do things right you must :
1. Transmit the model very compactly, using assumptions about what the model is like typically;
transmit model refreshes as deltas
2. Send model refreshes in the appropriate places; the encoder must optimally choose model refresh points
3. Be able to send variable amounts of model; eg. with order-1 huffman decide which contexts get their
own statistics and which go into a shared group
4. Be able to send the model with varying degrees of precision; eg. be able to approximate when that's better
for the overall size(model) + size(coded bits)
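Point 4 on the static side can be sketched like so : try several header precisions and keep the one minimizing size(model) + size(coded bits). The quantization scheme and precision set here are illustrative assumptions, and ideal (entropy) code lengths stand in for a real coder :

```python
import math

def coded_bits(counts, probs):
    # ideal code length of the data under the (possibly approximate) model
    return sum(c * -math.log2(p) for c, p in zip(counts, probs) if c)

def quantize(probs, bits):
    # crude quantization of each probability to 'bits' of precision
    scale = 1 << bits
    q = [max(1, round(p * scale)) for p in probs]
    total = sum(q)
    return [x / total for x in q]

def best_precision(counts, precisions=(2, 3, 4, 5, 6, 8, 10, 12)):
    # pick the precision minimizing size(model) + size(coded bits)
    n = sum(counts)
    exact = [c / n for c in counts]
    best = None
    for bits in precisions:
        model_size = bits * len(counts)  # made-up header cost
        total = model_size + coded_bits(counts, quantize(exact, bits))
        if best is None or total < best[1]:
            best = (bits, total)
    return best  # (precision, total bits)
```

On tiny buffers the cheap low-precision header tends to win even though the coded bits suffer; on big buffers the header cost is noise and high precision wins.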
We've seen over and over in compression that these can be the same. For example with linear-prediction lossless
image compression, assuming you are doing LSQR fits to make predictors, you can either use the local neighborhood
and generate an LSQR in the decoder each time, or you can transmit the LSQR fits at the start of the file.
It turns out that either way compression is about the same (!! BUT only if the encoder in the latter case is quite
smart about deciding how many fits to send, how precise they are, and which pixels they apply to).
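For concreteness, the least-squares fit itself is the same either way; only who computes it and when differs. A minimal numpy sketch (variable names are ours) :

```python
import numpy as np

def fit_predictor(neighbors, targets):
    # neighbors : (n_samples, k) matrix of causal pixel values (eg. W, N, NW)
    # targets   : (n_samples,) vector of the pixels being predicted
    # the decoder can run this on the local neighborhood each time,
    # or the encoder can run it once and transmit coef up front
    coef, *_ = np.linalg.lstsq(neighbors, targets, rcond=None)
    return coef

def predict(neighbors, coef):
    return neighbors @ coef
```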
Same thing with coding residuals of predictions in images. You can either do an adaptive coder (which needs to be
pretty sophisticated these days; it should have variable learning rates and tiers, a la the Fenwick symbol-rank work;
most people do this without realizing it just by having independent statistics for the low values and the high values)
or you can create static shaped laplacian models and select a model for each coefficient. It turns out they are about
the same.
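The static side of that residual-coding example can be sketched as a family of shaped discrete Laplacian (two-sided geometric) models plus a cheapest-model selector. The scale set and function names here are illustrative, not from any particular codec :

```python
import math

def laplacian_bits(residual, b):
    # ideal code length of one integer residual under a discrete
    # Laplacian P(k) proportional to exp(-|k|/b), normalized over all ints
    norm = (1 - math.exp(-1 / b)) / (1 + math.exp(-1 / b))
    p = norm * math.exp(-abs(residual) / b)
    return -math.log2(p)

def pick_model(residuals, scales=(0.5, 1, 2, 4, 8, 16)):
    # select the static model that codes these residuals cheapest;
    # the chosen scale index is what you'd signal per coefficient
    costs = [(sum(laplacian_bits(r, b) for r in residuals), b) for b in scales]
    return min(costs)[1]
```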
The trade-off is that the static model way needs a very sophisticated encoder which can optimize the total size
(sizeof(transmitted model) + sizeof(coded bits)) , but then it gets a simpler decoder.
(caveat : this is not applicable to compressors where the model is huge, like PPM/CM/etc.)
A lot of people incorrectly think that adaptive models offer better compression. That's not really true, but it
is *much* easier to write a compressor that achieves good compression with an adaptive model. With static models,
there is a huge over-complete set of ways to encode the data, and you need a very complex optimizing encoder to find
the smallest rep. (see, eg. video coders).
Even something as simple as doing order-0 Huffman and choosing the optimal points to retransmit the model is a
very hard unsolved problem. And that's just the very tip of the iceberg for static models; even just staying with
order-0 Huffman you could do much more; eg. instead of retransmitting a whole model, send a delta. Instead
of sending the delta to the ideal codelens, send a smaller delta to non-ideal codelens (that makes a smaller
total len); instead of sending new codelens, select from one of your previous huffmans. Perhaps have 16 known huffmans
that you can select from and not transmit anything (would help a lot for small buffers). etc. etc. It's very complex.
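Even the restricted version of the retransmit-point problem is instructive. Here is a hedged sketch under simplifying assumptions : fixed block granularity, ideal entropy code lengths standing in for real Huffman lengths, and a made-up flat header cost; real codelen headers and finer granularity make it much harder :

```python
import math
from collections import Counter

def entropy_bits(counts):
    # ideal order-0 code length for a Counter of symbol frequencies
    n = sum(counts.values())
    return sum(c * -math.log2(c / n) for c in counts.values())

def best_splits(blocks, header_bits=256):
    # dp[i] = cheapest way to code blocks[:i]; each segment pays one
    # model header plus the entropy of its own statistics
    dp = [(0.0, [])]  # (total bits, list of retransmit points)
    for i in range(1, len(blocks) + 1):
        best = None
        for j in range(i):
            seg = Counter("".join(blocks[j:i]))
            cost = dp[j][0] + header_bits + entropy_bits(seg)
            if best is None or cost < best[0]:
                best = (cost, dp[j][1] + [j])
        dp.append(best)
    return dp[-1]  # (total bits, retransmit points)
```

With cheap headers and a sharp statistics change mid-stream, the DP pays for a second model at the boundary; on uniform data it keeps a single model.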
Another issue with static models is that you really need to boil the data down to its simplest form for static models
to work well. For example with images you want to be in post-predictor space with bias adjusted and all that gubbins
before using a static model; on text you want to be in post-BWT space or something like that; eg. you want to get as close
to decorrelated as possible. With adaptive models it's much easier to just toss in some extra context bits and let the
model do the decorrelation for you. Put another way, static models need much more human guidance and careful study
in their creation to be minimal, whereas adaptive models work much better when you treat them badly.
"10-02-12 - Small note on Adaptive vs Static Modeling"