I wrote previously about Lagrange rate control for video:
01-12-10 - Lagrange Rate Control Part 1
01-12-10 - Lagrange Rate Control Part 2
01-13-10 - Lagrange Rate Control Part 3
I was thinking about it recently, and I wanted to do some vague rambling about the overall issue.
First of all, the lagrange method in general for image & video coding is just totally wrong.
The main problem is that it assumes every coding decision is independent, that distortions are isolated and additive, which they aren't.
The core of the lagrange method is that you set a "bit usefulness" value (lambda) and then you make each independent coding decision based
on whether more bits improve D by lambda or more.
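To make that concrete, here's a minimal sketch of such a per-block decision; the mode names and (bits, distortion) numbers are made up for illustration, and the point is just that each block picks whichever option minimizes J = D + lambda * R, in isolation:

```python
# Hypothetical per-block lagrangian mode decision (illustrative numbers).
# Each candidate mode has a rate R in bits and a local distortion D;
# we pick the mode minimizing the lagrangian cost J = D + lambda * R.

def choose_mode(candidates, lam):
    """candidates: list of (name, rate_bits, distortion). Returns best name."""
    return min(candidates, key=lambda c: c[2] + lam * c[1])[0]

# At low lambda bits are cheap, so we buy quality; at high lambda we save bits.
modes = [("skip", 1, 900.0), ("inter", 40, 120.0), ("intra", 200, 30.0)]
print(choose_mode(modes, lam=0.1))   # -> intra (lowest-D mode)
print(choose_mode(modes, lam=50.0))  # -> skip (cheapest mode)
```

Every decision sees only its own local D, which is exactly the problem.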
But that's just wrong, because distortions are *not* localized and independent. I've mentioned a few times recently the issue of quality
variation; if you make one block in an image blurry, and leave others at high detail, that looks far worse than the localized D value tells you,
because it's different and stands out. If you have a big patch of similar blocks, then making them different in any way is very noticeable
and ugly. There are simple non-local effects, like if the current block is part of a smooth/gradient area, then blending smoothly with neighbors
in the output is crucial, and the localized D won't tell you that. There are difficult non-local effects, like if the exact same kind of
texture occurs in multiple parts of the image, then coding them differently makes the viewer go "WTF", it's a quality penalty worse than the
local D would tell you.
In video, the non-local D effects are even more extreme due to temporal coherence. Any change of quality over time that's due to the coder
(and not due to motion) is very ugly (like I-frames coming in with too few bits and then being corrected over time, or even worse the horrible
MPEG pop if the I-frame doesn't match the cut). Flickering of blocks if they change coding quality over time is horrific. etc. etc. None of
this is measurable in a localized lagrangian decision.
(I'm even ignoring for the moment the fact that the encoding itself is non-local; eg. coding of the current block affects the coding of
future blocks, either due to context modeling or value prediction or whatever; I'm just talking about the fact that D is highly non-local).
The correct thing to do is to have a total-image (or total-video) perceptual quality metric, and make each coding decision based on how it
affects the total quality. But this is impossible.
Okay.
So the funny thing is that the lagrange method actually gets you some global perceptual quality by accident.
Assume we are using quite a simple local D metric like SSD or SAD possibly with SATD or something.
Just in images, perceptually what you want is for smooth blocks to be preserved quite well, and very noisy/random blocks to have more
error. Constant quantizer doesn't do that, but constant lambda does! Because the random-ish blocks are much harder to code (they cost
more bits per unit of quality), they will be coded at lower quality.
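Here's a toy model of my own (the post doesn't use one) showing that effect. Assume each block's R-D curve is D(R) = sigma2 / (1 + R)^2, where sigma2 is the block's variance; noisier blocks have bigger sigma2 and a flatter, worse curve. Each block then picks the rate minimizing J = D + lambda * R:

```python
# Toy R-D model: D(R) = sigma2 / (1 + R)**2 per block (assumed curve).
# Each block independently picks the rate minimizing J = D + lambda * R.

def best_rate(sigma2, lam, max_bits=400):
    return min(range(max_bits + 1),
               key=lambda r: sigma2 / (1 + r) ** 2 + lam * r)

def dist(sigma2, r):
    return sigma2 / (1 + r) ** 2

lam = 1.0
smooth = best_rate(100.0, lam)     # easy, low-variance block
noisy = best_rate(10000.0, lam)    # hard, high-variance block

# The noisy block gets more bits, yet is still left at higher
# distortion - the allocation you want perceptually.
print(smooth, round(dist(100.0, smooth), 1))    # 5 bits, D ~ 2.8
print(noisy, round(dist(10000.0, noisy), 1))    # 26 bits, D ~ 13.7
```

Constant quantizer would instead push both blocks to roughly the same D, wasting bits on noise nobody can see.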
In video, it's even more extreme and kind of magical. Blocks with a lot of temporal change are not as important visually - it's okay to
have high error where there's major motion, and they are harder to code so they get worse quality. Blocks that stay still are important to
have high quality, but they are also easier to code so that happens automatically.
That's just within a frame, but frame-to-frame, which is what I was talking about as "lagrange rate control", the same magic sort of comes out.
Frames with lots of detail and motion are harder to code, so get lower quality. Chunks of the video that are still are easier to code, so
get higher quality. The high-motion frames will still get more bits than the low-motion frames, just not as many more bits as they would at
constant-quality.
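The same toy model works at the frame level (again my own assumed curve, D(R) = c / (1 + R)^2, where c stands in for frame complexity). Comparing a shared-lambda allocation against a constant-quality allocation shows the "more bits, but not as many more" behavior:

```python
# Frame-level allocation under an assumed curve D(R) = c / (1 + R)**2,
# where c is frame complexity (detail + motion). Illustrative numbers.

def lagrange_bits(c, lam, max_bits=2000):
    # each frame minimizes its own J = D + lambda * R at a shared lambda
    return min(range(max_bits + 1),
               key=lambda r: c / (1 + r) ** 2 + lam * r)

def constant_quality_bits(c, target_d):
    # bits needed to hit a fixed target distortion (constant quality)
    r = 0
    while c / (1 + r) ** 2 > target_d:
        r += 1
    return r

still, busy = 500.0, 50000.0   # still frame vs detail+motion frame
lam_alloc = (lagrange_bits(still, 1.0), lagrange_bits(busy, 1.0))
cq_alloc = (constant_quality_bits(still, 5.0), constant_quality_bits(busy, 5.0))
print(lam_alloc)  # (9, 45): busy frame gets ~5x the bits
print(cq_alloc)   # (9, 99): constant quality would give it ~11x
```

So the busy frame still gets more bits than the still frame, but it also lands at lower quality than it would under constant quality.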
It can sort of all seem well justified.
But it's not. The funny thing is that we're optimizing a non-perceptual local D. This D is not taking into account things like the fact
that high motion block errors are less noticeable. It's just a hack that by optimizing for a non-perceptual D we wind up with a pretty good
perceptual optimization.
Lagrange rate control is sort of neat because it gets you started with pretty good bit allocation without any obvious heuristic tweakage.
But that goes away pretty fast. You find that using the L1 vs. L2 norm for D makes a big difference in perceptual quality (maybe L1 squared?);
other powers of D change bit allocation a lot. And then you want to do something like MB-tree to push bits backward; for example, the I frame
at a cut should get a bigger chunk of bits so that quality pops in rather than trickles in, etc.
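A quick sketch of why the norm power matters so much (illustrative numbers, not from the post): take two residuals with the same total absolute error, one spread thin across the block, one concentrated in a single pixel.

```python
# Distortion under different error norms (illustrative residuals).

def D(residual, power):
    return sum(abs(e) ** power for e in residual)

spread = [1] * 16                # 16 pixels each off by 1
concentrated = [16] + [0] * 15   # one pixel off by 16

print(D(spread, 1), D(concentrated, 1))  # 16 16  : L1 can't tell them apart
print(D(spread, 2), D(concentrated, 2))  # 16 256 : L2 hammers the outlier
```

The exponent directly changes which blocks look expensive to the lagrangian decision, so it shifts where the bits go across the whole image.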
I was thinking of this because I mentioned to ryg the other day that I never got B frames working well in my video coder.
They worked, and they helped in terms of naive distortion measures, but they created an ugly perceptual quality problem - they had a slightly
different look and quality than the P frames, so in a PBPBP sequence you would see a pulsing of quality that was really awful.
The problem is they didn't have uniform perceptual quality. There were a few nasty issues.
One is that at low bit rates, the "B skip" block becomes very desirable in B frames. (for me "B skip" = send no movec or residual; use
predicted movec to future and past frames to make an interpolated output block). The "B skip" is very cheap to send, and has pretty decent
quality. As you lower bit rate, suddenly the B frames start picking "B skip" all over, and they actually have lower quality than the P frames.
This is an example of a problem I mentioned in the PVQ posts - if you don't have a very smooth continuum of R/D choices, then an RD optimizing
coder will get stuck in some holes and there will be sudden pops of quality that are very ugly.
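A toy example of that hole (assumed numbers): a B block with only two real operating points, a very cheap "B skip" and a full coded block, and nothing in between on its R/D curve.

```python
# Two-point R/D choice set for a B block: (bits, distortion) per option.
# With nothing between them, the decision snaps at a crossover lambda.

def choose(lam):
    candidates = {"skip": (1, 200.0), "coded": (60, 20.0)}
    return min(candidates,
               key=lambda k: candidates[k][1] + lam * candidates[k][0])

for lam in (1.0, 2.0, 3.0, 3.1, 4.0):
    print(lam, choose(lam))
# Quality doesn't degrade smoothly as lambda rises: at the crossover
# (lambda ~ 3.05 here) every such block snaps from D=20 to D=200 at
# once, which shows up as a sudden visible pop.
```

With many B blocks sitting near the same crossover, a small rate change flips them all together, which is exactly the ugly pop.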
At higher bit rates, the B frames are easier to code to high quality (among other things, the P frame is using mocomp from further in the past),
so the pulsing of quality is high quality B's and lower quality P's.
It's just an issue that lagrange rate control can't handle. You either need a very good real perceptual quality metric to do B-P rate control,
or you just need well tweaked heuristics, which seems to be what most people do.
"12-27-14 - Lagrange Rate Control Part 4"