Googles appar
Huvudmeny

Post a Comment On: cbloom rants

"01-23-15 - LZA New Optimal Parse"

3 Comments -

1 – 3 of 3
Blogger Fabian Giesen said...

One-paragraph description for those who don't feel like reading through the encode.ru thread:

LZSS-style is backwards parse: for every position bigger than cur_pos, you know the optimal way to the end. To score a coding option, you take "cost(option) + cost_till_end(cur_pos + num_bytes_encoded_by_option)". That's a candidate for cost_till_end(cur_pos). For each position you look at your options and keep the best, going from back to front. At the end you have the cheapest way to code the entire data.

This new optimal parse is "transposed" (in a precise sense - we're looking at the same edges, just in the other direction). For every position, you keep track of the cheapest way to *arrive* there. For each coding option, you compute "cost_to_arrive_at(cur_pos) + cost(option)", and that's a candidate for "cost_to_arrive_at(cur_pos + num_bytes_encoded_by_option)". You go over the stream in increasing position order. By the time you're at say cur_pos=10, you must have considered all ways to arrive there.

In a LZSS-able coding scheme, the two ways to do the dynamic programming solve are equivalent. But for modern LZs with extra state such as repeated match offsets, the transposed way is *much* nicer: going front to back, you know the repeat match offsets (and you can store that kind of state in the "arrival" node if you found a good candidate). Going back to front, that bit of state is a giant headache.

January 23, 2015 at 4:25 PM

Blogger cbloom said...

Yes, just that. Well said.

January 23, 2015 at 10:07 PM

Blogger cbloom said...

The biggest advantage of the forward parse is that it's easy to carry forward adaptive state like the last-offset set or the LZMA-style "markov" history of coding events.

But there are other advantages.

One big one is that because you're scanning forward one position at a time, it's easy to run the match-finder in the same loop as the parser. (LZSS style requires you to find the matches for at least a chunk ahead and then reverse parse that chunk)

It's just all win.

January 25, 2015 at 8:33 AM

You can use some HTML tags, such as <b>, <i>, <a>

This blog does not allow anonymous comments.

Comment moderation has been enabled. All comments must be approved by the blog author.

You will be asked to sign in after submitting your comment.