
Post a Comment On: cbloom rants

"09-12-10 - Challenges in Data Compression 1 - Finite State Correlations"

4 Comments -

Blogger Shelwien said...

http://encode.ru/threads/1127-Structure-detection

September 13, 2010 at 12:07 AM

Blogger Matt Mahoney said...

Of course, the general problem is not computable. However, PAQ is able to find the record length in many files with fixed-size records. It looks for characters or pairs of characters that repeat at regular intervals. For example, if it finds "X" at offsets 0, 72, 144, and 216, it would guess that the record length is 72 and then model accordingly, using contexts like offset mod 72 and neighboring contexts in 2-D.

September 13, 2010 at 6:00 PM
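
The interval-based guess described in the comment above can be illustrated with a minimal sketch. This is not PAQ's code: the function name guess_record_length, the min_len and max_len bounds, and the rule "vote when a byte shows the same gap twice in a row" are all assumptions made for illustration, and real detectors use many more heuristics.

```python
from collections import Counter
from typing import Optional


def guess_record_length(data: bytes, min_len: int = 2, max_len: int = 1024) -> Optional[int]:
    """Guess a fixed record length from bytes that repeat at a regular interval.

    Hypothetical helper for illustration only; not taken from PAQ.
    """
    last_pos = {}      # byte value -> offset of its most recent occurrence
    last_gap = {}      # byte value -> gap between its two most recent occurrences
    votes = Counter()  # candidate record length -> number of confirmations

    for pos, b in enumerate(data):
        if b in last_pos:
            gap = pos - last_pos[b]
            # Count a vote only when this byte repeats with the same gap
            # twice in a row, e.g. "X" at offsets 0, 72 and 144.
            if min_len <= gap <= max_len and last_gap.get(b) == gap:
                votes[gap] += 1
            last_gap[b] = gap
        last_pos[b] = pos

    if not votes:
        return None  # no regular interval was observed
    return votes.most_common(1)[0][0]
```

With "X" at offsets 0, 72, 144, and 216 and no other byte repeating regularly, the only candidate that collects votes is 72. A result of None means the sketch saw no regular spacing at all, which is the expected outcome for data without fixed-size records.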

Blogger cbloom said...

This is "recordModel" in PAQ8, I assume?

It is a form of what I describe as hard-coding a certain type of structure and then seeing if your data matches that structure.

PAQ is pretty strong in that way, because you can hard-code a variety of structures and then the mixer will pick the ones that fit the data.

I haven't figured out everything that's in PAQ8 yet; there's a lot!

September 13, 2010 at 6:11 PM

Blogger Matt Mahoney said...

Yes. Once RecordModel figures out the cycle length, it uses combinations of the bytes to the left and above and the cycle position as context. Figuring out the cycle length relies on a lot of heuristics and doesn't work perfectly.

September 14, 2010 at 12:01 PM
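
To make the "left, above, and cycle position" contexts from the comment above concrete, here is a small sketch. The function name record_contexts and the particular four combinations are assumptions chosen for illustration; they are not the context set RecordModel actually uses.

```python
from typing import List, Tuple


def record_contexts(data: bytes, pos: int, record_len: int) -> List[Tuple[int, ...]]:
    """Build example contexts for predicting data[pos] in fixed-size records.

    Hypothetical helper for illustration only; not taken from PAQ.
    """
    col = pos % record_len  # cycle position, i.e. column within the record
    # Byte to the left (previous byte), or 0 at the start of the data.
    left = data[pos - 1] if pos >= 1 else 0
    # Byte above (same column in the previous record), or 0 in the first record.
    above = data[pos - record_len] if pos >= record_len else 0

    return [
        (col,),         # column alone
        (left, col),    # left neighbor within this column
        (above, col),   # value above within this column
        (left, above),  # 2-D neighborhood, ignoring the column
    ]
```

In a context-mixing compressor each such tuple would typically be hashed into a table index, with one predictor per context feeding the mixer, which then weights whichever contexts fit the data.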
