
Post a Comment On: cbloom rants

"08-19-10 - Fuzz Testing"

11 Comments

Blogger ryg said...

"To prevent actual trashing then, I just make the encoder never emit a match within 8 bytes of the end of a chunk."
Erm, that's no good. What if random corruption causes something within the last 8 bytes of a chunk to be decoded as a match?

August 19, 2010 at 11:25 PM

Anonymous Anonymous said...

I think what he means is that a valid file will never write past boundary (into the last 8 bytes), so he doesn't worry about having to fixup legitimate writes into the last 8 bytes when he does wraparound processing, because there are no such writes to fixup.

August 20, 2010 at 3:07 AM

Blogger cbloom said...

"Erm, that's no good. What if random corruption causes something within the last 8 bytes of a chunk to be decoded as a match? "

Err, yeah, this is a good point, which I should clarify:

I do not actually necessarily *detect* corruption - I just don't crash.

So, I always have 8 bytes of pad allocation at the end of a buffer, so that if I do get corruption that causes a match in there, I don't go stomp into other memory.

However, if my buffer is at the end of a sliding window or next to another portion of LZ buffer that's being decoded on another thread, then that match at the end might step into the next guy's buffer. That will make the results invalid, but won't cause a crash.
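
The pad trick described above can be sketched roughly like this (a minimal illustration, not the actual code from the post; `LZ_PAD`, `lz_alloc_decode_buffer`, and `lz_copy_match` are hypothetical names invented for the sketch):

```c
#include <stdlib.h>

#define LZ_PAD 8  /* hypothetical name for the 8 spare bytes discussed above */

/* Allocate a decode buffer with LZ_PAD extra bytes at the end, so a
   corrupt match that lands within the last 8 bytes of a chunk can
   write past the logical end without stomping unrelated memory. */
static unsigned char *lz_alloc_decode_buffer(size_t decoded_size)
{
    return (unsigned char *)malloc(decoded_size + LZ_PAD);
}

/* Byte-by-byte match copy (overlap-safe, as LZ77 needs). Even if a
   corrupted length runs a few bytes past the logical end of the
   chunk, the writes land in the pad, inside the same allocation. */
static void lz_copy_match(unsigned char *dst, const unsigned char *src, size_t len)
{
    while (len--)
        *dst++ = *src++;
}
```

As the comment says, the bytes written past the logical end are garbage when corruption occurs, but the decoder neither crashes nor writes outside its own allocation.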

August 20, 2010 at 8:50 AM

Blogger won3d said...

I wonder how well this works:

http://klee.llvm.org/

From the abstract:

"We present a new symbolic execution tool, KLEE, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs. We used KLEE to thoroughly check all 89 stand-alone programs in the GNU COREUTILS utility suite, which form the core user-level environment installed on millions of Unix systems, and arguably are the single most heavily tested set of open-source programs in existence. KLEE-generated tests achieve high line coverage — on average over 90% per tool (median: over 94%) — and significantly beat the coverage of the developers' own hand-written test suites. When we did the same for 75 equivalent tools in the BUSYBOX embedded system suite, results were even better, including 100% coverage on 31 of them. We also used KLEE as a bug finding tool, applying it to 452 applications (over 430K total lines of code), where it found 56 serious bugs, including three in COREUTILS that had been missed for over 15 years. Finally, we used KLEE to cross-check purportedly identical BUSYBOX and COREUTILS utilities, finding functional correctness errors and a myriad of inconsistencies."

August 20, 2010 at 9:51 AM

Blogger won3d said...

Oh, and one kooky way to have crash-hardened decompression would be to do the wacky bijective/one-to-one compression by D. A. Scott. He seems to think that it is better for crypto for intuitively appealing but actually crackpotty reasons. I mean, if this kind of compression increases security, it would be due to the hardness (no buffer overruns from manipulated headers/payloads) rather than the minuscule extra randomness you might get.

August 20, 2010 at 10:02 AM

Blogger cbloom said...

"Oh, and one kooky way to have crash-hardened decompression would be to do the wacky bijective/one-to-one compression by D. A. Scott."

Yeah, to some extent that's what I mean by "setting up the huffman decoder so that every bit stream makes valid output" - a bit-prefix code like Huffman *is* bijective (ignoring the issues of headers, missing symbols, and termination, which DAS deals with but which are not actually important).
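
A toy illustration of that property (a hypothetical three-symbol complete prefix code, invented for this sketch): because every bit pattern is the prefix of some codeword, any input bit stream decodes to *some* symbol sequence, so there is no "invalid" input for the decoder to choke on.

```c
#include <stddef.h>

/* Complete prefix code: 0 -> 'a', 10 -> 'b', 11 -> 'c'.
   Every leaf of the binary code tree carries a symbol, so decoding
   can never hit an undefined codeword (a truncated final codeword
   aside -- the termination issue mentioned above). */
static char decode_one(const unsigned char *bits, size_t *pos)
{
    if (bits[(*pos)++] == 0)
        return 'a';
    return (bits[(*pos)++] == 0) ? 'b' : 'c';
}
```

Feed it any bit stream at all and it emits a well-defined symbol sequence, which is the "don't crash on corruption" property in miniature.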

" He seems to think that it is better for crypto for intuitively appealing but actually crackpotty reasons. "

Indeed. He also makes claims that algorithms need to be bijective to maximize compression; while in theory that is true (non-bijective means wasted code space), in practice we are nowhere near that razor edge of closeness to optimal.

August 20, 2010 at 10:34 AM

Blogger cbloom said...

Klee is interesting. LLVM is very cool.

But actually generating data for all possible branches would get out of hand pretty quick. In fact all you need is one loop :

int i = input();
while ( i-- )
{
    // do stuff
}

you suddenly have 2^32 test cases. Put a loop inside that loop and you have 2^64.

You have to fall back to sampling rather than exhaustive enumeration pretty fast.

August 20, 2010 at 2:23 PM

Anonymous Anonymous said...

Of course if you're going to mention CRC-ish codes without tables, you should weigh in on Adler32, which is used by zlib for compression verification, and is also used by ZFS which needs a fast checksum for its always-on validation.

August 22, 2010 at 1:35 AM

Blogger cbloom said...

Okie doke. Coming up...

August 22, 2010 at 12:24 PM

Blogger won3d said...

...or Fletcher32, which is probably better than Adler32.

August 22, 2010 at 3:42 PM

Blogger cbloom said...

Oh yeah, Fletcher certainly should be faster, and supposedly quality is about the same. However, this is seriously fucked up:

"The Fletcher checksum cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. For example, if a 16-bit block in the data word changes from 0x0000 to 0xFFFF, the Fletcher-32 checksum remains the same. This also means a sequence of all 00 bytes has the same checksum as a sequence (of the same size) of all FF bytes."
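
That collision is easy to see in code. Here is a sketch of the common public-domain Fletcher-32 formulation over 16-bit words, with both sums effectively reduced mod 65535 (this is an illustration, not code from zlib or ZFS). Since 0x0000 and 0xFFFF are both congruent to 0 mod 65535, they contribute identically to both running sums:

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-32 over 16-bit words; both sums reduced mod 65535 via
   end-around folding. The 359-word inner block is the usual limit
   that keeps the 32-bit accumulators from overflowing between
   reductions. */
static uint32_t fletcher32(const uint16_t *data, size_t words)
{
    uint32_t sum1 = 0xFFFF, sum2 = 0xFFFF;
    while (words) {
        size_t block = words > 359 ? 359 : words;
        words -= block;
        do {
            sum1 += *data++;
            sum2 += sum1;
        } while (--block);
        sum1 = (sum1 & 0xFFFF) + (sum1 >> 16);
        sum2 = (sum2 & 0xFFFF) + (sum2 >> 16);
    }
    /* second fold guarantees both sums fit in 16 bits */
    sum1 = (sum1 & 0xFFFF) + (sum1 >> 16);
    sum2 = (sum2 & 0xFFFF) + (sum2 >> 16);
    return (sum2 << 16) | sum1;
}
```

A buffer of N words of 0x0000 and one of N words of 0xFFFF checksum identically, which is exactly the weakness quoted above.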

August 22, 2010 at 5:07 PM
