Finally for completeness, some of the matchers from Tornado in FreeArc. These are basically all standard "cache table" style
matchers, originally due to LZRW, made popular by LZP and LZO. The various Tornado settings select different amounts of hash rows
and ways.
As they should, they have very constant time operation that goes up pretty steadily from Tornado -3 to -7, because there's a
constant number of hash probes per match attempt.
totals : match len : clocks
Test_MMC1 : 0.663028 , 231.254818
Test_Hash1 : 0.662718 , 64.888003
Test_Tornado_3 : 0.630377 , 19.658834
Test_Tornado_4 : 0.593174 , 28.456055
Test_Tornado_5 : 0.586540 , 40.546146
Test_Tornado_6 : 0.580042 , 56.841156
Test_Tornado_7 : 0.596584 , 141.432393
There may be something wrong with my Tornado wrapper as the -3 matcher actually finds the longest total length. I dunno.
The speeds look reasonable. I don't really care much about these approximate matchers because the loss is hard to
quantify, so there you go (normally when I see an anomaly like that I would investigate it to make sure I understand
why it's happening).
[Image]
[Image]
0 = ares_merged.gr2_sec_5.dat
1 = bigship1.x
2 = BOOK1
3 = combined.gr2_sec_3.dat
4 = GrannyRocks_wot.gr2
5 = Gryphon_stripped.gr2
6 = hetero50k
7 = lzt20
8 = lzt21
9 = lzt22
10 = lzt23
11 = lzt24
12 = lzt25
13 = lzt27
14 = lzt28
15 = lzt29
16 = movie_headers.bin
17 = orange_purple.BMP
18 = predsave.bin
Conclusion : I've got to get off string matching so this is probably the end of posts on this topic.
MMC looks promising but has some flaws. There are some cases that trigger a slowness spike in it. Also it
has some bad O(N^2) with unbounded match length ("MMC2") so I have to run it with a limit ("MMC1") which removes
some of its advantage over LzFind and Hash1 and other approximate matchers. (without the limit it has the
advantage of being exact). It's also a GPL at the moment which is a killer.
LzFind doesn't have anything going for it really.
For approximate/small-window matching I don't see any reason to not use the classic Zip hash chain method. I tried a few variants
of this, like doing a hash chain to match the first 4 bytes and then link listing off that, and all the variants were worse than
the classic way.
For large window / exact matching / optimal parsing, a correct O(N) matcher is the way to go. The suffix-array based matcher is by
far the easiest for your kids to implement at home.
"09-30-11 - String Match Results Part 5 + Conclusion"
No comments yet. -