
Post a Comment On: cbloom rants

"08-12-13 - Cuckoo Cache Tables - Failure Report"

1 Comment

Blogger Paul W. said...

A couple of hardware cache designs seem relevant: skewed associative caches and direct-mapped caches with small victim caches.

A skewed-associative cache is a multiway cache (maybe just 2-way) with a different hash function for each way, so that keys competing for the same cache row in one way generally don't collide in the other.

A victim cache is a small cache of stuff that's been recently evicted from the main cache. Even a very small one can usually hang onto entries that were evicted solely due to random hash collisions (unless you have a hash function that's pathological for your data). It can be tiny in comparison to the main cache and still get much of the benefit---if you only have a few random hot spots in the main table, you only need a few entries in your victim cache.

I'm wondering if a good cache table design would be to combine these ideas and use a large direct-mapped main table with a much smaller direct-mapped victim table. (E.g., 64K-entry main table with a 1K-entry victim table, which can be mostly hardware cache-resident but too small to matter much to hardware cache performance.)

For normal hardware caches, victim caches can be a nice win during phases (at the right timescale) where most of your working set fits in the cache and most of your misses in a direct-mapped cache are due to collisions. They don't help as much during transients where you're evicting lots of old data and bringing in fresh data, but they still help some---e.g., the top of the activation stack will typically stay cached because you keep hitting it while you're hitting other (new) stuff. It may get bounced to the victim cache when you hit some new data with the same hash, but it will likely get bounced back to the main cache before it gets evicted. (And in the almost-all-new-data case, most of what's evicted would be evicted anyhow, because you're just touching too much data, and keeping the few things that stay hot is a little better.)

For text compression, I'd think the same thing more or less applies. For example, if you're compressing English text, you get transients where the subject terms and mode-of-speech phrases change, but very common and dispersed English phrases (" if the", " and a", " to the", etc.) usually keep appearing. You don't want to simply evict those from the cache table, and giving them a brief second chance with a smallish victim table seems good.

The fact that small dictionaries of 1000 common words and phrases generally work pretty well for text compression suggests that a victim cache table with a comparable number of entries would help with hashing---it would be enough to keep most of the stably common things cached.

For binary data, the situation seems less clear. With binary data you often have many more unique hashes because, e.g., the low bits of numbers keep changing and generating distinct hashes. You're often seeing a flood of "new" data, which puts much more pressure on your victim cache table. (But then, that flood puts pressure on whatever mechanism you use to deal with collisions.)

May 4, 2015 at 9:23 AM
