Post a Comment On: cbloom rants

"09-30-11 - Don't use memset to zero"

7 Comments -

Blogger won3d said...

http://stackoverflow.com/questions/2688466/why-mallocmemset-slower-than-calloc

It looks like glibc likely uses virtual memory tricks for calloc. This isn't surprising since it will use mremap for large, aligned reallocs.

September 30, 2011 at 12:50 PM
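
For reference, the two allocation patterns compared in the linked question can be sketched as below. This is only an illustration, not code from the post or from glibc; the helper names (`alloc_zeroed_memset`, `alloc_zeroed_calloc`, `is_all_zero`) are made up for the example. Both return zero-filled memory. The difference is that the memset version writes every byte up front, while calloc may be able to skip that work when the allocator gets fresh, already-zeroed pages from the OS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Zeroed buffer via malloc + memset: every byte is written immediately,
   so every page of the buffer is touched before the caller uses it. */
uint8_t *alloc_zeroed_memset(size_t size)
{
    assert(size > 0);
    uint8_t *p = malloc(size);
    if (p != NULL)
        memset(p, 0, size);
    return p;
}

/* Zeroed buffer via calloc: the allocator is free to return pages that
   the OS already guarantees to be zero, without writing to them again. */
uint8_t *alloc_zeroed_calloc(size_t size)
{
    assert(size > 0);
    return calloc(size, 1);
}

/* Hypothetical checker for the example: 1 if every byte is zero, else 0. */
int is_all_zero(const uint8_t *p, size_t size)
{
    assert(p != NULL);
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Whether calloc actually avoids the extra write depends on the allocator and the allocation size, so treat any speed difference as something to measure rather than assume.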

Blogger cbloom said...

word.

September 30, 2011 at 5:00 PM

Blogger jsgf said...

Relying on the OS to clear pages is very expensive if it results in a pagefault on every initial page access. It's much cheaper to use hot pages already in your heap than to map new ones in that case, even if you have to memset them yourself.

memset is not bad if you're going to use the memory soon afterwards anyway, because it just warms the cache for you.
(BTW, offline page zeroing daemons are often a bad idea because of this; they zero the pages then let them cool down before using them; much better to zero the page just before you're going to use it.)

The benchmark in the stackoverflow question is completely stupid because it doesn't actually touch the memory after calloc. If it did, it would be about the same as the memset one because of all the pagefaults.

October 2, 2011 at 8:30 AM
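
A benchmark that avoids the problem described above has to touch the memory after calloc. A minimal sketch, assuming 4096-byte pages (the real page size is platform dependent) and using made-up helper names, might look like this. It writes one byte per page so that each page has to be backed by real memory, and it times the allocation and the touching together.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Assumption for the example only; query the OS for the real page size. */
#define ASSUMED_PAGE_SIZE 4096u

/* Write one byte in each page of buf. The volatile qualifier keeps the
   compiler from discarding the stores. Returns the number of pages written. */
size_t touch_every_page(volatile uint8_t *buf, size_t size)
{
    assert(buf != NULL);
    size_t touched = 0;
    for (size_t off = 0; off < size; off += ASSUMED_PAGE_SIZE) {
        buf[off] = 1;
        touched++;
    }
    return touched;
}

/* CPU time in seconds to calloc `size` bytes and touch every page.
   Returns a negative value if the allocation fails. */
double seconds_to_calloc_and_touch(size_t size, size_t *pages_out)
{
    assert(pages_out != NULL);
    clock_t start = clock();
    uint8_t *buf = calloc(size, 1);
    if (buf == NULL)
        return -1.0;
    *pages_out = touch_every_page(buf, size);
    clock_t end = clock();
    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing the same loop over a malloc + memset buffer gives the comparison the comment is asking for. No numbers are claimed here; they vary by OS and allocator.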

Blogger cbloom said...

"memset is not bad if you're going to use the memory soon afterwards anyway, because it just warms the cache for you."

This is only true for tiny allocations. And I'm clearly not talking about tiny allocations.


BTW, it's particularly compelling for uses like "cache tables" à la LZRW or LZP matching, because you might allocate a 512MB cache table, and then large sections of it might never be touched at all (especially on small files).

October 2, 2011 at 9:53 AM
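
The cache-table case can be sketched roughly as follows. This is not the actual LZRW or LZP code; the `CacheTable` type, the function names, and the multiplicative hash are assumptions chosen for illustration. The table comes from calloc and is never memset, and each lookup touches only the one slot its hash selects, so slots that no input ever hashes to are never written by this code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache table: one 32-bit position per slot, 0 means "empty".
   Callers should store nonzero positions (for example pos + 1). */
typedef struct {
    uint32_t *slots;
    uint32_t  bits;   /* the table has 1 << bits slots */
} CacheTable;

/* Allocate the table with calloc and no memset. Returns 1 on success. */
int cache_table_init(CacheTable *t, uint32_t bits)
{
    assert(t != NULL);
    assert(bits >= 1 && bits <= 30);
    t->bits = bits;
    t->slots = calloc((size_t)1 << bits, sizeof(uint32_t));
    return t->slots != NULL;
}

/* Multiplicative hash, picked for the example only. */
static uint32_t cache_hash(uint32_t context, uint32_t bits)
{
    return (context * 2654435761u) >> (32 - bits);
}

/* Return the position previously stored for this context (0 if none)
   and replace it with pos. Only one slot is read and written. */
uint32_t cache_table_swap(CacheTable *t, uint32_t context, uint32_t pos)
{
    uint32_t *slot = &t->slots[cache_hash(context, t->bits)];
    uint32_t prev = *slot;
    *slot = pos;
    return prev;
}

void cache_table_free(CacheTable *t)
{
    free(t->slots);
    t->slots = NULL;
}
```

With a large `bits` value and a small input, only a handful of slots are ever written, which is the situation described above.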

Blogger Cyan said...

Fast match-tables can be small enough to fit into the stack, since they are designed to use L1 cache.

In such a case, I'm not sure memset() can be avoided. Stack memory is typically not zeroed by the OS.

October 3, 2011 at 11:55 AM

Blogger cbloom said...

Sure sure; but if it fits in L1 it's tiny tiny. You can't rely on L1 being bigger than 32k or so, and you have to share that with lots of other stuff, so you can only fit maybe a 4k table reliably in L1.

Also, I try to always write code that doesn't use the stack because unfortunately other platforms are not as stack-friendly as Windows.

October 3, 2011 at 12:00 PM

Blogger cbloom said...

Also, if you look at the example in the original post it's a malloc of 20 MB. I'm really talking about large tables here.

October 3, 2011 at 12:01 PM
