
Post a Comment On: cbloom rants

"05-31-12 - On C++ Atomic Fences Part 2"

6 Comments

Anonymous said...

Does this mean any number of people are going to write a lot of atomic-fenced C++ code that works on all current machines, but doesn't obey the spec and thus could break on future machines?

If so, hooray.

June 1, 2012 at 10:28 AM

cbloom said...

Certainly. Though it's no worse than in the past when people would just write lockfree code against x86 which would rely on all kinds of subtle details of that platform to work.

Really anybody writing lockfree code should have a DEC Alpha or some similar super-weakly-ordered machine to do all their testing on.

But more generally, just because you write your lockfree code in C++0x doesn't mean it's portable. It's easy to rely on behavior which happens to be true but is not guaranteed.

Of course the same thing is true of C, and the C standards committees have always been totally unhelpful about making real code work, and making portability more robust.

For example, in plain C I constantly assume linear addressing in a single address space; according to the standard I can't assume that, and a future machine or compiler could break it. I should be allowed to do something like:

#requires(linear_addressing)

and then the assumptions/requirements of the code are clear.

Similarly for lockfree code it would be nice to be able to say

#requires(causal_store_order)

then you get a compile error on platforms that don't give that.
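No `#requires` directive exists, but a rough approximation of the idea can be sketched with `static_assert`. This is only an illustration under stated assumptions: the particular checks and the `byte_distance` helper are hypothetical, and they cover only properties that happen to be testable at compile time. Nothing comparable exists for something like `causal_store_order`, which is a property of the hardware memory model rather than of any type or macro.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for "#requires(linear_addressing)". The standard has
// no such directive; these checks only approximate the intent.

// std::uintptr_t is an optional type, so simply naming it already fails to
// compile on an implementation that cannot round-trip pointers via integers.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "requires: pointers fit in uintptr_t");
static_assert(CHAR_BIT == 8, "requires: 8-bit bytes");

// A value of 2 means atomic pointers are always lock-free on this target.
static_assert(ATOMIC_POINTER_LOCK_FREE == 2,
              "requires: lock-free atomic pointers");

// Byte distance between two pointers, computed the way code that assumes a
// flat address space usually does it. The integer values are
// implementation-defined; the standard does not promise they are meaningful.
inline std::ptrdiff_t byte_distance(const void* from, const void* to) {
    assert(from != nullptr && to != nullptr);
    return static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(to) -
                                       reinterpret_cast<std::uintptr_t>(from));
}
```

On a platform that fails one of the assertions the file does not compile, which at least makes the assumption visible instead of silent.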

June 1, 2012 at 11:27 AM

Anonymous said...

Yeah, I agree all that stuff is busted.

I just feel like we should be trying to have fewer of the "not really portable" standards, not more of them. C99 deciding to have an explicit rule for integer division rounding is the kind of change I mean.

So it just seems weird for them to pick a brand new convention, especially one that's a little more broken. But yeah, even if they used traditional fences you could still have the DEC Alpha semantic sort of problem go uncaught, so maybe it's not so much C++0x's fault.
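For reference, the C99 division rule mentioned above is truncation toward zero, and C++11 adopted the same rule; before that, the result for negative operands was implementation-defined. A small sketch of what the rule pins down, with a hypothetical `floor_div` helper added only for contrast:

```cpp
#include <cassert>

// C99 and C++11: integer division truncates toward zero, and the remainder
// takes the sign of the dividend, so (a / b) * b + a % b == a.
static_assert(-7 / 2 == -3, "quotient truncates toward zero");
static_assert(-7 % 2 == -1, "remainder has the sign of the dividend");
static_assert(7 / -2 == -3, "quotient truncates toward zero");
static_assert(7 % -2 == 1, "remainder has the sign of the dividend");

// Hypothetical helper: division that rounds toward negative infinity,
// built on top of the guaranteed truncating behavior.
inline int floor_div(int a, int b) {
    assert(b != 0);
    int q = a / b;
    // Step down by one when the division was inexact and the signs differ.
    if ((a % b != 0) && ((a < 0) != (b < 0))) {
        --q;
    }
    return q;
}
```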

June 1, 2012 at 4:28 PM

cbloom said...

The whole idea of undefined behavior, without any constraints to keep you inside the defined behavior, is totally broken.

The only way to actually make that work would be to provide a compiler + VM that strictly enforces the standard. There should have been a C reference VM that told you any time you used undefined behavior.

The plus side for C++0x is that there is lots of good work on simulators and static race analysis and such going on right now. So I think it will be possible to reasonably test if code is correct according to the standard (then you just have to pray the simulator is actually coded correctly, and the compiler on your new platform implements the standard correctly).

The idea that programmers in the real world will read the standards docs and know what they can or cannot rely on is just insane. But if they have a black box that they can put programs in and get a red light or green light, that's reasonable and real humans will actually do it.

June 1, 2012 at 7:53 PM

Anonymous said...

Interesting that a CPU fence creates a sync point for all processors.

However, in the "old-school publication" example, I can't get past the fact that the compiler could reorder the flag load with the loads it's supposed to protect. There should at least be a compiler barrier in there, I think. Or publish a pointer.

In C++11 I believe a compiler barrier can be implemented as std::atomic_signal_fence(std::memory_order_acq_rel), but of course that still doesn't make the example valid in C++11.
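For comparison, a minimal sketch of a publication pattern that is valid in C++11, using a release store and an acquire load on the flag rather than a plain flag plus barriers. The names (`Payload`, `g_data`, `g_ready`, `run_publication`) are made up for illustration and are not taken from the post.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Payload { int a; int b; };

Payload g_data;                    // plain, non-atomic payload
std::atomic<bool> g_ready(false);  // publication flag

void publish() {
    g_data.a = 1;
    g_data.b = 2;
    // Release: the payload writes above cannot be reordered after this store,
    // by the compiler or by the CPU.
    g_ready.store(true, std::memory_order_release);
}

int consume() {
    // Acquire: the payload reads below cannot be reordered before this load.
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_data.a + g_data.b;
}

// Runs one writer and one reader and returns what the reader observed.
int run_publication() {
    g_ready.store(false, std::memory_order_relaxed);
    int result = 0;
    std::thread reader([&result] { result = consume(); });
    std::thread writer(publish);
    writer.join();
    reader.join();
    assert(result == 3);
    return result;
}
```

Because the acquire load that sees `true` synchronizes with the release store, the reader is guaranteed to see both payload writes. `std::atomic_signal_fence` on its own only orders a thread against a signal handler running on that same thread, so it gives no such guarantee between threads.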

June 3, 2012 at 4:52 AM

cbloom said...

"Interesting that a CPU fence creates a sync point for all processors."

The exact details of this are highly processor dependent, and I encourage no one to rely on this or even to pursue this line of thought much further.

However, for the record, some more rambling on this topic.

On some platforms the MB actually is an entry in the seq_cst total order. Even though the MB itself is not an observable event on other processors, the other processors can still observe ordering relations to the MB, such as "I know this must be after the MB" and that gives them a transitive way to tap into the total order.

(there's a funny thing about these ops that sometimes just the existence of strictly sequenced ops can affect your core even if you cannot observe them, if you can say that other ops relate to them)

On other processors the MB is not an entry in the total order; however, it can still act as a sync point for other processors. The reason for this is more subtle, and it is where I might go wrong a bit. (Again, I encourage everyone to not read this comment and just do things the safe modern way.) (Also, if there are any CPU experts out there, I'd ask you to fill in some details for me.)

The crucial thing is that on all current real cores, the MB communicates via cache line messages. On the core that executes the MB, the store buffer is drained and all previous dirty lines are flushed before any future action can happen. The other cores then all receive these cache line update messages, and you can use these to create a happens-after relationship; e.g. "happens after the cache line messages from the MB".

It's pretty trivial to see that this works on in-order cores. The subtle bit is with out-of-order cores that allow speculative execution. The crucial thing that makes it work on real-world chips is that all major chips at the moment invalidate speculative reads when they receive a cache line invalidate message.
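As a point of contrast with the hardware reasoning above, here is a sketch of the "safe modern way": the classic store-buffer test written with `std::atomic_thread_fence(std::memory_order_seq_cst)`. The C++11 standard itself forbids the outcome where both threads read 0 here, so nothing depends on how a particular chip implements its barrier. The function and variable names are hypothetical, and a test loop like this can only fail to find a violation; it proves nothing about correctness.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> g_x(0);
std::atomic<int> g_y(0);

// One round of the store-buffer test. Returns true if both threads read 0,
// which the seq_cst fences are supposed to rule out.
bool both_saw_zero_once() {
    g_x.store(0, std::memory_order_relaxed);
    g_y.store(0, std::memory_order_relaxed);
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&r1] {
        g_x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        r1 = g_y.load(std::memory_order_relaxed);
    });
    std::thread t2([&r2] {
        g_y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
        r2 = g_x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the round and counts forbidden outcomes; the expected count is 0.
int count_violations(int iterations) {
    assert(iterations >= 0);
    int violations = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_saw_zero_once()) {
            ++violations;
        }
    }
    return violations;
}
```

The two fences are ordered one way or the other in the single total order of seq_cst operations, so whichever thread's fence comes second must see the other thread's store. Removing either fence makes the both-zero outcome legal again.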

June 8, 2012 at 7:43 AM
