When implementing lock-free threading, you sometimes need a compiler scheduling barrier, which is weaker
than a CPU instruction scheduling barrier or a cache temporal-ordering memory barrier.
There's a common belief that an empty volatile asm in GCC is a scheduling barrier:
__asm__ volatile("")
but that appears not to actually be the case (or rather, it is the case in practice, but not according to spec).
What I believe it does do is split the code into "basic blocks"
for compilation, but then after initial optimization there's another merge pass where basic blocks are
combined, and they can in fact then be scheduled against each other.
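Concretely, the two candidate barriers look like this as GCC macros (the macro names are mine, just for illustration):

```c
/* not guaranteed by spec: splits basic blocks, but blocks may later be
   re-merged and scheduled across */
#define SchedulingBarrier()   __asm__ volatile("")

/* the documented-but-heavier form: also tells the compiler that all of
   memory may have changed, forcing values to be reloaded */
#define FullCompilerBarrier() __asm__ volatile("" ::: "memory")
```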
The GCC devs seem to be specifically defending their right to schedule across asm volatile by refusing to
give guarantees :
gcc.gnu.org/bugzilla/show_bug.cgi?id=17884
or
gcc-patches/2004-10/msg01048.html .
In fact it did appear (in an unclear way) in the docs that they wouldn't schedule across asm volatile,
but they claim that's a documentation error.
Now they also have the built-in "__sync" stuff.
But I don't see any simple compiler scheduling barrier there, and in fact __sync has problems.
__sync is defined to match the Itanium memory model (!?) but then was ported to other platforms. They also
do not document the semantics well. They say:
"In most cases, these builtins are considered a full barrier"
What do they mean by "full barrier"? I think they mean all four of LL, LS, SL, SS, but do they also mean Seq_Cst?
(there also seem to be some bugs in some of the __sync implementations:
bugzilla/show_bug.cgi?id=36793 )
For example on PS3 __sync_val_compare_and_swap appears to be:
sync
lwarx
.. loop ..
stwcx.
isync
which means it is a full Seq_Cst operation like a lock xchg on x86.
That would be cool if I actually knew it always was that on all platforms, but failing to clearly
document the guaranteed memory semantics of the __sync operations makes them dangerous.
(BTW as an aside, it looks like the GCC __sync intrinsics are generating isync for Acquire
unlike Xenon
which uses lwsync in that spot).
(note of course PS3 also has the atomic.h platform-specific implementation; the atomic.h has no barriers at all,
which, per my previous blog post
Mystery - Does the Cell PPU need Memory Control
might actually be the correct thing).
I also stumbled on this thread where Linus rants about GCC.
I think the example
someone gives in there about signed int wrapping is pretty terrible - doing a loop like that is fucking bananas and you
deserve to suffer for it. However, signed int wrapping does come up in other nasty unexpected places. You actually
might be wrapping signed ints in your code right now and not know about it. Say I have some code like :
char x;
x -= y;
x += z;
What if x = -100, y = 90, and z = 130? You should get -100 - 90 + 130 = -60. But your intermediate was -190,
which is too small for char. If your compiler just uses an 8-bit machine register it will wrap and it will all do
the right thing, but under gcc that is not guaranteed.
See
details on signed-integer-overflow in GCC and
notes about wrapv and
wrapv vs strict-overflow
"07-31-10 - GCC Scheduling Barrier"
15 Comments
BTW the most compelling argument for signed int wrapping that I know is simply to write this :
a + b + c
what does that do? Is
(a + b) + c
the same as
a + (b + c)
? If signed ints don't wrap, the answer is no.
July 31, 2010 at 8:36 PM
I've never noticed this phenomenon on PowerPC architectures. Of course I've never looked for it either. I have done tons of inline assembly though.
I take it that this has occurred for you and is causing problems?
August 1, 2010 at 8:46 AM
"I take it that this has occurred for you and is causing problems? "
No, so far as I can tell *in practice* on our current consoles, it actually is a scheduling barrier.
However, it disturbs me to make that part of my library when the GCC devs specifically say it is not a barrier.
I can just imagine the nightmare threading bugs that would occur if an instruction was moved there.
Sigh.
August 1, 2010 at 10:47 AM
I thought the GCC barrier was:
asm volatile("":::"memory");
August 2, 2010 at 1:13 PM
asm volatile("":::"memory") will indeed force the barrier, but it also seems to discard previously loaded values and reload them, degrading performance. There might be a non-documented asm volatile("") - see http://gcc.gnu.org/ml/gcc/2003-04/msg01180.html
August 2, 2010 at 1:28 PM
sylvain, in that very thread they say :
[quote]
> I observed that simply asm volatile (""), i.e., "old style asm" *does*
> do what I want.
That seems wrong to me. In any case, you definitely shouldn't
rely on this.
[/quote]
which is what I've been seeing. Currently asm volatile *does* seem to be what we want, but they are very defensive about not guaranteeing it. Sigh.
August 2, 2010 at 1:57 PM
So what semantics do you want for such a statement? A full memory barrier is fine, it has well defined semantics. Just not reordering statements but allowing the compiler to still use values it has stored has pretty messy semantics.
August 3, 2010 at 12:53 AM
" So what semantics do you want for such a statement? A full memory barrier is fine, it has well defined semantics. Just not reordering statements but allowing the compiler to still use values it has stored has pretty messy semantics. "
The "memory" mark on an asm volatile is not a "memory barrier"; it tells the compiler that all memory may have been trashed, so everything has to be reloaded. That is needlessly inefficient. Basically it makes all your variables into volatiles. Note that it's also *insufficient*: if you actually want a real memory barrier, it does not generate one. So it's hard for me to imagine when you would want that behavior.
What I want is simply a "no compiler reorder". Of course that's pretty useless when you're just writing standard C since it has undefined memory semantics, but when I'm doing things that I know have very specific actual memory semantics, all I need to know is that the compiler won't reorder the things that I have very carefully ordered.
For example, LoadAcquire on x86 can be implemented as:
x = *ptr
CompilerReadBarrier
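In GCC syntax that sketch might look like this; LoadAcquire and CompilerReadBarrier are my hypothetical names, and the barrier shown is the "memory"-clobber form, since it's the only one GCC actually stands behind:

```c
#define CompilerReadBarrier() __asm__ volatile("" ::: "memory")

/* x86 loads already have acquire ordering at the hardware level,
   so only the compiler has to be kept from reordering */
static inline int LoadAcquire(const volatile int *ptr) {
    int x = *ptr;
    CompilerReadBarrier();  /* later reads must not be hoisted above the load */
    return x;
}
```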
August 3, 2010 at 1:03 AM
So what you really want is to denote a set of data whose access order you need to explicitly control (and to which your ordering rules should apply), and then provide barriers that limit reordering of accesses to just that state and no other state.
On 32-bit x86, is the "memory" designation really that expensive? At worst you will spill a handful of registers...
August 3, 2010 at 11:56 AM
" So what you really want is to denote a set of data whose access order you need to explicitly control (and to which your ordering rules should apply), and then provide barriers that limit reordering of accesses to just that state and no other state. "
Well, yes, in theory, though in practice that's too much intellectual burden, so we just say "order everything".
" On 32-bit x86, is the "memory" designation really that expensive? At worst you will spill a handful of registers. "
Yeah, I suspect that in fact the memory clobber is not a huge hit, but more investigation is needed.
August 4, 2010 at 10:00 AM
If you don't want to make the compiler spill all of the state out of registers at a scheduling barrier (i.e. "memory" designation), you have to tell it what state matters.
Otherwise, it might inline some other methods or something and the CSE pass might then eliminate your read by reusing a stored value.
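For instance (a sketch using the "memory"-clobber form; the names are mine):

```c
int g;  /* stands in for some shared state */

int read_twice(void) {
    int a = g;
    __asm__ volatile("" ::: "memory");  /* compiler must assume g changed */
    int b = g;  /* forced to be a real second load; without the barrier,
                   CSE could legally reuse the value already in a */
    return a + b;
}
```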
August 4, 2010 at 11:03 PM
No.
August 5, 2010 at 8:59 AM
I'm probably misunderstanding what semantics you want, then. You don't want something that prevents moving reads forward; what you really want is a barrier that prevents moving stores down or up, pushing reads down, or moving volatile reads up?
The real problem with GCC is it wasn't designed to support this and we're really just subverting the mechanism for inline assembly to kind of get what we want.
August 5, 2010 at 12:47 PM