
Post a Comment On: cbloom rants

"07-18-10 - Mystery - Why no isync for Acquire on Xenon -"

7 Comments -

Blogger ryg said...

Don't see anything wrong with using isync for read-acquire. But it's hard to say. Maybe email the authors of the example POWER C++ memory model implementation? All recent POWER cores are in-order like the Xenon/PPU.

"is there a way to make "volatile" act like old fashioned volatile, not new MSVC volatile? eg. if I just want to force the compiler to actually do a memory load or store (and not optimize it out or get from register or whatever), but don't care about it being acquire or release memory ordered."
The Lockless programming article says that "on Xbox 360 the compiler does not insert any instructions to prevent the CPU from reordering reads and writes". So that's not real acquire/release semantics either.

What the compiler actually seems to do (consistently across all architectures) is to compile volatile loads as "load; CompilerReadWriteBarrier;" and volatile stores as "CompilerReadWriteBarrier; store;". On x86, that gives you load-acquire and store-release semantics, because the hardware already orders plain loads and stores that way. But on PPC, it's just a relaxed load or store with a compiler barrier; nothing stops the CPU itself from reordering.
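The "load; CompilerReadWriteBarrier;" pattern can be approximated in standard C++ as a relaxed atomic access paired with a compiler-only fence. This is a minimal sketch, not MSVC's actual code generation: the helper names are hypothetical, and `std::atomic_signal_fence` stands in for the compiler barrier described above.

```cpp
#include <atomic>

// Hypothetical helpers for illustration only; not what MSVC emits for volatile.

// "load; CompilerReadWriteBarrier;"
inline int load_then_compiler_barrier(const std::atomic<int>& x) {
    // Relaxed load: the access really happens, but no CPU ordering is requested.
    int v = x.load(std::memory_order_relaxed);
    // Compiler-only fence: later accesses may not be moved above this point
    // by the compiler. It is not expected to emit a hardware barrier.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    return v;
}

// "CompilerReadWriteBarrier; store;"
inline void compiler_barrier_then_store(std::atomic<int>& x, int v) {
    // Compiler-only fence: earlier accesses may not be moved below this point.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    // Relaxed store: again no CPU ordering is requested.
    x.store(v, std::memory_order_relaxed);
}
```

On x86 this pair would behave like acquire/release because of the hardware memory model; on a weakly ordered CPU such as PowerPC it would not, which is the distinction being drawn here.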

I don't think there's a way to get guaranteed loads/stores without compiler barriers from VC++ 2005 onwards.

July 18, 2010 at 3:37 PM

Blogger cbloom said...

"All recent POWER cores are in-order like the Xenon/PPU."

Oh, right. isync is probably necessary/cheaper on out-of-order cores. On in-order cores maybe lwsync and isync are the same cost so it's simpler just to use lwsync all the time.

"The Lockless programming article says that "on Xbox 360 the compiler does not insert any instructions to prevent the CPU from reordering reads and writes". So that's not real acquire/release semantics either."

Oh shit, I didn't notice that little caveat. So the special 2005+ MSVC volatile is *not* for Xbox 360.

Which, as you point out, means that maybe MSVC volatile actually *never* generates special instructions, which in fact makes their volatile not so weird and different. They say it has "acquire/release" semantics, but of course on x86 all loads/stores do. So basically they're just saying the compiler won't muck it up. (Maybe on Itanium they actually generate code for acquire/release memory ordering.)
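For reference, here is what real acquire/release ordering buys you, sketched with C++11-style `std::atomic` (a stand-in, not what anyone in this thread was using; `payload`, `ready` and `publish_and_consume` are hypothetical names). A release store publishes earlier writes, and an acquire load that observes it is guaranteed to see them.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration.
static int payload = 0;                 // plain, non-atomic data
static std::atomic<bool> ready{false};  // flag that publishes the data

// Writes `value` in this thread, reads it back in a second thread.
inline int publish_and_consume(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;
    std::thread consumer([&seen] {
        // Acquire load: once this observes `true`, the write to `payload`
        // made before the release store is guaranteed to be visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    payload = value;                               // ordinary write
    ready.store(true, std::memory_order_release);  // publish it

    consumer.join();
    return seen;
}
```

On x86 the acquire load and release store should compile to ordinary moves, so only the compiler needs restraining. On PowerPC the compiler is expected to add a hardware barrier (lwsync, or the isync sequence the post asks about), and a load/store with only a compiler barrier does not provide that.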

July 18, 2010 at 3:49 PM

Blogger cbloom said...

"So the special 2005+ MSVC volatile is *not* for Xbox 360."

In the sense that it's not actually acquire/release. But it seems to generate the same thing as the x86 compiler, which is just a load/store and a compiler barrier. I need to investigate this a bit more to make sure I have all my facts right.

July 18, 2010 at 3:59 PM

Anonymous Anonymous said...

Tangential or relevant I dunno, since I'm a lazy fuck, but I believe POWER and PowerPC are not the same architecture.

This came up, for instance, in looking for that rotate-and-insert instruction, which exists in POWER but not PowerPC.

July 18, 2010 at 4:57 PM

Blogger ryg said...

It's ridiculous case-sensitive naming.

POWER = IBM POWER processors (server)
POWER architecture = Architecture of same
Power architecture = The whole family including POWER, PowerPC, Cell and related. This covers the ISA and common parts between all of them.

POWER used to be slightly different from PowerPC, but they unified the ISAs some time ago. I think Cell/Xenon are already in the unified branch. The newer server processors definitely are.

"This came up, for instance, in looking for that rotate-and-insert instruction, which exists in POWER but not PowerPC."
There are several, and the most useful variant definitely exists on PPC: rlwimi/rldimi ("Rotate left [double]word immediate then mask insert"). I think that's been in there since the beginning (okay, the d variant is only in 64-bit PPCs).
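What rlwimi computes can be sketched in portable C++. The function names below are hypothetical and this is an emulation for illustration, not an intrinsic. It assumes `sh`, `mb` and `me` are in 0..31 and uses PowerPC bit numbering, where bit 0 is the most significant bit.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits (n taken modulo 32).
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n) {
    return (n & 31u) == 0 ? x
                          : (x << (n & 31u)) | (x >> (32u - (n & 31u)));
}

// Mask with ones in bits mb..me inclusive, PowerPC numbering (bit 0 = MSB).
// If mb > me the mask wraps around: ones in mb..31 and in 0..me.
constexpr std::uint32_t ppc_mask(unsigned mb, unsigned me) {
    return mb <= me
        ? ((0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me)))
        : ((0xFFFFFFFFu >> mb) | (0xFFFFFFFFu << (31u - me)));
}

// Emulates "rlwimi ra, rs, sh, mb, me": rotate rs left by sh, then insert
// the rotated bits into ra wherever the mask is 1; other bits of ra are kept.
constexpr std::uint32_t rlwimi(std::uint32_t ra, std::uint32_t rs,
                               unsigned sh, unsigned mb, unsigned me) {
    return (rotl32(rs, sh) & ppc_mask(mb, me)) | (ra & ~ppc_mask(mb, me));
}
```

For example, `rlwimi(0xAAAAAAAA, 0x000000FF, 8, 16, 23)` rotates the low byte up by 8 and inserts it into bits 16..23, giving `0xAAAAFFAA`. The rotate count here is a compile-time immediate in the real instruction.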

Not sure if they removed or renamed the POWER-exclusive instructions for the unified ISA. As long as they didn't reuse the encoding space, they can just handle the "unknown instruction" traps and have the OS emulate the instructions to maintain backwards compatibility.

July 18, 2010 at 5:46 PM

Anonymous Anonymous said...

It's the "immediate" part that's the issue -- the non-immediate one (rotate count taken from a register) is lacking.

We were looking at code where if we could use that, we could cut back from two variable-length shifts to one.

July 19, 2010 at 1:11 PM

Anonymous Anonymous said...

rlmi

"Note: the rlmi instruction is supported only in the POWER family architecture."

This was, as you mentioned in the other thread, for bitstream parsing.

July 19, 2010 at 1:31 PM
