Post a Comment On: cbloom rants

"11-12-14 - Intel TSX notes"

Fabian Giesen said...

TSX would be even nicer if you could currently buy a CPU that supports it correctly. :) (Intel issued a microcode update for all shipping Haswell/Broadwell parts that disables TSX a few months back, because they found a bug.)

A while ago I spent some time trying to figure out what the x86 memory model actually means for the underlying implementation. Long story short, the main thing you seem to need for the x86 memory model that you don't need at that level of sophistication for weaker memory models is the ability to detect and rewind past "causality violations". OoO/speculation needs most of the machinery for this already, but if sync points are relatively rare, you can implement things like PPC "lwsync" / ARM "dmb" by actually stalling until all prior mem ops have completed. In x86, where every load is "acquire" and every store is "release", that's not viable. What it boils down to is that given an incoming L1 cache line invalidation, you need to know whether any pending load/store depended on that cache line's previous value. That probably boils down to a bunch of extra bits per line in the L1 cache, plus some extra metadata in the memory ordering buffer.
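To make the "causality violation" check a bit more concrete, here is a minimal toy sketch in Python. It is not how any real memory ordering buffer is built; the names `PendingLoad`, `LoadQueue`, `on_invalidate` and the 64-byte `LINE_SIZE` are assumptions made up for illustration. The only idea it models is: keep track of loads that have executed but not yet retired, and when an invalidation arrives for a cache line, squash the oldest pending load that read from that line along with everything younger.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line (assumed for this sketch)


@dataclass
class PendingLoad:
    seq: int    # program-order sequence number
    addr: int   # byte address that was read
    value: int  # value observed when the load executed speculatively

    @property
    def line(self) -> int:
        return self.addr // LINE_SIZE


@dataclass
class LoadQueue:
    # Loads that have executed but not yet retired (hypothetical structure).
    pending: list = field(default_factory=list)

    def execute(self, seq, addr, memory):
        """Speculatively perform a load and remember which line it touched."""
        self.pending.append(PendingLoad(seq, addr, memory.get(addr, 0)))

    def on_invalidate(self, line):
        """Another core took `line` away from us.

        Returns the sequence number to rewind to (the oldest pending load
        that depended on the line), or None if no pending load is affected.
        """
        hits = [p.seq for p in self.pending if p.line == line]
        if not hits:
            return None
        oldest = min(hits)
        # Squash the affected load and everything younger in program order.
        self.pending = [p for p in self.pending if p.seq < oldest]
        return oldest

    def retire_oldest(self):
        """Retire the oldest pending load; after this it can't be rewound."""
        self.pending.sort(key=lambda p: p.seq)
        return self.pending.pop(0)


memory = {0: 1, 64: 2, 128: 3}
lq = LoadQueue()
for seq, addr in enumerate([0, 64, 128]):
    lq.execute(seq, addr, memory)

rewind_to = lq.on_invalidate(1)            # line 1 covers addresses 64..127
remaining = [p.seq for p in lq.pending]    # loads that survive the squash
```

In this toy, a weaker memory model would get away without `on_invalidate` entirely and simply drain `pending` at each explicit barrier.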

TSX can reuse the same infrastructure. "Begin" needs to checkpoint, and it needs to flip one big switch, which is that while you're in a transaction, no pending loads/stores may retire until the transaction is done (because loads/stores not being retired is what allows the rewinding in the first place). You could OoO that state too but it's probably a full memory fence / LS pipe flush (which would explain high overhead). Limiting factor for # of outstanding ops would be either reorder buffer, memory ordering buffer or L1D$, whichever gives out first. You grab (and keep) all cache lines you write to in E. Without TSX, you can make sure that you only physically modify the L1D$ lines once you know the store is gonna go through. With TSX, you speculatively update and don't know if it's gonna take until the corresponding store is retired. Thus you can have *some* of the transaction updates applied to cache lines in E state without the transaction having committed. The original contents are gone (you didn't snapshot them) which is why you need to bounce lines to I if that happens. If you do make it to the end, you need to wait until all stores have made it to L1, and once they do, you can batch-retire everything (=your commit). At this point, all the data is where it needs to be already; all that's left to do is update the "current instruction to retire" counter.
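The begin / speculative store / abort / commit flow described above can be sketched as a toy model. Everything here is an assumption for illustration: `ToyCore`, `next_level`, `max_spec_lines` and `snoop` are made-up names, there is one word per cache line, and the read set is not modeled at all. The sketch also assumes that a line dirtied before the transaction is written back before its first speculative write, so that the pre-transaction value still exists somewhere when the L1 copy gets dropped.

```python
class ToyCore:
    """Toy model: one word per cache line, dicts standing in for L1D and L2/memory."""

    def __init__(self, next_level, max_spec_lines=4):
        self.next_level = next_level   # committed values outside L1
        self.l1 = {}                   # line -> value currently held in L1D
        self.spec = set()              # lines written inside the open transaction
        self.regs = {}                 # architectural register state
        self.checkpoint = None         # register snapshot; None outside a transaction
        self.max_spec_lines = max_spec_lines  # stand-in for the capacity limit

    def begin(self):
        # "Begin" checkpoints the register state.
        self.checkpoint = dict(self.regs)

    def load(self, line):
        if line not in self.l1:        # miss: fetch the committed value
            self.l1[line] = self.next_level.get(line, 0)
        return self.l1[line]

    def store(self, line, value):
        if self.checkpoint is not None and line not in self.spec:
            if len(self.spec) >= self.max_spec_lines:
                self.abort()           # ran out of tracking capacity
                return False
            if line in self.l1:
                # Assumption of this sketch: preserve pre-transaction contents.
                self.next_level[line] = self.l1[line]
            self.spec.add(line)
        self.l1[line] = value          # update the line in place, no snapshot
        return True

    def abort(self):
        for line in self.spec:
            self.l1.pop(line, None)    # bounce speculatively written lines to I
        self.spec.clear()
        self.regs = self.checkpoint    # rewind to the checkpoint
        self.checkpoint = None

    def commit(self):
        # Data is already in L1; committing just stops treating it as speculative.
        self.spec.clear()
        self.checkpoint = None

    def snoop(self, line):
        """Another core asks for `line`; a hit in the write set aborts."""
        if line in self.spec:
            self.abort()
            return True
        return False


core = ToyCore({10: 1})
core.regs["r"] = 5
core.store(20, 7)            # ordinary store before the transaction
core.begin()
core.regs["r"] = 6
core.store(10, 100)
core.store(20, 200)
aborted = core.snoop(10)     # conflicting access from another core
```

After the abort in this usage example, both lines have been dropped from the toy L1, so the next `load` of either one refetches the pre-transaction value from `next_level`. On the commit path, nothing is copied anywhere, which mirrors the point that all the data is already where it needs to be.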

That's the high concept anyway. In practice there are definitely a lot more wrinkles that I'm missing.

December 1, 2014 at 5:22 PM
