Ordering and barriers: what is the equivalent of PowerPC's "lwsync" on x86?
My code is simple, as below. I found rmb and wmb for reads and writes, but no general-purpose barrier. lwsync is available on PowerPC, but what is the replacement on x86? Thanks in advance.

#define barrier() __asm__ volatile ("lwsync")
...
lock()
if (!pInst)
{
    T* temp = new T;
    barrier();
    pInst = temp;
}
unlock();
Comments (3)
rmb() and wmb() are the Linux kernel functions. There is also mb(). The x86 instructions are lfence, sfence, and mfence, IIRC.
There's a particular file in the Cilk runtime you might find interesting, i.e. cilk-sysdep.h, which contains the system-specific mappings for memory barriers. I extracted a small section relevant to your question on x86, i.e. i386.

What I liked about this is the fact that xchgl appears to be faster :), though you should really implement both and check it out.
You don't say exactly what lock and unlock are in this code; I'm presuming they are mutex operations. On PowerPC, a mutex-acquire function will use an isync (without which the hardware may evaluate your if (!pInst) before the lock()), and there will be an lwsync (or sync, if your mutex implementation is ancient) in the unlock().

So, presuming all your accesses (both reads and writes) to pInst are guarded by your lock and unlock methods, your barrier use is redundant. The unlock will have a sufficient barrier to ensure that the pInst store is visible before the unlock operation completes (so that it will be visible after any subsequent lock acquire, presuming the same lock is used).

On x86 and x64, your lock() will use some form of LOCK-prefixed instruction, which automatically has bidirectional fencing behaviour.

Your unlock on x86 and x64 only has to be a store instruction (unless you use some of the special string instructions within your critical section, in which case you'll need an SFENCE).

The manual:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

has good information on all the fences, as well as on the effects of the LOCK prefix (and when it is implied).

P.S. In your unlock code you'll also need something that enforces compiler ordering (so if it is just a store of zero, you'll also need something like the GCC-style asm __volatile__ ("" ::: "memory")).