Ordering and barriers: what is the x86 equivalent of PowerPC's lwsync?

Posted on 2024-09-14 16:33:56


My code is simple, as below. I found rmb and wmb for read and write barriers, but no general one. lwsync is available on PowerPC, but what is the replacement on x86? Thanks in advance.

#define barrier() __asm__ volatile ("lwsync")
...
    lock();
    if (!pInst)
    {
        T* temp = new T;
        barrier();
        pInst = temp;
    }
    unlock();
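One way to make barrier() portable is to select the instruction per architecture at compile time. The sketch below is illustrative (the publish() helper and variable names are not from the original code); it assumes GCC-style inline asm and uses mfence as the x86 stand-in for lwsync:

```c
/* Sketch: an architecture-conditional barrier() for the pattern above.
   Assumes GCC-style inline asm; names below are illustrative. */
#if defined(__x86_64__) || defined(__i386__)
#define barrier() __asm__ __volatile__ ("mfence" ::: "memory")
#elif defined(__powerpc__)
#define barrier() __asm__ __volatile__ ("lwsync" ::: "memory")
#else
#define barrier() __sync_synchronize()  /* GCC full-barrier builtin */
#endif

int payload;
int published;

/* Publish the payload before the flag, as in the pInst = temp pattern. */
void publish(int v) {
    payload = v;
    barrier();        /* order the payload store before the flag store */
    published = 1;
}
```

Note that on x86 the store-store ordering this pattern needs is already guaranteed by the hardware, so a compiler-only barrier (`__asm__ volatile ("" ::: "memory")`) would arguably suffice for the publish; mfence is shown as the general-purpose full barrier.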


Comments (3)

阳光的暖冬 2024-09-21 16:33:56


rmb() and wmb() are Linux kernel macros. There is also mb() for a full barrier.

The corresponding x86 instructions are lfence, sfence, and mfence, IIRC.
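A hedged sketch of how such macros might be defined with GCC inline asm follows (illustrative, not the kernel's actual definitions; a builtin fallback is included so the sketch also compiles off x86):

```c
#if defined(__x86_64__) || defined(__i386__)
#define mb()  __asm__ __volatile__ ("mfence" ::: "memory") /* full barrier        */
#define rmb() __asm__ __volatile__ ("lfence" ::: "memory") /* load/read barrier   */
#define wmb() __asm__ __volatile__ ("sfence" ::: "memory") /* store/write barrier */
#else
/* Fallback: a GCC builtin full barrier stands in for all three. */
#define mb()  __sync_synchronize()
#define rmb() __sync_synchronize()
#define wmb() __sync_synchronize()
#endif

/* Tiny demo exercising all three barriers between increments. */
int barrier_demo(void) {
    int n = 0;
    n++; mb();
    n++; rmb();
    n++; wmb();
    return n;
}
```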

家住魔仙堡 2024-09-21 16:33:56


There's a particular file in the Cilk runtime you might find interesting, cilk-sysdep.h, which contains the system-specific mappings for memory barriers. I've extracted a small section relevant to your question about x86 (i.e. i386):

    file:-- cilk-sysdep.h (the numbers on the LHS are actually line numbers)

    252      * We use an xchg instruction to serialize memory accesses, as can
    253      * be done according to the Intel Architecture Software Developer's
    254      * Manual, Volume 3: System Programming Guide
    255      * (http://www.intel.com/design/pro/manuals/243192.htm), page 7-6,
    256      * "For the P6 family processors, locked operations serialize all
    257      * outstanding load and store operations (that is, wait for them to
    258      * complete)."  The xchg instruction is a locked operation by
    259      * default.  Note that the recommended memory barrier is the cpuid
    260      * instruction, which is really slow (~70 cycles).  In contrast,
    261      * xchg is only about 23 cycles (plus a few per write buffer
    262      * entry?).  Still slow, but the best I can find.  -KHR 
    263      *
    264      * Bradley also timed "mfence", and on a Pentium IV xchgl is still quite a bit faster
    265      *   mfence appears to take about 125 ns on a 2.5GHZ P4
    266      *   xchgl  appears to take about  90 ns on a 2.5GHZ P4
    267      * However on an opteron, the performance of mfence and xchgl are both *MUCH MUCH   BETTER*.
    268      *   mfence takes 8ns on a 1.5GHZ AMD64 (maybe this is an 801)
    269      *   sfence takes 5ns
    270      *   lfence takes 3ns
    271      *   xchgl  takes 14ns
    272      * see mfence-benchmark.c
    273      */
    274     int x=0, y;
    275     __asm__ volatile ("xchgl %0,%1" :"=r" (x) :"m" (y), "0" (x) :"memory");
    276    }

What I liked about this is the fact that xchgl appears to be faster :) though you should really implement both and measure.
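The excerpt's xchgl idiom can be wrapped into a reusable full barrier roughly as follows (a sketch; xchg_barrier is an illustrative name, and the implicitly lock-prefixed exchange serializes loads and stores as the quoted comment describes):

```c
/* Full memory barrier built on xchg, per the cilk-sysdep.h idiom. */
static inline void xchg_barrier(void) {
#if defined(__x86_64__) || defined(__i386__)
    int x = 0, y = 0;
    /* xchg with a memory operand carries an implicit LOCK prefix,
       which serializes outstanding loads and stores. */
    __asm__ __volatile__ ("xchgl %0,%1" : "=r"(x) : "m"(y), "0"(x) : "memory");
#else
    __sync_synchronize();  /* fallback so the sketch compiles off x86 */
#endif
}

/* Tiny demo: a computation split across the barrier. */
int demo(void) {
    int a = 1;
    xchg_barrier();
    return a + 1;
}
```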

小红帽 2024-09-21 16:33:56


You don't say exactly what lock and unlock are in this code. I'm presuming they are mutex operations. On PowerPC a mutex-acquire function will use an isync (without which the hardware may evaluate your if (!pInst) before the lock()), and the unlock() will have an lwsync (or sync if your mutex implementation is ancient).

So, presuming all your accesses (both read and write) to pInst are guarded by your lock and unlock methods, your barrier use is redundant. The unlock will have a sufficient barrier to ensure that the pInst store is visible before the unlock operation completes (so that it will be visible after any subsequent lock acquire, presuming the same lock is used).

On x86 and x64 your lock() will use some form of LOCK prefixed instruction, which automatically has bidirectional fencing behaviour.

Your unlock on x86 and x64 only has to be a store instruction (unless you use some of the special string instructions within your CS, in which case you'll need an SFENCE).

The manual:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

has good information on all the fences as well as the effects of the LOCK prefix (and when that is implied).

ps. In your unlock code you'll also have to have something that enforces compiler ordering (so if it is just a store of zero, you'll also need something like the GCC-style __asm__ volatile ("" ::: "memory")).
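A hedged sketch of such an x86 unlock (lock_word and my_unlock are illustrative names, assuming GCC): the release is just a plain store, preceded by a compiler-only barrier so the compiler cannot sink earlier critical-section stores past it:

```c
static volatile int lock_word = 1;  /* 1 = held, 0 = free (illustrative) */
int shared_data;

static inline void my_unlock(void) {
    /* Compiler barrier only: no fence instruction is needed on x86 here,
       because stores are not reordered with earlier stores. */
    __asm__ __volatile__ ("" ::: "memory");
    lock_word = 0;  /* plain store releases the lock */
}
```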
