x86 equivalent for LWARX and STWCX

Posted 2024-07-27 13:22:13

I'm looking for an equivalent of LWARX and STWCX (as found on PowerPC processors) or a way to implement similar functionality on the x86 platform. Also, where would be the best place to find out about such things (i.e. good articles/web sites/forums for lock-free/wait-free programming)?

Edit
I think I might need to give more details as it is being assumed that I'm just looking for a CAS (compare and swap) operation. What I'm trying to do is implement a lock-free reference counting system with smart pointers that can be accessed and changed by multiple threads. I basically need a way to implement the following function on an x86 processor.

int* IncrementAndRetrieve(int **ptr)
{
  int val;
  int *pval;
  do
  {
    // fetch the pointer to the value
    pval = *ptr;

    // if it's NULL, then just return NULL, the smart pointer
    // will then become NULL as well
    if(pval == NULL)
      return NULL;

    // Grab the reference count
    val = lwarx(pval);

    // make sure the pointer we grabbed the value from
    // is still the same one referred to by  'ptr'
    if(pval != *ptr)
      continue;

    // Increment the reference count via 'stwcx'. If any other thread
    // has done anything that could potentially break this, the store
    // should fail and the loop tries again
  } while(!stwcx(pval, val + 1));
  return pval;
}

I really need something that mimics LWARX and STWCX fairly accurately to pull this off (I can't figure out a way to do this with the CompareExchange, swap or add functions I've so far found for the x86).

Thanks


独留℉清风醉 2024-08-03 13:22:14

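As Michael mentioned, what you're probably looking for is the cmpxchg instruction.

It's important to point out, though, that the PPC way of doing this is known as Load-Link / Store-Conditional (LL/SC), while the x86 architecture uses Compare-And-Swap (CAS). LL/SC has stronger semantics than CAS: any change to the value at the conditioned address causes the store to fail, even if the other change put back the same value that the load observed. CAS, on the other hand, would succeed in that case. This is known as the ABA problem (see the CAS link for more information).

If you need the stronger semantics on x86, you can approximate them with the x86 double-width compare-and-swap (DWCAS) instructions: cmpxchg8b, or cmpxchg16b on x86_64. These let you atomically swap two consecutive "natural-sized" words at once instead of just the usual one. The basic idea is that one of the two words holds the value of interest and the other holds an always-incrementing "mutation count". Although this does not technically eliminate the problem, the likelihood of the mutation counter wrapping around between attempts is so low that it is a reasonable substitute for most purposes.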
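
To make the mutation-count idea concrete, here is a minimal sketch in C11. It is not the cmpxchg8b/cmpxchg16b code itself: to keep it portable it assumes a 32-bit value of interest and packs it next to a 32-bit counter in one 64-bit atomic word, which is the same shape as using cmpxchg8b on 32-bit x86. The names `versioned_t`, `versioned_load` and `versioned_store_if_unchanged` are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the low 32 bits hold the value of interest,
   the high 32 bits hold a mutation count that only ever increases. */
typedef _Atomic uint64_t versioned_t;

static inline uint32_t versioned_value(uint64_t packed) { return (uint32_t)packed; }
static inline uint32_t versioned_count(uint64_t packed) { return (uint32_t)(packed >> 32); }

/* Snapshot value and count together (this plays the role of the load-link). */
static inline uint64_t versioned_load(versioned_t *v)
{
    return atomic_load(v);
}

/* Store new_value only if the word is unchanged since `snapshot` was taken.
   Every successful store bumps the count, so an A -> B -> A sequence on the
   value still makes a store based on an older snapshot fail. */
static inline bool versioned_store_if_unchanged(versioned_t *v, uint64_t snapshot,
                                                uint32_t new_value)
{
    uint64_t desired = ((uint64_t)(versioned_count(snapshot) + 1) << 32) | new_value;
    return atomic_compare_exchange_strong(v, &snapshot, desired);
}
```

A caller takes a snapshot, computes a new value from `versioned_value(snapshot)`, and retries from the load whenever the store reports failure. The 32-bit counter can still wrap, so this only approximates LL/SC, as described above.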


鹤舞 2024-08-03 13:22:14

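x86 does not directly support "optimistic concurrency" the way PPC does. Instead, x86's support for concurrency is based on a "lock prefix"; see here. (Some so-called "atomic" instructions such as XCHG actually get their atomicity by implicitly asserting the LOCK prefix, whether or not the assembly programmer actually coded it.) To put it diplomatically, it's not exactly "bomb-proof" (in fact, I'd say it's rather accident-prone ;-).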
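
From C you rarely write the prefix by hand. As a rough sketch, the C11 `<stdatomic.h>` operations below are the portable way to ask for locked read-modify-write instructions; on x86, compilers typically emit `lock xadd` for the fetch-add and a plain `xchg` (implicitly locked) for the exchange, though the exact instruction choice is up to the compiler. The function names are hypothetical.

```c
#include <stdatomic.h>

/* Atomically add 1 and return the previous count. */
static inline int ref_acquire(atomic_int *count)
{
    return atomic_fetch_add(count, 1);
}

/* Atomically subtract 1; returns nonzero when the caller
   just dropped the last reference. */
static inline int ref_release(atomic_int *count)
{
    return atomic_fetch_sub(count, 1) == 1;
}

/* Atomically store new_value and return the value it replaced. */
static inline int swap_value(atomic_int *slot, int new_value)
{
    return atomic_exchange(slot, new_value);
}
```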


柳若烟 2024-08-03 13:22:14

You're probably looking for the cmpxchg family of instructions.

You'll need to precede these with a lock prefix to get equivalent behaviour.

Have a look here for a quick overview of what's available.

You'll likely end up with something similar to this:

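mov ecx,dword ptr [esp+4]        ; 1st argument: address of the destination
mov edx,dword ptr [esp+8]        ; 2nd argument: value to store on success
mov eax,dword ptr [esp+12]       ; 3rd argument: expected (comparand) value
lock cmpxchg dword ptr [ecx],edx ; if [ecx]==eax then [ecx]=edx, else eax=[ecx]
ret 12                           ; pop the 12 bytes of arguments; result is in eax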
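
If you would rather stay in C, roughly the same operation can be sketched with C11 atomics. The argument order below mirrors the assembly (destination, new value, expected value), and the name `compare_exchange` is just a placeholder, not an existing API.

```c
#include <stdatomic.h>

/* If *dest == comparand, store exchange into *dest.
   Either way, return the value *dest held before the operation,
   which is what CMPXCHG leaves in EAX. */
static inline int compare_exchange(atomic_int *dest, int exchange, int comparand)
{
    /* On failure the call writes the current value of *dest into comparand;
       on success comparand already equals the old value. */
    atomic_compare_exchange_strong(dest, &comparand, exchange);
    return comparand;
}
```

The caller detects success by comparing the return value with the comparand it passed in.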

You should read this paper...

Edit

In response to the updated question, are you looking to do something like the Boost shared_ptr? If so, have a look at that code and the files in that directory - they'll definitely get you started.
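
For reference, the usual building block in shared_ptr-style designs is an "increment only if not already zero" loop built on CAS. The sketch below is my own C11 rendering of that idea, not code copied from Boost, and `try_acquire` is a made-up name. It assumes the counter itself lives in memory that stays valid while it is being examined (for example a control block kept alive separately), so on its own it does not address the pointer being swapped out from under you.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a new reference only if the object is still alive (count != 0).
   Returns true if the count was incremented. */
static inline bool try_acquire(atomic_int *count)
{
    int seen = atomic_load(count);
    while (seen != 0) {
        if (atomic_compare_exchange_weak(count, &seen, seen + 1))
            return true;
        /* CAS failed: seen now holds the fresh value, so re-check it. */
    }
    return false;
}
```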


煞人兵器 2024-08-03 13:22:14

If you are on 64 bits and limit yourself to, say, 1 TB of heap (40 address bits), you can pack the counter into the 24 unused top bits. If your pointers are aligned, the bottom bits are also available: 3 bits with 8-byte alignment, 5 bits with 32-byte alignment.

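#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PTR_BITS  40                                /* 1 TB of address space */
#define PTR_MASK  ((UINT64_C(1) << PTR_BITS) - 1)
#define COUNT_ONE (UINT64_C(1) << PTR_BITS)         /* +1 in the top 24 bits */

/* *ptr holds the packed word: counter on top, pointer on the bottom */
int* IncrementAndRetrieve(_Atomic uint64_t *ptr)
{
  uint64_t val = atomic_load(ptr);
  int *unpacked;
  do
  {
    unpacked = (int *)(uintptr_t)(val & PTR_MASK);

    if(unpacked == NULL)
      return NULL;
    // a failed CAS reloads val with the current packed word
  } while(!atomic_compare_exchange_weak(ptr, &val, val + COUNT_ONE));
  return unpacked;
}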
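
The low-bit variant can be sketched the same way. This assumes 8-byte-aligned pointers, so three low bits are free; the helper names are hypothetical, and the tag must be masked off before the pointer is dereferenced.

```c
#include <stdint.h>

#define TAG_BITS 3u                                   /* free bits with 8-byte alignment */
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

/* Combine an aligned pointer and a small tag into one word. */
static inline uintptr_t tag_pack(void *p, unsigned tag)
{
    return (uintptr_t)p | ((uintptr_t)tag & TAG_MASK);
}

/* Recover the original pointer by clearing the tag bits. */
static inline void *tag_pointer(uintptr_t packed)
{
    return (void *)(packed & ~TAG_MASK);
}

/* Recover the tag stored in the low bits. */
static inline unsigned tag_value(uintptr_t packed)
{
    return (unsigned)(packed & TAG_MASK);
}
```

Three bits only give a count of 0 to 7, so in practice the low bits are more useful as flags or as a small extension to a counter kept in the top bits.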


濫情▎り 2024-08-03 13:22:14

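I don't know whether LWARX and STWCX invalidate the whole cache line, but CAS and DCAS do. That means that unless you are willing to throw away a lot of memory (64 bytes for each independent "lockable" pointer), you won't see much improvement once you really put your software under stress. The best results I've seen so far were when people consciously sacrificed 64 bytes, planned their structures around that (packing in things that won't be subject to contention), kept everything aligned on 64-byte boundaries, and used explicit read and write data barriers. A cache line invalidation can cost roughly 20 to 100 cycles, which makes it a bigger real-world performance issue than lock avoidance alone.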
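
As a small illustration of "sacrificing 64 bytes", here is a C11 sketch that gives each contended counter its own cache line. The 64-byte line size is an assumption (it is typical for current x86 parts, but check your target), and `padded_counter` is a made-up name.

```c
#include <stdalign.h>
#include <stdatomic.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Aligning the member to the line size also pads the struct out to a
   full line, so neighbouring counters in an array never share a line. */
typedef struct {
    alignas(CACHE_LINE) atomic_int count;
} padded_counter;
```

Data that is never contended can be packed into the remaining bytes of the struct instead of leaving them empty.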

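Also, you'd have to plan a different memory allocation strategy to manage either controlled leaking (if you can partition the code into logical "request processing", a request "leaks" and then releases all of its memory in bulk at the end) or detailed allocation management, so that a structure under contention never receives memory released by elements of the same structure/collection (to prevent ABA). Some of that can be very counter-intuitive, but it's either that or paying the price for GC.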
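
The "controlled leak" option can be as simple as a per-request bump allocator. The sketch below is a minimal, single-threaded arena with hypothetical names, assuming one arena per request. Nothing is freed individually, so no address is handed out twice during a request, which is what rules out ABA through memory reuse.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;   /* start of the request's memory block */
    size_t used;           /* bytes handed out so far */
    size_t capacity;       /* total bytes in the block */
} arena;

/* Reserve the block for one request; returns 0 on allocation failure. */
static int arena_init(arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->used = 0;
    a->capacity = a->base ? capacity : 0;
    return a->base != NULL;
}

/* Bump allocation; sizes are rounded up to a multiple of 16 bytes. */
static void *arena_alloc(arena *a, size_t size)
{
    size_t rounded = (size + 15u) & ~(size_t)15u;
    if (rounded < size || rounded > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* End of request: give everything back in one go. */
static void arena_release(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = a->capacity = 0;
}
```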
