WinAPI _Interlocked* intrinsics for char and short

Posted 2024-10-18 15:30:45

I need to use the _Interlocked* functions on a char or a short, but they take a pointer to long as input. There seems to be a function _InterlockedExchange8, but I can't find any documentation for it, so it looks like an undocumented feature. Also, the compiler wasn't able to find an _InterlockedAdd8 function.
I would appreciate any information on these functions, recommendations on whether to use them, and other solutions as well.

Update 1

I'll try to simplify the question. How can I make this work?

struct X
{
    char data;
};

X atomic_exchange(X another)
{
    return _InterlockedExchange( ??? );
}

I see two possible solutions:

  1. Use _InterlockedExchange8
  2. Cast another to long, do the exchange, and cast the result back to X

The first one is obviously a bad solution.
The second one looks better, but how do I implement it?

Update 2

What do you think about something like this?

template <typename T, typename U>
class padded_variable
{
public:
    padded_variable(T v): var(v) {}
    padded_variable(U v): var(*static_cast<T*>(static_cast<void*>(&v))) {}
    U& cast()
    {
        return *static_cast<U*>(static_cast<void*>(&var));
    }
    T& get()
    {
        return var;
    }
private:
    T var;
    char padding[sizeof(U) - sizeof(T)];
};

struct X
{
    char data;
};

template <typename T, int S = sizeof(T)> class var;
template <typename T> class var<T, 1>
{
public:
    var(): data(T()) {}
    T atomic_exchange(T another)
    {
        padded_variable<T, long> xch(another);
        padded_variable<T, long> res(_InterlockedExchange(&data.cast(), xch.cast()));
        return res.get();
    }
private:
    padded_variable<T, long> data;
};

Thanks.

Comments (4)

眷恋的温暖 2024-10-25 15:30:45

It's pretty easy to write 8-bit and 16-bit interlocked functions yourself, but the reason they're not included in the WinAPI is IA64 portability. If you want to support Win64, the assembly cannot be inline, because MSVC no longer supports inline assembly for 64-bit targets. As external functions assembled with MASM64, they will not be as fast as inline code or intrinsics, so you would be wiser to look into promoting your algorithms to use 32-bit and 64-bit atomic operations instead.

Example interlocked API implementation: intrin.asm
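One way to read "promoting algorithms to use 32-bit atomic operations" is to keep the small values packed inside a 32-bit word and update one byte with a compare-exchange loop on the whole word. The following is only a sketch of that idea: it uses std::atomic from C++11 rather than the Interlocked intrinsics, and fetch_add_byte and its lane numbering are hypothetical names invented for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a WinAPI function): atomically adds `delta` to
// byte lane `index` (0..3) of a 32-bit word and returns the lane's previous
// value. Lane 0 is the least significant byte of the word's value.
inline std::uint8_t fetch_add_byte(std::atomic<std::uint32_t>& word,
                                   unsigned index, std::uint8_t delta)
{
    assert(index < 4);
    const unsigned shift = index * 8;
    const std::uint32_t mask = std::uint32_t(0xFF) << shift;

    std::uint32_t expected = word.load();
    for (;;) {
        const std::uint8_t old_byte =
            static_cast<std::uint8_t>((expected & mask) >> shift);
        // The sum is truncated to 8 bits, so the lane wraps modulo 256.
        const std::uint8_t new_byte =
            static_cast<std::uint8_t>(old_byte + delta);
        const std::uint32_t desired =
            (expected & ~mask) | (std::uint32_t(new_byte) << shift);
        // On failure, compare_exchange_weak reloads `expected`, so the next
        // iteration recomputes from the current contents of the word.
        if (word.compare_exchange_weak(expected, desired))
            return old_byte;
    }
}
```

The other three bytes are written back unchanged, so concurrent updates to neighbouring lanes are never lost; they only cause a retry. The cost is that all four lanes contend on the same word.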

掐死时间 2024-10-25 15:30:45

Why do you want to use smaller data types? So you can fit a bunch of them in a small memory space? That's just going to lead to false sharing and cache line contention.

Whether you use locking or lockless algorithms, it's ideal to have your data in blocks of at least 128 bytes (or whatever the cache line size is on your CPU) that are only used by a single thread at a time.
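A minimal sketch of the layout this answer describes, assuming C++11 alignas is available. The 128-byte figure is taken from the answer itself; many x86 CPUs use 64-byte cache lines, so kBlockSize is an assumption to adjust, and PaddedCounter is a hypothetical name.

```cpp
#include <atomic>
#include <cstddef>

// Assumed block size, taken from the 128-byte figure above.
constexpr std::size_t kBlockSize = 128;

// Each counter is aligned to, and therefore padded out to, a full block,
// so two counters never share a cache line.
struct alignas(kBlockSize) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedCounter) == kBlockSize,
              "each counter should occupy exactly one block");
static_assert(alignof(PaddedCounter) == kBlockSize,
              "each counter should start on a block boundary");

// One slot per worker thread; thread i is meant to touch only slot i.
PaddedCounter per_thread_counters[4];
```

The trade-off is memory: each counter costs a whole block instead of a few bytes, which is the opposite of what packing char-sized values is trying to achieve.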

救星 2024-10-25 15:30:45

Well, you have to make do with the functions that are available. _InterlockedIncrement and _InterlockedCompareExchange are available in 16-bit and 32-bit variants (the latter in a 64-bit variant as well), and a few other interlocked intrinsics may have 16-bit versions too, but InterlockedAdd doesn't seem to, and there seem to be no byte-sized Interlocked intrinsics or functions at all.

So you need to take a step back and figure out how to solve your problem without an _InterlockedAdd8.

Why are you working with individual bytes in any case? Stick to int-sized objects unless you have a really good reason to use something smaller.
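When a compare-exchange exists for a given width but an add does not, the add can be built from the compare-exchange. The sketch below shows the loop on a 16-bit value using std::atomic<short> from C++11 as a stand-in for the intrinsic; fetch_add_via_cas is a hypothetical name, and on MSVC the same loop could presumably be written around _InterlockedCompareExchange16.

```cpp
#include <atomic>

// Hypothetical helper: an atomic add built only from compare-exchange.
// Returns the value stored before the addition.
inline short fetch_add_via_cas(std::atomic<short>& target, short delta)
{
    short expected = target.load();
    // expected + delta is computed in int and narrowed back to short.
    // If another thread changed the value in the meantime, the exchange
    // fails, `expected` is refreshed, and the sum is recomputed.
    while (!target.compare_exchange_weak(
               expected, static_cast<short>(expected + delta))) {
    }
    return expected;
}
```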

匿名。 2024-10-25 15:30:45

Creating a new answer because your edit changed things a bit:

  • Use _InterlockedExchange8
  • Cast another to long, do exchange and cast result back to X

The first simply won't work. Even if the function existed, it would only let you atomically update one byte at a time, which means that an object larger than a byte would be updated in a series of steps that are not atomic as a whole.

The second doesn't work either, unless X is a POD type of the same size as a long and is aligned on a sizeof(long) boundary.
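A sketch of how the second option can be made to satisfy those conditions: make the shared storage itself a long, so size and alignment are correct by construction, and copy the small object in and out with memcpy instead of pointer casts. This uses std::atomic<long> from C++11 in place of _InterlockedExchange, and widened_slot is a hypothetical name.

```cpp
#include <atomic>
#include <cstring>
#include <type_traits>

struct X
{
    char data;
};

// Hypothetical wrapper: the shared storage is a whole long; T occupies only
// its first sizeof(T) bytes and the remaining bytes stay zero.
template <typename T>
class widened_slot
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be safe to copy as raw bytes");
    static_assert(sizeof(T) <= sizeof(long), "T must fit into a long");

public:
    explicit widened_slot(T initial = T()) : bits_(pack(initial)) {}

    // Atomically stores `desired` and returns the previous value.
    T exchange(T desired) { return unpack(bits_.exchange(pack(desired))); }

    T load() const { return unpack(bits_.load()); }

private:
    static long pack(const T& v)
    {
        long raw = 0;
        std::memcpy(&raw, &v, sizeof(T));
        return raw;
    }
    static T unpack(long raw)
    {
        T v;
        std::memcpy(&v, &raw, sizeof(T));
        return v;
    }

    std::atomic<long> bits_;
};
```

This only works when the code owns the storage. It cannot be applied to a char that already lives inside some other structure, because a long-sized exchange there would also overwrite the neighbouring bytes.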

In order to solve this problem you need to narrow down what types X might be. First, of course, is it guaranteed to be a POD type? If not, you have an entirely different problem, as you can't safely treat non-POD types as raw memory bytes.

Second, what sizes may X have? The Interlocked functions can handle 16, 32 and, depending on circumstances, maybe 64 or even 128 bit widths.
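The two questions above (is X a POD, and which sizes can it have) can be turned into compile-time checks. The sketch below assumes a C++11 compiler and lets std::atomic<T> choose the operation width from sizeof(T); atomic_box is a hypothetical name, and the list of accepted sizes is an assumption that leaves out the 128-bit case.

```cpp
#include <atomic>
#include <type_traits>

struct X
{
    char data;
};

// Hypothetical wrapper that rejects unsuitable types at compile time.
template <typename T>
struct atomic_box
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be usable as raw bytes");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2 ||
                  sizeof(T) == 4 || sizeof(T) == 8,
                  "T must have a size the hardware can exchange atomically");

    std::atomic<T> value;

    explicit atomic_box(T initial) : value(initial) {}

    // Atomically stores `desired` and returns the previous value.
    T exchange(T desired) { return value.exchange(desired); }
};
```

A type that fails either check produces a compile error instead of a silently non-atomic update, which is the point at which the lock-based fallback becomes the safer choice.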

Does that cover all the cases you can encounter?

If not, you may have to abandon these atomic operations and settle for plain old locks: lock a mutex to ensure that only one thread touches these objects at a time.
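A minimal sketch of that fallback, assuming C++11 std::mutex is acceptable in place of a WinAPI lock such as a critical section. locked_value is a hypothetical name.

```cpp
#include <mutex>
#include <utility>

// Hypothetical wrapper: every access goes through the same mutex, so T can
// have any size and does not need to be a POD.
template <typename T>
class locked_value
{
public:
    explicit locked_value(T initial) : value_(std::move(initial)) {}

    // Stores `desired` and returns the previous value, all under the lock.
    T exchange(T desired)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        std::swap(value_, desired);
        return desired; // now holds the old value
    }

    T load() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    T value_;
};
```

For example, locked_value<int> v(1); v.exchange(2); leaves 2 stored and returns 1.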
