C++0x atomic implementation in C++98: question about __sync_synchronize()

Posted on 2024-08-25 05:53:05

I have written the following atomic template with a view to mimicking the atomic operations which will be available in the upcoming C++0x standard.

However, I am not sure that the __sync_synchronize() call I have around the return of the underlying value is necessary.

From my understanding, __sync_synchronize() is a full memory barrier and I'm not sure I need such a costly call when returning the object value.

I'm pretty sure it'll be needed around the setting of the value, but I could also implement this with the following assembly:

__asm__ __volatile__ ( "rep;nop": : :"memory" );

Does anyone know whether I definitely need the synchronize() on return of the object?

M.

template < typename T >
struct atomic
{
private:
    volatile T obj;

public:
    atomic( const T & t ) :
        obj( t )
    {
    }

    inline operator T()
    {
        __sync_synchronize();   // Not sure this is overkill
        return obj;
    }

    inline atomic< T > & operator=( T val )
    {
        __sync_synchronize();   // Not sure if this is overkill
        obj = val;
        return *this;
    }

    inline T operator++()
    {
        return __sync_add_and_fetch( &obj, (T)1 );
    }

    inline T operator++( int )
    {
        return __sync_fetch_and_add( &obj, (T)1 );
    }

    inline T operator+=( T val )
    {
        return __sync_add_and_fetch( &obj, val );
    }

    inline T operator--()
    {
        return __sync_sub_and_fetch( &obj, (T)1 );
    }

    inline T operator--( int )
    {
        return __sync_fetch_and_sub( &obj, (T)1 );
    }

    inline T operator-=( T val )
    {
        return __sync_sub_and_fetch( &obj, val );
    }

    // Perform an atomic CAS operation
    // returning the value before the operation
    inline T exchange( T oldVal, T newVal )
    {
        return __sync_val_compare_and_swap( &obj, oldVal, newVal );
    }

};

Update: I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations.

Comments (2)

秋日私语 2024-09-01 05:53:05

First, some petty remarks:

volatile T obj;

volatile is useless here, all the more so since you place all the barriers yourself.

inline T operator++( int )

inline is unneeded, since it is implied when the method is defined inside the class.

Getters and setters:

inline operator T()
{
    __sync_synchronize();   // (I)
    T tmp=obj;
    __sync_synchronize();   // (II)
    return tmp;
}

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // (III)
    obj = val;
    __sync_synchronize();   // (IV)
    return *this;
}

To ensure total ordering of the memory accesses on read and write, you need two barriers on each access (as shown above). I would be happy with only barriers (II) and (III), as they suffice for some uses I came up with (e.g. a pointer/boolean saying that data is there, or a spinlock), but, unless specified otherwise, I would not omit the others, because someone might need them (it would be nice if someone showed that you can omit some of the barriers without restricting possible uses, but I don't think it's possible).

Of course, this would be unnecessarily complicated and slow.
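As an illustration of the reduced variant with only barriers (II) and (III), here is a minimal sketch of the "boolean saying data is there" case. The names message_box, publish and try_consume are made up for this example, and the code assumes GCC-style __sync builtins in the pre-C++11 style of the question; it is not a portable or definitive implementation.

```cpp
// Sketch only: GCC __sync builtins, pre-C++11 style. All names are hypothetical.
struct message_box
{
    int payload;          // ordinary data being published
    volatile int ready;   // flag: non-zero means payload is valid

    message_box() : payload( 0 ), ready( 0 ) {}

    // Writer side: barrier (III) sits between the payload write and the flag write.
    void publish( int value )
    {
        payload = value;
        __sync_synchronize();   // (III) payload is written before the flag
        ready = 1;
    }

    // Reader side: barrier (II) sits between the flag read and the payload read.
    // Returns true and fills 'out' only if the flag was already set.
    bool try_consume( int & out )
    {
        if ( !ready )
            return false;
        __sync_synchronize();   // (II) flag is read before the payload
        out = payload;
        return true;
    }
};
```

In real use, publish() and try_consume() would run on different threads; the point is only where the two barriers sit relative to the flag access.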

That said, I would just drop the barriers, and even the idea of putting barriers anywhere in a template like this. Note that:

  • the ordering semantics of that interface are entirely defined by you; if you decide the interface has barriers here or there, they must be here or there, period. If you don't define it that way, you can come up with a more efficient design, because not all barriers, or not even full barriers, might be needed for a particular problem.
  • usually, you use atomics when you have a lock-free algorithm that could give you a performance advantage; this means an interface that prematurely pessimizes the accesses will probably be unusable as a building block for it, as it will hamper the performance itself.
  • lock-free algorithms typically contain communication that cannot be encapsulated by one atomic data type, so you need to know what's happening in the algorithm to place the barriers precisely where they belong (e.g. when implementing a lock, you need a barrier after you've acquired it and another before you release it, although both operations are writes, at least in principle).
  • if you don't want to have problems, and are not sure about placing the barriers explicitly in the algorithm, just use lock-based algorithms. There's nothing bad about that.
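To make the lock example from the list concrete, here is a minimal spinlock sketch. It assumes the GCC builtins __sync_lock_test_and_set (documented as an acquire barrier) and __sync_lock_release (documented as a release barrier); the class name spinlock and its members are made up for this example.

```cpp
// Sketch only: a minimal spinlock on GCC __sync builtins. Names are hypothetical.
class spinlock
{
    volatile int state;   // 0 = free, 1 = held

public:
    spinlock() : state( 0 ) {}

    // Atomically writes 1 and returns the previous value.
    // The builtin acts as an acquire barrier, i.e. the barrier "after acquiring".
    bool try_lock()
    {
        return __sync_lock_test_and_set( &state, 1 ) == 0;
    }

    void lock()
    {
        while ( !try_lock() )
        {
            // busy-wait until the holder releases the lock
        }
    }

    // Writes 0 with a release barrier, i.e. the barrier "before releasing".
    void unlock()
    {
        __sync_lock_release( &state );
    }
};
```

Neither operation needs a full barrier here, which is the kind of saving a one-size-fits-all atomic wrapper would give up.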

BTW, the C++0x interface allows you to specify precise memory ordering constraints.
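For comparison, here is a minimal sketch of what that looks like with the std::atomic interface as standardized in C++11, applied to the "boolean saying data is there" case. The struct and member names are made up for this example, and it requires a C++11 compiler rather than C++98.

```cpp
#include <atomic>

// Sketch only: explicit memory ordering with C++11 std::atomic.
// published_value, publish and try_consume are hypothetical names.
struct published_value
{
    int payload;               // ordinary, non-atomic data
    std::atomic<bool> ready;   // flag guarding the payload

    published_value() : payload( 0 ), ready( false ) {}

    void publish( int value )
    {
        payload = value;
        // release: earlier writes become visible to a thread that acquires 'ready'
        ready.store( true, std::memory_order_release );
    }

    bool try_consume( int & out )
    {
        // acquire: later reads cannot be moved before this load
        if ( !ready.load( std::memory_order_acquire ) )
            return false;
        out = payload;
        return true;
    }
};
```

The ordering is chosen per operation, so a full barrier is only paid for where memory_order_seq_cst is actually requested.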

爱,才寂寞 2024-09-01 05:53:05
inline operator T()
{
    __sync_synchronize();   // Not sure this is overkill
    return obj;
}

Short version: This IS overkill.

Long version:

Why do you want to implement this class as a template at all? It doesn't make sense, because atomic operations are only allowed on integer types of 1 to 8 bytes, and you can't even be sure that an 8-byte integer is supported on all platforms.

You should implement your atomic class as a non-template version and use the "native" integer type of your hardware/system. This is int32_t on 32-bit processors/OSes and int64_t on 64-bit systems, e.g.:

#ifdef ...
typedef ... native_int_type;
#endif
// verify that you chose the correct integer type
BOOST_STATIC_ASSERT(sizeof(native_int_type) == sizeof(void*));

BOOST_STATIC_ASSERT maps directly to static_assert() in C++0x.
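A minimal sketch of the same check with static_assert, assuming a C++11 compiler. Using std::intptr_t as the typedef is an assumption made for this example only; it stands in for the platform-specific #ifdef selection, which is left unspecified above.

```cpp
#include <cstdint>

// Assumption: std::intptr_t is used here in place of the platform-specific
// typedef; the real selection logic is not specified.
typedef std::intptr_t native_int_type;

// Compile-time check that the chosen type is pointer-sized.
static_assert( sizeof( native_int_type ) == sizeof( void * ),
               "native_int_type must be pointer-sized" );
```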

If you're using the "perfect fit" integer type, you can write the operator just like this:

operator native_int_type() { return obj; }

Because obj is volatile, it's guaranteed to fetch the value and not return any cached value. And because you're using the "native" integer type, you can be sure that reading such a value is atomic.

atomic& operator=( native_int_type val )

Again, you don't need a synchronize if you're using the right integer type. Reading or setting an int32 on an Intel 32-bit system is atomic, and so is reading or setting an int64 on a 64-bit system.

I don't see any benefit in implementing atomic as a template. Atomic operations are platform dependent. It's better to offer an "atomic_int" class that just guarantees to have at least 4 bytes (if you're supporting 32-bit and 64-bit systems) and an "atomic_pointer" if you need it. This way the name of the class also implies its semantics and purpose.

If you just use "atomic", then one could think: "Wow, I just have to put my string class in this template and then it's thread safe!".


Edit: to answer your update: "I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations."

To prevent the compiler and the CPU from reordering read/write operations, you need the __sync_synchronize().

But note that acquire/release semantics may yield better performance than full barriers.
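As a rough sketch of that alternative, the getter and setter could use acquire/release operations instead of a full barrier. This assumes the GCC/Clang __atomic builtins (available from GCC 4.7), which are newer than the __sync family used in the question; the class and member names are made up for this example.

```cpp
// Sketch only: acquire/release accessors on GCC/Clang __atomic builtins.
// atomic_int and its members are hypothetical names.
class atomic_int
{
    int obj;

public:
    explicit atomic_int( int v ) : obj( v ) {}

    // Acquire load: later accesses cannot be reordered before it.
    int load() const
    {
        return __atomic_load_n( &obj, __ATOMIC_ACQUIRE );
    }

    // Release store: earlier accesses cannot be reordered after it.
    void store( int v )
    {
        __atomic_store_n( &obj, v, __ATOMIC_RELEASE );
    }

    // Read-modify-write with full sequential consistency, like the __sync builtins.
    int add_and_fetch( int v )
    {
        return __atomic_add_fetch( &obj, v, __ATOMIC_SEQ_CST );
    }
};
```

Whether acquire/release is enough depends on the algorithm built on top; where a total order over all accesses is required, __ATOMIC_SEQ_CST or a full barrier is still needed.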


Edit2:

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // Not sure if this is overkill
    obj = val;
    return *this;
}

What reordering do you want to prevent? In most cases you want to write this instead:

    obj = val;
    __sync_synchronize();

because you want to be sure that the value has been written once you return from the function.
