C++0x atomic implementation in C++98: question about __sync_synchronize()
I have written the following atomic template with a view to mimicking the atomic operations which will be available in the upcoming C++0x standard.
However, I am not sure that the __sync_synchronize() calls I have around the returning of the underlying value are necessary.
From my understanding, __sync_synchronize() is a full memory barrier and I'm not sure I need such a costly call when returning the object value.
I'm pretty sure it'll be needed around the setting of the value, but I could also implement this with the following assembly:
__asm__ __volatile__ ( "rep;nop": : :"memory" );
Does anyone know whether I definitely need the synchronize() on return of the object?
M.
template < typename T >
struct atomic
{
private:
    volatile T obj;

public:
    atomic( const T & t ) :
        obj( t )
    {
    }

    // Read the current value
    inline operator T()
    {
        __sync_synchronize(); // Not sure this is overkill
        return obj;
    }

    // Overwrite the current value
    inline atomic< T > & operator=( T val )
    {
        __sync_synchronize(); // Not sure if this is overkill
        obj = val;
        return *this;
    }

    // Pre-increment: returns the new value
    inline T operator++()
    {
        return __sync_add_and_fetch( &obj, (T)1 );
    }

    // Post-increment: returns the old value
    inline T operator++( int )
    {
        return __sync_fetch_and_add( &obj, (T)1 );
    }

    inline T operator+=( T val )
    {
        return __sync_add_and_fetch( &obj, val );
    }

    // Pre-decrement: returns the new value
    inline T operator--()
    {
        return __sync_sub_and_fetch( &obj, (T)1 );
    }

    // Post-decrement: returns the old value
    inline T operator--( int )
    {
        return __sync_fetch_and_sub( &obj, (T)1 );
    }

    inline T operator-=( T val )
    {
        return __sync_sub_and_fetch( &obj, val );
    }

    // Perform an atomic CAS operation
    // returning the value before the operation
    inline T exchange( T oldVal, T newVal )
    {
        return __sync_val_compare_and_swap( &obj, oldVal, newVal );
    }
};
Update: I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations.
First, some petty remarks:
volatile is useless here, all the more so because you insert all the barriers yourself.
inline is unneeded, since it is implied when the method is defined inside the class.
Getters and setters:
To assure total ordering of the memory accesses on read and write, you need two barriers on each access: one before and one after the read in the getter, call them (I) and (II), and one before and one after the write in the setter, (III) and (IV). I would be happy with only barriers (II) and (III), as they suffice for the uses I came up with (e.g. a pointer or boolean saying that data is there, or a spinlock). But, unless specified otherwise, I would not omit the others, because someone might need them (it would be nice if someone showed you can omit some of the barriers without restricting possible uses, but I don't think it's possible).
Of course, this would be unnecessarily complicated and slow.
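A minimal sketch of the two-barriers-per-access getter and setter described above. The class name fenced_value, the load()/store() names, and the exact placement of the labels (I) to (IV) are assumptions made for illustration; the sketch also assumes T is a naturally aligned, word-sized type so that the plain read and write are themselves single accesses.

```cpp
#include <cassert>

// Hypothetical wrapper: every access is bracketed by two full barriers.
template <typename T>
struct fenced_value
{
    explicit fenced_value(const T& t) : obj(t) {}

    T load()
    {
        __sync_synchronize();   // (I)   earlier accesses are ordered before the read
        T tmp = obj;
        __sync_synchronize();   // (II)  the read is ordered before later accesses
        return tmp;
    }

    void store(T val)
    {
        __sync_synchronize();   // (III) earlier accesses are ordered before the write
        obj = val;
        __sync_synchronize();   // (IV)  the write is ordered before later accesses
    }

private:
    T obj;
};
```

Keeping only (II) and (III) would correspond roughly to an acquiring read and a releasing write, which is why those two are enough for the flag and spinlock cases.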
That said, I would just drop the barriers, and even the idea of using barriers anywhere in a template like this. Note that:
BTW, the C++0x interface allows you to specify precise memory ordering constraints.
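For comparison, here is a small sketch of what specifying the ordering looks like with the standardized interface (std::atomic and std::memory_order, as they ended up in C++11). The names payload, ready, publish and try_consume are hypothetical and only serve the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state: a plain payload guarded by an atomic flag.
int payload = 0;
std::atomic<bool> ready(false);

// Writer: the release store keeps the payload write ordered before the flag.
void publish(int value)
{
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader: the acquire load keeps the payload read ordered after the flag.
bool try_consume(int& out)
{
    if (!ready.load(std::memory_order_acquire))
        return false;
    out = payload;
    return true;
}
```

The caller picks the weakest ordering that is still correct for each access, instead of paying for a full barrier everywhere.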
Short version: This IS overkill.
Long version:
Why do you want to implement this class as a template at all? It doesn't make sense, because the atomic operations are only allowed on integer types of 1 to 8 bytes, and you can't even be sure that an 8-byte integer is supported on all platforms.
You should implement your atomic class as a non-template version and use the "native" integer type of your hardware/system. This is int32_t on 32-bit processors/OSes and int64_t on 64-bit systems. For example:
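A rough sketch of such a non-template class, under a few assumptions: the names native_int and atomic_int are made up here, the 64-bit detection relies on the __LP64__ and _WIN64 predefined macros, and a C++98 typedef trick stands in for the compile-time assertion so the sketch builds without Boost.

```cpp
#include <cassert>
#include <stdint.h>

// Assumption: choose the integer type whose width matches a pointer.
#if defined(__LP64__) || defined(_WIN64)
typedef int64_t native_int;
#else
typedef int32_t native_int;
#endif

// Compile-time check: the array size becomes -1 (an error) on a mismatch.
typedef char native_int_matches_pointer[
    sizeof(native_int) == sizeof(void*) ? 1 : -1];

class atomic_int
{
public:
    explicit atomic_int(native_int v) : obj(v) {}

    native_int operator++() { return __sync_add_and_fetch(&obj, native_int(1)); }
    native_int operator--() { return __sync_sub_and_fetch(&obj, native_int(1)); }

    // Adding zero returns the current value through an atomic builtin.
    native_int get() { return __sync_add_and_fetch(&obj, native_int(0)); }

private:
    volatile native_int obj;
};
```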
BOOST_STATIC_ASSERT maps directly to static_assert() from C++0x.
If you're using the "perfect fit" integer type, you can write the conversion operator as a plain read:
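Something along these lines, as a sketch only. It assumes long is the register-width integer on the target (true for the usual ILP32 and LP64 ABIs), and the names native_int and atomic_int are hypothetical. Note that this gives no ordering guarantee relative to other memory accesses; it only relies on single aligned reads and writes being indivisible on such targets.

```cpp
#include <cassert>

// Assumption: long matches the native word size on this target.
typedef long native_int;

class atomic_int
{
public:
    explicit atomic_int(native_int v) : obj(v) {}

    // Plain volatile read, no barrier.
    operator native_int() const { return obj; }

    // Plain volatile write, no barrier.
    atomic_int& operator=(native_int v)
    {
        obj = v;
        return *this;
    }

private:
    volatile native_int obj;
};
```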
Because obj is volatile, the value is guaranteed to be fetched and no cached value is returned. And because you're using the "native" integer type, you can be sure that reading such a value is atomic.
Again, you don't need a synchronize if you're using the right integer type. Reading or setting an int32 on an Intel 32-bit system is atomic, and so is reading or setting an int64 on a 64-bit system.
I don't see any benefit in implementing atomic as a template. Atomic operations are platform dependent. It's better to offer an "atomic_int" class that just guarantees to have at least 4 bytes (if you're supporting 32-bit and 64-bit systems) and an "atomic_pointer" if you need it. This way the name of the class also implies its semantics and purpose.
If you just use "atomic", then someone might think: "Wow, I just have to put my string class in this template and then it's thread safe!".
Edit: to answer your update: "I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations."
To prevent the compiler and the CPU from reordering read/write operations, you need the __sync_synchronize().
But note that acquire/release semantics may yield better performance than full barriers.
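As an illustration of acquire/release with the same family of GCC builtins: __sync_lock_test_and_set is documented as an acquire barrier and __sync_lock_release as a release barrier, so a spinlock can be sketched without any full barrier. The spinlock class and its member names are hypothetical.

```cpp
#include <cassert>

// Hypothetical spinlock built on the acquire/release __sync builtins.
class spinlock
{
public:
    spinlock() : flag(0) {}

    // Acquire: accesses after lock() cannot move before the exchange.
    void lock()
    {
        while (__sync_lock_test_and_set(&flag, 1))
        {
            // Busy-wait until the previous value was 0 (unlocked).
        }
    }

    // Single attempt; returns true if the lock was taken.
    bool try_lock()
    {
        return __sync_lock_test_and_set(&flag, 1) == 0;
    }

    // Release: accesses before unlock() cannot move after the store of 0.
    void unlock()
    {
        __sync_lock_release(&flag);
    }

private:
    volatile int flag;
};
```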
Edit2:
Which reordering do you want to prevent? In most cases you want to put the barrier after the write instead of before it, because you want to be sure that the value has been written once you return from the function.
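A sketch of a setter with the barrier moved after the write, assuming an int payload; the name flushed_int is made up for the example.

```cpp
#include <cassert>

// Hypothetical wrapper whose setter issues the barrier after the store.
struct flushed_int
{
    explicit flushed_int(int v) : obj(v) {}

    flushed_int& operator=(int val)
    {
        obj = val;
        // Full barrier after the store: the write is ordered before any
        // memory access the caller performs once this function returns.
        __sync_synchronize();
        return *this;
    }

    // Plain volatile read.
    operator int() const { return obj; }

private:
    volatile int obj;
};
```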