线程安全的数据结构设计
我必须设计一个要在多线程环境中使用的数据结构。基本 API 很简单:插入元素、删除元素、检索元素、检查元素是否存在。该结构的实现使用隐式锁定来保证单个 API 调用的原子性。在我实现这个之后,很明显,我真正需要的是多个 API 调用之间的原子性。例如,如果调用者需要在尝试插入元素之前检查该元素是否存在,那么即使每个 API 调用都是原子的,他也无法原子地执行此操作:
if(!data_structure.exists(element)) {
data_structure.insert(element);
}
该示例有些尴尬,但基本原理如下:要点是,从原子上下文返回后,我们不能再相信“exists”调用的结果(生成的程序集清楚地显示两个调用之间上下文切换的可能性很小)。
我目前想要解决这个问题的是通过数据结构的公共 API 公开锁。这样,客户端将必须显式锁定事物,但至少他们不必创建自己的锁。对于此类问题,是否有更好的众所周知的解决方案?只要我们这样做,您能推荐一些有关线程安全设计的优秀文献吗?
编辑:我有一个更好的例子。假设元素检索返回指向存储元素的引用或指针,而不是其副本。如何保护调用者在调用返回后安全地使用此指针\引用?如果您认为不返回副本是一个问题,那么请考虑深层复制,即还应该复制它们内部指向的另一个对象的对象。
谢谢。
I have to design a data structure that is to be used in a multi-threaded environment. The basic API is simple: insert element, remove element, retrieve element, check that element exists. The structure's implementation uses implicit locking to guarantee the atomicity of a single API call. After i implemented this it became apparent, that what i really need is atomicity across several API calls. For example if a caller needs to check the existence of an element before trying to insert it he can't do that atomically even if each single API call is atomic:
if(!data_structure.exists(element)) {
data_structure.insert(element);
}
The example is somewhat awkward, but the basic point is that we can't trust the result of "exists" call anymore after we return from atomic context (the generated assembly clearly shows a minor chance of context switch between the two calls).
What i currently have in mind to solve this is exposing the lock through the data structure's public API. This way clients will have to explicitly lock things, but at least they won't have to create their own locks. Is there a better commonly-known solution to these kinds of problems? And as long as we're at it, can you advise some good literature on thread-safe design?
EDIT: I have a better example. Suppose that element retrieval returns either a reference or a pointer to the stored element and not it's copy. How can a caller be protected to safely use this pointer\reference after the call returns? If you think that not returning copies is a problem, then think about deep copies, i.e. objects that should also copy another objects they point to internally.
Thank you.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
您要么提供外部锁定机制(不好),要么重新设计 API,例如
putIfAbsent
。后一种方法例如用于 Java 的 并发数据结构。并且,当涉及到此类基本集合类型时,您应该检查您选择的语言是否已在其标准库中提供它们。
[编辑]澄清一下:外部锁定对类的用户不利,因为它引入了潜在错误的另一个来源。是的,有时,对于并发数据结构来说,性能考虑确实比外部同步数据结构更糟糕,但这种情况很少见,而且通常只能由比我拥有更多知识/经验的人来解决/优化。
一个可能很重要的性能提示可以在下面的Will 的回答中找到。
[/edit]
[edit2]鉴于您的新示例:基本上您应该尝试保持集合的同步和元素的同步尽可能地分开。如果元素的生命周期与其在一个集合中的存在绑定在一起,那么您将遇到问题;当使用GC时,这种问题实际上变得更简单。否则,您将不得不使用一种代理而不是原始元素来放入集合中;在最简单的 C++ 情况下,您可以使用 boost::shared_ptr,它使用原子引用计数。在此插入通常的性能免责声明。当您使用 C++ 时(正如我怀疑您谈论指针和引用一样),boost::shared_ptr 和 boost::make_shared 的组合应该足以满足一段时间。
[/编辑2]
You either provide a mechanism for external locking (bad), or redesign the API, like
putIfAbsent
. The latter approach is for instance used for Java's concurrent data-structures.And, when it comes to such basic collection types, you should check-out whether your language of choice already offers them in its standard library.
[edit]To clarify: external locking is bad for the user of the class, as it introduces another source of potential bugs. Yes, there are times, when performance considerations indeed make matters worse for concurrent data-structures than externally synchronized one, but those cases are rare, and then they usually can only be solved/optimized by people with far more knowledge/experience than me.
One, maybe important, performance hint is found in Will's answer below.
[/edit]
[edit2]Given your new example: Basically you should try to keep the synchronization of the collection and the of the elements separated as much as possible. If the lifetime of the elements is bound to its presence in one collection, you will run into problems; when using a GC this kind of problem actually becomes simpler. Otherwise you will have to use a kind of proxy instead of raw elements to be in the collection; in the simplest case for C++ you would go and use
boost::shared_ptr
, which uses an atomic ref-count. Insert usual performance disclaimer here. When you are using C++ (as I suspect as you talk about pointers and references), the combination ofboost::shared_ptr
andboost::make_shared
should suffice for a while.[/edit2]
有时创建要插入的元素的成本很高。在这些情况下,您实际上无法定期创建可能已经存在的对象,以防万一它们存在。
一种方法是让
insertIfAbsent()
方法返回一个被锁定的“光标” - 它将一个占位符插入到内部结构中,这样其他线程就不会相信它不存在,但插入新对象。占位符可以包含一个锁,以便其他想要访问该特定元素的线程必须等待它被插入。在像 C++ 这样的 RAII 语言中,您可以使用智能堆栈类来封装返回的游标,以便在调用代码未提交时它会自动回滚。在 Java 中,使用
finalize()
方法会延迟一些,但仍然可以工作。另一种方法是调用者创建不存在的对象,但如果另一个线程“赢得了比赛”,则在实际插入中偶尔会失败。例如,内存缓存更新就是这样完成的。它可以很好地工作。
Sometimes its expensive to create an element to be inserted. In these scenarios you can't really afford to routinely create objects that might already exist just in case they do.
One approach is for the
insertIfAbsent()
method to return a 'cursor' that is locked - it inserts a place-holder into the internal structure so that no other thread can believe it is absent, but does not insert the new object. The placeholder can contain a lock so that other threads that want to access that particular element must wait for it to be inserted.In an RAII language like C++ you can use a smart stack class to encapsulate the returned cursor so that it automatically rolls-back if the calling code does not commit. In Java its a bit more deferred with the
finalize()
method, but can still work.Another approach is for the caller to create the object that isn't present, but that to occasionally fail in the actual insertion if another thread has 'won the race'. This is how, for example, memcache updates are done. It can work very well.
将存在性检查移至
.insert()
方法中怎么样?客户端调用它,如果它返回false
,您就知道出了问题。很像malloc()
在普通旧 C 中所做的事情——如果失败则返回NULL
,设置ERRNO
。显然,您也可以返回异常或对象的实例,并使您的生活从此变得复杂。
但是请不要依赖用户设置自己的锁。
What about moving the existance check into the
.insert()
method? A client calls it and if it returnsfalse
you know that something went wrong. Much like whatmalloc()
does in plain old C -- returnNULL
if failed, setERRNO
.Obviously you can also return an exception, or an instance of an object, and complicate your life up from there..
But please, don't rely on the user setting their own locks.
在 RAII 风格的方式中,您可以创建访问器/句柄对象(不知道它是如何调用的,可能存在这种模式),例如列表:
您复制(甚至三次)公共接口,但为用户提供根据需要,可以在内部锁定和安全舒适的外部锁定之间进行选择。
In an RAII style fashion you could create accessor/handle objects (don't know how its called, there probably exists a pattern of this), e.g. a List:
You duplicate (or even triplicate) the public interface, but you give users the choice to choose between internal and safe and comfortable external locking wherever it is required.
首先,你应该真正区分你的担忧。您需要担心两件事:
我强烈建议您使用代表您正在实现的数据结构类型的接口或虚拟基类。创建一个根本不执行任何锁定的简单实现。然后,创建第二个实现来包装第一个实现并在其上添加锁定。这将允许在不需要锁定的情况下实现更高性能的实现,并且将大大简化您的代码。
看起来你正在实现某种字典。您可以做的一件事是提供具有与组合语句等效的语义的方法。例如,
setdefault
是一个合理的函数,仅当字典中不存在相应的键时才会设置一个值。换句话说,我的建议是找出哪些方法组合经常一起使用,然后简单地创建以原子方式执行该操作组合的 API 方法。
First of all, you should really separate your concerns. You have two things to worry about:
I highly suggest you use an interface or virtual base class that represents the type of datastructure you are implementing. Create a simple implementation that does not do any locking, at all. Then, create a second implementation that wraps the first implementation and adds locking on top of it. This will allow a more performant implementation where locking isn't needed and will greatly simplify your code.
It looks like you are implementing some sort of dictionary. One thing you can do is provide methods that have semantics that are equivalent to the combined statement. For example
setdefault
is a reasonable function to provide that will set a value only if the corresponding key does not already exist in the dictionary.In other words, my recommendation would be to figure out what combinations of methods are frequently used together, and simply create API methods that perform that combination of operations atomically.