Designing a thread-safe data structure

Posted 2024-08-25 23:46:23

I have to design a data structure for use in a multi-threaded environment. The basic API is simple: insert an element, remove an element, retrieve an element, and check whether an element exists. The implementation uses implicit locking to guarantee the atomicity of each single API call. After I implemented this, it became apparent that what I really need is atomicity across several API calls. For example, if a caller needs to check whether an element exists before inserting it, they can't do that atomically, even though each individual API call is atomic:

if(!data_structure.exists(element)) {
   data_structure.insert(element);
}

The example is somewhat awkward, but the basic point is that we can't trust the result of the "exists" call once we have returned from the atomic context (the generated assembly clearly shows a small window for a context switch between the two calls).

What I currently have in mind to solve this is exposing the lock through the data structure's public API. This way clients will have to lock things explicitly, but at least they won't have to create their own locks. Is there a better, commonly known solution to this kind of problem? And while we're at it, can you recommend some good literature on thread-safe design?

EDIT: I have a better example. Suppose that element retrieval returns a reference or a pointer to the stored element rather than a copy of it. How can the caller be protected so that it can safely use this pointer/reference after the call returns? If you think that not returning copies is the problem, then think about deep copies, i.e. objects that would also have to copy the other objects they point to internally.

Thank you.

晚风撩人 2024-09-01 23:46:23

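You either provide a mechanism for external locking (bad), or you redesign the API, for example by adding an operation like putIfAbsent. The latter approach is used, for instance, by Java's concurrent data structures.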
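
As a rough illustration of the API-redesign option, here is a minimal C++ sketch of a set whose existence check and insert happen under a single lock acquisition. The class name `ConcurrentSet` and the method name `insert_if_absent` are made up for this example (loosely modelled on Java's `putIfAbsent`), and the sketch assumes C++11 and a hashable element type.

```cpp
#include <mutex>
#include <unordered_set>

// Hypothetical container: the names are illustrative, not from the question.
template <typename T>
class ConcurrentSet {
    std::mutex mtx;
    std::unordered_set<T> items;
public:
    // Checks and inserts while holding the lock once.
    // Returns true if the element was inserted, false if it was already present.
    bool insert_if_absent(const T& e) {
        std::lock_guard<std::mutex> guard(mtx);
        return items.insert(e).second;
    }

    // Single-call check; its result may be stale by the time the caller acts on it.
    bool exists(const T& e) {
        std::lock_guard<std::mutex> guard(mtx);
        return items.count(e) != 0;
    }
};
```

With this shape, the caller's `if (!exists(e)) insert(e);` collapses into one call, `set.insert_if_absent(e)`, and the return value tells the caller which thread actually did the insert.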

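Also, when it comes to such basic collection types, you should check whether your language of choice already offers them in its standard library.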

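[edit] To clarify: external locking is bad for the user of the class, because it introduces another source of potential bugs. Yes, there are times when performance considerations do make things worse for concurrent data structures than for externally synchronized ones, but those cases are rare, and they can usually only be solved or optimized by people with far more knowledge and experience than I have.

One possibly important performance hint can be found in Will's answer below.
[/edit]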

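[edit2] Given your new example: basically, you should try to keep the synchronization of the collection and the synchronization of the elements as separate as possible. If the lifetime of an element is bound to its presence in one collection, you will run into problems; with a GC this kind of problem actually becomes simpler. Otherwise you will have to put some kind of proxy into the collection instead of the raw elements; in the simplest case for C++ you would use boost::shared_ptr, which uses an atomic reference count. Insert the usual performance disclaimer here. If you are using C++ (as I suspect, since you talk about pointers and references), the combination of boost::shared_ptr and boost::make_shared should suffice for a while.
[/edit2]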
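
One way to read the proxy suggestion, sketched here with `std::shared_ptr` (the C++11 standard counterpart of `boost::shared_ptr`, swapped in so the example needs no external library): the collection stores shared pointers, and retrieval hands out a copy of the pointer, so the element stays alive even if another thread removes it from the collection afterwards. `Registry`, `Element`, `put`, `get` and `remove` are hypothetical names. This only protects the element's lifetime; concurrent modification of the element itself would still need its own synchronization.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

struct Element { std::string payload; };   // placeholder element type (assumed)

class Registry {
    std::mutex mtx;
    std::unordered_map<int, std::shared_ptr<Element>> items;
public:
    void put(int key, std::string payload) {
        auto e = std::make_shared<Element>();
        e->payload = std::move(payload);
        std::lock_guard<std::mutex> guard(mtx);
        items[key] = std::move(e);
    }

    // Returns an owning pointer, or an empty pointer if the key is absent.
    std::shared_ptr<Element> get(int key) {
        std::lock_guard<std::mutex> guard(mtx);
        auto it = items.find(key);
        if (it == items.end()) return std::shared_ptr<Element>();
        return it->second;
    }

    // Drops the collection's reference; outstanding pointers stay valid.
    void remove(int key) {
        std::lock_guard<std::mutex> guard(mtx);
        items.erase(key);
    }
};
```

A caller that holds the result of `get` can keep using it after `remove` has run on another thread; the element is destroyed only when the last pointer to it goes away.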


享受孤独 2024-09-01 23:46:23

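Sometimes it is expensive to create the element to be inserted. In those scenarios you can't really afford to routinely create objects that might already exist, just in case they don't.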

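One approach is for the insertIfAbsent() method to return a "cursor" that is locked: it inserts a placeholder into the internal structure so that no other thread can believe the element is absent, but it does not yet insert the new object. The placeholder can contain a lock, so that other threads that want to access that particular element must wait for it to be inserted.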

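In an RAII language like C++, you can use a smart stack-allocated class to encapsulate the returned cursor, so that it automatically rolls back if the calling code does not commit. In Java the rollback is more deferred, via the finalize() method, but it can still work.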
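
The cursor idea could look roughly like the C++11 sketch below. Everything here is an assumption made for illustration: the names `PlaceholderMap`, `Cursor`, `must_create` and `commit`, the `int` keys and `std::string` values, and the choice of one shared condition variable instead of a lock per placeholder. A cursor that reserved a slot removes the placeholder in its destructor unless `commit` was called.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical names throughout; a sketch, not a tuned implementation.
class PlaceholderMap {
    struct Slot { bool ready = false; std::string value; };
    std::mutex mtx;
    std::condition_variable cv;
    std::map<int, Slot> slots;
public:
    class Cursor {
        PlaceholderMap* owner;
        int key;
        bool pending;  // true while this cursor holds an uncommitted placeholder
    public:
        Cursor(PlaceholderMap* o, int k, bool p) : owner(o), key(k), pending(p) {}
        Cursor(Cursor&& other) noexcept
            : owner(other.owner), key(other.key), pending(other.pending) {
            other.pending = false;  // the moved-from cursor must not roll back
        }
        Cursor(const Cursor&) = delete;
        Cursor& operator=(const Cursor&) = delete;

        // True if the caller reserved the slot and should build the value.
        bool must_create() const { return pending; }

        // Publishes the value and wakes any threads waiting on this key.
        void commit(std::string v) {
            if (!pending) return;
            {
                std::lock_guard<std::mutex> guard(owner->mtx);
                Slot& s = owner->slots[key];
                s.value = std::move(v);
                s.ready = true;
            }
            pending = false;
            owner->cv.notify_all();
        }

        // Rolls back: removes the placeholder if commit() was never called.
        ~Cursor() {
            if (!pending) return;
            {
                std::lock_guard<std::mutex> guard(owner->mtx);
                owner->slots.erase(key);
            }
            owner->cv.notify_all();
        }
    };

    // Reserves the key with a placeholder, or waits until another
    // thread's pending insert is committed or rolled back.
    Cursor insert_if_absent(int key) {
        std::unique_lock<std::mutex> lock(mtx);
        for (;;) {
            auto it = slots.find(key);
            if (it == slots.end()) {
                slots.emplace(key, Slot());
                return Cursor(this, key, true);
            }
            if (it->second.ready) return Cursor(this, key, false);
            cv.wait(lock);
        }
    }

    // Copies the value into out and returns true if the key holds a committed value.
    bool get(int key, std::string& out) {
        std::lock_guard<std::mutex> guard(mtx);
        auto it = slots.find(key);
        if (it == slots.end() || !it->second.ready) return false;
        out = it->second.value;
        return true;
    }
};
```

The expensive object is built only when `must_create()` returns true, and an exception thrown while building it unwinds through the cursor's destructor, which releases the placeholder. One caveat of this particular sketch: a thread that asks for the same key again while still holding its own uncommitted cursor would wait on itself.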

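Another approach is for the caller to create the object that isn't present, and for the actual insertion to fail occasionally if another thread has "won the race". This is how, for example, memcache updates are done. It can work very well.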


调妓 2024-09-01 23:46:23

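What about moving the existence check into the .insert() method? The client calls it, and if it returns false, the client knows the insert did not happen. This is much like what malloc() does in plain old C: it returns NULL on failure and sets errno.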

Obviously you could also throw an exception, or return an instance of some object, and complicate your life from there.

But please, don't rely on users setting up their own locks.


美男兮 2024-09-01 23:46:23

In RAII style, you could create accessor/handle objects (I don't know what this is called; there is probably an established pattern for it), e.g. for a List:

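#include <algorithm>
#include <list>
#include <mutex>

template <typename T> class ListHandle;  // forward declaration, needed by the friend declaration

template <typename T>
class List {
    friend class ListHandle<T>;
    std::list<T> items;  // backing storage: an assumption, elided in the original
    std::mutex mtx;      // stands in for the original's unspecified lock
    // private methods use NO locking
    bool _exists( const T& e ) const {
        return std::find( items.begin(), items.end(), e ) != items.end();
    }
    void _insert( const T& e ) { items.push_back( e ); }
    void _lock() { mtx.lock(); }
    void _unlock() { mtx.unlock(); }
public:
    // public methods use internal locking and just wrap the private methods
    bool exists( const T& e ) {
        std::lock_guard<std::mutex> l( mtx );
        return _exists( e );
    }
    void insert( const T& e ) {
        std::lock_guard<std::mutex> l( mtx );
        _insert( e );
    }
    // ...
};

template <typename T>
class ListHandle {
    List<T>& list;
public:
    explicit ListHandle( List<T>& l ) : list(l) {
        list._lock();
    }
    ~ListHandle() {
        list._unlock();
    }
    // non-copyable, so the list is unlocked exactly once
    ListHandle( const ListHandle& ) = delete;
    ListHandle& operator=( const ListHandle& ) = delete;
    bool exists( const T& e ) { return list._exists(e); }
    void insert( const T& e ) { list._insert(e); }
};


List<int> list;

void foo( int element ) {
    ListHandle<int> hndl( list ); // locks the list as long as hndl exists
    if( !hndl.exists(element) ) { // insert only if the element is absent
        hndl.insert(element);
    }
    // list is unlocked here (ListHandle destructor)
}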

You duplicate (or even triplicate) the public interface, but you give users the choice between internal locking and safe, convenient external locking wherever it is required.
