Real-world examples of reads and writes in concurrent software

Posted 2024-08-31 14:02:09

I'm looking for real-world examples of needing read and write access to the same value in concurrent systems.

In my opinion, many semaphores or locks are present because there's no known alternative (to the implementer), but do you know of any patterns where mutexes seem to be a requirement?

In a way, I'm asking for candidates for the standard set of HARD problems for concurrent software in the real world.


Comments (2)

小瓶盖 2024-09-07 14:02:09

What kind of lock you need depends on how the data is accessed by multiple threads. If you can fine-tune the use case, you can sometimes eliminate the need for exclusive locks completely.

An exclusive lock is needed only if your use case requires that the shared data be 100% accurate all the time. This is the default most developers start with, because that's how we normally think about data.

However, if what you are using the data for can tolerate some "looseness", there are several techniques to share data between threads without the use of exclusive locks on every access.

For example, if you have a linked list of data, and your use of that list can tolerate seeing the same node more than once during a traversal, or not seeing an insert immediately after it happens (or similar artifacts), you can perform list inserts and deletes using atomic pointer exchange, without a full-stop mutex lock around the insert or delete operation.
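A minimal sketch of that idea in C++, using std::atomic for the head pointer (the Node type and insert function are illustrative, not from the answer):

```cpp
#include <atomic>

struct Node {
    int value;
    Node* next;
};

// Head of a singly linked list shared between threads.
std::atomic<Node*> head{nullptr};

// Lock-free insert at the head: no mutex, just a compare-and-swap loop.
void insert(int value) {
    Node* node = new Node{value, head.load(std::memory_order_relaxed)};
    // If another thread changed head since we read it, the CAS fails,
    // reloads the current head into node->next, and we retry.
    while (!head.compare_exchange_weak(node->next, node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
    }
}
```

Insertion is the easy half; lock-free deletion also has to reclaim removed nodes safely (the ABA problem), which is why real implementations often reach for hazard pointers or epoch-based schemes.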

Another example: if you have an array or list object that is mostly read from by threads and only occasionally updated by a master thread, you could implement lock-free updates by maintaining two copies of the list: one that is "live" that other threads can read from and another that is "offline" that you can write to in the privacy of your own thread. To perform an update, you copy the contents of the "live" list into the "offline" list, perform the update to the offline list, and then swap the offline list pointer into the live list pointer using an atomic pointer exchange. You will then need some mechanism to let the readers "drain" from the now offline list. In a garbage collected system, you can just release the reference to the offline list - when the last consumer is finished with it, it will be GC'd. In a non-GC system, you could use reference counting to keep track of how many readers are still using the list. For this example, having only one thread designated as the list updater would be ideal. If multiple updaters are needed, you will need to put a lock around the update operation, but only to serialize updaters - no lock and no performance impact on readers of the list.
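A sketch of this live/offline pattern, assuming C++ and std::shared_ptr so that the "drain" happens through reference counting, as in the non-GC variant described above (the names and element type are illustrative):

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <vector>

// The "live" list. Readers atomically grab a reference; the updater
// atomically publishes a replacement.
std::shared_ptr<const std::vector<int>> live =
    std::make_shared<const std::vector<int>>();
std::mutex updaters;  // only needed if there is more than one updater

// Reader: take a snapshot. The snapshot stays valid even if an update
// is published while we are still iterating over it.
std::shared_ptr<const std::vector<int>> snapshot() {
    return std::atomic_load(&live);
}

// Updater: copy live -> offline, mutate the offline copy, then swap it in.
void update(int new_value) {
    std::lock_guard<std::mutex> lock(updaters);  // serializes updaters only
    auto offline = std::make_shared<std::vector<int>>(*std::atomic_load(&live));
    offline->push_back(new_value);               // mutate in private
    std::atomic_store(&live,
        std::shared_ptr<const std::vector<int>>(std::move(offline)));
    // Readers still holding the old snapshot keep it alive; the old list
    // is freed when the last reader releases it ("drain" via refcount).
}
```

Note that the std::atomic_load/std::atomic_store overloads for shared_ptr date from C++11 and are deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>; either spelling gives you the atomic pointer swap the answer relies on.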

All the lock-free resource sharing techniques I'm aware of require the use of atomic swaps (aka InterlockedExchange). This usually translates into a specific instruction in the CPU and/or a hardware bus lock (the lock prefix on a read or write opcode in x86 assembler) for a very brief period of time. On multiprocessor systems, atomic swaps may force a cache invalidation on the other processors (this was the case on dual-processor Pentium II systems), but I don't think this is as much of a problem on current multicore chips. Even with these performance caveats, lock-free runs much faster than taking a full-stop kernel event object. Just making a call into a kernel API function takes several hundred clock cycles (to switch to kernel mode).
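As a small illustration of an atomic primitive standing in for a kernel mutex, here is a minimal intrusive reference count in C++ (scenario 4 in the list below is this pattern; the type and function names are illustrative):

```cpp
#include <atomic>

// A minimal intrusive reference count: the increment/decrement/test cycle
// is done entirely with atomic instructions; no kernel object is involved.
struct RefCounted {
    std::atomic<int> refs{1};
};

void acquire(RefCounted* p) {
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

void release(RefCounted* p) {
    // fetch_sub returns the previous value, so the thread that drops the
    // count to zero (and only that thread) frees the resource.
    if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete p;
    }
}
```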

Examples of real-world scenarios:

  1. Producer/consumer workflows. A web service receives HTTP requests for data, places each request into an internal queue, and a worker thread pulls work items from the queue and performs the work. The queue is read/write and has to be thread-safe (a sketch follows after this list).
  2. Data shared between threads with a change of ownership. Thread 1 allocates an object, tosses it to thread 2 for processing, and never wants to see it again. Thread 2 is responsible for disposing of the object. The memory management system (malloc/free) must be thread-safe.
  3. File system. This is almost always an OS service and already fully thread-safe, but it's worth including in the list.
  4. Reference counting. The resource is released when the number of references drops to zero. The increment/decrement/test operations must be thread-safe. These can usually be implemented using atomic primitives instead of full-stop kernel mutex locks (as in the sketch above).
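A sketch of scenario 1 in C++. This version uses a mutex and a condition variable rather than a lock-free queue, which is the usual starting point; the request type and class name are illustrative:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// A thread-safe work queue: web-facing threads push requests,
// worker threads pop and process them.
class WorkQueue {
public:
    void push(std::string request) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(std::move(request));
        }
        cv_.notify_one();  // wake one waiting worker
    }

    // Blocks until an item is available.
    std::string pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        std::string item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> items_;
};
```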
掐死时间 2024-09-07 14:02:09

Most real-world concurrent software has some form of synchronization requirement at some level. Often, better-written software will take great pains to reduce the amount of locking required, but it is still required at some point.

For example, I often do simulations in which some form of aggregation operation occurs. Typically, there are ways to prevent locking during the simulation phase itself (i.e., using thread-local state data, etc.), but the actual aggregation portion typically requires some form of lock at the end.

Luckily, this becomes a lock per thread, not per unit of work. In my case, this is significant, since I'm typically doing operations on hundreds of thousands or millions of units of work, but most of the time it's occurring on systems with 4-16 PEs (processing elements), which means I'm usually restricted to a similar number of units of execution. By using this type of mechanism, you're still locking, but you're locking across tens of elements instead of potentially millions.
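A sketch of that pattern in C++ (the names and the trivially summable per-unit "work" are illustrative): each thread accumulates into a local variable during the loop and takes the shared lock exactly once, at merge time.

```cpp
#include <algorithm>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const int kUnits = 1000000;  // units of work
    const int kThreads = std::max(1u, std::thread::hardware_concurrency());

    long long total = 0;      // shared aggregate, guarded by total_mutex
    std::mutex total_mutex;

    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&, t] {
            long long local = 0;  // thread-local accumulator: no locking here
            for (int i = t; i < kUnits; i += kThreads) {
                local += i;       // stand-in for the real per-unit work
            }
            // One lock per thread, not one per unit of work.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& w : workers) w.join();

    std::printf("total = %lld\n", total);
}
```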
