How is thread synchronization implemented at the assembly language level?

Posted 2024-08-23 08:49:30


While I'm familiar with concurrent programming concepts such as mutexes and semaphores, I have never understood how they are implemented at the assembly language level.

I imagine there being a set of memory "flags" saying:

lock A is held by thread 1
lock B is held by thread 3
lock C is not held by any thread
etc

But how is access to these flags synchronized between threads? Something like this naive example would only create a race condition:

  mov edx, [myThreadId]
wait:
  cmp [lock], 0
  jne wait
  mov [lock], edx
  ; I wanted an exclusive lock but the above 
  ; three instructions are not an atomic operation :(


高速公鹿 2024-08-30 08:49:30

In practice, these tend to be implemented with compare-and-swap (CAS) or load-linked/store-conditional (LL/SC) instructions.
(...plus some spinning before the thread gives up its time slice, usually by calling into a kernel function that switches context.)
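To connect this with the "flags" from the question, here is a minimal sketch of a CAS-based lock that stores the owner's id, written with C11 atomics rather than raw assembly. The names owner_lock, owner_lock_try, owner_lock_acquire and owner_lock_release are made up for illustration, and the id is assumed to be any nonzero number the caller picks. On x86, GCC and Clang typically compile the compare-exchange to LOCK CMPXCHG; on LL/SC architectures it typically becomes a load-linked/store-conditional retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means "not held"; any other value is the (hypothetical) id of the owner. */
typedef struct {
    atomic_uint owner;
} owner_lock;

/* The check and the store happen as ONE atomic step:
   "if owner == 0 then owner = self", reporting whether the store happened.
   This replaces the racy cmp / jne / mov sequence from the question. */
static inline bool owner_lock_try(owner_lock *l, unsigned self) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, self,
        memory_order_acquire,   /* on success: later accesses stay after the lock */
        memory_order_relaxed);  /* on failure: nothing was acquired */
}

/* Naive acquire: retry the CAS until it succeeds (no backoff, no yielding). */
static inline void owner_lock_acquire(owner_lock *l, unsigned self) {
    while (!owner_lock_try(l, self)) {
        /* spin */
    }
}

/* Only the owner may release; a plain store with release ordering is enough. */
static inline void owner_lock_release(owner_lock *l, unsigned self) {
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

A lock would be declared as owner_lock l = { 0 }; each thread then calls owner_lock_acquire(&l, id) and owner_lock_release(&l, id) around its critical section.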
If you only need a spinlock, Wikipedia gives an example that trades CAS for a lock-prefixed xchg on x86/x64 (xchg with a memory operand is locked implicitly, so the prefix is redundant there). So, strictly speaking, CAS is not needed to build a spinlock, but some kind of atomicity is still required. In this case it comes from an atomic operation that writes a register to memory and returns the previous contents of that memory location in a single step. (To clarify a bit more: the lock prefix asserts the LOCK# signal, which ensures that the current CPU has exclusive access to the memory. On today's CPUs it is not necessarily carried out this way, but the effect is the same. By using xchg we make sure that we cannot be preempted somewhere between the read and the write, since an instruction is not interrupted half-way. So if we had an imaginary lock mov reg0, mem / lock mov mem, reg1 pair (which we don't), that would not be quite the same: the thread could be preempted between the two movs.)
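The exchange-based variant can be sketched the same way. This is not the Wikipedia listing, just an assumed C11 equivalent with made-up names (xchg_lock and friends); on x86, compilers typically turn the atomic exchange into a single xchg instruction with a memory operand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} xchg_lock;

/* Atomically write 1 and get the previous value back in one step.
   If the previous value was 0, the lock was free and is now ours;
   if it was 1, somebody else holds it and our write changed nothing. */
static inline bool xchg_lock_try(xchg_lock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/* Naive spin: keep exchanging until we observe the 0 -> 1 transition. */
static inline void xchg_lock_acquire(xchg_lock *l) {
    while (!xchg_lock_try(l)) {
        /* spin */
    }
}

/* Releasing needs no read-modify-write: a release store of 0 is enough. */
static inline void xchg_lock_release(xchg_lock *l) {
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Unlike the CAS version, the exchange writes unconditionally, which is harmless here because writing 1 over 1 leaves a held lock held.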
On current architectures, as pointed out in the comments, you mostly end up using the atomic primitives of the CPU and the coherency protocols provided by the memory subsystem.
For this reason, you not only have to use these primitives, but also account for the cache/memory coherency guaranteed by the architecture.
There may be implementation nuances as well. Consider, for example, a spinlock:
- instead of a naive implementation, you should probably use something like a TTAS (test-and-test-and-set) spinlock with some exponential backoff,
- on a Hyper-Threaded CPU, you should probably issue pause instructions, which serve as a hint that you are spinning, so that the core you are running on can do something useful in the meantime,
- you should really give up on spinning and yield control to other threads after a while,
- etc...
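The points above can be combined in a rough user-mode sketch. Everything named here (ttas_lock, ttas_try, cpu_relax, MAX_BACKOFF) is hypothetical, the backoff cap is an arbitrary illustrative value rather than a tuned one, and the code assumes GCC or Clang (for __builtin_ia32_pause on x86) and a POSIX system (for sched_yield).

```c
#include <assert.h>
#include <sched.h>       /* sched_yield (POSIX) */
#include <stdatomic.h>
#include <stdbool.h>

/* Spin hint: PAUSE on x86 via a GCC/Clang builtin, a no-op elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#  define cpu_relax() __builtin_ia32_pause()
#else
#  define cpu_relax() ((void)0)
#endif

enum { MAX_BACKOFF = 1024 };  /* arbitrary cap, chosen only for illustration */

/* 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} ttas_lock;

/* Test-and-test-and-set: first a plain read, which can be served from the
   local cache, and only if the lock looks free the expensive atomic exchange. */
static inline bool ttas_try(ttas_lock *l) {
    return atomic_load_explicit(&l->locked, memory_order_relaxed) == 0 &&
           atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static inline void ttas_acquire(ttas_lock *l) {
    unsigned backoff = 1;
    while (!ttas_try(l)) {
        if (backoff < MAX_BACKOFF) {
            /* Exponential backoff: wait twice as long after every failure. */
            for (unsigned i = 0; i < backoff; i++) {
                cpu_relax();
            }
            backoff *= 2;
        } else {
            /* Spun long enough: let the scheduler run another thread. */
            sched_yield();
        }
    }
}

static inline void ttas_release(ttas_lock *l) {
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

A production lock would more likely block in the kernel after the spinning phase instead of calling sched_yield in a loop; the yield is used here only to keep the sketch short.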
This is still user mode. If you are writing a kernel, you have some other tools available as well, since you are the one who schedules threads and handles, enables and disables interrupts.