在汇编语言级别如何实现线程同步?

发布于 2024-08-23 08:49:30 字数 408 浏览 2 评论 0原文

虽然我熟悉并发编程概念,例如互斥体和信号量,但我从未理解它们是如何在汇编语言级别实现的。

我想象有一组内存“标志”说:

  • 锁 A 由线程 1 持有
  • 锁 B 由线程 3 持有
  • 锁 C 不由任何线程持有

但是如何在线程之间同步访问这些标志?像这个简单的例子只会产生竞争条件:

  mov edx, [myThreadId]
wait:
  cmp [lock], 0
  jne wait
  mov [lock], edx
  ; I wanted an exclusive lock but the above 
  ; three instructions are not an atomic operation :(

While I'm familiar with concurrent programming concepts such as mutexes and semaphores, I have never understood how they are implemented at the assembly language level.

I imagine there being a set of memory "flags" saying:

  • lock A is held by thread 1
  • lock B is held by thread 3
  • lock C is not held by any thread
  • etc

But how is access to these flags synchronized between threads? Something like this naive example would only create a race condition:

  mov edx, [myThreadId]
wait:
  cmp [lock], 0
  jne wait
  mov [lock], edx
  ; I wanted an exclusive lock but the above 
  ; three instructions are not an atomic operation :(

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

高速公鹿 2024-08-30 08:49:30
  • 在实践中,这些往往通过 CASLL/SC
    (...以及在放弃线程的时间片之前进行一些旋转 - 通常通过调用切换上下文的内核函数。)
  • 如果您只需要 spinlock,维基百科为您提供了一个示例,在 x86/x64 上用 CAS 换取前缀为 xchg 的锁。因此,从严格意义上来说,创建自旋锁并不需要 CAS——但仍然需要某种原子性。在这种情况下,它利用原子操作将寄存器写入内存并在一步中返回该内存槽的先前内容。 (进一步澄清一下:lock 前缀断言 #LOCK 信号,确保当前 CPU 具有对内存的独占访问权限。在当今的 CPU 上,不一定以这种方式执行,但效果是一样的。通过使用 xchg,我们可以确保在读取和写入之间不会被抢占,因为指令不会被中途中断。因此,如果我们有一个虚构的锁定 mov。 reg0, mem / lock mov mem, reg1 对(我们不这样做),这不会完全相同 - 它可以在两个 mov 之间被抢占。)
  • 在当前架构上,正如在评论中,您通常最终会使用 CPU 的原子原语和内存子系统提供的一致性协议。
  • 因此,您不仅必须使用这些原语,还要考虑架构保证的缓存/内存一致性。
  • 实施上也可能存在细微差别。考虑例如自旋锁:
    • 您可能应该使用例如 TTAS 自旋锁,而不是简单的实现有一些指数退避
    • 在超线程 CPU 上,您可能应该发出暂停指令,作为您正在旋转的提示 - 以便您运行的核心可以在此期间执行一些有用的操作
    • 你真的应该在一段时间后放弃旋转并将控制权让给其他线程
    • 等等...
  • 这仍然是用户模式 ​​- 如果您正在编写内核,您可能还有一些其他可以使用的工具(因为您是调度线程并处理/启用/禁用中断的人)。
  • In practice, these tend to be implemented with CAS and LL/SC.
    (...and some spinning before giving up the time slice of the thread - usually by calling into a kernel function that switches context.)
  • If you only need a spinlock, wikipedia gives you an example which trades CAS for lock prefixed xchg on x86/x64. So in a strict sense, a CAS is not needed for crafting a spinlock - but some kind of atomicity is still required. In this case, it makes use of an atomic operation that can write a register to memory and return the previous contents of that memory slot in a single step. (To clarify a bit more: the lock prefix asserts the #LOCK signal that ensures that the current CPU has exclusive access to the memory. On todays CPUs it is not necessarily carried out this way, but the effect is the same. By using xchg we make sure that we will not get preempted somewhere between reading and writing, since instructions will not be interrupted half-way. So if we had an imaginary lock mov reg0, mem / lock mov mem, reg1 pair (which we don't), that would not quite be the same - it could be preempted just between the two movs.)
  • On current architectures, as pointed out in the comments, you mostly end up using the atomic primitives of the CPU and the coherency protocols provided by the memory subsystem.
  • For this reason, you not only have to use these primitives, but also account for the cache/memory coherency guaranteed by the architecture.
  • There may be implementation nuances as well. Considering e.g. a spinlock:
    • instead of a naive implementation, you should probably use e.g. a TTAS spin-lock with some exponential backoff,
    • on a Hyper-Threaded CPU, you should probably issue pause instructions that serve as hints that you're spinning - so that the core you are running on can do something useful during this
    • you should really give up on spinning and yield control to other threads after a while
    • etc...
  • this is still user mode - if you are writing a kernel, you might have some other tools that you can use as well (since you are the one that schedules threads and handles/enables/disables interrupts).
西瑶 2024-08-30 08:49:30

x86 架构长期以来就有一条名为 xchg 的指令,它将寄存器的内容与内存位置进行交换。 xchg 一直是原子的。

还有一个lock前缀可以应用于任何单个指令以使该指令原子化。在多处理器系统出现之前,这一切实际上是为了防止在锁定指令的中间传递中断。 (xchg 被隐式锁定)。

本文有一些使用 xchg 实现自旋锁的示例代码
http://en.wikipedia.org/wiki/Spinlock

当多 CPU 和更高版本的多核系统时当开始构建时,需要更复杂的系统来确保 lock 和 xchg 能够同步所有内存子系统,包括所有处理器上的 l1 缓存。大约在这个时候,对锁定和无锁算法的新研究表明,原子 CompareAndSet 是一种更灵活的原语,因此更现代的 CPU 将其作为指令。

附录:在评论中 andras 提供了一个“尘土飞扬的旧”指令列表,允许使用 lock 前缀。 http://pdos.csail.mit.edu/6.828/2007 /readings/i386/LOCK.htm

The x86 architecture, has long had an instruction called xchg which will exchange the contents of a register with a memory location. xchg has always been atomic.

There has also always been a lock prefix that could be applied to any a single instruction to make that instruction atomic. Before there were multi processor systems, all this really did was to prevent an interrupt from being delivered in the middle of a locked instruction. (xchg was implicitly locked).

This article has some sample code using xchg to implement a spinlock
http://en.wikipedia.org/wiki/Spinlock

When multi CPU and later multi Core systems began to be built, more sophisticated systems were needed to insure that lock and xchg would synchronize all of the memory subsystems, including l1 cache on all of the processors. About this time, new research into locking and lockless algorithms showed that atomic CompareAndSet was a more flexible primitive to have, so more modern CPUs have that as an instruction.

Addendum: In comments andras supplied a "dusty old" list of instructions which allow the lock prefix. http://pdos.csail.mit.edu/6.828/2007/readings/i386/LOCK.htm

︶ ̄淡然 2024-08-30 08:49:30

我喜欢将线程同步视为自下而上,其中处理器和操作系统提供从原始到更复杂的构造。

在处理器级别,您拥有 CAS 和 LL/SC,它们允许您在单个原子操作中执行测试和存储。 ..您还拥有其他处理器结构,允许您禁用和启用中断(但是它们被认为是危险的...在某些情况下您别无选择,只能使用它们)

操作系统提供了在任务之间进行上下文切换的能力,这可以每当线程使用其时间片时都会发生......或者它可能由于其他原因而发生(我将谈到这一点)

然后有更高级别的构造,例如互斥体,它使用处理器提供的这些原始机制(想想旋转互斥体)。 ..它将不断等待条件变为真,并以原子方式检查该条件,

然后这些旋转互斥体可以使用操作系统提供的功能(上下文切换和系统调用,例如将控制权交给另一个线程的yield),并为我们提供互斥

体这些结构被更高级别的结构进一步利用,例如条件变量(它可以跟踪有多少线程正在等待互斥锁,以及当互斥锁可用时首先允许哪个线程)

这些结构可以进一步用于提供更复杂的同步结构...示例:信号量等

I like to think of thread synchronization as a bottom up where processor and operating system provide construct that are primitive to more sophisticated

At the processor level you have CAS and LL/SC which allow you to perform a test and store in a single atomic operation ... you also have other processor constructs that allow you to disable and enable interrupt (however they are considered dangerous ... under certain circumstances you have no other option but to use them)

operating system provides the ability to context switch between tasks which can happen every time a thread has used its time slice ... or it can happen due to otgher reasons (I will come to that)

then there are higher level constructs like mutexes which uses these primitive mechanisms provided by processor (think spinning mutex) ... which will continuously wait for the condition to become true and checks for that condition atomically

then these spinning mutex can use the functionality provided by OS (context switch and system calls like yield which relinquishes the control to another thread) and gives us mutexes

these constructs are further utilized by higher level constructs like conditional variables (which can keep track of how many threads are waiting for the mutex and which thread to allow first when the mutex becomes available)

These constructs than can be further used to provide more sophisticated synchronization constructs ... example : semaphores etc

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文