Can a multi-threaded program be deterministic?
Normally it is said that multi-threaded programs are non-deterministic, meaning that if one crashes, it is next to impossible to recreate the error that caused the condition. You never really know which thread is going to run next, or when it will be preempted again.
Of course, this has to do with the OS thread-scheduling algorithm and with the fact that you don't know which thread will run next or how long it will effectively run. Program execution order plays a role as well.
But what if you had the algorithm used for thread scheduling, and you knew at any point which thread was running: could a multi-threaded program then become "deterministic", i.e., would you be able to reproduce a crash?
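To make the question concrete, here is a minimal Python sketch of the classic lost-update bug it is asking about. The `time.sleep` is an artificial delay added to force the race to fire; on a real scheduler, whether and when this happens is exactly what you cannot normally predict:

```python
import threading
import time

counter = 0

def unsafe_increment():
    """Read-modify-write without a lock: a classic data race."""
    global counter
    tmp = counter        # both threads read 0 ...
    time.sleep(0.05)     # artificial delay to widen the race window
    counter = tmp + 1    # ... and both write back 1, losing an update

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 1, not 2: one increment was lost
```

Without the artificial sleep, the same bug may or may not fire on any given run, which is precisely the reproducibility problem.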
Knowing the algorithm will not actually allow you to predict what will happen when. All kinds of delays that happen in the execution of a program or thread are dependent on environmental conditions such as: available memory, swapping, incoming interrupts, other busy tasks, etc.
If you were to map your multi-threaded program to a sequential execution, and your threads in themselves behave deterministically, then your whole program could be deterministic and 'concurrency' issues could be made reproducible. Of course, at that point they would not be concurrency issues any more.
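As a sketch of that idea (hypothetical names, with Python generators standing in for threads): if every "thread" yields at fixed points and a deterministic round-robin scheduler interleaves them, the lost update reproduces identically on every run:

```python
from collections import deque

state = {"counter": 0}

def incrementer(state):
    """A 'thread' whose preemption points are explicit yields."""
    tmp = state["counter"]       # read
    yield                        # deterministic preemption point
    state["counter"] = tmp + 1   # write back a possibly stale value
    yield

def run_round_robin(tasks):
    """Give each task one step at a time, in a fixed order."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)
        except StopIteration:
            pass

run_round_robin([incrementer(state), incrementer(state)])
print(state["counter"])  # always 1: the 'race' reproduces on every run
```

Exactly as the answer says, though, this is no longer really a concurrency bug: the schedule has become part of the program.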
If you would like to learn more, http://en.wikipedia.org/wiki/Process_calculus is very interesting reading.
My opinion is: technically no (but mathematically yes). You can write a deterministic threading algorithm, but after any sensible amount of time the state of the application becomes so hard to predict that you may as well treat it as non-deterministic.
There are some tools (in development) that try to create race conditions in a somewhat predictable manner, but this is about forward-looking testing, not about reconstructing a 'bug in the wild'.
CHESS is an example.
It would be possible to run a program on a virtual multi-threaded machine where the allocation of virtual cycles to each thread was done via some entirely deterministic process, possibly using a pseudo-random generator (which could be seeded with a constant before each program run). Another, possibly more interesting, possibility would be to have a virtual machine which would alternate between running threads in 'splatter' mode (where almost any variable they touch would have its value become 'unknown' to other threads) and 'cleanup' mode (where results of operations with known operands would be visible and known to other threads). I would expect the situation would probably be somewhat analogous to hardware simulation: if the output of every gate is regarded as "unknown" between its minimum and maximum propagation times, but the simulation works anyway, that's a good indication the design is robust, but there are many useful designs which could not be constructed to work in such simulations (the states would be essentially guaranteed to evolve into a valid combination, though one could not guarantee which one). Still, it might be an interesting avenue of exploration, since large parts of many programs could be written to work correctly even in a 'splatter mode' VM.
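A minimal sketch of the first idea (invented names, with Python generators standing in for the virtual machine's threads): a PRNG seeded with a constant decides which thread gets the next virtual cycle, so the whole interleaving replays exactly:

```python
import random

def worker(name, log):
    """A 'thread' that records each step it executes."""
    for step in range(3):
        log.append((name, step))
        yield

def run_with_seed(seed):
    """Allocate virtual cycles via a deterministically seeded PRNG."""
    rng = random.Random(seed)        # constant seed => identical schedule
    log = []
    tasks = [worker("A", log), worker("B", log)]
    while tasks:
        task = rng.choice(tasks)     # deterministic 'random' pick
        try:
            next(task)
        except StopIteration:
            tasks.remove(task)
    return log

# Two runs with the same seed replay the exact same interleaving:
print(run_with_seed(42) == run_with_seed(42))  # True
```

Changing the seed explores different schedules, which is also how such a VM could search for the 'unknown'-value hazards described above.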
I don't think it is practicable. To enforce a specific thread interleaving, we would need to place locks on shared variables, forcing the threads to access them in a specific order. This would cause severe performance degradation.
Replaying concurrency bugs is usually handled by record-and-replay systems. Since recording such large amounts of information also degrades performance, the most recent systems do partial logging and later complete the thread interleavings using SMT solving. I believe the most recent advance in this kind of system is Symbiosis (published at this year's PLDI conference). You can find open-source implementations at this URL:
http://www.gsd.inesc-id.pt/~nmachado/software/Symbiosis_Tutorial.html
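As a toy illustration of the replay half of such a system (not Symbiosis itself; all names here are made up): log the order in which threads reached a shared access, then force that exact order on a later run:

```python
import threading

class ReplayGate:
    """Forces threads to proceed in a previously recorded order."""
    def __init__(self, order):
        self.order = order           # recorded sequence of thread names
        self.pos = 0
        self.cond = threading.Condition()

    def wait_turn(self, name):
        # Block until the recorded schedule says it is this thread's turn.
        with self.cond:
            while self.order[self.pos] != name:
                self.cond.wait()

    def done(self):
        with self.cond:
            self.pos += 1
            self.cond.notify_all()

recorded = ["B", "A", "B", "A"]      # pretend this came from a recording
gate = ReplayGate(recorded)
events = []

def worker(name):
    for _ in range(2):
        gate.wait_turn(name)
        events.append(name)          # the shared access being replayed
        gate.done()

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(events)  # ['B', 'A', 'B', 'A'], exactly the recorded interleaving
```

This also shows the performance objection from the first paragraph: every replayed access now serializes through one condition variable.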
This is actually a valid requirement in many systems today that want to execute tasks in parallel but also need some determinism from time to time.
For example, a mobile company would want to process the subscription events of multiple users in parallel, but execute the events of any single user one at a time.
One solution, of course, is to execute everything on a single thread. Another solution is deterministic threading. I have written a simple library in Java that can be used to achieve the behavior described in the example above. Take a look at this: https://github.com/mukulbansal93/deterministic-threading.
Now, having said that, the actual allocation of CPU to a thread or process is in the hands of the OS, so the threads may get CPU cycles in a different order every time you run the same program. You therefore cannot achieve determinism in the order in which threads are allocated CPU cycles. However, by delegating tasks effectively amongst threads, so that sequential tasks are assigned to a single thread, you can achieve determinism in overall task execution.
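A rough sketch of that delegation idea in Python (the Java library above does something similar; the class and method names here are invented): hash a task's key to a fixed single-threaded worker, so all tasks for the same key run sequentially and in submission order, while different keys still run in parallel:

```python
import queue
import threading

class KeyedExecutor:
    """Route every task for a given key to the same single worker thread."""
    def __init__(self, n_workers=4):
        self.queues = [queue.Queue() for _ in range(n_workers)]
        for q in self.queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def _run(self, q):
        while True:
            task = q.get()
            task()
            q.task_done()

    def submit(self, key, task):
        # The same key always maps to the same queue, so all tasks for
        # one key execute sequentially, in submission order.
        self.queues[hash(key) % len(self.queues)].put(task)

    def wait_idle(self):
        for q in self.queues:
            q.join()

executor = KeyedExecutor()
log = []
for i in range(3):
    executor.submit("user-42", lambda i=i: log.append(i))
executor.wait_idle()
print(log)  # [0, 1, 2]: one user's events ran one at a time, in order
```
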
Also, to answer your question about the simulation of a crash. All modern CPU scheduling algorithms are free from starvation. So, each and every thread is bound to get guaranteed CPU cycles. Now, it is possible that your crash was a result of the execution of a certain sequence of threads on a single CPU. There is no way to rerun that same execution order or rather the same CPU cycle allocation order. However, the combination of modern CPU scheduling algorithms being starvation-free and Murphy's law will help you simulate the error if you run your code enough times.
PS: the definition of "enough times" is quite vague and depends on many factors, such as the execution cycles needed by the entire program, the number of threads, etc. Mathematically speaking, a crude way to estimate the probability of reproducing the same error, caused by the same execution sequence on a single processor, is 1 / (number of ways to execute all atomic operations of all defined threads).
For instance, a program with 2 threads of 2 atomic instructions each can be scheduled on a single processor in 4!/(2!·2!) = 6 different ways (each thread's own instructions must stay in program order), so the probability would be 1/6.
Lots of crashes in multithreaded programs have nothing to do with the multithreading itself (or the associated resource contention).
I disagree with this entirely. Sure, multi-threaded programs are non-deterministic, but then so are single-threaded ones, considering user input, message pumps, mouse/keyboard handling, and many other factors. A multi-threaded program usually makes it more difficult to reproduce the error, but definitely not impossible. For whatever reason, program execution is not completely random; there is some sort of repeatability (though not predictability). I can usually reproduce multi-threaded bugs rather quickly in my apps, but then I have lots of verbose logging of end users' actions.
As an aside, if you are getting crashes, can't you also get crash logs, with call stack info? That will greatly aid in the debugging process.