Thread synchronization behavior on VMware with the same guest OS but different host OSes
I'm a TA for a computer science course and I've run into an interesting problem. A recent assignment involved pthread synchronization techniques. The students had to avoid deadlocks using mutexes, barriers, condition variables, and so on. Each student runs the same version of Ubuntu in a VMware virtual machine (either Workstation or Fusion, depending on their system). Obviously, the host OS may be different for each student.
Now here's the confusing part: for some students, the synchronization behavior I see when I run their program is very different from what they see. For example, I may run one student's assignment and hit a deadlock immediately, yet when she runs it at home she never gets a deadlock.
As I understand it, whether the program deadlocks should depend only on the guest OS's scheduler, and the host OS should have nothing to do with it. Yet the problem persists even though we all run the same guest OS. Does anyone have any idea why this might be?
Thanks!
Comments (3)
It sounds like the student has a non-deterministic deadlock. This is very common: there is a small window in which the code can deadlock, and outside that window the app runs fine. She has been lucky, but you haven't.
Small differences in scheduling timing can be the culprit. Your CPU may have a different clock speed, a different number of cores, or a different background load, and any of these is enough to change the scheduling.
This is actually a classic problem: multithreaded code runs fine in the test environment but fails in production because of race conditions that never manifested under test.
I will assume that your virtual machine is configured to use only one virtual core so that it can be run across any host machine. If so, you are correct to assume that the guest OS's scheduler is responsible for every preemption of the student's assignment.
However, the scheduler itself is heavily influenced by the hardware platform it's run on. Different systems will run the guest OS faster or slower, or produce hardware interrupts that take different amounts of time to handle or emulate. All of this will affect the scheduling decisions of the guest OS.
I really like how you distribute a VM to make sure everyone has the same development and runtime environment for the assignment. However, just because everyone has the same software doesn't mean they'll see the same behavior.
You also need to consider the host itself. I once had two hosts with (I thought) identical CPUs but slightly different Intel chipset revisions. As a result, on one host KVM was able to optimize the task switch register and on the other it was not. This led to different timings in the guest for seemingly identical VMs and hosts.
Also bear in mind the host may be running page sharing processes or any number of other things at different times that could change the timing in the guest.
It can be instructive to run the threaded program under Valgrind inside the guest. Because Valgrind slows execution so much, latent timing problems in threaded apps often surface.