Microsoft.NET and the Multicore CPU of Doom

Posted on 2024-07-08 10:35:33

The question proper

Has anyone experienced this exception on a single core machine?

The I/O operation has been aborted because of either a thread exit or an application request.

Some context

On a single CPU system, only one MSIL instruction is executed at a time, threads notwithstanding. Between operations, the runtime gets to do its housekeeping.

Introduce a second CPU (or a second core) and it becomes possible to have an operation execute while the runtime does housekeeping. As a result, code that works perfectly on a single CPU machine may crash, or even induce a bluescreen, when executed in a multicore environment.

Interestingly, HyperThreaded Pentiums do not manifest the problem.

I had sample code that worked perfectly on a single core and flaked on a multicore CPU. It's around somewhere but I'm still trying to find it. The gist of it was that when it was implemented as a Visitor pattern, it would flake after an unpredictable number of iterations, but moving the method into the object on which the visitor had operated made the problem disappear.

To me this suggests that the framework has some kind of internal hash table for resolving object references, and on a multicore system a race condition exists with respect to accessing this.

I also currently have code using APM to process serial comms. It used to intermittently bluescreen inside the virtual COM port driver for my USB serial adaptor, but I fixed this by doing a Thread.Sleep(0) after every Stream.EndRead(IAsyncResult).

At random intervals, when the AsyncCallback I supply to Stream.BeginRead(...) is invoked and the handler tries to invoke Stream.EndRead(IAsyncResult), it throws an IOException stating that "The I/O operation has been aborted because of either a thread exit or an application request."
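For readers unfamiliar with the pattern being described, here is a minimal sketch of an APM read loop in which the callback wraps Stream.EndRead in a try/catch for IOException. It is not the poster's code: the ApmReader class, its field names, and the use of a MemoryStream in place of a serial port stream are all assumptions made for illustration. It does not address whatever causes the abort; it only shows where the exception surfaces and how to keep it from escaping the callback.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: reads a stream to the end using BeginRead/EndRead.
sealed class ApmReader
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[256];
    private readonly ManualResetEvent _finished = new ManualResetEvent(false);
    private int _total;

    public ApmReader(Stream stream) { _stream = stream; }

    // Starts the read loop and blocks until end of stream or an I/O error.
    public int ReadToEnd()
    {
        _stream.BeginRead(_buffer, 0, _buffer.Length, OnRead, null);
        _finished.WaitOne();
        return _total;
    }

    // Invoked when a read completes; some streams call this inline
    // on the thread that issued BeginRead.
    private void OnRead(IAsyncResult ar)
    {
        try
        {
            int n = _stream.EndRead(ar);   // the IOException in question is thrown here
            if (n == 0)
            {
                _finished.Set();           // end of stream
                return;
            }
            _total += n;
            _stream.BeginRead(_buffer, 0, _buffer.Length, OnRead, null);
        }
        catch (IOException ex)
        {
            // Report the aborted read instead of letting it escape the callback.
            Console.Error.WriteLine("Read aborted: " + ex.Message);
            _finished.Set();
        }
    }

    static void Main()
    {
        // A MemoryStream stands in for the serial port stream (assumption).
        var reader = new ApmReader(new MemoryStream(new byte[1000]));
        Console.WriteLine(reader.ReadToEnd());
    }
}
```

Whether to retry, reopen the port, or give up inside the catch block depends on the application; the sketch simply stops reading.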

I suspect that this too is multicore related and that some sort of internal error is killing the wait thread, leading to this behaviour. If I am right about this then the framework has serious flaws in the context of a multicore environment. While there are workarounds such as I have mentioned, you can't always apply them because sometimes they need to be applied inside other framework code.

For example, if you search the net regarding the above IOException you will find it affecting code written by people who clearly don't even know they are using multiple threads because it happens under the covers of framework convenience wrappers.

Microsoft tends to blow off these bug reports as unreproducible. I suspect this is because the problem only occurs on multicore systems and bug reports like this one don't mention the number of CPUs.

So... please help me pin down the problem. If I'm right about this I'm going to have to be able to prove it with repeatable test cases, because what I think is wrong is going to entail bugfixes in both framework and runtime.

It has been suggested that the problem is more likely to be my code than the framework.

Investigating variant A of the issue, I have transplanted the problem code into a sample app and pared it down until the only things left were thread setup and method invocations that worked on one CPU and failed on two.

Variant B I have not so tested, because I no longer have any single core systems. So I repeat the question: has anyone seen this exception on a single core platform?

Unfortunately no-one can confirm my suspicion, only refute it.

It is not helpful to tell me that I am fallible; I am already aware of this.

If you know of a way to pin a .NET application to a single CPU it would be very handy for figuring this out.

Thanks for the VM suggestion. I will do exactly that, good call.

非要怀念 2024-07-15 10:35:34

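A bluescreen is never caused solely by a bug in an application or in the framework. A bluescreen needs "help" from kernel mode. One of your problems is a buggy driver, regardless of which "era" that driver was written in.

As for the possibility of one thread closing the port while another thread is still using it, I think this may be related to some well-known bugs in the framework's housekeeping. I don't think those bugs depend on the number of cores, but with more cores you may be hit by them more often. Try adding GC.KeepAlive calls to stop the framework from collecting your port prematurely.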
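A minimal sketch of the GC.KeepAlive suggestion follows. The PortOwner class is a hypothetical stand-in for a port object that closes its stream when finalized, and the MemoryStream stands in for the real port stream; neither comes from the question. In optimized code the runtime may treat an object as collectable after its last use, so the call at the end of the method keeps the owner reachable while its stream is still being read.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a port object: it owns a stream
// and closes it when the object is finalized.
sealed class PortOwner
{
    public Stream BaseStream { get; }
    public PortOwner(Stream stream) { BaseStream = stream; }
    ~PortOwner() { BaseStream.Dispose(); }
}

static class KeepAliveDemo
{
    // Reads the owner's stream to the end and returns the byte count.
    public static int ReadAll(PortOwner owner)
    {
        Stream stream = owner.BaseStream;   // last ordinary use of 'owner'
        byte[] buffer = new byte[16];
        int total = 0;
        int n;
        while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += n;
        }
        // Keeps 'owner' reachable up to this point, so its finalizer
        // cannot close the stream while the loop above is running.
        GC.KeepAlive(owner);
        return total;
    }

    static void Main()
    {
        var owner = new PortOwner(new MemoryStream(new byte[40]));
        Console.WriteLine(ReadAll(owner));
    }
}
```

GC.KeepAlive only helps when nothing else holds a reference to the object; if the port is stored in a field of a long-lived object, it is already reachable and the call changes nothing.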

°如果伤别离去 2024-07-15 10:35:34

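I am currently rewriting the entire file transfer stack used in our application. From talking to colleagues, I know the old approach worked a few years ago, when production ran on single-core laptops and slow connections. Now that everyone has moved to dual cores and high-speed internet, the whole thing produces unpredictable results.

When I started studying the code more closely, I found that whoever developed it had no idea how to write multithreaded code properly. All the "synchronization" was done with Thread.Sleep()! Thread management was done on a fire-and-forget basis. Someone wants to stop a thread? Thread.Abort()! Damn! It is amazing the thing ever worked at all.

My point is: check your code and, if you are using custom hardware, check its driver's code. The problem is there, not in .NET, Win32 or anywhere else.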
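As a contrast to the Sleep-and-Abort style described above, here is a minimal sketch of a worker thread that is started, signalled and stopped cooperatively. The WorkerDemo class and the counter standing in for real work are assumptions for illustration; the primitives used (ManualResetEventSlim, CancellationTokenSource, Thread.Join) are standard .NET types.

```csharp
using System;
using System.Threading;

static class WorkerDemo
{
    // Starts a worker, waits until it has begun working, asks it to stop,
    // and waits for it to exit. Returns the number of work units completed.
    public static int RunWorker()
    {
        int iterations = 0;
        using (var cts = new CancellationTokenSource())
        using (var started = new ManualResetEventSlim(false))
        {
            CancellationToken token = cts.Token;
            var worker = new Thread(() =>
            {
                do
                {
                    iterations++;      // placeholder for one unit of real work
                    started.Set();     // signal the caller instead of letting it sleep and hope
                }
                while (!token.WaitHandle.WaitOne(1));   // interruptible pause; true once cancelled
            });
            worker.Start();

            started.Wait();   // replaces a Thread.Sleep "long enough for the worker to start"
            cts.Cancel();     // replaces Thread.Abort: the worker exits at a safe point
            worker.Join();    // after this, the worker is done and 'iterations' is stable
        }
        return iterations;
    }

    static void Main()
    {
        Console.WriteLine("Work units completed: " + RunWorker());
    }
}
```

The worker decides where it is safe to stop, so it never gets torn down in the middle of an operation, and the caller never has to guess how long to wait.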

临风闻羌笛 2024-07-15 10:35:34

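From your description, I would be inclined to blame the COM port driver. Was the driver developed before the multicore era? I once had a similar problem with a device like this, and fortunately a later driver revision fixed it.

Added: to answer your question about restricting the application to a single CPU, you need to set the process affinity to a single CPU. See this link. You can also do this from Task Manager after the process has started (right-click the process in Task Manager and choose "Set Affinity...").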
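Process affinity can also be set from inside the application. The sketch below uses the Process.ProcessorAffinity property, which takes a bit mask of allowed CPUs; the AffinityDemo class and the choice of the lowest CPU currently allowed are assumptions for illustration. To my knowledge the property is supported on Windows and Linux but not on every platform.

```csharp
using System;
using System.Diagnostics;

static class AffinityDemo
{
    // Restricts the current process to the lowest-numbered CPU it is
    // currently allowed to run on, and returns the resulting affinity mask.
    public static long PinToSingleCpu()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            long current = self.ProcessorAffinity.ToInt64();
            long single = current & -current;   // isolate the lowest set bit
            self.ProcessorAffinity = (IntPtr)single;
            return self.ProcessorAffinity.ToInt64();
        }
    }

    static void Main()
    {
        Console.WriteLine("Affinity mask: 0x{0:X}", PinToSingleCpu());
    }
}
```

Calling this at the very start of Main, before any other threads are created, gives the closest approximation to a single-core run without needing a VM.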

两人的回忆 2024-07-15 10:35:34

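Before Vista, any asynchronous I/O still in flight was cancelled when the thread that issued it terminated. That tends to give the error you report, namely:

The I/O operation has been aborted because of either a thread exit or an application request.

I am not sure whether this is relevant to your problem, but are you issuing asynchronous operations from a thread that can terminate before the operation completes?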
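One way to rule this out is to make sure the thread that calls BeginRead cannot exit while the read is pending. The sketch below is an assumption-laden illustration, not a diagnosis: IssuingThreadDemo is a hypothetical name and a MemoryStream stands in for the port stream. The issuing thread blocks in EndRead, so it is still alive when the operation completes.

```csharp
using System;
using System.IO;
using System.Threading;

static class IssuingThreadDemo
{
    // Issues an asynchronous read on a dedicated thread that stays alive
    // until the read has completed. Returns the number of bytes read.
    public static int ReadFromDedicatedThread(Stream stream, byte[] buffer)
    {
        int bytesRead = 0;
        var issuer = new Thread(() =>
        {
            IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
            // EndRead blocks until the operation finishes, so this thread
            // cannot terminate while its read is still in flight.
            bytesRead = stream.EndRead(ar);
        });
        issuer.Start();
        issuer.Join();   // after Join, 'bytesRead' holds the worker's result
        return bytesRead;
    }

    static void Main()
    {
        // A MemoryStream stands in for the serial port stream (assumption).
        var stream = new MemoryStream(new byte[10]);
        Console.WriteLine(ReadFromDedicatedThread(stream, new byte[32]));
    }
}
```

In a real application the same effect is usually achieved by issuing reads from a long-lived thread rather than from short-lived workers.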

逆夏时光 2024-07-15 10:35:34

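I am completely speechless here. You say your code crashes on a dual-core machine and you suspect Microsoft is the cause!

These days every machine ships with two or even four cores. If the .NET Framework had any major problem with dual cores, why don't Live Messenger, Live Writer and plenty of other .NET thick-client applications crash all the time? I believe SQL Server 2K5 and 2K8 Management Studio are also written in .NET. The entire System.Web implementation is written in C# itself. The whole BizTalk orchestration designer is in .NET.

Now to the point. Your application seems to use multiple threads and a lot of asynchronous calls. Do you have the flexibility to configure the number of threads in your application? If so, can you limit the number of threads to 1 and then test? Bugs caused by multithreading are very hard to track down.

Have you tried SOS? Give it a try. I don't know much about it myself, but search for it and you will find good resources on how to use it.

As a last resort, open a case with MS support. You will need to be a bit patient with them, because at first they will ask some silly questions :). Good luck.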
