Heisenbug: WinApi program crashes on some computers

Posted 2024-07-05 07:36:22

Please help! I'm really at my wits' end.
My program is a little personal notes manager (google for "cintanotes").
On some computers (and of course I own none of them) it crashes with an unhandled exception just after start.
Nothing special about these computers could be said, except that they tend to have AMD CPUs.

Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.

Here is what is certain about this "Heisenbug":

1) The crash happens only in the Release version.

2) The crash goes away as soon as I remove all GDI-related stuff.

3) BoundsChecker has no complaints.

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Any ideas would be greatly appreciated!

UPDATE: I've managed to get the app debugged on a "faulty" PC. The results:

"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."

and code breaks on

0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]

So it seems that the problem was in the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2" and was crashing on the machines that didn't support SSE2. I've set this option to "Not Set" and the bug is gone. Phew!

Thank you all very much for the help!
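The /arch:SSE2 fix above removes the SSE2 requirement altogether. If a build ever does need SSE2, one option is to ask the CPU at run time. The C++ sketch below is illustrative and not part of the original program: it assumes a modern compiler (C++11 <cstdint>, which VC++ 2005/2008 lack) and uses MSVC's __cpuid or GCC/Clang's __get_cpuid; EdxReportsSse2 and CpuHasSse2 are hypothetical names.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuid
#define HAVE_X86_CPUID_MSVC 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    // __get_cpuid
#define HAVE_X86_CPUID_GCC 1
#endif

// CPUID leaf 1 reports SSE2 support in bit 26 of EDX (bit 25 is plain SSE).
inline bool EdxReportsSse2(std::uint32_t edx) {
    return ((edx >> 26) & 1u) != 0;
}

// Asks the processor at run time. Returns false on non-x86 targets or if
// CPUID leaf 1 cannot be queried.
inline bool CpuHasSse2() {
#if defined(HAVE_X86_CPUID_MSVC)
    int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return EdxReportsSse2(static_cast<std::uint32_t>(regs[3]));
#elif defined(HAVE_X86_CPUID_GCC)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) return false;
    return EdxReportsSse2(edx);
#else
    return false;
#endif
}
```

A startup check could call CpuHasSse2() and show a MessageBox instead of crashing. Caveat: with /arch:SSE2 the compiler may emit SSE2 instructions anywhere, including static initializers in your own code that run before WinMain, so such a check is only dependable in code built without that option. Leaving the option unset, as the author did, avoids the problem entirely.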

Comments (11)

叶落知秋 2024-07-12 07:36:23

What does the crash say? Access violation? Another exception? That would be a further clue toward solving this.

Ensure you have no preceding memory corruption, using PageHeap.exe.

Ensure you have no stack overflow (for example, from a huge local array such as CBig array[1000000]).

Ensure that you have no uninitialized memory.

Further, you can also run the release version inside the debugger once you generate debug symbols for the process (which is not the same as creating a debug version). Step through it and see whether you get any warnings in the debugger trace window.
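A minimal C++ sketch of the stack overflow and uninitialized-memory points, assuming CBig is an arbitrary 256-byte struct (its real contents are not given) and a C++11 compiler. It keeps the big array off the stack and gives every local a value at its declaration; MakeBigTable and SumPositive are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the answer's "CBig".
struct CBig {
    char payload[256];
};

// Risky: 1,000,000 * 256 bytes (about 244 MiB) as a local variable would
// overflow the default 1 MB thread stack on Windows.
//     void Risky() { CBig array[1000000]; /* ... */ }

// Safer: the elements live on the heap, and std::vector value-initializes
// them, so every payload byte starts as zero instead of leftover garbage.
std::vector<CBig> MakeBigTable(std::size_t count) {
    return std::vector<CBig>(count);
}

// Uninitialized locals: give every variable a value where it is declared.
int SumPositive(const int* values, std::size_t n) {
    int sum = 0;  // without "= 0" the result would depend on stack garbage
    for (std::size_t i = 0; i < n; ++i)
        if (values[i] > 0) sum += values[i];
    return sum;
}
```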

百变从容 2024-07-12 07:36:23

"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"

This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out if they've overclocked their computer.

坠似风落 2024-07-12 07:36:23

Sounds like stack corruption to me. My favorite tool for tracking those down is IDA Pro. Of course, you don't have access to the user's machine.

Some memory checkers have a hard time catching stack corruption (if that is indeed what this is). The surest way to catch it, I think, is runtime analysis.

This can also be due to corruption in an exception path, even if the exception was handled. Do you debug with 'catch first-chance exceptions' turned on? You should, for as long as you can stand it; in many cases it does get annoying after a while.

Can you send those users a checked version of your application? Check out minidumps: handle that exception and write out a dump, then use WinDbg to debug it on your end.

Another method is writing very detailed logs. Create a "Log every single action" option, ask the user to turn it on, and have them send the log to you. Dump memory contents to the log as well. Check out '_CrtDbgReport()' on MSDN.
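A sketch of the "log every single action" idea in C++, assuming a C++11 compiler; ActionLog and HexDump are hypothetical names, and how the verbose flag gets switched on (settings file, command-line switch) is left open. Flushing after every line matters here: a line still sitting in a stream buffer is lost when the process dies.

```cpp
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Formats a memory region as space-separated hex bytes, e.g. "01 ab ff".
std::string HexDump(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, i == 0 ? "%02x" : " %02x",
                      static_cast<unsigned>(bytes[i]));
        out += buf;
    }
    return out;
}

// Writes nothing unless verbose logging was switched on.
class ActionLog {
public:
    ActionLog(std::ostream& sink, bool verbose) : sink_(sink), verbose_(verbose) {}

    void Action(const std::string& what) {
        if (!verbose_) return;
        sink_ << what << std::endl;  // std::endl flushes the stream buffer
    }

    void Memory(const std::string& label, const void* data, std::size_t size) {
        if (!verbose_) return;
        sink_ << label << ": " << HexDump(data, size) << std::endl;
    }

private:
    std::ostream& sink_;
    bool verbose_;
};
```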

Good Luck!

EDIT:

Responding to your comment: An error on a local variable declaration is not surprising to me. I've seen this a lot. It's usually due to a corrupted stack.

For example, some variable on the stack may be running over its boundaries. All hell breaks loose after that: stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
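To make the overrun point concrete: copying with strcpy or sprintf into a fixed-size local buffer writes past its end when the input is too long, clobbering neighbouring locals and the saved return address. A minimal C++ sketch of the bounded alternative follows; FormatTitle and its "#id title" format are hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Formats "#<id> <title>" into a caller-supplied buffer without ever writing
// past destSize bytes. Returns false if the text had to be truncated.
bool FormatTitle(char* dest, std::size_t destSize, const char* title, int id) {
    if (destSize == 0) return false;
    int needed = std::snprintf(dest, destSize, "#%d %s", id, title);
    // snprintf returns the length it wanted to write; >= destSize means truncated.
    return needed >= 0 && static_cast<std::size_t>(needed) < destSize;
}
```

On Windows, StringCchPrintf from <strsafe.h> offers the same truncating behaviour, and MSVC's /GS option detects some stack buffer overruns when the function returns.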

Anytime I've seen those for a prolonged period of time, I've had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know of that really catches those reliably.

Many developers use WinDbg for this kind of analysis. That's why I also suggested Minidump.

虐人心 2024-07-12 07:36:23

When I get this type of thing, I try running the code through Gimpel's PC-Lint (static code analysis), as it checks for different classes of errors than BoundsChecker does. If you are using BoundsChecker, turn on the memory poisoning options.

You mention AMD CPUs. Have you investigated whether the machines that crash share a similar graphics card, driver version and/or configuration? Does it always crash on these machines or just occasionally? Maybe run the System Information tool on these machines and see what they have in common.
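For comparing the crashing machines, the program itself can record what CPU it runs on. A minimal C++ sketch, assuming MSVC (__cpuid) or GCC/Clang (__get_cpuid) on x86; VendorFromRegisters and CpuVendor are hypothetical names. Logging the feature bits (for example the SSE2 bit from CPUID leaf 1) alongside the vendor string is usually more telling than the vendor alone.

```cpp
#include <cstdint>
#include <string>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>   // __cpuid
#define HAVE_X86_CPUID_MSVC 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>    // __get_cpuid
#define HAVE_X86_CPUID_GCC 1
#endif

// CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX
// (in that order), four ASCII characters per register, lowest byte first.
std::string VendorFromRegisters(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx) {
    const std::uint32_t regs[3] = {ebx, edx, ecx};
    std::string vendor;
    for (int r = 0; r < 3; ++r)
        for (int shift = 0; shift < 32; shift += 8)
            vendor += static_cast<char>((regs[r] >> shift) & 0xFFu);
    return vendor;
}

// Returns e.g. "GenuineIntel" or "AuthenticAMD"; empty on non-x86 targets.
std::string CpuVendor() {
#if defined(HAVE_X86_CPUID_MSVC)
    int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);
    return VendorFromRegisters(static_cast<std::uint32_t>(regs[1]),
                               static_cast<std::uint32_t>(regs[3]),
                               static_cast<std::uint32_t>(regs[2]));
#elif defined(HAVE_X86_CPUID_GCC)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx) == 0) return std::string();
    return VendorFromRegisters(ebx, edx, ecx);
#else
    return std::string();
#endif
}
```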

赠意 2024-07-12 07:36:23

Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker doesn't.

趁微风不噪 2024-07-12 07:36:22

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

I've found the cause of numerous "strange crashes" to be dereferencing a broken "this" pointer inside a member function of the object.
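One way to catch a broken this early is a "canary" member that member functions check before touching anything else. This is a debugging heuristic swapped in here, not something the answer describes, and reading a destroyed object is still undefined behaviour, so a check can only make such bugs more likely to show up near their cause. The Note class and the magic values are hypothetical.

```cpp
#include <cstdint>

class Note {
public:
    Note() : magic_(kAlive), length_(0) {}
    ~Note() { magic_ = kDead; }  // mark the object as destroyed

    bool IsValid() const { return magic_ == kAlive; }

    int Length() const {
        // In a real build, log or break here instead of returning a sentinel.
        return IsValid() ? length_ : -1;
    }

    void SetLength(int n) {
        if (IsValid()) length_ = n;
    }

private:
    static const std::uint32_t kAlive = 0x4E4F5445u;  // "NOTE" in ASCII
    static const std::uint32_t kDead = 0xDEADBEEFu;
    // volatile keeps the optimizer from dropping the store in the destructor.
    volatile std::uint32_t magic_;
    int length_;
};
```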

怕倦 2024-07-12 07:36:22

Download the Debugging Tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point it will break with an access violation. Then run the command "!analyze -v", which is quite smart and should give you a hint about what's going wrong.

无边思念无边月 2024-07-12 07:36:22

Most heisenbugs / release-only bugs are due to either flow of control that depends on reads from uninitialised memory / stale pointers / past end of buffers, or race conditions, or both.

Try overriding your allocators so they zero out memory when allocating. Does the problem go away (or become more reproducible?)
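A minimal C++ sketch of that experiment, assuming a C++11 compiler: replace the global operator new and operator delete so every block comes back zero-filled. Only memory obtained through new is covered (the array forms forward to these by default); direct malloc() calls and GDI's own allocations are not.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Diagnostic replacement: every block returned by new starts out zeroed.
// If the crash disappears (or becomes deterministic), suspect reads of
// uninitialized heap memory.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // new must return a unique non-null pointer
    if (void* p = std::calloc(1, size)) return p;
    throw std::bad_alloc();
}

// Must pair with the allocation above, which uses calloc.
void operator delete(void* p) noexcept {
    std::free(p);
}
```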

Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Stack overflow! ;)

等风来 2024-07-12 07:36:22

So it doesn't crash in the Debug configuration? Many things differ from the Release configuration:
1.) Initialization of globals
2.) The actual machine code generated, etc.

So the first step is to find out the exact setting of each parameter in Release mode compared to Debug mode.

-AD
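One cheap way to compare the two configurations, and to see what a user's binary was actually built with, is to log the relevant predefined macros at startup. A sketch in C++, assuming a C++11 compiler; BuildSettingsSummary is a hypothetical name. On 32-bit MSVC, _M_IX86_FP reveals the /arch setting that turned out to be the culprit in this question.

```cpp
#include <string>

// Collects a few compile-time settings that commonly differ between
// Debug and Release builds.
std::string BuildSettingsSummary() {
    std::string s;
#if defined(NDEBUG)
    s += "NDEBUG ";
#endif
#if defined(_DEBUG)
    s += "_DEBUG ";  // MSVC: debug CRT (/MTd or /MDd)
#endif
#if defined(_M_IX86_FP)
    // MSVC, 32-bit x86: 0 = /arch:IA32, 1 = /arch:SSE, 2 = /arch:SSE2 or higher.
    s += "_M_IX86_FP=" + std::to_string(_M_IX86_FP) + " ";
#endif
#if defined(__SSE2__)
    s += "__SSE2__ ";  // GCC/Clang: SSE2 code generation enabled
#endif
#if defined(__OPTIMIZE__)
    s += "__OPTIMIZE__ ";  // GCC/Clang: optimizations enabled
#endif
    return s.empty() ? "(none)" : s;
}
```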

微暖i 2024-07-12 07:36:22

1) The crash happens only in the Release version.

That's usually a sign that you're relying on some behaviour that's not guaranteed, but happens to be true in the debug build. For example, if you forget to initialize your variables, or access an array out of bounds. Make sure you've turned on all the compiler checks (/RTCsuc). Also check things like relying on the order of evaluation of function parameters (which isn't guaranteed).
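A short C++ illustration of the evaluation-order pitfall, assuming a C++11 compiler (Reader, Pair and ReadPair are hypothetical): two calls with side effects passed as arguments to the same function can run in either order, and Debug and Release builds may differ.

```cpp
#include <string>

// Hypothetical reader whose Read() has a side effect.
struct Reader {
    int next = 0;
    int Read() { return ++next; }
};

std::string Pair(int first, int second) {
    return std::to_string(first) + "," + std::to_string(second);
}

// Unspecified order: may produce "1,2" or "2,1" depending on the compiler.
//     std::string bad = Pair(r.Read(), r.Read());

// Fixed: separate statements force the intended order.
std::string ReadPair(Reader& r) {
    int first = r.Read();
    int second = r.Read();
    return Pair(first, second);
}
```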

2) The crash goes away as soon as I remove all GDI-related stuff.

Maybe that's a hint that you're doing something wrong with the GDI related stuff? Are you using HANDLEs after they've been freed, for example?
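A minimal C++11 sketch of an owning wrapper that makes use-after-free of handles harder: each handle is closed exactly once and a moved-from wrapper is emptied. UniqueHandle, UniqueFile and GdiObject are hypothetical names; the Windows part assumes <windows.h> and only compiles on Windows, while the FILE* instance shows the same wrapper portably.

```cpp
#include <cstdio>
#include <utility>

template <typename Handle, void (*Close)(Handle)>
class UniqueHandle {
public:
    explicit UniqueHandle(Handle h = Handle()) : h_(h) {}
    ~UniqueHandle() { reset(); }

    UniqueHandle(const UniqueHandle&) = delete;
    UniqueHandle& operator=(const UniqueHandle&) = delete;

    UniqueHandle(UniqueHandle&& other) noexcept : h_(other.release()) {}
    UniqueHandle& operator=(UniqueHandle&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }

    Handle get() const { return h_; }

    // Gives up ownership without closing.
    Handle release() {
        Handle h = h_;
        h_ = Handle();
        return h;
    }

    // Closes the current handle (if any) and takes ownership of h.
    void reset(Handle h = Handle()) {
        if (h_ != Handle()) Close(h_);
        h_ = h;
    }

private:
    Handle h_;
};

// Portable instance: the same wrapper owning a C FILE*.
inline void CloseFile(std::FILE* f) { std::fclose(f); }
typedef UniqueHandle<std::FILE*, CloseFile> UniqueFile;

#if defined(_WIN32)
#include <windows.h>
// GDI objects (pens, brushes, fonts, bitmaps) are released with DeleteObject.
inline void DeleteGdiObject(HGDIOBJ h) { ::DeleteObject(h); }
typedef UniqueHandle<HGDIOBJ, DeleteGdiObject> GdiObject;
#endif
```

For example, GdiObject font(CreateFontIndirect(&lf)); deletes the font when it goes out of scope. A GDI object must not be deleted while it is still selected into a device context, so restore the previous object with SelectObject before the wrapper is destroyed.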

夜吻♂芭芘 2024-07-12 07:36:22

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

What is the underlying code in the executable / assembly? Declaration of int is no code at all, and as such cannot crash. Do you initialize the int somehow?
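That point fits how this thread ended. The logged line can be a declaration whose initializer is real code, and on 32-bit MSVC with /arch:SSE2 an int to double conversion is typically compiled to cvtsi2sd, the instruction that faulted in the update. A hypothetical GDI-style example (the actual CintaNotes code is not known):

```cpp
// Converts a font size in points to pixels for a given DPI, e.g. the value
// GetDeviceCaps(hdc, LOGPIXELSY) returns. The int -> double conversion in
// the initializer below is ordinary machine code, even though the line
// reads like "just a declaration".
int PointsToPixels(int points, int dpi) {
    double pixels = points * dpi / 72.0;
    return static_cast<int>(pixels + 0.5);  // round to nearest for positive input
}
```

Win32's MulDiv(points, dpi, 72) computes essentially the same rounded result with integer arithmetic only; the LOGFONT documentation uses -MulDiv(PointSize, GetDeviceCaps(hDC, LOGPIXELSY), 72) for font heights.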

To see the code where the crash happened you should perform what is called a postmortem analysis.

Windows Error Reporting

If you want to analyse the crash, you should get a crash dump. One option is to register for Windows Error Reporting; this requires some money (you need a digital code-signing ID) and some form filling. For more information, visit https://winqual.microsoft.com/ .

Get the crash dump intended for WER directly from the customer

Another option is to get in touch with a user who is experiencing the crash and get the crash dump intended for WER from them directly. The user can do this by clicking "Technical details" before sending the crash report to Microsoft; the crash dump file location is shown there.

Your own minidump

Another option is to register your own exception handler, handle the exception and write a minidump wherever you wish. A detailed description can be found in the Code Project article "Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET".
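A minimal sketch of such a handler in C++, using SetUnhandledExceptionFilter and MiniDumpWriteDump from DbgHelp. The file-naming scheme, the "CintaNotes" prefix and writing to the current directory are assumptions, and the helper names are hypothetical; the Windows part compiles only on Windows, and std::to_string assumes a C++11 compiler.

```cpp
#include <string>

// Hypothetical naming scheme: "<app>_<process id>.dmp".
std::string MakeDumpFileName(const std::string& app, unsigned long pid) {
    return app + "_" + std::to_string(pid) + ".dmp";
}

#if defined(_WIN32)
#include <windows.h>
#include <dbghelp.h>   // MiniDumpWriteDump
#if defined(_MSC_VER)
#pragma comment(lib, "dbghelp.lib")
#endif

// Called for exceptions nobody else handled: writes a minidump into the
// current working directory, then lets the process terminate.
static LONG WINAPI WriteMiniDumpFilter(EXCEPTION_POINTERS* info) {
    const std::string name = MakeDumpFileName("CintaNotes", GetCurrentProcessId());
    HANDLE file = CreateFileA(name.c_str(), GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;
        mei.ClientPointers = FALSE;
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, NULL, NULL);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;
}

// Call once, early in WinMain.
inline void InstallCrashDumpHandler() {
    SetUnhandledExceptionFilter(WriteMiniDumpFilter);
}
#endif
```

Users can then send the .dmp file, which opens in WinDbg or Visual Studio with matching symbols. Microsoft's documentation recommends calling MiniDumpWriteDump from a separate process when possible, since a badly corrupted process may fail to write its own dump.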
