What is a good way to create a string for Win32 C++ crash reports that reflects the cause of the crash?

Posted 2024-10-10 21:17:53

We're using Fogbugz for tracking issues and I am in the middle of writing a C++ wrapper around the XML API for Fogbugz.

The best practice seems to be to use the "scout" field so that similar/same crashes are just counted but not reported again. To do that we need a unique string for a particular cause of a crash.

In Win32, after getting a dmp file (or the output of some other crash handler), what is a good way to make a unique string for a crash? (We're going to create a dmp file and send it to the Fogbugz server.)

In previous postings, articles, etc., Joel has made various suggestions, but much of that relied on a language like C#, which uses reflection and exposes a lot of information that is either harder or impossible to get in C++.

Has anyone else used stack traces or other information to build scout entries in Fogbugz?

EDIT
To clarify: we don't want a unique ID for every incident. There are likely crashes that share the same code path, and we want to capture that. I was thinking that we would take the last few stack calls that are in our code (not ones from Win32 DLLs), but I'm not sure how to go about doing this.

Reporting every crash as unique is not right. Reporting all crashes under the same case is not right. Different users repeating a scenario that causes a crash should map to the same incident.

EDIT

What I think we want is a general "signature" of a crash - based on what is on the stack. Similar stacks should have the same signature. For example - take the top 5 methods that are in our app and then the first call (if any) we make into an MS DLL. This would probably be sufficient for a signature and would likely correlate the crashes that are "the same".
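A minimal, platform-independent sketch of the signature proposed above, assuming the frames have already been captured and resolved to module and function names by some other means. `Frame` and `BuildSignature` are hypothetical names. The sketch keeps stack order (innermost first): the last foreign function our code called, then up to five frames from our own modules, skipping any foreign frames in between.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One resolved stack frame; how it was obtained is outside this sketch.
struct Frame {
    std::string module;    // e.g. "mymodule"
    std::string function;  // e.g. "FunctionC"
    bool ownCode;          // true if the module is one we build ourselves
};

// Returns "mod!fn;mod!fn;...", or "" if no frame belongs to our code.
std::string BuildSignature(const std::vector<Frame>& frames, std::size_t maxOwn = 5) {
    std::vector<std::string> parts;

    // Index of the innermost frame that belongs to our own code.
    std::size_t firstOwn = frames.size();
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].ownCode) { firstOwn = i; break; }
    }

    // The frame just above it (closer to the crash) is the foreign
    // function our code called, e.g. a CRT or OS DLL entry point.
    if (firstOwn > 0 && firstOwn < frames.size()) {
        const Frame& callee = frames[firstOwn - 1];
        parts.push_back(callee.module + "!" + callee.function);
    }

    // Then up to maxOwn frames from our own modules.
    std::size_t own = 0;
    for (std::size_t i = firstOwn; i < frames.size() && own < maxOwn; ++i) {
        if (!frames[i].ownCode) continue;
        parts.push_back(frames[i].module + "!" + frames[i].function);
        ++own;
    }

    std::string sig;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i) sig += ';';
        sig += parts[i];
    }
    return sig;
}
```

Including the foreign entry point distinguishes, for example, a crash inside a CRT allocation call from one inside a CRT string call made by the same function of ours.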

So how does one get the list of methods on the stack? And how can you tell if they are from your own app or in another DLL?
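On Windows, one way to list the return addresses on the current stack and tell which module each one belongs to is `CaptureStackBackTrace` plus `GetModuleHandleExA` with `GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS`. The sketch below decides ownership by comparing each module's file name against a caller-supplied list of your own binaries (hypothetical names such as `myexe.exe`, `mymodule.dll`), and records each address as an offset from its module base so that no symbol lookup is needed in the handler. `ModuleFileName`, `IsOwnModule`, `RawFrame` and `CaptureFrames` are hypothetical names. A caveat: called from an unhandled-exception filter, the captured stack starts with the handler and exception-dispatch frames, and walking the `CONTEXT` from `EXCEPTION_POINTERS` with DbgHelp's `StackWalk64` is the more robust route. The Windows-only part is guarded by `_WIN32`; only the name helpers are portable.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Lower-cased file name of a module path: "C:\\App\\MyModule.dll" -> "mymodule.dll".
std::string ModuleFileName(const std::string& path) {
    std::size_t slash = path.find_last_of("\\/");
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// "Ours" means the file name is in a list of binaries we build ourselves
// (hypothetical names supplied by the caller); ntdll, kernelbase, the CRT
// and everything else count as foreign.
bool IsOwnModule(const std::string& path, const std::set<std::string>& ownNames) {
    return ownNames.count(ModuleFileName(path)) != 0;
}

#ifdef _WIN32
struct RawFrame {
    std::string modulePath;  // "" if the address is in no known module
    ULONG_PTR offset;        // address minus module base (same on every run of this build)
    bool own;
};

// Captures the current thread's return addresses and tags each one with the
// module it falls in. No symbol lookup is needed at this point.
std::vector<RawFrame> CaptureFrames(const std::set<std::string>& ownNames) {
    void* addrs[62];  // skip + capture must stay below 63 on older Windows versions
    USHORT n = CaptureStackBackTrace(0, 62, addrs, nullptr);
    std::vector<RawFrame> frames;
    for (USHORT i = 0; i < n; ++i) {
        HMODULE mod = nullptr;
        char path[MAX_PATH] = "";
        if (GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                                   GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                               static_cast<LPCSTR>(addrs[i]), &mod)) {
            GetModuleFileNameA(mod, path, MAX_PATH);
        }
        ULONG_PTR addr = reinterpret_cast<ULONG_PTR>(addrs[i]);
        ULONG_PTR base = reinterpret_cast<ULONG_PTR>(mod);
        frames.push_back({path, mod ? addr - base : addr, IsOwnModule(path, ownNames)});
    }
    return frames;
}
#endif
```

Function names can be resolved later from the module/offset pairs (for example with the PDBs on your symbol server), outside the crashing process.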

EDIT - NOTE
We want to create a "bucket id"/signature while in the exception handler so that we can create the minidump and send it to Fogbugz with the signature as the scout description. Alternatively, we can load the dump on the next start of the app and send it then, with a signature we generate.
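If a fixed-length bucket ID is preferred over the readable signature itself, one option is to hash the signature. The sketch below uses 64-bit FNV-1a, which is our choice here and not something Fogbugz requires; `Fnv1a64` and `BucketId` are hypothetical names. Keeping the handler's work this small matters because the heap may already be corrupt when it runs; the alternative mentioned above, computing the signature from the dump at the next start, avoids that risk entirely.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a: simple and allocation-free, but not cryptographic.
std::uint64_t Fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Turns a frame signature into a 16-character lowercase hex bucket id.
std::string BucketId(const std::string& signature) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(Fnv1a64(signature)));
    return buf;
}
```

A hash is opaque to humans, so it may be worth sending the readable signature in the case body alongside it.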

Comments (6)

指尖上的星空 2024-10-17 21:17:53

In my project I use the memory address of the crash as a "unique" ID.
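A raw crash address only stays the same if the module is loaded at the same base address, which ASLR does not guarantee. A minimal sketch of a more stable variant, assuming the handler already knows the faulting module's name and base (for example from `EXCEPTION_RECORD::ExceptionAddress` plus `GetModuleHandleEx`); `CrashLocationId` is a hypothetical name:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a crash address relative to the base of the module it falls in,
// e.g. "mymodule.dll+0x1821c". Unlike the raw address, this stays the same
// when the module is loaded at a different base, as long as the binary
// itself is unchanged.
std::string CrashLocationId(const std::string& moduleName,
                            std::uint64_t moduleBase,
                            std::uint64_t crashAddress) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "+0x%llx",
                  static_cast<unsigned long long>(crashAddress - moduleBase));
    return moduleName + buf;
}
```

A new build of the module moves its code, so the ID changes between releases even when the bug does not.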

千笙结 2024-10-17 21:17:53

IMO the best thing you can use is the bucket ID from dump analysis. With properly configured Debugging Tools for Windows (WinDbg), you can run !analyze -v and classify your dumps into different buckets based on the bucket ID. The bucket ID guarantees that if two dumps are the same, their bucket IDs will be the same. That solves part of the puzzle.
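To feed the debugger's bucket into Fogbugz automatically, the !analyze -v text can be captured (for example with `kd.exe -z dump.dmp -c "!analyze -v;q"`) and scanned. A sketch, assuming the output contains a line labelled `FAILURE_BUCKET_ID:`; the exact labels may vary between debugger versions, so check them against yours. The function names are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Returns the value of the first "FAILURE_BUCKET_ID:" line in the debugger
// output, or "" if there is none.
std::string ExtractFailureBucketId(std::istream& analyzeOutput) {
    const std::string key = "FAILURE_BUCKET_ID:";
    std::string line;
    while (std::getline(analyzeOutput, line)) {
        std::size_t pos = line.find(key);
        if (pos == std::string::npos) continue;
        std::string value = line.substr(pos + key.size());
        // Trim the padding spaces and a possible '\r' from CRLF files.
        std::size_t b = value.find_first_not_of(" \t\r");
        if (b == std::string::npos) return "";
        std::size_t e = value.find_last_not_of(" \t\r");
        return value.substr(b, e - b + 1);
    }
    return "";
}

// Convenience overload for output already held in memory.
std::string ExtractFailureBucketIdFromText(const std::string& text) {
    std::istringstream in(text);
    return ExtractFailureBucketId(in);
}
```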

Often, two dumps rooted in the same problem will produce different bucket IDs (for example because of a version difference: your 1.0 and 1.1 both crash at the same point). You can use the faulting module and the stack signature to correlate bugs from the same point of failure.
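One way to get this coarser, cross-version match is to drop the `+0x` displacement from each frame, since the offsets shift between builds while the function names usually do not. A sketch with hypothetical names `StripOffset` and `CorrelationKey`, taking the faulting module plus the first few frames as text in the debugger's `module!function+0xNN` form:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// "mymodule!FunctionC+0x7c" -> "mymodule!FunctionC"; frames without an
// offset (e.g. raw addresses like "0xb75e7d4") are returned unchanged.
std::string StripOffset(const std::string& frame) {
    std::size_t plus = frame.rfind("+0x");
    return plus == std::string::npos ? frame : frame.substr(0, plus);
}

// Faulting module plus the first n frames without their offsets.
std::string CorrelationKey(const std::string& faultingModule,
                           const std::vector<std::string>& frames,
                           std::size_t n) {
    std::string key = faultingModule;
    for (std::size_t i = 0; i < frames.size() && i < n; ++i) {
        key += ';';
        key += StripOffset(frames[i]);
    }
    return key;
}
```

The trade-off is that two different bugs in the same function now share a key, so this is best used to group buckets rather than replace them.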

Certain things cause very random dumps (e.g. heap corruption, where the faulting module is typically the victim). Therefore dump analysis should be considered best-effort. When you can't, you can't.

一个人练习一个人 2024-10-17 21:17:53

I used something like this to generate exceptions in my last app (MSVC), so every error would get logged with the source file and line it occurred on:

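#include <string>

class Error {
    //...
public:
    // __LINE__ expands to an int, not a string, so take it as one.
    Error(const std::string& file, int line, const std::string& error);
};

// Named APP_ERROR because <windows.h> (wingdi.h) already #defines ERROR.
#define APP_ERROR(err) Error(__FILE__, __LINE__, err)
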
万劫不复 2024-10-17 21:17:53

It's probably a little bit late, but I will add my solution here, too, in case it can help other people.
You can do this using tools from "Debugging Tools for Windows", for example windbg.exe or, better, kd.exe.
Running the command kd.exe -z "path_to_dump.dmp" -c "kb;q" >> dumpstack.txt, you might get the following result:

Microsoft (R) Windows Debugger Version 10.0.15063.400 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [d:\work\bugs\14122\myexe.exe.2624.dmp]
User Mini Dump File with Full Memory: Only application data is available

************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 10 Version 15063 MP (4 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
15063.0.x86fre.rs2_release.170317-1834
Machine Name:
Debug session time: Fri Oct 13 00:09:01.000 2017 (UTC + 1:00)
System Uptime: 0 days 0:18:33.797
Process Uptime: 0 days 0:03:40.000
................................................................
.....................................................
Loading unloaded module list
..............................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a40.2580): Security check failure or stack buffer overrun - code c0000409 (first/second chance not available)
eax=00000001 ebx=00000000 ecx=00000007 edx=77cc4350 esi=00000000 edi=00000000
eip=62ae7666 esp=0b75e17c ebp=0b75e1a8 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
msvcr120!abort+0x28:
62ae7666 cd29            int     29h
0:068> kd: Reading initial command 'kb;q'
ChildEBP RetAddr  Args to Child              
0b75e178 62addc5f 935dda1f 00000000 00000000 msvcr120!abort+0x28
0b75e1a8 0b75e7d4 62a9b436 0b75e1dc 62a52aa5 msvcr120!terminate+0x33
WARNING: Frame IP not in any known module. Following frames may be wrong.
0b75e1ac 62a9b436 0b75e1dc 62a52aa5 00000000 0xb75e7d4
0b75e1b4 62a52aa5 00000000 62a59740 0b75e7d4 msvcr120!__FrameUnwindToState+0x89
0b75e1c8 62a52b33 00000000 00000000 00000000 msvcr120!_EH4_CallFilterFunc+0x12
0b75e1f4 62a5a0f3 62b1f7b8 62a4f7c6 0b75e324 msvcr120!_except_handler4_common+0x8e
0b75e214 77cd6152 0b75e324 0b75e7c4 0b75e344 msvcr120!_except_handler4+0x1e
0b75e238 77cd6124 0b75e324 0b75e7c4 0b75e344 ntdll!ExecuteHandler2+0x26
0b75e30c 77cc4266 0b75e324 0b75e344 0b75e324 ntdll!ExecuteHandler+0x24
0b75e30c 74cf28f2 0b75e324 0b75e344 0b75e324 ntdll!KiUserExceptionDispatcher+0x26
0b75e684 62a59339 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x62
0b75e6c4 6001821c 0b75e6e4 6004e1bc 946a8f2a msvcr120!_CxxThrowException+0x5b
0b75e6f8 60018042 0b75e720 946a8efa ffffffff mymodule!FunctionC+0x7c
0b75e730 60016544 946a8ece ffffffff 092889d8 mymodule!FunctionB+0x32
0b75e754 600166b8 00842338 6000588d 00000001 myothermodule!FunctionB+0x44

From this stack, you can create a bucket ID by, for example, taking only your own methods from the stack and concatenating them into a string: "mymodule!FunctionC+0x7c;mymodule!FunctionB+0x32;myothermodule!FunctionB+0x44". For this to work, you need access to your personal symbol server, either through the environment variable _NT_SYMBOL_PATH or with the -y command-line switch.
Alternatively, you can create a string from the return addresses only (second column): "62addc5f,0b75e7d4,62a9b436,62a52aa5,62a52b33,62a5a0f3,77cd6152,77cd6124,77cc4266,74cf28f2,62a59339,6001821c,60018042,60016544,600166b8". Keep in mind that raw addresses change whenever a module is loaded at a different base address (for example because of ASLR), so this form is only stable when the modules load at the same addresses.
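The first approach (keep only your own frames from the kb output and join them) can be automated once kd has written dumpstack.txt. A sketch, assuming every frame line ends with the call site as its last whitespace-separated token, as in the x86 output shown above; `SignatureFromKb` is a hypothetical name, and the module set would hold your own module names (here the example's mymodule and myothermodule).

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Builds "mod!fn+0xNN;..." (innermost frame first) from `kb` output,
// keeping only call sites in one of our own modules.
std::string SignatureFromKb(std::istream& kbOutput,
                            const std::set<std::string>& ownModules) {
    std::string line, sig;
    bool inStack = false;
    while (std::getline(kbOutput, line)) {
        // Frame lines follow the column header ("ChildEBP" on x86, "Child-SP" on x64).
        if (line.rfind("ChildEBP", 0) == 0 || line.rfind("Child-SP", 0) == 0) {
            inStack = true;
            continue;
        }
        if (!inStack) continue;  // skip the banner and register dump
        std::istringstream tokens(line);
        std::string token, last;
        while (tokens >> token) last = token;
        std::size_t bang = last.find('!');
        if (bang == std::string::npos) continue;                    // warnings, raw addresses
        if (ownModules.count(last.substr(0, bang)) == 0) continue;  // CRT, ntdll, ...
        if (!sig.empty()) sig += ';';
        sig += last;
    }
    return sig;
}
```

Usage would be along the lines of opening dumpstack.txt with `std::ifstream` and calling `SignatureFromKb(in, {"mymodule", "myothermodule"})`. The `+0x` offsets tie the signature to one build; dropping them makes the same bucket span releases.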

败给现实 2024-10-17 21:17:53

Just use an MD5 string generated from the dump file and you will likely get a unique string for every crash.

飘过的浮云 2024-10-17 21:17:53

I would start by collecting data on how often every function in your code has appeared in a crash report stack trace. Every report would have to be added to some kind of database, and every function would have to be indexed so that you could later query which functions seem to crash more often than others. (Of course, functions like main() will be in every report, but that's understandable.)
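A sketch of the counting step, with an in-memory map standing in for the database; `Report`, `CountFunctionHits` and `RankFunctions` are hypothetical names, and each report is assumed to be already reduced to the list of function names on its crash stack.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Function names on one crash stack.
using Report = std::vector<std::string>;
using Ranked = std::vector<std::pair<std::string, std::size_t>>;

// Counts in how many reports each function appears (once per report,
// even if recursion puts it on the stack several times).
std::map<std::string, std::size_t> CountFunctionHits(const std::vector<Report>& reports) {
    std::map<std::string, std::size_t> hits;
    for (const Report& r : reports) {
        std::set<std::string> seen(r.begin(), r.end());
        for (const std::string& fn : seen) ++hits[fn];
    }
    return hits;
}

// Functions sorted by hit count, most frequent first (ties keep name order).
Ranked RankFunctions(const std::map<std::string, std::size_t>& hits) {
    Ranked ranked(hits.begin(), hits.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::string, std::size_t>& a,
                        const std::pair<std::string, std::size_t>& b) {
                         return a.second > b.second;
                     });
    return ranked;
}
```

As the answer notes, entry points such as main() will top the list; in practice you would skip them when reading the ranking.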

Or, if you think that only your own code is likely to be the problem, you could remove all the external entries from the crash stack traces and then hash the rest (your functions). That way you could see whether any particular call chain of your own functions causes a crash repeatedly, no matter what external functions have been called in between.

Of course, some of the more complicated problems will not be captured this way anyway, as their stack traces will be completely different. To help with that, you could record other data from your application along with the stack trace in every report, such as buffer sizes, counters, the states of different parts of the application, and so on, and then run some statistics on that.
