How to debug a (possible) RTL problem?
I'm asking this because I'm out of good ideas...hoping for someone else's fresh perspective.
I have a user running our 32-bit Delphi application (compiled with BDS 2006) on a Windows 7 64-bit system. Our software was "working fine" until a couple weeks ago. Now suddenly it isn't: it throws an Access Violation while initializing (instantiating objects).
We've had him reinstall all our software, starting over from scratch. Same AV error. We disabled his anti-virus software; same error.
Our stack tracing code (madExcept) for some reason wasn't able to provide a stack trace to the line of the error, so we've sent a couple of error-logging builds for the user to install and run, to isolate the line which generates the error...
Turns out, it's a line which instantiates a simple TStringList descendant. There's no overridden Create constructor; the Create call just instantiates a TStringList which has a few custom methods associated with the descendant class.
I'm tempted to send the user yet another test .EXE, one which just instantiates a plain-vanilla TStringList, to see what happens. But at this point I feel like I'm tilting at windmills, and I risk wearing out the user's patience if I send too many more "things to try".
Any fresh ideas on a better approach to debugging this user's problem? (I don't like bailing out on a user's problems... those tend to be the ones which, if ignored, suddenly become an epidemic that 5 other users suddenly "find".)
EDIT, as Lasse requested:
procedure T_fmMain.AfterConstruction;
begin
  inherited;
  //Logging shows that we return from the Inherited call above,
  //then AV in the following line...
  FActionList := TAActionList.Create;
  ...other code here...
end;
And here's the definition of the object being created...
type
  TAActionList = class(TStringList)
  private
    FShadowList: TStringList; //UPPERCASE shadow list
    FIsDataLoaded : boolean;
  public
    procedure AfterConstruction; override;
    procedure BeforeDestruction; override;
    procedure DataLoaded;
    function Add(const S: string): Integer; override;
    procedure Delete(Index : integer); override;
    function IndexOf(const S : string) : Integer; override;
  end;

implementation

procedure TAActionList.AfterConstruction;
begin
  Sorted := False; //until we're done loading
  FShadowList := TStringList.Create;
end;
I hate these kinds of problems, but I reckon you should focus on what happens shortly BEFORE the object tries to get constructed.
The symptoms you describe sound like typical heap corruption, so maybe you have something like...
Since my answer above, you've posted code snippets. They do raise a couple of possible issues that I can see.
a: AfterConstruction vs. modified constructor:
As others have mentioned, using AfterConstruction in this way is at best not idiomatic. I don't think it's truly "wrong", but it's a possible smell. There's a good intro to these methods on Dr. Bob's site.
b: overridden methods Add, Delete, IndexOf
I'm guessing these methods use FShadowList in some way. Is it remotely possible that these methods are being invoked (and thus using FShadowList) before FShadowList is created? This seems possible because you're creating it in AfterConstruction, by which time virtual methods should already 'work'. Hopefully this is easy to check with a debugger by setting some breakpoints and seeing the order they get hit in.
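Where a debugger cannot be attached (as on the user's machine), the same ordering check can be done with logging. The following is only a sketch: TProbeList and the body of its Add override are hypothetical stand-ins, since the real TAActionList method bodies were not posted. It logs each step and raises a clear exception if Add runs while FShadowList is still nil.

```pascal
program CallOrderProbe;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for TAActionList; the method bodies are guesses.
  TProbeList = class(TStringList)
  private
    FShadowList: TStringList;
  public
    procedure AfterConstruction; override;
    procedure BeforeDestruction; override;
    function Add(const S: string): Integer; override;
  end;

procedure TProbeList.AfterConstruction;
begin
  inherited;
  WriteLn('AfterConstruction: creating shadow list');
  FShadowList := TStringList.Create;
end;

procedure TProbeList.BeforeDestruction;
begin
  WriteLn('BeforeDestruction: freeing shadow list');
  FreeAndNil(FShadowList);
  inherited;
end;

function TProbeList.Add(const S: string): Integer;
begin
  // Fail loudly instead of dereferencing a nil field.
  if FShadowList = nil then
    raise Exception.Create('Add called before FShadowList exists');
  WriteLn('Add: ', S);
  FShadowList.Add(UpperCase(S));
  Result := inherited Add(S);
end;

var
  List: TProbeList;
begin
  List := TProbeList.Create;
  try
    List.Add('one');
  finally
    List.Free;
  end;
end.
```

In a real build the WriteLn calls would go to whatever log file the diagnostic versions already write to.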
You should never override the AfterConstruction and BeforeDestruction methods in your programs. They are not meant for what you're doing with them, but for low-level VCL hacking (like reference adding, custom memory handling or such).
You should override the Create constructor and the Destroy destructor instead, and put your initialization and cleanup code there.
Take a look at the VCL code, and all serious published Delphi code, and you'll see that the AfterConstruction and BeforeDestruction methods are never used. I guess this is the root cause of your problem, and your code must be modified accordingly. It could be even worse in future versions of Delphi.
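As an illustration of the Create/Destroy approach, here is a minimal sketch. It is not the answerer's original snippet, which is not available; the class layout follows the question, but the body of Add is an assumption made only so the example does something observable.

```pascal
program CtorSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  TAActionList = class(TStringList)
  private
    FShadowList: TStringList; // UPPERCASE shadow list
  public
    constructor Create;
    destructor Destroy; override;
    function Add(const S: string): Integer; override;
  end;

constructor TAActionList.Create;
begin
  inherited Create;
  // The shadow list exists before the constructor returns,
  // so no caller can reach Add with a nil field.
  FShadowList := TStringList.Create;
  Sorted := False; // until we're done loading
end;

destructor TAActionList.Destroy;
begin
  FShadowList.Free; // Free is safe on nil
  inherited;
end;

// Assumed body: the question does not show the real implementation.
function TAActionList.Add(const S: string): Integer;
begin
  Result := inherited Add(S);
  FShadowList.Add(UpperCase(S));
end;

var
  List: TAActionList;
begin
  List := TAActionList.Create;
  try
    List.Add('FileOpen');
    WriteLn('items: ', List.Count);
  finally
    List.Free;
  end;
end.
```

The design point is simply that the field's lifetime is tied to the constructor and destructor pair, which is the convention most Delphi readers expect.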
Clearly there is nothing suspicious about what TAActionList is doing at time of construction. Even considering ancestor constructors and the possible side-effects of setting Sorted := False, there shouldn't be a problem. I'm more interested in what's happening inside T_fmMain.
Basically, something is happening that causes FActionList := TAActionList.Create; to fail, even though there is nothing wrong in the implementation of TAActionList.Create (one possibility is that the form has been unexpectedly destroyed).
I suggest you try changing T_fmMain.AfterConstruction to test this theory. If an environment issue with a component used by your form is causing it to destroy the form during AfterConstruction, then it's the assignment of the new TAActionList.Create instance to FActionList that's actually causing the AV. Another way to test would be to first create the object into a local variable, then assign it to the class field: FActionList := LActionList.
Environment problems can be subtle. E.g. we use a reporting component which we discovered requires that a printer driver is installed, otherwise it prevents our application from starting up.
You can confirm the destruction theory by setting a global variable in the form's destructor. Also, you may be able to output a stack trace from the destructor to confirm the exact sequence leading to the destruction of the form.
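The two checks described above (create into a local first, and flag destruction in a global) might be combined roughly as follows. This is a console-only sketch under stated assumptions: THostForm is a hypothetical stand-in for T_fmMain, and GHostDestroyed is an invented name for the global flag. In the real application the class would be the form itself and the messages would go to the log.

```pascal
program DestructionProbe;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical stand-in for T_fmMain; a real form descends from TForm.
  THostForm = class(TObject)
  private
    FActionList: TStringList;
  public
    procedure AfterConstruction; override;
    destructor Destroy; override;
  end;

var
  // Set by the destructor so later code can tell the instance is gone.
  GHostDestroyed: Boolean = False;

destructor THostForm.Destroy;
begin
  GHostDestroyed := True;
  WriteLn('THostForm.Destroy reached');
  FActionList.Free;
  inherited;
end;

procedure THostForm.AfterConstruction;
var
  LActionList: TStringList;
begin
  inherited;
  // Step 1: create into a local. A failure here points at the list class.
  LActionList := TStringList.Create;
  WriteLn('list created');
  // Step 2: only touch the field if the destructor has not already run.
  if GHostDestroyed then
  begin
    WriteLn('host destroyed during construction; skipping assignment');
    LActionList.Free;
    Exit;
  end;
  FActionList := LActionList;
  WriteLn('field assigned');
end;

var
  Host: THostForm;
begin
  Host := THostForm.Create;
  try
    WriteLn('constructed OK');
  finally
    Host.Free;
  end;
end.
```

If the log on the user's machine shows 'list created' followed by the destruction message, the AV comes from writing to a freed form rather than from the list class.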
Our software was "working fine" until a couple weeks ago... suddenly become an epidemic that 5 other users suddenly "find":
Sounds like you need to do some forensic analysis, not debugging: You need to discover what changed in that user's environment to trigger the error. All the more so if you have other users with the same deployment that don't have the problem (sounds like that's your situation). Sending a user 'things to try' is one of the best ways to erode user confidence very quickly! (If there is IT support at the user site, get them involved, not the user).
For starters, explore these options:
*) If possible, I'd check the Windows Event Log for events that may have occurred on that machine around the time the problem arose.
*) Is there some kind of IT support person on the user's side that you can talk to about possible changes/problems in that user's environment?
*) Was there some kind of support issue/incident with that user around the time the error surfaced that may be connected to it, and/or caused some kind of data or file corruption particular to them?
(As for the code itself, I agree with @Warran P about decoupling etc)
Things to do when MadExcept is NOT enough (which is rare, I must say):
Try Jedi JCL's JCLDEBUG instead. If you swap out MadExcept for JCLDEBUG and write the stack trace directly to disk without ANY UI interaction, you might get a stack traceback.
Run a debug viewer like MS/SysInternals DebugView, and trace out things like the Self pointers of the objects where the problems are happening. I suspect that somehow an INVALID instance pointer is ending up in there.
Decouple things, refactor things, and write unit tests, until you find the really ugly thing that's trashing you. (Someone suggested heap corruption. I often find heap corruption goes hand in hand with unsafe, ugly, untested code, and deeply bound UI+model cascading failures.)
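For the DebugView suggestion, a minimal sketch of tracing a Self pointer could look like this. It assumes a Windows build; Trace and TTracedList are hypothetical names introduced for illustration. OutputDebugString is the Win32 call whose output DebugView captures, and it involves no UI.

```pascal
program SelfTrace;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

// Hypothetical helper: sends one line to the debug output stream.
procedure Trace(const Msg: string);
begin
  OutputDebugString(PChar(Msg));
end;

type
  // Hypothetical class; in the real code this would be the suspect object.
  TTracedList = class(TStringList)
  public
    procedure AfterConstruction; override;
    procedure BeforeDestruction; override;
  end;

procedure TTracedList.AfterConstruction;
begin
  inherited;
  Trace(Format('TTracedList.AfterConstruction Self=%p', [Pointer(Self)]));
end;

procedure TTracedList.BeforeDestruction;
begin
  Trace(Format('TTracedList.BeforeDestruction Self=%p', [Pointer(Self)]));
  inherited;
end;

var
  List: TTracedList;
begin
  List := TTracedList.Create;
  try
    List.Add('one');
  finally
    List.Free;
  end;
end.
```

Matching the logged addresses between construction and destruction, and against the address in the AV message, helps show whether a stale or overwritten instance pointer is involved.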