Early finalization and memory leaks in a C++/CLI library

Posted on 2024-12-09 20:23:13

I'm having issues with finalizers seemingly being called early in a C++/CLI (and C#) project I'm working on. This seems to be a very complex problem, and I'm going to mention a lot of different classes and types from the code. Fortunately it's open source, and you can follow along here: Pstsdk.Net (mercurial repository). I've also tried linking directly to the file browser where appropriate, so you can view the code as you read. Most of the code we deal with is in the pstsdk.mcpp folder of the repository.

The code right now is in a fairly hideous state (I'm working on that), and the current version of the code I'm working on is in the Finalization fixes (UNSTABLE!) branch. There are two changesets in that branch, and to understand my long-winded question, we'll need to deal with both. (changesets: ee6a002df36f and a12e9f5ea9fe)

For some background, this project is a C++/CLI wrapper of an unmanaged library written in C++. I am not the coordinator of the project, and there are several design decisions that I disagree with, as I'm sure many of you who look at the code will, but I digress. We wrap most of the layers of the original library in the C++/CLI DLL, but expose the easy-to-use API in the C# DLL. This is done because the intention of the project is to convert the entire library to managed C# code.

If you're able to get the code to compile, you can use this test code to reproduce the problem.

The problem

The latest changeset, entitled "moved resource management code to finalizers, to show bug", shows the original problem I was having. Every class in this code uses the same pattern to free its unmanaged resources. Here is an example (C++/CLI):

DBContext::~DBContext()
{
    this->!DBContext();
    GC::SuppressFinalize(this);
}

DBContext::!DBContext()
{
    if(_pst.get() != nullptr)
        _pst.reset();            // _pst is a clr_scoped_ptr (managed type)
                                 // that wraps a shared_ptr<T>.
}

This code has two benefits. First, when a class such as this is in a using statement, the resources are properly freed immediately. Second, if the user forgets to call dispose, the unmanaged resources will still be freed when the GC finally decides to finalize the class.

The problem with this approach, which I simply cannot get my head around, is that occasionally the GC will decide to finalize some of the classes that are used to enumerate over data in the file. This happens with many different PST files, and I've been able to determine that it has something to do with the Finalize method being called even though the class is still in use.

I can consistently get it to happen with this file (download)¹. The finalizer that gets called early is in the NodeIdCollection class, which is in the DBAccessor.cpp file. If you are able to run the code that was linked to above (this project can be difficult to set up because of the dependencies on the boost library), the application will fail with an exception, because the _nodes list is set to null and the _db_ pointer was reset as a result of the finalizer running.

1) Are there any glaring problems with the enumeration code in the `NodeIdCollection` class that would cause the GC to finalize this class while it's still in use?

I've only been able to get the code to run properly with the workaround I've described below.

An unsightly workaround

Now, I was able to work around this problem by moving all of the resource management code from each of the finalizers (!classname) to the destructors (~classname). This has solved the problem, though it hasn't satisfied my curiosity about why the classes are finalized early.

However, there is a problem with this approach, and I'll admit that it's more a problem with the design. Due to the heavy use of pointers in the code, nearly every class handles its own resources, so every instance needs to be disposed. This makes using the enumerations quite ugly (C#):

   foreach (var msg in pst.Messages)
   {
      // If this using statement were removed, we would have
      // memory leaks
      using (msg)  
      {
             // code here
      }
   }

The using statement acting on the item in the collection just screams wrong to me. With this approach, however, it is necessary in order to prevent memory leaks. Without it, dispose never gets called on the item and the memory is never freed, even if the dispose method on the pst class is called.

I have every intention of trying to change this design. The fundamental problem when this code was first being written, besides the fact that I knew little to nothing about C++/CLI, was that I couldn't put a native class inside of a managed one. I feel it might be possible to use scoped pointers that will free the memory automatically when the class is no longer in use, but I can't be sure if that's a valid way to go about this or if it would even work. So, my second question is:

2) What would be the best way to handle the unmanaged resources in the managed classes in a painless way?

To elaborate, could I replace a native pointer with the clr_scoped_ptr wrapper that was just recently added to the code (clr_scoped_ptr.h from this stackexchange question)? Or would I need to wrap the native pointer in something like a scoped_ptr<T> or smart_ptr<T>?

Thank you for reading all of this; I know it was a lot. I hope I've been clear enough that I might get some insight from people a little more experienced than I am. It's such a large question that I intend to add a bounty when it allows me to. Hopefully, someone can help.

Thanks!

¹This file is part of the freely available Enron dataset of PST files


望笑 2024-12-16 20:23:13

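clr_scoped_ptr is mine, and comes from here.

If it has any errors, please let me know.

Even if my code isn't perfect, using a smart pointer is the correct way to deal with this issue, even in managed code.

You do not need to (and should not) reset a clr_scoped_ptr in your finalizer. Each clr_scoped_ptr will itself be finalized by the runtime.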

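When using smart pointers, you do not need to write your own destructor or finalizer. The compiler-generated destructor will automatically call the destructors of all subobjects, and each subobject's finalizer will run when it is collected.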
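As a rough illustration of the "no hand-written destructor" point, here is a minimal sketch in standard (native) C++ rather than C++/CLI. It is only an analogy: `std::unique_ptr` stands in for clr_scoped_ptr, the names `NativePst`, `Context`, and `demo_scope` are hypothetical, and it shows nothing about how the CLR schedules finalizers.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a native resource; counts live instances.
struct NativePst {
    static int live;
    NativePst() { ++live; }
    ~NativePst() { --live; }
};
int NativePst::live = 0;

// Wrapper that owns the resource through a smart pointer.
// Note: no user-written destructor is declared here.
class Context {
public:
    Context() : pst_(new NativePst()) {}
private:
    std::unique_ptr<NativePst> pst_;
};

// The compiler-generated destructor of Context destroys pst_,
// which in turn deletes the NativePst it owns.
void demo_scope() {
    {
        Context ctx;
        assert(NativePst::live == 1);  // resource alive inside the scope
    }
    assert(NativePst::live == 0);      // released when ctx left scope
}
```

The owning class stays free of cleanup code, so there is no destructor/finalizer pair to keep in sync by hand.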

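Looking more closely at your code, there is indeed an error in NodeIdCollection. GetEnumerator() must return a different enumerator object each time it is called, so that each enumeration begins at the start of the sequence. You are reusing a single enumerator, which means the position is shared between successive calls to GetEnumerator(). That's bad.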
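The enumerator point can be sketched in standard C++ as well. This is not the project's NodeIdCollection; `NodeIds`, `NodeIdCursor`, `get_enumerator`, and `count_pass` are hypothetical names chosen for illustration. The idea is simply that each call hands out a new cursor with its own position.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cursor over a sequence of node ids; owns its own position.
class NodeIdCursor {
public:
    explicit NodeIdCursor(const std::vector<int>& ids) : ids_(&ids), pos_(0) {}

    // Stores the next id in `out` and advances; returns false at the end.
    bool move_next(int& out) {
        if (pos_ >= ids_->size()) return false;
        out = (*ids_)[pos_++];
        return true;
    }
private:
    const std::vector<int>* ids_;
    std::size_t pos_;
};

// Hypothetical collection. The cursor is created per call, not cached.
class NodeIds {
public:
    explicit NodeIds(std::vector<int> ids) : ids_(std::move(ids)) {}

    // Each call returns a fresh cursor positioned at the start.
    NodeIdCursor get_enumerator() const { return NodeIdCursor(ids_); }
private:
    std::vector<int> ids_;
};

// Counts the items visited by one full pass over the collection.
int count_pass(const NodeIds& c) {
    NodeIdCursor cur = c.get_enumerator();
    int n = 0;
    int id = 0;
    while (cur.move_next(id)) ++n;
    return n;
}
```

With a single cached cursor, a second pass would start wherever the first one stopped; with a fresh cursor per call, every pass sees the whole sequence.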


檐上三寸雪 2024-12-16 20:23:13

Having refreshed my memory of destructors/finalisers from some Microsoft documentation, I think you could at least simplify your code a little.

Here's my version of your sequence:

DBContext::~DBContext()
{
    this->!DBContext();
}

DBContext::!DBContext()
{
    delete _pst;
    _pst = NULL;
}

The "GC::SuppressFinalize" call is done automatically by C++/CLI, so there is no need for it. Since the _pst variable is initialised in the constructor (and deleting a null pointer causes no problems anyway), I can't see any reason to complicate the code by using smart pointers.
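To make the "deleting a null pointer causes no problems" remark concrete, here is a small sketch in standard C++ (not C++/CLI). `Resource`, `RawHolder`, and `release_twice_is_safe` are hypothetical names. It only demonstrates that a delete-then-null cleanup routine can safely run more than once.

```cpp
#include <cassert>

// Hypothetical native resource.
struct Resource {
    int value = 42;
};

// Owns a raw pointer and releases it with the delete-then-null idiom.
class RawHolder {
public:
    RawHolder() : pst_(new Resource()) {}
    ~RawHolder() { release(); }

    // Copying is disabled so two holders never delete the same pointer.
    RawHolder(const RawHolder&) = delete;
    RawHolder& operator=(const RawHolder&) = delete;

    // Safe to call repeatedly: delete on a null pointer is a no-op.
    void release() {
        delete pst_;
        pst_ = nullptr;
    }

    bool released() const { return pst_ == nullptr; }
private:
    Resource* pst_;
};

// Calls release() twice, then lets the destructor call it a third time.
bool release_twice_is_safe() {
    RawHolder h;
    h.release();
    h.release();
    return h.released();
}
```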

On a debugging note, I wonder if you can help make the problem more apparent by sprinkling in a few calls to "GC::Collect". That should force finalization on dangling objects for you.

Hope this helps a little,
