Memory leak when using Microsoft.MSHTML in a loop

Posted 2024-08-17 05:04:46

Hey, I am attempting to use the Microsoft.MSHTML (Version 7.0.3300.0) library to extract the body text from an HTML string. I've abstracted this functionality into a single helper method GetBody(string).

When called in an infinite loop, the process eventually runs out of memory (confirmed by eyeballing Mem Usage in Task Manager). I suspect the problem is due to my incorrect cleanup of the MSHTML objects. What am I doing wrong?

My current definition of GetBody(string) is:

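// Requires a reference to the Microsoft.mshtml interop assembly.
using System;
using System.Diagnostics;              // Trace
using System.Runtime.InteropServices;  // Marshal

public static string GetBody(string html)
{
    mshtml.IHTMLDocument2 htmlDoc = null;
    mshtml.IHTMLElement bodyElement = null;
    string body;

    try
    {
        // Load the HTML into an in-memory document and read the body text.
        htmlDoc = new mshtml.HTMLDocumentClass();
        htmlDoc.write(html);
        bodyElement = htmlDoc.body;
        body = bodyElement.innerText;
    }
    catch (Exception ex)
    {
        Trace.TraceError("Failed to use MSHTML to parse HTML body: " + ex.Message);
        // Note: email is not defined in this snippet; it is declared elsewhere in the calling code.
        body = email.Body;
    }
    finally
    {
        // Release the COM wrappers, innermost object first.
        if (bodyElement != null)
            Marshal.ReleaseComObject(bodyElement);
        if (htmlDoc != null)
            Marshal.ReleaseComObject(htmlDoc);
    }

    return body;
}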

Edit: the memory leak has been traced to the code used in populating a value for html. In this case it was Outlook Redemption.

Comments (1)

烟雨凡馨 2024-08-24 05:04:46

It has been a long time since I have used mshtml, but doesn't the IHTMLDocument2 interface have a close method? Have you tried calling it?

How long did the loop run before the leak was obvious?

I will see if I can dig through some of the legacy code I have here that uses mshtml and see how the developers released the objects.

EDIT:

The old code we have here calls close on the IHTMLDocument2 and then releases the COM object, as you do.
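As a rough sketch of that pattern applied to the GetBody helper from the question: the only substantive change is the call to close() after write(). The class name HtmlBodyExtractor and the fallback of returning the raw html string are assumptions made for illustration (the question falls back to email.Body, which is not defined in the posted snippet). It needs Windows and a reference to the Microsoft.mshtml interop assembly, and it is not claimed to fix the leak by itself.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical wrapper class; not part of the original question.
public static class HtmlBodyExtractor
{
    public static string GetBody(string html)
    {
        mshtml.IHTMLDocument2 htmlDoc = null;
        mshtml.IHTMLElement bodyElement = null;

        try
        {
            htmlDoc = (mshtml.IHTMLDocument2)new mshtml.HTMLDocumentClass();
            htmlDoc.write(new object[] { html });
            // Close the output stream opened by write() before reading the body.
            htmlDoc.close();

            bodyElement = htmlDoc.body;
            return bodyElement.innerText;
        }
        catch (Exception ex)
        {
            Trace.TraceError("Failed to use MSHTML to parse HTML body: " + ex.Message);
            // Assumed fallback: hand back the unparsed input.
            return html;
        }
        finally
        {
            // Release the COM wrappers, innermost object first.
            if (bodyElement != null)
                Marshal.ReleaseComObject(bodyElement);
            if (htmlDoc != null)
                Marshal.ReleaseComObject(htmlDoc);
        }
    }
}
```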

One thing to note, though, is that the ReleaseComObject method is called in a loop until it returns zero. This ensures that all COM wrappers and the original object are released; there is a note about it here.
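A minimal sketch of that release loop, assuming a small helper class (ComCleanup and ReleaseFully are hypothetical names, not taken from the legacy code mentioned above). Marshal.ReleaseComObject returns the wrapper's remaining reference count, so the loop keeps calling it until the count reaches zero.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; the names are illustrative only.
public static class ComCleanup
{
    // Releases every reference held by the runtime callable wrapper.
    // Returns true if a COM wrapper was released, false for null or non-COM objects.
    public static bool ReleaseFully(object comObject)
    {
        if (comObject == null || !Marshal.IsComObject(comObject))
            return false;

        // Keep decrementing until the wrapper's reference count reaches zero.
        while (Marshal.ReleaseComObject(comObject) > 0)
        {
        }

        return true;
    }
}
```

In the finally block of GetBody, this would replace the direct calls, for example ComCleanup.ReleaseFully(bodyElement) followed by ComCleanup.ReleaseFully(htmlDoc). The framework's Marshal.FinalReleaseComObject does the same job in a single call; in either case the object must not be used again afterwards.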
