Memory leak when using Microsoft.MSHTML in a loop

Posted 2024-08-17 05:04:46

Hey, I am attempting to use the Microsoft.MSHTML (Version 7.0.3300.0) library to extract the body text from an HTML string. I've abstracted this functionality into a single helper method GetBody(string).

When called in an infinite loop, the process eventually runs out of memory (confirmed by eyeballing Mem Usage in Task Manager). I suspect the problem is due to my incorrect cleanup of the MSHTML objects. What am I doing wrong?

My current definition of GetBody(string) is:

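using System;
using System.Diagnostics;                // Trace
using System.Runtime.InteropServices;    // Marshal

public static string GetBody(string html)
{
    mshtml.IHTMLDocument2 htmlDoc = null;
    mshtml.IHTMLElement bodyElement = null;
    string body;

    try
    {
        // Load the HTML string into an in-memory MSHTML document and read the body text.
        htmlDoc = new mshtml.HTMLDocumentClass();
        htmlDoc.write(html);
        bodyElement = htmlDoc.body;
        body = bodyElement.innerText;
    }
    catch (Exception ex)
    {
        Trace.TraceError("Failed to use MSHTML to parse HTML body: " + ex.Message);
        // Fall back to the unparsed input string.
        body = html;
    }
    finally
    {
        // Release the COM objects explicitly, innermost first, instead of
        // waiting for the garbage collector to finalize their wrappers.
        if (bodyElement != null)
            Marshal.ReleaseComObject(bodyElement);
        if (htmlDoc != null)
            Marshal.ReleaseComObject(htmlDoc);
    }

    return body;
}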

Edit: the memory leak was traced to the code that populates the html value (in this case, Outlook Redemption), not to GetBody itself.
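The edit above places the leak in the code that produces html rather than in GetBody. One way to confirm that kind of split more repeatably than eyeballing Task Manager is to run each stage on its own in a loop and compare how much the process's private memory grows after a forced garbage collection. The sketch below is a hypothetical diagnostic helper (LeakCheck, MeasureGrowth and the two stand-in actions are illustrative names, not part of MSHTML or Redemption). It assumes that Process.PrivateMemorySize64 reflects unmanaged allocations as well as managed ones, which is what matters for COM objects.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical diagnostic helper, not part of MSHTML or Redemption.
public static class LeakCheck
{
    // Runs `action` repeatedly and returns how many bytes the process's private
    // memory grew, measured before and after with a settled garbage collection.
    public static long MeasureGrowth(Action action, int iterations)
    {
        long before = SettledPrivateBytes();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        return SettledPrivateBytes() - before;
    }

    // Collects garbage and runs pending finalizers (giving unreachable COM
    // wrappers a chance to release their objects) before reading private bytes,
    // so the number reflects memory that is actually still held.
    private static long SettledPrivateBytes()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        using (Process current = Process.GetCurrentProcess())
        {
            return current.PrivateMemorySize64;
        }
    }

    // Backing store for the deliberately leaky stand-in below.
    private static readonly List<byte[]> Retained = new List<byte[]>();

    public static void Main()
    {
        const int iterations = 10000;

        // Stand-in for a stage that frees everything it allocates,
        // e.g. () => GetBody(someFixedHtml).
        long clean = MeasureGrowth(() => GC.KeepAlive(new byte[1024]), iterations);

        // Stand-in for a stage that holds on to memory,
        // e.g. the code that fetches the HTML from Outlook.
        long leaky = MeasureGrowth(() => Retained.Add(new byte[1024]), iterations);

        Console.WriteLine("clean stage: " + clean / 1024 + " KB");
        Console.WriteLine("leaky stage: " + leaky / 1024 + " KB");
    }
}
```

Feeding GetBody a fixed string isolates MSHTML from the HTML source: if that loop stays flat while the fetch-only loop keeps growing, the leak is in the fetching code. Absolute numbers are noisy, so compare the two stages against each other rather than reading either one in isolation.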
