Writing a document with mshtml in .NET

Posted on 2024-10-19 23:25:54

I am using mshtml for HTML parsing (version 7.0.3300.0, C:\Program Files\Microsoft.NET\Primary Interop Assemblies\Microsoft.mshtml.dll).

HTMLDocumentClass has a write method, so I used it, but it throws a COMException with ErrorCode -2147352571 and Message "Type mismatch". What is the reason for this? If the write method of HTMLDocumentClass is not meant to be used, why is it defined?

    HTMLDocumentClass getHTMLDocument(string html)
    {
        HTMLDocumentClass doc = new HTMLDocumentClass();

        doc.write(new object[] { html }); // raises exception
        doc.close();

        return doc;
    }

    HTMLDocumentClass getHTMLDocument2(string html)
    {
        HTMLDocumentClass doc = new HTMLDocumentClass();
        IHTMLDocument2 doc2 = (IHTMLDocument2)doc; // cast to the interface first
        doc2.write(new object[] { html }); // call write through the interface, not the class
        doc2.close();

        return doc;
    }

Comments (1)

唔猫 2024-10-26 23:25:54

Okay, I found it. This is an interesting failure mode. All of the PIAs for Microsoft.mshtml that I have installed on my machine are outdated. There are no fewer than 4 of them, all version 7.0.3300.0 with a runtime target of 1.0.3705 (which is quite old).

The cause is the fooClass interop class that the type library importer generates. It is a synthetic class that exists to make events a bit easier to deal with, since they are done very differently in COM. The class is a flattened version of all of the combined methods of all interfaces. The current SDK version of the HTMLDocument coclass is declared as follows (from mshtml.idl):

    [
        uuid(25336920-03F9-11cf-8FD0-00AA00686F13)
    ]
    coclass HTMLDocument
    {
        [default]           dispinterface DispHTMLDocument;
        [source, default]   dispinterface HTMLDocumentEvents;
        [source]            dispinterface HTMLDocumentEvents2;
        [source]            dispinterface HTMLDocumentEvents3;
                            interface IHTMLDocument2;
                            interface IHTMLDocument3;
                            interface IHTMLDocument4;
                            interface IHTMLDocument5;
                            interface IHTMLDocument6;
                            interface IHTMLDOMNode;
                            interface IHTMLDOMNode2;
                            interface IDocumentSelector;
                            interface IHTMLDOMConstructor;
    };

If you use Object Browser on the interop library, you'll see that HTMLDocumentClass is missing the interface methods for IHTMLDocument6, IDocumentSelector and IHTMLDOMConstructor. The write() method you are using is past these interfaces.

This means that if you use HTMLDocumentClass.write(), you'll call the wrong method. The exception is raised because whatever method actually gets called isn't happy about the argument, which is no surprise.
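
As a side note, the ErrorCode from the question can be decoded to confirm which failure the runtime is reporting. The signed value -2147352571 is the bit pattern 0x80020005, which winerror.h defines as DISP_E_TYPEMISMATCH, the automation "Type mismatch" error. The following is only a minimal sketch; the names HResultDemo, ToHResult and Describe are made up for illustration and are not part of mshtml or the .NET Framework.

```csharp
using System;

static class HResultDemo
{
    // DISP_E_TYPEMISMATCH as defined in winerror.h.
    const uint DISP_E_TYPEMISMATCH = 0x80020005;

    // Reinterpret the signed COMException.ErrorCode as the unsigned HRESULT bit pattern.
    public static uint ToHResult(int errorCode)
    {
        return unchecked((uint)errorCode);
    }

    // Hypothetical helper: formats the HRESULT and names it if this sketch knows it.
    public static string Describe(int errorCode)
    {
        uint hr = ToHResult(errorCode);
        string name = hr == DISP_E_TYPEMISMATCH
            ? "DISP_E_TYPEMISMATCH"
            : "unknown (not in this sketch's table)";
        return string.Format("0x{0:X8} {1}", hr, name);
    }

    static void Main()
    {
        // ErrorCode value reported in the question.
        Console.WriteLine(Describe(-2147352571)); // prints "0x80020005 DISP_E_TYPEMISMATCH"
    }
}
```

So the error comes from the automation layer rejecting the argument type, which fits the explanation that the call ends up somewhere other than the intended write().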

This is a nasty failure mode, of course. It came about because Microsoft broke a very hard COM requirement: changing a COM interface or coclass requires a different GUID, which is the [uuid] attribute in the above declaration. That, however, would also make new versions of Internet Explorer completely incompatible with old code that uses it. Microsoft was caught between a rock and a hard place, since backwards compatibility is quite sacred there. The order of interface implementations in a coclass is not normally a problem in regular COM clients. In .NET, however, it breaks the layout of the synthetic XxxClass type that tlbimp generates.

I've never seen a case where that synthetic class was actually required, and I never use it myself. You can always obtain the correct interface pointer by casting in C#; the cast calls QueryInterface() and always returns the correct pointer regardless of the version. Your alternative is the proper workaround.
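
One way to stay clear of the synthetic class entirely is to create the coclass from its CLSID and work only through the interface. The sketch below assumes a Windows machine with MSHTML registered and a project reference to the Microsoft.mshtml interop assembly; the CLSID is the [uuid] from the IDL above, and HtmlParsing and Parse are hypothetical names chosen for illustration. It has not been verified against every PIA version.

```csharp
using System;
using System.Runtime.InteropServices;
using mshtml; // assumes a reference to the Microsoft.mshtml interop assembly

static class HtmlParsing
{
    // CLSID of the HTMLDocument coclass, copied from the [uuid] attribute in the IDL.
    static readonly Guid HtmlDocumentClsid =
        new Guid("25336920-03F9-11cf-8FD0-00AA00686F13");

    // Hypothetical helper: creates the COM object without touching HTMLDocumentClass.
    public static IHTMLDocument2 Parse(string html)
    {
        Type comType = Type.GetTypeFromCLSID(HtmlDocumentClsid);
        object instance = Activator.CreateInstance(comType);

        // The cast makes the runtime call QueryInterface() for IHTMLDocument2.
        IHTMLDocument2 doc = (IHTMLDocument2)instance;
        doc.write(new object[] { html });
        doc.close();
        return doc;
    }

    [STAThread]
    static void Main()
    {
        try
        {
            IHTMLDocument2 doc = Parse("<html><body><p>hello</p></body></html>");
            Console.WriteLine(doc.readyState);
        }
        catch (COMException ex)
        {
            // ErrorCode is the HRESULT; X8 prints it in the usual hex form.
            Console.WriteLine("COM call failed: 0x{0:X8} {1}", ex.ErrorCode, ex.Message);
        }
    }
}
```

Returning IHTMLDocument2 rather than HTMLDocumentClass keeps callers on the interface as well, so they cannot accidentally go back through the flattened class methods.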
