How to disable JavaScript in mshtml.HTMLDocument (.NET)

Posted on 2024-07-06 08:21:50

I have code like this:

    Dim Document As New mshtml.HTMLDocument
    Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)
    iDoc.write(html)
    iDoc.close()

However, when I load HTML this way, it executes all the JavaScript in it and also requests resources referenced by the HTML.

I want to disable JavaScript and all pop-ups (such as certificate errors).

My aim is to use the DOM of the mshtml document to extract some tags from the HTML reliably (instead of a bunch of regexes).

Or is there another IE/Office DLL into which I can just load HTML without worrying about IE-related pop-ups or active scripts?

Comments (4)

桃扇骨 2024-07-13 08:21:50
    Dim Document As New mshtml.HTMLDocument
    Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)
    ' Add this line before writing the HTML
    iDoc.designMode = "On"
    iDoc.write(html)
    iDoc.close()

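
Building on the snippet above, here is a minimal sketch of how the tags could then be read from the DOM, which is the asker's stated goal. The module name `MshtmlLinkExtractor` and the function `ExtractLinks` are hypothetical, introduced only for illustration. The sketch assumes a project reference to the Microsoft.mshtml interop assembly and a caller running on an STA thread, and it relies on this answer's claim that design mode keeps scripts from running; none of this has been verified here.

```vbnet
Imports System
Imports System.Collections.Generic

Module MshtmlLinkExtractor
    ' Hypothetical helper: returns the href of every <a> element in the markup.
    ' Assumes a reference to Microsoft.mshtml and an STA calling thread.
    Function ExtractLinks(ByVal html As String) As List(Of String)
        Dim Document As New mshtml.HTMLDocument
        Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)

        ' Switch to design mode before writing, as suggested in this answer.
        iDoc.designMode = "On"
        iDoc.write(html)
        iDoc.close()

        ' getElementsByTagName is exposed on IHTMLDocument3.
        Dim iDoc3 As mshtml.IHTMLDocument3 = CType(Document, mshtml.IHTMLDocument3)
        Dim links As New List(Of String)
        For Each element As mshtml.IHTMLElement In iDoc3.getElementsByTagName("a")
            Dim href As Object = element.getAttribute("href", 0)
            If href IsNot Nothing Then
                links.Add(href.ToString())
            End If
        Next
        Return links
    End Function
End Module
```

If the collection comes back empty, the document may not have finished loading yet; checking `iDoc.readyState` before querying is one thing to try.
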
复古式 2024-07-13 08:21:50

If you already have the 'html' as a string and you just want a DOM view of it, why "render" it in a browser control at all?

I'm not familiar with .NET, but there has to be some sort of StringToDOM/StringToJSON facility that would better suit your needs.

Likewise, if the 'html' variable you are using above is a URL, just use wget or something similar to retrieve the markup as a string, and parse it with a suitable tool.

I'd look for a .NET XML/DOM library and use that. (Again, I would expect this to be part of the framework, but I'm not sure.)

PS: After a quick Google search I found this (source). I'm not sure whether it would help if you used it in your HTMLDocument instead.

    // Fallback DOMParser for browsers that lack a native one.
    if (typeof(DOMParser) == 'undefined') {
      DOMParser = function() {};
      DOMParser.prototype.parseFromString = function(str, contentType) {
        if (typeof(ActiveXObject) != 'undefined') {
          // Internet Explorer: parse the string synchronously with MSXML.
          var xmldata = new ActiveXObject('MSXML.DomDocument');
          xmldata.async = false;
          xmldata.loadXML(str);
          return xmldata;
        } else if (typeof(XMLHttpRequest) != 'undefined') {
          // Other browsers: load the string through a synchronous data: URL request.
          var xmldata = new XMLHttpRequest();
          if (!contentType) {
            contentType = 'application/xml';
          }
          xmldata.open('GET', 'data:' + contentType + ';charset=utf-8,' + encodeURIComponent(str), false);
          if (xmldata.overrideMimeType) {
            xmldata.overrideMimeType(contentType);
          }
          xmldata.send(null);
          return xmldata.responseXML;
        }
      };
    }

薄荷梦 2024-07-13 08:21:50

It sounds like you're screen-scraping some resource and then trying to do something programmatically with the resulting HTML?

If you know ahead of time that it is valid XHTML, load the XHTML string (which is really XML) into an XmlDocument object and work with it that way.

Otherwise, if the HTML is potentially invalid or not well formed, you'll need something like hpricot (but that is a Ruby library).
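
For the valid-XHTML case, a minimal sketch of the XmlDocument approach might look like the following. The module name `XhtmlTagExtractor` and the sample markup are made up for illustration; the sketch assumes the input is well-formed XML, and `LoadXml` will throw an `XmlException` if it is not.

```vbnet
Imports System
Imports System.Xml

Module XhtmlTagExtractor
    Sub Main()
        ' Hypothetical sample input: well-formed XHTML without a DOCTYPE.
        Dim xhtml As String =
            "<html><body>" &
            "<a href=""http://example.com/one"">One</a>" &
            "<script>alert('not executed');</script>" &
            "<a href=""http://example.com/two"">Two</a>" &
            "</body></html>"

        Dim doc As New XmlDocument()
        ' Do not resolve external resources such as DTDs.
        doc.XmlResolver = Nothing
        ' XmlDocument only parses the markup; script content is treated as plain text.
        doc.LoadXml(xhtml)

        ' Print the target and link text of every <a> element.
        For Each node As XmlNode In doc.GetElementsByTagName("a")
            Dim link As XmlElement = CType(node, XmlElement)
            Console.WriteLine(link.GetAttribute("href") & " -> " & link.InnerText)
        Next
    End Sub
End Module
```

Because this is a pure XML parse, no scripts run and no pop-ups appear. The trade-off is strictness: real-world pages that use undeclared HTML entities or unclosed tags will likely be rejected rather than repaired.
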

债姬 2024-07-13 08:21:50

If I remember correctly, MSHTML automatically inherits IE's settings.

So if you disable JavaScript in Internet Explorer for the user who is executing the code, JavaScript shouldn't run in MSHTML either.
