C# Internet Explorer and stripping HTML tags
Is there any way to open an Internet Explorer process from C#, send HTML content to the browser, and capture the 'displayed' content?
I am aware of other HTML stripping methods (e.g. HtmlAgilityPack), but I would like to explore the above avenue.
Thanks,
LG
Comments (2)
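You can use the WebBrowser control, which exists for both WinForms and WPF, to host IE in your application. Load your HTML into the control (in WPF, Source takes a URI, so use NavigateToString for an HTML string; in WinForms, set DocumentText), wait for the content to load, then access the Document property to get the rendered HTML. When waiting, keep in mind that the load-completion event (LoadCompleted in WPF, DocumentCompleted in WinForms) is raised when the HTML has finished downloading, not necessarily once layout is done and all dynamic JS has run; in WPF, LayoutUpdated is an alternative, though it can fire many times.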
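The approach described above could be sketched roughly as follows for the WinForms flavour of the control. This is only a minimal sketch under some assumptions: the class and method names (RenderedTextExtractor, GetDisplayedText) are made up for illustration, it relies on DocumentCompleted rather than a layout event, and it reads Body.InnerText to get the text as displayed, so anything scripts change after that event fires would be missed.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class RenderedTextExtractor
{
    // Hypothetical helper: loads an HTML string into an off-screen WinForms
    // WebBrowser (the IE engine) and returns the text the browser would display.
    public static string GetDisplayedText(string html)
    {
        string result = null;

        // The WebBrowser control wraps an ActiveX component, so it needs an
        // STA thread with a running message loop.
        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // Body can be null when the document has no <body>.
                    var body = browser.Document?.Body;
                    result = body != null ? body.InnerText : string.Empty;
                    Application.ExitThread(); // stop the message loop started below
                };

                browser.DocumentText = html; // starts loading the HTML string
                Application.Run();           // pump messages until ExitThread is called
            }
        });

        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return result;
    }
}
```

Calling RenderedTextExtractor.GetDisplayedText("<p>Hello <b>world</b></p>") from a console application that references System.Windows.Forms should return the paragraph text without the tags. If the page depends on scripts that run later, a short delay or a timer before reading the document may be needed.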
Someone else created the regular expressions, so I can't take credit for them, but the above code opens a WebClient for the passed-in webpage and uses regular expressions to find all of the child links on that page. I'm not sure if this is what you are looking for, but if you simply want to "grab" all of the HTML content and save it to a file, you can save the string "s" created in the line "string s = w.DownloadString(webpage);" to a file.
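The code this answer refers to is not reproduced here, so the following is only a sketch of what such a helper might look like, not the original. The names LinkScraper, ExtractLinks and GetChildLinks, and the regular expression itself, are assumptions; only the WebClient usage and the line "string s = w.DownloadString(webpage);" come from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class LinkScraper
{
    // Assumed pattern (not the one from the answer): captures the quoted
    // href value of each <a> tag.
    static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every href value found in the given HTML, in document order.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    // Downloads the page and returns the links it contains.
    public static List<string> GetChildLinks(string webpage)
    {
        using (var w = new WebClient())
        {
            string s = w.DownloadString(webpage); // raw HTML as sent by the server
            return ExtractLinks(s);
        }
    }
}
```

To save the raw HTML instead, the string s could be passed to System.IO.File.WriteAllText along with a file path. Note that this gives the markup as downloaded, not the text as IE would display it, and that a regular expression will miss links written with unquoted attribute values.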