Using WebBrowser with Control.Invoke
I am developing a Windows application for web scraping. To do this, I use the WebBrowser control. I can't use the WebRequest/WebClient/WebResponse classes because the web pages are loaded dynamically using JavaScript.
The application works fine, but since I do a lot of processing, it puts unnecessary load on the UI, and I intermittently get the "not responding" message. So what I did is:
1. Create the WebBrowser on the UI thread
2. Put the long-running processes on a background thread
3. Use Control.Invoke whenever I need to get the page's document
4. Return the page's document via the Invoke call to the background thread
In the callback function, I can see that the page's document is extracted fine. However, the document (HtmlDocument) returned to the background worker is not evaluated correctly. When I step through it in the debugger, I get a "Function evaluation timed out..." message. I've played around with the syntax and keep getting either an invalid cast exception or a cross-thread exception.
Below is how I've coded the callback/delegate:
private delegate HtmlDocument RefreshDelegate();
private HtmlDocument RefreshBrowser()
{
WebBrowser br1 = ((WebBrowser)this.Controls["br1"]); //get webbrowser, "br1"
br1.Refresh(); //refresh browser
return br1.Document; //is retrieved correctly
}
And here is the code in the background worker that processes the "returned" HtmlDocument:
WebBrowser br1 = ((WebBrowser)this.Controls["br1"]); //get the browser
HtmlDocument document = (HtmlDocument)br1.Invoke(new RefreshDelegate(this.RefreshBrowser)); //not evaluated
//do stuff with document
The debugger message I get is: "Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation." Is this the correct way to solve this problem? As I said, I can't get the JavaScript-generated content with WebRequest etc., and I can't run the HtmlDocument parsing on the UI thread either, because it results in a poor user experience. Additionally, I need to create several WebBrowser instances. If this is not the best way, I'm open to other libraries as well. Thanks.
Comments (2)
This happens because the WebBrowser methods you call on the worker thread or the debugger thread don't actually run on that thread. WebBrowser is an apartment-threaded COM component, so COM automatically marshals calls from the worker back to the UI thread. This doesn't work well in the debugger because the UI thread is frozen by the debugger.
There is nothing you can do about that, and having these calls run on the UI thread still leaves you open to UI freezes. The only cure for that is to run the browser completely on its own STA thread. You won't be able to look at it, which shouldn't be an issue, I imagine. Check this answer for the code you'll need.
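The code from the linked answer is not reproduced here. As a rough sketch of the general idea only, the following runs a WebBrowser on a dedicated STA thread with its own message loop and hands back the rendered markup as a plain string, so no HtmlDocument (which wraps a thread-affine COM object) ever crosses threads. The class and method names (StaBrowser, GetRenderedHtmlAsync) are hypothetical, and the sketch assumes .NET Framework 4.6 or later and that the content of interest is present by the time DocumentCompleted fires for the top-level page; script that keeps loading content after that would need extra waiting logic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper, not taken from the original post or the linked answer.
static class StaBrowser
{
    // Navigates to 'url' on a dedicated STA thread and completes the task
    // with the rendered HTML once the top-level document has loaded.
    public static Task<string> GetRenderedHtmlAsync(string url)
    {
        // Run continuations asynchronously so they do not execute on the STA thread.
        var tcs = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        var thread = new Thread(() =>
        {
            try
            {
                using (var browser = new WebBrowser())
                {
                    browser.ScriptErrorsSuppressed = true;
                    browser.DocumentCompleted += (sender, e) =>
                    {
                        // DocumentCompleted also fires for frames; skip those.
                        if (e.Url != browser.Url) return;

                        // Read the live DOM (after scripts ran), falling back
                        // to the downloaded source if no <html> element exists.
                        HtmlElementCollection roots =
                            browser.Document.GetElementsByTagName("html");
                        string html = roots.Count > 0
                            ? roots[0].OuterHtml
                            : browser.DocumentText;

                        tcs.TrySetResult(html);
                        Application.ExitThread(); // ends this thread's message loop
                    };
                    browser.Navigate(url);
                    Application.Run(); // message loop required by the control
                }
            }
            catch (Exception ex)
            {
                tcs.TrySetException(ex);
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // required for the COM control
        thread.IsBackground = true;
        thread.Start();
        return tcs.Task;
    }
}
```

A background worker could then call something like string html = StaBrowser.GetRenderedHtmlAsync(url).Result; and parse the string on its own thread. Because each call gets its own thread and browser, this also covers the case of needing several browser instances, at the cost of one thread per page.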
I would suggest using the HtmlAgilityPack. This is specifically designed for web "scraping".
http://htmlagilitypack.codeplex.com/
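Note that HtmlAgilityPack parses HTML but does not execute JavaScript, so for pages built by script it would still need the rendered markup from the browser, passed along as a string. A minimal sketch of parsing such a string off the UI thread, assuming the HtmlAgilityPack package is referenced; the LinkExtractor class and ExtractLinks method are hypothetical names used only for illustration:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; works on a plain string,
// so it can safely run on any background thread.
static class LinkExtractor
{
    // Returns the href value of every <a href="..."> element in the markup.
    public static List<string> ExtractLinks(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

On the UI thread, the rendered markup can be read from the WebBrowser as a string (for example the OuterHtml of the html element) and returned through Control.Invoke; since a string has no thread affinity, the background worker can parse it without the marshaling problems described in the question.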