Should I use BackgroundWorker or Threads when I need to scrape a website?

Posted on 2024-08-04 09:38:23

I'm going to screen-scrape a gaming website for some data. I'd like to be able to send multiple requests so I can screen-scrape several pages at once. I've emailed the site administrator and gotten permission to scrape at a moderate rate (a few requests per second).

As far as I know, BackgroundWorker uses the thread pool, which I think would be desirable.
Does it make sense to use BackgroundWorker for this use case, or should I use actual Threads?

Comments (2)

北恋 2024-08-11 09:38:23

There is another construct known as the ThreadPool. It might be worth using, as it manages multiple threads for you and lets you control the minimum and maximum number of threads. BackgroundWorker is limited to one thread and is best used in WinForms apps where you have background I/O and don't want to lock the user interface thread.
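As a rough illustration of the min/max control mentioned above, here is a minimal sketch that reads and tries to lower the pool's worker-thread limits. The class and method names (`PoolLimits`, `GetWorkerBounds`, `TryCapWorkers`) are made up for this example; only the `ThreadPool.GetMinThreads`, `GetMaxThreads`, and `SetMaxThreads` calls come from .NET. It assumes a compiler with C# 7 features (tuples and `out _` discards).

```csharp
using System;
using System.Threading;

public static class PoolLimits
{
    // Reports the pool's current worker-thread bounds as (Min, Max).
    public static (int Min, int Max) GetWorkerBounds()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        return (minWorkers, maxWorkers);
    }

    // Tries to cap worker threads at maxWorkers while leaving the I/O
    // completion-port limit unchanged. Returns false when the runtime
    // rejects the value (for example, when it is below the current minimum).
    public static bool TryCapWorkers(int maxWorkers)
    {
        ThreadPool.GetMaxThreads(out _, out int ioThreads);
        return ThreadPool.SetMaxThreads(maxWorkers, ioThreads);
    }
}
```

These limits apply to the whole process, so lowering them also affects any other code that uses the pool. For a scraper, throttling inside your own dispatch loop is probably the gentler option.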

You will want to keep a queue of pages to scrape and feed these to the thread pool. You may still want to pause or limit the threads to get the intended level of scraping. I would personally separate parsing of retrieved page content from the actual retrieval of the pages over HTTP. This would generally make things easier to maintain and you may not need the local processing to be multi-threaded.
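The queue-and-throttle approach described above could look something like the following sketch. Everything named here (`ThrottledScraper`, `FetchAll`, `ParseAll`, the `fetch` and `parse` delegates) is hypothetical; `fetch` stands in for whatever HTTP call you actually use. The delay between dispatches is a deliberately crude rate limit: it spaces out when requests start, not when they finish, so slow responses can still overlap.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ThrottledScraper
{
    // Retrieval step: drains a queue of URLs onto the thread pool, starting
    // at most one request per `delay`, and returns the raw bodies by URL.
    // URLs whose fetch throws are logged and left out of the result.
    public static IDictionary<string, string> FetchAll(
        IEnumerable<string> urls, Func<string, string> fetch, TimeSpan delay)
    {
        var pending = new Queue<string>(urls);
        var pages = new ConcurrentDictionary<string, string>();

        // Initial count of 1 represents this dispatching thread.
        using (var done = new CountdownEvent(1))
        {
            while (pending.Count > 0)
            {
                string url = pending.Dequeue();
                done.AddCount();
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { pages[url] = fetch(url); }
                    catch (Exception ex) { Console.Error.WriteLine($"{url}: {ex.Message}"); }
                    finally { done.Signal(); }
                });
                if (pending.Count > 0) Thread.Sleep(delay); // crude throttle
            }
            done.Signal(); // dispatcher is finished queuing
            done.Wait();   // block until every queued fetch has signalled
        }
        return pages;
    }

    // Parsing step: kept separate from retrieval and run on the calling
    // thread, since local processing may not need to be multi-threaded.
    public static Dictionary<string, T> ParseAll<T>(
        IDictionary<string, string> pages, Func<string, T> parse)
    {
        var results = new Dictionary<string, T>();
        foreach (var page in pages) results[page.Key] = parse(page.Value);
        return results;
    }
}
```

A call might pass something like `url => new WebClient().DownloadString(url)` as `fetch` with `TimeSpan.FromMilliseconds(500)` as the delay, which works out to roughly two requests per second. Both values are assumptions to adjust to whatever rate the site administrator agreed to.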

哎呦我呸! 2024-08-11 09:38:23

The typical use of BackgroundWorker is keeping a UI responsive; for this case, use the thread pool instead to queue multiple HTTP requests/responses.

See ThreadPool.QueueUserWorkItem
