Optimized method for fetching HTML from a URL

Posted on 2024-11-30 02:01:08

一城柳絮吹成雪 2024-12-07 02:01:08

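WebClient probably has the simpler API, but both should work.

To run a large number of requests, use multiple threads or a thread pool. If the URLs are on the same server, take care not to overload it.

If you would like an example that uses a thread pool, I can provide one.

Update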

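using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

namespace WebClientApp
{
    class MainClassApp
    {
        // Number of finished requests (successful or failed), guarded by requests_lock.
        private static int requests = 0;
        private static readonly object requests_lock = new object();

        public static void Main()
        {
            List<string> urls = new List<string> { "http://www.google.com", "http://www.slashdot.org" };

            // Queue one work item per URL on the shared thread pool.
            foreach (var url in urls)
            {
                ThreadPool.QueueUserWorkItem(GetUrl, url);
            }

            // Poll once per second until every work item has reported completion.
            int cur_req = 0;
            while (cur_req < urls.Count)
            {
                Thread.Sleep(1000);

                lock (requests_lock)
                {
                    cur_req = requests;
                }
            }

            Console.WriteLine("Done");
        }

        private static void GetUrl(object the_url)
        {
            string url = (string)the_url;

            try
            {
                // Dispose the client, the stream and the reader once the page is read.
                using (WebClient client = new WebClient())
                using (Stream data = client.OpenRead(url))
                using (StreamReader reader = new StreamReader(data))
                {
                    string html = reader.ReadToEnd();

                    // Do something with html
                    Console.WriteLine(html);
                }
            }
            catch (WebException e)
            {
                // A failed download should not crash the process.
                Console.WriteLine("Failed to fetch " + url + ": " + e.Message);
            }
            finally
            {
                // Always count the request; otherwise Main would wait forever after a failure.
                lock (requests_lock)
                {
                    requests++;
                }
            }
        }
    }
}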
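
The advice about not overloading a single server, and the one-second polling loop in Main, can both be handled with standard synchronization primitives. What follows is only a minimal sketch, not the answerer's code. The class name ThrottledFetcher, the maxConcurrent parameter and the fetch delegate are assumptions made for illustration and do not appear in the answer. It uses SemaphoreSlim to cap the number of requests in flight and CountdownEvent to wait for completion without polling.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of the original answer.
static class ThrottledFetcher
{
    // Runs `fetch` for every URL on the thread pool with at most
    // `maxConcurrent` calls in flight, and blocks until all have finished.
    // URLs whose fetch throws are reported and left out of the result.
    public static Dictionary<string, string> FetchAll(
        IList<string> urls, int maxConcurrent, Func<string, string> fetch)
    {
        if (maxConcurrent < 1)
            throw new ArgumentOutOfRangeException("maxConcurrent");

        var results = new Dictionary<string, string>();

        using (var gate = new SemaphoreSlim(maxConcurrent))
        using (var done = new CountdownEvent(urls.Count))
        {
            foreach (string url in urls)
            {
                // Blocks the caller while maxConcurrent fetches are already running.
                gate.Wait();

                ThreadPool.QueueUserWorkItem(state =>
                {
                    string u = (string)state;
                    try
                    {
                        string html = fetch(u);
                        lock (results) { results[u] = html; }
                    }
                    catch (Exception e)
                    {
                        Console.Error.WriteLine("Failed to fetch " + u + ": " + e.Message);
                    }
                    finally
                    {
                        gate.Release(); // let the next queued URL start
                        done.Signal();  // count this URL as finished
                    }
                }, url);
            }

            // Returns once every work item has signalled; no Sleep loop needed.
            done.Wait();
        }

        return results;
    }

    // Downloads one page with WebClient, the class the answer uses.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}

static class Program
{
    static void Main()
    {
        var urls = new List<string> { "http://www.google.com", "http://www.slashdot.org" };

        // The limit of 2 is an arbitrary illustrative value.
        var pages = ThrottledFetcher.FetchAll(urls, 2, ThrottledFetcher.Download);

        foreach (var page in pages)
            Console.WriteLine(page.Key + ": " + page.Value.Length + " characters");
    }
}
```

Passing the download step as a delegate keeps the throttling logic independent of WebClient, so the same helper can be exercised with a stub function when no network is available. A suitable limit depends on the target server and is best found by measurement.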

绾颜 2024-12-07 02:01:08

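Use Parallel.Invoke to set up all the requests, and give it a generous MaxDegreeOfParallelism (set through ParallelOptions).

You will spend most of the time waiting on I/O, so run as many requests concurrently as you reasonably can.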
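
As a rough illustration of this suggestion, the sketch below builds one Action per URL and hands them to Parallel.Invoke together with a ParallelOptions instance. The class name ParallelFetcher, the maxParallel parameter, the fetch delegate and the value 16 are assumptions chosen for the example, not something the answer specifies.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
static class ParallelFetcher
{
    // Fetches every URL through Parallel.Invoke, running at most `maxParallel`
    // fetches at once. If any fetch throws, Parallel.Invoke rethrows the
    // failures wrapped in an AggregateException after the actions have run.
    public static IDictionary<string, string> FetchAll(
        IEnumerable<string> urls, int maxParallel, Func<string, string> fetch)
    {
        // Thread-safe map, because several actions write to it concurrently.
        var results = new ConcurrentDictionary<string, string>();

        // One Action per URL.
        Action[] jobs = urls
            .Select(url => (Action)(() => { results[url] = fetch(url); }))
            .ToArray();

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Blocks until every action has completed.
        Parallel.Invoke(options, jobs);

        return results;
    }
}

static class Program
{
    static void Main()
    {
        var urls = new List<string> { "http://www.google.com", "http://www.slashdot.org" };

        // 16 is an arbitrary "generous" value for illustration.
        var pages = ParallelFetcher.FetchAll(urls, 16, url =>
        {
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        });

        foreach (var page in pages)
            Console.WriteLine(page.Key + ": " + page.Value.Length + " characters");
    }
}
```

Because each action blocks a thread while it waits on the network, a limit well above the number of CPU cores may be reasonable here. The best value depends on the servers involved.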

About the author

笔落惊风雨
