How to time out a request with Html Agility Pack

Posted on 2024-11-18 10:24:18

I'm making a request to a remote web server that is currently offline (on purpose).

I'd like to figure out the best way to time out the request. Basically, if the request runs longer than "X" milliseconds, exit the request and return a null response.

Currently the web request just sits there waiting for a response...

How would I best approach this problem?

Here's the current code snippet:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        if (HomePageUrl.RemoteFileExists())
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            { 
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
                //todo: look into whether this else statement is necessary
            else 
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

Comments (4)

回忆那么伤 2024-11-25 10:24:18

Retrieve the web page at your URL with a method like this:

    // requires: using System.IO; using System.Net; using System.Text;
    private static string retrieveData(string url)
    {
        // used to build entire input
        StringBuilder sb = new StringBuilder();

        // used on each read operation
        byte[] buf = new byte[8192];

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Timeout is in milliseconds; GetResponse() throws a WebException
        // when it elapses. 10 ms is very short, so raise it as needed.
        request.Timeout = 10;

        // execute the request; dispose the response and stream when done
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        {
            int count;
            do
            {
                // fill the buffer with data
                count = resStream.Read(buf, 0, buf.Length);

                // make sure we read some data
                if (count != 0)
                {
                    // translate from bytes to ASCII text and keep building the string
                    sb.Append(Encoding.ASCII.GetString(buf, 0, count));
                }
            }
            while (count > 0); // any more data to read?
        }

        return sb.ToString();
    }

Then use Html Agility Pack to retrieve the HTML element, like this:

    public static string htmlRetrieveInfo()
    {
        string htmlSource = retrieveData("http://example.com/test.html");
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlSource);

        // select once, outside any inner block, so node is in scope for the return
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");

        // return null when the page has no <body> instead of dereferencing null
        return node != null ? node.InnerHtml : null;
    }
一梦浮鱼 2024-11-25 10:24:18

Html Agility Pack is open source, so you can modify the source yourself.
First, add this code to the HtmlWeb class:

    private int _timeout = 20000;

    public int Timeout
    {
        get { return _timeout; }
        set
        {
            // validate the incoming value, not the current field
            if (value < 1)
                throw new ArgumentException("Timeout must be greater than zero.");
            _timeout = value;
        }
    }

Then find this method:

    private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)

and modify it:

    req = WebRequest.Create(uri) as HttpWebRequest;
    req.Method = method;
    req.UserAgent = UserAgent;
    req.Timeout = Timeout; // add this

Or something like this, using the PreRequest handler:

    htmlWeb.PreRequest = request =>
    {
        request.Timeout = 15000;
        return true;
    };
最笨的告白 2024-11-25 10:24:18

I had to make a small adjustment to my originally posted code:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;
        // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
        if (HomePageUrl.RemoteFileExists(1000))
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            { 
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
                //todo: look into whether this else statement is necessary
            else 
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

Then I modified my RemoteFileExists extension method to take a timeout:

    public static bool RemoteFileExists(this string url, int timeout)
    {
        try
        {
            // Create the HttpWebRequest
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

            // ************ ADDED HERE
            // time out the request after x milliseconds
            request.Timeout = timeout;
            // ************

            // Use the HEAD method; GET would also work.
            request.Method = "HEAD";
            // Get the response and dispose it when done.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                // Returns true if the status code is 200
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch
        {
            // Any exception (including a timeout) returns false.
            return false;
        }
    }

With this approach, if the timeout fires before RemoteFileExists receives the response headers, the method returns false.

合约呢 2024-11-25 10:24:18

You could use a standard HttpWebRequest to fetch the remote resource and set its Timeout property. Then, if the request succeeds, feed the resulting HTML to Html Agility Pack for parsing.
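A minimal sketch of that idea, using only the .NET base class library. The class name TimeoutFetcher and the helper FetchHtmlOrNull are hypothetical names made up for illustration (they are not part of Html Agility Pack), and reading the body as UTF-8 is an assumption. The helper returns null when the request fails or times out, which matches the "return a null response" behaviour the question asks for.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class TimeoutFetcher
{
    // Hypothetical helper: returns the response body as a string, or null
    // if the request fails or does not complete within timeoutMs.
    public static string FetchHtmlOrNull(string url, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs;          // limit for GetResponse()
            request.ReadWriteTimeout = timeoutMs; // limit for each read on the response stream

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream, Encoding.UTF8)) // assumed encoding
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException)
        {
            // Timeouts, refused connections, and HTTP error statuses end up here.
            return null;
        }
        catch (IOException)
        {
            // A stalled or broken read may surface as an IOException on some runtimes.
            return null;
        }
    }
}
```

When the result is non-null, it can be passed to HtmlDocument.LoadHtml as in the earlier answer; when it is null, the caller can skip parsing and return null.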
