Why is HTML Agility Pack HtmlDocument.DocumentNode null?

Posted on 2025-01-01 23:42:26


I'm using this code to change the href attributes in an HTML stream.

First, I download the full HTML page using this code (URL is the web page address):

HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = 
                         (HttpWebResponse)myHttpWebRequest.GetResponse();

Stream s = myHttpWebResponse.GetResponseStream();

Then I process it:

HtmlDocument doc = new HtmlDocument();

doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);

s is the HTML stream.

But I get an exception saying that doc.DocumentNode is null!

I tried many sites, but doc.DocumentNode is null too.


无所的.畏惧 2025-01-08 23:42:26


This works for me.

using System.IO;
using System.Linq;
using System.Net;
using System.Web;

using (WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    // Download the page as a string and parse it in memory
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue; // <a> element without an href attribute
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }
    // Write the modified HTML to a string rather than back into the response stream
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}

Also note the HttpUtility.UrlEncode call: it lets you get the original URL back correctly. Without it, characters such as & and = in the original URL's query string would be read as parameters of the proxy URL.

Use HttpUtility.UrlDecode to decode it.
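To see why the encoding matters, here is a small sketch that uses only System.Web.HttpUtility (part of .NET Framework and of modern .NET). The target URL http://example.com/page?a=1&b=2 is a made-up example, and ExtractTarget only simulates what the proxy page would read from its url parameter. HttpUtility.ParseQueryString already URL-decodes the values it returns; if you cut the raw text out of the query string yourself, call HttpUtility.UrlDecode on it instead.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class ProxyUrlDemo
{
    // Proxy prefix taken from the answer above.
    private const string ProxyPrefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Builds the proxy link, optionally URL-encoding the original href.
    public static string Wrap(string target, bool encode) =>
        ProxyPrefix + (encode ? HttpUtility.UrlEncode(target) : target);

    // Simulates the proxy page reading its "url" parameter.
    // ParseQueryString splits on '&' and URL-decodes each value.
    public static string ExtractTarget(string proxyUrl)
    {
        string query = proxyUrl.Substring(proxyUrl.IndexOf('?') + 1);
        NameValueCollection args = HttpUtility.ParseQueryString(query);
        return args["url"];
    }

    public static void Main()
    {
        // Hypothetical original link that carries its own query string.
        const string target = "http://example.com/page?a=1&b=2";

        // Unencoded: "&b=2" is parsed as a separate proxy parameter and lost.
        Console.WriteLine(ExtractTarget(Wrap(target, encode: false)));

        // Encoded: the full original URL survives the round trip.
        Console.WriteLine(ExtractTarget(Wrap(target, encode: true)));
    }
}
```

The unencoded call yields only http://example.com/page?a=1, which is exactly the "some parameters may cause problems" case described above.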


萌梦深 2025-01-08 23:42:26


Try using //a instead of /a.

In XPath, //a selects every a element anywhere in the document, whereas /a selects only a elements that are direct children of the document root. On an HTML page the root element is normally html, so /a matches nothing.
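The difference is easy to check. The sketch below uses System.Xml rather than HtmlAgilityPack, on a made-up, well-formed XHTML snippet, because /a and //a are standard XPath and mean the same thing in both libraries. One difference to keep in mind: System.Xml's SelectNodes returns an empty list when nothing matches, whereas HtmlAgilityPack's SelectNodes (at least in the 1.4.x versions discussed here) returns null, which is why the foreach in the question throws.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical sample markup; the root element is <html>, not <a>.
    public const string SampleXhtml =
        "<html><body><p><a href='one.html'>One</a></p>" +
        "<a href='two.html'>Two</a></body></html>";

    // Returns how many nodes the given XPath expression selects.
    public static int CountMatches(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList nodes = doc.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // "/a": only an <a> that is itself the root element, so 0 here.
        Console.WriteLine(CountMatches(SampleXhtml, "/a"));
        // "//a": every <a> at any depth, so both links (2).
        Console.WriteLine(CountMatches(SampleXhtml, "//a"));
    }
}
```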

Update:

The following code works fine:

using System;
using System.Net;
using HtmlAgilityPack;

var myHttpWebRequest = (HttpWebRequest)WebRequest.Create("http://google.com");
var myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

var s = myHttpWebResponse.GetResponseStream();

var doc = new HtmlDocument();

doc.Load(s);
// "//a" matches <a> elements at any depth, not just at the document root
foreach (var link in doc.DocumentNode.SelectNodes("//a"))
{
    var href = link.Attributes["href"];
    if (href == null) continue; // skip anchors such as <a name="..."> that have no href
    href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + href.Value;

    Console.WriteLine(href.Value);
}


遮云壑 2025-01-08 23:42:26

Here is your answer: HTML Agility Pack Null Reference.


街道布景 2025-01-08 23:42:26


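Try the following code:

HtmlDocument htmlDoc = new HtmlDocument
{
    OptionAddDebuggingAttributes = false,
    OptionAutoCloseOnEnd = true,
    OptionFixNestedTags = true,
    OptionReadEncoding = true
};
try
{
    using (Stream reader = myHttpWebResponse.GetResponseStream())
    {
        // No Seek here: response streams are forward-only and already
        // positioned at the start of the body.
        htmlDoc.Load(reader, true);
    }
    HtmlNode node = htmlDoc.DocumentNode;
    if (node != null)
    {
        // Iterate the document that was actually loaded (htmlDoc)
        foreach (var href in node.Descendants("a").Select(x => x.Attributes["href"]))
        {
            if (href == null) continue; // <a> without an href attribute
            href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
        }
    }
}
catch (WebException ex)
{
    // Report network failures instead of silently swallowing every exception
    Console.Error.WriteLine(ex.Message);
}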

I am using HtmlAgilityPack version 1.4.0.

Did this solve your problem? If not, please comment; otherwise, mark it as the answer.
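HTTP response streams are normally forward-only (CanSeek is false), so calling Seek on them throws NotSupportedException. If you really need a seekable stream, for example to parse the same response twice, one option is to copy it into a MemoryStream first. The sketch below uses only the base library: a decompressing GZipStream stands in for the network stream because it is forward-only in the same way, and the helper names EnsureSeekable and MakeForwardOnlyStream are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamSeekDemo
{
    // Returns the input if it already supports Seek; otherwise copies the
    // remaining bytes into a MemoryStream and rewinds it.
    public static Stream EnsureSeekable(Stream input)
    {
        if (input.CanSeek) return input;
        var buffer = new MemoryStream();
        input.CopyTo(buffer);
        buffer.Position = 0;
        return buffer;
    }

    // Stand-in for a network response stream: a decompressing GZipStream
    // reports CanSeek == false and throws NotSupportedException on Seek.
    public static Stream MakeForwardOnlyStream(byte[] payload)
    {
        var compressed = new MemoryStream();
        using (var gz = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
        {
            gz.Write(payload, 0, payload.Length);
        }
        compressed.Position = 0;
        return new GZipStream(compressed, CompressionMode.Decompress);
    }

    public static void Main()
    {
        byte[] html = Encoding.UTF8.GetBytes("<html><body><a href='x'>x</a></body></html>");
        using (Stream forwardOnly = MakeForwardOnlyStream(html))
        {
            try
            {
                forwardOnly.Seek(0, SeekOrigin.Begin);
            }
            catch (NotSupportedException)
            {
                Console.WriteLine("Seek is not supported on this stream");
            }

            using (Stream seekable = EnsureSeekable(forwardOnly))
            {
                seekable.Seek(0, SeekOrigin.Begin); // fine on a MemoryStream
                Console.WriteLine(new StreamReader(seekable).ReadToEnd());
            }
        }
    }
}
```

Buffering the whole response costs memory proportional to the page size, so for a single parse it is simpler to pass the response stream straight to Load, as in the corrected code above.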


酒绊 2025-01-08 23:42:26


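The XPath expression for the anchor tags is wrong:

...doc.DocumentNode.SelectNodes("/a")    //incorrect: only matches <a> at the document root
...doc.DocumentNode.SelectNodes("//a")   //correct: matches <a> at any depth
...doc.DocumentNode.SelectNodes(@"/a")   //also incorrect: @ only changes C# escaping, so this is still "/a"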

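The original expression selects no nodes, and HtmlAgilityPack's SelectNodes returns null rather than an empty collection in that case, so the foreach throws a NullReferenceException. Check the result for null so the code does not fail on, say, a document with no links at all (however unlikely that is :)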

var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
    foreach (HtmlNode link in anchors)
    {
        /*do stuff*/
    } 
}
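If you loop over SelectNodes results in several places, a small extension method can fold the null check into the loop. HtmlAgilityPack's HtmlNodeCollection implements IEnumerable<HtmlNode>, so something like foreach (var link in doc.DocumentNode.SelectNodes("//a").OrEmpty()) should work. The sketch below is self-contained: OrEmpty is a hypothetical helper, and FakeSelectNodes is a hypothetical stand-in for the real library call so the example runs without the package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullSafeEnumerable
{
    // Treats a null sequence as empty, so foreach never dereferences null.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source) =>
        source ?? Enumerable.Empty<T>();

    // Hypothetical stand-in for SelectNodes: returns null when nothing matches.
    public static List<string> FakeSelectNodes(bool anyMatches) =>
        anyMatches ? new List<string> { "a1", "a2" } : null;

    public static int CountLinks(bool anyMatches)
    {
        int count = 0;
        foreach (string link in FakeSelectNodes(anyMatches).OrEmpty())
        {
            count++; // a real caller would rewrite the link's href here
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountLinks(true));  // 2
        Console.WriteLine(CountLinks(false)); // 0, and no NullReferenceException
    }
}
```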



About the author: 深海里的那抹蓝
