Problem parsing HTML tags with HTML Agility Pack in C#

Posted 2024-10-25 22:11:42

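This seems like it should be easy, but I'm having major problems with it. I'm trying to parse a specific tag with the HTML Agility Pack (HAP). I used Firebug to find the XPath I want and got //*[@id="atfResults"]. I believe the problem is the " characters, since each one marks the start or end of a C# string literal, so the XPath gets cut off in the middle. I tried making it a literal (verbatim) string, but I still get errors. The function is below.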

    public List<string> GetHtmlPage(string strURL)
    {
        // the html retrieved from the page

        WebResponse objResponse;
        WebRequest objRequest = System.Net.HttpWebRequest.Create(strURL);
        objResponse = objRequest.GetResponse();
        // the using keyword will automatically dispose the object 
        // once complete
        using (StreamReader sr =
        new StreamReader(objResponse.GetResponseStream()))
        {//*[@id="atfResults"]
            string strContent = sr.ReadToEnd();
            // Close and clean up the StreamReader
            sr.Close();
            /*Regex regex = new Regex("<body>((.|\n)*?)</body>", RegexOptions.IgnoreCase);

            //Here we apply our regular expression to our string using the 
            //Match object. 
            Match oM = regex.Match(strContent);
            Result = oM.Value;*/

            HtmlDocument doc = new HtmlDocument();
            doc.Load(new StringReader(strContent));
            HtmlNode root = doc.DocumentNode;
            List<string> itemTags = new List<string>();



            string listingtag = "//*[@id="atfResults"]";

            foreach (HtmlNode link in root.SelectNodes(listingtag))
            {
                string att = link.OuterHtml;

                itemTags.Add(att);
            }

            return itemTags;
        }

    }

静若繁花 2024-11-01 22:11:42

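You can escape the quotes:

string listingtag = "//*[@id=\"atfResults\"]";

If you want to use a verbatim string literal, it would be:

string listingtag = @"//*[@id=""atfResults""]";

As you can see, a verbatim string doesn't really buy you anything here, because the embedded quotes still have to be doubled.

However, you can skip XPath altogether and use (note the lowercase "b" in HAP's method name):

HtmlNode link = doc.GetElementbyId("atfResults");

This should also be slightly faster.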
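To check that the two literal forms above really are the same string, here is a minimal sketch that needs nothing beyond the base class library. The class name `XPathLiteralDemo` and the constant names are made up for illustration.

```csharp
using System;

static class XPathLiteralDemo
{
    // Regular string literal: each embedded double quote is escaped with a backslash.
    public const string Escaped = "//*[@id=\"atfResults\"]";

    // Verbatim string literal: each embedded double quote is written twice.
    public const string Verbatim = @"//*[@id=""atfResults""]";

    static void Main()
    {
        Console.WriteLine(Escaped);             // prints //*[@id="atfResults"]
        Console.WriteLine(Escaped == Verbatim); // prints True
    }
}
```

Either constant can be passed to SelectNodes; which one to use is purely a matter of readability.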


酒解孤独 2024-11-01 22:11:42

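Have you tried this? XPath accepts single quotes around a string value, so nothing needs escaping in the C# literal: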

  string listingtag = "//*[@id='atfResults']";
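A rough sketch of how the single-quoted XPath could be used end to end, assuming the HtmlAgilityPack NuGet package is referenced. The helper name `SelectOuterHtml`, the class name, and the sample HTML are invented for illustration. As far as I know, SelectNodes returns null rather than an empty collection when nothing matches (at least with default settings), so the sketch guards against that before looping, which the function in the question does not do.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class AtfResultsDemo
{
    // Hypothetical helper: returns the outer HTML of every node matching the
    // XPath, or an empty list when nothing matches.
    public static List<string> SelectOuterHtml(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // No match: avoid a NullReferenceException in the foreach below.
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.OuterHtml);
        }
        return results;
    }

    static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        string html = "<html><body><div id=\"atfResults\">listing</div></body></html>";

        foreach (string tag in SelectOuterHtml(html, "//*[@id='atfResults']"))
        {
            Console.WriteLine(tag);
        }
    }
}
```

Passing the HTML in as a string keeps the sketch independent of the network; in the original function, strContent would take the place of the sample markup.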


About the author: 云柯
