Issue with HTML tags when parsing with HTML Agility Pack in C#
This seems like it should be an easy thing to do, but I am having some major issues with it. I am trying to parse a specific tag with the HAP (HTML Agility Pack). I used Firebug to find the XPath I want and came up with //*[@id="atfResults"]. I believe my issue is with the " characters, since a double quote signals the start and end of a string in C#. I have tried making it a literal string, but I get errors. I have attached the function:
public List<string> GetHtmlPage(string strURL)
{
    // the html retrieved from the page
    WebResponse objResponse;
    WebRequest objRequest = System.Net.HttpWebRequest.Create(strURL);
    objResponse = objRequest.GetResponse();
    // the using keyword will automatically dispose the object
    // once complete
    using (StreamReader sr =
        new StreamReader(objResponse.GetResponseStream()))
    {//*[@id="atfResults"]
        string strContent = sr.ReadToEnd();
        // Close and clean up the StreamReader
        sr.Close();
        /*Regex regex = new Regex("<body>((.|\n)*?)</body>", RegexOptions.IgnoreCase);
        //Here we apply our regular expression to our string using the
        //Match object.
        Match oM = regex.Match(strContent);
        Result = oM.Value;*/
        HtmlDocument doc = new HtmlDocument();
        doc.Load(new StringReader(strContent));
        HtmlNode root = doc.DocumentNode;
        List<string> itemTags = new List<string>();
        string listingtag = "//*[@id="atfResults"]";
        foreach (HtmlNode link in root.SelectNodes(listingtag))
        {
            string att = link.OuterHtml;
            itemTags.Add(att);
        }
        return itemTags;
    }
}
Comments (2)
You can escape it:
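The snippet that belonged here did not survive on this page. A minimal sketch of the escaped form, reusing the `listingtag` idea from the question (the class name `EscapedXPath` is made up for illustration):

```csharp
public static class EscapedXPath
{
    // In a regular C# string literal, \" stands for one double-quote character,
    // so the literal no longer ends at the quote before atfResults.
    public static readonly string ListingTag = "//*[@id=\"atfResults\"]";
}
```

The resulting string is exactly the XPath Firebug reported, so it could be passed to `SelectNodes` as in the question.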
If you wanted to use a raw string, it would be:
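The original snippet is missing here as well. Assuming "raw string" means a C# verbatim string literal (the `@"..."` form), a sketch could look like this; the class name `VerbatimXPath` is hypothetical:

```csharp
public static class VerbatimXPath
{
    // In a verbatim literal (@"..."), a backslash is not an escape character,
    // so a double quote is written by doubling it: "".
    public static readonly string ListingTag = @"//*[@id=""atfResults""]";
}
```

Both forms produce the same string at run time; only the way the quotes are spelled in source differs.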
As you can see, raw strings don't really provide a benefit here.
However, you can instead use:
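The code that followed this sentence is missing from the page, so what the answer actually showed is unknown. One candidate that would fit the "slightly faster" remark is HAP's `GetElementbyId`, which looks an element up by its id instead of evaluating an XPath expression. Treat the sketch below as an assumption, not as the answerer's code; the class and method names are made up, and it needs the HtmlAgilityPack package:

```csharp
using HtmlAgilityPack;

public static class ByIdLookup
{
    // ASSUMPTION: the missing snippet used an id lookup; this is a guess.
    // Returns the outer HTML of the element with id "atfResults",
    // or null when the document has no such element.
    public static string GetResultsHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Note the lowercase "b" in the HAP method name.
        HtmlNode node = doc.GetElementbyId("atfResults");
        return node?.OuterHtml;
    }
}
```

Since no XPath string is involved, the quoting problem from the question does not come up at all.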
This will also be slightly faster.
Have you tried this:
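The suggestion itself was lost from the page. A likely candidate, given the question, is to use single quotes inside the XPath, which XPath accepts for string values and which need no escaping in C#. The sketch below is an assumption along those lines (class and method names are made up; it needs the HtmlAgilityPack package):

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SingleQuotedXPath
{
    // ASSUMPTION: single quotes inside the XPath, so the C# literal needs no escaping.
    public const string ListingTag = "//*[@id='atfResults']";

    public static List<string> GetItemTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var itemTags = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(ListingTag);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                itemTags.Add(node.OuterHtml);
            }
        }
        return itemTags;
    }
}
```

The null check is worth keeping in the question's loop too, since `foreach` over a null result would throw when the page has no `atfResults` element.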