XPath queries, HtmlAgilityPack, and extracting text

Posted 2024-09-04 15:51:47


I have been trying to extract links that have the class "tim_new", and I have been given a solution as well.

The solution, the snippet, and the necessary information are all given here.

The XPath query in question was "//a[@class='tim_new']". My question is: how did this query differentiate between the first line of the snippet (given in the link above) and the second line of the snippet?

More specifically, what is the literal translation (in English) of this XPath query?

Furthermore, I want to write a few lines of code to extract the text that follows "NSE:" in this element:

<div class="FL gL_12 PL10 PT15">BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017</div>

Would appreciate help in forming the necessary selection query.

My code is written as:

IEnumerable<string> NSECODE = doc.DocumentNode.SelectSingleNode("//div[@NSE:]");

But this doesn't look right. Would appreciate some help.


假情假意假温柔 2024-09-11 15:51:47


The XPath in the first selection reads "select all a elements anywhere in the document that have an attribute named class whose value is exactly tim_new". The stuff in brackets is not what you're returning, it's the criteria you're applying to the search: any a element whose class attribute is missing or has a different value is filtered out.
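To make the "brackets are a filter" reading concrete, here is a small sketch. The snippet from the question's link is not reproduced here, so the markup below is hypothetical, and it uses .NET's built-in System.Xml instead of HtmlAgilityPack so it runs without any NuGet package. HtmlAgilityPack's SelectNodes should accept the same XPath 1.0 expressions. The second query is a common idiom, not part of the original answer, for matching "tim_new" as one class among several.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class ClassPredicateDemo
{
    // Hypothetical markup: the original snippet is only available via the question's link.
    const string Sample =
        "<body>" +
        "<a class='tim_new' href='/a'>first</a>" +
        "<a class='other' href='/b'>second</a>" +
        "<a href='/c'>third</a>" +
        "<a class='tim_new bold' href='/d'>fourth</a>" +
        "</body>";

    // Returns the href of every <a> element selected by the XPath expression.
    public static string[] Hrefs(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath)
                  .Cast<XmlElement>()
                  .Select(a => a.GetAttribute("href"))
                  .ToArray();
    }

    public static void Main()
    {
        // [@class='tim_new'] keeps only <a> elements whose class is exactly "tim_new".
        Console.WriteLine(string.Join(",", Hrefs("//a[@class='tim_new']"))); // prints /a
        // Token-wise test: also accepts class="tim_new bold".
        Console.WriteLine(string.Join(",", Hrefs(
            "//a[contains(concat(' ', normalize-space(@class), ' '), ' tim_new ')]"))); // prints /a,/d
    }
}
```

Note that the exact-match predicate skips `class="tim_new bold"`; whether that matters depends on the real page's markup.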

I don't have the HTML Agility Pack, but if you are trying to query the divs that have "NSE:" in their text, your XPath for the second query can simply select the divs (e.g. "//div"), and then you'll want to filter the results using LINQ.

Something like

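// Needs: using System.Linq; using HtmlAgilityPack;
// SelectNodes returns null (not an empty collection) when nothing matches, hence the ?? fallback.
var nodes = (doc.DocumentNode.SelectNodes("//div[text()]") ?? Enumerable.Empty<HtmlNode>())
    .Where(a => a.InnerText.Contains("NSE:"));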

So in English, "Return all the div elements that have a text node as a direct child, then use LINQ to keep only those whose inner text contains NSE:".
Again, I'm not sure the syntax is perfect, but that's the idea.
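Once a matching div is found, its inner text still has to be cut down to the code itself. The sketch below does that with plain string handling; `ExtractCode` is a hypothetical helper name, and the input is assumed to be the `InnerText` of the div from the question. HtmlAgilityPack may leave entities such as `&nbsp;` undecoded in `InnerText`, so the helper decodes them first with `WebUtility.HtmlDecode`.

```csharp
using System;
using System.Linq;
using System.Net;

public static class NseCode
{
    // Hypothetical helper: returns the value after "label:" in a "|"-separated
    // string such as the div's inner text, or null if the label is absent.
    public static string ExtractCode(string innerText, string label)
    {
        // Decode &nbsp; etc.; the resulting U+00A0 counts as whitespace for Trim().
        string text = WebUtility.HtmlDecode(innerText);
        string prefix = label + ":";
        return text.Split('|')
                   .Select(part => part.Trim())
                   .Where(part => part.StartsWith(prefix, StringComparison.Ordinal))
                   .Select(part => part.Substring(prefix.Length).Trim())
                   .FirstOrDefault();
    }

    public static void Main()
    {
        const string inner =
            "BSE: 523395 &nbsp;&nbsp;|&nbsp;&nbsp; NSE: 3MINDIA &nbsp;&nbsp;|&nbsp;&nbsp; ISIN: INE470A01017";
        Console.WriteLine(ExtractCode(inner, "NSE"));  // prints 3MINDIA
        Console.WriteLine(ExtractCode(inner, "ISIN")); // prints INE470A01017
    }
}
```

In the HtmlAgilityPack version you would call it as `ExtractCode(node.InnerText, "NSE")` for each node returned by the LINQ query above.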

The XPath "//div[@NSE:]" would try to return all divs that have an attribute named NSE:, but it is not even a valid XPath expression, because a name cannot end with ":". In any case, you're looking for the text of the element, not one of its attributes.
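The difference between an attribute test and a text test can be checked directly. This sketch loads the question's div into System.Xml (so no NuGet package is needed), writing `&nbsp;` as `&#160;` because plain XML, unlike HTML, does not predefine the nbsp entity. It also shows an XPath 1.0-only alternative to the LINQ filter, using the standard `substring-after` and `substring-before` functions; the helper names are hypothetical.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class TextVsAttribute
{
    // The question's div, with &nbsp; rewritten as &#160; so XmlDocument can parse it.
    const string Markup =
        "<div class='FL gL_12 PL10 PT15'>BSE: 523395 &#160;&#160;|&#160;&#160; " +
        "NSE: 3MINDIA &#160;&#160;|&#160;&#160; ISIN: INE470A01017</div>";

    // Number of nodes the XPath expression selects.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes(xpath).Count;
    }

    // Pure XPath: the text after "NSE:" and before the next "|".
    public static string NseViaXPath()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        XPathNavigator nav = doc.CreateNavigator();
        var raw = (string)nav.Evaluate(
            "substring-before(substring-after(//div[contains(., 'NSE:')], 'NSE:'), '|')");
        return raw.Trim(); // Trim() also strips the U+00A0 characters
    }

    public static void Main()
    {
        Console.WriteLine(Count("//div[@NSE]"));                // prints 0: no attribute named NSE
        Console.WriteLine(Count("//div[contains(., 'NSE:')]")); // prints 1: matched by its text
        Console.WriteLine(NseViaXPath());                        // prints 3MINDIA
    }
}
```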

Hope that helps.

Note: If you have nested divs that both contain text, as in <div>NSE: some text<div>NSE: more text</div></div>, you're going to get duplicate results, because the outer div's inner text also includes the inner div's text.
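One way to sidestep the duplicates, not taken from the original answer, is to select the text nodes themselves rather than the divs, so each "NSE:" string is reported once. The sketch uses System.Xml with the nested markup from the note; HtmlAgilityPack should accept the same `text()` step in SelectNodes, but treat that as an assumption to verify.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class NestedDivs
{
    const string Markup = "<div>NSE: some text<div>NSE: more text</div></div>";

    // InnerText of every node the XPath expression selects, in document order.
    public static string[] Texts(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes(xpath)
                  .Cast<XmlNode>()
                  .Select(n => n.InnerText)
                  .ToArray();
    }

    public static void Main()
    {
        // Element-level match: the outer div's InnerText repeats the inner div's text.
        foreach (var t in Texts("//div[contains(., 'NSE:')]")) Console.WriteLine(t);
        // Text-node match: each "NSE:" string appears exactly once.
        foreach (var t in Texts("//div/text()[contains(., 'NSE:')]")) Console.WriteLine(t);
    }
}
```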
