
How to select the table that contains a specific keyword - c# - xpath - htmlagilitypack

Posted on 2024-12-22 20:40:39

I have to gather information from a product page that does not have any class or id attributes. I am using HtmlAgilityPack and C# 4.0.

There are many tables in the source of this product page. The price table contains the string " KDV", so I would like to get the table that contains this string. How can I do that?

For example, the XPath below would select all tables:

string srxPathOfCategory = "//table";
var selectedNodes = myDoc.DocumentNode.SelectNodes(srxPathOfCategory);

The expression below does select the table, but starting from the outermost enclosing table. I need to select the innermost table that contains the given string:

//table[contains(., ' KDV')]

c# , xpath , htmlagilitypack

桃酥萝莉 2024-12-29 20:40:39

The code below selects the table, but starting from the outermost table. I
need to select the innermost table that contains the given string.

Use:

//table
    [not(descendant::table) 
   and 
     .//text()[contains(., ' KDV')]
    ]

This selects every table in the document that has no table descendant and that has a descendant text node containing the string " KDV".

In general the above expression could select many such table elements.

If you want only one of them selected (say, the first), use this XPath expression, and note the parentheses:

   (//table
        [not(descendant::table) 
       and 
         .//text()[contains(., ' KDV')]
        ]
    )[1]
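
To see how these expressions differ from the one in the question, here is a minimal sketch using HtmlAgilityPack. The sample markup, the id attributes, and the class and method names are hypothetical and exist only for illustration. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class InnermostTableDemo
{
    // Hypothetical sample markup: an outer layout table wrapping two inner tables.
    private const string SampleHtml =
        "<table id='outer'><tr><td>" +
        "<table id='info'><tr><td>Product details</td></tr></table>" +
        "<table id='prices'><tr><td>100 TL + KDV</td></tr></table>" +
        "</td></tr></table>";

    // The expression from the answer, written on one line.
    public const string InnermostXPath =
        "//table[not(descendant::table) and .//text()[contains(., ' KDV')]]";

    // Counts the nodes selected by the given XPath in the sample markup.
    public static int CountMatches(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    // Returns the id of the first innermost table containing " KDV", or null.
    public static string FirstMatchId()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        HtmlNode table = doc.DocumentNode.SelectSingleNode("(" + InnermostXPath + ")[1]");
        return table == null ? null : table.GetAttributeValue("id", "");
    }

    public static void Main()
    {
        // The question's expression also matches the enclosing table.
        Console.WriteLine(CountMatches("//table[contains(., ' KDV')]"));
        // The answer's expression matches only tables with no nested table.
        Console.WriteLine(CountMatches(InnermostXPath));
        Console.WriteLine(FirstMatchId());
    }
}
```

With this sample, the question's expression should match both the outer table and the price table, while the answer's expression should match only the price table.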

Remember: If you want to select the first someName element in the document, using this (as in the currently accepted answer) is wrong:

//someName[1]

This is the second most frequently asked question about XPath (after the one about how to select elements with unprefixed names in an XML document that has a default namespace).

The expression above actually selects every someName element in the document that is the first someName child of its parent. Try it.

The reason for this unintuitive behavior is that the XPath [] operator has higher precedence (priority) than the // pseudo-operator.

The correct expression that really selects only the first someName element in any XML document (if one exists) is:

(//someName)[1]

Here the parentheses are used to explicitly override the default XPath operator precedence.
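
The precedence difference is easy to check on a small document. The following is a sketch using the standard System.Xml classes rather than HtmlAgilityPack, so that it has no external dependency. The sample XML and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FirstElementDemo
{
    // Hypothetical sample document: two parents, each with two someName children.
    private const string SampleXml =
        "<root>" +
        "<a><someName>1</someName><someName>2</someName></a>" +
        "<b><someName>3</someName><someName>4</someName></b>" +
        "</root>";

    // Returns the text of every node selected by the XPath, joined with commas.
    public static string Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        var values = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            values.Add(node.InnerText);
        }
        return string.Join(",", values);
    }

    public static void Main()
    {
        // [1] binds to the someName step: first someName child of each parent.
        Console.WriteLine(Select("//someName[1]"));   // prints 1,3
        // Parentheses apply [1] to the whole node-set: first in document order.
        Console.WriteLine(Select("(//someName)[1]")); // prints 1
    }
}
```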


满意归宿 2024-12-29 20:40:39

There might be a more efficient way to do it. Anyway, this is the entire code I used for your case, and it works for me:

        // Requires: using System.IO; using System.Net; using HtmlAgilityPack;
        HtmlDocument doc = new HtmlDocument();
        string url = "http://www.pratikev.com/fractalv33/pratikEv/pages/viewProduct.jsp?pInstanceId=3138821";
        // Download the page and dispose of both the response and the reader.
        using (var response = WebRequest.Create(url).GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            doc.LoadHtml(reader.ReadToEnd());
        }
        /* An earlier version of this XPath had a bug: it lacked the parentheses
           around //table/tr/td/font[contains(.,'KDV')], so [1] was applied per
           parent td instead of to the whole node-set.
           See Dimitre's answer for an explanation and an alternative,
           more generic and (needless to say) better approach. */
        string xpath = "(//table/tr/td/font[contains(.,'KDV')])[1]/ancestor::table[2]";
        HtmlNode table = doc.DocumentNode.SelectSingleNode(xpath);
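
The ancestor::table[2] step depends on how positions are counted on a reverse axis: position 1 is the nearest enclosing table, position 2 the next one out. The sketch below illustrates this with the standard System.Xml classes. The nested markup, the id attributes, and the class and method names are hypothetical and do not reflect the structure of the actual product page.

```csharp
using System;
using System.Xml;

public static class AncestorAxisDemo
{
    // Hypothetical markup: three nested tables, with the font element in the innermost one.
    private const string SampleXml =
        "<table id='t1'><tr><td>" +
        "<table id='t2'><tr><td>" +
        "<table id='t3'><tr><td><font>100 TL + KDV</font></td></tr></table>" +
        "</td></tr></table>" +
        "</td></tr></table>";

    // Returns the id of the n-th enclosing table of the first matching font, or null.
    public static string AncestorTableId(int position)
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        string xpath = "(//table/tr/td/font[contains(.,'KDV')])[1]/ancestor::table["
                       + position + "]";
        XmlNode table = doc.SelectSingleNode(xpath);
        return table == null ? null : table.Attributes["id"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(AncestorTableId(1)); // nearest enclosing table: t3
        Console.WriteLine(AncestorTableId(2)); // one level further out: t2
        Console.WriteLine(AncestorTableId(4) == null); // no fourth enclosing table: True
    }
}
```

Because the position is hard-coded, an expression of this kind is tied to one page layout, whereas the not(descendant::table) test in the other answer does not depend on the nesting depth.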
