How to select the table that contains a specific keyword - c# - xpath - htmlagilitypack

Posted 2024-12-22 20:40:39

I have to gather information from a product page whose elements have no class or id attributes. I am using HtmlAgilityPack and C# 4.0.

There are many tables in this product page's source code. The prices table contains the string " KDV", so I would like to get the table that contains " KDV". How can I do that?

For example, the XPath below selects all tables:

string srxPathOfCategory = "//table";
var selectedNodes = myDoc.DocumentNode.SelectNodes(srxPathOfCategory);

The XPath below selects the table, but it starts from the outermost table. I need to select the innermost table that contains the given string:

//table[contains(., ' KDV')]
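To make the problem concrete, here is a minimal sketch that runs the XPath above against a hypothetical, well-formed sample page in which a layout table wraps the price table. It uses System.Xml's XmlDocument instead of HtmlAgilityPack so that it needs no NuGet package; HtmlAgilityPack evaluates XPath through the same System.Xml.XPath engine, so the expression should behave the same way with doc.DocumentNode.SelectNodes. It is written as a C# 9+ top-level program, and the markup and id values are made up for illustration.

```csharp
// Sketch only: hypothetical sample markup, parsed with System.Xml instead of
// HtmlAgilityPack so the example runs without any NuGet package.
using System;
using System.Xml;

const string Sample = @"<html><body>
  <table id='layout'>
    <tr><td>
      <table id='prices'>
        <tr><td>Fiyat: 100 TL + KDV</td></tr>
      </table>
    </td></tr>
  </table>
  <table id='other'><tr><td>No prices here</td></tr></table>
</body></html>";

var doc = new XmlDocument();
doc.LoadXml(Sample);

// contains(., ' KDV') tests the string value of the whole table element,
// which includes the text of every nested table, so the outer layout
// table matches as well as the price table.
XmlNodeList matches = doc.SelectNodes("//table[contains(., ' KDV')]");
foreach (XmlElement t in matches)
    Console.WriteLine(t.GetAttribute("id"));
// prints "layout", then "prices" (document order)
```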

Tags: c#, xpath, htmlagilitypack


Answers (2)

桃酥萝莉 2024-12-29 20:40:39

> The code below selects the table but starting from most outer table. I need to select most inner table which contains that given string

Use:

//table
    [not(descendant::table) 
   and 
     .//text()[contains(., ' KDV')]
    ]

This selects any table in the XML document that has no table descendant and that has a descendant text node containing the string " KDV".
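A minimal sketch of this expression at work, assuming a hypothetical, well-formed sample page where a layout table wraps the price table. It uses System.Xml's XmlDocument as a stand-in for HtmlAgilityPack (both evaluate XPath with the System.Xml.XPath engine, so the expression string should behave the same) and is written as a C# 9+ top-level program. Only the nested price table passes the not(descendant::table) test.

```csharp
// Sketch only: hypothetical markup; XmlDocument stands in for HtmlAgilityPack.
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<html><body>
  <table id='layout'><tr><td>
    <table id='prices'><tr><td>Fiyat: 100 TL + KDV</td></tr></table>
  </td></tr></table>
</body></html>");

// A table qualifies only if it contains no nested table AND one of its
// descendant text nodes contains ' KDV'. The outer 'layout' table fails
// the first test because it has a table descendant.
const string InnermostXPath =
    "//table[not(descendant::table) and .//text()[contains(., ' KDV')]]";

XmlNodeList innermost = doc.SelectNodes(InnermostXPath);
Console.WriteLine(innermost.Count);                               // 1
Console.WriteLine(((XmlElement)innermost[0]).GetAttribute("id")); // prices
```

With HtmlAgilityPack the same string can be passed to doc.DocumentNode.SelectNodes; check the result for null, since HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches.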

In general, the expression above could select many such table elements.

If you want only one of them selected (say, the first), use this XPath expression, and notice the parentheses:

   (//table
        [not(descendant::table) 
       and 
         .//text()[contains(., ' KDV')]
        ]
    )[1]
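A minimal sketch showing the effect of the outer parentheses, assuming a hypothetical, well-formed page with two separate innermost tables that both mention " KDV". XmlDocument stands in for HtmlAgilityPack so the example needs no NuGet package, and it is written as a C# 9+ top-level program.

```csharp
// Sketch only: hypothetical markup with two innermost "KDV" tables.
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<html><body>
  <table id='a'><tr><td>Fiyat: 100 TL + KDV</td></tr></table>
  <table id='b'><tr><td>Kargo: 10 TL + KDV</td></tr></table>
</body></html>");

const string Innermost =
    "//table[not(descendant::table) and .//text()[contains(., ' KDV')]]";

// Without the outer parentheses both tables are returned.
Console.WriteLine(doc.SelectNodes(Innermost).Count);          // 2

// Wrapping the whole path in ( ) and then applying [1] keeps only the
// first node of the complete result, in document order.
XmlNodeList first = doc.SelectNodes("(" + Innermost + ")[1]");
Console.WriteLine(first.Count);                               // 1
Console.WriteLine(((XmlElement)first[0]).GetAttribute("id")); // a
```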

Remember: If you want to select the first someName element in the document, using this (as in the currently accepted answer) is wrong:

//someName[1]

This is the second most frequently asked XPath question (after the one about how to select elements with unprefixed names in an XML document that has a default namespace).

The expression above actually selects every someName element in the document that is the first someName child of its parent. Try it.

The reason for this unintuitive behavior is that the XPath [] operator has a higher precedence (priority) than the // pseudo-operator. //someName[1] is shorthand for /descendant-or-self::node()/child::someName[1], so the [1] filters the someName children of each node separately instead of the whole result.

The correct expression, which really selects only the first someName element in any XML document (if one exists), is:

(//someName)[1]

Here the parentheses explicitly override the default XPath operator precedence.
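A small sketch contrasting the two expressions on a hypothetical document with two parent elements, each holding two child elements (named item here in place of someName). It uses System.Xml's XmlDocument and is written as a C# 9+ top-level program.

```csharp
// Sketch only: hypothetical document; "item" plays the role of someName.
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<root>
  <list><item>a1</item><item>a2</item></list>
  <list><item>b1</item><item>b2</item></list>
</root>");

// //item[1] = /descendant-or-self::node()/child::item[1]:
// the [1] applies per parent, so each list contributes its first item.
XmlNodeList perParent = doc.SelectNodes("//item[1]");
Console.WriteLine(perParent.Count);           // 2 (a1 and b1)

// (//item)[1] applies [1] to the whole node-set: only the first item.
XmlNodeList firstOverall = doc.SelectNodes("(//item)[1]");
Console.WriteLine(firstOverall.Count);        // 1
Console.WriteLine(firstOverall[0].InnerText); // a1
```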

满意归宿 2024-12-29 20:40:39

There might be a more efficient way to do it. Anyway, this is the entire code I used for your case, and it works for me:

        // Requires: using System.IO; using System.Net; using HtmlAgilityPack;
        HtmlDocument doc = new HtmlDocument();
        string url = "http://www.pratikev.com/fractalv33/pratikEv/pages/viewProduct.jsp?pInstanceId=3138821";
        using (WebResponse response = WebRequest.Create(url).GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            doc.LoadHtml(reader.ReadToEnd());
        }
        /* The first version of this answer used
             //table/tr/td/font[contains(.,'KDV')][1]/ancestor::table[2]
           which has a bug: [1] binds to the font step, not to the whole path.
           The parentheses below fix that. See Dimitre's answer for an
           explanation and a more generic (and better) approach. */
        string xpath = "(//table/tr/td/font[contains(.,'KDV')])[1]/ancestor::table[2]";
        HtmlNode table = doc.DocumentNode.SelectSingleNode(xpath); // null if nothing matches
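The ancestor::table[2] step can be surprising because ancestor:: is a reverse axis: positions are counted outward from the context node, so [1] is the nearest enclosing table and [2] the next one out. A minimal sketch using the corrected expression (//table/tr/td/font[contains(.,'KDV')])[1]/ancestor::table[2] on a hypothetical nested layout; XmlDocument stands in for HtmlAgilityPack, and it is written as a C# 9+ top-level program. Whether [2] is the right level on the real product page depends on that page's markup.

```csharp
// Sketch only: hypothetical markup where the KDV text sits in a <font>
// inside an inner table that is itself nested in an outer table.
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<html><body>
  <table id='outer'><tr><td>
    <table id='inner'><tr><td><font>100 TL + KDV</font></td></tr></table>
  </td></tr></table>
</body></html>");

// First matching <font> in the whole document (note the parentheses).
const string Font = "(//table/tr/td/font[contains(., 'KDV')])[1]";

// On the reverse ancestor axis, [1] is the nearest table, [2] the next one out.
var nearest = (XmlElement)doc.SelectSingleNode(Font + "/ancestor::table[1]");
var secondOut = (XmlElement)doc.SelectSingleNode(Font + "/ancestor::table[2]");
Console.WriteLine(nearest.GetAttribute("id"));   // inner
Console.WriteLine(secondOut.GetAttribute("id")); // outer
```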