How can I use XPath to iterate over DOM elements that match a CSS class?

Posted on 2024-09-10 04:23:54

I'm processing an HTML page with a variable number of p elements with a css class "myclass", using Python + Selenium RC.

When I try to select each node with this xpath:

//p[@class='myclass'][n]

(with n a natural number)

I get only the first p element with this css class for every n, unlike the situation if I iterate through selecting ALL p elements with:

//p[n]

Is there any way I can iterate through elements by css class using xpath?

Comments (5)

奶气 2024-09-17 04:23:54

XPath 1.0 doesn't provide an iterating construct.

Iteration can be performed on the selected node-set in the language that is hosting XPath.

Examples:

In XSLT 1.0:

   <xsl:for-each select="someExpressionSelectingNodes">
     <!-- Do something with the current node -->
   </xsl:for-each>

In C#:

using System;
using System.IO;
using System.Xml;

public class Sample {

  public static void Main() {

    XmlDocument doc = new XmlDocument();
    doc.Load("booksort.xml");

    XmlNodeList nodeList;
    XmlNode root = doc.DocumentElement;

    nodeList=root.SelectNodes("descendant::book[author/last-name='Austen']");

    //Change the price on the books.
    foreach (XmlNode book in nodeList)
    {
      book.LastChild.InnerText="15.95";
    }

    Console.WriteLine("Display the modified XML document....");
    doc.Save(Console.Out);

  }
}
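
Since the question uses Python, the same idea can be sketched there as well. This is a minimal sketch, not Selenium code: it uses the standard library's xml.etree.ElementTree (which supports only a small XPath subset) on a hypothetical, well-formed sample page, and shows one query selecting every matching p element while the loop itself is done by the host language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup (an assumption for illustration); it is
# well-formed so that the standard-library XML parser accepts it.
PAGE = """
<html><body>
  <div><p class="myclass">first</p><p>plain</p></div>
  <div><p class="myclass">second</p><p class="myclass">third</p></div>
</body></html>
"""

root = ET.fromstring(PAGE)

# A single query selects every p with class "myclass", in document order.
matches = root.findall(".//p[@class='myclass']")
texts = [p.text for p in matches]

# The iteration happens in Python, not inside the XPath expression.
for position, text in enumerate(texts, start=1):
    print(position, text)
```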

XPath 2.0 has its own iteration construct:

   for $varname1 in someExpression1,
       $varname2 in someExpression2, 
      .  .  .  .  .  .  .  .  .  .  .
       $varnameN in someExpressionN 
    return
        SomeExpressionUsingTheVarsAbove
十级心震 2024-09-17 04:23:54

Now that I look again at this question, I think the real problem is not in iterating, but in using //.

This is a FAQ:

//p[@class='myclass'][1] 

selects every p element that has a class attribute with value "myclass" and that is the first such child of its parent. Therefore this expression may select many p elements, not only the first such p element in the document.

When we want to get the first p element in the document that satisfies the above predicate, one correct expression is:

(//p)[@class='myclass'][1] 

Remember: the [] operator has a higher priority (precedence) than the // abbreviation.
Whenever you need to index the nodes selected by //, always put the expression to be indexed in parentheses.

Here is a demonstration:

<nums>
 <a>
  <n x="1"/>
  <n x="2"/>
  <n x="3"/>
  <n x="4"/>
 </a>
 <b>
  <n x="5"/>
  <n x="6"/>
  <n x="7"/>
  <n x="8"/>
 </b>
</nums>

The XPath expression:

//n[@x mod 2 = 0][1]

selects the following two nodes:

<n x="2" />
<n x="6" />

The XPath expression:

(//n)[@x mod 2 = 0][1]

selects exactly the first n element in the document with the wanted property:

<n x="2" />

Try this first with the following transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="//n[@x mod 2 = 0][1]"/>
 </xsl:template>
</xsl:stylesheet>

and the result is two nodes.

<n x="2" />
<n x="6" />

Now, change the XPath expression as below and try again:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="(//n)[@x mod 2 = 0][1]"/>
 </xsl:template>
</xsl:stylesheet>

and the result is what we really wanted -- the first such n element in the document:

<n x="2" />
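
The difference between the two expressions can also be sketched in plain Python. This is an emulation of the two selection rules using the standard library's xml.etree.ElementTree, not an XPath engine evaluating the expressions themselves; the helper name is_even is an assumption introduced for illustration.

```python
import xml.etree.ElementTree as ET

# The same sample document as in the demonstration above.
NUMS = """
<nums>
 <a><n x="1"/><n x="2"/><n x="3"/><n x="4"/></a>
 <b><n x="5"/><n x="6"/><n x="7"/><n x="8"/></b>
</nums>
"""

root = ET.fromstring(NUMS)


def is_even(n):
    # Mirrors the predicate [@x mod 2 = 0].
    return int(n.get("x")) % 2 == 0


# Emulates //n[@x mod 2 = 0][1]: the position test is applied per parent,
# so each parent contributes its own first matching n child.
first_per_parent = []
for parent in root.iter():
    even_children = [c for c in parent.findall("n") if is_even(c)]
    if even_children:
        first_per_parent.append(even_children[0])

# Emulates (//n)[@x mod 2 = 0][1]: all matches are collected in document
# order first, and only then is the first one taken.
all_even = [n for n in root.iter("n") if is_even(n)]
first_in_document = all_even[:1]

print([n.get("x") for n in first_per_parent])   # ['2', '6']
print([n.get("x") for n in first_in_document])  # ['2']
```
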
糖粟与秋泊 2024-09-17 04:23:54

Maybe all your p elements with this class are at the same level (siblings under one parent), so //p[@class='myclass'] gives you the node-set of paragraphs with the specified class. In that case you can iterate through it using indexes, i.e.
//p[@class='myclass'][1], //p[@class='myclass'][2],...,//p[@class='myclass'][last()]

握住我的手 2024-09-17 04:23:54

I don't think you're using the "index" for its real purpose. The //p[selection][index] syntax actually says which matching element within its parent it should be. So //p[selection][1] says that your selected p must be the first matching p among its parent's children, and //p[selection][2] says it must be the second. Depending on your HTML, it's likely this isn't what you want.

Given that you're using Selenium and Python, there are a couple of ways to do what you want, and you can look at this question to see them (two options are given there, one in Selenium JavaScript, the other using the server-side Selenium calls).

皇甫轩 2024-09-17 04:23:54

Here's a C# code snippet that might help you out.

The key here is the Selenium function GetXpathCount(). It should return the number of occurrences of the Xpath expression you are looking for.

You can enter //p[@class='myclass'] in XPather or any other Xpath analysis tool so you can indeed verify multiple results are returned. Then you just iterate through the results in your code.

In my case, it was all the list items in a UL that needed to be iterated, i.e. //li[@class='myclass']/ul/li, so based on your requirements it should be something like:

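int numProductsInLeftNav = Convert.ToInt32(selenium.GetXpathCount("//p[@class='myclass']"));

List<string> productsInLeftNav = new List<string>();
for (int i = 1; i <= numProductsInLeftNav; i++) {
    // Parenthesize so that [i] indexes the whole node-set in document order;
    // the explicit "xpath=" prefix is needed because the locator no longer starts with "//".
    string productName = selenium.GetText("xpath=(//p[@class='myclass'])[" + i + "]");
    productsInLeftNav.Add(productName);
}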
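
The same count-then-index approach might look roughly like this in Python, the language used in the question. This is only a sketch under assumptions: sel is taken to be an already started session of the legacy Selenium RC Python client (which exposes get_xpath_count and get_text), and texts_by_class is a hypothetical helper name. The parenthesized, "xpath="-prefixed locator is used so that the index counts matches across the whole document.

```python
def texts_by_class(sel, css_class):
    """Return the text of every p element whose class attribute equals css_class.

    `sel` is assumed to be a started Selenium RC session object that
    provides get_xpath_count(xpath) and get_text(locator).
    """
    base = "//p[@class='%s']" % css_class

    # Ask Selenium how many nodes the expression matches.
    count = int(sel.get_xpath_count(base))

    texts = []
    for i in range(1, count + 1):  # XPath positions start at 1
        # Parentheses make [i] index the whole node-set in document order;
        # "xpath=" is explicit because the locator does not start with "//".
        locator = "xpath=(%s)[%d]" % (base, i)
        texts.append(sel.get_text(locator))
    return texts
```

It would be called as texts_by_class(sel, "myclass") once the session has opened the page.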