How to use regular expressions in Selenium locators

Posted 2024-08-04 16:33:51

I'm using Selenium RC and would like, for example, to get all link elements whose href attribute matches:

http://[^/]*\d+\.com

I would like to use something like:

sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+\.com")]/@name' )

which would return a list of the name attributes of all the links that match the regex (or something like it).

Thanks

美人骨 2024-08-11 16:33:51

The answer above is probably the right way to find ALL of the links that match a regex, but I thought it would also help to answer the other part of the question: how to use regex in XPath locators. You need the XPath 2.0 matches() function, like this:

xpath=//div[matches(@id,'che.*boxes')]

(This would, of course, locate the div with id 'checkboxes' or 'cheANYTHINGHEREboxes'.)

Be aware, though, that matches() is not supported by all native browser implementations of XPath; most conspicuously, using it in FF3 throws an error: invalid xpath[2].

If you have trouble with your particular browser (as I did with FF3), try Selenium's allowNativeXpath("false") to switch over to the JavaScript XPath interpreter. It is slower, but it does seem to work with more XPath functions, including 'matches' and 'ends-with'. :)
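
One detail worth keeping in mind: XPath 2.0's matches() succeeds when the pattern matches anywhere in the value, unless the pattern is anchored with ^ and $. Here is a rough Python sketch of which id values the locator above would accept, using re.search as a stand-in for matches(). The sample ids and the helper name are made up for illustration.

```python
import re

# re.search, like XPath 2.0 matches(), looks for the pattern anywhere in the string.
PATTERN = re.compile(r"che.*boxes")


def id_matches(element_id: str) -> bool:
    """Approximate matches(@id, 'che.*boxes') for a single id value."""
    return PATTERN.search(element_id) is not None


# Hypothetical id values, only for illustration.
sample_ids = ["checkboxes", "cheANYTHINGHEREboxes", "my-checkboxes-2", "boxes"]
accepted = [i for i in sample_ids if id_matches(i)]
print(accepted)
```

To accept only ids that match in full, anchor the pattern: ^che.*boxes$.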

初心未许 2024-08-11 16:33:51

You can use the Selenium command getAllLinks to get an array of the IDs of all links on the page, then loop through it and check each href with getAttribute, which takes an element locator followed by an @ and the attribute name. For example, in Java this might be:

// needs java.util.ArrayList and java.util.List
String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();

for (String linkId : allLinks) {
    // getAllLinks returns an empty string for links that have no id
    if (linkId.isEmpty()) {
        continue;
    }
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    // String.matches must match the whole href, so allow an optional path after ".com"
    if (linkHref.matches("http://[^/]*\\d+\\.com(/.*)?")) {
        matchingLinks.add(linkId);
    }
}
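
The same filtering step can be sketched in Python, the language of the client used in the question. This is a minimal sketch, not a tested Selenium script: filter_link_ids is a hypothetical helper, and the dictionary stands in for values you would collect with the RC client's get_all_links() and get_attribute() calls (assumed here, not exercised).

```python
import re
from typing import Dict, List

# Escaped dot, so only a literal ".com" is accepted.
HREF_PATTERN = re.compile(r"http://[^/]*\d+\.com")


def filter_link_ids(hrefs_by_id: Dict[str, str], pattern: re.Pattern = HREF_PATTERN) -> List[str]:
    """Return the ids whose href starts with a match for the pattern."""
    # re.match anchors at the start only, so a trailing path is allowed.
    return [link_id for link_id, href in hrefs_by_id.items() if pattern.match(href)]


# Hypothetical data; with Selenium RC it would be built roughly as
#   {i: sel.get_attribute("id=%s@href" % i) for i in sel.get_all_links() if i}
hrefs = {
    "home": "http://example.com/",
    "mirror": "http://mirror42.com/downloads",
    "shop": "http://shop7.com",
    "fake": "http://shop7xcom/",
}
matching = filter_link_ids(hrefs)
print(matching)
```

Using re.match rather than re.fullmatch is a deliberate choice here: it lets hrefs with a path after the host pass the filter.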

爱的十字路口 2024-08-11 16:33:51

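One possible solution is to use sel.get_eval() and write a JS script that returns a list of the links, along the lines of this answer:
Selenium: Is it possible to use regex in Selenium locators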
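
Here is a rough sketch of what such a script could look like, built as a string in Python so it can be handed to get_eval(). The helper names are hypothetical. The sketch assumes that get_eval() returns the value of the last JavaScript expression as a string and that window refers to the application window inside the snippet. It has not been run against a live Selenium RC server.

```python
import json
from typing import List


def build_link_name_script(href_regex: str) -> str:
    """Build a JavaScript snippet that collects the name of every link whose href matches."""
    # json.dumps yields a quoted, escaped string literal that JavaScript accepts.
    pattern_literal = json.dumps(href_regex)
    return (
        "var re = new RegExp(" + pattern_literal + ");"
        "var links = window.document.links;"
        "var names = [];"
        "for (var i = 0; i < links.length; i++) {"
        "  if (re.test(links[i].href)) { names.push(links[i].getAttribute('name') || ''); }"
        "}"
        "names.join('\\n');"
    )


def parse_eval_result(result: str) -> List[str]:
    """Split the newline-joined string returned by the snippet back into a list."""
    return result.split("\n") if result else []


script = build_link_name_script(r"http://[^/]*\d+\.com")
# With a live session (assumption): names = parse_eval_result(sel.get_eval(script))
print(script)
```

Joining the names with newlines inside the script keeps the return value a plain string, which is easy to split again on the client side.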

命比纸薄 2024-08-11 16:33:51

Here are some alternative methods for Selenium RC as well. These aren't pure Selenium solutions; they combine Selenium with your programming language's data structures.

You can get the HTML page source, then run a regular expression over it to return a set of matching links. Use regex grouping to separate out URLs, link text/IDs, etc., and you can then pass them back to Selenium to click on or navigate to.

Another method is to get the HTML page source, or the innerHTML (via DOM locators) of a parent/root element, then parse the HTML into a DOM object in your programming language. You can then traverse the DOM with the desired XPath (with regular expressions or not) and obtain a node set of only the links of interest. From there, parse out the link text/ID or URL and pass it back to Selenium to click on or navigate to.

Upon request, I'm providing examples below. They're in mixed languages, since the post didn't appear to be language specific anyway; I'm just using what I had available to hack the examples together. They aren't fully tested, or tested at all, but I've worked with bits of the code before in other projects, so treat these as proof-of-concept examples of how you'd implement the solutions I just mentioned.

//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below; replace with a better pattern as desired
//[^>]* and [^\"]* keep each match inside a single <a> tag and a single quoted href value
preg_match_all("/<a\\s[^>]*href=\"([^\"]*)\"/i", $pgSrc, $matches, PREG_PATTERN_ORDER);
//$matches is a 2D array: $matches[0] holds the whole matched strings, $matches[1] holds what's in the parentheses
//you get either an array of all matched link URL values from the capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements
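
For comparison, the page-source approach might look like this in Python. It is only a sketch: the HTML string is made-up sample data standing in for the page source you would fetch through Selenium, and extract_links is a hypothetical helper. Regex-based HTML parsing is fragile, so treat the pattern as a starting point.

```python
import re
from typing import List, Tuple

# Named groups separate the URL from the link text.
LINK_RE = re.compile(
    r'<a\s[^>]*?href="(?P<url>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(page_source: str, url_regex: str) -> List[Tuple[str, str]]:
    """Return (url, text) pairs for links whose URL matches url_regex."""
    url_re = re.compile(url_regex)
    return [
        (m.group("url"), m.group("text").strip())
        for m in LINK_RE.finditer(page_source)
        if url_re.match(m.group("url"))
    ]


# Made-up sample source.
page_source = """
<a name="first" href="http://site123.com/index.html">Site 123</a>
<a href="http://example.com/">Example</a>
<a class="x" href="http://cdn9.com">CDN</a>
"""
links = extract_links(page_source, r"http://[^/]*\d+\.com")
print(links)
```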

//Example of HTML DOM parsing with Selenium RC (in Java)
//needs org.jsoup.Jsoup and org.jsoup.nodes.Document
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using the jsoup HTML parser library for Java, see jsoup.org
Document doc = Jsoup.parse(htmlSrcSubset);
/* Once you have this document object, you can manipulate and traverse
it as an HTML node tree. I'm not going to go into details here, as you'd
need to know DOM traversal and XPath (not just for finding locators),
but this tutorial will give you some ideas:

http://jsoup.org/cookbook/extracting-data/dom-navigation

The example there seems to first get the content element/node within
the document, then get all hyperlink elements/nodes under it and
traverse them as a list, handling each element in an object-oriented
way. Each element is a node with properties. If you study it, you'll
find this approach gives you access similar to what WebDriver/Selenium 2
now offers with WebElements; the example here is how you can get that
kind of capability in Selenium RC.
*/
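
A comparable idea in Python, using the standard library's html.parser rather than jsoup, which is a swap made for this sketch. It collects the attributes of each anchor in an HTML fragment and then filters on href, returning the name attributes the question asks for. The class name, function name, and sample fragment are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> element as a dict."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def names_of_matching_links(html: str, href_regex: str) -> List[str]:
    """Return the name attribute of each link whose href matches href_regex."""
    collector = AnchorCollector()
    collector.feed(html)
    href_re = re.compile(href_regex)
    return [
        a.get("name") or ""
        for a in collector.anchors
        if href_re.match(a.get("href") or "")
    ]


# Made-up fragment, standing in for the innerHTML of a parent element.
fragment = (
    '<div><a name="alpha" href="http://host1.com/a">A</a>'
    '<a name="beta" href="http://host.com/b">B</a>'
    '<a name="gamma" href="http://x22.com">C</a></div>'
)
names = names_of_matching_links(fragment, r"http://[^/]*\d+\.com")
print(names)
```

Because the parser hands over real attribute dictionaries, the regex only ever sees the href value, which avoids most of the pitfalls of matching against raw markup.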

你如我软肋 2024-08-11 16:33:51

Selenium's By.Id and By.CssSelector methods do not support regex, and By.XPath supports it only where XPath 2.0 is available. If you want to use regex, you can do something like this:

// needs: using System.Collections.Generic; using System.Text.RegularExpressions; using OpenQA.Selenium;
void MyCallingMethod(IWebDriver driver)
{
    //Search by ID:
    string attrName = "id";
    //Regex = 'a number that is 1-10 digits long'
    string attrRegex = "[0-9]{1,10}";
    IEnumerable<IWebElement> elements = SearchByAttribute(driver, attrName, attrRegex);
}

IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
    List<IWebElement> elements = new List<IWebElement>();

    //Allows spaces around the equal sign, e.g. id = "55"; the group captures the attribute value
    string searchString = Regex.Escape(attrName) + "\\s*=\\s*\"(" + attrRegex + ")\"";
    //Search page source
    MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
    //Iterate over matches
    foreach (Match match in matches)
    {
        //Get the exact attribute value from the capture group
        string attrValue = match.Groups[1].Value;
        //Quote the value: an unquoted value that starts with a digit is not a valid CSS selector
        string cssSelector = "[" + attrName + "='" + attrValue + "']";
        //Find element by exact attribute value
        elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
    }

    return elements;
}

Note: this code is untested. Also, you can optimize this method by figuring out a way to eliminate the second search.
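
The first half of this approach, turning regex matches in the page source into exact-value CSS selectors, can be sketched in Python. The function name and sample markup are made up for illustration; each selector would then be passed to the driver's CSS lookup.

```python
import re
from typing import List


def css_selectors_for_attribute(page_source: str, attr_name: str, attr_regex: str) -> List[str]:
    """Build one exact-match CSS selector per attribute value that matches attr_regex."""
    # The lookbehind keeps e.g. data-id from being mistaken for id.
    # Spaces around "=" are allowed; the group captures just the attribute value.
    search = re.compile(
        r"(?<![\w-])" + re.escape(attr_name) + r'\s*=\s*"(' + attr_regex + r')"',
        re.IGNORECASE,
    )
    selectors: List[str] = []
    for match in search.finditer(page_source):
        selector = '[%s="%s"]' % (attr_name, match.group(1))
        if selector not in selectors:  # skip duplicates, keep first-seen order
            selectors.append(selector)
    return selectors


# Made-up sample markup.
source = '<div id="55"></div><span id = "1234"></span><p id="intro"></p><i id="55"></i>'
selectors = css_selectors_for_attribute(source, "id", r"[0-9]{1,10}")
print(selectors)
```

Quoting the value inside the selector matters for numeric ids, since an unquoted attribute value that starts with a digit is not a valid CSS identifier.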
