Get all elements by partial match on the class attribute

Posted on 2024-11-09 05:56:38


I'm trying to use Nokogiri to display results from a URL (essentially scraping a URL).

I have some HTML which is similar to:

<p class="mattFacer">Matty</p>
<p class="mattSmith">Matthew</p>
<p class="suzieSmith">Suzie</p>

So I need to find all the elements whose class begins with "matt". I need to save both the element's value and the element itself so I can reference it next time, so I need to capture

"Matty" and "<p class='mattFacer'>"
"Matthew" and "<p class='mattSmith'>"

I haven't worked out how to capture the element HTML, but here's what I have so far for the element's text (it doesn't work!):

doc = Nokogiri::HTML(open(url))
tmp = ""
doc.xpath("[class*=matt").each do |item|
    tmp += item.text
end

@testy2 = tmp


扬花落满肩 2024-11-16 05:56:38

This should get you started:

doc.xpath('//p[starts-with(@class, "matt")]').each do |el|
  p [el.attributes['class'].value, el.children[0].text]
end
["mattFacer", "Matty"]
["mattSmith", "Matthew"]


拿命拼未来 2024-11-16 05:56:38

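require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open(url))
# ^= matches a class attribute that starts with "matt"; *= would match "matt" anywhere in it
items = doc.css('p[class^="matt"]').map(&:text).join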


妄断弥空 2024-11-16 05:56:38

Use:

/*/p[starts-with(@class, 'matt')] | /*/p[starts-with(@class, 'matt')]/text()

This selects every p element that is a child of the top element of the XML document and whose class attribute value starts with "matt", together with every text-node child of each such p element.

When evaluated against this XML document (none was provided!):

<html>
    <p class="mattFacer">Matty</p>
    <p class="mattSmith">Matthew</p>
    <p class="suzieSmith">Suzie</p>
</html>

the following nodes are selected (each on a separate line) and can be accessed by position:

<p class="mattFacer">Matty</p>
Matty
<p class="mattSmith">Matthew</p>
Matthew

Here is a quick XSLT verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:for-each select=
  "/*/p[starts-with(@class, 'matt')]
  |
   /*/p[starts-with(@class, 'matt')]/text()
  ">
  <xsl:copy-of select="."/>
  <xsl:text>
</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

The result of this transformation, when applied to the same XML document (above), is the expected, correct sequence of selected nodes:

<p class="mattFacer">Matty</p>
Matty
<p class="mattSmith">Matthew</p>
Matthew


街角迷惘 2024-11-16 05:56:38

The accepted answer is great, but another approach is to use Nikkou, which lets you match via regular expressions (without needing to be familiar with XPath functions):

doc.attr_matches('class', /^matt/).collect do |item|
  [item.attributes['class'].value, item.text]
end


About the author

久随
