Hpricot XML text search
Hpricot + Ruby XML parsing and logical selection.
Objective: Find all titles written by author Bob.
My XML file:
<rss>
<channel>
<item>
<title>Book1</title>
<pubDate>march 1 2010</pubDate>
<author>Bob</author>
</item>
<item>
<title>book2</title>
<pubDate>october 4 2009</pubDate>
<author>Bill</author>
</item>
<item>
<title>book3</title>
<pubDate>June 5 2010</pubDate>
<author>Steve</author>
</item>
</channel>
</rss>
# My Hpricot code. Running it prints nothing, although the search pattern works on its own.
(doc % :rss % :channel / :item).each do |item|
  a = item.search("author[text()*='Bob']")
  # puts "FOUND" if a.include?("Bob")
  puts item.at("title") if a.include?("Bob")
end
Comments (2)
If you're not set on Hpricot, here's one way to do this with XPath in Nokogiri:
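The snippet that belonged here did not survive on this page. As a rough stand-in, the sketch below shows the general XPath approach. It makes two assumptions: it uses REXML (which ships with Ruby) rather than Nokogiri so that it runs without installing a gem, and the expression `//item[author='Bob']/title` is one plausible query, not necessarily the one originally posted. The helper name `titles_by_bob` and the constant `QUESTION_FEED` are made up for the example. The same XPath string can be passed to Nokogiri's `doc.xpath`.

```ruby
require 'rexml/document'

# The feed from the question, condensed onto fewer lines.
QUESTION_FEED = <<~XML
  <rss>
    <channel>
      <item><title>Book1</title><pubDate>march 1 2010</pubDate><author>Bob</author></item>
      <item><title>book2</title><pubDate>october 4 2009</pubDate><author>Bill</author></item>
      <item><title>book3</title><pubDate>June 5 2010</pubDate><author>Steve</author></item>
    </channel>
  </rss>
XML

# Hypothetical helper: returns the text of every <title> whose
# sibling <author> is exactly "Bob".
def titles_by_bob(xml)
  doc = REXML::Document.new(xml)
  # [author='Bob'] keeps only <item> elements that have an <author>
  # child with that text; /title then selects the titles of those items.
  REXML::XPath.match(doc, "//item[author='Bob']/title").map(&:text)
end

puts titles_by_bob(QUESTION_FEED)
```

Putting the condition in a predicate on `item` means the filtering and the selection happen in a single query, so no Ruby-side `if` is needed inside a loop.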
Edit: @theTinMan's XPath (see the other answer) also works well, is more readable, and may very well be faster.
One of the ideas behind XPath is it allows us to navigate a DOM similarly to a disk directory:
That means: "find all the books by Bob, then look up one level and find the title tag".
I added an extra book by "Bob" to test getting all occurrences.
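The query and its output are missing from this page, so the following is only a sketch of an expression that matches the description above ("find the authors named Bob, go up one level, then find the title"). It assumes REXML from Ruby's standard distribution instead of Hpricot or Nokogiri, and the fourth `<item>` ("book4" and its date) is invented to stand in for the extra book. The names `FEED_WITH_EXTRA_BOB` and `titles_via_parent_axis` are hypothetical.

```ruby
require 'rexml/document'

# The question's feed plus one extra <item> by Bob. The fourth entry
# is made up purely for this sketch.
FEED_WITH_EXTRA_BOB = <<~XML
  <rss>
    <channel>
      <item><title>Book1</title><pubDate>march 1 2010</pubDate><author>Bob</author></item>
      <item><title>book2</title><pubDate>october 4 2009</pubDate><author>Bill</author></item>
      <item><title>book3</title><pubDate>June 5 2010</pubDate><author>Steve</author></item>
      <item><title>book4</title><pubDate>July 7 2010</pubDate><author>Bob</author></item>
    </channel>
  </rss>
XML

# Hypothetical helper: titles of all items whose author is Bob,
# found by walking up from <author> like "cd .." in a directory tree.
def titles_via_parent_axis(xml)
  doc = REXML::Document.new(xml)
  # //author[text()='Bob']  every <author> whose text is exactly "Bob"
  # /..                     up one level, to the enclosing <item>
  # /title                  down into that item's <title>
  REXML::XPath.match(doc, "//author[text()='Bob']/../title").map(&:text)
end

puts titles_via_parent_axis(FEED_WITH_EXTRA_BOB)
```

With two matching authors in the feed, the result should contain one title per Bob, which is what the extra book is meant to check.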
To get the item containing a book by Bob, just move back up a level:
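Again the original snippet is not shown here. A minimal sketch of "moving back up a level", under the same assumption of REXML as a stand-in library: the path simply stops at `/..` instead of continuing to `/title`, so each result is the whole `<item>` element. `ITEM_LOOKUP_FEED` and `items_by_bob` are names made up for the example.

```ruby
require 'rexml/document'

# The feed from the question, condensed onto fewer lines.
ITEM_LOOKUP_FEED = <<~XML
  <rss>
    <channel>
      <item><title>Book1</title><pubDate>march 1 2010</pubDate><author>Bob</author></item>
      <item><title>book2</title><pubDate>october 4 2009</pubDate><author>Bill</author></item>
      <item><title>book3</title><pubDate>June 5 2010</pubDate><author>Steve</author></item>
    </channel>
  </rss>
XML

# Hypothetical helper: returns the <item> elements that contain
# an <author> whose text is exactly "Bob".
def items_by_bob(xml)
  doc = REXML::Document.new(xml)
  # Ending the path at /.. yields the parent of each matching <author>.
  REXML::XPath.match(doc, "//author[text()='Bob']/..")
end

# With the whole item in hand, any of its children can be read.
items_by_bob(ITEM_LOOKUP_FEED).each do |item|
  title = item.elements['title'].text
  date  = item.elements['pubDate'].text
  puts "#{title} (#{date})"
end
```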
I also figured out what (doc % :rss % :channel / :item) is doing. It's equivalent to nesting the searches, minus the wrapping parentheses, and these should all be the same in Hpricot-ese:

Because '//rss/channel/item' is how you'd normally see an XPath accessor, and 'rss channel item' is a CSS accessor, I'd recommend using those formats for maintenance and clarity.
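The Hpricot variants listed in the answer are missing from this page. To illustrate the underlying point (a chain of nested lookups and a single path reach the same nodes), here is a sketch that assumes REXML as a stand-in. REXML has no CSS selector support, so only the nested form and the XPath form are compared. `NESTING_FEED`, `items_nested`, and `items_single_path` are hypothetical names.

```ruby
require 'rexml/document'

# The feed from the question, condensed onto fewer lines.
NESTING_FEED = <<~XML
  <rss>
    <channel>
      <item><title>Book1</title><pubDate>march 1 2010</pubDate><author>Bob</author></item>
      <item><title>book2</title><pubDate>october 4 2009</pubDate><author>Bill</author></item>
      <item><title>book3</title><pubDate>June 5 2010</pubDate><author>Steve</author></item>
    </channel>
  </rss>
XML

# Nested form: one lookup per level, comparable in spirit to
# "first rss, then its first channel, then all items".
def items_nested(doc)
  doc.elements['rss'].elements['channel'].get_elements('item')
end

# Single-path form: the whole route written as one XPath accessor.
def items_single_path(doc)
  REXML::XPath.match(doc, '//rss/channel/item')
end

doc = REXML::Document.new(NESTING_FEED)
nested_titles = items_nested(doc).map { |item| item.elements['title'].text }
path_titles   = items_single_path(doc).map { |item| item.elements['title'].text }

puts nested_titles.inspect
puts path_titles.inspect
```

The single-path form states the whole route in one string, which is easier to read and to change later than a chain of operators. That is the maintenance argument made above.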