Regex does not work in Yahoo Pipes V2 (works in V1)

Posted on 2024-11-26 12:40:29

I'm using Yahoo Pipes to analyze an RSS feed. In each article, I want to parse the HTML code with a regex to see whether the value on the line after the string "Total Songs" is greater than 7. In all the articles, the code is laid out as in the example below (with lines ending at the same locations).

Here is an example of what I want to do. In the following code, the value to extract should be 10:

<table BORDER="0" WIDTH="100%"><tr><td><table border="0" width="100%" cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td ALIGN="CENTER" WIDTH="166" VALIGN="TOP"><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988"><img border="0" src="http://a2.mzstatic.com/us/r1000/091/Music/73/0e/f0/mzi.gxsvtfmh.100x100-75.jpg"/></a></td>
<td width="10"><img alt="" width="10" height="1" src="http://r.mzstatic.com/images/spacer.gif"/></td>
<td width="95%"><b><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988">Bn2 1Tw</a></b><br>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/artist/kobana/id424122973?uo=1&v0=9988">Kobana & Yane3dots</a><br><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Expected Release Date:</b>
August 17, 2011<br>
</font><font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Total Songs:</b>
10</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Genre:</b>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/genre/music-electronic/id7?uo=1&v0=9988">Electronic</a></font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Album Price:</b>
$1.99</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Copyright</b>
Proton LLC</font></td>
</tr>
</table></td></tr>
</table>

With version 1 of the Yahoo Pipes engine, I used

(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))

This used to work, but back then the HTML formatting I got was a little different (the Pipes engine inserted line breaks at different places than it does now). Now that I have moved to the V2 engine (a necessity, since they are phasing out V1 on August 1st), it does not extract anything.

I think it has to do with the line break between the </b> and the 10, but even though I tried multiple combinations, I could not find one that works.
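To illustrate the suspicion about the line break, here is a minimal sketch in Python. It uses Python's `re` module as a stand-in, which is an assumption: the Pipes V2 regex engine may behave differently. It also assumes the line break is a single `\n` (a `\r\n` would add one more character, and the five dots would no longer line up). The `snippet` variable is a hypothetical excerpt of the item shown above.

```python
import re

# Hypothetical excerpt of the feed item, with the newline between </b> and
# the number (assumed to be a single "\n").
snippet = '<b>Total Songs:</b>\n10</font><br>'

# The V1 pattern from the question, unchanged.
pattern = r'(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))'

# By default "." does not match a newline in Python's re, so the lookbehind
# cannot span "</b>\n" and nothing is found.
default_match = re.search(pattern, snippet)

# With DOTALL, "." also matches "\n", so the five dots cover "</b>\n".
dotall_match = re.search(pattern, snippet, re.DOTALL)

print(default_match)
print(dotall_match.group(0) if dotall_match else None)
```

If the V2 engine treats `.` the same way, that would explain why the pattern stopped matching once the line break landed between `</b>` and the number.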

Can anybody help me?

Thanks

桃气十足 2024-12-03 12:40:29

Try this regex:

Total Songs:\D*((?!0*[0-7](?!\d))\d+)(?!\d)

The number will be stored in the first capturing group.
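As a rough check of the pattern outside Pipes, here is a small sketch in Python. Python's `re` module is only a stand-in (whether Pipes V2 handles lookaheads the same way is an assumption), and the helper name `total_songs_over_7` and the sample strings are hypothetical.

```python
import re

# The regex from this answer, unchanged. "\D*" skips "</b>" and the line
# break, and the negative lookahead rejects a lone digit from 0 to 7.
PATTERN = re.compile(r'Total Songs:\D*((?!0*[0-7](?!\d))\d+)(?!\d)')


def total_songs_over_7(html):
    """Return the song count as an int if it is 8 or more, else None."""
    match = PATTERN.search(html)
    return int(match.group(1)) if match else None


print(total_songs_over_7('<b>Total Songs:</b>\n10</font><br>'))
print(total_songs_over_7('<b>Total Songs:</b>\n7</font><br>'))
```

Because `\D` matches any non-digit character, including a newline, this pattern does not depend on where the line break falls between the label and the number.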

About the author

别忘他
