Regex doesn't work with Yahoo Pipes V2 (works with V1)

Posted on 2024-11-26 12:40:29

I'm using Yahoo Pipes to analyze an RSS feed. In each article, I want to parse the HTML with a regex to check whether the value on the line after the string "Total Songs" is greater than 7. In every article, the code is laid out as in the example below (with lines ending at the same places).

Here is an example of what I want to do. In the following code, the value to extract should be 10:

<table BORDER="0" WIDTH="100%"><tr><td><table border="0" width="100%" cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td ALIGN="CENTER" WIDTH="166" VALIGN="TOP"><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988"><img border="0" src="http://a2.mzstatic.com/us/r1000/091/Music/73/0e/f0/mzi.gxsvtfmh.100x100-75.jpg"/></a></td>
<td width="10"><img alt="" width="10" height="1" src="http://r.mzstatic.com/images/spacer.gif"/></td>
<td width="95%"><b><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988">Bn2 1Tw</a></b><br>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/artist/kobana/id424122973?uo=1&v0=9988">Kobana &amp; Yane3dots</a><br><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Expected Release Date:</b>
August 17, 2011<br>
</font><font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Total Songs:</b>
10</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Genre:</b>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/genre/music-electronic/id7?uo=1&v0=9988">Electronic</a></font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Album Price:</b>
$1.99</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Copyright</b>
Proton LLC</font></td>
</tr>
</table></td></tr>
</table>
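A minimal sketch of a different approach, using Python's re module as a stand-in for the Pipes engine (whose regex flavor may differ): capture whatever number follows the label, then do the "greater than 7" check in code instead of inside the pattern. The helper name total_songs and the assumption that "</b>" directly follows "Total Songs:" are illustrative, taken from the sample above.

```python
import re
from typing import Optional

# Hypothetical pattern: the label, its closing </b> tag, any whitespace
# (including the line break), then the digits to capture.
LABEL_RE = re.compile(r"Total Songs:</b>\s*(\d+)")


def total_songs(html: str) -> Optional[int]:
    """Return the number that follows 'Total Songs:', or None if absent."""
    match = LABEL_RE.search(html)
    return int(match.group(1)) if match else None


# The relevant lines of the sample, assuming "\n" line endings.
sample = (
    '<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular">'
    "<b>Total Songs:</b>\n"
    "10</font><br>\n"
)

count = total_songs(sample)
print(count, count is not None and count > 7)  # prints: 10 True
```

Because \s* also matches "\r\n", this version does not depend on the exact line-ending style, and changing the threshold does not require touching the pattern.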

With version 1 of the Yahoo Pipes engine, I used

(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))

It used to work, but back then the HTML I got was formatted a little differently (the Pipes engine inserted line breaks in different places than it does now). Now that I have moved to the V2 engine (a necessity, since V1 is being phased out on August 1st), the regex does not extract anything.

I think the problem is the line break between the </b> and the 10, but despite trying many combinations, I could not find one that works.
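The line-break diagnosis can be checked with a short sketch, again using Python's re only as a stand-in for the Pipes V2 engine. The V1 lookbehind requires exactly five characters of any kind between "Total Songs:" and the number. In the V2 output those five characters are "</b>" plus the line break, but "." does not match a newline unless a dot-matches-newline (DOTALL / single-line) option is enabled. The fragment below assumes "\n" line endings.

```python
import re

# The V1 pattern from the question, unchanged: the lookbehind demands
# "Total", any char, "Songs:", then exactly five more arbitrary chars.
V1_PATTERN = r"(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))"

# Relevant fragment of the V2 output, assuming "\n" line endings.
fragment = "<b>Total Songs:</b>\n10</font><br>\n"

# Without DOTALL, "." cannot match "\n", so the fifth "." fails on the
# line break that follows "</b>".
print(re.search(V1_PATTERN, fragment))  # prints: None

# With DOTALL, "</b>\n" fills the five slots and "10" is matched.
m = re.search(V1_PATTERN, fragment, re.DOTALL)
print(m.group(0))  # prints: 10
```

If the feed used "\r\n" line endings, the gap would be six characters and the fixed-width lookbehind would fail even with DOTALL, which is one reason a pattern that skips a variable amount of text is more robust.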

Can anybody help me?

Thanks

Comments (1)

桃气十足 2024-12-03 12:40:29

Try this regex:

Total Songs:\D*((?!0*[0-7](?!\d))\d+)(?!\d)

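The number will be stored in the first capturing group. \D* skips every non-digit character after the label, including "</b>" and the line break, and the negative lookahead (?!0*[0-7](?!\d)) rejects any value from 0 to 7 (leading zeros included), so the pattern only matches when the count is 8 or more.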
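The answer's pattern can be exercised the same way. This sketch again uses Python's re as a stand-in for the Pipes engine, and the helper songs_if_more_than_7 is hypothetical; it feeds a few sample counts through the regex to show which ones are captured.

```python
import re
from typing import Optional

# The regex from the answer. \D* skips "</b>" and the newline (any
# non-digit, line breaks included); the negative lookahead rejects
# 0-7 with optional leading zeros; the final (?!\d) forbids stopping
# in the middle of a longer number.
ANSWER_RE = re.compile(r"Total Songs:\D*((?!0*[0-7](?!\d))\d+)(?!\d)")


def songs_if_more_than_7(html: str) -> Optional[str]:
    """Return the captured count if it is 8 or more, else None."""
    m = ANSWER_RE.search(html)
    return m.group(1) if m else None


for value in ["10", "7", "8", "07", "17", "0"]:
    fragment = f"<b>Total Songs:</b>\n{value}</font><br>\n"
    print(value, songs_if_more_than_7(fragment))

# Expected output:
# 10 10
# 7 None
# 8 8
# 07 None
# 17 17
# 0 None
```

Note that "17" is accepted even though it starts with a digit in the 0-7 range: the inner (?!\d) requires the small digit to be the whole number before the lookahead rejects it.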
