Regex does not work in Yahoo Pipes V2 (worked in V1)
I'm using Yahoo Pipes to analyze an RSS feed. For each article, I want to parse the HTML with a regex to check whether the value on the line after the string "Total Songs" is greater than 7. In every article, the code is laid out as in the example below (with lines ending at the same places).
Here is an example of what I want to do. In the following code, the value to extract is 10:
<table BORDER="0" WIDTH="100%"><tr><td><table border="0" width="100%" cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td ALIGN="CENTER" WIDTH="166" VALIGN="TOP"><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988"><img border="0" src="http://a2.mzstatic.com/us/r1000/091/Music/73/0e/f0/mzi.gxsvtfmh.100x100-75.jpg"/></a></td>
<td width="10"><img alt="" width="10" height="1" src="http://r.mzstatic.com/images/spacer.gif"/></td>
<td width="95%"><b><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988">Bn2 1Tw</a></b><br>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/artist/kobana/id424122973?uo=1&v0=9988">Kobana & Yane3dots</a><br><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Expected Release Date:</b>
August 17, 2011<br>
</font><font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Total Songs:</b>
10</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Genre:</b>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/genre/music-electronic/id7?uo=1&v0=9988">Electronic</a></font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Album Price:</b>
$1.99</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Copyright</b>
Proton LLC</font></td>
</tr>
</table></td></tr>
</table>
With version 1 of the Yahoo Pipes engine, I used:
(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))
This used to work, but back then the HTML formatting I got was slightly different (the Pipes engine inserted line breaks in different places than it does now). Now that I have moved to the V2 engine (which is necessary, since V1 is being phased out on August 1st), the regex no longer extracts anything.
I think it has to do with the line break between the </b> and the 10, but although I tried multiple combinations, I could not find one that works.
Can anybody help me?
Thanks
Comments (1)
Try this regex:
The number will be stored in the first capturing group.
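To illustrate the capturing-group idea outside of Pipes, here is a minimal Python sketch. The pattern is an assumption for illustration only, not necessarily the regex this answer suggested, and the names TOTAL_SONGS and total_songs are hypothetical. It uses \s* to absorb the line break between </b> and the number, captures the digits in group 1, and leaves the "greater than 7" check to a numeric comparison rather than to the regex. Whether the Pipes V2 engine treats \s the same way is untested here.

```python
import re

# Hypothetical pattern (an assumption, not the regex from the answer above):
# \s* absorbs the line break and any other whitespace between </b> and the number,
# and the digits end up in capturing group 1.
TOTAL_SONGS = re.compile(r"Total\s+Songs:</b>\s*(\d+)")


def total_songs(html):
    """Return the song count as an int, or None if the label is not found."""
    match = TOTAL_SONGS.search(html)
    return int(match.group(1)) if match else None


# Fragment shaped like the feed item in the question (attributes shortened).
sample = (
    '<font size="3"><b>Total Songs:</b>\n'
    '10</font><br>\n'
)

count = total_songs(sample)
# Compare numerically instead of encoding "greater than 7" in the regex.
print(count, count is not None and count > 7)  # prints: 10 True
```

Doing the comparison on the captured number keeps the pattern short and avoids the alternation used in the V1 regex.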