Regex with preg_match_all is "sometimes" not correct
I have a preg_match_all call with a regular expression that should take each YouTube video ID and place it in an array, so the more YouTube videos there are, the more arrays it produces. Here is a result that is correct:
C1
Array ( [0] => j5-yKhDd64s ) 1Array ( [0] => j5-yKhDd64s ) 1Array ( [0] => j5-yKhDd64s ) 1
My problem is that sometimes this happens instead:
C2
Array ( [0] => _dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk ) 1
See the difference? In C1 each array holds exactly one video ID, which is correct. In C2 the match grabs the first ID, then fails to stop and pulls the rest of the text into the same array entry.
The C1 YouTube links were like this:
http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received
http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received
http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received
The C2 YouTube links were like this:
http://www.youtube.com/watch?v=_dKtoRU7Tlk
http://www.youtube.com/watch?v=_dKtoRU7Tlk
http://www.youtube.com/watch?v=_dKtoRU7Tlk
The difference is that the C1 links have the &feature... part. I think this happens because my regex isn't fully correct?
if (preg_match_all("#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\n]+#", $content, $matches, PREG_SET_ORDER)) {
foreach($matches as $m) {
echo print_r($m);
}
}
$nContent = preg_replace("#(?:https?://)?(?:www\.)?youtube\.com/(?:[^\s]*)#", '', $content);
echo $nContent;
How can I fix this? Thank you!
Comments (3)
Your regexp:
#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\n]+#
boils down to three alternative parts:
(?<=v=)[a-zA-Z0-9-]+(?=&)
(?<=[0-9]/)[^&\n]+
(?<=v=)[^&\n]+
The (?<=...) is called a lookbehind assertion, and in two of these parts you can see that it looks for v=.
In the first alternative, it looks for [a-zA-Z0-9-]+ followed by & (the (?=...) is a lookahead assertion). The second alternative doesn't apply in this case. In the third alternative, it looks for anything until hitting & or \n.
Your C2 example doesn't fit correctly on any of those. No & follows the ID (and _ is not in the first character class), so the first alternative fails, and the third one keeps matching across the spaces between the links.
The easiest fix would be to change the last part from [^&\n]+ to [^&\s]+ so it will stop matching on & or any whitespace (\s).
Or, better advice: just rewrite the whole thing to really parse the URL in a normal way, saving some headaches in the future.
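Both suggestions in this answer can be sketched in PHP as follows. This is only a minimal sketch, not the asker's final code. The function names extractIdsWithRegex and extractIdsWithParseUrl are hypothetical, and the second function assumes that links are separated by whitespace and carry the ID in the v query parameter.

```php
<?php
// Hypothetical helper: the asker's pattern with only the last alternative
// changed from [^&\n]+ to [^&\s]+, so a match also stops at any whitespace.
function extractIdsWithRegex(string $content): array
{
    $pattern = '#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\s]+#';
    preg_match_all($pattern, $content, $matches);
    return $matches[0]; // flat list of matched IDs
}

// Hypothetical helper: split the text on whitespace and let PHP's own
// URL functions parse each token instead of a hand-written regex.
// Assumption: only hosts ending in youtube.com are of interest.
function extractIdsWithParseUrl(string $content): array
{
    $ids = [];
    foreach (preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $host  = parse_url($token, PHP_URL_HOST);
        $query = parse_url($token, PHP_URL_QUERY);
        if (!is_string($host) || !is_string($query)) {
            continue; // not a URL with a host and a query string
        }
        if (!preg_match('/(^|\.)youtube\.com$/i', $host)) {
            continue; // some other site
        }
        parse_str($query, $params); // decode the query string into an array
        if (isset($params['v']) && is_string($params['v'])) {
            $ids[] = $params['v'];
        }
    }
    return $ids;
}
```

Either function can be called with the $content string from the question. The parse_url variant needs each link to include its scheme (http:// or https://), because parse_url does not report a host for a bare youtube.com/... token.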
Following mvds's answer and comments:
Edit: this one fishes out any YouTube video link.
I changed it so it stops on whitespace, a line break, or "&".
Hope this gives you a start.
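The code from this answer did not survive on this page, so the following is a stand-in rather than the answerer's pattern. It is a minimal sketch of a single expression that picks out YouTube watch links and stops the ID at whitespace, a line break, or "&". The function name findYoutubeIds is hypothetical, and the pattern assumes links of the form youtube.com/watch?...v=ID.

```php
<?php
// Hypothetical helper: capture the value of the v parameter from every
// youtube.com/watch link in the text. [^\s&]+ ends the ID at whitespace
// (which includes line breaks) or at "&".
function findYoutubeIds(string $content): array
{
    $pattern = '#(?:https?://)?(?:www\.)?youtube\.com/watch\?'
             . '(?:[^\s&]+&)*?'   // optional parameters before v=
             . 'v=([^\s&]+)#i';   // capture group 1: the video ID
    preg_match_all($pattern, $content, $matches);
    return $matches[1];
}
```

Using a capture group instead of lookbehinds keeps the host check and the ID extraction in one pattern, so a v= that appears outside a YouTube link is not picked up.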