preg_match_all regular expression is "sometimes" incorrect

Posted on 2024-10-10 02:37:55

I have a preg_match_all call with a regular expression that should extract each YouTube video ID and put it in an array, so the more YouTube links there are, the more arrays it produces. Here is a correct result:

C1

Array ( [0] => j5-yKhDd64s ) 1Array ( [0] => j5-yKhDd64s ) 1Array ( [0] => j5-yKhDd64s ) 1 

My problem is that sometimes this happens instead:

C2

Array ( [0] => _dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk ) 1 

See the difference? In C1 each array holds the correct video ID, whereas in C2 the regex grabs one ID and then fails, pulling the rest of the text into the same array.

The C1 YouTube links were like this:

http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received
http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received
http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received

The C2 YouTube links were like this:

http://www.youtube.com/watch?v=_dKtoRU7Tlk
http://www.youtube.com/watch?v=_dKtoRU7Tlk
http://www.youtube.com/watch?v=_dKtoRU7Tlk

The difference is that the C1 links contain &feature... and the C2 links do not. I think it's because my regex isn't fully optimal?

    if (preg_match_all("#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\n]+#", $content, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // echo also prints print_r()'s return value (true), hence the trailing "1" in the output above
            echo print_r($m);
        }
    }
    $nContent = preg_replace("#(?:https?://)?(?:www\.)?youtube\.com/(?:[^\s]*)#", '', $content);
    echo $nContent;

How can I fix this? Thank you!

Comments (3)

剪不断理还乱 2024-10-17 02:37:55

Your regexp:

#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\n]+#

boils down to three alternative parts:

(?<=v=)[a-zA-Z0-9-]+(?=&)
(?<=[0-9]/)[^&\n]+
(?<=v=)[^&\n]+

The (?<=...) is called a lookbehind assertion, and in two of these parts you can see it looks for v=.

In the first alternative, it looks for [a-zA-Z0-9-]+ followed by & (the (?=...) part is a lookahead assertion).

The second alternative doesn't apply in this case.

In the third alternative, it looks for anything until hitting & or \n.
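To see which alternative actually fires, each one can be tried on its own. This is only a sketch: it assumes the C2 links reach the regex separated by spaces rather than "\n" (which is what the C2 output suggests), and the variable names are made up for illustration.

```php
<?php
// Assumption: the C2 links are separated by spaces, not by "\n".
$c2 = "http://www.youtube.com/watch?v=_dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk";

// The three alternatives from the question, tried one at a time.
$alternatives = [
    'first'  => '#(?<=v=)[a-zA-Z0-9-]+(?=&)#',
    'second' => '#(?<=[0-9]/)[^&\n]+#',
    'third'  => '#(?<=v=)[^&\n]+#',
];

$firstMatch = [];
foreach ($alternatives as $name => $pattern) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $firstMatch[$name] = preg_match($pattern, $c2, $m) ? $m[0] : null;
    var_dump($name, $firstMatch[$name]);
}
```

On this input only the third alternative matches, and its match runs past the first ID through the space and into the next link, because a space is neither & nor \n.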

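Your C2 example doesn't fit any of those correctly: there is no & after the ID (and _ is not in the first character class), so the third alternative takes over and keeps matching across the spaces between the links until it hits a newline. The easiest fix would be to change the last part from: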

(?<=v=)[^&\n]+

to

(?<=v=)[^&\s]+

so that it stops matching at & or at any whitespace (\s).

Or, better advice: just rewrite the whole thing to really parse the URL in a normal way, saving some headaches in the future.
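For what it's worth, here is a quick check of that one-character change against both inputs. The sample strings are shortened versions of the links in the question and are assumed to be space-separated; $idsC1 and $idsC2 are just illustrative names.

```php
<?php
// The question's pattern with only the last alternative changed from \n to \s.
$pattern = '#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=[0-9]/)[^&\n]+|(?<=v=)[^&\s]+#';

// Assumption: links are separated by a single space in both cases.
$c1 = "http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received "
    . "http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received";
$c2 = "http://www.youtube.com/watch?v=_dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk";

// With the default PREG_PATTERN_ORDER, index 0 holds the list of full matches.
preg_match_all($pattern, $c1, $m1);
preg_match_all($pattern, $c2, $m2);
$idsC1 = $m1[0];
$idsC2 = $m2[0];

print_r($idsC1);
print_r($idsC2);
```

Both arrays should now contain one bare video ID per link.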

叫嚣ゝ 2024-10-17 02:37:55

Following mvds's answer and comments:

$parsed_url = parse_url("http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received");
parse_str($parsed_url["query"],$output);
echo $output['v'];
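
The snippet above handles a single URL. A rough sketch of the same idea for a whole block of text could look like this; extractYoutubeIds is a hypothetical helper name, not something from the thread, and it assumes the links are separated by whitespace.

```php
<?php
// Hypothetical helper (not from the thread): collect the "v" parameter of
// every whitespace-separated YouTube link found in a block of text.
function extractYoutubeIds(string $content): array
{
    $ids = [];
    // Split on any run of whitespace so spaces and newlines both work.
    foreach (preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        $host  = parse_url($token, PHP_URL_HOST);
        $query = parse_url($token, PHP_URL_QUERY);
        // Skip tokens that are not YouTube URLs or carry no query string.
        if (!is_string($host) || !is_string($query)
            || !preg_match('/(^|\.)youtube\.com$/i', $host)) {
            continue;
        }
        parse_str($query, $params);
        if (isset($params['v']) && is_string($params['v'])) {
            $ids[] = $params['v'];
        }
    }
    return $ids;
}

$content = "http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email&email=comment_reply_received\n"
         . "http://www.youtube.com/watch?v=_dKtoRU7Tlk http://www.youtube.com/watch?v=_dKtoRU7Tlk";
$ids = extractYoutubeIds($content);
print_r($ids);
```

Because parse_str() does the query-string splitting, it does not matter whether v is followed by &feature=... or by nothing at all.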
失与倦" 2024-10-17 02:37:55

Edit: this one fishes out any YouTube video link. I changed it so that it stops at whitespace, a line break, or "&".

Hope this gives you a start.

"{youtube.com/watch[?]v=([a-z0-9_-]*?)[^&\s]+}i"
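
One thing to watch with that pattern: the full match is the link, but the capture group is lazy and can match zero characters, so the ID ends up in [^&\s]+ rather than in group 1. The sketch below compares it with a variant that is my own assumption, not from the thread (greedy group, escaped dot); the sample $content is made up.

```php
<?php
$content = "http://www.youtube.com/watch?v=j5-yKhDd64s&feature=email "
         . "http://www.youtube.com/watch?v=_dKtoRU7Tlk";

// Pattern from the answer: the lazy group matches nothing, and
// [^&\s]+ consumes the ID, so group 1 stays empty.
preg_match_all('{youtube.com/watch[?]v=([a-z0-9_-]*?)[^&\s]+}i', $content, $original);

// Variant (an assumption, not from the thread): greedy group, escaped dot.
preg_match_all('{youtube\.com/watch[?]v=([a-z0-9_-]+)}i', $content, $variant);

$originalLinks = $original[0]; // full matches: the links up to & or whitespace
$originalGroup = $original[1]; // capture group 1 of the answer's pattern
$variantIds    = $variant[1];  // capture group 1 of the variant

print_r($originalLinks);
print_r($originalGroup);
print_r($variantIds);
```

The character class in the variant already excludes & and whitespace, so no separate stop condition is needed there.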