YouTube regex swallows the remaining text

Posted 2024-12-05 07:22:27

I'm running preg_match_all and str_replace on a block of text to grab YouTube URLs and replace them with the correct embed code.

Let's say I have the following block of text:

"bla bla bla bla <-youtube-url-> last few words"

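Everything works fine: the YouTube URL is replaced with the embed code, and so on. However, the "last few words" disappear from the final output after str_replace is run. I suspect that the regex is swallowing everything after the URL. This is what I'm using to match and extract the YouTube IDs: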

%(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i

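Any help would be greatly appreciated!

Update:

I just discovered that the problem only happens if the YouTube URL has trailing parameters. With the following input, the last few words are swallowed: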

'www.youtube.com/watch?v=XXXXXXXXX&parameter=data last few words'

But if the input is like this:

'www.youtube.com/watch?v=XXXXXXXXX last few words'

it works fine. Can anyone help with the needed adjustments to the regular expression?


贩梦商人 2024-12-12 07:22:28

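My bad. There was nothing wrong with the regex, contrary to what I originally suspected.

I was passing the user input to the PHP handler without first escaping it with encodeURIComponent(). The handler therefore treated &parameter=data as the next input parameter, which left me with a corrupted POST variable.

Sorry for my incompetence, and thanks to everyone for the help!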
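
The effect described in this answer can be reproduced with a small JavaScript sketch. The form field name `text` is an assumption (the actual handler and request code are not shown in the thread), and `URLSearchParams` merely stands in for the way a form handler splits an urlencoded body into fields.

```javascript
// Hypothetical field name "text"; the real handler from the thread is not shown.
const userInput = 'www.youtube.com/watch?v=XXXXXXXXX&parameter=data last few words';

// Unescaped: the "&" inside the input ends the "text" field early.
const rawBody = 'text=' + userInput;
// Escaped: encodeURIComponent() turns "&", "=", "?", "/" and spaces into %XX sequences.
const safeBody = 'text=' + encodeURIComponent(userInput);

// Parse both bodies the way an urlencoded form body is split into fields.
const rawFields = new URLSearchParams(rawBody);
const safeFields = new URLSearchParams(safeBody);

console.log(rawFields.get('text'));                // www.youtube.com/watch?v=XXXXXXXXX
console.log(rawFields.get('parameter'));           // data last few words
console.log(safeFields.get('text') === userInput); // true
```

In the unescaped case the trailing words end up in a separate `parameter` field, so the text that reaches the regex has already lost them.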


且行且努力 2024-12-12 07:22:27

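I usually break up complicated alternations to find out what's going on.
It appears you might have trouble with the last term [^"&?/ ]{11}, but I'm not sure
what you are trying to do. (The code below is Perl.)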

$samp = 'www.youtube.com/watch?v=XXXXXXXXX&parameter=data last few words';

$regex = qr%

(?:
    youtube\.com/
    (?:
        ( [^/]+/.+/ )    # 1
      | 
        (                # 2
            (?: v | e(?:mbed)? ) /
        )
      |
        ( .*[?&]v= )     # 3
    )
  |

    ( youtu\.be/ )     #4
)

( [^"&?/ ]{1,11} )     # 5, was {11}

(.*)$                  # 6 the remainder

%xi;


if ( $samp =~ /$regex/ )
{
  # just print what matched
    print "all: '$&' \n";
    print "1:   '$1' \n";
    print "2:   '$2' \n";
    print "3:   '$3' \n";
    print "4:   '$4' \n";
    print "5:   '$5' \n";
    print "6:   '$6' \n";
}

Output:

all: 'youtube.com/watch?v=XXXXXXXXX&parameter=data last few words'
1:   ''
2:   ''
3:   'watch?v='
4:   ''
5:   'XXXXXXXXX'
6:   '&parameter=data last few words'
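
A similar check can be sketched in JavaScript against the question's original {11} pattern. The XXXXXXXXX placeholder in the sample is only nine characters long, so this sketch assumes a made-up 11-character ID (`abcdefghijk`). The `[embed:…]` placeholder is also an assumption standing in for the real embed code, which the thread does not show.

```javascript
// The question's pattern, unchanged apart from JavaScript literal syntax
// (the % delimiters become /.../ and the slashes are escaped).
const youtubeId = /(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/ ]{11})/i;

// Hypothetical input: a made-up 11-character ID followed by a trailing parameter.
const text = 'bla bla www.youtube.com/watch?v=abcdefghijk&parameter=data last few words';

// Extract the ID from the first capture group, if the pattern matches at all.
const match = text.match(youtubeId);
const id = match ? match[1] : null;

// Replace only the matched span with a placeholder standing in for the embed code.
const replaced = text.replace(youtubeId, (whole, videoId) => `[embed:${videoId}]`);

console.log(id);       // abcdefghijk
console.log(replaced); // bla bla www.[embed:abcdefghijk]&parameter=data last few words
```

The match ends right after the ID, so both the trailing parameter and the last few words are still present after the replacement.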


毅然前行 2024-12-12 07:22:27

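Change the .+ to \S+ so that you don't capture whitespace as part of the regex.

%(?:youtube\.com/(?:[^/]+/\S+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i

The .* was capturing the entire line, and the rest of your regex wasn't doing anything.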


心意如水 2024-12-12 07:22:27

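It's not clear to me exactly what you are trying to do. But I suggest trying a regex testing tool, such as this one, though there are others. It lets you visually inspect the results of a regex.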
