YouTube regex swallows the rest of the text

Posted on 2024-12-05 07:22:27, 825 words, 1 view, 0 comments

I'm running preg_match_all and str_replace on a block of text to grab YouTube URLs and replace them with the correct embed code.

Let's say I have the following block of text:

"bla bla bla bla <-youtube-url-> last few words"

Everything works fine: the YouTube URL is replaced with the embed code, etc. However, the "last few words" disappear from the final output after str_replace is run. I suspect the regex is swallowing everything after the URL. This is what I'm using to match and extract YouTube IDs:

%(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i
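To see which branch of the alternation handles which URL shape, here is a minimal PHP sketch that runs the pattern above through preg_match() against a few URL forms. The URLs are hypothetical, "abcdefghijk" is a placeholder 11-character ID, and the /user/somechannel/ path is only an assumed example of a path the first alternative ([^/]+/.+/) accepts, not a claim about current YouTube URL formats.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '%(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i';

// Hypothetical URLs; "abcdefghijk" is a placeholder 11-character video ID.
// The comment on each line names the alternative expected to match it.
$urls = [
    'https://www.youtube.com/watch?v=abcdefghijk',          // .*[?&]v=
    'https://youtu.be/abcdefghijk',                         // youtu\.be/
    'https://www.youtube.com/embed/abcdefghijk',            // (?:v|e(?:mbed)?)/
    'https://www.youtube.com/v/abcdefghijk',                // (?:v|e(?:mbed)?)/
    'https://www.youtube.com/user/somechannel/abcdefghijk', // [^/]+/.+/ (assumed path)
];

$ids = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        $ids[$url] = $m[1]; // capture group 1 holds the 11-character ID
    }
}

print_r($ids);
```

Every URL yields the same captured ID; only the branch that consumes the part before it differs.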

Any help would be greatly appreciated!

Update:

I just discovered that the problem only happens if the YouTube URL has trailing parameters. The following input swallows the last few words:

'www.youtube.com/watch?v=XXXXXXXXX&parameter=data last few words'

But if the input is like this:

'www.youtube.com/watch?v=XXXXXXXXX last few words'

it works fine. Can anyone help with the needed adjustments for the regular expression?
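A minimal PHP sketch of the preg_match_all + str_replace flow described above, run on both inputs from the update. It assumes "abcdefghijk" as a placeholder 11-character ID (the XXXXXXXXX placeholder is only 9 characters, so the {11} quantifier would not match it at all) and a hypothetical <iframe> embed template. Run directly in PHP, "last few words" survives in both cases. Note that the match starts at youtube.com, so the www. prefix and the &parameter=data tail stay in the output.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '%(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i';

// The two inputs from the update, with a placeholder 11-character ID.
$inputs = [
    'www.youtube.com/watch?v=abcdefghijk&parameter=data last few words',
    'www.youtube.com/watch?v=abcdefghijk last few words',
];

$outputs = [];
foreach ($inputs as $text) {
    $output = $text;
    preg_match_all($pattern, $text, $matches);
    foreach ($matches[0] as $i => $fullMatch) {
        // Hypothetical embed markup built from the captured ID (group 1).
        $embed = '<iframe src="https://www.youtube.com/embed/' . $matches[1][$i] . '"></iframe>';
        // Only the matched substring is replaced; text after it is untouched.
        $output = str_replace($fullMatch, $embed, $output);
    }
    $outputs[] = $output;
    echo $output, PHP_EOL;
}
```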

Comments (4)

贩梦商人 2024-12-12 07:22:28

My bad. Contrary to what I first suspected, there was no problem with the regex.

I was passing the user input to the PHP handler without escaping it with encodeURIComponent() first. Thus, the handler assumed &parameter=data was the next input parameter, resulting in a broken POST variable.
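The cause described here can be reproduced on the PHP side. This sketch assumes the browser sends a form-encoded body with a hypothetical field name text, and uses PHP's parse_str() as a stand-in for how PHP splits an application/x-www-form-urlencoded body into $_POST (the field-splitting rules are essentially the same). rawurlencode() plays the role encodeURIComponent() would play in the browser.

```php
<?php
$input = 'www.youtube.com/watch?v=abcdefghijk&parameter=data last few words';

// Sent unencoded (field name "text" is hypothetical): the & inside the URL
// is read as a field separator, so "text" stops at the & and
// "last few words" lands in a separate "parameter" field.
parse_str('text=' . $input, $broken);

// Sent encoded: rawurlencode() turns & into %26 and spaces into %20,
// so the whole value survives the round trip.
parse_str('text=' . rawurlencode($input), $fixed);

var_dump($broken, $fixed);
```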

Sorry for my incompetence, and thanks for all the help!

且行且努力 2024-12-12 07:22:27

I usually break up complicated alternations to find out what's going on.
It appears you might have trouble with the last term [^"&?/ ]{11}, but I'm not sure
what you are trying to do. (Below is in Perl.)

use strict;

my $samp = 'www.youtube.com/watch?v=XXXXXXXXX&parameter=data last few words';

my $regex = qr%

(?:
    youtube\.com/
    (?:
        ( [^/]+/.+/ )    # 1
      | 
        (                # 2
            v
          | e(?:mbed)?
        ) /              # slash follows v, e or embed, as in the original
      |
        ( .*[?&]v= )     # 3
    )
  |

    ( youtu\.be/ )     #4
)

( [^"&?/ ]{1,11} )     # 5, was {11}

(.*)$                  # 6 the remainder

%xi;


if ( $samp =~ /$regex/ )
{
  # just print what matched
    print "all: '$&' \n";
    print "1:   '$1' \n";
    print "2:   '$2' \n";
    print "3:   '$3' \n";
    print "4:   '$4' \n";
    print "5:   '$5' \n";
    print "6:   '$6' \n";
}

Output:

all: 'youtube.com/watch?v=XXXXXXXXX&parameter=data last few words'
1:   ''
2:   ''
3:   'watch?v='
4:   ''
5:   'XXXXXXXXX'
6:   '&parameter=data last few words'

毅然前行 2024-12-12 07:22:27

Change the .+ to \S+ so that the match can't run across whitespace.

%(?:youtube\.com/(?:[^/]+/\S+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i

The greedy .+ (like the .* in the third alternative) also matches spaces, so it could capture the rest of the line, and the rest of your regex wasn't doing anything.
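The difference between .+ and \S+ only shows up when a slash appears later in the text, after some whitespace. For the inputs in the update both patterns match the same text, since neither has a second slash after youtube.com/. This minimal PHP sketch uses a hypothetical string to compare the original pattern with the \S+ variant: in the original, the first alternative lets .+ run across spaces up to a later slash, while the \S+ version finds no match.

```php
<?php
// Original pattern from the question, and the variant with .+ changed to \S+.
$original = '%(?:youtube\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i';
$nonSpace = '%(?:youtube\.com/(?:[^/]+/\S+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i';

// Hypothetical text with a later slash after some whitespace.
$text = 'see youtube.com/user/channel then read notes/abcdefghijk ok';

$a = preg_match($original, $text, $m1); // 1: .+ crosses the spaces to reach "notes/"
$b = preg_match($nonSpace, $text, $m2); // 0: \S+ stops at the first space

var_dump($a, $m1[0] ?? null, $b);
```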

心意如水 2024-12-12 07:22:27

I'm not clear on what exactly you are trying to do, but I suggest that you try a regex tester tool, like this one (there are others). It lets you visually examine the results of a regex.
