YouTube URL - Regular Expression

Posted on 2024-12-08 03:34:44

I have the following configuration in my AntiSamy policy file:

Old YouTube Object:

<object width="1280" height="720">
<param 
    name="movie" 
    value="http://www.youtube.com/v/Hl-zzrqQoSE
           ?version=3
           &hl=en_US
           &rel=0">
</param>
<param name="allowFullScreen" value="true">
</param>
<param name="allowscriptaccess" value="always">
</param>
<embed src="http://www.youtube.com/v/Hl-zzrqQoSE
            ?version=3
            &hl=en_US
            &rel=0" 
       type="application/x-shockwave-flash" 
       width="1280" 
       height="720" 
       allowscriptaccess="always" 
       allowfullscreen="true">
 </embed>
 </object>

The AntiSamy config:

 <common-regexps>
     <regexp name="YouTubeURL" value="(\s)*(http(s?)://)www.youtube.com/v/[\p{L}\p{N}]+[\p{L}\p{N}\p{Zs}\.\#@\$%\+&amp;;:\-_~,\?=/!]*(\s)*"/>
 ....

<!-- Tags related to YouTube -->
<tag name="object" action="validate">
<attribute name="height"/>
<attribute name="width"/>
<attribute name="type">
    <literal-list>
        <literal value="application/x-shockwave-flash"/>
    </literal-list>
</attribute>
<attribute name="data">
    <regexp-list>
        <regexp name="YouTubeURL"/>
    </regexp-list>
</attribute>
</tag>
<tag name="embed" action="validate">
<attribute name="height"/>
<attribute name="width"/>
<attribute name="type">
    <literal-list>
        <literal value="application/x-shockwave-flash"/>
    </literal-list>
</attribute>
<attribute name="allowfullscreen">
    <regexp-list>
        <regexp name="boolean"/>
    </regexp-list>
</attribute>
<attribute name="allowscriptaccess">
    <literal-list>
        <literal value="always"/>
    </literal-list>
</attribute>
<attribute name="src">
    <regexp-list>
        <regexp name="YouTubeURL"/>
    </regexp-list>
</attribute>
<attribute name="movie">
    <regexp-list>
        <regexp name="YouTubeURL"/>
    </regexp-list>
</attribute>
</tag>

My current configuration for iframes:

    <!-- Frame & related tags -->

    <tag name="iframe" action="remove"/>
    <tag name="frameset" action="remove"/>
    <tag name="frame" action="remove"/>

The new YouTube iframe:

<!--   src="https://www.youtube-nocookie.com/embed/Hl-zzrqQoSE"  -->
<iframe 
    width="1280" 
    height="720" 
    src="https://www.youtube.com/embed/Hl-zzrqQoSE" 
    frameborder="0" 
    allowfullscreen>
</iframe>

I figure the configuration for the iframe should look like this:

<tag name="iframe" action="validate">
        <attribute name="height"/>
        <attribute name="width"/>
        <attribute name="frameborder"/>
        <attribute name="src">
            <regexp-list>
                <regexp name="YouTubeURL"/>
            </regexp-list>
        </attribute>

        <attribute name="allowfullscreen">
            <regexp-list>
                <regexp name="boolean"/>
            </regexp-list>
        </attribute>
</tag>

How can I change the regex so that it accepts both the old and the new links, such as:

    https://www.youtube-nocookie.com/embed/Hl-zzrqQoSE
    https://www.youtube.com/embed/Hl-zzrqQoSE
    https://www.youtube.com/v/Hl-zzrqQoSE
    http://www.youtube-nocookie.com/v/Hl-zzrqQoSE?version=3&hl=en_US&rel=0
    http://www.youtube.com/v/Hl-zzrqQoSE?version=3&hl=en_US&rel=0

Comments (1)

(り薆情海 2024-12-15 03:34:44
\s*(https?://)www.youtube(-nocookie)?.com/(?:v|embed)/[\p{L}\p{N}]+[\p{L}\p{N}\p{Zs}.#@$%+&;:_~,?=!/-]*\s*

I took the liberty of removing unnecessary capture groups, escapes and characters.
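
To show how a pattern like this might sit in the policy file, here is a minimal sketch of the `common-regexps` entry. Two things differ from the bare regex above and are my own adjustments, not part of the original policy: the dots in the host name are escaped so they only match a literal `.`, and the `&` inside the character class is written as `&amp;` so the policy file stays well-formed XML. The remaining capture groups are made non-capturing, which does not change what is matched.

```xml
<common-regexps>
    <!-- Sketch only: accepts www.youtube.com and www.youtube-nocookie.com with /v/ or /embed/ paths -->
    <!-- Dots are escaped and "&" is written as "&amp;" inside the attribute value -->
    <regexp name="YouTubeURL" value="\s*https?://www\.youtube(?:-nocookie)?\.com/(?:v|embed)/[\p{L}\p{N}]+[\p{L}\p{N}\p{Zs}.#@$%+&amp;;:_~,?=!/-]*\s*"/>
</common-regexps>
```

Because the entry keeps the name `YouTubeURL`, the existing `<regexp name="YouTubeURL"/>` references under `object`, `embed` and the proposed `iframe` tag would pick it up unchanged.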

Although I personally would use something like:

\s*(https?://www.youtube(?:-nocookie)?.com/(?:v|embed)/([a-zA-Z0-9-]+).*)

That puts the entire YouTube URL in capture group 1 and the video ID in capture group 2 (group 0 is the whole match, including any leading whitespace).
Also, it doesn't make much sense to use Unicode properties when YouTube's URLs don't contain non-ASCII characters.

Demo: http://rubular.com/r/jv4zO9ys2L
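
For the AntiSamy policy itself, where the pattern only has to accept or reject an attribute value, an ASCII-only variant could look like the sketch below. The name `YouTubeURLAscii` and the set of characters allowed in the query string are assumptions of mine, chosen to cover the five example links in the question. I also added `_` to the ID class, since YouTube video IDs can contain underscores as well as hyphens, and replaced the trailing `.*` with an explicit optional query string so that arbitrary text after the ID is not accepted. This assumes AntiSamy requires the whole attribute value to match the pattern.

```xml
<common-regexps>
    <!-- Hypothetical name; ASCII-only sketch of the same idea -->
    <!-- Host: www.youtube.com or www.youtube-nocookie.com; path: /v/ID or /embed/ID; optional ?query -->
    <regexp name="YouTubeURLAscii" value="\s*https?://www\.youtube(?:-nocookie)?\.com/(?:v|embed)/[A-Za-z0-9_-]+(?:\?[A-Za-z0-9_=&amp;;.%-]*)?\s*"/>
</common-regexps>

<tag name="iframe" action="validate">
    <attribute name="src">
        <regexp-list>
            <!-- Refers to the hypothetical entry defined above -->
            <regexp name="YouTubeURLAscii"/>
        </regexp-list>
    </attribute>
</tag>
```

If other query parameters or a fragment need to be allowed, the character class after `\?` is the place to widen, one character at a time.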
