Regular expression to find and replace the contents of HTML comment tags

Posted on 2024-07-11 22:22:04

I have a CMS that uses a syntax based on HTML comments to let the user insert flash video players, slideshows, and other 'hard' code that the user could not easily write.

The syntax for one FLV movie looks like this:
<!--PLAYER=filename.flv-->

I use this code:

$find_players = preg_match("/<!--PLAYER\=(.*)-->/si", $html_content, $match);

This works great if there is only one player: $match[1] then contains the filename, which is all I need.

My knowledge of regex is vanishing, so I'm not able to adjust this to grab more than one match.

If there is more than one player on the page, it breaks completely, because it matches too greedily (from the first <!--PLAYER to the last -->).
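
A minimal reproduction of the problem described above may help. The sample page content here is made up for illustration; only the pattern comes from the question.

```php
<?php
// Hypothetical sample page containing two player comments.
$html_content = "<p>Intro</p><!--PLAYER=first.flv--><p>Middle</p><!--PLAYER=second.flv--><p>End</p>";

// The original greedy pattern: (.*) runs on to the LAST "-->" in the page.
preg_match("/<!--PLAYER=(.*)-->/si", $html_content, $match);

// $match[1] now spans both comments instead of holding a single filename.
var_dump($match[1]);
```

The captured group swallows everything between the first `<!--PLAYER=` and the final `-->`, which is the behaviour the answers below address.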

Comments (3)

巷雨优美回忆 2024-07-18 22:22:04

You probably want the regex modifier U (PCRE_UNGREEDY), which makes quantifiers match ungreedily. This fetches the shortest possible match, so you won't match from the beginning of the first <!--PLAYER= to the end of the last -->.

An abbreviated example:

<?php
$text = "blah\n<!-x=abc->blah<!-x=def->blah\n\nblah<!-x=ghi->\nblahblah" ;
$reg  = "/<!-x=(.*)->/U" ;
preg_match_all( $reg, $text, $matches ) ;
print_r( $matches ) ;

Your code then becomes:

$find_players = preg_match_all("/<!--PLAYER=(.*)-->/Ui", $html_content, $matches);
// print_r($matches[1]); // $matches[1] is an array of the captured filenames

The 's' modifier (PCRE_DOTALL) you're using probably isn't helpful, either; you're unlikely to have a filename with a linebreak in it.
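
As a rough sketch of how the result looks on a page with several players (the sample HTML and the `$count` variable are assumptions for illustration):

```php
<?php
// Hypothetical page content; the second comment uses lowercase to exercise the i modifier.
$html_content = "<p>Intro</p>\n<!--PLAYER=first.flv-->\n<p>Middle</p>\n<!--player=second.flv-->";

// U makes (.*) stop at the nearest "-->"; i makes PLAYER case-insensitive.
$count = preg_match_all("/<!--PLAYER=(.*)-->/Ui", $html_content, $matches);

// $matches[0] holds the full comments, $matches[1] the captured filenames.
print_r($matches[1]);
```

With the default flags, preg_match_all groups results by capture group, so looping over `$matches[1]` visits each filename in page order.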

EDIT: @Stevens suggests this syntax, which I agree is slightly clearer: the U modifier moves inline as (?U), placed just before the capturing parentheses.

$find_players = preg_match_all("/<!--PLAYER=(?U)(.*)-->/i", $html_content, $matches);
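
A small check, on made-up input, that the pattern-wide and inline forms capture the same filenames. It also shows one pitfall worth knowing: while U is active, adding `?` to a quantifier switches it back to greedy.

```php
<?php
// Hypothetical input with two adjacent player comments.
$html_content = "<!--PLAYER=a.flv--><!--PLAYER=b.flv-->";

// Pattern-wide U modifier.
preg_match_all("/<!--PLAYER=(.*)-->/Ui", $html_content, $with_flag);

// Inline (?U): affects only the part of the pattern that follows it.
preg_match_all("/<!--PLAYER=(?U)(.*)-->/i", $html_content, $with_inline);

var_dump($with_flag[1] === $with_inline[1]); // both hold a.flv and b.flv

// Pitfall: under U, a trailing ? flips the quantifier back to greedy.
preg_match("/<!--PLAYER=(.*?)-->/Ui", $html_content, $flipped);
var_dump($flipped[1]);
```

So pick one way of requesting ungreedy matching and avoid combining U with `?` quantifiers in the same pattern.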
空城仅有旧梦在 2024-07-18 22:22:04

When working with regular expressions, it's typically more performant to use a more specific expression rather than a "lazy dot", which generally causes excessive backtracking. You can use a negative lookahead to achieve the same results without overburdening the regex engine:

$find_players = preg_match_all("/<!--PLAYER=((?:[^-]+|-(?!->))*)-->/i", $html_content, $matches);

Mind you, the lazy dot is unlikely to cause noticeable problems in a simple case like this, but it's a good habit to always tell the regex engine exactly what you mean. In this case, you want to collect as many characters as possible ("greedy") without passing a comment terminator. A terminator is a dash followed by another dash and a greater-than sign. So we allow any number of characters, each of which is either not a dash or a dash that does not start a comment terminator.
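
To illustrate the rule just described, here is a sketch that runs the lookahead pattern over made-up filenames containing dashes. It uses preg_match_all with only the i modifier, since PHP's PCRE functions have no g modifier.

```php
<?php
// Hypothetical filenames with single and double dashes inside them.
$html_content = "<!--PLAYER=my-first-video.flv--><p>text</p><!--PLAYER=clip--2.flv-->";

// [^-]+ consumes runs of non-dash characters; -(?!->) accepts a dash
// only when it is not the start of the "-->" terminator.
$pattern = "/<!--PLAYER=((?:[^-]+|-(?!->))*)-->/i";

preg_match_all($pattern, $html_content, $matches);
print_r($matches[1]);
```

Dashes inside a filename are kept, and each capture stops at its own terminator without any lazy quantifier.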

欢烬 2024-07-18 22:22:04
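$find_players = preg_match_all("/<!--PLAYER\=(.*?)-->/i", $html_content, $matches);

The lazy quantifier (.*?) should work just fine; preg_match_all is needed to collect every player rather than only the first.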
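
Since the title also asks about replacing, here is one possible sketch using the same lazy pattern with preg_replace_callback. The `render_player` helper and its output markup are hypothetical placeholders, because the CMS's real player markup is not shown in the question.

```php
<?php
// Hypothetical helper: stands in for whatever markup the CMS really emits.
function render_player(string $filename): string
{
    // Escape the filename before placing it in an attribute.
    $safe = htmlspecialchars($filename, ENT_QUOTES);
    return '<div class="player" data-src="' . $safe . '"></div>';
}

// Made-up page content with two player comments.
$html_content = "<p>A</p><!--PLAYER=one.flv--><p>B</p><!--PLAYER=two.flv-->";

// Each comment is replaced by the markup built from its captured filename.
$result = preg_replace_callback(
    "/<!--PLAYER=(.*?)-->/i",
    function (array $m): string {
        return render_player(trim($m[1]));
    },
    $html_content
);

echo $result;
```

The callback receives the same array shape as preg_match's `$match`, so `$m[1]` is the filename for the comment being replaced.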
