Strip HTML comments with PHP but leave conditional comments

Posted 2024-07-24 19:41:19

I'm currently using PHP and a regular expression to strip out all HTML comments from a page. The script works well... a little too well. It strips out all comments, including my conditional comments in the <head>. Here's what I've got:

<?php
  // Output-buffer callback: receives the whole page and removes every <!-- ... --> comment
  function callback($buffer)
  {
    return preg_replace('/<!--(.|\s)*?-->/', '', $buffer);
  }

  ob_start("callback");
?>
... HTML source goes here ...
<?php ob_end_flush(); ?>

Since my regex isn't too hot, I'm having trouble figuring out how to modify the pattern to exclude conditional comments such as:

<!--[if !IE]><!-->
<link rel="stylesheet" href="/css/screen.css" type="text/css" media="screen" />
<!-- <![endif]-->

<!--[if IE 7]>
<link rel="stylesheet" href="/css/ie7.css" type="text/css" media="screen" />
<![endif]-->

<!--[if IE 6]>
<link rel="stylesheet" href="/css/ie6.css" type="text/css" media="screen" />
<![endif]-->

Cheers

披肩女神 2024-07-31 19:41:19

Since comments cannot be nested in HTML, a regex can do the job, in theory. Still, using some kind of parser would be the better choice, especially if your input is not guaranteed to be well-formed.

Here is my attempt at it. To match only normal comments, this should work. It has become quite a monster, sorry for that. I have tested it quite extensively and it seems to work well, but I give no warranty.

<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->

Explanation:

<!--                #01: "<!--"
(?!                 #02: look-ahead: a position not followed by:
  \s*               #03:   any amount of whitespace
  (?:               #04:   non-capturing group, any of:
    \[if [^\]]+]    #05:     "[if ...]"
    |<!             #06:     or "<!"
    |>              #07:     or ">"
  )                 #08:   end non-capturing group
)                   #09: end look-ahead
(?:                 #10: non-capturing group:
  (?!-->)           #11:   a position not followed by "-->"
  .                 #12:   eat the following char, it's part of the comment
)*                  #13: end non-capturing group, repeat
-->                 #14: "-->"

Steps #02 and #11 are crucial. #02 makes sure that the following characters do not indicate a conditional comment. After that, #11 makes sure that the following characters do not indicate the end of the comment, while #12 and #13 cause the actual matching.

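Apply with the "global" and "dotall" flags. In PHP, preg_replace() already replaces every match, so only the s (dotall) modifier has to be added after the closing delimiter.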
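A small PHP check of what the look-ahead in step #02 does, run against the kinds of comment that appear in the question. The sample strings are illustrative; the pattern is the one above with the s modifier added.

```php
<?php
// "Ordinary comments only" pattern from above; s lets "." match newlines too.
$pattern = '/<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->/s';

// Illustrative samples: one ordinary comment plus the conditional forms from the question.
$samples = [
    '<!-- ordinary comment -->',
    "<!--[if IE 7]>\n<link rel=\"stylesheet\" href=\"/css/ie7.css\" />\n<![endif]-->",
    '<!--[if !IE]><!-->',
    '<!-- <![endif]-->',
];

foreach ($samples as $sample) {
    // preg_match() returns 1 when the pattern matches somewhere in the string.
    $matched = preg_match($pattern, $sample) === 1;
    // Print only the first line of each sample to keep the output short.
    echo ($matched ? 'removed: ' : 'kept:    ') . strtok($sample, "\n") . "\n";
}
```

Only the ordinary comment matches: "<!--[if" is rejected by the "[if ...]" branch, the "<!-->" tail of the downlevel-revealed opener by the ">" branch, and "<!-- <![endif]-->" by the "<!" branch.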

To do the opposite (match only conditional comments), it would be something like this:

<!((?:--)?)(?=\[)(?:(?!<!\[endif\]\1>).)*<!\[endif\]\1>

Explanation:

<!                  #01: "<!"
((?:--)?)           #02: two dashes, optional; written so the group always takes part ("--" or empty),
                    #    because in PHP (PCRE) a back-reference to a group that did not take part fails
(?=\[)              #03: a position followed by "["
(?:                 #04: non-capturing group:
  (?!               #05:   a position not followed by
    <!\[endif\]\1>  #06:     "<![endif]>" or "<![endif]-->" (depends on #02)
  )                 #07:   end of look-ahead
  .                 #08:   eat the following char, it's part of the comment
)*                  #09: end of non-capturing group, repeat
<!\[endif\]\1>      #10: "<![endif]>" or "<![endif]-->" (depends on #02)

Again, apply with "global" and "dotall" flags.
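A sketch of collecting only the conditional comments in PHP with preg_match_all(). Group 1 is written as ((?:--)?) rather than (--)?: in PCRE a back-reference to a group that did not participate fails, so with (--)? the dash-less <![if ...]> ... <![endif]> form would never match. The sample markup is made up for illustration.

```php
<?php
// Conditional-comments-only pattern; ((?:--)?) always sets \1 to "--" or "".
$pattern = '/<!((?:--)?)(?=\[)(?:(?!<!\[endif\]\1>).)*<!\[endif\]\1>/s';

// Illustrative page fragment: a dashed conditional, a dash-less one, and a plain comment.
$html = "<head>\n"
    . "<!--[if IE 7]>\n<link rel=\"stylesheet\" href=\"/css/ie7.css\" />\n<![endif]-->\n"
    . "<![if !IE]>\n<link rel=\"stylesheet\" href=\"/css/screen.css\" />\n<![endif]>\n"
    . "<!-- ordinary comment -->\n"
    . "</head>";

// preg_match_all() stores every full match in $matches[0].
preg_match_all($pattern, $html, $matches);

foreach ($matches[0] as $conditional) {
    echo $conditional, "\n---\n";
}
```

The ordinary comment is not collected, because neither "<!--" nor "<!" is followed by "[" there.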

Step #02 is needed because of the "downlevel-revealed" syntax (<![if !IE]> ... <![endif]>, written without the dashes); see "MSDN - About Conditional Comments".

I'm not entirely sure where spaces are allowed or expected. Add \s* to the expression where appropriate.
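The answer notes that a parser is the safer choice. As a sketch of that route (a swapped-in technique, not the answerer's code), PHP's bundled DOM extension can select comment nodes with the XPath expression //comment() and drop the non-conditional ones. The function name strip_comments_dom and the keep rule are illustrative assumptions; saveHTML() re-serializes the whole document, so surrounding markup may come back reformatted (for example, an added doctype).

```php
<?php
// Hypothetical helper: remove every comment node except conditional comments.
function strip_comments_dom(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//comment()') as $comment) {
        // A comment's data excludes "<!--" and "-->": conditional openers start with
        // "[if", and the downlevel-revealed closer " <![endif]" starts with "<!".
        if (!preg_match('/^\s*(?:\[if\b|<!)/i', $comment->data)) {
            $comment->parentNode->removeChild($comment);
        }
    }

    return $doc->saveHTML();
}

// Illustrative input built from the question's conditional comments.
$html = '<html><head>'
    . "<!--[if IE 7]>\n<link rel=\"stylesheet\" href=\"/css/ie7.css\" />\n<![endif]-->"
    . '<!--[if !IE]><!--><link rel="stylesheet" href="/css/screen.css" /><!-- <![endif]-->'
    . '</head><body><!-- ordinary comment --><p>Hello</p></body></html>';

echo strip_comments_dom($html);
```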

抚你发端 2024-07-31 19:41:19

If you can't get it to work with a single regular expression, or you find you want to preserve more comments, you could use preg_replace_callback(). You can then define a function that decides what to do with each comment individually.

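<?php
// preg_replace_callback() hands every comment to comment_replace_func().
function callback($buffer) {
    // U makes .* ungreedy (stop at the first "-->"); s lets . match newlines.
    return preg_replace_callback('/<!--.*-->/Us', 'comment_replace_func', $buffer);
}

function comment_replace_func($m) {
    // Keep conditional comments: "<!--[if ...]>" openers and the
    // "<!-- <![endif]-->" closer used by the downlevel-revealed syntax.
    if (preg_match('/^<!--\s*(?:\[if\b|<!)/i', $m[0])) {
        return $m[0];
    }

    // Every other comment is removed.
    return '';
}

ob_start("callback");
?>

... HTML source goes here ...

<?php ob_end_flush(); ?>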
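To illustrate the "preserve more comments" point, here is a variation (a sketch, not the answerer's code) that keeps the keep rules in one array passed to an anonymous callback. The function name strip_comments and the "<!--!" rule are hypothetical; the latter is just an example of a house convention for comments that must survive.

```php
<?php
// Each entry is a regex; a comment matching any of them is kept.
$keepPatterns = [
    '/^<!--\s*\[if\b/i',       // conditional openers: <!--[if IE 7]> ...
    '/^<!--\s*<!\[endif\]/i',  // downlevel-revealed closer: <!-- <![endif]-->
    '/^<!--!/',                // hypothetical convention: <!--! ... --> is kept
];

function strip_comments(string $html, array $keepPatterns): string
{
    return preg_replace_callback(
        '/<!--.*?-->/s', // one comment at a time; s lets "." span newlines
        function (array $m) use ($keepPatterns): string {
            foreach ($keepPatterns as $keep) {
                if (preg_match($keep, $m[0])) {
                    return $m[0]; // matches a keep rule: leave it in place
                }
            }
            return ''; // any other comment is removed
        },
        $html
    );
}

// Illustrative input.
$html = "<!--[if IE 7]>\n<link href=\"/css/ie7.css\" />\n<![endif]-->\n"
    . "<!-- note -->\n<!--! keep me -->\n<p>x</p>";

echo strip_comments($html, $keepPatterns);
```

Wired into the page, the callback would be ob_start(function ($buffer) use ($keepPatterns) { return strip_comments($buffer, $keepPatterns); }).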
伴随着你 2024-07-31 19:41:19

In summary, this seems to be the best solution:

<?php
  function callback($buffer) {
    return preg_replace('/<!--[^\[](.|\s)*?-->/', '', $buffer);
  }
  ob_start("callback");
?>
... HTML source goes here ...
<?php ob_end_flush(); ?>

It strips out all comments and leaves the conditionals in place, with the exception of the top one:

<!--[if !IE]><!-->
<link rel="stylesheet" href="/css/screen.css" type="text/css" media="screen" />
<!-- <![endif]-->

The additional <!--> at the end of its first line seems to be causing the problem.
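Tracing the summary pattern on that first block in PHP shows where the extra <!--> comes in. The input is the question's downlevel-revealed block.

```php
<?php
// Pattern from the summary above.
$pattern = '/<!--[^\[](.|\s)*?-->/';

$html = "<!--[if !IE]><!-->\n"
    . "<link rel=\"stylesheet\" href=\"/css/screen.css\" type=\"text/css\" media=\"screen\" />\n"
    . "<!-- <![endif]-->\n";

// At "<!--[if" the [^\[] fails, so the opener itself is skipped. But "<!-->" also
// starts with "<!--", and its ">" satisfies [^\[]; the lazy (.|\s)*? then runs on
// to the next "-->", the one closing "<!-- <![endif]-->".
echo preg_replace($pattern, '', $html);
```

What is left is "<!--[if !IE]>" and a newline: the screen.css link is gone and the conditional is never closed.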

If anyone can suggest a regex that takes this into account and leaves that conditional in place too, that would be perfect.

Tomalak's solution looks good, but as a newbie without further guidance I don't know how to implement it. I would like to try it, though, if anyone can elaborate on how to apply it.

Thanks
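One way to apply Tomalak's first pattern, sketched here: keep the question's ob_start() setup and only swap the regex inside callback(), adding the s modifier. The <head> markup is illustrative sample content standing in for the real page.

```php
<?php
// The question's output-buffer callback, using the "ordinary comments only" pattern.
// s lets "." match newlines; preg_replace() already replaces every match.
function callback($buffer)
{
    return preg_replace('/<!--(?!\s*(?:\[if [^\]]+]|<!|>))(?:(?!-->).)*-->/s', '', $buffer);
}

ob_start('callback');
?>
<head>
<!--[if !IE]><!-->
<link rel="stylesheet" href="/css/screen.css" type="text/css" media="screen" />
<!-- <![endif]-->
<!--[if IE 7]>
<link rel="stylesheet" href="/css/ie7.css" type="text/css" media="screen" />
<![endif]-->
<!-- this ordinary comment is stripped -->
</head>
<?php
ob_end_flush();
```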

蓝眸 2024-07-31 19:41:19

I'm not sure if PHP's regex engine will like the following, but try this pattern:

'/<!--(.|\s)*(\[if .*\]){0}(.|\s)*?-->/'
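A quick check of how PHP's PCRE engine treats this pattern, on a made-up string. In PCRE a {0} quantifier makes the preceding item behave as if it were not there, so (\[if .*\]){0} does not exclude anything.

```php
<?php
// Pattern from this answer.
$pattern = '/<!--(.|\s)*(\[if .*\]){0}(.|\s)*?-->/';

$html = "a<!-- first -->b<!--[if IE 7]>c<![endif]-->d";

// The greedy (.|\s)* runs to the end of the input and backtracks only as far as
// the LAST "-->", so everything from the first "<!--" up to it is removed.
echo preg_replace($pattern, '', $html); // prints "ad"
```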
绝情姑娘 2024-07-31 19:41:19

Something like this might work:

/<!--[^\[](.|\s)*?-->/

It's the same as yours, except that it ignores comments that have an opening bracket immediately following the comment start tag.
