Regex to strip comments, multi-line comments, and empty lines

Posted on 2024-07-14 19:54:10

I want to parse a file and use PHP and regex to strip:

  • blank or empty lines
  • single-line comments
  • multi-line comments

Basically, I want to remove any line containing

/* text */ 

or multi-line comments such as

/***
some
text
*****/

If possible, I would also like another regex that checks whether a line is empty (to remove blank lines).

Is that possible? Can somebody post a regex that does just that?

Thanks a lot.

Comments (9)

删除会话 2024-07-21 19:54:11

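This should replace everything from /* to */. Note that the pattern only matches a comment that has whitespace on both sides and contains no / character.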

$string = preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $string);
橘寄 2024-07-21 19:54:11

This function works well. It relies on PHP's tokenizer instead of a regex:

<?php
// Compatibility shim for old PHP versions, where only one of these constants exists
if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}
function strip_comments($source) {
    $tokens = token_get_all($source);
    $ret = "";
    foreach ($tokens as $token) {
       if (is_string($token)) {
          // Single-character tokens such as ; or { arrive as plain strings
          $ret .= $token;
       } else {
          list($id, $text) = $token;

          switch ($id) { 
             case T_COMMENT: 
             case T_ML_COMMENT: // we've defined this
             case T_DOC_COMMENT: // and this
             case T_OPEN_TAG:  // drop the opening PHP tag
             case T_CLOSE_TAG: // drop the closing PHP tag
                break;

             default:
                $ret .= $text;
                break;
          }
       }
    }
    return trim($ret);
}
?>

Now use the function strip_comments on code held in a variable:

<?php
// Single quotes, so the double quotes inside the sample need no escaping
$code = '
<?php 
    /* this is comment */
   // this is also a comment
   # me too, am also comment
   echo "And I am some code...";
?>';

$code = strip_comments($code);

echo htmlspecialchars($code);
?>

In a browser, this displays:

echo "And I am some code...";

Loading from a PHP file:

<?php
$code = file_get_contents("some_code_file.php");
$code = strip_comments($code);

echo htmlspecialchars($code);
?>

Loading a PHP file, stripping its comments, and saving it back:

<?php
$file = "some_code_file.php";
$code = file_get_contents($file);
$code = strip_comments($code);

// strip_comments() also removes the PHP tags, so put the opening tag back.
// This assumes the file is pure PHP with no inline HTML.
$f = fopen($file,"w");
fwrite($f, "<?php\n" . $code);
fclose($f);
?>

Source: http://www.php.net/manual/en/tokenizer.examples.php

梓梦 2024-07-21 19:54:11

This is my solution for anyone who is not used to regexes. The following code removes all comments delimited by # and reads variables written in the style NAME=VALUE:

$reg = array();
$handle = @fopen("/etc/chilli/config", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        // Cut the line at the first "#", if there is one
        $start = strpos($buffer, "#");
        $res = ($start !== false) ? substr($buffer, 0, $start) : $buffer;

        // Split NAME=VALUE at the first "=" only, so a value may contain "="
        $a = explode("=", $res, 2);
        $name = trim($a[0]);

        // Blank and comment-only lines leave an empty name and are skipped
        if ($name !== "") {
            $reg[$name] = isset($a[1]) ? trim($a[1]) : "";
        }
    }

    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}
琉璃梦幻 2024-07-21 19:54:11

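I found that this one suits me better: (\s+)\/\*([^\/]*)\*/\n*. It removes multi-line comments, whether indented or not, together with the whitespace around them. Note that [^\/]* stops at the first slash, so a comment containing a / will not match. Here is an example of a comment that this regex matches: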

/**
 * The AdditionalCategory
 * Meta informations extracted from the WSDL
 * - minOccurs : 0
 * - nillable : true
 * @var TestStructAdditionalCategorizationExternalIntegrationCUDListDataContract
 */
一页 2024-07-21 19:54:10
$text = preg_replace('!/\*.*?\*/!s', '', $text);
$text = preg_replace('/\n\s*\n/', "\n", $text);
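
To make the effect of these two steps concrete, here is a small worked example. The sample input is invented for illustration; the two patterns are the ones from this answer.

```php
<?php
// Sample input, invented for this illustration
$text = "a();\n/* one\n   two */\n\n\nb(); /* inline */\nc();\n";

// Step 1: drop every /* ... */ block; the s flag lets . span newlines
$text = preg_replace('!/\*.*?\*/!s', '', $text);

// Step 2: collapse each run of blank lines into a single newline
$text = preg_replace('/\n\s*\n/', "\n", $text);

echo $text;
```

The block comment and the blank lines it leaves behind disappear, but the space in front of the inline comment stays, so trailing whitespace may need a separate pass.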
花间憩 2024-07-21 19:54:10

Keep in mind that any regex you use will fail if the file you're parsing has a string containing something that matches these conditions. For example, it would turn this:

print "/* a comment */";

Into this:

print "";

Which is probably not what you want. But maybe it is, I don't know. Anyway, regexes technically can't parse data in a way that avoids that problem. I say technically because modern PCRE regexes have tacked on a number of hacks that make them capable of doing this and, more importantly, mean they are no longer regular expressions, but whatever. If you want to avoid stripping these things inside quotes or in other situations, there is no substitute for a full-blown parser (although it can still be pretty simple).
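
As a rough sketch of the "pretty simple" end of that range, one common trick is to let a single pattern match either a quoted string or a comment, and then keep the strings while dropping the comments. The function name strip_comments_outside_strings is hypothetical, and the sketch assumes only plain single- and double-quoted strings with backslash escapes; heredocs and other string syntaxes are not handled.

```php
<?php
// Hypothetical helper, not taken from any answer in this thread
function strip_comments_outside_strings(string $text): string
{
    // Alternatives, tried left to right at each position:
    //   1. a double-quoted string   2. a single-quoted string
    //   3. a /* ... */ comment      4. a // comment up to the line end
    $pattern = '~"(?:[^"\\\\]|\\\\.)*"|\'(?:[^\'\\\\]|\\\\.)*\'|/\*.*?\*/|//[^\r\n]*~s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $first = $m[0][0];
        // A match that starts with a quote is a string: return it untouched.
        // Anything else is a comment: replace it with nothing.
        return ($first === '"' || $first === "'") ? $m[0] : '';
    }, $text);

    // preg_replace_callback returns null on a PCRE error; fall back to the input
    return $result ?? $text;
}

echo strip_comments_outside_strings('print "/* a comment */"; /* gone */'), "\n";
```

Because a string is consumed as one match, comment markers inside it are never seen by the comment alternatives.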

简美 2024-07-21 19:54:10
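//  Removes multi-line comments without leaving a blank line behind;
//  also handles surrounding spaces/tabs. The m flag lets ^ match at
//  the start of every line, not only at the start of the text.
$text = preg_replace('!^[ \t]*/\*.*?\*/[ \t]*[\r\n]!sm', '', $text);

//  Removes single-line '//' comments, including surrounding blanks.
//  Note: the line break is removed too, so a trailing comment joins
//  its line with the next one.
$text = preg_replace('![ \t]*//.*[ \t]*[\r\n]!', '', $text);

//  Strips blank lines
$text = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $text);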
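
The question also asked for a way to test whether a single line is blank. A minimal sketch, assuming the text has already been split into lines; is_blank_line is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true when the line holds nothing but whitespace
function is_blank_line(string $line): bool
{
    return preg_match('/^\s*$/', $line) === 1;
}

// Sample lines, invented for this illustration
$lines = ["foo();", "", "   \t", "bar();"];

// Keep only the lines that are not blank, then reindex the array
$kept = array_values(array_filter($lines, function ($line) {
    return !is_blank_line($line);
}));

echo implode("\n", $kept), "\n";
```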
薄情伤 2024-07-21 19:54:10
$string = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);
后eg是否自 2024-07-21 19:54:10

It is possible, but I wouldn't do it. You need to parse the whole PHP file to make sure that you're not removing any necessary whitespace (inside strings, between keywords and identifiers, and so on); otherwise you end up with something like publicfunctiondoStuff(). Better to use PHP's tokenizer extension.
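
For PHP source specifically, the built-in php_strip_whitespace() function is a ready-made way to do this with the lexer: it returns a file's source with comments removed and whitespace collapsed. A minimal sketch follows; the sample source and the temporary file are invented so that the example is self-contained.

```php
<?php
// Sample source written to a temporary file so the sketch is self-contained
$source = "<?php\n// secret note\nfunction doStuff()\n{\n    return 1; /* another secret */\n}\n";
$file = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($file, $source);

// php_strip_whitespace() runs the file through PHP's lexer and returns the
// source without comments and with whitespace runs collapsed
$stripped = php_strip_whitespace($file);
unlink($file);

echo $stripped, "\n";
```

Because the lexer knows where whitespace is required, keywords and identifiers stay separated. The original line layout is lost, though, so this suits minifying better than tidying a file for reading.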
