Remove all comments (single-line/multi-line) and blank lines from a source file

Posted 2025-01-01 13:05:58

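How can I remove all comments and blank lines from a C# source file? Keep in mind that there could be nested comments, that is, comment markers inside string literals or inside other comments. Some examples: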

string text = @"//not a comment"; // a comment

/* multiline
comment */ string newText = "/*not a comment*/"; // a comment

/* multiline // not a comment 
/* comment */ string anotherText = "/* not a comment */ // some text here\"// not a comment"; // a comment

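The source can be much more complex than the three examples above. Can someone suggest a regex pattern or another way to solve this? I've already browsed a lot of material on the internet and couldn't find anything that works.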

笔落惊风雨 2025-01-08 13:05:58

You could use the function in this answer:

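// Requires: using System.Text.RegularExpressions;
static string StripComments(string code)
{
    // Group 1 matches verbatim strings, regular strings, and char literals, which are kept.
    // The remaining alternatives match // and /* */ comments, which are replaced by nothing.
    var re = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
    return Regex.Replace(code, re, "$1");
}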

And then remove empty lines.
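For the "remove empty lines" step, here is a minimal sketch of how the two passes could be chained. The class name `CommentStripper` and the helper `RemoveBlankLines` are hypothetical names introduced for illustration; `StripComments` is repeated from the answer only so that the block is self-contained. Note two assumptions: the output is normalized to `\n` line endings, and blank lines inside multi-line verbatim strings are removed too, which may not be what you want.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CommentStripper
{
    // Same pattern as in the answer: group 1 captures string and char
    // literals (kept); the other alternatives match comments (dropped).
    public static string StripComments(string code)
    {
        var re = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
        return Regex.Replace(code, re, "$1");
    }

    // Hypothetical helper: drops lines that are empty or whitespace-only.
    // Assumption: it is acceptable to normalize line endings to "\n".
    public static string RemoveBlankLines(string code)
    {
        var lines = code.Split('\n')
                        .Select(line => line.TrimEnd('\r'))
                        .Where(line => line.Trim().Length > 0);
        return string.Join("\n", lines);
    }
}
```

A call would then look like `CommentStripper.RemoveBlankLines(CommentStripper.StripComments(source))`. Lines that held code followed by a comment keep any trailing whitespace that preceded the comment.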


蹲在坟头点根烟 2025-01-08 13:05:58

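Unfortunately, it is really hard to do this reliably with a regex without tripping over edge cases. I haven't investigated very far, but you may be able to use the Visual Studio language service to parse the comments.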


罪#恶を代价 2025-01-08 13:05:58

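If you want to identify comments with regexes, you really need to use the regex as a tokenizer. That is, it identifies and extracts the first thing in the string, whether that thing is a string literal, a comment, or a block of text that is neither a string literal nor a comment. Then you take the remainder of the string and pull the next token off the beginning.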

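This gets you around the problems with context. If you only look for things in the middle of a string, there is no good way to tell whether a particular "comment" is inside a string literal. In fact, it is hard to tell where the string literals are in the first place, because of things like \". But if you always take the first thing in the string, it is easy to say "oh, the string starts with ", so everything up to the next unescaped " is more string." Context takes care of itself.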
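To make the context problem concrete, here is a small sketch of the naive approach failing. The class `NaiveDemo`, the method `NaiveStrip`, and the sample URL are illustrative assumptions and do not come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class NaiveDemo
{
    // Naive approach: delete everything from "//" to the end of the line,
    // with no knowledge of whether the "//" sits inside a string literal.
    public static string NaiveStrip(string code)
    {
        return Regex.Replace(code, @"//.*", "");
    }
}
```

Calling `NaiveDemo.NaiveStrip("string url = \"http://example.com\"; // link")` returns `string url = "http:`, because the first `//` it finds is the one inside the string literal, so the rest of the line is cut off and the code no longer compiles.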

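So you would want three regexes:

One that identifies a comment starting at the beginning of the string (either a // comment or a /* comment).
One that identifies a string literal starting at the beginning of the string. Remember to check for both " and @" strings; each has its own set of edge cases.
One that identifies something that is neither of the above, and matches up to the first thing that could be a comment or a string literal.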

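Writing the actual regex patterns is left as an exercise for the reader, since writing and testing it all would take me hours and I'm not willing to do that for free. (grin) But it is certainly doable if you have a good understanding of regexes (or a place like StackOverflow to ask specific questions when you get stuck) and are willing to write a bunch of automated tests for your code. Watch out for that last ("anything else") case, though: you want to stop just before an @ if it is followed by a ", but not if it is the @ that escapes a keyword for use as an identifier.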
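As a rough starting point for that exercise, the three-pattern tokenizer described above could be sketched as follows. This is only one possible reading of the answer, not its author's code. The class name `CommentTokenizer` and all three patterns are assumptions. It simplifies the "anything else" case: the third pattern stops at every `/`, `"`, `'` and `@`, and a fallback copies a single character when none of the patterns match, which covers division operators and `@` identifiers. It also treats char literals as literals, and it does not handle interpolated strings (`$"..."`) or raw string literals (`"""..."""`).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class CommentTokenizer
{
    // \G anchors each pattern at the position passed to Match(input, startat).

    // 1. A // comment, or a /* */ comment (an unterminated /* runs to end of input).
    static readonly Regex Comment = new Regex(@"\G(?://[^\r\n]*|/\*(?s:.*?)(?:\*/|\z))");

    // 2. A verbatim string, a regular string, or a char literal.
    static readonly Regex Literal = new Regex(
        @"\G(?:@""(?:[^""]|"""")*""|""(?:[^""\\\n]|\\.)*""|'(?:[^'\\\n]|\\.)*')");

    // 3. Anything else, up to the next character that could start a comment or literal.
    static readonly Regex Other = new Regex(@"\G[^/""'@]+");

    public static string Strip(string code)
    {
        var output = new StringBuilder();
        int pos = 0;
        while (pos < code.Length)
        {
            Match m;
            if ((m = Comment.Match(code, pos)).Success)
            {
                // Comment token: skip it without copying.
            }
            else if ((m = Literal.Match(code, pos)).Success
                  || (m = Other.Match(code, pos)).Success)
            {
                // Literal or plain code: copy it unchanged.
                output.Append(m.Value);
            }
            else
            {
                // A lone /, ", ' or @ (for example division or an @identifier):
                // copy one character and rescan from the next position.
                output.Append(code[pos]);
                pos++;
                continue;
            }
            pos += m.Length;
        }
        return output.ToString();
    }
}
```

Because every token is taken from the current position, a `//` or `/*` inside a string literal is consumed as part of the literal before the comment pattern ever sees it. Blank lines left behind by removed comments still need a separate pass.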