Regular expressions for developers

Posted 2024-07-05 03:46:37

I've been trying to figure out a regex that lets me search for a particular string while automatically skipping comments. Does anyone have an RE like this, or know of one? It doesn't even need to be sophisticated enough to skip #if 0 blocks; I just want it to skip over // and /* blocks. The converse, that is, searching only inside comment blocks, would be very useful too.

Environment: VS 2003

Comments (4)

This is a harder problem than it might at first appear, since you need to consider comment tokens inside strings, comment tokens that are themselves commented out, etc.

I wrote a string and comment parser for C#; let me see if I can dig out something that will help. I'll update if I find anything.

EDIT:
OK, so I found my old 'codemasker' project. It turns out that I did this in stages, not with a single regex. Basically, I inch through a source file looking for start tokens; when I find one, I look for the matching end token and mask everything in between. This takes the context of the start token into account: if you find a "string start" token, you can safely ignore comment tokens until you find the end of the string, and vice versa. Once the code is masked (I used GUIDs as masks, and a hashtable to keep track), you can safely do your search and replace, then finally restore the masked code.
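The staged masking idea described above might look roughly like the following Python sketch. It is not the 'codemasker' code, which was not posted; the names `TOKEN`, `mask`, and `unmask` are made up for illustration, and the token pattern assumes C-style `//` and `/* */` comments with ordinary quoted literals.

```python
import re
import uuid

# Hypothetical token pattern (an assumption, not the original project's).
# Scanning left to right means whichever token starts first wins, so a "//"
# inside a string is swallowed by the string alternative, and a quote
# inside a comment is swallowed by the comment alternative.
TOKEN = re.compile(
    r"""
      //[^\n]*              # line comment, up to the end of the line
    | /\*.*?\*/             # block comment, may span lines
    | "(?:\\.|[^"\\])*"     # double-quoted string with escape sequences
    | '(?:\\.|[^'\\])*'     # character literal with escape sequences
    """,
    re.VERBOSE | re.DOTALL,
)


def mask(source, mask_comments=True, mask_strings=True):
    """Replace comments and/or literals with unique placeholders.

    Returns the masked text and a dict mapping placeholder -> original text.
    """
    saved = {}

    def stash(match):
        text = match.group(0)
        is_comment = text.startswith(("//", "/*"))
        wanted = mask_comments if is_comment else mask_strings
        if not wanted:
            return text
        key = "__MASK_" + uuid.uuid4().hex + "__"
        saved[key] = text
        return key

    return TOKEN.sub(stash, source), saved


def unmask(masked, saved):
    """Put the original comments and literals back."""
    for key, text in saved.items():
        masked = masked.replace(key, text)
    return masked


code = 'int foo = 1; // foo counter\nputs("foo // not a comment"); /* foo */ foo++;\n'
masked, saved = mask(code)
edited = masked.replace("foo", "bar")  # only touches unmasked code
result = unmask(edited, saved)
print(result)
```

In this example only the two occurrences of `foo` in real code are renamed; the ones in the comments and in the string literal come back untouched. The search text must not be able to match the placeholders themselves, which is one reason for using GUID-like masks.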

Hope that helps.

眼波传意 2024-07-12 03:46:37

Be especially careful with strings. Strings often have escape sequences which you also have to respect while you're finding the end of them.

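For example, consider `"This is \"a test\""`: you cannot blindly look for the next double quote as the terminator. Also beware of `"This is \\"`, which shows that you cannot just say "unless the double quote is preceded by a backslash": here the closing quote is preceded by a backslash, but that backslash is itself escaped.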

In summary, make some brutal unit tests!
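As a starting point for such tests, here is a small Python sketch. The `STRING` and `NAIVE` patterns and the `first_string` helper are illustrative names, not something from this thread. `STRING` consumes escape sequences as pairs (a backslash plus whatever follows), which is one common way to handle both examples above; `NAIVE` is the "quote not preceded by a backslash" rule, included only to show where it goes wrong.

```python
import re

# Consume either an escaped pair (backslash + any character) or any single
# character that is neither a quote nor a backslash, up to the closing quote.
STRING = re.compile(r'"(?:\\.|[^"\\])*"', re.DOTALL)

# The tempting but wrong rule: stop at a quote not preceded by a backslash.
NAIVE = re.compile(r'".*?(?<!\\)"', re.DOTALL)


def first_string(text):
    """Return the first complete string literal in text, or None."""
    m = STRING.search(text)
    return m.group(0) if m else None


cases = {
    r'x = "This is \"a test\""; y': r'"This is \"a test\""',
    r'x = "This is \\"; y = "z"': r'"This is \\"',
    r'x = "unterminated \"': None,
    'x = ""': '""',
}
for text, expected in cases.items():
    assert first_string(text) == expected, text

# The naive rule runs past the real end of the second example.
tricky = r'x = "This is \\"; y = "z"'
naive_match = NAIVE.search(tricky).group(0)
assert naive_match != r'"This is \\"'
```

The cases here are deliberately few; real code would also want multi-line strings, character literals, and whatever raw or verbatim string syntax the target language has.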

歌枕肩 2024-07-12 03:46:37

A regexp is not the best tool for the job.

Perl FAQ:

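C comments:

#!/usr/bin/perl
# Slurp the whole input, then delete /* ... */ comments. String and
# character literals are matched too, so comment markers inside them survive.
$/ = undef;
$_ = <>;

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|([^/"']*("[^"\\]*(\\[\d\D][^"\\]*)*"[^/"']*|'[^'\\]*(\\[\d\D][^'\\]*)*'[^/"']*|/+[^*/][^/"']*)*)#$2#g;
print;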

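C++ comments:

#!/usr/local/bin/perl
# Rewrites each // comment as a /* */ comment; block comments, string
# literals, and all other text pass through unchanged via $&.
$/ = undef;
$_ = <>;

s#//(.*)|/\*[^*]*\*+([^/*][^*]*\*+)*/|"(\\.|[^"\\])*"|'(\\.|[^'\\])*'|[^/"']+#  $1 ? "/*$1 */" : $& #ge;
print;
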
娇柔作态 2024-07-12 03:46:37

I would make a copy and strip out the comments first, then search the string the regular way.
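A rough Python sketch of that copy-and-strip approach follows. The helper names (`strip_comments`, `only_comments`, `find_lines`) are invented for illustration. It assumes C-style `//` and `/* */` comments and blanks them out with spaces instead of deleting them, so line numbers in the copy still match the original file. It makes no attempt at #if 0 blocks, line continuations, or raw strings.

```python
import re

# Comments and literals are matched in one left-to-right pass so that a
# comment marker inside a string is never treated as a comment.
PATTERN = re.compile(
    r"""
      (?P<comment> //[^\n]* | /\*.*?\*/ )
    | (?P<literal> "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' )
    """,
    re.VERBOSE | re.DOTALL,
)


def blank(text):
    """Replace everything except newlines with spaces."""
    return re.sub(r"[^\n]", " ", text)


def strip_comments(source):
    """Copy of source with comments blanked out; literals are kept."""
    def repl(m):
        return blank(m.group(0)) if m.group("comment") is not None else m.group(0)
    return PATTERN.sub(repl, source)


def only_comments(source):
    """The converse: copy of source with everything except comments blanked."""
    out, pos = [], 0
    for m in PATTERN.finditer(source):
        out.append(blank(source[pos:m.start()]))
        is_comment = m.group("comment") is not None
        out.append(m.group(0) if is_comment else blank(m.group(0)))
        pos = m.end()
    out.append(blank(source[pos:]))
    return "".join(out)


def find_lines(needle, text):
    """1-based line numbers of the lines that contain needle."""
    return [i for i, line in enumerate(text.splitlines(), 1) if needle in line]


source = (
    'int total = 0; // total so far\n'
    '/* total\n'
    '   again */\n'
    'printf("total // %d", total);\n'
)
print(find_lines("total", strip_comments(source)))  # lines with hits in code
print(find_lines("total", only_comments(source)))   # lines with hits in comments
```

For the sample input, the first search reports lines 1 and 4 and the second reports lines 1 and 2. Because the copies keep their layout, the hits can be mapped straight back to the original file.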
