How can I find (*comments*) using a Java regular expression?
I don't know how to handle the '(', ')', and '*' characters that can appear inside a comment. Comments are multiline.
Comments (3)
A simple pattern to handle that is:
\(\*(.*?)\*\)
Example: http://www.rubular.com/r/afqLCDssIx
You probably also want to set the single-line flag (Pattern.DOTALL in Java), so that . also matches line breaks:
(?s)\(\*(.*?)\*\)
Note that it doesn't handle cases like
(*
in strings, or other odd combinations. Your best bet is to use a parser, for example ANTLR, which already has a ready-made Pascal grammar.
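As a minimal sketch of how the pattern above might be used from Java: the class name PascalCommentFinder and the method findComments are made up for illustration, and Pattern.DOTALL stands in for the inline (?s) flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class PascalCommentFinder {
    // Pattern.DOTALL is equivalent to the inline (?s) flag: it lets '.'
    // match line terminators, so comments spanning several lines are found.
    private static final Pattern COMMENT =
            Pattern.compile("\\(\\*(.*?)\\*\\)", Pattern.DOTALL);

    /** Returns the text between each "(*" and the first "*)" that follows it. */
    static List<String> findComments(String source) {
        List<String> bodies = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 is the comment body without delimiters
        }
        return bodies;
    }

    public static void main(String[] args) {
        String source = "x := 1; (* first\n comment *) y := 2; (* second *)";
        for (String body : findComments(source)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Because the quantifier is lazy, each match stops at the first *) it meets, which is fine for flat comments but not for nested ones.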
If you want to find the innermost nested comment, taking /* */ as the example, a regular expression can match it.
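One possible way to write such an innermost-comment pattern for /* */ is sketched below. This is an assumption about what a pattern of this kind could look like, not the expression from the answer; the names InnermostBlockComments and findInnermost are hypothetical. The body is matched one character at a time, and a lookahead refuses any position where another /* or */ begins, so only comments with no delimiter inside them can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, written for illustration only.
class InnermostBlockComments {
    // "/*", then any characters that do not start "/*" or "*/", then "*/".
    // DOTALL lets '.' cross line breaks inside the comment body.
    private static final Pattern INNERMOST =
            Pattern.compile("/\\*(?:(?!/\\*|\\*/).)*\\*/", Pattern.DOTALL);

    /** Returns every comment that contains no other comment delimiter. */
    static List<String> findInnermost(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = INNERMOST.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findInnermost("/* outer /* inner */ tail */"));
    }
}
```

Like any regex-only approach, this sketch knows nothing about string literals, so a delimiter inside a string is treated as a real one.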
Regarding the handling of nested comments: although it is true that you cannot use a Java regex to match an outermost comment, you can craft one that matches an innermost comment (with some notable exceptions; see the caveats below). Note that the
\(\*(.*?)\*\)
expression will NOT work in this case, because it does not correctly match an innermost comment: on nested input the lazy match begins at the outer (* and ends at the first *).
The following is a tested Java program that uses a (heavily commented) regex which matches only innermost comments, and applies it iteratively to correctly strip nested comments:
Here is the short version of the regex (in native regex format):
Note that this regex implements Jeffrey Friedl's efficient "unrolling-the-loop" technique and is quite fast. (See: Mastering Regular Expressions (3rd Edition).)
Caveats: This will certainly NOT work correctly if any comment delimiter (i.e. (* or *)) appears within a string literal, and thus it should NOT be used for general parsing. But a regex like this one is handy from time to time, for example for quick and dirty searching within an editor.
See also my answer to a similar question for someone wanting to handle nested C-style comments.
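The iterative approach described in this answer can be illustrated roughly as follows. This is a sketch under stated assumptions, not the answer's original program or regex: the class NestedCommentStripper, the method stripComments, and the pattern itself are written here for illustration, with the pattern laid out in an unrolled-loop style (a run of "normal" characters, then repeated "special character followed by more normal characters").

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the regex is an illustrative assumption.
class NestedCommentStripper {
    // \(\*                       opening delimiter
    // [^(*]*                     normal characters: anything except '(' and '*'
    // (?: (?:\*(?!\))|\((?!\*))  special: '*' not closing, or '(' not opening
    //     [^(*]* )*              followed by more normal characters
    // \*\)                       closing delimiter
    private static final Pattern INNERMOST = Pattern.compile(
            "\\(\\*[^(*]*(?:(?:\\*(?!\\))|\\((?!\\*))[^(*]*)*\\*\\)");

    /** Removes innermost comments repeatedly until nothing changes. */
    static String stripComments(String source) {
        String previous;
        String current = source;
        do {
            previous = current;
            // Each pass deletes only comments that contain no nested comment.
            current = INNERMOST.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(stripComments("a (* x (* y *) z *) b"));
    }
}
```

Each pass peels off one nesting level, so the loop runs once per level plus a final pass that detects no change. The caveat about delimiters inside string literals applies to this sketch as well.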