How do I find (*comments*) using Java regular expressions?

Posted on 2024-10-26 10:57:46

I don't know how to handle the '(', ')', and '*' characters that can appear inside a comment. Comments can span multiple lines.

Comments (3)

不美如何 2024-11-02 10:57:46

A simple pattern to handle that is:

\(\*(.*?)\*\)

Example: http://www.rubular.com/r/afqLCDssIx

You probably also want to set the single-line (dotall) flag so that . also matches line breaks and multiline comments are found: (?s)\(\*(.*?)\*\)

Note that it doesn't handle cases like (* inside strings, or other unusual combinations. Your best bet is to use a parser, for example ANTLR, which already has a ready-made Pascal grammar (direct link).
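
As a rough illustration of the simple pattern above in Java, the sketch below compiles it with Pattern.DOTALL (the API equivalent of the inline (?s) flag) and collects the text of each comment. The class name SimpleCommentFinder, the method findCommentBodies, and the sample input are made up for this example; nested comments and delimiters inside string literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class SimpleCommentFinder {
    // Lazy match from "(*" to the next "*)"; DOTALL lets '.' cross line breaks.
    private static final Pattern COMMENT =
            Pattern.compile("\\(\\*(.*?)\\*\\)", Pattern.DOTALL);

    /** Returns the text between the delimiters of every comment found. */
    static List<String> findCommentBodies(String source) {
        List<String> bodies = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 is the comment text without delimiters
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Sample input (an assumption): one multiline and one single-line comment.
        String source = "x := 1; (* first\ncomment *) y := 2; (* second *)";
        for (String body : findCommentBodies(source)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Without DOTALL (or the (?s) prefix), the first comment in the sample would not match, because '.' stops at the line break.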

挽你眉间 2024-11-02 10:57:46

To find the innermost nested comment with /* */ delimiters, take this input as an example:

/* 
/*
comment1
/*
comment2
*/
*/
*/

The regular expression is:

\/\*[^/*]*(?:(?!\/\*|\*\/)[/*][^/*]*)*\*\/

It finds:

/*
comment2
*/
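
A minimal Java sketch of this answer, assuming the regex is used with Matcher.find(): the class name InnermostCStyleComment and the method firstInnermost are hypothetical. The forward slashes are left unescaped because Java string patterns do not need them escaped; the pattern is otherwise the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class InnermostCStyleComment {
    // Matches a /* ... */ comment whose body contains no further "/*" or "*/".
    private static final Pattern INNERMOST = Pattern.compile(
            "/\\*[^/*]*(?:(?!/\\*|\\*/)[/*][^/*]*)*\\*/");

    /** Returns the first innermost comment in the source, or null if there is none. */
    static String firstInnermost(String source) {
        Matcher m = INNERMOST.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The nested example from the answer above.
        String source = "/* \n/*\ncomment1\n/*\ncomment2\n*/\n*/\n*/";
        System.out.println(firstInnermost(source));
    }
}
```

The negated character class [^/*] already matches line breaks, so no DOTALL flag is needed here.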

原来分手还会想你 2024-11-02 10:57:46

Regarding the handling of nested comments: although it is true that you cannot use a Java regex to match an outermost comment, you can craft one that matches an innermost comment (with some notable exceptions; see the caveats below). Note that the \(\*(.*?)\*\) expression will NOT work in this case, because it does not correctly match an innermost comment. The following is a tested Java program that uses a (heavily commented) regex matching only innermost comments and applies it iteratively to correctly strip nested comments:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TEST {
    public static void main(String[] args) {
        String subjectString = "out1 (* c1 *) out2 (* c2 (* c3 *) c2 *) out3";
        String regex = "" +
            "# Match an innermost pascal '(*...*)' style comment.\n" +
            "\\(\\*      # Comment opening literal delimiter.\n" +
            "[^(*]*      # {normal*} Zero or more non-'(', non-'*'.\n" +
            "(?:         # Begin {(special normal*)*} construct.\n" +
            "  (?!       # If we are not at the start of either...\n" +
            "    \\(\\*  # a nested comment\n" +
            "  | \\*\\)  # or the end of this comment,\n" +
            "  ) [(*]    # then ok to match a '(' or '*'.\n" +
            "  [^(*]*    # more {normal*}.\n" +
            ")*          # end {(special normal*)*} construct.\n" +
            "\\*\\)      # Comment closing literal delimiter.";
        // Start from the input so that text without comments is printed unchanged
        // (the result would otherwise stay null when nothing matches).
        String resultString = subjectString;
        // COMMENTS mode ignores the whitespace and '#' comments inside the regex.
        Pattern p = Pattern.compile(regex, Pattern.COMMENTS);
        Matcher m = p.matcher(resultString);
        while (m.find()) {
            // Remove the current innermost comments, then rescan: stripping them
            // can turn an enclosing comment into a new innermost one.
            resultString = m.replaceAll("");
            m = p.matcher(resultString);
        }
        System.out.println(resultString);
    }
}

Here is the short version of the regex (in native regex format):

\(\*[^(*]*(?:(?!\(\*|\*\))[(*][^(*]*)*\*\)

Note that this regex implements Jeffrey Friedl's efficient "unrolling-the-loop" technique and is quite fast. (See: Mastering Regular Expressions (3rd Edition).)

Caveats: This will NOT work correctly if a comment delimiter (i.e. (* or *)) appears within a string literal, so it should NOT be used for general parsing. A regex like this one is still handy from time to time, for example for quick-and-dirty searching within an editor.
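
To make the caveat concrete, here is a small sketch that applies the short regex in the same iterative way to input containing a string literal. The class name CommentStripCaveat, the method stripComments, and the Pascal-like sample line are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only to demonstrate the caveat.
class CommentStripCaveat {
    // The short (uncommented) form of the innermost-comment regex.
    private static final Pattern INNERMOST = Pattern.compile(
            "\\(\\*[^(*]*(?:(?!\\(\\*|\\*\\))[(*][^(*]*)*\\*\\)");

    /** Repeatedly removes innermost comments until none remain. */
    static String stripComments(String source) {
        String result = source;
        Matcher m = INNERMOST.matcher(result);
        while (m.find()) {
            result = m.replaceAll("");
            m = INNERMOST.matcher(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // The quoted text is part of a string literal, not a comment.
        String code = "s := '(* not a comment *)'; (* real comment *)";
        // The regex cannot tell the difference, so the literal's contents are lost too.
        System.out.println(stripComments(code));
    }
}
```

The regex has no notion of string literals, so the text inside the quotes is removed along with the real comment and only an empty literal remains. Avoiding that requires tokenizing the source first, which is what a parser does.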

See also my answer to a similar question from someone wanting to handle nested C-style comments.
