
How do I use a Java regular expression to find (* comments *)?

Posted on 2024-10-26 10:57:46

I don't know how to handle the "(", ")" and "*" characters in the comments. The comments are multi-line.

不美如何 2024-11-02 10:57:46

A simple pattern to handle that is:

\(\*(.*?)\*\)

Example: http://www.rubular.com/r/afqLCDssIx

You probably also want to set the single-line (DOTALL) flag so that . also matches the line breaks in multi-line comments: (?s)\(\*(.*?)\*\)

Note that it doesn't handle cases like (* inside strings, or other odd combinations. Your best bet is to use a parser, for example ANTLR, which already has a ready-made Pascal grammar.
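A minimal sketch of how the simple pattern above might be used from Java. The class name SimpleCommentFinder, the method findComments, and the sample input are made up for illustration; only the pattern itself comes from the answer. As noted, this is not a parser, and it does not handle nested comments or delimiters inside strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCommentFinder {
    // (?s) turns on DOTALL so that '.' also matches line terminators.
    // Each backslash of the original pattern is doubled for the Java string literal.
    private static final Pattern COMMENT = Pattern.compile("(?s)\\(\\*(.*?)\\*\\)");

    /** Returns the text between each "(*" and the nearest following "*)". */
    static List<String> findComments(String source) {
        List<String> bodies = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            bodies.add(m.group(1)); // group 1 is the comment body without delimiters
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Hypothetical Pascal-like input with one multi-line comment.
        String source = "begin (* first\n   comment *) x := 1; (* second *) end";
        for (String body : findComments(source)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Without the (?s) flag the first comment in this input would not be found, because . would stop at the line break.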

挽你眉间 2024-11-02 10:57:46

If you want to find the innermost nested /* */ comment, for example in

/* 
/*
comment1
/*
comment2
*/
*/
*/

the regular expression will be

\/\*[^/*]*(?:(?!\/\*|\*\/)[/*][^/*]*)*\*\/

This will find

/*
comment2
*/
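As a rough sketch, the same pattern could be applied from Java as follows. The class name InnermostCComment and the method firstInnermost are hypothetical. The only change to the pattern is that \/ is written as a plain /, since a forward slash needs no escaping in a Java regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnermostCComment {
    // Matches a /* ... */ comment that contains no other "/*" or "*/" inside it,
    // i.e. an innermost comment. Backslashes are doubled for the Java string literal.
    private static final Pattern INNERMOST =
        Pattern.compile("/\\*[^/*]*(?:(?!/\\*|\\*/)[/*][^/*]*)*\\*/");

    /** Returns the first innermost comment in the source, or null if there is none. */
    static String firstInnermost(String source) {
        Matcher m = INNERMOST.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "/* \n/*\ncomment1\n/*\ncomment2\n*/\n*/\n*/";
        System.out.println(firstInnermost(source));
    }
}
```

No DOTALL flag is needed here because the pattern uses negated character classes rather than ., and those already match line breaks.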

原来分手还会想你 2024-11-02 10:57:46

Regarding nested comments: although a Java regex cannot match an outermost comment (java.util.regex has no recursion, so it cannot balance delimiters nested to arbitrary depth), you can craft one that matches an innermost comment (with some notable exceptions; see the caveats below). Note that the expression \(\*(.*?)\*\) will NOT work in this case, because it does not correctly match an innermost comment: on (* c2 (* c3 *) c2 *) it matches from the first (* to the first *). The following is a tested Java program that uses a heavily commented regex matching only innermost comments and applies it iteratively to strip nested comments correctly:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TEST {
    public static void main(String[] args) {
        String subjectString = "out1 (* c1 *) out2 (* c2 (* c3 *) c2 *) out3";
        String regex = "" +
            "# Match an innermost pascal '(*...*)' style comment.\n" +
            "\\(\\*      # Comment opening literal delimiter.\n" +
            "[^(*]*      # {normal*} Zero or more non-'(', non-'*'.\n" +
            "(?:         # Begin {(special normal*)*} construct.\n" +
            "  (?!       # If we are not at the start of either...\n" +
            "    \\(\\*  # a nested comment\n" +
            "  | \\*\\)  # or the end of this comment,\n" +
            "  ) [(*]    # then ok to match a '(' or '*'.\n" +
            "  [^(*]*    # more {normal*}.\n" +
            ")*          # end {(special normal*)*} construct.\n" +
            "\\*\\)      # Comment closing literal delimiter.";
        // Start from the input so that text without comments is printed unchanged
        // (instead of "null").
        String resultString = subjectString;
        // COMMENTS mode ignores whitespace and '#' comments inside the pattern.
        Pattern p = Pattern.compile(regex, Pattern.COMMENTS);
        Matcher m = p.matcher(resultString);
        while (m.find())
        { // Iterate until there are no more "(* comments *)".
            resultString = m.replaceAll("");
            m = p.matcher(resultString);
        }
        System.out.println(resultString);
    }
}

Here is the short version of the regex (in native regex format):

\(\*[^(*]*(?:(?!\(\*|\*\))[(*][^(*]*)*\*\)

Note that this regex uses Jeffrey Friedl's efficient "unrolling-the-loop" technique and is quite fast. (See: Mastering Regular Expressions (3rd Edition).)

Caveats: This will certainly NOT work correctly if a comment delimiter (i.e. (* or *)) appears within a string literal, and thus it should NOT be used for general parsing. But a regex like this one is handy from time to time, for example for quick-and-dirty searching within an editor.
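To make the short regex and the caveat concrete, here is a small sketch that wraps the short form in a helper and runs it on two inputs. The class name NestedCommentStripper, the method stripComments, and both sample strings are invented for illustration; the loop is just one way to repeat the replacement until nothing changes.

```java
import java.util.regex.Pattern;

public class NestedCommentStripper {
    // The short regex, with each backslash doubled for a Java string literal.
    private static final Pattern INNERMOST =
        Pattern.compile("\\(\\*[^(*]*(?:(?!\\(\\*|\\*\\))[(*][^(*]*)*\\*\\)");

    /** Removes innermost comments repeatedly until the text stops changing. */
    static String stripComments(String source) {
        String previous;
        String current = source;
        do {
            previous = current;
            current = INNERMOST.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // Nested comments are removed from the inside out.
        System.out.println(stripComments("a (* b (* c *) d *) e"));
        // Caveat: delimiters inside a string literal are treated as a comment,
        // so the contents of the Pascal string are wrongly removed here.
        System.out.println(stripComments("writeln('(* not a comment *)');"));
    }
}
```

The second call shows the failure mode described in the caveat: the string literal's contents are stripped even though they are not a comment.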

See also my answer to a similar question from someone wanting to handle nested C-style comments.
