Regex to remove all square brackets except those after a specific prefix

Posted on 2024-12-08 17:26:05

So, I have a string. Most of the time, if the string has square brackets in it, bad things will happen. In a few cases, however, it's necessary to keep the brackets. These brackets that need to be kept are identified by a certain prefix. E.g., if the string is:

apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]

what I want to turn it into is:

apples pears prefix:[oranges] lemons persimmons peaches apricots

I've come up with a Rube Goldberg mess of a solution, which looks like this:

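import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Debracketizer
{
    public static void main( String[] args )
    {
        String orig = "apples [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots";
        String result = debracketize(orig);
        System.out.println(result);
    }

    // Returns the result so that main() can print it (the method was declared void).
    private static String debracketize( String orig )
    {
        // Pass 1: remove every "[" except one immediately preceded by "prefix:".
        String result1 = replaceAll(orig,
                                    Pattern.compile("\\["), 
                                    "",
                                    ".*prefix:$");

        // Pass 2: remove every "]" except the one closing a "prefix:[...".
        String result2 = replaceAll(result1,
                                    Pattern.compile("\\]"),
                                    "",
                                    ".*prefix:\\[[^\\]]+$");

        return result2;
    }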

    private static String replaceAll( String orig, Pattern pattern, 
                                      String replacement, String skipPattern )
    {
        String quotedReplacement = Matcher.quoteReplacement(replacement);
        Matcher matcher = pattern.matcher(orig);
        StringBuffer sb = new StringBuffer();
        while( matcher.find() )
        {
            String resultSoFar = orig.substring(0, matcher.start());
            if (resultSoFar.matches(skipPattern)) {
                matcher.appendReplacement(sb, matcher.group());
            } else {
                matcher.appendReplacement(sb, quotedReplacement);
            }
        }
        matcher.appendTail(sb);
        return sb.toString();
    }
}

I'm sure there must be a better way to do this -- ideally with one simple regex and one simple String.replaceAll(). But I haven't been able to come up with it.

(I asked a partial version of this question earlier, but I can't see how to adapt the answer to the full case. Which will teach me to ask partial questions.)


Comments (6)

一笑百媚生 2024-12-15 17:26:05

This one-liner:

String resultString = subjectString.replaceAll("(?<!prefix:(?:\\[\\w{0,2000000})?)[\\[\\]]", "");

when applied to: apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]

gives you the result you seek:

apples pears prefix:[oranges] lemons persimmons peaches apricots 

The only limitation is the maximum number of characters that the word between prefix:[ and ] can have; in this case the limit is 2000000. The limitation comes from Java, which does not support unbounded repetition inside a lookbehind.
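
One caveat with the pattern above: \w does not match spaces, so a protected value such as prefix:[two words] would lose its closing bracket. Below is a minimal sketch of a variation, assuming the protected text may contain anything except brackets and is at most 100 characters long. The class name LookbehindDebracketizer and the bound MAX_LEN are illustrative choices, not part of the answer.

```java
class LookbehindDebracketizer {
    // Assumed upper bound on the protected text length (illustrative);
    // the lookbehind needs some finite bound.
    private static final int MAX_LEN = 100;

    // A bracket is removed unless it directly follows "prefix:" (the opening
    // bracket) or follows "prefix:[" plus up to MAX_LEN non-bracket characters
    // (the closing bracket).
    private static final String REGEX =
            "(?<!prefix:(?:\\[[^\\[\\]]{0," + MAX_LEN + "})?)[\\[\\]]";

    static String debracketize(String s) {
        return s.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        System.out.println(debracketize("x] prefix:[two words] y["));
    }
}
```

A small bound also keeps the lookbehind cheap, since the engine may have to look back up to that many characters at every bracket.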

逆光下的微笑 2024-12-15 17:26:05

Don't go the way of regex, for that path will forever darken your way. Consider the following or a variation thereof. Split the string on a reasonable separator (maybe "prefix:[") and be smart about removing the rest of the brackets.

Here is an off-the-cuff algorithm (StringUtils is org.apache.commons.lang.StringUtils):

  1. Split the string by "prefix:[". StringUtils.splitByWholeSeparator() appears to be a good candidate for this (here, the return value is stored in blam).
  2. Remove all "[" chars from the resulting strings. StringUtils.stripAll(blam) only trims whitespace from the ends, so something like StringUtils.remove(s, '[') on each element is probably a better fit.
  3. For each string in blam do the following:
    1. If it is the first string, remove all "]" chars, e.g. StringUtils.remove(blam[0], ']'), and replace blam[0] with the result.
    2. If it is not the first string, do steps 3 to 5.
    3. Split the string using the separator ']' (here, the return value is stored in kapow).
    4. Construct a string (named smacky) from the elements of kapow. After adding the 0th element, append ']' to smacky.
    5. Replace the string at blam[index] with smacky.
  4. Construct the final result by joining all the strings in the blam array, putting the "prefix:[" separator back between them (the split consumed it).
  5. Dance a jig of happiness.
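
The steps above can be sketched with plain JDK string methods instead of Commons Lang. This is only one possible reading of the algorithm: it assumes the separator is "prefix:[" (with the colon, as in the question's example), the class name SplitDebracketizer is made up for illustration, and a piece that has no closing bracket is left without one rather than having one appended.

```java
import java.util.regex.Pattern;

class SplitDebracketizer {
    // Assumed separator, taken from the question's example.
    private static final String SEPARATOR = "prefix:[";

    static String debracketize(String orig) {
        // Step 1: split on the literal separator; -1 keeps trailing empty pieces.
        String[] blam = orig.split(Pattern.quote(SEPARATOR), -1);

        // Step 3.1: the first piece contains no protected brackets at all.
        StringBuilder result =
                new StringBuilder(blam[0].replace("[", "").replace("]", ""));

        for (int i = 1; i < blam.length; i++) {
            // Step 2: opening brackets inside a piece are never protected.
            String piece = blam[i].replace("[", "");
            // Steps 3.3 to 3.5: keep only the first "]", which closes the
            // protected bracket, and drop every later one.
            int close = piece.indexOf(']');
            if (close >= 0) {
                piece = piece.substring(0, close + 1)
                        + piece.substring(close + 1).replace("]", "");
            }
            // Step 4: rejoin, putting back the separator that split() consumed.
            result.append(SEPARATOR).append(piece);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(debracketize(
                "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]"));
    }
}
```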
蓦然回首 2024-12-15 17:26:05

Interesting problem. Here is an alternative tested solution which does not use lookbehind.

public class TEST
{
    public static void main( String[] args )
    {
        String orig = "apples [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots";
        String result = debracketize(orig);
        System.out.println(result);
    }

    private static String debracketize( String orig )
    {
        String re = // Don't indent to allow wide regex comments.
"(?x)                         # Set free-spacing mode.            \n" +
"# Either capture (and put back via replace) stuff to be kept...  \n" +
"  (                          # $1: Stuff to be kept.             \n" +
"    prefix:\\[[^\\[\\]]+\\]  # Either the special sequence,      \n" +
"  | (?:                      # or...                             \n" +
"      (?!                    # (Begin negative lookahead.)       \n" +
"        prefix:              # If this is NOT the start          \n" +
"        \\[[^\\[\\]]+\\]     # of the special sequence,          \n" +
"      )                      # (End negative lookahead.)         \n" +
"      [^\\[\\]]              # then match one non-bracket char.  \n" +
"    )+                       # Do this one char at a time.       \n" +
"  )                          # End $1: Stuff to be kept.         \n" +
"| # Or... Don't capture stuff to be removed (un-special brackets)\n" +
"  [\\[\\]]+                  # One or more non-special brackets.";
        return orig.replaceAll(re, "$1");
    }
}

This method uses two global alternatives. The first alternative captures (and then replaces) the special sequence and non-bracket chars, and the second alternative matches (and removes) the non-special brackets.

污味仙女 2024-12-15 17:26:05

If you have a pair of characters that you aren't worried about appearing in the raw (such as <>), then you can first translate the square brackets you wish to keep into these, strip the remainder, and change the translated brackets back.

Here it is in ruby (porting to java hopefully isn't too hard, you just need a global search-replace with capture groups):

>> s = 'apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]'
=> "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]"
>> s.gsub(/([^\[\]]+):\[([^\[\]]+)\]/, '\1:<\2>').gsub(/[\[\]]/,'').gsub(/</,'[').gsub(/>/,']')
=> "apples pears prefix:[oranges] lemons persimmons peaches apricots "
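
A possible Java port of the Ruby chain above, as a sketch. It assumes, as the answer does, that < and > never occur in the input; the class name PlaceholderDebracketizer is hypothetical. Note that the first pattern, like the Ruby original, protects brackets after any text followed by a colon, not only after "prefix".

```java
class PlaceholderDebracketizer {
    static String debracketize(String s) {
        return s
                // Swap the brackets to keep for placeholders: "x:[y]" becomes "x:<y>".
                .replaceAll("([^\\[\\]]+):\\[([^\\[\\]]+)\\]", "$1:<$2>")
                // Remove every remaining square bracket.
                .replaceAll("[\\[\\]]", "")
                // Turn the placeholders back into square brackets.
                .replace('<', '[')
                .replace('>', ']');
    }

    public static void main(String[] args) {
        System.out.println(debracketize(
                "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]"));
    }
}
```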
东京女 2024-12-15 17:26:05

1. Find the match(es) of prefix:\[[^\]]+\].

2. Split the string using the same regex.

3. For each array element, remove every ] and [ (your example yields two elements).

4. Join the elements, re-inserting the match(es) from step 1 between them.
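
The four steps can be sketched in Java roughly as follows. The class name MatchSplitDebracketizer is made up for illustration; the regex is the one from step 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSplitDebracketizer {
    private static final Pattern PROTECTED = Pattern.compile("prefix:\\[[^\\]]+\\]");

    static String debracketize(String orig) {
        // Step 1: collect every protected sequence, in order.
        List<String> kept = new ArrayList<>();
        Matcher m = PROTECTED.matcher(orig);
        while (m.find()) {
            kept.add(m.group());
        }

        // Step 2: split on the same regex; -1 keeps trailing empty pieces,
        // so there is always exactly one more piece than there are matches.
        String[] parts = PROTECTED.split(orig, -1);

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            // Step 3: remove all brackets from the unprotected piece.
            out.append(parts[i].replaceAll("[\\[\\]]", ""));
            // Step 4: re-insert the protected sequence that followed this piece.
            if (i < kept.size()) {
                out.append(kept.get(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(debracketize(
                "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]"));
    }
}
```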

甜中书 2024-12-15 17:26:05

Here's your regex solution:

input.replaceAll("((?<!prefix:)\\[(?!oranges)|(?<!prefix:\\[oranges)\\])", "");

It uses two negative lookbehinds (plus a negative lookahead) to prevent the removal of the square brackets around the protected term. If you want to protect several terms, change oranges to (oranges|apples|pears) in the regex.

Here's a test using your data:

public static void main(String... args) throws InterruptedException {
     String input = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
     String result = input.replaceAll("((?<!prefix:)\\[(?!oranges)|(?<!prefix:\\[oranges)\\])", "");
     System.out.println(result);
}

Output:

apples pears prefix:[oranges] lemons persimmons peaches apricots
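
To protect several terms, the same pattern can be assembled from a list. The sketch below assumes the terms are fixed literal strings; the class name TermDebracketizer and the method buildRegex are hypothetical. Two behaviours of the pattern as written are worth knowing: because of the lookahead, an unprefixed "[oranges" keeps its opening bracket, and a "prefix:[" followed by an unlisted term keeps only its opening bracket.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TermDebracketizer {
    // Builds the answer's pattern with "oranges" replaced by an alternation
    // of the given terms, each quoted so it is matched literally.
    static String buildRegex(List<String> terms) {
        String alternation = terms.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        return "(?<!prefix:)\\[(?!" + alternation + ")"
                + "|(?<!prefix:\\[" + alternation + ")\\]";
    }

    static String debracketize(String input, List<String> terms) {
        return input.replaceAll(buildRegex(terms), "");
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("oranges", "apples", "pears");
        System.out.println(debracketize("a[b] prefix:[apples] c]d prefix:[pears]", terms));
    }
}
```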