Regex to remove all square brackets except those after a specific prefix
So, I have a string. Most of the time, if the string has square brackets in it, bad things will happen. In a few cases, however, it's necessary to keep the brackets. These brackets that need to be kept are identified by a certain prefix. E.g., if the string is:
apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]
what I want to turn it into is:
apples pears prefix:[oranges] lemons persimmons peaches apricots
I've come up with a Rube Goldberg mess of a solution, which looks like this:
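import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Debracketizer
{
    public static void main( String[] args )
    {
        String orig = "apples [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots";
        String result = debracketize(orig);
        System.out.println(result);
    }

    // Returns String (not void) so that main can use the result.
    private static String debracketize( String orig )
    {
        // Pass 1: remove every '[' unless the text before it ends with "prefix:".
        String result1 = replaceAll(orig,
                Pattern.compile("\\["),
                "",
                ".*prefix:$");
        // Pass 2: remove every ']' unless the text before it ends with
        // "prefix:[" followed by one or more non-']' characters.
        String result2 = replaceAll(result1,
                Pattern.compile("\\]"),
                "",
                ".*prefix:\\[[^\\]]+$");
        return result2;
    }

    // Like Matcher.replaceAll(), but leaves a match untouched when the
    // input preceding it matches skipPattern.
    private static String replaceAll( String orig, Pattern pattern,
            String replacement, String skipPattern )
    {
        String quotedReplacement = Matcher.quoteReplacement(replacement);
        Matcher matcher = pattern.matcher(orig);
        StringBuffer sb = new StringBuffer();
        while( matcher.find() )
        {
            String resultSoFar = orig.substring(0, matcher.start());
            if (resultSoFar.matches(skipPattern)) {
                // Keep the matched text; quote it in case it contains '$' or '\'.
                matcher.appendReplacement(sb, Matcher.quoteReplacement(matcher.group()));
            } else {
                matcher.appendReplacement(sb, quotedReplacement);
            }
        }
        matcher.appendTail(sb);
        return sb.toString();
    }
}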
I'm sure there must be a better way to do this, ideally with one simple regex and one simple String.replaceAll(). But I haven't been able to come up with it.
(I asked a partial version of this question earlier, but I can't see how to adapt the answer to the full case. Which will teach me to ask partial questions.)
This one-liner:
when applied to: apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]
will give you the result you seek:
Your only limitation is the maximum number of characters that the word between prefix:[ and ] can have. In this case the limit is 2000000. The limitation comes from Java, which does not allow unbounded repetition (such as * or +) inside a lookbehind; the repetition there needs a fixed upper bound.
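The one-liner itself did not survive in this copy of the answer. As a rough sketch of the same idea (not the answer's exact pattern), here is a Java regex built from bounded negative lookbehinds. The class name, the debracketize helper and the bound of 1000 characters are hypothetical choices for illustration, and the sketch assumes a protected word contains no brackets.

```java
import java.util.regex.Pattern;

class BoundedLookbehindDebracketizer {
    // Hypothetical bound on the length of a protected word. Java only accepts
    // lookbehinds with a finite maximum length, so this must be a fixed number.
    static final int MAX_TERM = 1000;

    // Alternative 1: a '[' that is not immediately preceded by "prefix:".
    // Alternative 2: a ']' that is not preceded by "prefix:[" plus
    //                0..MAX_TERM non-bracket characters.
    static final Pattern STRAY_BRACKET = Pattern.compile(
            "(?<!prefix:)\\[|(?<!prefix:\\[[^\\[\\]]{0," + MAX_TERM + "})\\]");

    static String debracketize(String s) {
        // Every matched bracket is a stray one, so replace it with nothing.
        return STRAY_BRACKET.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
        // prints: apples pears prefix:[oranges] lemons persimmons peaches apricots
        System.out.println(debracketize(in).trim());
    }
}
```

Raising MAX_TERM widens what can be protected at the cost of more work each time the engine tests a ']'. The trim() is there because the final run of brackets leaves a trailing space behind.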
Don't go the way of regex, for that path will forever darken your way. Consider the following or a variation thereof: split the string on a reasonable separator (maybe "prefix:[") and be smart about removing the rest of the brackets.
Here is an off-the-cuff algorithm (StringUtils is org.apache.commons.lang.StringUtils):
StringUtils.splitByWholeSeparator() appears to be a good candidate for the split (here, the return value is stored in blam).
StringUtils.stripAll(blam)
StringUtils.strip(blam[0], "]"); Replace blam[0] with this string.
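A minimal sketch of the split idea, assuming the separator is "prefix:[" and that a protected term runs up to the first ']' and contains no '['. To keep it self-contained it swaps the Apache Commons StringUtils calls for plain JDK ones (String.split with Pattern.quote, String.replace); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SplitDebracketizer {
    private static final String PREFIX = "prefix:";
    private static final String SEP = PREFIX + "[";

    private static String stripBrackets(String s) {
        return s.replace("[", "").replace("]", "");
    }

    static String debracketize(String s) {
        // Pattern.quote makes the separator literal; limit -1 keeps trailing empty pieces.
        String[] pieces = s.split(Pattern.quote(SEP), -1);
        // Text before the first separator contains no protected term.
        StringBuilder out = new StringBuilder(stripBrackets(pieces[0]));
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i];
            int close = piece.indexOf(']');
            if (close >= 0 && !piece.substring(0, close).contains("[")) {
                // Keep "prefix:[term]" intact and clean only what follows it.
                out.append(SEP)
                   .append(piece, 0, close + 1)
                   .append(stripBrackets(piece.substring(close + 1)));
            } else {
                // No clean "term]" after the separator: treat it as ordinary text.
                out.append(PREFIX).append(stripBrackets(piece));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
        System.out.println(debracketize(in).trim());
    }
}
```

Splitting first means each piece can be cleaned with a plain character replace, which is the "be smart about the rest" part without any lookaround.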
Interesting problem. Here is an alternative tested solution which does not use lookbehind.
This method uses two global alternatives. The first alternative captures (and then replaces) the special sequence and non-bracket chars, and the second alternative matches (and removes) the non-special brackets.
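The answer's code is not included here. A minimal Java sketch of the described technique follows, assuming a protected term contains no brackets. Group 1 holds either a whole prefix:[term] sequence or a run of non-bracket characters, and a lone bracket falls through to the last alternative; replacing with $1 keeps group 1 and drops the lone brackets, since Java substitutes an empty string for a group that did not take part in the match. The lookahead (?!prefix:\[) is an addition of this sketch: without it the non-bracket run would swallow "prefix:" before the first alternative could match. Names are hypothetical.

```java
import java.util.regex.Pattern;

class AlternationDebracketizer {
    // Group 1: "prefix:[term]", or a run of non-bracket characters that stops
    //          just before any "prefix:[" (tempered with a lookahead, no lookbehind).
    // Else:    a lone '[' or ']' that leaves group 1 unset.
    private static final Pattern TOKEN = Pattern.compile(
            "(prefix:\\[[^\\[\\]]*\\]|(?:(?!prefix:\\[)[^\\[\\]])+)|[\\[\\]]");

    static String debracketize(String s) {
        // "$1" re-emits group 1; an unset group 1 is replaced by "".
        return TOKEN.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String in = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
        System.out.println(debracketize(in).trim());
    }
}
```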
If you have a pair of characters that you aren't worried about appearing in the raw input (such as <>), then you can first translate the square brackets you wish to keep into these, strip the remainder, and change the translated brackets back. Here it is in Ruby (porting to Java hopefully isn't too hard; you just need a global search-and-replace with capture groups):
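The Ruby code is not included here. A Java port of the three steps could look like the following sketch; it assumes, as the answer does, that '<' and '>' never occur in the raw input, and additionally that a protected term contains no square brackets. The class name is hypothetical.

```java
class PlaceholderDebracketizer {
    // Assumption from the answer: '<' and '>' never occur in the raw input.
    static String debracketize(String s) {
        // 1. Translate the brackets to keep: prefix:[term] -> prefix:<term>
        String marked = s.replaceAll("prefix:\\[([^\\[\\]]*)\\]", "prefix:<$1>");
        // 2. Strip every remaining square bracket.
        String stripped = marked.replaceAll("[\\[\\]]", "");
        // 3. Change the placeholders back into square brackets.
        return stripped.replace('<', '[').replace('>', ']');
    }

    public static void main(String[] args) {
        String in = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
        System.out.println(debracketize(in).trim());
    }
}
```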
1. Find the match(es) of the regex prefix:\[[^\]]+\]
2. Use the same regex to split the string.
3. For each array element, remove ] and [ (your example has two elements).
4. Join the elements with the match(es) from step 1.
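A minimal Java sketch of these four steps, using the regex from step 1 unchanged; the class and method names are hypothetical. Splitting with a limit of -1 keeps a trailing empty piece, so there is always exactly one more piece than there are matches and the join in step 4 lines up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitJoinDebracketizer {
    // Regex from step 1: "prefix:[", one or more non-']' chars, then ']'.
    private static final Pattern PROTECTED = Pattern.compile("prefix:\\[[^\\]]+\\]");

    static String debracketize(String s) {
        // Step 1: collect the protected matches in order.
        List<String> kept = new ArrayList<>();
        Matcher m = PROTECTED.matcher(s);
        while (m.find()) {
            kept.add(m.group());
        }
        // Step 2: split on the same regex (limit -1 keeps trailing empty pieces).
        String[] pieces = PROTECTED.split(s, -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            // Step 3: remove every '[' and ']' from the unprotected piece.
            out.append(pieces[i].replaceAll("[\\[\\]]", ""));
            // Step 4: put the i-th protected match back between the pieces.
            if (i < kept.size()) {
                out.append(kept.get(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
        System.out.println(debracketize(in).trim());
    }
}
```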
Here's your regex solution:
It uses two negative lookbehinds to prevent the removal of the square brackets around the protected term after the prefix. If you want to protect several terms, change oranges to (oranges|apples|pears) in the regex.
Here's a test using your data:
Output:
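The regex itself is not included in this copy of the answer. Below is a sketch of what two negative lookbehinds protecting a fixed set of terms could look like in Java; the pattern is a reconstruction, not necessarily the answer's, and the term list (oranges|apples|pears) is the example from the text. Java accepts the alternation inside the lookbehind because its maximum length is finite. The class name is hypothetical.

```java
import java.util.regex.Pattern;

class FixedTermDebracketizer {
    // Lookbehind 1: drop '[' unless it directly follows "prefix:".
    // Lookbehind 2: drop ']' unless it directly follows "prefix:[" + a listed term.
    static final Pattern STRAY_BRACKET = Pattern.compile(
            "(?<!prefix:)\\[|(?<!prefix:\\[(?:oranges|apples|pears))\\]");

    static String debracketize(String s) {
        return STRAY_BRACKET.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "apple][s [pears] prefix:[oranges] lemons ]persimmons[ pea[ches ap]ricots [][[]]][]";
        System.out.println(debracketize(in).trim());
    }
}
```

Because the first lookbehind keeps any '[' directly after "prefix:", an unlisted term such as prefix:[kiwi] comes out as prefix:[kiwi, with only its closing bracket removed.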