
Regex question: one or more spaces outside of a quoted block of text

Posted 2024-07-08 13:25:05

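I want to replace every occurrence of multiple spaces with a single space, but leave text between quotation marks untouched.

Is there a way to do this with a Java regex? If so, could you give it a try or give me a hint?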


几度春秋 2024-07-15 13:25:05

Here's another approach, which uses a lookahead to check that all quotation marks after the current position come in matched pairs.

text = text.replaceAll("  ++(?=(?:[^\"]*+\"[^\"]*+\")*+[^\"]*+$)", " ");

If needed, the lookahead can be adapted to handle escaped quotation marks inside the quoted sections.

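One possible adaptation for escaped quotation marks is sketched below. It assumes a backslash always escapes the character after it (inside and outside quotes); the class and method names are illustrative. Each stretch of text between quotes is matched as `[^"\\]*+(?:\\.[^"\\]*+)*+`, so an escaped `\"` never counts toward the pairing.

```java
import java.util.regex.Pattern;

public class CollapseSpacesEscaped {
    // A run of characters with no unescaped quotation mark. Assumption: a
    // backslash always escapes the next character, inside or outside quotes.
    private static final String RUN = "[^\"\\\\]*+(?:\\\\.[^\"\\\\]*+)*+";

    // Two or more spaces, replaced only when the rest of the input holds an
    // even number of unescaped quotation marks (i.e. we are outside quotes).
    // (?s) lets the escape "\\." also consume a line terminator.
    private static final Pattern SPACES_OUTSIDE_QUOTES = Pattern.compile(
            "(?s)  ++(?=(?:" + RUN + "\"" + RUN + "\")*+" + RUN + "$)");

    static String collapse(String text) {
        return SPACES_OUTSIDE_QUOTES.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Input: a   b  "c  \"  d"  e
        String text = "a   b  \"c  \\\"  d\"  e";
        // Prints: a b "c  \"  d" e
        System.out.println(collapse(text));
    }
}
```

Like the original one-liner, this rescans the rest of the input for every run of spaces it finds. On long inputs with many runs it can be noticeably slower than a single left-to-right pass.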

握住我的手 2024-07-15 13:25:05

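When trying to match something that can be contained within something else, it can help to construct a regular expression that matches both, like this: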

("[^"\\]*(?:\\.[^"\\]*)*")|(  +)

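This matches either a quoted string or a run of two or more spaces. Because the two alternatives are combined and the matcher scans left to right, a quoted string is consumed as a whole as soon as its opening quote is reached, so the spaces inside it never match the second alternative. With this expression, you need to examine each match to decide whether it is a quoted string or two or more spaces, and act accordingly: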

Pattern spaceOrStringRegex = Pattern.compile( "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(  +)" );

StringBuffer replacementBuffer = new StringBuffer();

Matcher spaceOrStringMatcher = spaceOrStringRegex.matcher( text );

while ( spaceOrStringMatcher.find() ) 
{
    // if the space group is the match
    if ( spaceOrStringMatcher.group( 2 ) != null ) 
    {
        // replace with a single space
        spaceOrStringMatcher.appendReplacement( replacementBuffer, " " );
    }
}

spaceOrStringMatcher.appendTail( replacementBuffer );


橘寄 2024-07-15 13:25:05

About the text between quotes: are the quotes always on the same line, or can a quoted section span multiple lines?


未央 2024-07-15 13:25:05

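Tokenize it and emit a single space between tokens. A quick Google search for "java tokenizer that handles quotes" turned up:
this link

YMMV

Edit: SO didn't like that link. Here's the Google search link: Google. It was the first result.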

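The links did not survive, so here is a hand-written sketch of the tokenize-then-join idea, not the tokenizer the link pointed to. It assumes double quotes are the only quote character and that there are no escaped quotes; the class and method names are illustrative. A token is a maximal run of characters that are not unquoted spaces, so a quoted section (spaces included) stays inside its token.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareTokenizer {
    // Splits on spaces outside double quotes. Assumption: no escaped quotes.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // Toggle quote state; the quote itself stays in the token.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ' ' && !inQuotes) {
                // An unquoted space ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    static String collapse(String text) {
        // Emit exactly one space between tokens.
        return String.join(" ", tokenize(text));
    }

    public static void main(String[] args) {
        // Prints: blah blah "boo   boo boo" blah
        System.out.println(collapse("  blah    blah  \"boo   boo boo\"  blah  "));
    }
}
```

Unlike the regex answers, joining tokens this way also drops leading and trailing spaces.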

猥琐帝 2024-07-15 13:25:05

Personally, I don't use Java, but this regex could do the trick:

([^\" ])*(\\\".*?\\\")*

Trying the expression in RegexBuddy, it generates the following code, which looks fine to me:

try {
    Pattern regex = Pattern.compile("([^\" ])*(\\\".*?\\\")*", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)

            // I suppose here you must use something like
            // sstr += regexMatcher.group(i) + " "
        }
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

At least, it seems to work fine in Python:

import re

text = """
este  es   un texto de   prueba "para ver  como se comporta  " la funcion   sobre esto
"para ver  como se comporta  " la funcion   sobre esto  "o sobre otro" lo q sea
"""

ret = ""
print(text)

reobj = re.compile(r'([^\" ])*(\".*?\")*', re.IGNORECASE)

for match in reobj.finditer(text):
    if match.group() != "":
        ret = ret + match.group() + "|"

print(ret)

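The generated Java loop reads `group(i)`, but because `([^\" ])*` is a repeated group, `group(1)` only holds its last character; the whole token is `group()`. Below is a sketch that completes the loop the way the Python version does, but joins the non-empty matches with a single space instead of `|`. The class and method names are illustrative.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JoinTokens {
    // Same pattern as the answer: characters that are neither a quote nor a
    // space, optionally followed by one or more quoted strings.
    private static final Pattern TOKEN = Pattern.compile("([^\" ])*(\".*?\")*");

    static String collapse(String text) {
        StringJoiner out = new StringJoiner(" ");
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // The pattern also matches the empty string (e.g. at each space);
            // skip those empty matches, as the Python version does.
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: blah blah "boo   boo boo" blah blah
        System.out.println(collapse("blah    blah  \"boo   boo boo\"  blah  blah"));
    }
}
```

Two caveats with this pattern: leading and trailing spaces are dropped, and a quoted string followed directly by other text (for example `"q"x`) comes out as two tokens, so a space is inserted between them.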

迟月 2024-07-15 13:25:05

After you parse out the quoted content, run this on the rest, in bulk or piece by piece as necessary:

String text = "ABC   DEF GHI   JKL";
text = text.replaceAll("( )+", " ");
// text: "ABC DEF GHI JKL"

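A minimal sketch of that split-then-collapse idea, assuming quoted sections are plain `"..."` with no escaped quotes inside (an unmatched trailing quote is treated as ordinary text). It uses ` {2,}` rather than `( )+`, which gives the same result without rewriting single spaces. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollapseOutsideQuotes {
    // Assumption: a quoted section is "..." with no escaped quotes inside.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String collapse(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = QUOTED.matcher(text);
        int last = 0;
        while (m.find()) {
            // Collapse spaces only in the unquoted piece before this quoted section...
            out.append(text.substring(last, m.start()).replaceAll(" {2,}", " "));
            // ...and copy the quoted section itself unchanged.
            out.append(m.group());
            last = m.end();
        }
        // Unquoted tail after the last quoted section (or the whole text if none).
        out.append(text.substring(last).replaceAll(" {2,}", " "));
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: blah blah "boo   boo boo" blah blah
        System.out.println(collapse("blah    blah  \"boo   boo boo\"  blah  blah"));
    }
}
```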

听闻余生 2024-07-15 13:25:05

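Jeff, you're on the right track, but there are a few errors in your code, to wit: (1) you forgot to escape the quotation marks inside the negated character classes; (2) the parentheses inside the first capturing group should have been non-capturing; (3) if the second capturing group doesn't participate in a match, group(2) returns null, and you're not testing for that; and (4) if you test for two or more spaces in the regex instead of one or more, you don't need to check the length of the match later on. Here's the revised code: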

import java.util.regex.*;

public class Test
{
  public static void main(String[] args) throws Exception
  {
    String text = "blah    blah  \"boo   boo boo\"  blah  blah";
    Pattern p = Pattern.compile( "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(  +)" );
    StringBuffer sb = new StringBuffer();
    Matcher m = p.matcher( text );
    while ( m.find() ) 
    {
      if ( m.group( 2 ) != null ) 
      {
        m.appendReplacement( sb, " " );
      }
    }
    m.appendTail( sb );
    System.out.println( sb.toString() );
  }
}

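On Java 9 and later, `Matcher.replaceAll` also accepts a `Function<MatchResult, String>`, so the same alternation can be written without managing a `StringBuffer` by hand. The sketch below uses the same pattern; the class and method names are illustrative. Since the function's return value is itself interpreted as a replacement string, the quoted text is passed through `Matcher.quoteReplacement` so that any `$` or `\` inside it is copied literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollapseWithCallback {
    // Group 1: a double-quoted string (backslash escapes allowed inside).
    // Group 2: a run of two or more spaces outside any quoted string.
    private static final Pattern SPACE_OR_STRING =
            Pattern.compile("(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(  +)");

    static String collapse(String text) {
        // Requires Java 9+. Spaces become one space; quoted strings are kept,
        // escaped so the replacement engine does not interpret $ or \.
        return SPACE_OR_STRING.matcher(text).replaceAll(mr ->
                mr.group(2) != null ? " " : Matcher.quoteReplacement(mr.group(1)));
    }

    public static void main(String[] args) {
        // Prints: blah blah "boo   boo boo" blah blah
        System.out.println(collapse("blah    blah  \"boo   boo boo\"  blah  blah"));
    }
}
```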
