Regex Question - One or more spaces outside of a quote-enclosed block of text
I want to replace any occurrence of more than one space with a single space, but take no action in text between quotes.
Is there any way of doing this with a Java regex? If so, can you please attempt it or give me a hint?
Comments (7)
Here's another approach that uses a lookahead to check that all quotation marks after the current position come in matched pairs.
If needed, the lookahead can be adapted to handle escaped quotation marks inside the quoted sections.
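The code from this answer did not survive on this page, so the following is only a minimal sketch of the lookahead idea, not necessarily the answerer's exact expression. The class and method names are hypothetical, and the sketch assumes quotation marks are balanced and never escaped.

```java
class CollapseSpacesLookahead {
    // Two or more spaces, but only where the rest of the input contains an
    // even number of quotation marks, i.e. the spaces lie outside any quoted section.
    // Assumption: quotes are balanced and there are no escaped quotes.
    private static final String SPACES_OUTSIDE_QUOTES =
            " {2,}(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String collapse(String input) {
        return input.replaceAll(SPACES_OUTSIDE_QUOTES, " ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b \"c   d\"  e"));
    }
}
```

Because the negated character classes also match line breaks, this sketch treats a quoted section that spans several lines as a single section. The lookahead rescans the rest of the input at every candidate position, so it may be slow on very long strings.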
When trying to match something that can be contained within something else, it can be helpful to construct a regular expression that matches both, like this:
This will match a quoted string or two or more spaces. Because the two alternatives are combined in one expression, a quoted string is consumed as a whole, so the spaces inside it are never matched on their own. Using this expression, you will need to examine each match to determine whether it is a quoted string or a run of two or more spaces, and act accordingly:
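The expression and the loop from this answer are missing on this page. As a rough sketch of the approach described, assuming double quotes only and no escaped quotes (the class and method names are hypothetical), one could write:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollapseSpacesAlternation {
    // Either a complete quoted string, or a run of two or more spaces.
    private static final Pattern QUOTED_OR_SPACES =
            Pattern.compile("\"[^\"]*\"| {2,}");

    static String collapse(String input) {
        Matcher m = QUOTED_OR_SPACES.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String match = m.group();
            // A match starting with a quote is a quoted string: keep it verbatim.
            // Otherwise it is a run of spaces: replace it with a single space.
            String replacement = match.startsWith("\"") ? match : " ";
            // quoteReplacement stops '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b \"c   d\"  e"));
    }
}
```

Note that an unmatched opening quote never satisfies the first alternative, so in this sketch any spaces after it would still be collapsed.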
Text between quotes: are the quotes always on the same line, or can a quoted section span multiple lines?
Tokenize it and emit a single space between tokens. A quick Google search for "java tokenizer that handles quotes" turned up:
this link
YMMV
Edit: SO didn't like that link. Here's the Google search link: google. It was the first result.
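The linked tokenizer is not available on this page, so here is a hand-rolled sketch of the tokenize-and-join idea rather than the library the answer pointed to. The class and method names are hypothetical, and it assumes only the space character separates tokens and that quotes are never escaped.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareTokenizer {
    // Splits on spaces that are outside double quotes; quoted text stays
    // inside its token, quotation marks included.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // entering or leaving a quoted section
                current.append(c);
            } else if (c == ' ' && !inQuotes) {
                if (current.length() > 0) {    // a space outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    static String collapse(String input) {
        return String.join(" ", tokenize(input));
    }

    public static void main(String[] args) {
        System.out.println(collapse("  a   b \"c   d\"  e "));
    }
}
```

Unlike the regex-based approaches, this sketch also drops leading and trailing spaces, which may or may not be what you want.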
Personally, I don't use Java, but this RegExp could do the trick:
Trying the expression with RegexBuddy, it generates this code, which looks fine to me:
At least, it seems to work fine in Python:
After you parse out the quoted content, run this on the rest, in bulk or piece by piece as necessary:
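The snippet from this answer is missing here. One way to sketch the idea, assuming balanced, unescaped double quotes (the class and method names are hypothetical), is to split the input on the quotation mark and collapse spaces only in the segments that fall outside quotes:

```java
class CollapseSpacesBySegments {
    static String collapse(String input) {
        // With balanced quotes, splitting on '"' yields alternating segments:
        // even indexes are outside quotes, odd indexes are inside.
        // The limit of -1 keeps trailing empty segments, so a closing quote
        // at the very end of the input is not lost.
        String[] parts = input.split("\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('"');               // restore the delimiter removed by split
            }
            boolean outsideQuotes = (i % 2 == 0);
            out.append(outsideQuotes ? parts[i].replaceAll(" {2,}", " ") : parts[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b \"c   d\"  e"));
    }
}
```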
Jeff, you're on the right track, but there are a few errors in your code, to wit: (1) you forgot to escape the quotation marks inside the negated character classes; (2) the parens inside the first capturing group should have been of the non-capturing variety; (3) if the second set of capturing parens doesn't participate in a match, group(2) returns null, and you're not testing for that; and (4) if you test for two or more spaces in the regex instead of one or more, you don't need to check the length of the match later on. Here's the revised code:
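Neither the revised code nor the code under review is preserved on this page, so what follows is a reconstruction under assumptions, not the answerer's actual code. It applies points (1), (3) and (4); point (2) does not arise because group 1 in this sketch has no nested parentheses. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollapseSpacesRevised {
    // Group 1: a quoted string (quotes escaped for the Java string literal).
    // Group 2: two or more spaces, so no length check is needed afterwards.
    private static final Pattern QUOTED_OR_SPACES =
            Pattern.compile("(\"[^\"]*\")|( {2,})");

    static String collapse(String input) {
        Matcher m = QUOTED_OR_SPACES.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(2) is null when the quoted-string alternative matched,
            // so test for null before treating the match as spaces.
            if (m.group(2) != null) {
                m.appendReplacement(out, " ");
            } else {
                m.appendReplacement(out, Matcher.quoteReplacement(m.group(1)));
            }
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a   b \"c   d\"  e"));
    }
}
```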