Splitting a string (especially in Java with java.util.regex or something else)
Does anyone know how to split a string on a character taking into account its escape sequence?
For example, if the character is ':', "a:b" is split into two parts ("a" and "b"), whereas "a\:b" is not split at all.
I think this is hard (impossible?) to do with regular expressions.
Thank you in advance,
Kedar
Comments (2)
Since Java supports variable-length look-behinds (as long as they are finite), you could do it with a bounded look-behind placed in front of the delimiter:
(?<=(?<!\\)(?:\\\\){0,10})
This looks behind for an even number of backslashes (including zero, up to a maximum of 10 pairs, i.e. 20 backslashes).
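The look-behind above can be tried out with a minimal sketch like the following. It assumes ':' as the delimiter and '\' as the escape character; the class name `EscapedSplitDemo`, the method name `splitUnescaped`, and the sample input are illustrative and not taken from the original answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedSplitDemo {
    // Literal regex: (?<=(?<!\\)(?:\\\\){0,10}):
    // i.e. the look-behind from the answer followed by the ':' delimiter.
    // Every backslash is doubled again to write it as a Java string.
    static final Pattern DELIM =
            Pattern.compile("(?<=(?<!\\\\)(?:\\\\\\\\){0,10}):");

    // Splits on ':' only when it is preceded by an even number of backslashes.
    static String[] splitUnescaped(String text) {
        // A limit of -1 keeps trailing empty pieces.
        return DELIM.split(text, -1);
    }

    public static void main(String[] args) {
        // Hypothetical input: foo:bar\:baz\\:corge
        String text = "foo:bar\\:baz\\\\:corge";
        System.out.println(Arrays.toString(splitUnescaped(text)));
    }
}
```

Note that the pieces still contain their escape sequences; unescaping them would be a separate step.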
Another way would be to match the parts themselves, instead of splitting at the delimiters.
The pattern's strange syntax stems from the need to handle empty pieces at the start and end of the string. When a match spans exactly zero characters, the next attempt will start one character past the end of it. If it didn't, it would match another empty string, and another, ad infinitum…
(?<=\A|\G:) will look behind for either the start of the string (the first piece), or the end of the previous match, followed by the separator. If we did (?:\A|\G:), it would fail if the first piece is empty (input starts with a separator).
\\. matches any escaped character.
[^:\\] matches any character that is not in an escape sequence (because \\. consumed both of those).
((?:\\.|[^:\\])*) captures all characters up until the first non-escaped delimiter into capture-group 1.
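Putting the pieces described above together, a sketch of the matching approach might look like this. It assumes the full pattern is the look-behind followed directly by the capture group; the class name `EscapedPieces` and the method name `pieces` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedPieces {
    // Literal regex: (?<=\A|\G:)((?:\\.|[^:\\])*)
    // Backslashes are doubled again for the Java string literal.
    static final Pattern PIECE =
            Pattern.compile("(?<=\\A|\\G:)((?:\\\\.|[^:\\\\])*)");

    // Collects each piece (capture group 1) instead of splitting at delimiters.
    static List<String> pieces(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = PIECE.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pieces("a:b"));    // two pieces
        System.out.println(pieces("a\\:b"));  // one piece, the ':' is escaped
        System.out.println(pieces(":a"));     // leading empty piece is kept
    }
}
```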
(?<=^|[^\\]):
gets you close, but doesn't address escaped backslashes. (That's a literal regex; of course you have to escape the backslashes in it to get it into a Java string.)
(?<=(^|[^\\])(\\\\)*):
How about that? I think that should match any ':' that is preceded by an even number of backslashes.
Edit: don't vote this up. MizardX's solution is better :)
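To illustrate why the first pattern only gets close, here is a small sketch comparing it with the bounded look-behind from MizardX's answer on an input where the delimiter follows an escaped backslash. The class and method names (`NaiveVsBounded`, `naive`, `bounded`) and the sample inputs are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class NaiveVsBounded {
    // Literal regex: (?<=^|[^\\]):
    static final Pattern NAIVE = Pattern.compile("(?<=^|[^\\\\]):");
    // Literal regex: (?<=(?<!\\)(?:\\\\){0,10}):
    static final Pattern BOUNDED =
            Pattern.compile("(?<=(?<!\\\\)(?:\\\\\\\\){0,10}):");

    static String[] naive(String text) {
        return NAIVE.split(text, -1);
    }

    static String[] bounded(String text) {
        return BOUNDED.split(text, -1);
    }

    public static void main(String[] args) {
        // Input a\\:b : an escaped backslash followed by a real delimiter.
        String text = "a\\\\:b";
        // The naive pattern sees a backslash before ':' and refuses to split.
        System.out.println(Arrays.toString(naive(text)));
        // The bounded pattern counts two backslashes (even) and splits.
        System.out.println(Arrays.toString(bounded(text)));
    }
}
```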