Regex and escaped and unescaped delimiters
This question is related to this one.
I have a string:
a\;b\\;c;d
which in Java looks like this:
String s = "a\\;b\\\\;c;d";
I need to split it on semicolons according to the following rules:
If a semicolon is preceded by a backslash, it should not be treated as a separator (as between a and b).
If the backslash itself is escaped, and therefore does not escape the semicolon, that semicolon should be a separator (as between b and c).
So a semicolon should be treated as a separator if it is preceded by either zero or an even number of backslashes.
For the example above, I want to get the following strings (double the backslashes for the Java compiler):
a\;b\\
c
d
Comments (5)
You can use a regex to match all text between unescaped semicolons.
Explanation:
The possessive quantifier (++) is important to avoid catastrophic backtracking caused by the nested quantifiers.
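The pattern this answer used is not shown here, so the following is only a minimal sketch of the "match the fields instead of splitting" idea. The pattern, the class name `EscapedFieldMatcher` and the method `fields` are assumptions made for illustration. A field is taken to be any run of "backslash plus one character" or "characters other than `;` and `\`", with a possessive `++` on the inner run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedFieldMatcher {
    // Assumed pattern (not taken from the answer): a field is any run of
    // "backslash + any char" or "chars other than ';' and '\'".
    // The possessive ++ stops the inner run from being re-split on backtracking.
    private static final Pattern FIELD =
            Pattern.compile("(?:\\\\.|[^;\\\\]++)*", Pattern.DOTALL);

    static List<String> fields(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(s);
        int pos = 0;
        while (true) {
            m.region(pos, s.length());
            m.lookingAt();              // always succeeds: the pattern may match empty
            out.add(m.group());
            pos = m.end();
            if (pos >= s.length()) {
                break;
            }
            if (s.charAt(pos) != ';') { // only a dangling trailing backslash stops a field early
                throw new IllegalArgumentException("dangling escape at index " + pos);
            }
            pos++;                      // step over the unescaped ';'
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("a\\;b\\\\;c;d")); // prints [a\;b\\, c, d]
    }
}
```

Anchoring each match with `region` and `lookingAt` avoids the empty matches that a plain `find()` loop would report at every separator.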
I would not trust any kind of regular expression to detect those cases. I usually write a simple loop for such things; I'll sketch it in C, since it has been ages since I last touched Java ;-)
The advantages are:
EDIT:
I have added a complete C++ example for clarification.
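The C and C++ code from this answer is not shown here. As a rough illustration of the same loop idea, here is a Java sketch; the class name `LoopSplitter`, the method `split` and its parameters are hypothetical. It keeps a single "previous character was an escape" flag and leaves the escape characters in the output, as the question's expected result does.

```java
import java.util.ArrayList;
import java.util.List;

class LoopSplitter {
    // Splits s on sep, treating esc as an escape character.
    // Escape characters are kept in the resulting fields.
    static List<String> split(String s, char sep, char esc) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;            // true if the previous char was an active escape
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (escaped) {
                current.append(ch);         // escaped char is always literal
                escaped = false;
            } else if (ch == esc) {
                current.append(ch);
                escaped = true;
            } else if (ch == sep) {
                parts.add(current.toString());
                current.setLength(0);       // start the next field
            } else {
                current.append(ch);
            }
        }
        parts.add(current.toString());      // last field (possibly empty)
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a\\;b\\\\;c;d", ';', '\\')); // prints [a\;b\\, c, d]
    }
}
```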
This should work.
Explanation:
You just match the semicolons that are not preceded by exactly one \.
EDIT:
This will take care of any odd number of \. It will of course fail if you have more than 4000000 \ characters.
Explanation of the edited answer:
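The regex from this answer is not shown here. One way to express "a semicolon not preceded by an odd number of backslashes" with a bounded lookbehind is sketched below. The pattern, the bound of 1000 pairs and the class name `LookbehindSplit` are assumptions for illustration, not the answer's exact code.

```java
import java.util.Arrays;

class LookbehindSplit {
    // Regex: (?<!(?<!\\)(?:\\{2}){0,1000}\\);
    // A ';' is a separator unless the run of backslashes directly before it
    // has odd length. The {0,1000} bound is an assumed limit; Java needs a
    // finite maximum length inside a lookbehind.
    private static final String SEPARATOR =
            "(?<!(?<!\\\\)(?:\\\\{2}){0,1000}\\\\);";

    static String[] split(String s) {
        return s.split(SEPARATOR, -1);      // -1 keeps trailing empty fields
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a\\;b\\\\;c;d"))); // prints [a\;b\\, c, d]
    }
}
```

The inner `(?<!\\)` pins the start of the backslash run, so the outer lookbehind only matches when the whole run has odd length.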
This approach assumes that your string does not contain the char '\0'. If it does, you can use some other char.
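The code for this answer is not shown here. One plausible reading, given the mention of '\0', is a placeholder trick: hide every escaped backslash behind '\0', split on semicolons that are not preceded by a backslash, then restore the hidden pairs. The sketch below follows that reading; the steps and the class name `PlaceholderSplit` are assumptions.

```java
import java.util.Arrays;

class PlaceholderSplit {
    static String[] split(String s) {
        // Assumption: s never contains the char '\0'.
        // 1. Hide each escaped backslash (a pair "\\") behind the placeholder.
        String masked = s.replace("\\\\", "\0");
        // 2. Any backslash left now escapes the char after it, so split on
        //    semicolons that are not directly preceded by a backslash.
        String[] parts = masked.split("(?<!\\\\);", -1);
        // 3. Put the escaped backslashes back.
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace("\0", "\\\\");
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a\\;b\\\\;c;d"))); // prints [a\;b\\, c, d]
    }
}
```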
I think this is the real answer.
In my case I am trying to split on | and the escape character is &. In this code I am using a lookbehind to handle the & escape character.
Note that the lookbehind must have a maximum length.
The pattern means any | except those that follow ((?:[^&]|^)(&&){0,10000}&), and that part means any odd number of &s.
The part (?:[^&]|^) is important to make sure that you are counting all of the &s behind the |, back to the beginning of the string or to some other character.
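The code for this answer is not shown here. A minimal sketch that wraps the quoted fragment in a negative lookbehind and splits on | could look like this; the exact combination and the class name `PipeSplit` are assumptions.

```java
import java.util.Arrays;

class PipeSplit {
    // Regex: (?<!(?:[^&]|^)(&&){0,10000}&)\|
    // A '|' is a separator unless an odd-length run of '&' directly precedes it.
    // (?:[^&]|^) anchors the start of the run so that every '&' is counted.
    private static final String SEPARATOR = "(?<!(?:[^&]|^)(&&){0,10000}&)\\|";

    static String[] split(String s) {
        return s.split(SEPARATOR, -1);      // -1 keeps trailing empty fields
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a&|b|c&&|d"))); // prints [a&|b, c&&, d]
    }
}
```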