I've been stuck on this for days and I just can't figure it out. I'm trying to write a regex for double-quoted string literals that supports escape characters. The regex should accept a string such as "1\t2"
(\"1\\t2\") and reject a string such as "invalid\escape"
(\"invalid\\escape\"). This handles valid and invalid escapes: \"(\\(?=[bnrt\'\"\\]).)*\" but as soon as I introduce anything to handle the rest of the string, it just accepts everything (for example, with ^\"(.*(\\(?=[bnrt\\\'\"]))*)*\"$). I'm pretty sure the issue is that when it loops back around and the preceding character is \\
(\\\\), it allows any character to be placed after it. I've deleted my work and started from scratch more times than I can remember, I've gone to office hours twice, and I just cannot figure it out. I need fresh eyes on this because I'm blind to it and at my wits' end. I'm not looking to be given the answer. I just want help figuring out what I'm doing wrong.
Your string regex starts with
.*
after the opening quote. This matches any string, regardless of its content. You follow it with
(<escape_regex>)*
but since a * also allows zero matches, and .* can already consume every character (backslashes included), the regex engine never needs that group and effectively ignores it. In other words, your string regex is equivalent to
^\".*\"$
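A quick way to check this is to run both regexes on the two samples from the question. The sketch below assumes Python and its re module (the question does not say which language the regex is for). The patterns are written as raw strings, so the backslashes reach the regex engine exactly as written above, and each sample string contains one literal backslash.

```python
import re

# The regex from the question, as a raw string so every backslash is passed
# to the regex engine unchanged.
question_re = re.compile(r"^\"(.*(\\(?=[bnrt\\\'\"]))*)*\"$")
# The simplified form it is equivalent to.
any_quoted_re = re.compile(r"^\".*\"$")

# The samples from the question; '\\' in a normal Python string is one backslash.
valid = '"1\\t2"'              # the text "1\t2"
invalid = '"invalid\\escape"'  # the text "invalid\escape"

for sample in (valid, invalid):
    # Both regexes accept both samples, including the invalid one.
    print(sample, bool(question_re.match(sample)), bool(any_quoted_re.match(sample)))
```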
To change this behaviour, you have to ensure that a backslash can only match if it is followed by a valid escape character. This can be done by replacing your dot with
[^\\]
which matches every character except a backslash. The only way left to consume a backslash is then your escape group, whose lookahead checks the next character. The resulting regex looks like this and works on your given samples:
^\"([^\\]*(\\(?=[bnrt\'\"\\]).)*)*\"$
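Continuing the same Python sketch (again assuming Python's re module), the corrected regex accepts the valid sample and rejects the invalid one. The inputs with an escaped quote and an escaped backslash are illustrative cases added here, not taken from the question.

```python
import re

# The dot is replaced by [^\\], so a backslash can only be consumed by the
# escape group, whose lookahead requires a supported escape character next.
fixed_re = re.compile(r"^\"([^\\]*(\\(?=[bnrt\'\"\\]).)*)*\"$")

# Input text -> expected result. '\\' in these Python literals is one backslash.
cases = {
    '"1\\t2"': True,              # \t is a supported escape
    '"invalid\\escape"': False,   # \e is not
    '"say \\"hi\\""': True,       # escaped double quotes
    '"a\\\\b"': True,             # escaped backslash
}

for text, expected in cases.items():
    result = bool(fixed_re.match(text))
    print(text, result, "ok" if result == expected else "UNEXPECTED")
```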
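There is, however, a much simpler approach.
([^\\]|\\[bnrt\'\"\\])
matches either a non-backslash character or a backslash followed by a valid escape character. Repeat that any number of times and you get a string made up only of ordinary characters and valid escape sequences:
^\"([^\\]|\\[bnrt\'\"\\])*\"$
This regex also works on your samples, and it will perform a lot better. The previous regex nests [^\\]* inside another *, so when a string fails to match, the engine tries every way of splitting the ordinary characters between iterations; that is catastrophic backtracking. In the simpler regex each character can only be matched by one branch, so there is only one way to split the string and a failed match gives up quickly.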
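The same kind of Python sketch for the simpler regex is below (Python's re module is still an assumption). One thing the samples do not exercise: [^\\] also matches a double quote, so this regex accepts an unescaped quote in the middle of the string, such as "a"b". The strict_re variant is an illustrative addition, not something the question asked for; it excludes the quote from the first branch with [^"\\] and rejects that input.

```python
import re

# Each character between the quotes is either a non-backslash character or
# a backslash followed by one of the supported escape characters.
simple_re = re.compile(r"^\"([^\\]|\\[bnrt\'\"\\])*\"$")

# Illustrative variant (hypothetical, not from the answer): the first branch
# also excludes the double quote, so a quote inside must be escaped.
strict_re = re.compile(r'^"([^"\\]|\\[bnrt\'"\\])*"$')

# '\\' in these Python literals is a single backslash character.
for text in ['"1\\t2"', '"invalid\\escape"', '"a"b"', '""']:
    print(text, bool(simple_re.match(text)), bool(strict_re.match(text)))
```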