Regular expression for finding unterminated strings
I need to search for lines in a CSV file that end in an unterminated, double-quoted string.
For example:
1,2,a,b,"dog","rabbit
would match whereas
1,2,a,b,"dog","rabbit","cat bird"
1,2,a,b,"dog",rabbit
would not.
I have very limited experience with regular expressions, and the only thing I could think of is something like
"[^"]*$
However, that matches from the last quote to the end of the line, whether or not that quote closes a string.
How would this be done?
Comments (4)
Assuming quotes can't be escaped, you need to test the parity of quotes (making sure that there's an even number of them instead of odd). Regular expressions are great for that:
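A minimal sketch of such a parity test, assuming Python's `re` module and that quotes are never escaped. The pattern and the name `EVEN_QUOTES` are illustrative and not necessarily what this answer originally showed:

```python
import re

# Zero or more pairs of quotes, each preceded by a run of non-quote characters,
# then a final run of non-quote characters: an even number of quotes overall.
EVEN_QUOTES = re.compile(r'(?:(?:[^"]*"){2})*[^"]*')

samples = [
    '1,2,a,b,"dog","rabbit',              # 3 quotes
    '1,2,a,b,"dog","rabbit","cat bird"',  # 6 quotes
    '1,2,a,b,"dog",rabbit',               # 2 quotes
]

for line in samples:
    # fullmatch anchors the pattern at both ends of the line
    balanced = EVEN_QUOTES.fullmatch(line) is not None
    print(balanced, line)
```

Only the first sample, the one with the dangling quote, fails the test.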
That will match all lines with an even number of quotes. You can invert the result to find all strings with an odd number. Or you can just add another ([^"]*") part at the beginning:
Similarly, if you have access to reluctant operators instead of greedy ones, you can use a simpler-looking expression:
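A sketch of the odd-parity variant, assuming Python's `re` module. `ODD_QUOTES` and `LAZY_DOT` are illustrative names and patterns, not quotations from this answer. One caveat worth checking in your own engine: a reluctant `.*?` may still cross a quote when the engine backtracks, so a lazy-dot rewrite can accept lines it should reject. That is why the sketch keeps the negated class `[^"]`.

```python
import re

# One leading quote, then the even-parity body: an odd number of quotes overall.
ODD_QUOTES = re.compile(r'[^"]*"(?:(?:[^"]*"){2})*[^"]*')

# Hypothetical reluctant rewrite of the even-parity test, for comparison only.
LAZY_DOT = re.compile(r'(?:(?:.*?"){2})*[^"]*')

print(ODD_QUOTES.fullmatch('1,2,a,b,"dog","rabbit') is not None)  # True
print(ODD_QUOTES.fullmatch('1,2,a,b,"dog",rabbit') is not None)   # False

# Three quotes, so an even-parity test should reject this line, yet the
# lazy-dot version accepts it because "." may consume a quote on backtracking.
print(LAZY_DOT.fullmatch('"a"b"') is not None)                    # True
```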
Now, if quotes can be escaped, it's a different question entirely, but the approach would be similar: determine the parity of unescaped quotes.
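As a rough illustration of the escaped case, here is a sketch that assumes a backslash escapes the following character. That escaping rule is an assumption, as are the names `UNIT`, `ODD_UNESCAPED` and `has_dangling_quote`. CSV's usual doubled-quote escaping (`""`) adds quotes two at a time, so it leaves the parity unchanged and needs no special handling.

```python
import re

# ASSUMPTION: a backslash escapes the next character, so \" is not a delimiter.
# UNIT is a run of ordinary characters and escape pairs, never a bare quote.
UNIT = r'(?:[^"\\]|\\.)*'

# One unescaped quote, then any number of unescaped pairs: odd parity.
ODD_UNESCAPED = re.compile(rf'{UNIT}"(?:{UNIT}"{UNIT}")*{UNIT}')

def has_dangling_quote(line: str) -> bool:
    """True when the line holds an odd number of unescaped quotes."""
    return ODD_UNESCAPED.fullmatch(line) is not None

print(has_dangling_quote(r'1,"a\"b'))   # True: one unescaped quote
print(has_dangling_quote(r'1,"a\"b"'))  # False: two unescaped quotes
```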
Assuming that the strings cannot contain ", you need to match a string that has an odd number of quotes, like this:
Note that this is vulnerable to a denial-of-service attack.
This will match zero or more sets of an unquoted run followed by a quoted string.
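A sketch of a pattern that fits this description, wrapped in a helper that scans lines the way the question asks. It assumes Python's `re` module; `DANGLING` and `unterminated_lines` are illustrative names. The pattern is written without nesting one repeated group inside another, so it may not share the weakness noted above.

```python
import re
from typing import Iterable, Iterator, Tuple

# Zero or more (unquoted run + complete quoted string) pairs, then an
# unquoted run, one opening quote, and no further quote up to the end.
DANGLING = re.compile(r'(?:[^"]*"[^"]*")*[^"]*"[^"]*')

def unterminated_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for lines ending in an unterminated string."""
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")  # keep the newline out of the match
        if DANGLING.fullmatch(text):
            yield number, text

sample = [
    '1,2,a,b,"dog","rabbit\n',
    '1,2,a,b,"dog","rabbit","cat bird"\n',
    '1,2,a,b,"dog",rabbit\n',
]
print(list(unterminated_lines(sample)))  # [(1, '1,2,a,b,"dog","rabbit')]
```

An open file object can be passed in place of `sample`, since iterating over it also yields one line at a time.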
Try this one:
This matches a quote (anywhere in the line) followed, greedily, by anything other than another quote or a comma, up to the end of the line.
The net effect is that it will only match lines with dangling quoted strings.
I think it's even immune to 'nested expandos attacks' (we do live in a very dangerous world ...)
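One reading of that description, sketched with Python's `re` module: a quote followed by one or more characters that are neither a quote nor a comma, up to the end of the line. The pattern and the name `DANGLING_TAIL` are an interpretation of the prose, not a quoted original.

```python
import re

# A quote, then at least one character that is neither a quote nor a comma,
# running to the end of the line.
DANGLING_TAIL = re.compile(r'"[^",]+$')

for line in (
    '1,2,a,b,"dog","rabbit',
    '1,2,a,b,"dog","rabbit","cat bird"',
    '1,2,a,b,"dog",rabbit',
):
    print(DANGLING_TAIL.search(line) is not None, line)
```

Only the first line matches. Under this reading, a dangling string that itself contains a comma, such as `"cat, bird`, would be missed.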
To avoid "nested expandos":
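A sketch of one way to sidestep the problem, assuming Python's `re` module. Every repeated piece ends in a literal quote, which leaves little room for the engine to retry alternative splits of the same text. The names are illustrative. Plain quote counting is included as a cross-check, since it involves no backtracking at all.

```python
import re

# Each repeated group consumes exactly two quotes, and every [^"]* run is
# bounded by a literal quote, so the repetitions cannot overlap.
ODD_FLAT = re.compile(r'[^"]*"(?:[^"]*"[^"]*")*[^"]*')

def dangling_by_regex(line: str) -> bool:
    return ODD_FLAT.fullmatch(line) is not None

def dangling_by_count(line: str) -> bool:
    # Same parity test without a regex: an odd quote count means one is open.
    return line.count('"') % 2 == 1

# A long, quote-heavy line of the kind that stresses ambiguous patterns.
stress = 'x"' * 2001
for line in ('1,2,a,b,"dog","rabbit', '1,2,a,b,"dog",rabbit', stress):
    assert dangling_by_regex(line) == dangling_by_count(line)
print(dangling_by_regex(stress))  # True: 2001 quotes is an odd count
```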