Regex to match nested quotes in a CSV file
I know this has been discussed a million times. I searched the forums, found some regular expressions that came close, and tried to modify them, but to no avail.
Say there is a line in a CSV file like this:
"123", 456, "701 "B" Street", 910
                 ^^^
Is there an easy regex to detect "B" (since it's a set of non-escaped quotes inside the normal CSV quotes) and replace it with something like \"B\"? The final string would end up looking like this:
"123", 456, "701 \"B\" Street", 910
Help would be greatly appreciated!
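A regex can only approximate this, because a stray quote is ambiguous in general. Here is a minimal Python sketch that works under stated assumptions: fields are separated by a comma plus optional spaces, and a quote is treated as a field delimiter only when it opens a field (start of line or right after a delimiter) or closes one (right before a delimiter or end of line). Every other quote inside a quoted field gets a backslash. The function name `escape_inner_quotes` is hypothetical, and the heuristic will guess wrong if an embedded quote happens to be followed by a comma, e.g. `"say "hi", ok"`.

```python
import re

# Assumption: a quoted field starts at the beginning of the line or right after
# a comma (plus optional spaces) and ends right before a comma or end of line.
# The lazy (.*?) stops at the first quote that is followed by such a boundary.
QUOTED_FIELD = re.compile(r'(^|,\s*)"(.*?)"(?=\s*(?:,|$))')

# A quote that is not already preceded by a backslash.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')


def escape_inner_quotes(line):
    """Heuristically backslash-escape quotes nested inside quoted CSV fields."""
    def fix(m):
        # Escape only the quotes between the field's opening and closing quote.
        inner = UNESCAPED_QUOTE.sub(r'\\"', m.group(2))
        return m.group(1) + '"' + inner + '"'
    return QUOTED_FIELD.sub(fix, line)


print(escape_inner_quotes('"123", 456, "701 "B" Street", 910'))
# "123", 456, "701 \"B\" Street", 910
```

Quotes that are already escaped are left alone, so running the function twice does not add a second backslash.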
Trust me, you don't want to do this with a regex. You want something like Java CSV Library.
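As an illustration of the library route, here is a sketch using Python's standard `csv` module (swapped in for the Java library; this is not the answerer's code). Assuming the file uses the backslash-escaped form shown in the question's desired output, a reader configured with `skipinitialspace=True` (for the space after each comma), `escapechar="\\"` and `doublequote=False` parses the line with no hand-written regex. These dialect settings are assumptions about the file's format, and `parse_backslash_escaped` is a hypothetical name.

```python
import csv
import io


def parse_backslash_escaped(line):
    """Parse one CSV line whose embedded quotes are written as \\" (assumed format)."""
    reader = csv.reader(
        io.StringIO(line),
        skipinitialspace=True,  # ignore the space after each comma
        escapechar="\\",        # a backslash makes the next character literal
        doublequote=False,      # embedded quotes are escaped, not doubled
    )
    return next(reader)


print(parse_backslash_escaped(r'"123", 456, "701 \"B\" Street", 910'))
# ['123', '456', '701 "B" Street', '910']
```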
There are a few zillion libraries to help you parse CSV, but if you want to use a regexp for academic reasons, this may help:
"(\\.|[^\\"])*"
I don't use CSV files, so I'm not sure about the validity of the 'other CSV field' pattern (matching 456 in the example above), or whether /, */ is the delimiter you want.
At any rate, combining the above will match one field and one delimiter (or end of string):
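One way to combine them, as a hypothetical Python sketch rather than the answerer's own pattern: pair the quoted-string regex with a delimiter and apply it field by field. Assumptions: commas separate fields with optional surrounding spaces, quoted fields use backslash escapes, and an unquoted field such as `456` is any run of non-comma characters. `split_fields` is a hypothetical name; it returns the raw field text, quotes included.

```python
import re

# One field followed by its delimiter. Assumptions: comma delimiter with
# optional spaces; a quoted field uses backslash escapes ("(\\.|[^\\"])*");
# any other field is a (lazy) run of non-comma characters.
FIELD = re.compile(r'\s*("(?:\\.|[^\\"])*"|[^,]*?)\s*(,|$)')


def split_fields(line):
    """Split one CSV line into raw field texts (surrounding quotes are kept)."""
    fields, pos = [], 0
    while True:
        # Always matches: the unquoted alternative can absorb everything up to
        # the next comma or the end of the string.
        m = FIELD.match(line, pos)
        fields.append(m.group(1))
        if m.group(2) == "":  # matched end of string: no more fields
            return fields
        pos = m.end()  # continue right after the comma


print(split_fields(r'"123", 456, "701 \"B\" Street", 910'))
```

Matching from an explicit position in a loop, instead of `re.findall`, avoids a spurious empty match at the end of the string and keeps a trailing empty field (as in `a,`) intact.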
I got this to work and thought I would post it in case anyone else is looking for an answer.
I would use a tailored sed expression such as:
Your example is not proper CSV:
This should actually be:
(There are plenty of variations of CSV, of course, but since most of the time people want it for use with excel or access I stick to the Microsoft definition.)
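For comparison, Python's standard `csv` module, used here only as an illustration, names its default dialect `excel` and doubles embedded quotes. It writes the example row in that form and reads it back unchanged. The helper name `to_excel_csv` is hypothetical.

```python
import csv
import io


def to_excel_csv(fields):
    """Serialize one row with the default 'excel' dialect (quotes doubled)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


line = to_excel_csv(["123", 456, '701 "B" Street', 910])
print(repr(line))  # '123,456,"701 ""B"" Street",910\r\n'

# Reading the line back restores the field values (all as strings).
print(next(csv.reader(io.StringIO(line))))  # ['123', '456', '701 "B" Street', '910']
```

Only the field that contains a quote gets wrapped in quotes, because the writer's default quoting mode quotes a field only when it needs to.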
Therefore the regex for this can look like:
The groups (in parentheses) will be your double quotes, and the rest ensures that they are found within another set of quotes.
That covers the find part of your needs. The replace part depends on what you are programming in.
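As one example of the replace step in Python, here is a sketch that is not the answerer's own pattern. It assumes the input already uses doubled quotes and that every quoted field is well-formed. It matches each quoted field with `"((?:[^"]|"")*)"` and rewrites every `""` pair inside it as `\"`, which produces the output the question asks for. `doubled_to_backslash` is a hypothetical name.

```python
import re

# A quoted field in doubled-quote CSV: opening quote, then any mix of
# non-quote characters and "" pairs, then the closing quote.
QUOTED_FIELD = re.compile(r'"((?:[^"]|"")*)"')


def doubled_to_backslash(line):
    """Rewrite each "" inside a quoted field as a backslash-escaped quote."""
    return QUOTED_FIELD.sub(
        lambda m: '"' + m.group(1).replace('""', '\\"') + '"', line
    )


print(doubled_to_backslash('"123", 456, "701 ""B"" Street", 910'))
# "123", 456, "701 \"B\" Street", 910
```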