Regex to match nested quotes in a CSV file

Posted on 2024-09-05 09:10:55

I know this has been discussed a million times. I've searched the forums, found some regexes that come close, and tried modifying them, but to no avail.

Say there is a line in a CSV file like this:

"123", 456, "701 "B" Street", 910
                 ^^^

Is there an easy regex to detect "B" (since it's an unescaped pair of quotes inside the normal CSV quotes) and replace it with something like \"B\"? The final string would end up looking like this:

"123", 456, "701 \"B\" Street", 910

Help would be greatly appreciated!


Comments (5)

永言不败 2024-09-12 09:10:55

Trust me, you don't want to do this with a regex. You want something like the Java CSV Library.
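
A minimal sketch of the library route, using Python's standard csv module as a stand-in for the Java CSV Library (an assumption, since the question names no language). It assumes the inner quotes are doubled, which is how standard CSV escapes a quote inside a quoted field, and it sets skipinitialspace=True because the sample line has a space after each comma:

```python
import csv
import io

# The sample line, with the inner quotes doubled ("") as standard CSV expects.
line = '"123", 456, "701 ""B"" Street", 910\n'

# skipinitialspace=True ignores the space after each comma, so a field that
# begins with ' "' is still parsed as a quoted field.
reader = csv.reader(io.StringIO(line), skipinitialspace=True)
fields = next(reader)
print(fields)  # ['123', '456', '701 "B" Street', '910']
```

The parser undoes the quote doubling itself, so the third field comes back as 701 "B" Street with no regex involved.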

草莓味的萝莉 2024-09-12 09:10:55

There are a few zillion libraries to help you parse CSV, but if you want to use a regex for academic reasons, this may help:

  • quoted string, with support for backslash escapes:
    "(\\.|[^\\"])*"
  • unquoted field: [^",]*
  • delimiter: , *

I don't use CSV files, so I'm not sure whether the unquoted-field pattern is valid for the other CSV fields (it matches 456 in the example above), or whether /, */ is the delimiter you want.

At any rate, combining the above will match one field and one delimiter (or end of string):

(quotedstring|unquoted)(delimiter|$)
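
A minimal Python sketch of the combined pattern: the three pieces are spliced into (quotedstring|unquoted)(delimiter|$) and matched repeatedly from the start of the line. The helper name split_fields is hypothetical. The quoted-string piece expects backslash-escaped quotes, so the sketch runs on the desired output line; on the original, unescaped line no alternative matches at the third field and the helper raises.

```python
import re

# Building blocks from the answer, written as Python raw strings.
QUOTED = r'"(?:\\.|[^\\"])*"'   # quoted string with backslash escapes
UNQUOTED = r'[^",]*'            # unquoted field
DELIM = r', *'                  # comma plus optional spaces

# (quotedstring|unquoted)(delimiter|$)
FIELD_RE = re.compile(rf'({QUOTED}|{UNQUOTED})({DELIM}|$)')


def split_fields(line: str) -> list[str]:
    """Hypothetical helper: match one field plus its delimiter at a time."""
    fields = []
    pos = 0
    while True:
        m = FIELD_RE.match(line, pos)
        if m is None:
            raise ValueError(f"no field matches at column {pos}: {line[pos:]!r}")
        fields.append(m.group(1))
        if m.group(2) == "":  # the delimiter group matched end of string
            return fields
        pos = m.end()


escaped = r'"123", 456, "701 \"B\" Street", 910'
print(split_fields(escaped))
```

Matching at an explicit position (FIELD_RE.match(line, pos)) rather than using finditer keeps the fields contiguous, so any text the pattern cannot account for surfaces as an error instead of being silently skipped.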

微暖i 2024-09-12 09:10:55

(?<!^)(?<!",)(?<!\d,)"(?!,")(?!,\d)(?!$)(?!,-\d)

I got this to work and thought I would post it in case anyone else is looking for an answer.
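
A quick way to try the pattern is Python's re.sub with a replacement that puts a backslash in front of each matched quote (escape_inner_quotes is a hypothetical helper). As far as I can trace the lookarounds, they expect each comma to touch the adjacent field, so the sketch uses the sample line without the spaces after the commas; with those spaces present, every quote except the first one on the line would be matched.

```python
import re

# The pattern from the answer: a double quote that is not at the start of the
# line, not directly after `",` or `<digit>,`, and not directly before `,"`,
# `,<digit>`, `,-<digit>` or the end of the line.
INNER_QUOTE = re.compile(r'(?<!^)(?<!",)(?<!\d,)"(?!,")(?!,\d)(?!$)(?!,-\d)')


def escape_inner_quotes(line: str) -> str:
    # r'\\"' in a re.sub replacement means: a literal backslash, then a quote.
    return INNER_QUOTE.sub(r'\\"', line)


print(escape_inner_quotes('"123",456,"701 "B" Street",910'))
# "123",456,"701 \"B\" Street",910
```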

七七 2024-09-12 09:10:55

I would use a tailored sed expression such as:

's/\(.*\),\(.*\),\(.*\)"\(.*\)\" \(.*\),\(.*\)/\1,\2,\3 \4 \5 \6/g'
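
Traced against the sample line, the substitution above consumes two commas, two quotes and a third comma, but its replacement puts back only the first two commas, so it drops the quotes around B and the comma before 910 instead of escaping them. The Python sketch below reproduces that (assuming sed reads \" as a plain quote) and then shows a narrower, equally tailored rule: a quoted token with a space on each side has to sit inside a field, so its quotes get a backslash. That rule is an assumption that only fits lines shaped like this sample.

```python
import re

line = '"123", 456, "701 "B" Street", 910'

# The sed expression from the answer in Python syntax: BRE \( \) become ( ),
# and \" is written as a plain quote (assumed to be how sed reads it).
SED_AS_PY = re.compile(r'(.*),(.*),(.*)"(.*)" (.*),(.*)')
print(SED_AS_PY.sub(r'\1,\2,\3 \4 \5 \6', line))
# "123", 456, "701  B Street"  910

# Hypothetical tailored rule for this sample only: a quoted token with a space
# on both sides can only sit inside a field, so escape its two quotes.
INNER_TOKEN = re.compile(r' "([^",]*)" ')


def escape_spaced_token(text: str) -> str:
    return INNER_TOKEN.sub(r' \\"\1\\" ', text)


print(escape_spaced_token(line))
# "123", 456, "701 \"B\" Street", 910
```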

复古式 2024-09-12 09:10:55

Your example is not proper CSV:

"123", 456, "701 "B" Street", 910

This should actually be:

"123", 456, "701 ""B"" Street", 910

(There are plenty of variations of CSV, of course, but since most of the time people want it for use with Excel or Access, I stick to the Microsoft definition.)

Therefore, the regex for this could look like:

".+("").+("").+"

The groups (in parentheses) capture the doubled quotes (""), and the rest of the pattern ensures that they are found within another set of quotes.

That covers the find part of your needs. The replace part depends on what you are programming in.
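
One possible Python version of the replace step, using the regex above unchanged and assuming the target is still the backslash form asked for in the question; doubled_to_backslash is a hypothetical name. Because the pattern has exactly two groups, only two doubled quotes per line are rewritten:

```python
import re

# The answer's pattern: the two groups capture the doubled quotes ("") and the
# surrounding ".+ ... .+" requires them to sit inside another pair of quotes.
DOUBLED = re.compile(r'".+("").+("").+"')


def doubled_to_backslash(line: str) -> str:
    m = DOUBLED.search(line)
    if m is None:
        return line  # nothing to rewrite
    # Rebuild the line, swapping each captured "" (groups 1 and 2) for \".
    (s1, e1), (s2, e2) = m.span(1), m.span(2)
    return line[:s1] + '\\"' + line[e1:s2] + '\\"' + line[e2:]


print(doubled_to_backslash('"123", 456, "701 ""B"" Street", 910'))
# "123", 456, "701 \"B\" Street", 910
```

Working from the group spans, rather than a plain re.sub, keeps the text around the doubled quotes exactly as it was.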
