Overcoming a basic problem with CSV parsing using the FasterCSV gem

Posted 2024-08-12 13:53:47

I have found a CSV parsing issue with FasterCSV (1.5.0) which seems like a genuine bug, but which I'm hoping there's a workaround for.

Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.

Here's a simple example:

# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
=> ["one", "two", "three"]

# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
=> ["one", "two", "three"]

# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
=> FasterCSV::MalformedCSVError: Illegal quoting on line 1.

Am I going mad, or is this a bug in FasterCSV?

Comments (3)

浅唱々樱花落 2024-08-19 13:53:48

The MalformedCSVError is correct here.

Leading/trailing spaces in CSV format are not ignored, they are considered part of a field. So this means you have started a field with a space, and then included unescaped double quotes in that field, which would cause the illegal quoting error.

Maybe this library is just more strict than others you have used.
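
A minimal sketch of the behaviour described above. It assumes Ruby 1.9 or later, where the FasterCSV code became the standard `csv` library, so `CSV` stands in for `FasterCSV` here; FasterCSV 1.5.0 itself may word the error message differently.

```ruby
require "csv"

# Unquoted field: the space after the comma is kept as part of the data.
unquoted = CSV.parse_line("one, two,three")
# unquoted is ["one", " two", "three"]

# Quoted field preceded by a space: the field now starts with a space and
# then contains a bare double quote, which the parser rejects.
padded_quoted_error =
  begin
    CSV.parse_line('"one", "two","three"')
    nil
  rescue CSV::MalformedCSVError => e
    e
  end
```

So the space is not skipped in either case; it only becomes an error once a quote follows it.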

悲欢浪云 2024-08-19 13:53:48

Maybe you could set the :col_sep option to ', ' to make it parse files like that.
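
A rough sketch of this suggestion, again assuming the standard `csv` library (the successor to FasterCSV) and its `col_sep:` option. Note the limitation: it only helps when every separator is a comma followed by exactly one space, which is not true of the line in the question.

```ruby
require "csv"

# Every separator is comma + space: parses cleanly.
uniform = CSV.parse_line('"one", "two", "three"', col_sep: ", ")
# uniform is ["one", "two", "three"]

# Mixed separators, as in the question: the bare comma before "three"
# is no longer a separator, so the line is still rejected.
mixed_error =
  begin
    CSV.parse_line('"one", "two","three"', col_sep: ", ")
    nil
  rescue CSV::MalformedCSVError => e
    e
  end
```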

墨洒年华 2024-08-19 13:53:48

I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)

If you're calling #parse_line explicitly, then you could always call

gsub(/,\s*/, ',')

on your input line. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (I'd suggest reposting such a question here with a suitable tag and let the RegEx mavens loose on it should that be the case).
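
One possible way to handle comma-space inside quoted strings without a more elaborate regular expression is a small character scanner. The sketch below is an illustration only: `strip_padding_before_quotes` is a hypothetical helper, not part of FasterCSV, and it assumes a single-character separator and quote character, with quotes escaped by doubling. It drops spaces or tabs after a separator only when they lead into a quoted field, and leaves everything inside quotes untouched.

```ruby
# Hypothetical helper (not a FasterCSV API): removes whitespace between a
# separator and the opening quote of the next field, tracking quote state
# so that text inside quoted fields is never modified.
def strip_padding_before_quotes(line, sep = ",", quote = '"')
  out = +""
  pending = +""       # whitespace seen right after a separator, not yet emitted
  in_quotes = false
  after_sep = false

  line.each_char do |ch|
    if in_quotes
      # A doubled quote ("") closes and immediately reopens the field.
      in_quotes = false if ch == quote
      out << ch
    elsif after_sep && (ch == " " || ch == "\t")
      pending << ch
    else
      # Padding is dropped only when it leads into a quoted field;
      # for unquoted fields it is kept, since it counts as data.
      out << pending unless ch == quote
      pending.clear
      after_sep = (ch == sep)
      in_quotes = (ch == quote)
      out << ch
    end
  end

  out << pending
end

cleaned  = strip_padding_before_quotes('"one", "two","three"')
embedded = strip_padding_before_quotes('"a, b", "c"')
unquoted = strip_padding_before_quotes("one, two")
```

The cleaned line can then be passed to `parse_line` as usual. Unlike the plain `gsub`, this leaves `"a, b"` intact and does not alter unquoted fields.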
