Working around a basic CSV parsing problem with the FasterCSV gem

Posted on 2024-08-12 13:53:47

I have found a CSV parsing issue with FasterCSV (1.5.0) which seems like a genuine bug, but which I'm hoping there's a workaround for.

Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.

Here's a simple example:

# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
=> ["one", "two", "three"]

# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
=> ["one", "two", "three"]

# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
=> FasterCSV::MalformedCSVError: Illegal quoting on line 1.

Am I going mad, or is this a bug in FasterCSV?

浅唱々樱花落 2024-08-19 13:53:48

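The MalformedCSVError is correct here.

Leading and trailing whitespace is not ignored in the CSV format; it is treated as part of the field. So in your example you have started a field with a space and then included an unescaped double quote inside that field, which causes the illegal quoting error.

Perhaps this library is simply stricter than others you have used.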
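
To make the whitespace point concrete, here is a minimal sketch. It assumes Ruby's standard `csv` library (the successor to FasterCSV since Ruby 1.9) rather than the FasterCSV 1.5.0 gem from the question; on Ruby 1.8 with the gem, substituting `FasterCSV` for `CSV` should behave the same way, but that is an assumption. The variable names are only illustrative.

```ruby
require 'csv'

# Assumption: standard-library CSV stands in for the FasterCSV gem here.

# Whitespace after a separator is kept as part of the field, not ignored.
unquoted = CSV.parse_line('one, two,three')

# The second field starts with a space, so it counts as an unquoted field.
# The bare double quote that follows is then illegal inside it.
error = begin
  CSV.parse_line('"one", "two","three"')
  nil
rescue CSV::MalformedCSVError => e
  e
end

# To carry a leading space and literal quotes in the data, quote the
# whole field and double the inner quotes.
escaped = CSV.parse_line('"one"," ""two""","three"')

p unquoted     # the second element keeps its leading space
p error.class
p escaped
```

So the parser is not rejecting the space itself. It is rejecting the quote that appears after the field has already begun.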

悲欢浪云 2024-08-19 13:53:48

Perhaps you could set the :col_sep option to ', ' so that it can parse a file like this.
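
A rough sketch of that suggestion, again assuming the standard-library `CSV` class in place of the FasterCSV gem (the gem accepts the same `:col_sep` option). Note that this only helps when every separator is followed by exactly one space; the mixed spacing from the question still fails.

```ruby
require 'csv'

# Assumption: standard-library CSV stands in for the FasterCSV gem here.

# Every separator is comma-space, so a two-character col_sep parses cleanly.
consistent = CSV.parse_line('"one", "two", "three"', col_sep: ', ')

# The question's input mixes ", " and ",". With col_sep set to ', ' the
# bare comma after "two" is no longer a separator, so parsing still fails.
mixed_error = begin
  CSV.parse_line('"one", "two","three"', col_sep: ', ')
  nil
rescue CSV::MalformedCSVError => e
  e
end

p consistent
p mixed_error.class
```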

墨洒年华 2024-08-19 13:53:48

I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)

If you're calling #parse_line explicitly, then you could always call

gsub(/,\s*/, ',')

on your input line. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (I'd suggest reposting such a question here with a suitable tag and let the RegEx mavens loose on it should that be the case).
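
For illustration, the gsub approach might look like the sketch below. It assumes the standard-library `CSV` class in place of the FasterCSV gem, and `parse_loose_line` and `parse_loose_line_quoted` are hypothetical helper names, not part of either library. The second helper is one possible narrowing of the regular expression, not a complete quote-aware solution.

```ruby
require 'csv'

# Assumption: standard-library CSV stands in for the FasterCSV gem here.

# Hypothetical helper: drop any whitespace that follows a comma, then parse.
def parse_loose_line(line)
  CSV.parse_line(line.gsub(/,\s*/, ','))
end

# Hypothetical variant: only drop whitespace that sits between a comma
# and a double quote, leaving other comma-space sequences alone.
def parse_loose_line_quoted(line)
  CSV.parse_line(line.gsub(/,\s+(?=")/, ','))
end

fixed = parse_loose_line('"one", "two","three"')

# Caveat: the simple regex also rewrites comma-space inside quoted fields.
altered = parse_loose_line('"Smith, John", "two"')

# The narrower regex keeps the comma-space inside the quoted field.
kept = parse_loose_line_quoted('"Smith, John", "two"')

p fixed
p altered
p kept
```

The narrower pattern can still misfire when a quoted field contains a comma, whitespace, and then an escaped quote (`""`), so treat it as a stopgap rather than a general fix.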
