Overcoming a basic problem with CSV parsing using the FasterCSV gem
I have found a CSV parsing issue with FasterCSV (1.5.0) that looks like a genuine bug, and I'm hoping there's a workaround for it.
Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.
Here's a simple example:
# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
=> ["one", "two", "three"]
# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
=> ["one", "two", "three"]
# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
=> FasterCSV::MalformedCSVError: Illegal quoting on line 1.
Am I going mad, or is this a bug in FasterCSV?
Comments (3)
The MalformedCSVError is correct here. Leading and trailing spaces in CSV are not ignored; they are considered part of the field. So you have started a field with a space and then included unescaped double quotes in that field, which causes the illegal quoting error.
Maybe this library is just more strict than others you have used.
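To make the point about spaces concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where FasterCSV was adopted as the standard `CSV` library, so `CSV` stands in for `FasterCSV`; with the old gem the calls would be spelled `FasterCSV.parse_line` instead.

```ruby
require 'csv' # assumption: Ruby 1.9+, where FasterCSV became the standard CSV library

# Unquoted field: the space after the comma is kept as part of the value.
p CSV.parse_line("one, two,three")

# Quoted field preceded by a space: the second field now begins with " ",
# so the double quote that follows is an unescaped quote inside a field.
begin
  CSV.parse_line("\"one\", \"two\",\"three\"")
rescue CSV::MalformedCSVError => e
  puts "rejected: #{e.class}"
end
```

The first call keeps the leading space in the second value, which is the same rule that makes the quoted variant illegal.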
Maybe you could set the :col_sep option to ', ' to make it parse files like that.
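A rough sketch of that suggestion, again assuming Ruby's standard `CSV` (the successor to FasterCSV) and an input line in which every comma is followed by exactly one space. The sample line here is made up for illustration.

```ruby
require 'csv' # assumption: Ruby 1.9+, where FasterCSV became the standard CSV library

# Hypothetical input in which EVERY separator is a comma plus one space.
line = "\"one\", \"two\", \"three\""

# Treat the two-character string ", " as the column separator.
fields = CSV.parse_line(line, col_sep: ", ")
p fields
```

This only helps when the spacing is consistent. It does not cover the question's exact sample line, where only the first separator is followed by a space.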
I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope, and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)
If you're calling #parse_line explicitly, you could always run a regular-expression substitution on your input line before parsing it. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (If that is the case, I'd suggest reposting the question here with a suitable tag and letting the RegEx mavens loose on it.)
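The exact call is not shown above, so the following is only one way the preprocessing idea could look. The helper name `strip_space_before_quote` and its regular expression are assumptions made for illustration, and `CSV` is used as the Ruby 1.9+ successor to FasterCSV.

```ruby
require 'csv' # assumption: Ruby 1.9+, where FasterCSV became the standard CSV library

# Hypothetical helper: remove whitespace that sits between a comma and an
# opening double quote, so the quote starts the field again.
def strip_space_before_quote(line)
  line.gsub(/,\s+"/, ',"')
end

line = "\"one\", \"two\",\"three\""
p CSV.parse_line(strip_space_before_quote(line))
```

As the answer warns, a pattern this simple also rewrites a comma, whitespace, and quote sequence that appears inside a quoted value, so it is only safe when the data cannot contain one.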