
Ruby: How do I handle a CSV file with "bad commas"?

Posted 2024-09-27 15:47:06

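I need to process CSV files from FedEx.com that contain shipping history. Unfortunately, FedEx doesn't seem to actually test its CSV files, because it doesn't quote strings that contain commas.

For example, a company name might be "Dog Widgets, Inc.", but the CSV doesn't quote that string, so any CSV parser treats the comma before "Inc." as the start of a new field.

Is there any way to reliably parse these rows using Ruby?

The only distinguishing feature I can find is that a comma that is part of a string is followed by a space, while a comma that separates fields is not. I'm not sure how that helps me parse this, but it's what I've noticed.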


玉环 2024-10-04 15:47:06

You can use a negative lookahead, so that the split happens only on commas that are not followed by a space or tab:

>> "foo,bar,baz,pop, blah,foobar".split(/,(?![ \t])/)
=> ["foo", "bar", "baz", "pop, blah", "foobar"]


月亮是我掰弯的 2024-10-04 15:47:06

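Well, here's an idea: you could replace every instance of a comma followed by a space with a unique character, parse the CSV as usual, and then go through the resulting rows and reverse the replacement.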
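
A minimal sketch of that idea, assuming Ruby's standard `csv` library is available. The placeholder character and the helper name `parse_bad_comma_line` are made up for illustration, and the code assumes the data never contains the placeholder or a double-quote character (which `CSV` would try to interpret as quoting):

```ruby
require "csv"

# Hypothetical placeholder: any character assumed never to occur in the data.
PLACEHOLDER = "\u0001"

# Parse one unquoted CSV line, assuming ", " (comma + space) belongs to a
# field's text and a bare "," is a field delimiter.
def parse_bad_comma_line(line)
  # Hide the in-field commas so the CSV parser does not split on them.
  protected_line = line.gsub(", ", "#{PLACEHOLDER} ")

  # Parse as usual, then restore the commas (empty fields come back as nil).
  CSV.parse_line(protected_line).map do |field|
    field.nil? ? field : field.gsub(PLACEHOLDER, ",")
  end
end

row = parse_bad_comma_line("123,Dog Widgets, Inc.,Memphis,TN")
puts row.inspect
```

The sample line is invented; the real FedEx column layout is not given in the question.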


羁拥 2024-10-04 15:47:06

Perhaps something along these lines..

using gsub to change the ', ' to something else

ruby-1.9.2-p0 > "foo,bar,baz,pop, blah,foobar".gsub(/,\ /,'| ').split(',')
[
    [0] "foo",
    [1] "bar",
    [2] "baz",
    [3] "pop| blah",
    [4] "foobar"
]

and then remove the | afterwards.


め七分饶幸 2024-10-04 15:47:06

If you are lucky enough to have only one field like that, you can parse the leading fields off the start and the trailing fields off the end, and assume whatever is left is the offending field. In Python (no habla Ruby) this would look something like:

fields = line.split(',')  # doesn't work if some fields are quoted
fields = fields[:5] + [','.join(fields[5:-3])] + fields[-3:]

Whatever you do, you should be able, at a minimum, to determine the number of offending commas, and that should give you something (a sanity check if nothing else).
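
Since the question asks for Ruby, here is a rough equivalent of the Python snippet above. It is only a sketch: the helper name `split_with_one_bad_field` and its `expected:` and `bad_index:` parameters are hypothetical, and it assumes that exactly one known column can contain stray commas and that no field is quoted. The surplus of fields over the expected count doubles as the sanity check on the number of offending commas.

```ruby
# Split an unquoted line into exactly `expected` fields, assuming at most one
# field (at zero-based index `bad_index`) can contain stray commas.
def split_with_one_bad_field(line, expected:, bad_index:)
  parts = line.chomp.split(",", -1)   # -1 keeps trailing empty fields
  extra = parts.length - expected     # number of offending commas
  raise ArgumentError, "too few fields: #{line.inspect}" if extra.negative?

  leading  = parts[0...bad_index]                   # fields before the bad one
  bad      = parts[bad_index, extra + 1].join(",")  # re-join the split pieces
  trailing = parts[(bad_index + extra + 1)..]       # fields after the bad one
  leading + [bad] + trailing
end

fields = split_with_one_bad_field("1,Dog Widgets, Inc.,Memphis,TN",
                                  expected: 4, bad_index: 1)
puts fields.inspect
```

The four-column sample line is invented for the example; with the real file you would pass the actual column count and the index of the company-name column.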
