CSV + FileHelpers + double quotes = nightmare

Posted on 2024-12-03 07:32:23

I can't seem to handle a CSV I got. It's a file generated by a bank, which looks like this:

"000,""PLN"",""XYZ"",""2011-08-31"",""2011-08-31"",""0,00"""
1,""E"",""2011-08-30"",""2011-08-31"",""2011-08-31"",""399,00"",""0000103817846977"",""UZNANIE OTRZYMANE ELIXIR"",""23103015080000000550217023"",""XXX"",""POLISA UBEZPIECZENIA NR XXX  "",""000""
3,""E"",""2011-08-31"",""2011-08-31"",""2011-08-31"",""1433,00"",""0000154450232753"",""UZNANIE OTRZYMANE ELIXIR"",""000"",""XXX"",""POLISA UBEZPIECZENIA XXX  "",""000""

(I changed all sensitive information).

I've been trying to parse it since this morning, with no luck. I tried the LINQ to CSV example found somewhere on the net and the CodeProject one (both threw an error saying that the CSV is corrupted), and ended up with FileHelpers, which SEEMS to work, BUT:

  1. It splits the "399,00" and similar values into two fields.
  2. When I use the [FieldQuoted()] attribute it all goes to hell, since all the fields are wrapped in DOUBLED quotation marks. I suspect that is why the other parsers wouldn't work either.

Any ideas on how to handle it?

Comments (2)

笑咖 2024-12-10 07:32:23

If the problem is the doubled quotes, you could preprocess each line by replacing every pair of double quotes with a single double quote:

line = line.Replace( "\"\"", "\"" );

Once the whole file has been processed, you can hand it to any other CSV processor.
It will probably be easier to write your own, anyway.
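
The preprocessing step above can be sketched a little more fully. This is only a rough illustration, not a tested solution for the real bank file: `BankCsvCleaner` and `NormalizeLine` are hypothetical names, and the rule for detecting a record wrapped in an extra pair of quotes (one quote at the start, three at the end) is an assumption based solely on the sample lines in the question.

```csharp
using System;

public static class BankCsvCleaner
{
    // Hypothetical helper: undoes the doubled quoting seen in the sample file.
    public static string NormalizeLine(string line)
    {
        // Assumption: a line that starts with exactly one quote and ends with
        // three is a whole record wrapped in one extra pair of quotes.
        bool wrapped = line.Length >= 4
            && line[0] == '"' && line[1] != '"'
            && line.EndsWith("\"\"\"", StringComparison.Ordinal);

        if (wrapped)
        {
            // Drop the outer pair before collapsing the inner quotes.
            line = line.Substring(1, line.Length - 2);
        }

        // Collapse every doubled quote into a single one.
        return line.Replace("\"\"", "\"");
    }

    public static void Main()
    {
        // Prints: 000,"PLN","0,00"
        Console.WriteLine(NormalizeLine("\"000,\"\"PLN\"\",\"\"0,00\"\"\""));
        // Prints: 1,"E","399,00"
        Console.WriteLine(NormalizeLine("1,\"\"E\"\",\"\"399,00\"\""));
    }
}
```

After this step each value such as "399,00" is enclosed in ordinary quotes, so a parser that honours quoted fields should keep the decimal comma inside one field. A file that legitimately uses doubled quotes as escapes inside a value would be damaged by this, so it is worth checking the output by eye first.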

指尖上得阳光 2024-12-10 07:32:23

I have used Lumen, CommonLibrary, FileHelpers etc., and I ended up with the TextFieldParser class (from the Microsoft.VisualBasic.FileIO namespace, but it can be used in C# without any problem). I recommend you try that. The only downside is that it's relatively slow, but it seems to cope with edge cases quite well.

I even came up with a trick to get it working with obviously invalid CSV files (""" etc.; OpenOffice Calc couldn't handle them properly): when I encountered such a line and got a MalformedLineException, I'd parse it again within the catch block, this time with the HasFieldsEnclosedInQuotes property set to false.

It would split the line properly, just leaving the values wrapped in quotation marks. All I had to do then was remove those quotes "manually".
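
A minimal sketch of that fallback, parsing one line at a time. `TolerantCsv`, `ParseLine` and `ReadFields` are hypothetical names chosen for illustration; only `TextFieldParser`, its properties and `MalformedLineException` come from Microsoft.VisualBasic.FileIO. The delimiter is assumed to be a comma.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class TolerantCsv
{
    // Hypothetical helper: try a strict, quote-aware parse first and fall
    // back to a plain split on the delimiter if the line is malformed.
    public static string[] ParseLine(string line)
    {
        try
        {
            return ReadFields(line, true);
        }
        catch (MalformedLineException)
        {
            // Fallback: ignore quoting, then strip the leftover quote
            // characters from both ends of each value by hand.
            return ReadFields(line, false)
                .Select(f => f.Trim('"'))
                .ToArray();
        }
    }

    private static string[] ReadFields(string line, bool quoted)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = quoted;
            // ReadFields returns null when there is nothing to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

One caveat: in fallback mode every comma is a delimiter, so a value with a decimal comma such as 399,00 would still be split into two fields. For a file like the one in the question, collapsing the doubled quotes first, as the other answer suggests, is probably still needed.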
