Parsing an advanced CSV file

Posted 2024-10-26 17:08:36 · 303 words · 6 views · 0 comments

I have to load the following CSV file

head1, head2, head3, head4; head5
34 23; 2; "abc";"abc \"sdjh";8
34 23; 2; "abc";"abc 
sdj\;h
jshd";8
34 23; 2; "abc";"abc";8

The function must handle escape characters such as \" \; \n and \r, as well as line breaks inside quoted strings.
Is there a good library that solves this?
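
For reference, the rules described in the question can be sketched as a small character-by-character state machine. This is only a sketch under stated assumptions, not a library recommendation: `BackslashCsv` and `Parse` are hypothetical names, the delimiter is assumed to be `;` throughout (which means the comma-separated part of the header row comes back as a single field), a backslash is assumed to escape the character that follows it, and whitespace around unquoted fields is trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of any library named in this thread.
public static class BackslashCsv
{
    // Assumptions: a single delimiter character, double-quoted fields,
    // and backslash escapes (\" \; \\ \n \r \t). Empty lines are skipped.
    public static List<string[]> Parse(string text, char delimiter = ';')
    {
        var records = new List<string[]>();
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;   // currently between an opening and closing quote
        bool wasQuoted = false;  // the current field had a quoted part (do not trim it)
        bool rowHasData = false; // the current row contains at least one character

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\\' && i + 1 < text.Length)
            {
                // Escape sequence: translate \n \r \t, keep any other character literally.
                char next = text[++i];
                switch (next)
                {
                    case 'n': field.Append('\n'); break;
                    case 'r': field.Append('\r'); break;
                    case 't': field.Append('\t'); break;
                    default: field.Append(next); break;
                }
                rowHasData = true;
            }
            else if (c == '"')
            {
                // Drop whitespace that precedes an opening quote, as in `; "abc"`.
                if (!inQuotes && field.ToString().Trim().Length == 0) field.Clear();
                inQuotes = !inQuotes;
                wasQuoted = true;
                rowHasData = true;
            }
            else if (inQuotes)
            {
                field.Append(c); // delimiters and raw line breaks are literal inside quotes
            }
            else if (c == delimiter)
            {
                fields.Add(Finish(field, wasQuoted));
                wasQuoted = false;
                rowHasData = true;
            }
            else if (c == '\r' || c == '\n')
            {
                // Treat \r\n as one line break.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                if (rowHasData)
                {
                    fields.Add(Finish(field, wasQuoted));
                    records.Add(fields.ToArray());
                    fields.Clear();
                }
                field.Clear();
                wasQuoted = false;
                rowHasData = false;
            }
            else
            {
                field.Append(c);
                rowHasData = true;
            }
        }

        // Flush the last record if the input does not end with a line break.
        if (rowHasData)
        {
            fields.Add(Finish(field, wasQuoted));
            records.Add(fields.ToArray());
        }
        return records;
    }

    private static string Finish(StringBuilder field, bool wasQuoted)
    {
        string value = wasQuoted ? field.ToString() : field.ToString().Trim();
        field.Clear();
        return value;
    }
}
```

Calling `BackslashCsv.Parse(File.ReadAllText(path))` would return one `string[]` per record. Whether whitespace after a closing quote should be kept, and how the mixed-delimiter header should be split, are decisions the question leaves open, so the sketch does not try to settle them.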

Comments (3)

深巷少女 2024-11-02 17:08:36

I've had good results using CSV Reader for .Net: http://www.codeproject.com/KB/database/CsvReader.aspx.

勿忘初心 2024-11-02 17:08:36

That's not a valid CSV file...

The header row would be interpreted as

"head1"," head2"," head3"," head4; head5"

Every other row only has a single column in it.

I don't think any library will be able to handle this out of the box. It looks like the header row has more than one delimiter, and all the other rows might have multiple delimiters too. If you also provided what the actual columns were, it would be easier to help with.

You could give CsvHelper (a library I maintain) a try. It is pretty flexible. You could change the configuration for the headers and rows and make them different. You can set what you want the delimiter and quoted field to be. It also handles line endings of \r, \n, and \r\n even if every line is using a different line ending.
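
A minimal sketch of that kind of configuration, assuming CsvHelper version 20 or later (where `CsvParser` exposes `Read()` and `Record`). The sample input, the choice of `;` as the delimiter, and the backslash as the escape character are assumptions taken from the question, not settings confirmed by the library's author. How the library treats `\;` inside a quoted field is not something this sketch verifies, so check the output against your own data.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Assumed settings for the data rows in the question.
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ";",                // fields are separated by semicolons
    Escape = '\\',                  // backslash instead of the default doubled quote
    HasHeaderRecord = false,        // the header uses a different delimiter, so skip header handling
    TrimOptions = TrimOptions.Trim, // drop whitespace around fields such as ` 2`
};

// Two data rows from the question; the second has a line break inside quotes.
string input = "34 23; 2; \"abc\";\"abc \\\"sdjh\";8\n" +
               "34 23; 2; \"abc\";\"abc \nsdj\\;h\njshd\";8\n";

using var reader = new StringReader(input);
using var parser = new CsvParser(reader, config);
while (parser.Read())
{
    // Record holds the raw fields of the row that was just read.
    Console.WriteLine(string.Join(" | ", parser.Record));
}
```

Because the header row mixes `,` and `;`, one option is to read the first line yourself and hand only the remaining text to the parser.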

断念 2024-11-02 17:08:36

I couldn't get anything to pass all my tests for CSV parsing, so I ended up writing something simple to do it: AnotherCsvParser.

It does everything I need... but should be easy to fork and extend to your needs too.

Given:

 public class ABCD
 {
     public string A;
     public string B;
     public string C;
     public string D;
 }

It assumes the columns are in the same order as the fields are defined (but it would be easy to extend to read an attribute or something similar).

This works:

    var output = NigelThorne.CSVParser.ReadCSVAs<ABCD>(
"a,\"b\",c,d\n1,2,3,4\n\"something, with a comma\",\"something \\\"in\\\" quotes\",\" a \\\\ slash \",\n,,\"\n\",");

Such that:

  Assert.AreEqual(4, output.ToArray().Length);
  var row1 = output.ToArray()[0];
  Assert.AreEqual("a", row1.A);
  Assert.AreEqual("b", row1.B);
  Assert.AreEqual("c", row1.C);
  Assert.AreEqual("d", row1.D);

Note: it's probably not very fast with lots of data either. Again, that's not a problem for me.
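
The "columns in field order" idea can be illustrated with a few lines of reflection. This is a sketch only and is not taken from AnotherCsvParser: `FieldOrderMapper` and `Map` are hypothetical names, every field is assumed to be a `string`, and sorting by `MetadataToken` is a common way to approximate declaration order because `GetFields` does not guarantee one.

```csharp
using System;
using System.Reflection;

public class ABCD
{
    public string A;
    public string B;
    public string C;
    public string D;
}

// Hypothetical helper for illustration; not the parser linked above.
public static class FieldOrderMapper
{
    // Copies values[i] into the i-th public instance field of T.
    // Assumes all fields are strings; SetValue throws for other field types.
    public static T Map<T>(string[] values) where T : new()
    {
        FieldInfo[] fields = typeof(T).GetFields(BindingFlags.Public | BindingFlags.Instance);

        // GetFields does not promise any order, so sort by metadata token,
        // which usually follows the order of declaration within one type.
        Array.Sort(fields, (a, b) => a.MetadataToken.CompareTo(b.MetadataToken));

        object target = new T(); // boxed so SetValue also works when T is a struct
        for (int i = 0; i < fields.Length && i < values.Length; i++)
        {
            fields[i].SetValue(target, values[i]);
        }
        return (T)target;
    }
}
```

For example, `FieldOrderMapper.Map<ABCD>(new[] { "a", "b", "c", "d" })` fills `A` through `D` from the four values. Extra columns are ignored and missing ones leave the field `null`.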
