Parsing a large CSV file, handling commas and quotes
I need to load in a large CSV file (>1MB) and parse it.
Generally this is quite easy to do by splitting first on linebreaks and then commas.
The problem is that some entries contain strings that include their own commas. When the spreadsheet is converted to CSV, the fields containing commas are wrapped in quotes.
I've written a parser that first escapes all the commas in these strings, then splits it on linebreaks and then commas, and then unescapes the values again.
This is quite a slow process for such a long string, as I need to iterate through the whole string.
Does anyone know a faster or more optimised method of dealing with this?
Comments (3)
Have you had a look at csvlib yet? It is a parser library for ActionScript 3. It claims to be designed to properly handle quoted strings.
Hopefully, you are already enclosing your strings in quotes, especially the ones containing the commas. CSV parsers cannot distinguish a comma that is part of a string from a comma that separates two strings, unless the strings have quotes around them.
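To illustrate the point about quoting, here is a small sketch in Python (not ActionScript, and not csvlib) using the standard library's csv module. The sample data and the helper name parse are made up for illustration. The same record is parsed once with the comma-containing field quoted and once without quotes:

```python
import csv
import io

# Hypothetical sample data: the same record with and without quotes
# around the field that contains a comma.
quoted = 'id,name,city\n1,"Smith, John",London\n'
unquoted = 'id,name,city\n1,Smith, John,London\n'


def parse(text):
    """Parse CSV text into a list of rows using the standard csv module."""
    # newline='' leaves line endings untouched so the csv module
    # can handle them itself.
    return list(csv.reader(io.StringIO(text, newline='')))


quoted_rows = parse(quoted)
unquoted_rows = parse(unquoted)

print(quoted_rows[1])    # three fields, the comma stays inside the name
print(unquoted_rows[1])  # four fields, the name has been split in two
```

Without the quotes the parser has no way to tell that "Smith, John" was meant to be a single value, so the row ends up with one field too many.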
Processing the file in a single pass will reduce the time. This can be achieved by using a simple state machine to handle the complexity of commas embedded in the values.
Regards
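A minimal sketch of what such a single-pass state machine might look like, written in Python for brevity rather than ActionScript. The function name parse_csv and its parameters are hypothetical, and the sketch assumes the common CSV conventions: fields are optionally wrapped in double quotes, and a literal quote inside a quoted field is written as two quotes. It is a simplified illustration, not a complete CSV implementation.

```python
def parse_csv(text, delimiter=",", quote='"'):
    """Parse CSV text in one pass, tracking whether we are inside quotes."""
    rows = []          # completed records
    row = []           # fields of the record being built
    field = []         # characters of the field being built
    in_quotes = False  # the only state: inside or outside a quoted field
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    # A doubled quote is an escaped literal quote.
                    field.append(quote)
                    i += 1
                else:
                    # A single quote closes the quoted section.
                    in_quotes = False
            else:
                # Commas and line breaks are plain data inside quotes.
                field.append(ch)
        else:
            if ch == quote:
                in_quotes = True
            elif ch == delimiter:
                row.append("".join(field))
                field = []
            elif ch == "\n" or ch == "\r":
                # Treat \r\n as a single line break.
                if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                    i += 1
                row.append("".join(field))
                field = []
                rows.append(row)
                row = []
            else:
                field.append(ch)
        i += 1
    # Flush the last record if the text does not end with a line break.
    if field or row:
        row.append("".join(field))
        rows.append(row)
    return rows


sample = 'a,"b,c",d\n1,"say ""hi""",3\n'
result = parse_csv(sample)
print(result)
```

Each character is examined exactly once, so there is no separate escape, split and unescape pass. Characters are collected in a list and joined per field to avoid repeated string concatenation. Known simplifications: a blank line produces a row with one empty field, and a final line that consists only of an empty quoted field with no trailing line break is dropped.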
Add a reference to Microsoft.VisualBasic (yes, it says VisualBasic, but it works in C# just as well; remember that in the end it is all just IL) and use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse the CSV file.
Here is the sample code: