What is the correct way to parse this line with FasterCSV?

Posted on 2024-08-21 17:48:27

I have the following line in a CSV file that's giving me issues when parsing:

312,'997639',' 2','John, Doe. "J.D." ',' ','2000 ',' ','Street ','City ','NY','99999','','2010-02-17 19:12:04','2010-02-17 19:12:04';

I'm parsing with the following parameters:

FasterCSV.foreach(file, {:headers => true, :quote_char => '"', :col_sep => "','"} ) do |row|

However, it blows up on rows like the one above because of the "J.D." inside one of the columns. How do I parse that line properly with FasterCSV?

Thanks!

聽兲甴掵 2024-08-28 17:48:27

It looks to me like your :quote_char should be ' and your :col_sep should be ,. In that case:

FasterCSV.foreach(file, {:headers => true, :quote_char => "'", :col_sep => ','} ) ...
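As a rough check of this suggestion, here is a minimal sketch that parses the sample row from the question with Ruby's standard csv library, which is what FasterCSV became in Ruby 1.9. It makes two assumptions that are not in the original post: the trailing ";" is treated as a line terminator and stripped before parsing, and a single row is parsed with CSV.parse_line rather than a whole file with :headers => true. The variable names are illustrative only.

```ruby
require 'csv'

# The sample row from the question, including its trailing semicolon.
line = %q(312,'997639',' 2','John, Doe. "J.D." ',' ','2000 ',' ','Street ','City ','NY','99999','','2010-02-17 19:12:04','2010-02-17 19:12:04';)

# Assumption: the ";" only terminates the line, so drop it before parsing.
cleaned = line.strip.chomp(';')

# Single quote as the quote character, plain comma as the column separator.
fields = CSV.parse_line(cleaned, quote_char: "'", col_sep: ',')

puts fields.length
puts fields[3].inspect
```

With the single quote as the quote character, the double quotes around J.D. are ordinary characters inside the field, and the comma in "John, Doe." is protected by the surrounding single quotes.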

凉月流沐 2024-08-28 17:48:27

You can't do that. FasterCSV only allows one choice of quote character, and your application needs two. There isn't a way to do cute stuff like pass in a regex instead of a character because FasterCSV precompiles matchers with the quote character escaped as follows:

# prebuild Regexps for faster parsing
esc_col_sep = Regexp.escape(@col_sep)
esc_row_sep = Regexp.escape(@row_sep)
esc_quote   = Regexp.escape(@quote_char)
@parsers = {
  :any_field      => Regexp.new( "[^#{esc_col_sep}]+",
                                 Regexp::MULTILINE,
                                 @encoding ),
  :quoted_field   => Regexp.new( "^#{esc_quote}(.*)#{esc_quote}$",
                                 Regexp::MULTILINE,
                                 @encoding ),
  ...
}
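The effect of that escaping step can be seen in isolation. The sketch below is not FasterCSV's code; it only repeats the Regexp.escape and Regexp.new pattern from the excerpt, leaves out the @encoding argument, and feeds it a hypothetical quote_char value that tries to pass a character class.

```ruby
# Hypothetical attempt to accept either quote by passing a character class.
quote_char = %q(['"])

# Same escaping step as in the excerpt: regex metacharacters become literals.
esc_quote = Regexp.escape(quote_char)

# Same shape as the :quoted_field matcher, without the encoding argument.
quoted_field = Regexp.new("^#{esc_quote}(.*)#{esc_quote}$", Regexp::MULTILINE)

single_quoted = "'abc'"
literal_class = %q(['"]abc['"])

# The escaped pattern no longer behaves as a character class.
puts quoted_field.match?(single_quoted)
puts quoted_field.match?(literal_class)
```

Because the brackets are escaped, the matcher only recognises a field wrapped in the literal four-character string, so a regex passed as the quote character has no special meaning.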
