How to remove duplicate columns using Ruby FasterCSV

Posted on 2024-10-30 21:22:32

I'm using Ruby 1.8 and FasterCSV.

The csv file I'm reading in has several repeated columns.

| acct_id | amount | acct_num | color | acct_id | acct_type | acct_num |
|     345 |  12.34 |      123 |   red |     345 | 'savings' |      123 |
|     678 |  11.34 |      432 | green |     678 | 'savings' |      432 |

...etc

I'd like to condense it to:

| acct_id | amount | acct_num | color | acct_type |
|     345 |  12.34 |      123 |   red | 'savings' |
|     678 |  11.34 |      432 | green | 'savings' |

Is there a general purpose way to do this?

Currently my solution is something like:

# assumes `file` is an already-open IO positioned at the header line
headers = FasterCSV.parse_line(file.gets)
file.gets # discard the garbage line between headers and data
FasterCSV.filter(file, :headers => headers) do |row|
  row.delete(6) # delete second acct_num field
  row.delete(4) # delete second acct_id field

  # additional processing on the data
  row['color'] = color_to_number(row['color'])
  row['acct_type'] = acct_type_to_number(row['acct_type'])
end

Comments (1)

慕烟庭风 2024-11-06 21:22:32

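Assuming you want to get rid of the hardcoded deletions, these two lines

  row.delete(6) #delete second acct_num field
  row.delete(4) #delete second acct_id field

can be replaced by

row = row.to_hash

to_hash collapses the row into a plain Hash keyed by header name, so duplicate headers are clobbered and only one value per name survives. Be aware that it also discards field order. The rest of the posted code, which looks fields up by name, will keep working.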
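If column order matters, or you want the duplicate positions worked out for you instead of hardcoded, one option is to compute the index of the first occurrence of each header and keep only those positions. The following is a minimal sketch on plain arrays, not a FasterCSV feature: `first_occurrence_indexes` and `pick_columns` are hypothetical helper names, and the sample data is copied from the question.

```ruby
# Hypothetical helper (not part of FasterCSV): returns the positions of the
# first occurrence of each header name, in original column order.
def first_occurrence_indexes(headers)
  seen = {}
  indexes = []
  headers.each_with_index do |name, i|
    next if seen.key?(name) # a later duplicate of a header we already kept
    seen[name] = true
    indexes << i
  end
  indexes
end

# Hypothetical helper: keeps only the given column positions of one row.
def pick_columns(fields, indexes)
  fields.values_at(*indexes)
end

# Sample data taken from the question.
headers = %w[acct_id amount acct_num color acct_id acct_type acct_num]
rows = [
  %w[345 12.34 123 red 345 savings 123],
  %w[678 11.34 432 green 678 savings 432]
]

keep            = first_occurrence_indexes(headers) # [0, 1, 2, 3, 5]
deduped_headers = pick_columns(headers, keep)
deduped_rows    = rows.map { |fields| pick_columns(fields, keep) }

puts deduped_headers.join(',')
deduped_rows.each { |fields| puts fields.join(',') }
```

With FasterCSV, the same helpers should work on `row.headers` and `row.fields`, since both return plain arrays. As far as I can tell, `FasterCSV.filter` writes out the row object it yielded, so reassigning the block-local `row` would not by itself change the output. If you rely on `filter` for writing, deleting the unwanted positions in place with `row.delete(i)`, from the highest index down, is probably the safer route.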
