Use Ruby CSV to extract one column

Posted 2024-11-06 22:55:52

I've been trying to work with getting a single column out of a csv file.

I've gone through the documentation, http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html
but still don't really understand how to use it.

If I use CSV.table, the response is incredibly slow compared to CSV.read. I admit the dataset I'm loading is quite large, which is exactly the reason I only want to get a single column from it.

My request currently looks like this:

@dataTable = CSV.table('path_to_csv.csv')

and when I debug I get a response of

#<CSV::Table mode:col_or_row row_count:2104 >

The documentation says I should be able to use by_col(), but when I try to output

<%= debug @dataTable.by_col('col_name or index') %>

it gives me an "undefined method 'col'" error.

Can somebody explain how I'm supposed to use CSV? And is there a way to get columns faster using 'read' instead of 'table'?

I'm using Ruby 1.9.2, which says its CSV library is FasterCSV, so I don't need to use the FasterCSV gem.

孤独患者 2024-11-13 22:55:52

To pluck a column out of a csv I'd probably do something like the following:

col_data = []
CSV.foreach(FILENAME) {|row| col_data << row[COL_INDEX]}

That should be substantially faster than any operation on a CSV::Table, because it streams the file row by row instead of building the whole table in memory.
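If the file has a header row, the same streaming idea can look the column up by name instead of by index. This is only a sketch: the method name `column_values`, the sample data, and the temp file are made up for illustration and are not from the original answer.

```ruby
require 'csv'
require 'tempfile'

# Stream the file row by row and keep only one column.
# `path` and `header` are placeholder names for this sketch.
def column_values(path, header)
  values = []
  # headers: true turns each row into a CSV::Row, so fields can be
  # looked up by header name; the header line itself is skipped.
  CSV.foreach(path, headers: true) { |row| values << row[header] }
  values
end

# Hypothetical sample data, written to a temp file so the sketch runs as-is.
Tempfile.create(['sample', '.csv']) do |file|
  file.write("name,city\nAda,London\nLinus,Helsinki\n")
  file.flush
  p column_values(file.path, 'city') # prints ["London", "Helsinki"]
end
```

Only one row is held in memory at a time, so memory use stays roughly proportional to the extracted column rather than to the whole file.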

桃酥萝莉 2024-11-13 22:55:52

You can get the values from a single column of the CSV file using the following snippet. Note that CSV.table converts header names to symbols.

@dataTable = CSV.table('path_to_csv.csv')
@dataTable[:columnname]
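A small sketch of what that symbol lookup does in practice. CSV.table reads the file with headers enabled, the :symbol header converter, and the :numeric field converter, so a header such as "Column Name" becomes :column_name and numeric strings become numbers. The helper `table_from` and the sample data are hypothetical, added only so the example is runnable.

```ruby
require 'csv'
require 'tempfile'

# Write `text` to a temp file and load it with CSV.table.
# `table_from` is a made-up helper for this sketch.
def table_from(text)
  Tempfile.create(['sample', '.csv']) do |file|
    file.write(text)
    file.flush
    CSV.table(file.path) # returns a CSV::Table
  end
end

table = table_from("Column Name,Score\nfoo,1\nbar,2\n")

p table.headers        # header names, converted to symbols
p table[:column_name]  # all values in the first column
p table[:score]        # numeric strings are converted to Integers
```

This is convenient, but it still loads the whole file, so it does not address the speed concern in the question.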

甜柠檬 2024-11-13 22:55:52

I found that this works for me (I'm using the OP's variable name here):

@dataTable = CSV.read('path_to_csv.csv', headers: true) # headers: true returns a CSV::Table; without it CSV.read returns a plain Array
@dataTable.by_col!
p @dataTable.values_at('Field1')

This prints all the values in the column Field1 as an array of one-element arrays: [[value1], [value2], [value3], ...]. So

p @dataTable.values_at('Field1').flatten

will print all the values in the column Field1 in a single array.

If you want to loop through all the fields in a table one by one, here's one way to do that. First, switch the table to column mode with by_col!, so that indexes reference columns rather than rows. Then you can do something like this:

@dataTable = CSV.read('path_to_csv.csv', headers: true)
@dataTable.by_col!

0.upto(@dataTable.headers.size - 1) do |i|
  p @dataTable.values_at(i).flatten.compact.size # Or whatever you want here
end

This is a way to work up summary values from a CSV file, which can then be used to create a pivot table. If there's a requirement to input data from a CSV file and output summary data in the form of a pivot table, this might be a straightforward way to go.
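As a rough illustration of the per-column summary idea, the sketch below counts the non-empty cells in each column. It looks columns up by header name rather than switching modes with by_col!, which is a substitution on my part, and the method name `non_empty_counts` and the inline sample data are assumptions for the example.

```ruby
require 'csv'

# Count the non-empty cells in every column.
# `csv_text` is a placeholder; real code would read from a file instead.
def non_empty_counts(csv_text)
  table = CSV.parse(csv_text, headers: true) # CSV::Table in mixed mode
  table.headers.each_with_object({}) do |header, counts|
    # In mixed mode, indexing with a String returns that column's values;
    # empty unquoted fields come back as nil, so compact drops them.
    counts[header] = table[header].compact.size
  end
end

# Each column in this sample has one empty cell, so both counts are 2.
p non_empty_counts("Field1,Field2\na,1\n,2\nc,\n")
```

The resulting hash of header to count is the kind of summary value that could feed a pivot table.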
