Extracting a column using Ruby CSV
I've been trying to get a single column out of a CSV file.
I've gone through the documentation, http://www.ruby-doc.org/stdlib/libdoc/csv/rdoc/index.html, but still don't really understand how to use it.
If I use CSV.table, the response is incredibly slow compared to CSV.read. I admit the dataset I'm loading is quite large, which is exactly why I only want to get a single column from it.
My request currently looks like this:
@dataTable = CSV.table('path_to_csv.csv')
and when I debug I get a response of
#<CSV::Table mode:col_or_row row_count:2104 >
The documentation says I should be able to use by_col(), but when I try to output
<%= debug @dataTable.by_col('col_name or index') %>
It gives me an "undefined method 'col'" error.
Can somebody explain to me how I'm supposed to use CSV? And is there a way to get columns faster using 'read' instead of 'table'?
I'm using Ruby 1.9.2, which says that it is using FasterCSV, so I don't need to use the FasterCSV gem.
Comments (3)
To pluck a column out of a CSV, I'd probably read the rows one at a time and collect just that column's value from each.
That should be substantially faster than any operation on a CSV::Table.
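A minimal sketch of what such a row-by-row approach might look like, assuming it is built on CSV.foreach. The sample file, its column names, and COL_INDEX are made up for illustration and are not from the original answer.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data standing in for the real (large) file.
sample = Tempfile.new(['sample', '.csv'])
sample.write("name,score\nalice,10\nbob,20\n")
sample.close

COL_INDEX = 1 # zero-based position of the wanted column (assumed)

# CSV.foreach streams the file one row at a time, so no CSV::Table
# is ever built in memory. Without the :headers option each row is a
# plain Array of strings, and the header line counts as a row.
col_data = []
CSV.foreach(sample.path) { |row| col_data << row[COL_INDEX] }
p col_data # => ["score", "10", "20"]

# With headers: true the first line is treated as the header, so the
# column can be addressed by name and the header itself is skipped.
names = []
CSV.foreach(sample.path, headers: true) { |row| names << row['name'] }
p names # => ["alice", "bob"]
```

Values come back as strings here because no converters are applied; CSV.table, by contrast, converts numeric fields and symbolizes headers, which is part of the extra work it does.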
You can get the values from a single column of the CSV file using the following snippet.
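One way such a snippet might look, assuming the column is addressed by its header on a table loaded with CSV.table. The file contents and the column names are hypothetical.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data standing in for 'path_to_csv.csv'.
sample = Tempfile.new(['sample', '.csv'])
sample.write("name,score\nalice,10\nbob,20\n")
sample.close

# CSV.table reads the whole file with headers enabled, converts the
# headers to symbols and converts numeric fields to numbers.
data_table = CSV.table(sample.path)

# In the default col_or_row mode, indexing with a header (rather than
# an Integer) returns that column's values as an Array.
scores = data_table[:score]
p scores # => [10, 20]
```

Note that by_col takes no arguments: it only switches the table to column mode, after which integer indexes refer to columns.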
I found that this works for me (I'm using the OP's variable name here):
This prints all the values in the column Field1 as an array of arrays with one element each: [value1], [value2], [value3], and so on. Flattening that result prints all the values in the column Field1 in a single array.
If you want to loop through all the fields in a table one by one, here's one way to do that. First, convert the table with by_col! so that indexes reference columns rather than rows. Then you can iterate over the table column by column.
This is a way to work up summary values from a CSV file, which can then be used to create a pivot table. If there's a requirement to input data from a CSV file and output summary data in the form of a pivot table, this might be a straightforward way to go.
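The steps described in this answer can be sketched roughly as follows, assuming the column is read with CSV::Table#values_at and the summary is a per-column total. The sample data, the local variable names, and the choice of a sum as the summary value are assumptions made for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data with a Field1 column.
file = Tempfile.new(['fields', '.csv'])
file.write("Field1,Field2\n1,10\n2,20\n3,30\n")
file.close

table = CSV.table(file.path) # headers become :field1 and :field2

# values_at with a header returns one single-element array per row.
nested = table.values_at(:field1)
p nested # => [[1], [2], [3]]

# Flattening collapses that into a single array of values.
flat = nested.flatten
p flat # => [1, 2, 3]

# by_col! switches the table to column mode in place, so that each
# now yields [header, values] pairs instead of rows.
table.by_col!
totals = {}
table.each { |header, values| totals[header] = values.reduce(0, :+) }
p totals # => {:field1=>6, :field2=>60} (newer Rubies print {field1: 6, field2: 60})
```

A hash of per-column totals like this is the kind of summary that could then feed a pivot-style report.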