Import from CSV into a Ruby array, with the first field as the hash key, then look up a field's value given the header row
Maybe somebody can help me.
Starting with a CSV file like so:
Ticker,"Price","Market Cap"
ZUMZ,30.00,933.90
XTEX,16.02,811.57
AAC,9.83,80.02
I manage to read them into an array:
require 'csv'
tickers = CSV.read("stocks.csv", :headers => true, :return_headers => true, :header_converters => :symbol, :converters => :all)
To verify data, this works:
puts tickers[1][:ticker]
ZUMZ
However this doesn't:
puts tickers[:ticker => "XTEX"][:price]
How would I go about turning this array into a hash using the ticker field as unique key, such that I could easily look up any other field associatively as defined in line 1 of the input? Dealing with many more columns and rows.
Much appreciated!
Like this (it works with other CSVs too, not just the one you specified):
Result:
You can access elements in this data structure like this:
Edit (according to comment): For selecting elements, you can do something like
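A minimal sketch of the idea described above, not necessarily the answerer's exact code: each row becomes an inner hash keyed by the remaining headers, stored under the row's first field. The sample data is the question's, inlined with `CSV.parse` so the snippet runs without a file; `CSV.foreach("stocks.csv", ...)` accepts the same options for a file on disk. The variable names (`rows`, `expensive`) are just illustrative.

```ruby
require 'csv'

# Sample data copied from the question, inlined so no file is needed.
rows = <<~ROWS
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
ROWS

tickers = {}

CSV.parse(rows, headers: true, header_converters: :symbol, converters: :all).each do |row|
  key, *values = row.fields
  # Pair every header after the first with the matching value of this row.
  tickers[key] = row.headers.drop(1).zip(values).to_h
end

# Associative lookup: ticker first, then column name.
puts tickers["XTEX"][:price]

# Selecting entries, here every ticker priced above 10.0.
expensive = tickers.select { |_ticker, fields| fields[:price] > 10.0 }
puts expensive.keys.inspect
```

Since nothing refers to the columns by name or count, the same loop should cope with many more columns and rows, as long as the first field is unique.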
To add on to Michael Kohl's answer, if you want to access the elements in the following manner
You can try the following code snippet:
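The access pattern and snippet are missing here. One plausible reading, and it is only an assumption, is column-first access such as `by_column[:price]["XTEX"]`. A rough sketch under that assumption, with `by_column` as a made-up name:

```ruby
require 'csv'

# Sample data copied from the question, inlined so no file is needed.
rows = <<~ROWS
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
ROWS

# Outer key: column name. Inner key: ticker. Inner hashes are created on demand.
by_column = Hash.new { |hash, column| hash[column] = {} }

CSV.parse(rows, headers: true, header_converters: :symbol, converters: :all).each do |row|
  key = row.fields.first
  row.headers.drop(1).each do |column|
    by_column[column][key] = row[column]
  end
end

# Column first, then ticker.
puts by_column[:price]["XTEX"]
```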
To get the best of both worlds (very fast reading from a huge file and the benefits of a native Ruby CSV object), my code has since evolved into this method:
which is closer to my original method, but only reads in one record plus line 1 of the input CSV file containing the headers. The inline sed instructions take care of that, and the whole thing is noticeably instant. This is better than the last version because now I can access all the fields from Ruby, and associatively, no longer caring about column numbers as was the case with awk.
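A rough sketch of the "header line plus one record" idea. Note that it swaps the inline sed call for plain `File.foreach`, so it is not the poster's code. The helper name `lookup` and the temp file are made up for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: returns the CSV::Row for one ticker, or nil if absent.
# Only line 1 (the headers) and the first matching record are parsed as CSV.
def lookup(path, ticker)
  header = nil
  record = nil
  File.foreach(path) do |line|
    if header.nil?
      header = line                       # line 1 holds the column names
    elsif line.start_with?("#{ticker},")
      record = line
      break                               # stop reading once the record is found
    end
  end
  return nil unless header && record

  CSV.parse(header + record, headers: true,
            header_converters: :symbol, converters: :all)[0]
end

# Demo on a temporary copy of the question's data.
file = Tempfile.new(['stocks', '.csv'])
file.write(<<~ROWS)
  Ticker,"Price","Market Cap"
  ZUMZ,30.00,933.90
  XTEX,16.02,811.57
  AAC,9.83,80.02
ROWS
file.close

row = lookup(file.path, "XTEX")
missing = lookup(file.path, "NOPE")
file.unlink

puts row[:price]
puts row[:market_cap]
puts missing.inspect
```

The prefix match assumes the ticker is the first field and is not quoted in the file.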
Not as 1-liner-ie but this was more clear to me.
While this isn't a 100% native Ruby solution to the original question, should others stumble here and wonder what awk call I wound up using for now, here it is:
where $stock is the variable I had previously assigned to a company's ticker symbol (the wannabe key field).
Conveniently, it survives problems by returning 0.0 if the ticker, the file, or field #9 is not found or is empty, or if the value cannot be converted to a float. So any trailing '%' in my case gets nicely truncated.
Note that at this point one could easily add more filters within awk to have IO.readlines return a 1-dim array of output lines from the smaller resulting CSV, eg.
outputs in bash which lines have a DivYld (col 9) over 2.01 and price (col 2) over 99.99. (Unfortunately I'm not using the header row to determine field numbers, which is where I was ultimately hoping for some searchable associative Ruby array.)
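For comparison, here is a sketch of the same filter done in Ruby with header names instead of column numbers. This is not the awk call from the post. The column name `DivYld` comes from the text above, but the rows are made-up sample values, and the thresholds are the ones quoted. `String#to_f` and `nil.to_f` give the "fall back to 0.0, drop the trailing %" behaviour described.

```ruby
require 'csv'

# Made-up sample rows; only the DivYld column name comes from the post.
rows = <<~ROWS
  Ticker,Price,DivYld
  AAA,120.50,3.10%
  BBB,45.00,4.00%
  CCC,150.00,1.20%
  DDD,101.00,
ROWS

table = CSV.parse(rows, headers: true, header_converters: :symbol)

# to_f ignores a trailing '%' and yields 0.0 for nil or non-numeric text,
# so empty or odd fields simply fail the comparison.
matches = table.select do |row|
  row[:divyld].to_f > 2.01 && row[:price].to_f > 99.99
end

puts matches.map { |row| row[:ticker] }.inspect
```

Fields are kept as strings here (no `:converters`), so a value like `3.10%` only becomes a number at the `to_f` call.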