PyTables: batch fetch and update

Posted 2024-10-18 08:28:33

I have daily stock data in an HDF5 file created with PyTables. I would like to fetch a group of rows, process it as an array, and then write it back to disk (update the rows) using PyTables. I couldn't figure out a clean way to do this. What is the best way to accomplish it?

My data:

Symbol, date, price, var1, var2
abcd, 1, 2.5, 12, 12.5
abcd, 2, 2.6, 11, 10.2
abcd, 3, 2.45, 11, 10.3
defg, 1, 12.34, 19.1, 18.1
defg, 2, 11.90, 19.5, 18.2
defg, 3, 11.75, 21, 20.9
defg, 4, 11.74, 22.2, 21.4

I would like to read the rows for each symbol as an array, do some processing, and update the fields var1 and var2. I know all the symbols in advance, so I can loop through them. I tried something like this:

rows_array = [row.fetch_all_fields() for row in table.where('Symbol == "abcd"')]

I would like to pass rows_array to another function that computes the values of var1 and var2 and updates them for each record. Note that var1 and var2 are like moving averages, so I cannot calculate them inside an iterator; I need the entire set of rows as an array.
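To make the "moving average" point concrete, here is a minimal sketch of such a calculation. The function name `trailing_mean` and the window of 2 are assumptions for illustration only; the post does not say how var1 and var2 are actually computed. Each output value depends on neighbouring rows, which is why the whole group is needed at once.

```python
def trailing_mean(values, window):
    """Trailing moving average (hypothetical helper, not from the post).

    Each result averages the current value and up to window - 1 values
    before it; early entries average over whatever is available.
    """
    values = list(values)
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)   # first index inside the window
        chunk = values[lo:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


# Prices of symbol "abcd" from the sample data above.
prices = [2.5, 2.6, 2.45]
smoothed = trailing_mean(prices, window=2)
print(smoothed)
```

With a NumPy structured array read from the table, the same helper could presumably be fed one field at a time, e.g. `trailing_mean(rows_array['price'], 2)`.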

After I calculate whatever I need using rows_array, I am not sure how to write it back, i.e., update the rows with the newly calculated values. When updating the entire table, I use this:

table.cols.var1[:] = calc_something(rows_array)

However, when I want to update only a portion of the table, I am not sure of the best way to do it. I guess I can re-run the 'where' condition and then update each row based on my calculations, but rescanning the table seems like a waste of time.

Your suggestions are appreciated...

Thanks,
-e

相对绾红妆 2024-10-25 08:28:33

If I understand correctly, the following should do what you want:

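condition = 'Symbol == "abcd"'
indices = table.get_where_list(condition)  # row indices matching the condition (getWhereList in PyTables 2.x)
rows_array = table[indices]  # read those rows as a structured array
new_rows = compute(rows_array)  # compute new values (compute stands for your own function)
table[indices] = new_rows  # write the new values back to the same rows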

Hope this helps
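For anyone who wants to try the idea end to end, here is a self-contained sketch. Several things in it are assumptions, not taken from the question: the column types in `Quote`, the file name, the window of 2, and the body of `compute`. It uses PyTables 3.x method names, calls `read_coordinates` and `modify_coordinates` explicitly instead of the index syntax, and passes the symbol as bytes through `condvars` to avoid quoting a string literal inside the condition.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Quote(tb.IsDescription):
    # Column layout assumed from the sample data in the question.
    Symbol = tb.StringCol(8, pos=0)
    date = tb.Int32Col(pos=1)
    price = tb.Float64Col(pos=2)
    var1 = tb.Float64Col(pos=3)
    var2 = tb.Float64Col(pos=4)


def compute(rows, window=2):
    """Hypothetical processing step: var1 becomes a trailing mean of price."""
    out = rows.copy()
    prices = rows["price"]
    for i in range(len(rows)):
        lo = max(0, i - window + 1)
        out["var1"][i] = prices[lo:i + 1].mean()
    return out


path = os.path.join(tempfile.mkdtemp(), "quotes.h5")  # throwaway file
with tb.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "quotes", Quote)
    table.append([
        (b"abcd", 1, 2.5, 12.0, 12.5),
        (b"abcd", 2, 2.6, 11.0, 10.2),
        (b"abcd", 3, 2.45, 11.0, 10.3),
        (b"defg", 1, 12.34, 19.1, 18.1),
        (b"defg", 2, 11.90, 19.5, 18.2),
        (b"defg", 3, 11.75, 21.0, 20.9),
        (b"defg", 4, 11.74, 22.2, 21.4),
    ])
    table.flush()

    target = b"abcd"
    # One scan to find the matching row numbers; they are reused for the write.
    coords = table.get_where_list("Symbol == target",
                                  condvars={"target": target})
    rows = table.read_coordinates(coords)        # structured NumPy array
    new_rows = compute(rows)                     # process the whole group
    table.modify_coordinates(coords, new_rows)   # update only those rows
    table.flush()

    updated = table.read_coordinates(coords)
    untouched = table.read_where("Symbol != target",
                                 condvars={"target": target})

print(updated["var1"])
print(untouched["var1"])
```

Because the coordinates are kept after the first query, the table is not rescanned when writing back, which was the concern in the question. To loop over all symbols, the block from `target = ...` down to `table.flush()` would be repeated per symbol.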
