Pandas: most efficient way to sum values between two dates?
I have one dataset with production reported every week, and another that reports production every hour for a number of sub-productions. I would now like to compare the sum of all the hourly sub-production values with the value reported each week, in the most efficient way. How could I achieve this? I would like to avoid a for loop at all costs, as my dataset is really large.
So my datasets look like this:
Weekly reported data:
Datetime_text | Total_Production_A
--------------------------|--------------------
2014-12-08 00:00:00.000 | 8277000
2014-12-15 00:00:00.000 | 8055000
2014-12-22 00:00:00.000 | 7774000
Hourly data:
Datetime_text | A_Prod_1 | A_Prod_2 | A_Prod_3 | ...... | A_Prod_N |
--------------------------|-----------|-----------|-----------|-----------|-----------|
2014-12-06 23:00:00.000 | 454 | 9 | 54 | 104 | 4 |
2014-12-07 00:00:00.000 | 0 | NaV | 0 | 23 | 3 |
2014-12-07 01:00:00.000 | 54 | 0 | 4 | NaV | 20 |
and so on. I would like a new table in which the difference between the weekly reported data and the hourly reported data is calculated for every date in the weekly reported data. So something like this:
Datetime_text | Diff_Production_A
--------------------------|------------------
2014-12-08 00:00:00.000 | 10
2014-12-15 00:00:00.000 | -100
2014-12-22 00:00:00.000 | 1350
where Diff_Production_A = Total_Production_A - sum(A_Prod_1, A_Prod_2, A_Prod_3, ..., A_Prod_N; over all datetimes of a week).
How can I best achieve this?
Any help in this regard would be greatly appreciated :D
Best
fidu13
Comments (1)
Store the datetimes as pd.Timestamp values; then you can do all kinds of manipulation on the dates. For your problem, the key is to group the hourly data by week (starting on Mondays), then merge it with the weekly data and calculate the differences.
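The code that originally accompanied this answer is not available here, so the following is only a minimal sketch of the approach described above, not the answerer's exact code. It assumes that each weekly timestamp labels the week that starts at that Monday 00:00 (if the weekly value instead covers the week ending at that timestamp, shift the computed week start forward by 7 days), that the "NaV" entries have already been read in as missing values (NaN), and that all sub-production columns share the prefix A_Prod_. The toy numbers and the helper column name Hourly_Sum_A are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data in the shape described in the question (values are made up).
weekly = pd.DataFrame({
    "Datetime_text": ["2014-12-08 00:00:00.000", "2014-12-15 00:00:00.000"],
    "Total_Production_A": [600, 100],
})
hourly = pd.DataFrame({
    "Datetime_text": [
        "2014-12-08 00:00:00.000",
        "2014-12-08 01:00:00.000",
        "2014-12-14 23:00:00.000",
        "2014-12-15 00:00:00.000",
    ],
    "A_Prod_1": [454, 0, 54, 10],
    "A_Prod_2": [9, np.nan, 0, 20],
    "A_Prod_3": [54, 0, 4, np.nan],
})

# Parse the text columns into real timestamps.
weekly["Datetime_text"] = pd.to_datetime(weekly["Datetime_text"])
hourly["Datetime_text"] = pd.to_datetime(hourly["Datetime_text"])

# Monday 00:00 of the week each hourly row falls in (dayofweek: Monday == 0).
# Assumption: a weekly timestamp labels the week that STARTS at that Monday.
ts = hourly["Datetime_text"]
week_start = (
    ts.dt.normalize() - pd.to_timedelta(ts.dt.dayofweek, unit="D")
).rename("Datetime_text")

# Sum across all A_Prod_* columns per row (NaN is skipped), then per week.
prod_cols = [c for c in hourly.columns if c.startswith("A_Prod_")]
hourly_sum = (
    hourly[prod_cols].sum(axis=1)
    .groupby(week_start)
    .sum()
    .rename("Hourly_Sum_A")
    .reset_index()
)

# Align the weekly totals with the hourly sums and take the difference.
result = weekly.merge(hourly_sum, on="Datetime_text", how="left")
result["Diff_Production_A"] = result["Total_Production_A"] - result["Hourly_Sum_A"]

print(result[["Datetime_text", "Diff_Production_A"]])
```

Everything here is vectorized, so no explicit Python loop over rows is needed. Weeks that appear in the weekly table but have no hourly rows come out as NaN in Diff_Production_A because of the left merge.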