Pandas: sum values between two dates in the most efficient way?

Posted on 2025-01-16 21:25:23

I have one dataset with production reported every week and another with production reported every hour for several sub-productions. I would now like to compare the sum of all this hourly sub-production with the weekly reported value in the most efficient way. How could I achieve this? I would like to avoid a for loop at all costs, as my dataset is really large.

So my datasets look like this:

Weekly reported data:

Datetime_text             | Total_Production_A
--------------------------|--------------------
2014-12-08 00:00:00.000   | 8277000
2014-12-15 00:00:00.000   | 8055000
2014-12-22 00:00:00.000   | 7774000

Hourly data:

Datetime_text             | A_Prod_1  | A_Prod_2  | A_Prod_3  | ......    | A_Prod_N  |
--------------------------|-----------|-----------|-----------|-----------|-----------|
2014-12-06 23:00:00.000   |     454   |       9   |      54   |     104   |       4   | 
2014-12-07 00:00:00.000   |       0   |     NaV   |       0   |      23   |       3   | 
2014-12-07 01:00:00.000   |      54   |       0   |       4   |     NaV   |      20   |

and so on. I would like a new table where the difference between the weekly reported data and the hourly reported data is calculated for every date in the weekly reported data. So something like this:

Datetime_text             | Diff_Production_A
--------------------------|------------------
2014-12-08 00:00:00.000   |                10
2014-12-15 00:00:00.000   |              -100
2014-12-22 00:00:00.000   |              1350

where Diff_Production_A = Total_Production_A - sum(A_Prod_1, A_Prod_2, A_Prod_3, ..., A_Prod_N; over all hourly datetimes of that week). How can I best achieve this?
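The inner part of this formula, the sum across A_Prod_1 ... A_Prod_N for each hour, can be done in one vectorized call without a loop. A minimal sketch on a toy copy of the first three hourly rows, assuming (not stated above) that "NaV" is a text marker for a missing value and that every sub-production column shares the A_Prod_ prefix:

```python
import pandas as pd

# Toy copy of the first three hourly rows (only A_Prod_1..A_Prod_3 shown)
hourly = pd.DataFrame({
    "Datetime_text": [
        "2014-12-06 23:00:00.000",
        "2014-12-07 00:00:00.000",
        "2014-12-07 01:00:00.000",
    ],
    "A_Prod_1": [454, 0, 54],
    "A_Prod_2": [9, "NaV", 0],
    "A_Prod_3": [54, 0, 4],
})

# Select every sub-production column by name prefix; ASSUMPTION: "NaV" means missing -> NaN
prod = hourly.filter(like="A_Prod_").apply(pd.to_numeric, errors="coerce")

# One row-wise sum for all rows at once; NaN cells are skipped by default
hourly["HourlyTotal"] = prod.sum(axis=1)
print(hourly["HourlyTotal"].tolist())  # [517.0, 0.0, 58.0]
```

What remains is summing HourlyTotal per week and subtracting it from Total_Production_A.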

Any help in this regard would be greatly appreciated :D

Best
fidu13


心碎的声音 2025-01-23 21:25:23

Store the datetimes as pd.Timestamp values (for example with pd.to_datetime); then you can do all kinds of manipulation on the dates.
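A small sketch of what parsing buys you, using illustrative timestamps in the question's text format: weekday names, plain date arithmetic, and the group labels that freq="W-MON" produces. For weekly frequencies pandas closes and labels bins on the right by default, so each group is labelled by the Monday that ends it:

```python
import pandas as pd

# Parse text timestamps in the question's "YYYY-MM-DD HH:MM:SS.fff" format
ts = pd.to_datetime(pd.Series([
    "2014-12-06 23:00:00.000",  # Saturday
    "2014-12-07 01:00:00.000",  # Sunday
    "2014-12-09 05:00:00.000",  # Tuesday
]))
print(ts.dt.day_name().tolist())  # ['Saturday', 'Sunday', 'Tuesday']

# Date arithmetic works directly on Timestamps
print(ts.iloc[0] + pd.Timedelta(days=7))  # 2014-12-13 23:00:00

# Weekly groups anchored on Monday: each label is the Monday that closes the bin
df = pd.DataFrame({"Datetime": ts, "HourlyTotal": [1, 2, 4]})
weekly_sums = df.groupby(pd.Grouper(key="Datetime", freq="W-MON"))["HourlyTotal"].sum()
print(weekly_sums)
```

Here the Saturday and Sunday rows land in the group labelled 2014-12-08 and the Tuesday row in the one labelled 2014-12-15.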

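For your problem, the idea is to sum the sub-productions for each hour, group the hourly data into weeks anchored on Monday (with freq="W-MON" each weekly group ends on, and is labelled by, a Monday), then merge the result with the weekly data and calculate the differences: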

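```python
import pandas as pd

# weekly and hourly are the two DataFrames from the question.
# Parse the text timestamps (the column is "Datetime_text", lowercase t).
weekly["Datetime"] = pd.to_datetime(weekly["Datetime_text"])
hourly["Datetime"] = pd.to_datetime(hourly["Datetime_text"])

# Row-wise total over all sub-productions; the label slice assumes the
# A_Prod_* columns are contiguous and in order. sum(axis=1) skips NaN.
hourly["HourlyTotal"] = hourly.loc[:, "A_Prod_1":"A_Prod_N"].sum(axis=1)

result = (
    # Weekly bins anchored on Monday; each bin is labelled by the Monday that ends it
    hourly.groupby(pd.Grouper(key="Datetime", freq="W-MON"))["HourlyTotal"]
    .sum()
    .to_frame()
    # Outer merge keeps weeks that appear in only one of the two tables (as NaN)
    .merge(
        weekly[["Datetime", "Total_Production_A"]],
        how="outer",
        left_index=True,
        right_on="Datetime",
    )
    .assign(Diff_Production_A=lambda x: x["Total_Production_A"] - x["HourlyTotal"])
)
```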
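The W-MON grouping decides the week boundaries from the calendar. If each weekly report covers a specific window, an alternative is to attach every hourly row to its report date with pd.merge_asof and then sum per report. A minimal sketch, assuming (not stated in the question) that a report dated D covers hourly timestamps in [D - 7 days, D) and that "NaV" marks a missing value; the function name weekly_diff is hypothetical:

```python
import pandas as pd


def weekly_diff(weekly: pd.DataFrame, hourly: pd.DataFrame) -> pd.DataFrame:
    """Compute Diff_Production_A for every weekly report date.

    ASSUMPTION: the report dated D covers hourly timestamps in [D - 7 days, D).
    """
    w = weekly.assign(ReportDate=pd.to_datetime(weekly["Datetime_text"]))
    w = w.sort_values("ReportDate")  # merge_asof needs sorted keys

    h = hourly.assign(Datetime=pd.to_datetime(hourly["Datetime_text"]))
    # ASSUMPTION: non-numeric cells such as "NaV" mean "no value"; they become NaN
    prod = h.filter(like="A_Prod_").apply(pd.to_numeric, errors="coerce")
    h["HourlyTotal"] = prod.sum(axis=1)  # NaN cells are skipped
    h = h.sort_values("Datetime")

    # For each hourly row pick the first report date strictly after it,
    # at most 7 days later; rows outside every window get NaT
    matched = pd.merge_asof(
        h[["Datetime", "HourlyTotal"]],
        w[["ReportDate"]],
        left_on="Datetime",
        right_on="ReportDate",
        direction="forward",
        allow_exact_matches=False,
        tolerance=pd.Timedelta(days=7),
    )
    # groupby drops the NaT keys of unmatched rows
    per_report = matched.groupby("ReportDate")["HourlyTotal"].sum()

    out = w[["ReportDate", "Total_Production_A"]].reset_index(drop=True)
    # Reports without any hourly rows keep NaN, so gaps stay visible
    out["HourlyTotal"] = out["ReportDate"].map(per_report)
    out["Diff_Production_A"] = out["Total_Production_A"] - out["HourlyTotal"]
    return out.rename(columns={"ReportDate": "Datetime"})


# Toy data shaped like the question (only three sub-production columns)
weekly = pd.DataFrame({
    "Datetime_text": ["2014-12-08 00:00:00.000", "2014-12-15 00:00:00.000"],
    "Total_Production_A": [8277000, 8055000],
})
hourly = pd.DataFrame({
    "Datetime_text": [
        "2014-12-06 23:00:00.000",
        "2014-12-07 00:00:00.000",
        "2014-12-07 01:00:00.000",
        "2014-12-08 00:00:00.000",
    ],
    "A_Prod_1": [454, 0, 54, 100],
    "A_Prod_2": [9, "NaV", 0, 0],
    "A_Prod_3": [54, 0, 4, 0],
})
result = weekly_diff(weekly, hourly)
print(result)
```

Under this assumption the Monday 2014-12-08 00:00 row counts towards the 2014-12-15 report. If the reports instead cover the following week, direction="backward" is the natural starting point. Everything stays vectorized, so no Python loop over rows is needed.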