How to find consecutive timestamps and calculate the sum in Python

Posted on 2025-01-14 18:44:40

For each row, I am looking for the sum of sales made by the same customer within the 24 hours starting at that row's timestamp.

For example,

id     timestamp            sales
123   2022-10-01 12:50:55   11
124   2022-10-01 22:50:55   11
123   2022-10-01 13:50:55   11
123   2022-10-02 12:50:55   11
123   2022-10-02 13:50:55   11

Then, for `id = 123`, we select:

1. 
id     timestamp            sales
123   2022-10-01 12:50:55   11
123   2022-10-01 13:50:55   11
123   2022-10-02 12:50:55   11
Sum = 11+11+11 = 33
2. 
id     timestamp            sales
123   2022-10-01 13:50:55   11
123   2022-10-02 12:50:55   11
123   2022-10-02 13:50:55   11
Sum = 11+11+11 = 33
3. 
id     timestamp            sales
123   2022-10-02 12:50:55   11
123   2022-10-02 13:50:55   11
Sum = 11+11 = 22
4.
id     timestamp            sales
123   2022-10-02 13:50:55   11
Sum = 11

The result for id = 123 is:

id     timestamp            sales   sum
123   2022-10-01 12:50:55   11      33
123   2022-10-01 13:50:55   11      33
123   2022-10-02 12:50:55   11      22
123   2022-10-02 13:50:55   11      11
For id = 124, we get:
id     timestamp            sales   sum
124   2022-10-01 22:50:55   11      11

I know a cross join could solve this problem, but that approach is too slow for large datasets.
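For reference, the cross-join baseline mentioned here might look like the sketch below. It is a self-join on `id` (a cross join restricted to each customer) followed by a filter on the time difference. The function name `window_sum_self_join` and the helper column `row` are hypothetical, and the window is assumed to be [t, t + 24h] with both ends included, which is what the example sums imply.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})


def window_sum_self_join(frame, hours=24):
    """Baseline: pair every row with every row of the same id, then filter.

    Assumes frame has a default integer index with no name.
    """
    left = frame.reset_index().rename(columns={"index": "row"})
    pairs = left.merge(frame, on="id", suffixes=("", "_other"))
    delta = pairs["timestamp_other"] - pairs["timestamp"]
    # Keep partner rows in [t, t + 24h]; both ends inclusive (assumption).
    in_window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(hours=hours))
    sums = pairs[in_window].groupby("row")["sales_other"].sum()
    out = frame.copy()
    out["sum"] = sums.reindex(frame.index).to_numpy()
    return out.sort_values(["id", "timestamp"], ignore_index=True)


result = window_sum_self_join(df)
print(result)  # sums: 33, 33, 22, 11 for id 123 and 11 for id 124
```

The intermediate `pairs` frame grows with the square of the number of rows per customer, which is why this gets slow on large datasets.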

Is there a better way to achieve this?

Thank you

Comments (1)

携余温的黄昏 2025-01-21 18:44:40

You can use groupby, passing it pd.Grouper(freq="D") together with the id column:

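import pandas as pd

# Assumes df already holds the id, timestamp and sales columns from the question
df["timestamp"] = pd.to_datetime(df["timestamp"])
df.set_index("timestamp", inplace=True)
# Group by calendar day and id, then sum the sales in each group
newDf = df.groupby([pd.Grouper(freq="D"), "id"]).sum().reset_index()
newDf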

The newDf would be:

   timestamp            id   sales
0  2022-10-01 00:00:00  123  22
1  2022-10-01 00:00:00  124  11
2  2022-10-02 00:00:00  123  22

Therefore, by calling newDf[newDf["id"] == 124] you will get:

   timestamp            id   sales
1  2022-10-01 00:00:00  124  11
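To check these tables, here is a self-contained sketch that runs the same calendar-day grouping on the sample data from the question. The variable name `daily` is illustrative, and the sketch selects the `sales` column explicitly instead of summing the whole frame, which gives the same numbers here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": [
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ],
    "sales": [11, 11, 11, 11, 11],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# One row per (calendar day, id) pair, with the day's total sales.
daily = (
    df.set_index("timestamp")
      .groupby([pd.Grouper(freq="D"), "id"])["sales"]
      .sum()
      .reset_index()
)
print(daily)
print(daily[daily["id"] == 124])
```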

Note that the times will not match your example exactly, because each day is considered to start at 00:00 rather than at an arbitrary time.
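Because calendar-day totals differ from the per-row sums in the question (33, 33, 22, 11 for id 123), one possible alternative is to sort each customer's rows and combine prefix sums with a binary search. This is only a sketch and not the method of the answer above: the function name `forward_window_sum` is hypothetical, and the window is assumed to be [t, t + 24h] with both ends included, as the question's example implies.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})


def forward_window_sum(group, window=np.timedelta64(24, "h")):
    """For each row, sum the sales whose timestamp lies in [t, t + window]."""
    group = group.sort_values("timestamp")
    ts = group["timestamp"].to_numpy()
    # prefix[k] is the total sales of the first k rows in time order.
    prefix = np.concatenate(([0], group["sales"].to_numpy().cumsum()))
    # First row at or after t, and one past the last row at or before t + window.
    start = np.searchsorted(ts, ts, side="left")
    end = np.searchsorted(ts, ts + window, side="right")
    return pd.Series(prefix[end] - prefix[start], index=group.index)


# Process each customer separately; the results align back on the row index.
parts = [forward_window_sum(group) for _, group in df.groupby("id")]
df["sum"] = pd.concat(parts)

print(df.sort_values(["id", "timestamp"]))
# sums: 33, 33, 22, 11 for id 123 and 11 for id 124
```

The work per customer is a sort plus two binary searches per row, so no pairwise table is built. pandas' time-based `rolling` was not used here because its windows look backward from each row, whereas the question's windows look forward.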
