How to find consecutive timestamps and calculate the sum in Python
For each sale, I want the sum of sales made by the same customer within the following 24 hours.
For example,
id timestamp sales
123 2022-10-01 12:50:55 11
124 2022-10-01 22:50:55 11
123 2022-10-01 13:50:55 11
123 2022-10-02 12:50:55 11
123 2022-10-02 13:50:55 11
Then, for `id = 123`, we select
1.
id timestamp sales
123 2022-10-01 12:50:55 11
123 2022-10-01 13:50:55 11
123 2022-10-02 12:50:55 11
Sum = 11+11+11 = 33
2.
id timestamp sales
123 2022-10-01 13:50:55 11
123 2022-10-02 12:50:55 11
123 2022-10-02 13:50:55 11
Sum = 11+11+11 = 33
3.
id timestamp sales
123 2022-10-02 12:50:55 11
123 2022-10-02 13:50:55 11
Sum = 11+11 = 22
4.
id timestamp sales
123 2022-10-02 13:50:55 11
Sum = 11
The result for id = 123 is
id timestamp sales sum
123 2022-10-01 12:50:55 11 33
123 2022-10-01 13:50:55 11 33
123 2022-10-02 12:50:55 11 22
123 2022-10-02 13:50:55 11 11
For id = 124, we get
id timestamp sales sum
124 2022-10-01 22:50:55 11 11
I know a cross join could handle this, but that approach is too slow for large datasets.
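For reference, here is a minimal sketch of that cross-join approach. It assumes the data is in a pandas DataFrame named `df` and that the window is the closed interval from a sale's timestamp to exactly 24 hours later, as the expected sums above imply. The names `pairs`, `in_window`, and `cross_result` are made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})

# Keep the original row label so each row's pairs can be summed back onto it.
left = df.reset_index().rename(columns={"index": "row"})

# Pair every row with every row of the same id (a per-id cross join).
pairs = left.merge(df, on="id", suffixes=("", "_other"))

# Keep pairs whose other timestamp lies in [timestamp, timestamp + 24h].
delta = pairs["timestamp_other"] - pairs["timestamp"]
in_window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(hours=24))

# Sum the paired sales per original row; assignment aligns on the index.
sums = pairs[in_window].groupby("row")["sales_other"].sum()
cross_result = df.assign(sum=sums)
print(cross_result)
```

The intermediate `pairs` frame grows with the square of the number of rows per id, which is where the cost comes from.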
Is there a better way to achieve this?
Thank you
Comments (1)
You can use `groupby`, passing it `pd.Grouper(freq="D")` together with the `id` column. The resulting `newDf` holds one summed row per id per calendar day, so calling `newDf[newDf["id"] == 124]` returns the daily totals for that customer. Note that the windows will not be exactly the ones you described, since 00:00 is usually considered the beginning of the day, rather than any other time.
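The sketch below first shows the calendar-day grouping described above. It then shows one possible alternative that reproduces the forward-looking 24-hour windows from the question without a cross join. It assumes a DataFrame `df` with a unique index and the columns `id`, `timestamp`, and `sales`. The helper `forward_window_sum` is hypothetical, not part of pandas, and it treats the window as closed on both ends, which is what the expected sums in the question imply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [123, 124, 123, 123, 123],
    "timestamp": pd.to_datetime([
        "2022-10-01 12:50:55",
        "2022-10-01 22:50:55",
        "2022-10-01 13:50:55",
        "2022-10-02 12:50:55",
        "2022-10-02 13:50:55",
    ]),
    "sales": [11, 11, 11, 11, 11],
})

# Calendar-day totals per id: each bin starts at 00:00, not at a sale's timestamp.
newDf = (
    df.groupby(["id", pd.Grouper(key="timestamp", freq="D")])["sales"]
    .sum()
    .reset_index()
)


def forward_window_sum(frame, window=pd.Timedelta(hours=24)):
    """Hypothetical helper: per id, sum sales in [timestamp, timestamp + window]."""
    out = frame.sort_values(["id", "timestamp"]).copy()
    parts = []
    for _, g in out.groupby("id"):
        ts = g["timestamp"].to_numpy()
        limit = (g["timestamp"] + window).to_numpy()
        # Prefix sums let each window total be read off as a difference.
        csum = np.concatenate(([0], np.cumsum(g["sales"].to_numpy())))
        start = np.searchsorted(ts, ts, side="left")    # first row at this timestamp
        end = np.searchsorted(ts, limit, side="right")  # one past the last row in the window
        parts.append(pd.Series(csum[end] - csum[start], index=g.index))
    out["sum"] = pd.concat(parts)  # aligned on the original index
    return out


result = forward_window_sum(df)
print(newDf)
print(result)
```

After sorting, each group costs one cumulative sum and two binary searches, so the work grows roughly as n log n rather than with the square of the rows per id.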