groupby to show time per person per day in pandas
I'm trying to group this dataframe by id and timestamp; the third column is the time difference between consecutive entries for the same id. I can get it to display the total sum per id across all days, but I can't get it to display the sum per day per id.
import datetime
import pandas as pd

# Timestamps in row order; each comment shows the person taken from the 'person' list below
timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0),  # person 2
]
df1 = pd.DataFrame({'person': [1, 2, 1, 3, 2, 1, 3, 2], 'timestamp': timestamps})
# Time since the same person's previous entry (NaT for each person's first entry)
df1['new'] = df1.groupby('person')['timestamp'].diff()
# Total time per person across all days (timestamps themselves cannot be summed)
df1.groupby('person')['new'].sum()
This just gives me the total, not per day. How do I combine them per day?
Comments (1)
You can include the date part of the "timestamp" column in your groupby keys, alongside "person".
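A minimal sketch of this first approach, assuming the data from the question and assuming the date part is extracted with Series.dt.date (the answerer's exact snippet may have differed). The name per_day is only illustrative.

```python
import pandas as pd

# Same rows as in the question, written as strings for brevity
timestamps = pd.to_datetime([
    '2018-01-01 10:00', '2018-01-01 10:00', '2018-01-01 11:00',
    '2018-01-02 11:00', '2018-01-01 10:00', '2018-01-02 11:00',
    '2018-01-04 10:00', '2018-01-05 12:00',
])
df1 = pd.DataFrame({'person': [1, 2, 1, 3, 2, 1, 3, 2], 'timestamp': timestamps})

# Time since the same person's previous entry
df1['new'] = df1.groupby('person')['timestamp'].diff()

# Group by person AND by the calendar date of each timestamp
# (assumption: .dt.date is used to take the date part)
per_day = df1.groupby(['person', df1['timestamp'].dt.date])['new'].sum()
print(per_day)
```

The result is a Series indexed by (person, date), with one summed time difference per person per day.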
Alternatively, if you prefer, you could create a new column holding the date from the timestamp and then group by "person" and that column.
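A sketch of the new-column variant, under the same assumptions as the question's data. The column name 'date' and the variable per_day are hypothetical names chosen for illustration.

```python
import pandas as pd

# Same rows as in the question, written as strings for brevity
timestamps = pd.to_datetime([
    '2018-01-01 10:00', '2018-01-01 10:00', '2018-01-01 11:00',
    '2018-01-02 11:00', '2018-01-01 10:00', '2018-01-02 11:00',
    '2018-01-04 10:00', '2018-01-05 12:00',
])
df1 = pd.DataFrame({'person': [1, 2, 1, 3, 2, 1, 3, 2], 'timestamp': timestamps})
df1['new'] = df1.groupby('person')['timestamp'].diff()

# Store the calendar date in its own column ('date' is a hypothetical name)
df1['date'] = df1['timestamp'].dt.date

# Group by both columns and sum the time differences
per_day = df1.groupby(['person', 'date'])['new'].sum()
print(per_day)
```

Keeping the date as a real column can be convenient if you also want to filter or sort by day later.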
Optionally, you can call .reset_index() at the end so that the group keys become regular columns.
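For illustration, here is how the .reset_index() step might look when chained onto the grouped sum. This is a sketch assuming the question's data; the 'date' column and the name flat are hypothetical.

```python
import pandas as pd

# Same rows as in the question, written as strings for brevity
timestamps = pd.to_datetime([
    '2018-01-01 10:00', '2018-01-01 10:00', '2018-01-01 11:00',
    '2018-01-02 11:00', '2018-01-01 10:00', '2018-01-02 11:00',
    '2018-01-04 10:00', '2018-01-05 12:00',
])
df1 = pd.DataFrame({'person': [1, 2, 1, 3, 2, 1, 3, 2], 'timestamp': timestamps})
df1['new'] = df1.groupby('person')['timestamp'].diff()
df1['date'] = df1['timestamp'].dt.date  # hypothetical helper column

# reset_index() moves the group keys out of the index into ordinary columns
flat = df1.groupby(['person', 'date'])['new'].sum().reset_index()
print(flat)
```

The result is a flat DataFrame with columns person, date, and new, one row per person per day.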