根据pandas中的列计算日期时间之间的差异
我有一个 pandas DataFrame,其中包含数据库日志,用于对数据库进行更新。我想通过使用登录条目的日期时间到下次登录之前的最后一个条目的时间来找到会话长度的替代值。
log_id staff_id log_date action_name
559932 1 2019-01-05 06:32:36.960 Login
559933 1 2019-01-05 06:33:14.723 Update
559935 1 2019-01-05 06:34:49.770 Insert
559937 1 2019-01-05 06:40:36.503 Update
559938 1 2019-01-05 06:41:11.443 Update
559957 1 2019-01-05 07:24:12.190 Login
559958 1 2019-01-05 07:24:25.673 Update
559960 1 2019-01-05 07:25:01.673 Update
559963 1 2019-01-05 07:25:49.880 Update
559964 1 2019-01-05 07:25:49.897 Accepted
559966 1 2019-01-05 07:27:42.860 Insert
559967 1 2019-01-05 07:27:42.860 Insert
559968 1 2019-01-05 07:27:42.903 Admitted
559969 1 2019-01-05 07:30:23.173 Insert
559970 1 2019-01-05 07:30:37.643 Insert
559971 1 2019-01-05 07:31:18.640 Update
559972 1 2019-01-05 07:31:31.033 Update
570170 2 2019-01-25 22:52:10.160 Login
理想情况下,能够对“staff_id”进行分组,并从下一次登录的移位-1的日期时间条目中减去登录条目的日期时间。但我似乎无法通过 groupby
diff
函数得到我正在寻找的东西。任何想法将不胜感激。
I have a pandas DataFrame with database logs for updates made to a database. I'd like to find a surrogate for session length by using the datetime of a login entry to the time of the last entry before the next login.
log_id staff_id log_date action_name
559932 1 2019-01-05 06:32:36.960 Login
559933 1 2019-01-05 06:33:14.723 Update
559935 1 2019-01-05 06:34:49.770 Insert
559937 1 2019-01-05 06:40:36.503 Update
559938 1 2019-01-05 06:41:11.443 Update
559957 1 2019-01-05 07:24:12.190 Login
559958 1 2019-01-05 07:24:25.673 Update
559960 1 2019-01-05 07:25:01.673 Update
559963 1 2019-01-05 07:25:49.880 Update
559964 1 2019-01-05 07:25:49.897 Accepted
559966 1 2019-01-05 07:27:42.860 Insert
559967 1 2019-01-05 07:27:42.860 Insert
559968 1 2019-01-05 07:27:42.903 Admitted
559969 1 2019-01-05 07:30:23.173 Insert
559970 1 2019-01-05 07:30:37.643 Insert
559971 1 2019-01-05 07:31:18.640 Update
559972 1 2019-01-05 07:31:31.033 Update
570170 2 2019-01-25 22:52:10.160 Login
Ideally would be able to groupby
"staff_id" and subtract the datetime of a login entry from the datetime entry of a shifted -1 from the next login. But I can't seem to get what I'm looking for with the groupby
diff
function. Any thoughts would be appreciated.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
IIUC,您想要每次登录条目之间的时间差;因此,在“action_name”列上使用
cumsum
来派生会话ID,然后groupby
它并找出每个组中最后一次和第一次之间的差异应该可以完成这项工作:输出:
IIUC, you want the time difference between each login entries; so a
cumsum
on "action_name" column to derive session IDs andgroupby
it and find the difference between last and first time in each group should do the job:Output: