根据pandas中的列计算日期时间之间的差异

发布于 2025-01-12 04:40:49 字数 1073 浏览 0 评论 0原文

我有一个 pandas DataFrame,其中包含数据库日志,用于对数据库进行更新。我想通过使用登录条目的日期时间到下次登录之前的最后一个条目的时间来找到会话长度的替代值。

log_id  staff_id    log_date    action_name
559932  1   2019-01-05 06:32:36.960 Login
559933  1   2019-01-05 06:33:14.723 Update
559935  1   2019-01-05 06:34:49.770 Insert
559937  1   2019-01-05 06:40:36.503 Update
559938  1   2019-01-05 06:41:11.443 Update
559957  1   2019-01-05 07:24:12.190 Login
559958  1   2019-01-05 07:24:25.673 Update
559960  1   2019-01-05 07:25:01.673 Update
559963  1   2019-01-05 07:25:49.880 Update
559964  1   2019-01-05 07:25:49.897 Accepted
559966  1   2019-01-05 07:27:42.860 Insert
559967  1   2019-01-05 07:27:42.860 Insert
559968  1   2019-01-05 07:27:42.903 Admitted
559969  1   2019-01-05 07:30:23.173 Insert
559970  1   2019-01-05 07:30:37.643 Insert
559971  1   2019-01-05 07:31:18.640 Update
559972  1   2019-01-05 07:31:31.033 Update
570170  2   2019-01-25 22:52:10.160 Login

理想情况下,能够对“staff_id”进行分组,并从下一次登录的移位-1的日期时间条目中减去登录条目的日期时间。但我似乎无法通过 groupby diff 函数得到我正在寻找的东西。任何想法将不胜感激。

I have a pandas DataFrame with database logs for updates made to a database. I'd like to find a surrogate for session length by using the datetime of a login entry to the time of the last entry before the next login.

log_id  staff_id    log_date    action_name
559932  1   2019-01-05 06:32:36.960 Login
559933  1   2019-01-05 06:33:14.723 Update
559935  1   2019-01-05 06:34:49.770 Insert
559937  1   2019-01-05 06:40:36.503 Update
559938  1   2019-01-05 06:41:11.443 Update
559957  1   2019-01-05 07:24:12.190 Login
559958  1   2019-01-05 07:24:25.673 Update
559960  1   2019-01-05 07:25:01.673 Update
559963  1   2019-01-05 07:25:49.880 Update
559964  1   2019-01-05 07:25:49.897 Accepted
559966  1   2019-01-05 07:27:42.860 Insert
559967  1   2019-01-05 07:27:42.860 Insert
559968  1   2019-01-05 07:27:42.903 Admitted
559969  1   2019-01-05 07:30:23.173 Insert
559970  1   2019-01-05 07:30:37.643 Insert
559971  1   2019-01-05 07:31:18.640 Update
559972  1   2019-01-05 07:31:31.033 Update
570170  2   2019-01-25 22:52:10.160 Login

Ideally would be able to groupby "staff_id" and subtract the datetime of a login entry from the datetime entry of a shifted -1 from the next login. But I can't seem to get what I'm looking for with the groupby diff function. Any thoughts would be appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

柠檬 2025-01-19 04:40:49

IIUC,您想要每次登录条目之间的时间差;因此,在“action_name”列上使用cumsum来派生会话ID,然后groupby它并找出每个组中最后一次和第一次之间的差异应该可以完成这项工作:

df['log_date'] = pd.to_datetime(df['log_date'])
df['session_length'] = (df.groupby(['staff_id', df['action_name'].eq('Login').cumsum()])['log_date']
                         .transform(lambda x: x.iat[-1] - x.iat[0])
                         .astype(str).str.extract('days (.*)'))

输出:

    log_id  staff_id                log_date action_name   session_length
0   559932         1 2019-01-05 06:32:36.960       Login  00:08:34.483000
1   559933         1 2019-01-05 06:33:14.723      Update  00:08:34.483000
2   559935         1 2019-01-05 06:34:49.770      Insert  00:08:34.483000
3   559937         1 2019-01-05 06:40:36.503      Update  00:08:34.483000
4   559938         1 2019-01-05 06:41:11.443      Update  00:08:34.483000
5   559957         1 2019-01-05 07:24:12.190       Login  00:07:18.843000
6   559958         1 2019-01-05 07:24:25.673      Update  00:07:18.843000
7   559960         1 2019-01-05 07:25:01.673      Update  00:07:18.843000
8   559963         1 2019-01-05 07:25:49.880      Update  00:07:18.843000
9   559964         1 2019-01-05 07:25:49.897    Accepted  00:07:18.843000
10  559966         1 2019-01-05 07:27:42.860      Insert  00:07:18.843000
11  559967         1 2019-01-05 07:27:42.860      Insert  00:07:18.843000
12  559968         1 2019-01-05 07:27:42.903    Admitted  00:07:18.843000
13  559969         1 2019-01-05 07:30:23.173      Insert  00:07:18.843000
14  559970         1 2019-01-05 07:30:37.643      Insert  00:07:18.843000
15  559971         1 2019-01-05 07:31:18.640      Update  00:07:18.843000
16  559972         1 2019-01-05 07:31:31.033      Update  00:07:18.843000
17  570170         2 2019-01-25 22:52:10.160       Login         00:00:00

IIUC, you want the time difference between each login entries; so a cumsum on "action_name" column to derive session IDs and groupby it and find the difference between last and first time in each group should do the job:

df['log_date'] = pd.to_datetime(df['log_date'])
df['session_length'] = (df.groupby(['staff_id', df['action_name'].eq('Login').cumsum()])['log_date']
                         .transform(lambda x: x.iat[-1] - x.iat[0])
                         .astype(str).str.extract('days (.*)'))

Output:

    log_id  staff_id                log_date action_name   session_length
0   559932         1 2019-01-05 06:32:36.960       Login  00:08:34.483000
1   559933         1 2019-01-05 06:33:14.723      Update  00:08:34.483000
2   559935         1 2019-01-05 06:34:49.770      Insert  00:08:34.483000
3   559937         1 2019-01-05 06:40:36.503      Update  00:08:34.483000
4   559938         1 2019-01-05 06:41:11.443      Update  00:08:34.483000
5   559957         1 2019-01-05 07:24:12.190       Login  00:07:18.843000
6   559958         1 2019-01-05 07:24:25.673      Update  00:07:18.843000
7   559960         1 2019-01-05 07:25:01.673      Update  00:07:18.843000
8   559963         1 2019-01-05 07:25:49.880      Update  00:07:18.843000
9   559964         1 2019-01-05 07:25:49.897    Accepted  00:07:18.843000
10  559966         1 2019-01-05 07:27:42.860      Insert  00:07:18.843000
11  559967         1 2019-01-05 07:27:42.860      Insert  00:07:18.843000
12  559968         1 2019-01-05 07:27:42.903    Admitted  00:07:18.843000
13  559969         1 2019-01-05 07:30:23.173      Insert  00:07:18.843000
14  559970         1 2019-01-05 07:30:37.643      Insert  00:07:18.843000
15  559971         1 2019-01-05 07:31:18.640      Update  00:07:18.843000
16  559972         1 2019-01-05 07:31:31.033      Update  00:07:18.843000
17  570170         2 2019-01-25 22:52:10.160       Login         00:00:00
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文