根据多个参数查找多个事件之间的时间差(以天为单位)

发布于 2025-01-15 03:20:20 字数 8930 浏览 0 评论 0原文

当电子邮件发送到特定客户时,我有 2 个 DataFrame,其中包含 Date - 特定 Territory 上的 Account:External ID df 名称是 emails

帐户:外部 ID - 只是一个客户 ID(客户的编码名称)。两个 DataFrame 中可能存在或不存在相同的 ID。

emails


         Territory  Account: External ID        Date   type
5           1                      1 2021-01-25  email
29          1                      1 2021-05-29  email
15          1                      2 2021-03-15  email
18          1                      2 2021-04-02  email
24          1                      2 2021-04-29  email
0           1                      3 2021-01-01  email
2           1                      3 2021-01-10  email
6           1                      3 2021-01-25  email
11          1                      3 2021-02-19  email
13          1                      4 2021-03-03  email
21          1                      4 2021-04-12  email
4           1                      5 2021-01-22  email
26          1                      5 2021-05-10  email
8           2                      2 2021-01-30  email
9           2                      2 2021-02-07  email
10          2                      2 2021-02-10  email
20          2                      3 2021-04-08  email
7           2                      4 2021-01-28  email
17          2                      5 2021-03-27  email
28          2                      5 2021-05-22  email
25          3                      1 2021-05-07  email
27          3                      1 2021-05-12  email
22          3                      2 2021-04-14  email
23          3                      4 2021-04-23  email
3           3                      5 2021-01-15  email
1           4                      1 2021-01-05  email
12          4                      3 2021-02-23  email
16          4                      4 2021-03-20  email
14          4                      5 2021-03-06  email
19          4                      5 2021-04-07  email

第二个 df 具有完全相同的字段,但包含有关与客户交互的信息 对于第一个 DataFrame 中的每一行,我需要找到电子邮件

          Territory  Account: External ID        Date   type
11          1                      2 2021-06-01  visit
0           2                      2 2021-01-02  visit
8           3                      1 2021-04-11  visit
10          3                      1 2021-05-17  visit
5           3                      3 2021-02-26  visit
1           3                      4 2021-01-14  visit
4           3                      4 2021-02-20  visit
2           4                      1 2021-01-30  visit
3           4                      2 2021-02-07  visit
7           4                      2 2021-03-31  visit
9           4                      2 2021-04-28  visit
6           4                      4 2021-03-12  visit

发送和交互之间的天数。 我想分别查看交互之前或之后的天数。

应该有更优雅的方式

我设法编写了完美运行的代码,但我相信我的代码


#concant and sort 2 DataFrame
emails_visits=pd.concat([emails,visits],ignore_index=True)
emails_visits = emails_visits.sort_values(by=["Territory", "Account: External ID ", "Date"])

#Intermidiate step 1
emails_visits["date_after"] = emails_visits["Date"].where(emails_visits["type"]=="visit")
emails_visits["date_before"] = emails_visits["Date"].where(emails_visits["type"]=="visit")

#Fill columns with date
emails_visits["date_after"] = emails_visits.groupby(["Territory", "Account: External ID"])["date_after"].ffill()
emails_visits["date_before"] = emails_visits.groupby(["Territory", "Account: External ID"])["date_before"].bfill()

#calcate Difference in number of days
emails_visits["date_after"]=emails_visits.groupby(["Territory","Account: External ID ",])["date_after"].bfill()
emails_visits["date_before"]=emails_visits.groupby(["Territory","Account: External ID ",])["date_before"].ffill()

emails_visits["days_before_visit"]=emails_visits["date_before"]-emails_visits["Date"]
emails_visits["days_after_visit"]=emails_visits["date_after"]-emails_visits["Date"]

:输出

          Territory  Account: External ID        Date   type date_after date_before days_before_visit days_after_visit
0           1                      1 2021-01-25  email        NaT         NaT               NaT              NaT
1           1                      1 2021-05-29  email        NaT         NaT               NaT              NaT
2           1                      2 2021-03-15  email 2021-06-01         NaT               NaT          78 days
3           1                      2 2021-04-02  email 2021-06-01         NaT               NaT          60 days
4           1                      2 2021-04-29  email 2021-06-01         NaT               NaT          33 days
30          1                      2 2021-06-01  visit 2021-06-01  2021-06-01            0 days           0 days
5           1                      3 2021-01-01  email        NaT         NaT               NaT              NaT
6           1                      3 2021-01-10  email        NaT         NaT               NaT              NaT
7           1                      3 2021-01-25  email        NaT         NaT               NaT              NaT
8           1                      3 2021-02-19  email        NaT         NaT               NaT              NaT
9           1                      4 2021-03-03  email        NaT         NaT               NaT              NaT
10          1                      4 2021-04-12  email        NaT         NaT               NaT              NaT
11          1                      5 2021-01-22  email        NaT         NaT               NaT              NaT
12          1                      5 2021-05-10  email        NaT         NaT               NaT              NaT
31          2                      2 2021-01-02  visit 2021-01-02  2021-01-02            0 days           0 days
13          2                      2 2021-01-30  email        NaT  2021-01-02          -28 days              NaT
14          2                      2 2021-02-07  email        NaT  2021-01-02          -36 days              NaT
15          2                      2 2021-02-10  email        NaT  2021-01-02          -39 days              NaT
16          2                      3 2021-04-08  email        NaT         NaT               NaT              NaT
17          2                      4 2021-01-28  email        NaT         NaT               NaT              NaT
18          2                      5 2021-03-27  email        NaT         NaT               NaT              NaT
19          2                      5 2021-05-22  email        NaT         NaT               NaT              NaT
32          3                      1 2021-04-11  visit 2021-04-11  2021-04-11            0 days           0 days
20          3                      1 2021-05-07  email 2021-05-17  2021-04-11          -26 days          10 days
21          3                      1 2021-05-12  email 2021-05-17  2021-04-11          -31 days           5 days
33          3                      1 2021-05-17  visit 2021-05-17  2021-05-17            0 days           0 days
22          3                      2 2021-04-14  email        NaT         NaT               NaT              NaT
34          3                      3 2021-02-26  visit 2021-02-26  2021-02-26            0 days           0 days
35          3                      4 2021-01-14  visit 2021-01-14  2021-01-14            0 days           0 days
36          3                      4 2021-02-20  visit 2021-02-20  2021-02-20            0 days           0 days
23          3                      4 2021-04-23  email        NaT  2021-02-20          -62 days              NaT
24          3                      5 2021-01-15  email        NaT         NaT               NaT              NaT
25          4                      1 2021-01-05  email 2021-01-30         NaT               NaT          25 days
37          4                      1 2021-01-30  visit 2021-01-30  2021-01-30            0 days           0 days
38          4                      2 2021-02-07  visit 2021-02-07  2021-02-07            0 days           0 days
39          4                      2 2021-03-31  visit 2021-03-31  2021-03-31            0 days           0 days
40          4                      2 2021-04-28  visit 2021-04-28  2021-04-28            0 days           0 days
26          4                      3 2021-02-23  email        NaT         NaT               NaT              NaT
41          4                      4 2021-03-12  visit 2021-03-12  2021-03-12            0 days           0 days
27          4                      4 2021-03-20  email        NaT  2021-03-12           -8 days              NaT
28          4                      5 2021-03-06  email        NaT         NaT               NaT              NaT
29          4                      5 2021-04-07  email        NaT         NaT               NaT              NaT

这正是我期望的输出,但我希望它可以做得更简单。

I have 2 DataFrames first with Date when email was sent to certain client - Account: External ID on certain Territory
df name is emails

Account: External ID - is just a customer ID (coded name of a client). Same ID could be present or not in both DataFrames.

emails


         Territory  Account: External ID        Date   type
5           1                      1 2021-01-25  email
29          1                      1 2021-05-29  email
15          1                      2 2021-03-15  email
18          1                      2 2021-04-02  email
24          1                      2 2021-04-29  email
0           1                      3 2021-01-01  email
2           1                      3 2021-01-10  email
6           1                      3 2021-01-25  email
11          1                      3 2021-02-19  email
13          1                      4 2021-03-03  email
21          1                      4 2021-04-12  email
4           1                      5 2021-01-22  email
26          1                      5 2021-05-10  email
8           2                      2 2021-01-30  email
9           2                      2 2021-02-07  email
10          2                      2 2021-02-10  email
20          2                      3 2021-04-08  email
7           2                      4 2021-01-28  email
17          2                      5 2021-03-27  email
28          2                      5 2021-05-22  email
25          3                      1 2021-05-07  email
27          3                      1 2021-05-12  email
22          3                      2 2021-04-14  email
23          3                      4 2021-04-23  email
3           3                      5 2021-01-15  email
1           4                      1 2021-01-05  email
12          4                      3 2021-02-23  email
16          4                      4 2021-03-20  email
14          4                      5 2021-03-06  email
19          4                      5 2021-04-07  email

Second df with exactly same fields but with info about interaction with customer
visits

          Territory  Account: External ID        Date   type
11          1                      2 2021-06-01  visit
0           2                      2 2021-01-02  visit
8           3                      1 2021-04-11  visit
10          3                      1 2021-05-17  visit
5           3                      3 2021-02-26  visit
1           3                      4 2021-01-14  visit
4           3                      4 2021-02-20  visit
2           4                      1 2021-01-30  visit
3           4                      2 2021-02-07  visit
7           4                      2 2021-03-31  visit
9           4                      2 2021-04-28  visit
6           4                      4 2021-03-12  visit

For each row in first DataFrame I need to find number of days between email send and interaction.
I want separately see number of days before or after interaction.

I managed to write code that works perfectly but I believe there should be more elegant way

my code is here:


#concant and sort 2 DataFrame
emails_visits=pd.concat([emails,visits],ignore_index=True)
emails_visits = emails_visits.sort_values(by=["Territory", "Account: External ID ", "Date"])

#Intermidiate step 1
emails_visits["date_after"] = emails_visits["Date"].where(emails_visits["type"]=="visit")
emails_visits["date_before"] = emails_visits["Date"].where(emails_visits["type"]=="visit")

#Fill columns with date
emails_visits["date_after"] = emails_visits.groupby(["Territory", "Account: External ID"])["date_after"].ffill()
emails_visits["date_before"] = emails_visits.groupby(["Territory", "Account: External ID"])["date_before"].bfill()

#calcate Difference in number of days
emails_visits["date_after"]=emails_visits.groupby(["Territory","Account: External ID ",])["date_after"].bfill()
emails_visits["date_before"]=emails_visits.groupby(["Territory","Account: External ID ",])["date_before"].ffill()

emails_visits["days_before_visit"]=emails_visits["date_before"]-emails_visits["Date"]
emails_visits["days_after_visit"]=emails_visits["date_after"]-emails_visits["Date"]

output

          Territory  Account: External ID        Date   type date_after date_before days_before_visit days_after_visit
0           1                      1 2021-01-25  email        NaT         NaT               NaT              NaT
1           1                      1 2021-05-29  email        NaT         NaT               NaT              NaT
2           1                      2 2021-03-15  email 2021-06-01         NaT               NaT          78 days
3           1                      2 2021-04-02  email 2021-06-01         NaT               NaT          60 days
4           1                      2 2021-04-29  email 2021-06-01         NaT               NaT          33 days
30          1                      2 2021-06-01  visit 2021-06-01  2021-06-01            0 days           0 days
5           1                      3 2021-01-01  email        NaT         NaT               NaT              NaT
6           1                      3 2021-01-10  email        NaT         NaT               NaT              NaT
7           1                      3 2021-01-25  email        NaT         NaT               NaT              NaT
8           1                      3 2021-02-19  email        NaT         NaT               NaT              NaT
9           1                      4 2021-03-03  email        NaT         NaT               NaT              NaT
10          1                      4 2021-04-12  email        NaT         NaT               NaT              NaT
11          1                      5 2021-01-22  email        NaT         NaT               NaT              NaT
12          1                      5 2021-05-10  email        NaT         NaT               NaT              NaT
31          2                      2 2021-01-02  visit 2021-01-02  2021-01-02            0 days           0 days
13          2                      2 2021-01-30  email        NaT  2021-01-02          -28 days              NaT
14          2                      2 2021-02-07  email        NaT  2021-01-02          -36 days              NaT
15          2                      2 2021-02-10  email        NaT  2021-01-02          -39 days              NaT
16          2                      3 2021-04-08  email        NaT         NaT               NaT              NaT
17          2                      4 2021-01-28  email        NaT         NaT               NaT              NaT
18          2                      5 2021-03-27  email        NaT         NaT               NaT              NaT
19          2                      5 2021-05-22  email        NaT         NaT               NaT              NaT
32          3                      1 2021-04-11  visit 2021-04-11  2021-04-11            0 days           0 days
20          3                      1 2021-05-07  email 2021-05-17  2021-04-11          -26 days          10 days
21          3                      1 2021-05-12  email 2021-05-17  2021-04-11          -31 days           5 days
33          3                      1 2021-05-17  visit 2021-05-17  2021-05-17            0 days           0 days
22          3                      2 2021-04-14  email        NaT         NaT               NaT              NaT
34          3                      3 2021-02-26  visit 2021-02-26  2021-02-26            0 days           0 days
35          3                      4 2021-01-14  visit 2021-01-14  2021-01-14            0 days           0 days
36          3                      4 2021-02-20  visit 2021-02-20  2021-02-20            0 days           0 days
23          3                      4 2021-04-23  email        NaT  2021-02-20          -62 days              NaT
24          3                      5 2021-01-15  email        NaT         NaT               NaT              NaT
25          4                      1 2021-01-05  email 2021-01-30         NaT               NaT          25 days
37          4                      1 2021-01-30  visit 2021-01-30  2021-01-30            0 days           0 days
38          4                      2 2021-02-07  visit 2021-02-07  2021-02-07            0 days           0 days
39          4                      2 2021-03-31  visit 2021-03-31  2021-03-31            0 days           0 days
40          4                      2 2021-04-28  visit 2021-04-28  2021-04-28            0 days           0 days
26          4                      3 2021-02-23  email        NaT         NaT               NaT              NaT
41          4                      4 2021-03-12  visit 2021-03-12  2021-03-12            0 days           0 days
27          4                      4 2021-03-20  email        NaT  2021-03-12           -8 days              NaT
28          4                      5 2021-03-06  email        NaT         NaT               NaT              NaT
29          4                      5 2021-04-07  email        NaT         NaT               NaT              NaT

this is exactly output I expect but I hope it could be done simpler.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文