获取 pandas 中的累积和

发布于 2025-01-14 07:34:28 字数 1351 浏览 0 评论 0原文

上下文

日期时间Campaign_name状态开放
时间2022-03-15 00:00unny_campaign开放
2022-03-15 01:00unny_campaign继续
2022-03-15 02:00unny_campaign继续
2022-03-15 03:00unny_campaign继续
2022-03-15 04:00unny_campaign关闭
2022-03-15 08:00unny_campaign打开
2022-03-15 09:00unny_campaign继续
2022-03-15 10:00unny_campaign关闭

问题

我需要计算从打开到关闭的时间。

我现在的代码

有两种我可以采用的方法。获取每个“关闭”中的打开时间或每个“打开”和“继续”中的累积 open_time。这是我对最后一项的看法。

我现在的代码几乎没问题,它不计算关闭和打开之间的时间,但它忘记了最后一次时间差的总和。

df["Datetime"] = pd.to_datetime(df["Datetime"])
df["time_diff"] = df["Datetime"].diff()
df["time_diff"] = df["time_diff"].astype("timedelta64[m]").fillna(0)
condition = df["Status"] == "Close"
df.loc[condition, "time_diff"] = 0
df["Cumulative time"] = df.groupby(["Campaign_name"])["time_diff"].cumsum()
df = df.drop("time_diff", 1)

Context

DatetimeCampaign_nameStatusOpen_time
2022-03-15 00:00Funny_campaignOpen
2022-03-15 01:00Funny_campaignContinue
2022-03-15 02:00Funny_campaignContinue
2022-03-15 03:00Funny_campaignContinue
2022-03-15 04:00Funny_campaignClose
2022-03-15 08:00Funny_campaignOpen
2022-03-15 09:00Funny_campaignContinue
2022-03-15 10:00Funny_campaignClose

Problem

I need to calculate the time from open to close.

My code right now

There are two approches I could go with. Get the open time in each 'Close' or a cumulative open_time in each 'Open' and 'Continue'. Here is my take on the last one.

My code right now is almost fine, it doesn't count the time between Close and Open but it forgets to sum the last time difference.

df["Datetime"] = pd.to_datetime(df["Datetime"])
df["time_diff"] = df["Datetime"].diff()
df["time_diff"] = df["time_diff"].astype("timedelta64[m]").fillna(0)
condition = df["Status"] == "Close"
df.loc[condition, "time_diff"] = 0
df["Cumulative time"] = df.groupby(["Campaign_name"])["time_diff"].cumsum()
df = df.drop("time_diff", 1)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

零度° 2025-01-21 07:34:28

IIUC,您可以在打开时启动新组并使用:

df['Datetime'] = pd.to_datetime(df['Datetime'])

group = df['Status'].eq('Open').cumsum()

df['Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g-g.iloc[0])
# or, alternative syntax
# df['Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g.diff().cumsum())

输出:

             Datetime   Campaign_name    Status       Open_time
0 2022-03-15 00:00:00  Funny_campaign      Open 0 days 00:00:00
1 2022-03-15 01:00:00  Funny_campaign  Continue 0 days 01:00:00
2 2022-03-15 02:00:00  Funny_campaign  Continue 0 days 02:00:00
3 2022-03-15 03:00:00  Funny_campaign  Continue 0 days 03:00:00
4 2022-03-15 04:00:00  Funny_campaign     Close 0 days 04:00:00
5 2022-03-15 08:00:00  Funny_campaign      Open 0 days 00:00:00
6 2022-03-15 09:00:00  Funny_campaign  Continue 0 days 01:00:00
7 2022-03-15 10:00:00  Funny_campaign     Close 0 days 02:00:00

或仅分配给“关闭”:

df.loc[df['Status'].eq('Close'), 'Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g-g.iloc[0])

输出:

             Datetime   Campaign_name    Status        Open_time
0 2022-03-15 00:00:00  Funny_campaign      Open              NaN
1 2022-03-15 01:00:00  Funny_campaign  Continue              NaN
2 2022-03-15 02:00:00  Funny_campaign  Continue              NaN
3 2022-03-15 03:00:00  Funny_campaign  Continue              NaN
4 2022-03-15 04:00:00  Funny_campaign     Close  0 days 04:00:00
5 2022-03-15 08:00:00  Funny_campaign      Open              NaN
6 2022-03-15 09:00:00  Funny_campaign  Continue              NaN
7 2022-03-15 10:00:00  Funny_campaign     Close  0 days 02:00:00

对于每个组的关闭打开差异:

df.groupby(group)['Datetime'].agg(lambda g: g.iloc[-1]-g.iloc[0])

输出:

Status
1   0 days 04:00:00
2   0 days 02:00:00
Name: Datetime, dtype: timedelta64[ns]

IIUC, you could start new groups on the opens and use:

df['Datetime'] = pd.to_datetime(df['Datetime'])

group = df['Status'].eq('Open').cumsum()

df['Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g-g.iloc[0])
# or, alternative syntax
# df['Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g.diff().cumsum())

Output:

             Datetime   Campaign_name    Status       Open_time
0 2022-03-15 00:00:00  Funny_campaign      Open 0 days 00:00:00
1 2022-03-15 01:00:00  Funny_campaign  Continue 0 days 01:00:00
2 2022-03-15 02:00:00  Funny_campaign  Continue 0 days 02:00:00
3 2022-03-15 03:00:00  Funny_campaign  Continue 0 days 03:00:00
4 2022-03-15 04:00:00  Funny_campaign     Close 0 days 04:00:00
5 2022-03-15 08:00:00  Funny_campaign      Open 0 days 00:00:00
6 2022-03-15 09:00:00  Funny_campaign  Continue 0 days 01:00:00
7 2022-03-15 10:00:00  Funny_campaign     Close 0 days 02:00:00

Or to only assign to "Close":

df.loc[df['Status'].eq('Close'), 'Open_time'] = df.groupby(group)['Datetime'].apply(lambda g: g-g.iloc[0])

Output:

             Datetime   Campaign_name    Status        Open_time
0 2022-03-15 00:00:00  Funny_campaign      Open              NaN
1 2022-03-15 01:00:00  Funny_campaign  Continue              NaN
2 2022-03-15 02:00:00  Funny_campaign  Continue              NaN
3 2022-03-15 03:00:00  Funny_campaign  Continue              NaN
4 2022-03-15 04:00:00  Funny_campaign     Close  0 days 04:00:00
5 2022-03-15 08:00:00  Funny_campaign      Open              NaN
6 2022-03-15 09:00:00  Funny_campaign  Continue              NaN
7 2022-03-15 10:00:00  Funny_campaign     Close  0 days 02:00:00

And for just the difference close-open for each group:

df.groupby(group)['Datetime'].agg(lambda g: g.iloc[-1]-g.iloc[0])

Output:

Status
1   0 days 04:00:00
2   0 days 02:00:00
Name: Datetime, dtype: timedelta64[ns]
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文