How to count the days until the weekend or next day off in a pandas DataFrame

Posted on 2025-02-06 06:06:34

I have a pandas DataFrame with a non-continuous date index (weekends and holidays are missing). I want to add a column containing the number of days until the next day off.

Here is code that generates an example DataFrame with the desired values in the till_day_off column:

import pandas as pd

df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek   # adding column with number of day in a week
df = df[(df.day_of_week < 5)]   # remove weekends
df = df.drop(index="2022-06-15")   # remove Wednesday in second week
df["till_day_off"] = [5,4,3,2,1,2,1,2,1,1] # desired values, end of column is treated as day off

Resulting dataframe:

            day_of_week  till_day_off
2022-06-06            0             5
2022-06-07            1             4
2022-06-08            2             3
2022-06-09            3             2
2022-06-10            4             1
2022-06-13            0             2
2022-06-14            1             1
2022-06-16            3             2
2022-06-17            4             1
2022-06-20            0             1

The real DataFrame has over 7,000 rows, so I am trying to avoid iterating over rows. Any idea how to tackle this?

Comments (2)

看海 2025-02-13 06:06:36

Assuming the index is sorted (if not, sort it by date first), you can build a mask that marks consecutive days, use it to group the rows into runs, and compute a cumcount within each run:

mask = (-df.index.to_series().diff(-1)).eq('1d').iloc[::-1]
# reversing the Series to count until (not since) the value

df['till_day_off'] = mask.groupby((~mask).cumsum()).cumcount().add(1)

Output:

            day_of_week  till_day_off
2022-06-06            0             5
2022-06-07            1             4
2022-06-08            2             3
2022-06-09            3             2
2022-06-10            4             1
2022-06-13            0             2
2022-06-14            1             1
2022-06-16            3             2
2022-06-17            4             1
2022-06-20            0             1

Intermediates:

mask

2022-06-20    False
2022-06-17    False
2022-06-16     True
2022-06-14    False
2022-06-13     True
2022-06-10    False
2022-06-09     True
2022-06-08     True
2022-06-07     True
2022-06-06     True
dtype: bool

(~mask).cumsum()

2022-06-20    1
2022-06-17    2
2022-06-16    2
2022-06-14    3
2022-06-13    3
2022-06-10    4
2022-06-09    4
2022-06-08    4
2022-06-07    4
2022-06-06    4
dtype: int64
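
For reference, the same idea can be written out step by step. This is only a sketch: the variable names (gap_to_next, next_is_workday, run_id, countdown) are mine and not part of the answer above, the example frame from the question is rebuilt so the snippet runs on its own, and it assumes the index holds unique dates.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df["day_of_week"] < 5]                    # remove weekends
df = df.drop(index=pd.Timestamp("2022-06-15"))    # remove one mid-week day

dates = df.index.to_series().sort_index()         # make sure dates are ascending
gap_to_next = dates.shift(-1) - dates             # NaT for the last row
next_is_workday = gap_to_next.eq(pd.Timedelta(days=1))

# Walk backwards in time: every row whose next calendar day is missing
# starts a new run, and the position inside the run is the countdown.
rev = next_is_workday.iloc[::-1]
run_id = (~rev).cumsum()
countdown = rev.groupby(run_id).cumcount().add(1)

df["till_day_off"] = countdown                    # assignment aligns on the date index
print(df)
```

Because the count runs over rows inside a block of consecutive dates, it equals the number of calendar days to the next gap. The last row is treated as being followed by a day off, as in the question.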

淡写薰衣草的香 2025-02-13 06:06:36

Create a DataFrame of the missing dates, then use a merge_asof to match with the closest one in the future and calculate the time until that day off.

Here I assume days off are just missing dates, but this extends to the case where you have an explicit list of dates you want to use.

import pandas as pd

# DataFrame of missing dates, e.g. days off.
df1 = pd.DataFrame({'day_off': pd.date_range(df.index.min(), df.index.max()+pd.offsets.DateOffset(days=1), freq='D')})
df1 = df1[~df1['day_off'].isin(df.index)]

df = pd.merge_asof(df, df1, left_index=True, right_on='day_off', direction='forward')
df['till_day_off'] = (df['day_off'] - df.index).dt.days

print(df)

            day_of_week    day_off  till_day_off
2022-06-06            0 2022-06-11             5
2022-06-07            1 2022-06-11             4
2022-06-08            2 2022-06-11             3
2022-06-09            3 2022-06-11             2
2022-06-10            4 2022-06-11             1
2022-06-13            0 2022-06-15             2
2022-06-14            1 2022-06-15             1
2022-06-16            3 2022-06-18             2
2022-06-17            4 2022-06-18             1
2022-06-20            0 2022-06-21             1
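
The explicit-list case mentioned above might look like the following. Treat it as a sketch under assumptions: the days_off list is a hypothetical calendar made up for this example, the column name date is mine, and the join is done on columns rather than on the index so the result does not depend on how merge_asof handles the index.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df["day_of_week"] < 5]
df = df.drop(index=pd.Timestamp("2022-06-15"))

# Hypothetical explicit calendar of days off (not from the original post).
days_off = pd.DataFrame({
    "day_off": pd.to_datetime([
        "2022-06-11", "2022-06-12", "2022-06-15",
        "2022-06-18", "2022-06-19", "2022-06-21",
    ]).astype(df.index.dtype)                     # merge_asof needs matching key dtypes
}).sort_values("day_off")                         # merge_asof needs sorted keys

left = df.rename_axis("date").reset_index()       # move the date index into a column
merged = pd.merge_asof(
    left, days_off,
    left_on="date", right_on="day_off",
    direction="forward",                          # nearest day off at or after each date
)
merged["till_day_off"] = (merged["day_off"] - merged["date"]).dt.days
result = merged.set_index("date")
print(result)
```

If the list has no day off after the last row, day_off is NaT for that row and till_day_off becomes NaN, so the calendar should extend past the end of the data.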