How to calculate the number of days until the weekend or next day off in a pandas DataFrame
I have a pandas DataFrame with a non-continuous date index (weekends and holidays are missing). I want to add a column containing the number of days until the next day off.
Here is code that generates an example DataFrame with the desired values in the till_day_off column:
import pandas as pd
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek # adding column with number of day in a week
df = df[(df.day_of_week < 5)] # remove weekends
df = df.drop(index="2022-06-15") # remove Wednesday in second week
df["till_day_off"] = [5,4,3,2,1,2,1,2,1,1] # desired values, end of column is treated as day off
Resulting dataframe:
| | day_of_week | till_day_off |
|---|---|---|
| 2022-06-06 | 0 | 5 |
| 2022-06-07 | 1 | 4 |
| 2022-06-08 | 2 | 3 |
| 2022-06-09 | 3 | 2 |
| 2022-06-10 | 4 | 1 |
| 2022-06-13 | 0 | 2 |
| 2022-06-14 | 1 | 1 |
| 2022-06-16 | 3 | 2 |
| 2022-06-17 | 4 | 1 |
| 2022-06-20 | 0 | 1 |
The real DataFrame has over 7,000 rows, so I am trying to avoid iterating over rows. Any idea how to tackle this?
Comments (2)
Assuming a sorted input (if not, sort it by date), you can use a mask to identify runs of consecutive days, use it to group the rows, and compute a cumcount within each group.
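The mask-and-cumcount idea in this answer can be sketched roughly as follows. This is a minimal illustration, not necessarily the answerer's exact code: the names `new_run` and `run_id` are made up for the example, and it assumes the index is sorted and that a gap of more than one calendar day marks a day off.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df["day_of_week"] < 5]             # remove weekends
df = df.drop(index=pd.Timestamp("2022-06-15"))  # remove one mid-week day

# Mask (hypothetical name): True where a row does not directly follow
# the previous calendar day, i.e. where a new run of working days starts.
dates = df.index.to_series()
new_run = dates.diff().ne(pd.Timedelta(days=1))

# Cumulative sum of the mask gives one label per run of consecutive days.
run_id = new_run.cumsum()

# Count backwards inside each run so the last day before a gap gets 1.
# The end of the data is treated as a day off, as in the question.
df["till_day_off"] = df.groupby(run_id).cumcount(ascending=False) + 1

print(df)
```

Everything here is vectorized, so no explicit iteration over rows is needed.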
Create a DataFrame of the missing dates, then use merge_asof to match each row with the closest missing date in the future and calculate the time until that day off. Here I assume days off are just the missing dates, but this extends to the case where you have an explicit list of dates you want to use.
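A possible sketch of the merge_asof approach is shown below. It is an illustration under stated assumptions rather than the answerer's original code: the names `full`, `days_off`, `left`, and `merged` are invented for the example, days off are taken to be every calendar date missing from the index, and the day after the last row is added as a day off so that the end of the data behaves as in the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(index=pd.date_range(start="2022-06-06", periods=15))
df["day_of_week"] = df.index.dayofweek
df = df[df["day_of_week"] < 5]             # remove weekends
df = df.drop(index=pd.Timestamp("2022-06-15"))  # remove one mid-week day

# Assumption: a day off is any calendar date absent from the index,
# plus the day right after the last row (end of data counts as a day off).
full = pd.date_range(df.index.min(), df.index.max() + pd.Timedelta(days=1))
days_off = pd.DataFrame({"day_off": full.difference(df.index)})

# merge_asof needs both key columns sorted; direction="forward" picks
# the first day off on or after each working date.
left = pd.DataFrame({"date": df.index})
merged = pd.merge_asof(
    left, days_off, left_on="date", right_on="day_off", direction="forward"
)

# Difference in whole calendar days between each date and its next day off.
df["till_day_off"] = (merged["day_off"] - merged["date"]).dt.days.to_numpy()

print(df)
```

To work from an explicit list of holidays instead, `days_off` could be built from that list (sorted, as datetimes) in place of the `full.difference(df.index)` step.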