pandas: grouping by selected days of the week

Posted 2025-01-16 17:04:02

I have this dataframe:

import numpy as np
import pandas as pd

rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng))+3

df = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})

Let's say I want to group it by part of the week and compute the mean:

# day_of_week: Monday=0 ... Sunday=6, so <= 2 means Monday to Wednesday
df['span'] = np.where(df['time'].dt.day_of_week <= 2, 'Mn-Wd', 'Th-Sn')
# ISO week number of each date
df['wkno'] = df['time'].dt.isocalendar().week
df.groupby(['wkno','span']).mean()

However, I would like to make this procedure more general.

Let's say I define the following days of the week as the split points:

days=['Monday','Thursday']

Is there a way to reproduce the grouping above using "days"? I imagine I would have to compute the number of days between 'Monday' and 'Thursday' and then use that number. And what about the case where

days=['Monday','Thursday','Friday']

I was thinking of setting up a dictionary like this:

days={'Monday':0,'Thursday':3,'Friday':4}

then

idays = list(days.values())[:]

How can I now use idays inside np.where? In this case I have three intervals.

Thanks

单调的奢华 2025-01-23 17:04:02

If you want to use more than one threshold, you need np.searchsorted. The resulting function would look something like this:

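import numpy as np

def groupby_daysspan_week(dfc,days):
    # Work on a copy so the caller's DataFrame is left untouched.
    df = dfc.copy()
    day_to_dayofweek = {'Monday':0,'Tuesday':1,'Wednesday':2,
                        'Thursday':3,'Friday':4,'Saturday':5,'Sunday':6}
    short_dict = {0:'Mn',1:'Tu',2:'Wd',3:'Th',4:'Fr',5:'St',6:'Sn'}
    # Split points as day-of-week numbers; np.searchsorted needs them sorted.
    day_split = sorted(day_to_dayofweek[d] for d in days)
    df['wkno'] = df['time'].dt.isocalendar().week
    df['dow'] = df['time'].dt.day_of_week
    # side='right': a day equal to a split point starts a new span (numbered from 1).
    # Days before the first split point get span 0, which has no label below.
    df['span'] = np.searchsorted(day_split,df['dow'],side='right')
    # Label each span by its first day and the next split day (Sunday for the last span).
    span_name_dict = {i+1:short_dict[day_split[i]]+'-'+short_dict[(day_split+[6])[i+1]]
                      for i in range(len(day_split))}
    df_agg = df.groupby(['wkno','span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict,level=1)
    return df_agg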
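
To make the binning step concrete, here is a minimal sketch of how np.searchsorted assigns days to intervals for the three-split case from the question (Monday, Thursday, Friday). It is only an illustration: the deterministic values 0 to 20 are an assumption that replaces the question's random values so the means are easy to check by hand, and the spans are left as plain integers rather than named.

```python
import numpy as np
import pandas as pd

# Split days from the question as day-of-week numbers (Monday=0 ... Sunday=6).
idays = [0, 3, 4]  # Monday, Thursday, Friday

# Which interval does each possible day of the week fall into?
# side='right' puts a day equal to a split point into the interval it starts.
dow = np.arange(7)
span_of_dow = np.searchsorted(idays, dow, side='right')
print(span_of_dow)  # [1 1 1 2 3 3 3]

# Same date range as the question, with illustrative values 0..20
# (an assumption, standing in for the random values).
rng = pd.date_range(start='2018-01-01', end='2018-01-21')
df = pd.DataFrame({'time': rng, 'value': np.arange(len(rng), dtype=float)})

df['wkno'] = df['time'].dt.isocalendar().week
df['span'] = np.searchsorted(idays, df['time'].dt.day_of_week, side='right')

# Mean per (ISO week, interval): Mon-Wed, Thu, Fri-Sun within each week.
result = df.groupby(['wkno', 'span'])['value'].mean()
print(result)
```

With these split points, Monday to Wednesday land in interval 1, Thursday in interval 2, and Friday to Sunday in interval 3, so no chain of np.where calls is needed regardless of how many split days are given.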
