pandas - day of week and week number - progressive integer for selecting values

Posted 2025-01-17 18:52:34

I have this dataframe:

import numpy as np
import pandas as pd

# Two weeks of daily timestamps with random values in [3, 4)
rng = pd.date_range(start='2018-01-01', end='2018-01-14')
rnd_values = np.random.rand(len(rng))+3
dfc = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})
dfc = dfc.set_index('time')
dfc.index = pd.to_datetime(dfc.index)

Suppose I need to split each week into intervals defined by weekday: for example, Monday to Wednesday and Thursday to Sunday. I define this list to set the day on which each interval starts:

days=['Monday','Thursday']

So far, thanks to this post, I have learned how to group the values by week and interval:

def groupby_daysspan_week(dfc,days):
    df = dfc.copy()
    day_to_dayofweek = {'Monday':0,'Tuesday':1,'Wednesday':2,
                        'Thursday':3,'Friday':4,'Saturday':5,'Sunday':6}
    short_dict = {0:'Mn',1:'Tu',2:'Wd',3:'Th',4:'Fr',5:'St',6:'Sn'}
    # Day-of-week numbers at which each interval starts
    day_split = [day_to_dayofweek[d] for d in days]
    df['wkno'] = df.index.isocalendar().week
    df['dow']  = df.index.day_of_week
    # Interval index within the week (1, 2, ... when the first start day is Monday)
    df['span'] = np.searchsorted(day_split,df['dow'],side='right')
    # Readable labels such as 'Mn-Th' and 'Th-Sn' for the aggregated index
    span_name_dict = {i+1:short_dict[day_split[i]]+'-'+short_dict[(day_split+[6])[i+1]]
                      for i in range(len(day_split))}
    # Mean value per (week, interval)
    df_agg = df.groupby(['wkno','span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict,level=1)
    return df_agg

Thanks to arnau for this function.
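The step that assigns each day to an interval is the np.searchsorted call. The following is a minimal sketch of that step in isolation, assuming the same start days as in the question (Monday and Thursday); the variable names are illustrative only.

```python
import numpy as np

# Interval start days as day-of-week numbers (Monday=0 ... Sunday=6),
# corresponding to days=['Monday','Thursday'] in the question.
day_split = [0, 3]

# One entry per weekday, Monday through Sunday.
dow = np.arange(7)

# side='right' counts how many start days are <= each weekday,
# so Monday-Wednesday map to 1 and Thursday-Sunday map to 2.
span = np.searchsorted(day_split, dow, side='right')

print(span.tolist())  # [1, 1, 1, 2, 2, 2, 2]
```

Note that if the first start day were not Monday, the days before it would receive span 0, so the number of distinct spans per week would be len(days)+1 rather than len(days).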

As you can see, I now have 4 intervals (2 per week over 2 weeks). I would like to label each value with the interval it belongs to, ending up with something like:

               value  wkno  dow  span    Int
time                                 
2018-01-01  3.548814     1    0     1    0
2018-01-02  3.715189     1    1     1    0
2018-01-03  3.602763     1    2     1    0
2018-01-04  3.544883     1    3     2    1
2018-01-05  3.423655     1    4     2    1
2018-01-06  3.645894     1    5     2    1
2018-01-07  3.437587     1    6     2    1
2018-01-08  3.891773     2    0     1    2
2018-01-09  3.963663     2    1     1    2
2018-01-10  3.383442     2    2     1    2
2018-01-11  3.791725     2    3     2    3
2018-01-12  3.528895     2    4     2    3
2018-01-13  3.568045     2    5     2    3
2018-01-14  3.925597     2    6     2    3

This would allow me to select "value" according to the interval. As you can see, there is an additional column named "Int" that identifies the interval; I will then use this column with np.where. The problem is that I cannot work out how to build the "Int" column.

Thanks for any kind of help.

Diedro
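For context, the selection step described in the question might look like the sketch below once an "Int" column exists. This is only an illustration: the labels are written out by hand to match the desired table, and the names target and selected are hypothetical, not taken from the original post.

```python
import numpy as np
import pandas as pd

# Same 14-day range as in the question; fixed seed so the run is repeatable.
np.random.seed(0)
rng = pd.date_range(start='2018-01-01', end='2018-01-14')
df = pd.DataFrame({'value': np.random.rand(len(rng)) + 3}, index=rng)

# Interval labels written out by hand, matching the desired table:
# Mon-Wed -> 0, Thu-Sun -> 1, then 2 and 3 for the second week.
df['Int'] = [0] * 3 + [1] * 4 + [2] * 3 + [3] * 4

# Keep 'value' only inside one interval; everything else becomes NaN.
target = 2  # hypothetical choice of interval
df['selected'] = np.where(df['Int'] == target, df['value'], np.nan)

print(df[['value', 'Int', 'selected']])
```

With target = 2, only the rows for 2018-01-08 through 2018-01-10 keep their value.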


凶凌 2025-01-24 18:52:34

Here is the solution I came up with.
I am not sure whether it is elegant enough.

import numpy as np
import pandas as pd

days=['Monday','Thursday']

# Fixed seed so the values match the table in the question
np.random.seed(0)

rng = pd.date_range(start='2018-01-01', end='2018-01-14')
rnd_values = np.random.rand(len(rng))+3

dfc = pd.DataFrame({'time':rng.to_list(),'value':rnd_values})
dfc = dfc.set_index('time')
dfc.index = pd.to_datetime(dfc.index)

def groupby_daysspan_week(dfc,days):
    df = dfc.copy()
    day_to_dayofweek = {'Monday':0,'Tuesday':1,'Wednesday':2,
                        'Thursday':3,'Friday':4,'Saturday':5,'Sunday':6}
    short_dict = {0:'Mn',1:'Tu',2:'Wd',3:'Th',4:'Fr',5:'St',6:'Sn'}
    day_split = [day_to_dayofweek[d] for d in days]
    df['wkno'] = df.index.isocalendar().week
    df['dow']  = df.index.day_of_week
    df['span'] = np.searchsorted(day_split,df['dow'],side='right')

    # Progressive interval number across weeks. Because span starts at 1,
    # this yields 1..4 here; subtract 1 for the 0-based labels in the question.
    df['Int'] = len(days)*(df['wkno']-1) + df['span']

    span_name_dict = {i+1:short_dict[day_split[i]]+'-'+short_dict[(day_split+[6])[i+1]]
                      for i in range(len(day_split))}
    df_agg = df.groupby(['wkno','span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict,level=1)
    # Return the labelled frame alongside the aggregated means
    return df_agg,df

df_agg,df = groupby_daysspan_week(dfc,days)

print(df)
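One possible alternative, offered only as a sketch and not as part of the answer above, is to let pandas number the (year, week, span) groups with GroupBy.ngroup(). This gives 0-based labels like those in the question's table and, because the ISO year is part of the key, does not rely on week numbers increasing across a year boundary. The year column is an addition made for this sketch.

```python
import numpy as np
import pandas as pd

days = ['Monday', 'Thursday']
day_to_dayofweek = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2, 'Thursday': 3,
                    'Friday': 4, 'Saturday': 5, 'Sunday': 6}
day_split = sorted(day_to_dayofweek[d] for d in days)

np.random.seed(0)
rng = pd.date_range(start='2018-01-01', end='2018-01-14')
df = pd.DataFrame({'value': np.random.rand(len(rng)) + 3}, index=rng)

# ISO year and week for every timestamp (a DataFrame aligned with df's index).
iso = df.index.isocalendar()
df['year'] = iso['year']
df['wkno'] = iso['week']
df['span'] = np.searchsorted(day_split, df.index.day_of_week, side='right')

# Number each distinct (year, week, span) combination in sorted order: 0, 1, 2, ...
df['Int'] = df.groupby(['year', 'wkno', 'span']).ngroup()

print(df['Int'].tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
```

One design difference to be aware of: ngroup() numbers only the groups that actually occur in the data, so an interval with no rows does not leave a gap in the numbering, whereas the arithmetic formula in the answer does.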
