pandas按某两列分组,求均值,但不加这行的?
1.想得到一个新列 test['avg_score'] , 想要的结果就是图片的 score_avg_before 列.
score_avg_before = 取小于本行Day_Num的列 ,按 Season 和 T1_TeamID 分组,score列的均值 .
如果本行之前没有,填"NaN".
就是求 这个赛季本球队今天之前的平均进球,所以均值不算当前这行.
这种pandas能算吗?
ps:可以按像网页那样,不是全赛季,只计算最近n场/10场/20场 的均值吗?
----------------------------------自己写的,结果对,速度太慢,怎么改成快的---------------------------------------
不太会用pandas,只写了一行行处理.按我这个逻辑改成 不是一行行处理 快速的也行。
test['score_avg'] = "NaN"
for i in range(0,len(test)):
if test['DayNum'][i] > 1:
tmp_sc = test.loc[test.DayNum<test['DayNum'][i]].reset_index(drop=True).groupby(["Season", 'T1_TeamID'])['score'].agg([np.mean]).reset_index()
test['score_avg'][i] = tmp_sc.loc[(tmp_sc.T1_TeamID==test['T1_TeamID'][i])&(tmp_sc.Season==test['Season'][i])].reset_index(drop=True)['mean'][0]
else:
test['score_avg'][i] = "NaN"
--------------------------下面是尝试修改快的错误--------------------------------------------------
1.改成这样会错误,下面。
test['score_avg'] = np.where(test['DayNum'] > 1,test.loc[test.DayNum<test['DayNum']].reset_index(drop=True).groupby(["Season", 'T1_TeamID'])['score'].agg([np.mean]).reset_index().loc[(test.T1_TeamID==test['T1_TeamID'])&(test.Season==test['Season'])].reset_index(drop=True)['mean'][0],"NaN")
KeyError: 0
IndexError: index out of bounds
2.改成这样会错误,下面。
test['score_avg'] = np.where(test['DayNum'] > 1,test.loc[test.DayNum<test['DayNum']].reset_index(drop=True).groupby(["Season", 'T1_TeamID'])['score'].agg([np.mean]).reset_index().loc[(test.T1_TeamID==test['T1_TeamID'])&(test.Season==test['Season'])].reset_index(drop=True)['mean'],"NaN")
ValueError: operands could not be broadcast together with shapes (21,) (0,) ()
-------------------------DF 创建初始化:------------------------
import pandas as pd
from numpy import nan as NaN
import numpy as np
data = dict()
data['T1_TeamID'] = [1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3]
data['score'] = [50,60,70,60,70,80,70,80,90,60,70,80,70,80,90,80,90,100,90,100,110]
data['score_avg_before'] = [NaN,NaN,NaN,50,60,70,55,65,75,NaN,NaN,NaN,60,70,80,65,75,85,70,80,90]
data['Season'] = ['2018','2018','2018','2018','2018','2018','2018','2018','2018','2019','2019','2019','2019','2019','2019','2019','2019','2019','2019','2019','2019']
data['DayNum'] = [1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3,4,4,4]
#create dataframe
test = pd.DataFrame(data)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
首先,按照Season和T1_TeamID进行分组,
创建一个空表用于存数据,
每个组里面,把score组成一个列表,根据求和第0个到第k-1个,并且除以k-1个,
返回赋值,
最后concat聚合一下