How to use the Pandas groupby function to calculate the previous year's average?
I am trying to find a Player's mean score for the "Last Season" (previous year) and add it as a new column in the original dataframe df.
I have coded a formula to get a Player's mean score for the current year, excluding the current row, which is as follows:
    df['Season Avg'] = (
        df.groupby([df['Player'], df['DateTime'].dt.year])['Score']
          .apply(lambda x: x.shift(1).expanding().mean())
    )
However, despite my best attempt at using the shift function, I cannot quite work out how to calculate the previous year's mean ("Last Season Avg") directly into a new column.
The dataframe is set out as follows:
Player | DateTime | Score | Season Avg |
---|---|---|---|
PlayerB | 2020-MM-DD HH:MM:SS | 40 | NaN |
PlayerA | 2020-MM-DD HH:MM:SS | 50 | NaN |
PlayerA | 2021-MM-DD HH:MM:SS | 100 | NaN |
PlayerB | 2021-MM-DD HH:MM:SS | 200 | NaN |
PlayerA | 2021-MM-DD HH:MM:SS | 160 | 100 |
PlayerB | 2021-MM-DD HH:MM:SS | 140 | 200 |
PlayerB | 2021-MM-DD HH:MM:SS | 160 | 170 |
PlayerA | 2021-MM-DD HH:MM:SS | 200 | 130 |
The new ideal dataframe that I would like:
Player | DateTime | Score | Season Avg | Last Season Avg |
---|---|---|---|---|
PlayerB | 2020-MM-DD HH:MM:SS | 40 | NaN | NaN |
PlayerA | 2020-MM-DD HH:MM:SS | 50 | NaN | NaN |
PlayerA | 2021-MM-DD HH:MM:SS | 100 | NaN | 50 |
PlayerB | 2021-MM-DD HH:MM:SS | 200 | NaN | 40 |
PlayerA | 2021-MM-DD HH:MM:SS | 160 | 100 | 50 |
PlayerB | 2021-MM-DD HH:MM:SS | 140 | 200 | 40 |
PlayerB | 2021-MM-DD HH:MM:SS | 160 | 170 | 40 |
PlayerA | 2021-MM-DD HH:MM:SS | 200 | 130 | 50 |
Comments (2)
You can groupby once by "Player" and the year to find the yearly average for each player; then groupby "Player" + shift to get the previous year's averages. If you're looking for career averages up to a particular season, you could use expanding().mean().
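The code that accompanied this answer is not preserved here, so the following is only a minimal sketch of the two ideas described above. The concrete dates are invented (the question shows placeholders only), and the names `yearly`, `last_season`, and `career_before` are hypothetical.

```python
import pandas as pd

# Assumed sample data: the question only shows "2020-MM-DD" placeholders,
# so the concrete dates below are invented for illustration.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})

year = df["DateTime"].dt.year.rename("Year")

# One value per (Player, Year): that player's mean score for the season.
yearly = df.groupby(["Player", year])["Score"].mean()

# Within each player, shifting by one row yields the previous recorded season.
last_season = yearly.groupby(level="Player").shift()

# Mean of all earlier season averages (a mean of means, not of individual games).
career_before = yearly.groupby(level="Player").transform(
    lambda s: s.shift().expanding().mean())

print(last_season)
```

Note that `shift` moves by one row, not by one calendar year, so if a player skipped a season this would return the most recent season played rather than strictly the previous year.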
Edit: With sample data provided, let's test it. The main problem here is that the "Year" values are duplicated. @PaulRougieux handles it very elegantly. Here's another option. The idea is to find last season's averages and map them back to the original df (instead of transforming it).
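The original snippet and its output are missing, so here is a hedged sketch of the "map it back" idea. It uses `reindex` on a (Player, year) key for the lookup, which is one possible way to do the mapping and not necessarily what the answer used. The dates and the `keys` variable are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data; the dates are invented because the question only
# shows placeholders.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})

year = df["DateTime"].dt.year.rename("Year")

# Season average per (Player, Year), then the previous season's value per player.
yearly = df.groupby(["Player", year])["Score"].mean()
last_season = yearly.groupby(level="Player").shift()

# Build one (Player, Year) key per original row and look each key up.
# The lookup table has unique keys, so repeated years in df are not a problem.
keys = pd.MultiIndex.from_arrays([df["Player"], year])
df["Last Season Avg"] = last_season.reindex(keys).to_numpy()

print(df)
```

Because the shift happens on the aggregated table (one row per player and season), every row of a given season receives the same previous-season value.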
Create a sample data set
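The answer's own data-creation code is not preserved. A sketch that rebuilds the question's table could look like this, with the caveat that the concrete dates are made up, since the question only gives "2020-MM-DD HH:MM:SS" style placeholders.

```python
import pandas as pd

# Scores and players follow the question's table; the dates are invented
# placeholders chosen only so that the years match.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})

# A plain integer year column makes the later group-bys and joins easier to read.
df["Year"] = df["DateTime"].dt.year

print(df)
```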
Use transform to add the current season average to the data frame
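As a sketch of this step, `transform` can reproduce the question's "Season Avg" column (the running mean of earlier scores in the same season, excluding the current row). This follows the question's definition; the answer may instead have used the full-season mean. The dates are assumed.

```python
import pandas as pd

# Assumed sample data with invented dates.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})
df["Year"] = df["DateTime"].dt.year

# transform returns a result aligned to df's original index, so it can be
# assigned directly as a new column.
df["Season Avg"] = (
    df.groupby(["Player", "Year"])["Score"]
      .transform(lambda s: s.shift().expanding().mean())
)

print(df)
```

Using `transform` rather than `apply` sidesteps index-alignment differences between pandas versions, since the result always carries the original row index.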
Shift cannot be applied here because years are repeated
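To illustrate why a row-level shift does not work, here is a small sketch under the same assumed data. The variable names `season_mean` and `naive` are hypothetical.

```python
import pandas as pd

# Assumed sample data with invented dates.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})
df["Year"] = df["DateTime"].dt.year

# Full-season mean broadcast to every row of that season.
season_mean = df.groupby(["Player", "Year"])["Score"].transform("mean")

# Naive attempt: shift one row back within each player.
naive = season_mean.groupby(df["Player"]).shift()

# Only the first 2021 row of each player sees the 2020 average; later 2021
# rows see the 2021 average, because the previous row is in the same year.
print(pd.DataFrame({"Player": df["Player"], "Year": df["Year"], "naive": naive}))
```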
Compute the average from the previous year and join them to the original dataframe
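One way this step might look, assuming the `year + 1` approach the answer mentions: compute the average per player and year, relabel each year as the following one, and left-join onto the original rows. The dates and the names `prev` and `out` are assumptions.

```python
import pandas as pd

# Assumed sample data with invented dates.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})
df["Year"] = df["DateTime"].dt.year

# One row per (Player, Year) holding that season's mean score.
prev = (
    df.groupby(["Player", "Year"], as_index=False)["Score"].mean()
      .rename(columns={"Score": "Last Season Avg"})
)

# Relabel each season as the next one, so the 2020 average lines up with 2021 rows.
prev["Year"] += 1

# A left join keeps every original row, in its original order.
out = df.merge(prev, on=["Player", "Year"], how="left")

print(out)
```

Unlike a row shift, this matches strictly on the calendar year, so a player who skipped a season gets NaN rather than an older season's average.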
Another way to compute the average from the previous year, using shift, is maybe more elegant than doing year + 1.
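A sketch of the shift variant, again with assumed dates and hypothetical names (`yearly`, `out`): the shift is applied to the aggregated table, where each player has exactly one row per year, and the result is joined back.

```python
import pandas as pd

# Assumed sample data with invented dates.
df = pd.DataFrame({
    "Player": ["PlayerB", "PlayerA", "PlayerA", "PlayerB",
               "PlayerA", "PlayerB", "PlayerB", "PlayerA"],
    "DateTime": pd.to_datetime([
        "2020-03-01", "2020-03-02", "2021-03-01", "2021-03-02",
        "2021-03-03", "2021-03-04", "2021-03-05", "2021-03-06"]),
    "Score": [40, 50, 100, 200, 160, 140, 160, 200],
})
df["Year"] = df["DateTime"].dt.year

# Aggregated table, sorted by Player then Year (groupby sorts keys by default).
yearly = df.groupby(["Player", "Year"], as_index=False)["Score"].mean()

# Years are unique per player here, so a one-row shift is the previous season.
yearly["Last Season Avg"] = yearly.groupby("Player")["Score"].shift()

# Join the new column back onto the original rows.
out = df.merge(yearly.drop(columns="Score"), on=["Player", "Year"], how="left")

print(out)
```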