Find the standard deviation of a column based on another column's value and a groupby

Posted on 2025-01-12 10:12:38

I have a data frame looking like this:

classid  grade  haveTeacher
0        99     1
1        40     1
1        50     0
1        70     1
2        50     0
3        34     0

I'd like to compute the standard deviation of "grade" within each classid, using only the rows that have a teacher (haveTeacher equal to 1 means there is a teacher). I know I have to group by "classid", but what should go inside .apply and the lambda function to satisfy all of these conditions?

Comments (2)

物价感观 2025-01-19 10:12:38

To improve performance, first use Series.where to set grade to a missing value wherever haveTeacher is not 1, then group by classid and aggregate with std:

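# Store the result in a new name so the original df is not overwritten
out = (df['grade'].where(df['haveTeacher'].eq(1))  # NaN where there is no teacher
                  .groupby(df['classid'])
                  .std()                            # sample std (ddof=1), NaN is skipped
                  .reset_index(name='std'))
print(out)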
   classid        std
0        0        NaN
1        1  21.213203
2        2        NaN
3        3        NaN
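The masking step above is easier to follow when it is split in two. The following is a minimal sketch, assuming the sample data from the question is rebuilt by hand; the names `masked` and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "classid":     [0, 1, 1, 1, 2, 3],
    "grade":       [99, 40, 50, 70, 50, 34],
    "haveTeacher": [1, 1, 0, 1, 0, 0],
})

# Step 1: keep grade where haveTeacher == 1, replace every other row with NaN.
masked = df["grade"].where(df["haveTeacher"].eq(1))

# Step 2: group the masked grades by classid; std() ignores the NaN rows.
result = masked.groupby(df["classid"]).std().reset_index(name="std")

print(masked.tolist())
print(result)
```

Because the mask does not drop any rows, every classid still appears in the result; classes with fewer than two teacher rows get NaN, since the sample standard deviation is undefined for them.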

The groupby plus .apply solution suggested in the question works, but it is likely to be slow on a large DataFrame because the lambda is called once per group:

# Store the result in a new name so the original df is not overwritten
out = (df.groupby('classid')
         .apply(lambda x: x.loc[x['haveTeacher'].eq(1), 'grade'].std())
         .reset_index(name='std'))
print(out)
   classid        std
0        0        NaN
1        1  21.213203
2        2        NaN
3        3        NaN
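To check that the two approaches really agree, something like the sketch below could be used. It assumes the sample data from the question; `fast` and `slow` are hypothetical names, and the explicit column selection before `.apply` is an addition made here so the lambda only sees the two columns it needs.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "classid":     [0, 1, 1, 1, 2, 3],
    "grade":       [99, 40, 50, 70, 50, 34],
    "haveTeacher": [1, 1, 0, 1, 0, 0],
})

# Vectorized: mask first, then a single built-in groupby aggregation.
fast = df["grade"].where(df["haveTeacher"].eq(1)).groupby(df["classid"]).std()

# Per-group Python callback: filter inside each group, then take std.
slow = (df.groupby("classid")[["grade", "haveTeacher"]]
          .apply(lambda g: g.loc[g["haveTeacher"].eq(1), "grade"].std()))

# Raises AssertionError if the values or the index differ.
pd.testing.assert_series_equal(fast, slow, check_names=False)
print("both approaches agree")
```

On six rows the speed difference is not measurable; it only matters when there are many groups.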

过期以后 2025-01-19 10:12:38

You might first want to get the data frame containing only the records that have a teacher: df[df['haveTeacher'] == 1] (note the capital T in the column name). Once you have that, you can do a groupby on classid and use the numpy.std function (import numpy as np before that) to find the standard deviation of each group.
So you have:

>>> df[df['haveTeacher'] == 1].groupby(['classid']).agg({'grade': np.std})

The output is:

             grade
classid           
0              NaN
1        21.213203
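Note that filtering first drops classes 2 and 3 entirely, because they have no teacher rows. A minimal sketch of the same idea follows, assuming the sample data from the question. It passes through the built-in `std` method instead of `np.std`, since, as far as I know, recent pandas releases warn when `np.std` is handed to `agg`; the names `with_teacher`, `per_class` and `all_classes` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "classid":     [0, 1, 1, 1, 2, 3],
    "grade":       [99, 40, 50, 70, 50, 34],
    "haveTeacher": [1, 1, 0, 1, 0, 0],
})

# Keep only the rows that have a teacher, then aggregate per class.
with_teacher = df[df["haveTeacher"] == 1]
per_class = with_teacher.groupby("classid")["grade"].std()

# Optional: bring back the classes that the filter removed, as NaN.
all_classes = per_class.reindex(df["classid"].unique())

print(per_class)
print(all_classes)
```

The `reindex` step is only needed if the result should list every classid, as the Series.where approach does.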