Find the standard deviation of a column based on the value of another column and a groupby
I have a data frame looking like this:
classid grade haveTeacher
0 99 1
1 40 1
1 50 0
1 70 1
2 50 0
3 34 0
I'd like to know what I could write in pandas to find the standard deviation of "grade" across the classid groups that have a teacher (1 means there is a teacher). I know we would have to group by "classid", but what would go inside the .apply and lambda function to satisfy all these conditions?
Comments (2)
To improve performance, first use `Series.where` to set `grade` to a missing value wherever `haveTeacher` is not `1`, and then aggregate with `std`. The solution suggested in the question (`groupby` with `.apply` and a lambda) is likely to be slow on a large DataFrame.
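The code that originally accompanied this answer did not survive, so the following is only a minimal sketch of the approach described above, assuming the sample data from the question. The variable names `masked` and `std_by_class` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "classid":     [0, 1, 1, 1, 2, 3],
    "grade":       [99, 40, 50, 70, 50, 34],
    "haveTeacher": [1, 1, 0, 1, 0, 0],
})

# Keep grades only where haveTeacher == 1; every other row becomes NaN.
masked = df["grade"].where(df["haveTeacher"].eq(1))

# Group the masked grades by classid; std() skips NaN values by default.
std_by_class = masked.groupby(df["classid"]).std()
print(std_by_class)
```

Every `classid` stays in the result. Classes with fewer than two teacher rows come out as NaN, because pandas computes the sample standard deviation (`ddof=1`) by default.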
You might first want to get the DataFrame with only the records that have a teacher: `df[df['haveTeacher'] == 1]`. Once you have this, you can do a `groupby('classid')` and use the `numpy.std` function (`import numpy as np` before that) to find the standard deviation of each group.
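The original code and output for this answer are missing, so here is a hedged sketch of the filter-then-group idea, assuming the sample data from the question. One caveat: `numpy.std` defaults to the population standard deviation (`ddof=0`), while pandas' `std` defaults to the sample standard deviation (`ddof=1`). What you get from passing `np.std` into `agg` may depend on the pandas version, so this sketch calls pandas' own `std` with an explicit `ddof` instead. The names `with_teacher`, `sample_std` and `population_std` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "classid":     [0, 1, 1, 1, 2, 3],
    "grade":       [99, 40, 50, 70, 50, 34],
    "haveTeacher": [1, 1, 0, 1, 0, 0],
})

# Step 1: keep only the rows that have a teacher.
with_teacher = df[df["haveTeacher"] == 1]

# Step 2: group by classid and take the standard deviation of grade.
grades = with_teacher.groupby("classid")["grade"]
sample_std = grades.std()            # pandas default, ddof=1
population_std = grades.std(ddof=0)  # matches numpy.std's default

print(sample_std)
print(population_std)
```

Unlike the `Series.where` approach, filtering first drops classes 2 and 3 from the result entirely, since they have no teacher rows.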