Averaging across multiple columns

Posted on 2025-01-22 21:08:52


I am trying to calculate the mean for a new column.

data['english_combined'] = data['english'] + data['intake_english'] + data['language test scores formatted']

So the english_combined column is the sum of the other columns. Instead, I want to take the mean based on which grades are entered. For example, if only 'english' and 'intake_english' have a grade, I want the mean of those two; if all three tests were taken, I want the mean of all three.

I tried something like this, with no success:

[np.mean(i,j,k) for i,j,k in zip(data['english'], data['intake_english'], data['language test scores formatted'])]

Any suggestions that would work?


当梦初醒 2025-01-29 21:08:52

df.mean(axis='columns') does what you want. By default (skipna=True) it skips missing values: they are left out of both the sum and the count, so each row is averaged over only the values that are present.

A simple example:

>>> df = pd.DataFrame({'a': [7, 8.5, pd.NA, 6], 
                       'b': [5, 6, 6, 7], 
                       'c': [7, pd.NA, pd.NA, 5]})
>>> df
      a  b     c
0     7  5     7
1   8.5  6  <NA>
2  <NA>  6  <NA>
3     6  7     5
>>> df.mean(axis='columns')
0    6.333333
1    7.250000
2    6.000000
3    6.000000
dtype: float64

Note that row 2 has a mean of 6, not 2: only b holds a value there, so the result is 6 / 1 rather than 6 / 3. Row 1 works the same way: (8.5 + 6) / 2 = 7.25.
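
To make the denominator explicit, here is a minimal sketch. It reuses the toy frame from the example, but assumes the missing grades are stored as np.nan in float columns rather than pd.NA. It checks that the row mean equals the row sum divided by the number of non-missing entries, and shows that skipna=False propagates the missing value instead of skipping it.

```python
import numpy as np
import pandas as pd

# Same toy data as in the example, with np.nan marking missing values
# (an assumption; the example uses pd.NA).
df = pd.DataFrame({'a': [7, 8.5, np.nan, 6],
                   'b': [5, 6, 6, 7],
                   'c': [7, np.nan, np.nan, 5]})

# Default behaviour: skipna=True, so missing values are ignored.
row_mean = df.mean(axis='columns')

# Equivalent by hand: sum of present values / number of present values.
manual = df.sum(axis='columns') / df.count(axis='columns')

# With skipna=False, any missing entry makes that row's mean NaN.
strict_mean = df.mean(axis='columns', skipna=False)

print(row_mean.tolist())
print(strict_mean.tolist())
```

A row in which every column is missing still yields NaN with the default settings, since there is nothing to average.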

For your case, it would be something like:

data['english_combined'] = data[
            ['english', 'intake_english', 
             'language test scores formatted']].mean(axis='columns')
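
As a rough end-to-end sketch, the same call can be tried on a small frame that uses the column names from the question. The grade values below are made up for illustration, and a test that was not taken is assumed to be stored as NaN. The sketch also shows a repaired version of the list comprehension from the question: np.mean(i, j, k) treats j as the axis argument and k as the dtype argument, so the three values have to be wrapped in one list, and np.nanmean is needed to skip the missing ones.

```python
import numpy as np
import pandas as pd

# Hypothetical grades; NaN marks a test that was not taken.
data = pd.DataFrame({
    'english': [6.0, 7.0, np.nan],
    'intake_english': [8.0, np.nan, np.nan],
    'language test scores formatted': [7.0, np.nan, 5.0],
})

cols = ['english', 'intake_english', 'language test scores formatted']

# Mean over whichever grades are present in each row.
data['english_combined'] = data[cols].mean(axis='columns')

# Row-by-row equivalent of the attempt in the question:
# pass the values as a single list and skip NaN with np.nanmean.
by_hand = [np.nanmean([i, j, k])
           for i, j, k in zip(data['english'],
                              data['intake_english'],
                              data['language test scores formatted'])]

print(data['english_combined'].tolist())
print(by_hand)
```

The vectorised mean(axis='columns') is usually preferable to the loop: it is shorter, and np.nanmean emits a RuntimeWarning for a row in which every grade is missing, whereas the pandas call simply returns NaN for that row.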


About the author

孤独难免
