Pandas - Modify the descriptive output of groupby.agg

Posted on 2025-01-12 03:37:53

I wanted to get the mean, std and skewness of every score column while grouping my data set by 2 other columns (group, block).
I used this code:

scores_list = ['A','B','C']
descriptive_agg = df.groupby(['group','block'])[scores_list].agg(['mean', 'std','skew'])
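The question does not include its data. A minimal reproducible sketch, using a small made-up DataFrame (the values are hypothetical, chosen only so that every (group, block) pair has three rows, since pandas returns NaN for skew with fewer than three values), shows where the two-level column header comes from: passing a list of functions to `agg` produces columns indexed by (score, statistic).

```python
import pandas as pd

# Hypothetical toy data: made-up values that only reproduce the *shape*
# of the question's result, not its numbers.
df = pd.DataFrame({
    'group': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    'block': ['neg', 'neg', 'neg', 'neu', 'neu', 'neu'] * 2,
    'A': [10, 20, 60, 5, 15, 70, 8, 12, 30, 9, 11, 31],
    'B': [3, 4, 2, 4, 3, 4, 2, 3, 4, 3, 4, 3],
    'C': [1.5, 2.0, 1.7, 1.9, 2.1, 1.6, 1.4, 1.8, 1.9, 2.0, 1.7, 1.9],
})

scores_list = ['A', 'B', 'C']
descriptive_agg = df.groupby(['group', 'block'])[scores_list].agg(['mean', 'std', 'skew'])

# A list of functions in agg() gives two column levels:
# level 0 is the score column, level 1 is the statistic name.
print(descriptive_agg.columns.nlevels)
print(descriptive_agg.columns.tolist()[:3])
```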

and got this DataFrame:

                       A                                      B                                      C
                    mean          std         skew         mean          std         skew         mean          std         skew
group block
0     neg    26.76470588  54.79291496  6.069163775  3.098039216  1.170553749  0.114238196  1.738755233  0.611860454  1.063953504
0     neu          29.92   70.9644464  6.275474539          3.6  1.245399698 -0.039619494  1.906404475  0.568964543  0.561075178
1     neg    16.42391304   18.0702133  2.968326848  2.891304348  1.253185144  0.209586627  1.684455875  0.598785419  0.872917578
1     neu    16.92391304  18.49159815  2.951129818          3.5  1.172018077 -0.313988331  1.893045967  0.646930842   1.11778034

But I want a "Score" column on the left. My expected output is:

Score  group  block         mean          std         skew
A      0      neg    26.76470588  54.79291496  6.069163775
       0      neu          29.92   70.9644464  6.275474539
       1      neg    16.42391304   18.0702133  2.968326848
       1      neu    16.92391304  18.49159815  2.951129818

B      0      neg    3.098039216  1.170553749  0.114238196
       0      neu            3.6  1.245399698 -0.039619494
       1      neg    2.891304348  1.253185144  0.209586627
       1      neu            3.5  1.172018077 -0.313988331

Thanks in advance!

Answer by 述情, 2025-01-19 03:37:54

Use DataFrame.stack together with DataFrame.reorder_levels and DataFrame.sort_index:

import pandas as pd
# df is the aggregated frame from the question (descriptive_agg)
df = df.stack(0).reorder_levels([2, 0, 1]).sort_index()
print(df)
              mean      skew        std
A 0 neg  26.764706  6.069164  54.792915
    neu  29.920000  6.275475  70.964446
  1 neg  16.423913  2.968327  18.070213
    neu  16.923913  2.951130  18.491598
B 0 neg   3.098039  0.114238   1.170554
    neu   3.600000 -0.039619   1.245400
  1 neg   2.891304  0.209587   1.253185
    neu   3.500000 -0.313988   1.172018
C 0 neg   1.738755  1.063954   0.611860
    neu   1.906404  0.561075   0.568965
  1 neg   1.684456  0.872918   0.598785
    neu   1.893046  1.117780   0.646931
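The reshape can be checked without the question's data. This is a minimal sketch assuming a hypothetical `agg` frame with the same layout as `descriptive_agg` (rows `(group, block)`, columns `(score, statistic)`) but placeholder values. The `rename_axis` call is an addition that names the new outer level `score`, as in the question's expected output. A `pd.concat` variant, a different technique that reaches the same long format, is included for comparison.

```python
import pandas as pd

# Hypothetical stand-in for descriptive_agg: rows (group, block),
# columns (score, statistic); values are placeholders, not real results.
rows = pd.MultiIndex.from_product([[0, 1], ['neg', 'neu']], names=['group', 'block'])
cols = pd.MultiIndex.from_product([['A', 'B', 'C'], ['mean', 'std', 'skew']])
agg = pd.DataFrame(
    [[float(10 * r + c) for c in range(9)] for r in range(4)],
    index=rows,
    columns=cols,
)

# stack(0) moves the score level (column level 0) into the rows as the
# innermost index level; reorder_levels moves it to the front, and
# sort_index groups all rows of each score together.
long = agg.stack(0).reorder_levels([2, 0, 1]).sort_index()
# Name the new outer level, matching the "Score" header the question asks for.
long = long.rename_axis(['score', 'group', 'block'])
print(long)

# Alternative without stack(): concatenate each score's sub-frame under its own key.
alt = pd.concat({s: agg[s] for s in ['A', 'B', 'C']}, names=['score'])
print(alt.equals(long[['mean', 'std', 'skew']]))
```

Depending on the pandas version, `stack()` may emit a FutureWarning about its new implementation (opt-in via `future_stack=True` since pandas 2.1), and the statistic columns may come out alphabetically (`mean`, `skew`, `std`, as printed above) or in their original order. The `pd.concat` variant keeps the original order either way.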

EDIT: If you need to replace the repeated index values with empty strings:

#original index
print(df.index)
MultiIndex([('A', 0, 'neg'),
            ('A', 0, 'neu'),
            ('A', 1, 'neg'),
            ('A', 1, 'neu'),
            ('B', 0, 'neg'),
            ('B', 0, 'neu'),
            ('B', 1, 'neg'),
            ('B', 1, 'neu'),
            ('C', 0, 'neg'),
            ('C', 0, 'neu'),
            ('C', 1, 'neg'),
            ('C', 1, 'neu')],
           )

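# index levels as ordinary columns, renamed 0, 1, 2
df1 = df.index.to_frame(index=False)
df1.columns = [0, 1, 2]
# m1: score already seen; m2: (score, group) pair already seen
m1 = df1[0].duplicated()
m2 = df1.duplicated(subset=[0, 1])

# blank the repeats; the innermost level (block) is left as-is
df1[0] = df1[0].mask(m1, '')
df1[1] = df1[1].mask(m2, '')
print(df1)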
    0  1    2
0   A  0  neg
1         neu
2      1  neg
3         neu
4   B  0  neg
5         neu
6      1  neg
7         neu
8   C  0  neg
9         neu
10     1  neg
11        neu

# rebuild the MultiIndex from the blanked labels and drop the 0/1/2 level names
df.index = pd.MultiIndex.from_frame(df1)
df = df.rename_axis([None, None, None])
print(df)
              mean      skew        std
A 0 neg  26.764706  6.069164  54.792915
    neu  29.920000  6.275475  70.964446
  1 neg  16.423913  2.968327  18.070213
    neu  16.923913  2.951130  18.491598
B 0 neg   3.098039  0.114238   1.170554
    neu   3.600000 -0.039619   1.245400
  1 neg   2.891304  0.209587   1.253185
    neu   3.500000 -0.313988   1.172018
C 0 neg   1.738755  1.063954   0.611860
    neu   1.906404  0.561075   0.568965
  1 neg   1.684456  0.872918   0.598785
    neu  1.893046  1.117780   0.646931

print(df.index)
MultiIndex([('A',  0, 'neg'),
            ( '', '', 'neu'),
            ( '',  1, 'neg'),
            ( '', '', 'neu'),
            ('B',  0, 'neg'),
            ( '', '', 'neu'),
            ( '',  1, 'neg'),
            ( '', '', 'neu'),
            ('C',  0, 'neg'),
            ( '', '', 'neu'),
            ( '',  1, 'neg'),
            ( '', '', 'neu')],
           )
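The two masks above handle exactly two outer levels. A hypothetical helper that applies the same idea to any number of levels might look like this; the function name `blank_repeated_labels` and the demo frame are illustrative assumptions, and, like the original, it relies on the index being sorted.

```python
import pandas as pd


def blank_repeated_labels(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` whose row MultiIndex
    shows each outer label only where it changes, with repeats replaced by ''.

    Generalizes the m1/m2 masks to any number of levels. Assumes the index
    is sorted: duplicated() flags a label prefix seen anywhere earlier,
    not only on the previous row.
    """
    labels = frame.index.to_frame(index=False)
    labels.columns = range(labels.shape[1])
    shown = labels.astype(object)
    # Blank every level except the innermost one, which stays fully visible.
    for k in range(labels.shape[1] - 1):
        repeated = labels.duplicated(subset=list(range(k + 1)))
        shown[k] = shown[k].mask(repeated, '')
    out = frame.copy()
    out.index = pd.MultiIndex.from_frame(shown)
    return out.rename_axis([None] * labels.shape[1])


# Usage on a small, sorted, made-up frame.
idx = pd.MultiIndex.from_product([['A', 'B'], [0, 1], ['neg', 'neu']])
demo = pd.DataFrame({'mean': range(8)}, index=idx)
print(blank_repeated_labels(demo))
```

pandas' default console display already hides repeated outer MultiIndex labels (the `display.multi_sparse` option), which is why the printed frames look the same before and after. Blanking the labels themselves mainly matters when the index is exported, for example with `to_csv`, which writes every label on every row.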