Aggregating multiple columns with different aggregate functions in Pandas using crosstab

Posted on 2025-01-09 20:01:08

I have a DataFrame in the format below. Let us call it df:

flag1	flag2	type	count1	count2
a	x	new	10	2
a	y	old	40	5
a	x	old	50	6
a	y	new	15	1

I am trying to get the following layout, where count1 and count2 each span their own new and old columns (I could not merge the adjacent header cells here):

		count1		count2
		new	old	new	old
a	x	10	50	2	6
a	y	15	40	1	5

When I only needed to aggregate a single column (count1), the following worked:

pd.crosstab([df.flag1,df.flag2], df.type, values=df.count1, aggfunc='sum')

But since I want both columns, count1 and count2, I tried the following, and neither attempt worked:

pd.crosstab([df.flag1,df.flag2], df.type, values=[df.count1,df.count2], aggfunc=['sum','sum']) #trial1
pd.crosstab([df.flag1,df.flag2], df.type, values=[df.count1,df.count2], aggfunc='sum') #trial2

Neither of them worked.

Extension: I should be able to use different functions on different columns, say sum on count1 and nunique on count2, or sum on count1 and mean on count2.

满意归宿 2025-01-16 20:01:08

I don't think crosstab can be used here. An alternative is DataFrame.pivot_table, which accepts a dictionary mapping each column to its own aggregation function:

# assign the result to a new name so the original df is not overwritten
out = df.pivot_table(index=['flag1', 'flag2'],
                     columns='type',
                     aggfunc={'count1': 'sum', 'count2': 'nunique'})
print(out)
            count1     count2    
type           new old    new old
flag1 flag2                      
a     x         10  50      1   1
      y         15  40      1   1
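
For the other case mentioned in the question (sum on count1, mean on count2), a minimal self-contained sketch could look like the following. The DataFrame is rebuilt from the sample rows in the question, and the variable name out is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "flag1": ["a", "a", "a", "a"],
    "flag2": ["x", "y", "x", "y"],
    "type": ["new", "old", "old", "new"],
    "count1": [10, 40, 50, 15],
    "count2": [2, 5, 6, 1],
})

# One aggregation function per value column: sum for count1, mean for count2
out = df.pivot_table(
    index=["flag1", "flag2"],
    columns="type",
    aggfunc={"count1": "sum", "count2": "mean"},
)
print(out)
```

Each (flag1, flag2, type) combination has a single row in this sample, so the mean of count2 is just the original value; with repeated combinations the two functions would give different results.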

Another alternative is to aggregate with groupby and then unstack the type level into columns:

# assign the result to a new name so the original df is not overwritten
out = (df.groupby(['flag1', 'flag2', 'type'])
         .agg({'count1': 'sum', 'count2': 'nunique'})
         .unstack())
print(out)
            count1     count2    
type           new old    new old
flag1 flag2                      
a     x         10  50      1   1
      y         15  40      1   1
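
If staying with crosstab matters, one possible workaround (not part of the answer above, just a sketch) is to build one crosstab per value column, using the single-column call that already worked in the question, and then join the results side by side with pd.concat. The keys argument supplies the outer column level; the names sum1, mean2 and combined are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "flag1": ["a", "a", "a", "a"],
    "flag2": ["x", "y", "x", "y"],
    "type": ["new", "old", "old", "new"],
    "count1": [10, 40, 50, 15],
    "count2": [2, 5, 6, 1],
})

rows = [df["flag1"], df["flag2"]]

# One crosstab per value column, each with its own aggregation function
sum1 = pd.crosstab(rows, df["type"], values=df["count1"], aggfunc="sum")
mean2 = pd.crosstab(rows, df["type"], values=df["count2"], aggfunc="mean")

# Concatenate column-wise; keys become the top level of the column MultiIndex
combined = pd.concat([sum1, mean2], axis=1, keys=["count1", "count2"])
print(combined)
```

This costs one crosstab call per column, so pivot_table with a dictionary is usually the shorter route when many columns are involved.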
