Matching the indexes of two groupby results

Posted on 2025-01-12 03:12:57

I need to calculate several percentages between groups, and I'm trying to find the best way to build an object that lets me do so.

Say I have this frame:

import pandas as pd

df = pd.DataFrame({
    "cluster":  ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result":   [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

To calculate the percentages easily, I need two sets of group sizes: one over the full frame and one over a filtered frame:

r1 = df.groupby(["cluster", "category"]).size()
print(r1)

r2 = df[df['result']==1].groupby(["cluster", "category"]).size()
print(r2)

However, the index of r2 does not match the index of r1: groups that have no rows with result == 1 are missing from r2. This will cause problems when I plot both results on the same axes, so I want r2 to have the same index as r1. This is the best way I have found:

r3 = (r2 + r1 - r1).fillna(0)
print(r3)

Is there a better way to do this? Ideally, all the information would live in a single object with two named columns.

Thank you very much!

死开点丶别碍眼 2025-01-19 03:12:57

If I understand you correctly, you can use pd.concat, which gives you a single DataFrame with two columns:

out = pd.concat([r1, r2], axis=1).fillna(0)
print(out)

Prints:

                  0    1
cluster category        
A       x         2  0.0
        y         1  1.0
B       x         2  1.0
        y         1  1.0
C       x         1  0.0
        y         1  1.0
        z         1  1.0
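
Since the question asks for two named columns and for percentages, here is a minimal sketch of how that could look. The column names "total", "hits" and "pct" are my own illustrative choices, not anything from the question. It also shows two alternatives that may fit: aligning r2 to r1's index with reindex, and a single groupby with named aggregation. The latter relies on the assumption that result only ever contains 0 or 1, so that its sum equals the number of rows with result == 1.

```python
import pandas as pd

df = pd.DataFrame({
    "cluster":  ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result":   [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

group_cols = ["cluster", "category"]
r1 = df.groupby(group_cols).size()
r2 = df[df["result"] == 1].groupby(group_cols).size()

# Option 1: pd.concat with keys= names the resulting columns
# ("total" and "hits" are illustrative names).
out = pd.concat([r1, r2], axis=1, keys=["total", "hits"]).fillna(0)
out["hits"] = out["hits"].astype(int)   # fillna leaves floats; cast back to int
out["pct"] = 100 * out["hits"] / out["total"]

# Option 2: give r2 the same index as r1, filling missing groups with 0.
r3 = r2.reindex(r1.index, fill_value=0)

# Option 3: one groupby with named aggregation.
# Assumes result is strictly 0/1, so sum() counts the rows with result == 1.
agg = df.groupby(group_cols)["result"].agg(total="size", hits="sum")

print(out)
```

Option 2 replaces the (r2 + r1 - r1).fillna(0) trick and keeps the integer dtype. Option 3 avoids building two separate groupby objects, but only works while the 0/1 assumption holds.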
