Matching the indexes of two groupby results

Posted on 2025-01-12 03:12:57 · 676 words · 0 views · 0 comments

I need to calculate several percentages between groups, and I'm trying to find the best way to build an object that lets me do so.

Say I have this frame:

import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

To make the percentages easy to calculate, I need two sets of group sizes: one from the full frame and one from a filtered frame:

r1 = df.groupby(["cluster", "category"]).size()
print(r1)

r2 = df[df['result']==1].groupby(["cluster", "category"]).size()
print(r2)
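
To make the mismatch concrete, here is a small sketch that lists the (cluster, category) pairs present in r1 but absent from r2. The variable name `missing` is only illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# Index entries that exist in the full count but not in the filtered one
# (hypothetical helper variable, for illustration only)
missing = r1.index.difference(r2.index)
print(list(missing))
```

Groups that contain no row with result == 1 never appear in r2, which is why the two indexes differ.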

However, r2's index does not match r1's (groups with no rows where result == 1 are missing from r2), which will eventually cause problems when I plot both results on the same axes. So I'm trying to give r2 the same index as r1, and this is the best way I have found:

r3 = (r2 + r1 - r1).fillna(0)
print(r3)

Do you have a better way of doing this? Ideally, all the information would end up in a single object with two named columns.

Thank you very much!

Comments (1)

死开点丶别碍眼 2025-01-19 03:12:57

If I understand you correctly, you can use pd.concat (that way you will have a single DataFrame with two columns):

out = pd.concat([r1, r2], axis=1).fillna(0)
print(out)

Prints:

                  0    1
cluster category        
A       x         2  0.0
        y         1  1.0
B       x         2  1.0
        y         1  1.0
C       x         1  0.0
        y         1  1.0
        z         1  1.0
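
As a possible refinement of the approach above, passing `keys` to pd.concat should give the two columns names instead of the default 0 and 1, and a percentage can then be derived from them. This is only a sketch: the column names `total`, `positive` and `pct`, and the variable `r2_aligned`, are my own choices and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# keys= labels the two columns (assumed names, chosen for illustration)
out = pd.concat([r1, r2], axis=1, keys=["total", "positive"]).fillna(0)

# fillna leaves the filtered column as float; cast back to int for counts
out["positive"] = out["positive"].astype(int)

# Share of rows with result == 1 in each (cluster, category) group
out["pct"] = 100 * out["positive"] / out["total"]
print(out)

# Alternative if only an aligned Series is needed:
# reindex r2 on r1's index and fill the missing groups with 0
r2_aligned = r2.reindex(r1.index, fill_value=0)
print(r2_aligned)
```

The reindex variant avoids the NaN round trip, so the counts stay integers without an explicit cast.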