Matching the indexes of two groupby results
I need to calculate several percentages between groups, and I'm trying to find the best way to build an object that lets me do so.
Say I have this frame:
import pandas as pd
df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})
To make the percentages easy to compute, I need two group sizes: one over the full frame and one over a filtered frame (only the rows where result == 1):
r1 = df.groupby(["cluster", "category"]).size()
print(r1)
r2 = df[df['result']==1].groupby(["cluster", "category"]).size()
print(r2)
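To see concretely why the two Series don't line up, this small sketch (rebuilding the same frame so it runs on its own) compares their lengths and lists the (cluster, category) groups that exist in r1 but not in r2. The name missing is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# r1 has one entry per group; r2 only has groups with at least one result == 1
print(len(r1), len(r2))  # 7 5

# Groups present in r1 but absent from r2
missing = r1.index.difference(r2.index)
print(missing.tolist())  # [('A', 'x'), ('C', 'x')]
```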
However, r2 does not have the same index as r1: groups with no row where result == 1, such as (A, x) and (C, x), are missing from r2. This causes problems when I want to plot both results on the same ax, so I'm trying to give r2 the same index as r1, and this is the best way I found:
r3 = (r2 + r1 - r1).fillna(0)
print(r3)
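The r2 + r1 - r1 trick works because pandas aligns Series on their indexes before doing arithmetic: the result carries every label of r1, with NaN wherever r2 had no group, and fillna(0) then replaces those NaN values. A more direct equivalent, sketched here, is Series.reindex with fill_value=0, which also keeps the integer dtype instead of converting to float:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# Conform r2 to r1's index; groups absent from r2 get 0 instead of NaN,
# so no fillna is needed and the values stay integers.
r3 = r2.reindex(r1.index, fill_value=0)
print(r3)
```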
Is there a better way to do this? Ideally, I'd like to have all the information in a single object with two named columns.
Thank you very much!
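One way to get a single object with two named columns is named aggregation in a single groupby (available in pandas 0.25 and later). This sketch assumes result only ever holds 0 and 1, so summing it counts the rows where result == 1; the column names total, positive and share are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

# One pass over the groups, two named output columns:
#   total:    number of rows in each (cluster, category) group
#   positive: number of rows with result == 1 (sum works because result is 0/1)
out = df.groupby(["cluster", "category"]).agg(
    total=("result", "size"),
    positive=("result", "sum"),
)

# Both columns share one index, so ratios need no extra alignment step
out["share"] = out["positive"] / out["total"]
print(out)
```

Because every group appears in the output, groups with no positive rows show positive == 0 rather than being dropped. If result could hold values other than 0 and 1, the sum would need to be replaced by a count of result == 1.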
If I understand you correctly, you can use pd.concat (that way you will have a single dataframe with two columns):
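A minimal sketch of the pd.concat idea, reusing r1 and r2 from the question. The column names total and positive are assumptions chosen for illustration, and the percentage column is just one example of what the combined frame makes easy:

```python
import pandas as pd

df = pd.DataFrame({
    "cluster": ["A", "A", "B", "B", "A", "B", "C", "C", "C"],
    "category": ["x", "y", "x", "x", "x", "y", "y", "z", "x"],
    "result": [0, 1, 1, 0, 0, 1, 1, 1, 0],
})

r1 = df.groupby(["cluster", "category"]).size()
r2 = df[df["result"] == 1].groupby(["cluster", "category"]).size()

# axis=1 places each Series in its own column, and keys= names those columns.
# The outer join leaves NaN where r2 has no group, so fill with 0 and
# convert back to integers.
out = pd.concat([r1, r2], axis=1, keys=["total", "positive"])
out = out.fillna(0).astype(int)

# Example percentage computed from the two aligned columns
out["pct_positive"] = 100 * out["positive"] / out["total"]
print(out)
```

Since both counts now live in one DataFrame with a shared index, plotting them together (for example with out[["total", "positive"]].plot.bar()) no longer needs any manual index matching.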