How to group by one column or another in pandas

Posted on 2025-02-01 03:35:11

I have a table like:

    col1    col2
0   1       a
1   2       b
2   2       c
3   3       c
4   4       d

I'd like rows to be grouped together if they have a matching value in col1 or col2. That is, I'd like something like this:

> (
    df
    .groupby(set('col1', 'col2'))  # Made-up syntax
    .ngroup())
0  0
1  1
2  1
3  1
4  2

Is there a way to do this with pandas?

茶花眉 2025-02-08 03:35:11

This is not easy to achieve with pandas alone, because the grouping is transitive: two rows that share no value directly can still end up in the same group when a third row links them through the other column.

You can approach this using graph theory. Treat the group labels of the two (or more) columns as nodes, add one edge per row, and find the connected components. A Python library for this is networkx:

import networkx as nx

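# label each row by its col1 group and by its col2 group; the 'a' prefix
# keeps the col2 labels distinct from the integer col1 labels
g1 = df.groupby('col1').ngroup()
g2 = 'a'+df.groupby('col2').ngroup().astype(str)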

# make graph and get connected components to form a mapping dictionary
G = nx.from_edgelist(zip(g1, g2))
d = {k:v for v,s in enumerate(nx.connected_components(G)) for k in s}

# find common group
group = g1.map(d)

df.groupby(group).ngroup()

output:

0    0
1    1
2    1
3    1
4    2
dtype: int64
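
To try the approach end to end, here is a minimal, self-contained sketch that builds the sample table from the question and wraps the same steps in a function. The helper name `group_by_either` is hypothetical and not part of pandas or networkx. Note that rows 1 and 3 share no value directly; they land in the same group only because row 2 links them.

```python
import networkx as nx
import pandas as pd

# sample data from the question
df = pd.DataFrame({'col1': [1, 2, 2, 3, 4],
                   'col2': ['a', 'b', 'c', 'c', 'd']})


def group_by_either(df, col_a, col_b):
    """Hypothetical helper: number rows by connected component."""
    # per-column group labels; the 'a' prefix keeps the two label sets distinct
    g1 = df.groupby(col_a).ngroup()
    g2 = 'a' + df.groupby(col_b).ngroup().astype(str)

    # one edge per row, then map every node to the index of its component
    G = nx.from_edgelist(zip(g1, g2))
    d = {k: v for v, s in enumerate(nx.connected_components(G)) for k in s}

    # renumber the components 0, 1, 2, ... as group ids
    return df.groupby(g1.map(d)).ngroup()


groups = group_by_either(df, 'col1', 'col2')
print(groups.tolist())
```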

Alternatively, using from_pandas_edgelist:

G = nx.from_pandas_edgelist(df, source='col1', target='col2')
mapper = {n: i for i, g in enumerate(nx.connected_components(G)) for n in g}
df['group'] = df['col1'].map(mapper)

Output:

   col1 col2  group
0     1    a      0
1     2    b      1
2     2    c      1
3     3    c      1
4     4    d      2
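
One caveat with using raw cell values as graph nodes: if both columns can hold the same value (for example, two integer columns), a value in col1 and an equal value in col2 become the same node and can merge unrelated rows. In the sample data this cannot happen, since col1 holds integers and col2 holds strings. The sketch below is one possible way around it, not the answer's method: it tags each node with its column name and uses a small union-find in plain Python instead of networkx. The helper name `group_by_any` is hypothetical, and the sketch assumes there are no missing values.

```python
import pandas as pd


def group_by_any(df, cols):
    """Hypothetical helper: label rows that share a value in any of `cols`."""
    parent = {}

    def find(x):
        # return the representative of x's set, registering x on first sight
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    rows = list(zip(*(df[c] for c in cols)))

    # nodes are (column name, value) pairs, so equal values in
    # different columns stay distinct
    for values in rows:
        nodes = list(zip(cols, values))
        for node in nodes[1:]:
            union(nodes[0], node)

    # number the representatives 0, 1, 2, ... in order of first appearance
    labels = {}
    codes = [labels.setdefault(find((cols[0], values[0])), len(labels))
             for values in rows]
    return pd.Series(codes, index=df.index)


df = pd.DataFrame({'col1': [1, 2, 2, 3, 4],
                   'col2': ['a', 'b', 'c', 'c', 'd']})
print(group_by_any(df, ['col1', 'col2']).tolist())

# two integer columns with overlapping values: the rows share nothing column-wise
df2 = pd.DataFrame({'col1': [1, 2], 'col2': [2, 3]})
print(group_by_any(df2, ['col1', 'col2']).tolist())
```

Because `cols` is a list, the same helper extends to three or more columns without changes.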

About the author

李不
