How to group by one column or another in pandas

Posted 2025-02-01 03:35:11

I have a table like:

    col1    col2
0   1       a
1   2       b
2   2       c
3   3       c
4   4       d

I'd like rows to be grouped together if they have a matching value in col1 or col2. That is, I'd like something like this:

> (
    df
    .groupby(set('col1', 'col2'))  # Made-up syntax
    .ngroup())
0  0
1  1
2  1
3  1
4  2

Is there a way to do this with pandas?

Answer by 茶花眉, 2025-02-08 03:35:11

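This is not easy to achieve with pandas alone, because the grouping is transitive: two groups that share no value directly can still become connected through a chain of rows. In the example, rows 1 and 3 share nothing, but row 1 shares col1 = 2 with row 2, and row 3 shares col2 = c with row 2.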

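You can approach this with graph theory: make every col1 group and every col2 group a node, add an edge between the two groups of each row, and find the connected components. The same works with two or more grouping columns. A Python library for this is networkx: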

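import networkx as nx

# df is the DataFrame from the question.
# One group id per column; the col2 ids get an 'a' prefix so that, e.g.,
# group 0 of col1 and group 0 of col2 do not become the same node.
g1 = df.groupby('col1').ngroup()
g2 = 'a'+df.groupby('col2').ngroup().astype(str)

# make graph (one edge per row) and get connected components to form a mapping dictionary
G = nx.from_edgelist(zip(g1, g2))
d = {k:v for v,s in enumerate(nx.connected_components(G)) for k in s}

# find common group: map each row's col1 group to its component
group = g1.map(d)

df.groupby(group).ngroup()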

output:

0    0
1    1
2    1
3    1
4    2
dtype: int64

Graph: [image not preserved]
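If you would rather avoid the networkx dependency, the same connected-components idea can be sketched in plain Python with a disjoint-set (union-find) structure. This is a swapped-in technique, not the answer's method, and connected_row_groups is a hypothetical helper name. It takes rows as plain tuples (for example list(df.itertuples(index=False))), accepts any number of columns, and numbers the groups in order of first appearance.

```python
def connected_row_groups(rows):
    """Return one label per row; rows that share a value in any column,
    directly or through a chain of rows, get the same label."""
    parent = {}

    def find(node):
        # Follow parent links to the root, halving the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Nodes are (column position, value) pairs, so equal values in different
    # columns stay distinct (the job the 'a' prefix does in the networkx code).
    for row in rows:
        first = (0, row[0])
        for pos in range(1, len(row)):
            union(first, (pos, row[pos]))

    # Renumber the roots 0, 1, 2, ... in order of first appearance.
    labels, result = {}, []
    for row in rows:
        root = find((0, row[0]))
        result.append(labels.setdefault(root, len(labels)))
    return result


rows = [(1, 'a'), (2, 'b'), (2, 'c'), (3, 'c'), (4, 'd')]  # (col1, col2)
print(connected_row_groups(rows))  # [0, 1, 1, 1, 2]
```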

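Alternatively, using from_pandas_edgelist on the raw values (this works here because no value appears in both col1 and col2; a value present in both columns would become a single node and wrongly link rows):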

G = nx.from_pandas_edgelist(df, source='col1', target='col2')
mapper = {n: i for i, g in enumerate(nx.connected_components(G)) for n in g}
df['group'] = df['col1'].map(mapper)

Output:

   col1 col2  group
0     1    a      0
1     2    b      1
2     2    c      1
3     3    c      1
4     4    d      2
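The answer mentions two (or more) grouping columns. Below is a minimal sketch that generalizes the networkx approach to any list of columns, assuming pandas and networkx are installed; group_by_any is a hypothetical helper name. Each value is tagged with its column name, so equal values in different columns stay separate nodes, which the from_pandas_edgelist variant does not guarantee.

```python
import networkx as nx
import pandas as pd


def group_by_any(df, columns):
    """Give rows the same id when they share a value in any of `columns`,
    directly or through a chain of other rows (connected components)."""
    G = nx.Graph()
    for values in df[columns].itertuples(index=False):
        # Tag every value with its column name so that, e.g., a 1 in one
        # column and a 1 in another column are different nodes.
        nodes = [(col, value) for col, value in zip(columns, values)]
        G.add_node(nodes[0])  # keeps rows with a single column in the graph
        for other in nodes[1:]:
            G.add_edge(nodes[0], other)

    # Map every node to the index of its connected component.
    component = {node: i
                 for i, comp in enumerate(nx.connected_components(G))
                 for node in comp}

    # A row's first-column node always lies in that row's component.
    first = columns[0]
    raw = [component[(first, value)] for value in df[first]]

    # Renumber 0, 1, 2, ... in order of first appearance, so the ids do not
    # depend on the order in which networkx yields the components.
    codes, _ = pd.factorize(pd.Series(raw))
    return pd.Series(codes, index=df.index)


df = pd.DataFrame({'col1': [1, 2, 2, 3, 4], 'col2': list('abccd')})
df['group'] = group_by_any(df, ['col1', 'col2'])
print(df)
```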