NetworkX: largest connected component whose nodes share attributes

Posted on 2025-01-25 19:18:57

I know there exist functions for computing the size of the connected components of a graph in NetworkX. You can add attributes to a node.
In Axelrod's model for dissemination of culture, an interesting measurement is the size of the largest connected component whose nodes share several attributes. Is there a way of doing that in NetworkX?
For example, let's say we have a population represented through a network. Each node has attributes of hair color and skin color. How can I get the size of the largest component of nodes such that in that subgraph each and every node has the same hair and skin color?
Thank you

Comments (1)

野稚 2025-02-01 19:18:57

For general data analysis, it's best to use pandas. Use a graph library like networkx or graph-tool to determine the connected components, and then load that info into a DataFrame that you can analyze. In this case, the pandas groupby and nunique (number of unique elements) features will be useful.
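
As a minimal sketch of that workflow using networkx alone, the snippet below labels each node with a connected-component ID, loads the labels and attributes into a DataFrame, and then applies groupby with nunique. The toy graph, the attribute values, and names such as `nodes_df` and `largest_size` are made up for illustration and are not part of the original answer.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy graph with two connected components: {0, 1, 2} and {3, 4}
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (3, 4)])

# Hypothetical node attributes (illustrative values only)
hair = {0: "pink", 1: "pink", 2: "pink", 3: "pink", 4: "purple"}
skin = {0: "green", 1: "green", 2: "green", 3: "blue", 4: "blue"}
nx.set_node_attributes(G, hair, "hair")
nx.set_node_attributes(G, skin, "skin")

# Build one row per node, tagged with the index of its connected component
rows = []
for cc_id, component in enumerate(nx.connected_components(G)):
    for n in component:
        rows.append({
            "id": n,
            "cc": cc_id,
            "hair": G.nodes[n]["hair"],
            "skin": G.nodes[n]["skin"],
        })
nodes_df = pd.DataFrame(rows)

# Count distinct hair and skin colors, and the number of nodes, per component
group_stats = nodes_df.groupby("cc").agg(
    hair_colors=("hair", "nunique"),
    skin_colors=("skin", "nunique"),
    node_count=("id", "size"),
)

# Keep only components in which every node has the same hair and skin color
homogeneous = group_stats.query("hair_colors == 1 and skin_colors == 1")
largest_size = int(homogeneous["node_count"].max())
print(largest_size)
```

Here only the component {0, 1, 2} is uniform in both attributes, so `largest_size` is its node count.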

Here's a self-contained example using graph-tool, applied to the baseball network at https://networks.skewed.de/net/baseball. You could also compute the connected components via networkx.

import numpy as np
import pandas as pd
import graph_tool.all as gt

# Download an example graph
# https://networks.skewed.de/net/baseball
g = gt.collection.ns["baseball", 'user-provider']

# Extract the player names
names = g.vertex_properties['name'].get_2d_array([0])[0]

# Extract connected component ID for each node
cc, cc_sizes = gt.label_components(g)

# Load into a DataFrame
players = pd.DataFrame({
    'id': np.arange(g.num_vertices()),
    'name': names,
    'cc': cc.a
})

# Create some random attributes
players['hair'] = np.random.choice(['purple', 'pink'], size=len(players))
players['skin'] = np.random.choice(['green', 'blue'], size=len(players))

# For the sake of this example, manipulate the data so
# that some groups are homogenous with respect to some attributes.
players.loc[players['cc'] == 2, 'hair'] = 'purple'
players.loc[players['cc'] == 2, 'skin'] = 'blue'

players.loc[players['cc'] == 4, 'hair'] = 'pink'
players.loc[players['cc'] == 4, 'skin'] = 'green'

# Now determine how many unique hair and skin colors we have in each group.
group_stats = players.groupby('cc').agg({
    'hair': 'nunique',
    'skin': ['nunique', 'size']
})

# Simplify the column names
group_stats.columns = ['hair_colors', 'skin_colors', 'player_count']

# Select homogenous groups, i.e. groups for which only 1 unique
# hair color is present and 1 unique skin color is present
homogenous = group_stats.query('hair_colors == 1 and skin_colors == 1')

# Sort from large groups to small groups
homogenous = homogenous.sort_values('player_count', ascending=False)
print(homogenous)

That prints the following:

    hair_colors  skin_colors  player_count
cc
4             1            1             4
2             1            1             3
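
Note that the approach above reports components of the whole graph that happen to be uniform. If the goal is instead the largest connected set of nodes that agree on every attribute, even when that set sits inside a larger mixed component, one possible sketch (an assumption about what the question is after, not something taken from the answer) is to keep only the edges whose two endpoints have identical attribute values and then take the largest connected component of the filtered graph. The function name `largest_uniform_region` and the toy data are hypothetical.

```python
import networkx as nx


def largest_uniform_region(G, attrs):
    """Return the size of the largest connected set of nodes in G
    whose members all share the same value for every attribute in attrs."""
    # Keep an edge only if both endpoints agree on all listed attributes
    same_edges = [
        (u, v)
        for u, v in G.edges()
        if all(G.nodes[u][a] == G.nodes[v][a] for a in attrs)
    ]
    # Rebuild a graph with every original node but only the agreeing edges
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    H.add_edges_from(same_edges)
    # Each component of H is connected in G and uniform in attrs
    return max((len(c) for c in nx.connected_components(H)), default=0)


# Hypothetical example: a path 0-1-2-3-4 forming a single mixed component
G = nx.path_graph(5)
nx.set_node_attributes(G, {n: "pink" for n in G.nodes()}, "hair")
nx.set_node_attributes(
    G, {0: "green", 1: "green", 2: "green", 3: "blue", 4: "blue"}, "skin"
)

region_size = largest_uniform_region(G, ["hair", "skin"])
print(region_size)
```

In this toy graph the edge between nodes 2 and 3 is dropped because the skin colors differ, leaving the uniform regions {0, 1, 2} and {3, 4}.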