Assigning matrix elements via indices collected in a pandas DataFrame

Posted 2025-02-08 16:24:44

I am trying to construct an affiliation matrix for a social network. I have a pandas DataFrame where column i is the row index of an element, column j is its column index, and column v is the weight between the two nodes.

I made up the following table for demonstration. I'll just call it df.

i	j	v
1	3	0
2	4	2
5	3	0
2	1	2
1	2	0.5
3	1	1

My idea was to first construct a matrix

A_matrix = np.zeros((i_num, j_num))

Then I use the apply function

df.apply(set_to_matrix)

where

def set_to_matrix(row):
    A_matrix[row.i, row.j] = row.v

My question is: is it possible to get better performance?

I have i_num = 100000 and j_num = 1000; with the code above it took 1 minute 53 seconds.

I tried using the swifter package to speed up the apply call, but that took 2 minutes 23 seconds, which is even longer.

If possible, please also explain why my approach is slow and how other approaches could speed up the process.
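For reference, here is a minimal sketch of the row-wise approach on the sample table. It makes assumptions that are not in the code above: `axis=1` is passed so that `apply` hands over rows rather than columns, the indices are cast with `int(...)` because each row arrives as a float Series once a float column such as v is present, and the sizes `i_num = 6`, `j_num = 5` are picked only to fit the sample indices. It also hints at where the time goes: `apply` with `axis=1` builds a Series and calls a Python function once per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 5, 2, 1, 3],
                   "j": [3, 4, 3, 1, 2, 1],
                   "v": [0, 2, 0, 2, 0.5, 1]})

# Assumed sizes, just large enough for the sample indices.
i_num, j_num = 6, 5
A_matrix = np.zeros((i_num, j_num))

def set_to_matrix(row):
    # Each row is a float Series here, so cast the indices back to int.
    A_matrix[int(row.i), int(row.j)] = row.v

# axis=1 calls set_to_matrix once per row (one Python call per row).
df.apply(set_to_matrix, axis=1)
```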

百变从容 2025-02-15 16:24:44

There is no need to use apply. You can index into A_matrix with the i and j columns and assign the values from the v column to the corresponding positions:

A_matrix = np.zeros((i_num, j_num)) 
A_matrix[df.i, df.j] = df.v
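A self-contained sketch of this on the sample table from the question might look as follows. The sizes `i_num = 6` and `j_num = 5` are assumptions chosen to fit the sample indices, and the explicit conversion to integer NumPy arrays is an addition here, meant to guard against index columns that were read in as floats.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 5, 2, 1, 3],
                   "j": [3, 4, 3, 1, 2, 1],
                   "v": [0, 2, 0, 2, 0.5, 1]})

# Assumed sizes, just large enough for the sample indices.
i_num, j_num = 6, 5

# Convert the index columns to integer arrays once, up front.
rows = df["i"].to_numpy(dtype=np.intp)
cols = df["j"].to_numpy(dtype=np.intp)

A_matrix = np.zeros((i_num, j_num))
# One vectorized assignment: A_matrix[rows[k], cols[k]] = v[k] for every k.
A_matrix[rows, cols] = df["v"].to_numpy()
```

The whole assignment happens inside NumPy, with no Python function call per row. If the same (i, j) pair can occur more than once, it is probably worth deduplicating or aggregating first rather than relying on which value ends up in the matrix.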

少跟Wǒ拽 2025-02-15 16:24:44

Your code does not work for me, and I didn't spend time debugging it. The following code gives you the matrix you need fairly quickly. The only caveat is that repeated row indices (1 and 2) and repeated column indices (1 and 3) are merged into single rows and columns, which makes sense to me.

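import pandas as pd

df = pd.DataFrame({'i': [1,2,5,2,1,3],
                    'j': [3,4,3,1,2,1],
                    'v': [0,2,0,2,0.5,1]})

# Pivot to one row per distinct i and one column per distinct j; missing pairs become 0.
# Note: reset_index() moves the i labels into the first column of the result.
df1 = pd.pivot_table(df, values='v', index='i', columns='j', aggfunc='mean').reset_index().fillna(0)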
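Note that the pivot only contains the i and j labels that actually occur in the data, so its shape is generally smaller than (i_num, j_num). If a full-size matrix is wanted, one possible sketch is to reindex the pivot onto the complete index ranges. The names `pivot`, `full` and `A_full`, as well as the sizes `i_num = 6` and `j_num = 5`, are assumptions for illustration and not part of the answer's code.

```python
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 5, 2, 1, 3],
                   "j": [3, 4, 3, 1, 2, 1],
                   "v": [0, 2, 0, 2, 0.5, 1]})

# Assumed sizes, just large enough for the sample indices.
i_num, j_num = 6, 5

# Pivot without reset_index so the i labels stay in the index.
pivot = df.pivot_table(values="v", index="i", columns="j", aggfunc="mean")

# Add the missing row/column labels, then replace all gaps with 0.
full = pivot.reindex(index=range(i_num), columns=range(j_num)).fillna(0)
A_full = full.to_numpy()
```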

Final network matrix: