Assigning matrix elements by indices collected in a pandas DataFrame
I am trying to construct an affiliation matrix for a social network. I have a pandas DataFrame in which column `i` is the row index of an element, column `j` is its column index, and column `v` is the weight between the two nodes.
I made up the following table for demonstration; I'll call it `df`.
i | j | v |
---|---|---|
1 | 3 | 0 |
2 | 4 | 2 |
5 | 3 | 0 |
2 | 1 | 2 |
1 | 2 | 0.5 |
3 | 1 | 1 |
My idea was to first construct a matrix

    A_matrix = np.zeros((i_num, j_num))

and then use the apply function

    df.apply(set_to_matrix)

where

    def set_to_matrix(row):
        A_matrix[row.i, row.j] = row.v

My question is: is it possible to get better performance?
I have i_num = 100000 and j_num = 1000; with the code above, it took 1 minute 53 seconds.
I tried using the swifter package to speed up the apply call, but that took 2 minutes 23 seconds, which is even longer.
If possible, please also explain why my approach is slower and how other approaches could speed up the process.
Comments (2)
There is no need to use `apply`; you can use the `i` and `j` columns to index into `A_matrix` and then assign the values from the `v` column to the corresponding index positions:
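The code that accompanied this answer was not preserved on this page. What follows is only a minimal sketch of the indexing it describes, built on the demo table from the question. It assumes that `i` and `j` hold non-negative integers used directly as 0-based positions, so the matrix is sized to the largest index plus one; that sizing is an assumption of this sketch, not something stated in the answer.

```python
import numpy as np
import pandas as pd

# Demo table from the question
df = pd.DataFrame({
    "i": [1, 2, 5, 2, 1, 3],
    "j": [3, 4, 3, 1, 2, 1],
    "v": [0, 2, 0, 2, 0.5, 1],
})

# Assumption: indices are 0-based positions, so the matrix needs
# (largest index + 1) rows and columns to hold every entry.
i_num = int(df["i"].max()) + 1
j_num = int(df["j"].max()) + 1
A_matrix = np.zeros((i_num, j_num))

# A single vectorized assignment: the two integer arrays pick the
# (row, column) positions and the third array supplies the values.
A_matrix[df["i"].to_numpy(), df["j"].to_numpy()] = df["v"].to_numpy()
```

The row-by-row `apply` version calls a Python function once per row, whereas this form hands the whole assignment to NumPy in one call, which is usually where the speedup comes from.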
Your code is not working for me, and I didn't spend time debugging it. The following code will give you the matrix you require pretty quickly. The only issue is that the duplicate rows (1 & 2) and columns (1 & 3) will be combined together (which makes sense to me!). Final network matrix:
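The code and the resulting matrix from this answer were not preserved on this page either. One way to get the behaviour it describes, where repeated labels in `i` (1 and 2) and in `j` (1 and 3) collapse into a single row or column each, is a pivot. The sketch below uses `pivot_table`; that choice, along with `aggfunc="sum"` and `fill_value=0`, is an assumption and may differ from what the answerer actually wrote.

```python
import pandas as pd

# Demo table from the question
df = pd.DataFrame({
    "i": [1, 2, 5, 2, 1, 3],
    "j": [3, 4, 3, 1, 2, 1],
    "v": [0, 2, 0, 2, 0.5, 1],
})

# Each distinct value of i becomes one row label and each distinct
# value of j one column label, so repeated labels are merged.
# aggfunc="sum" (an assumption) decides what happens if the same
# (i, j) pair appears more than once; missing pairs are filled with 0.
network = df.pivot_table(
    index="i", columns="j", values="v", aggfunc="sum", fill_value=0
)

# Plain NumPy array of the weights, if the labels are not needed
network_array = network.to_numpy()
```

Unlike the positional indexing approach, the result here has only as many rows and columns as there are distinct labels, so it stays compact when the index values are sparse.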