MATLAB loop optimization
I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (mostly false, with some true entries). I have to produce a matrix, intersect(50000,50000), that, for each pair of rows i,j of matrix_logical, stores the number of columns in which rows i and j are both true.
Here is the code I wrote:
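% store in advance the nonzero columns of each row
% (nz_cols rather than nonzeros, so the built-in nonzeros() is not shadowed)
for i=1:50000
    nz_cols{i} = num2cell(find(matrix_logical(i,:)));
end
% called intersect in the text; named common here because a variable named
% intersect would shadow the intersect() function called in the inner loop
common = zeros(50000,50000);
for i=1:49999
    a = cell2mat(nz_cols{i});
    for j=(i+1):50000
        b = cell2mat(nz_cols{j});
        common(i,j) = numel(intersect(a,b));
    end
end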
Is it possible to improve the performance further? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse in content, but it is not stored as a sparse matrix in MATLAB, because doing so makes the performance much worse.
Comments (3)
Since the [i,j] entry counts the number of nonzero elements in the element-wise product of rows i and j, you can compute it by multiplying matrix_logical by its transpose (you should convert to a numeric data type first, e.g. matrix_logical = single(matrix_logical)). This works for both sparse and full representations.
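The code that accompanied this answer did not survive on this page. The following is a minimal sketch of the idea on a toy matrix, not the answerer's original code; the names A and common are made up for illustration, and double() is used instead of single() on the assumption that older MATLAB releases support sparse storage only for double and logical.

```matlab
% Toy stand-in for matrix_logical (the real one is 50000-by-100000).
matrix_logical = logical([1 1 0 1;
                          0 1 0 1;
                          1 0 1 0]);

% Convert to a numeric type so that matrix multiplication is defined.
% A and common are illustrative names, not taken from the original answer.
A = double(matrix_logical);

% Entry (i,j) is the dot product of rows i and j, i.e. the number of
% columns in which both rows are true. The diagonal holds the number of
% true entries in each row.
common = A * A';

disp(common)
```

Unlike the loop in the question, this fills the whole symmetric matrix; triu(common, 1) keeps only the strictly upper triangle if that is what is needed.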
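EDIT
In order to calculate numel(intersect(a,b))/numel(union(a,b)) (as asked in your comment), you can use the fact that, for two sets a and b, numel(union(a,b)) = numel(a) + numel(b) - numel(intersect(a,b)). The ratio therefore follows from the product above together with the number of true values in each row.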
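The code for this step is also missing from the page. One possible way to apply the identity to all pairs of rows at once is sketched below on a toy matrix; the variable names are assumptions made for illustration, and bsxfun is used so the sketch does not depend on implicit expansion (R2016b and later).

```matlab
% Toy stand-in for matrix_logical.
matrix_logical = logical([1 1 0 1;
                          0 1 0 1;
                          1 0 1 0]);
A = double(matrix_logical);

common     = A * A';      % size of the intersection for every pair of rows
row_counts = sum(A, 2);   % number of true entries in each row (column vector)

% Size of the union for every pair: |a| + |b| - |a and b in common|.
union_counts = bsxfun(@plus, row_counts, row_counts') - common;

% Ratio of intersection size to union size for every pair of rows.
jaccard_index = common ./ union_counts;

disp(jaccard_index)
```

A pair of rows that are both entirely false gives 0/0, so the corresponding entry is NaN and may need separate handling.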
If I understood you correctly, you want a logical AND of each pair of rows, followed by a count of the true entries in the result.
This doesn't avoid the double loop, but at least it works without the first loop and the slow find command.
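The answer's code is not preserved here. A rough sketch of what a row-wise AND inside the double loop might look like, on a toy matrix and with illustrative names (n, row_i, common), is:

```matlab
% Toy stand-in for matrix_logical.
matrix_logical = logical([1 1 0 1;
                          0 1 0 1;
                          1 0 1 0]);

n = size(matrix_logical, 1);
common = zeros(n, n);

for i = 1:n-1
    row_i = matrix_logical(i, :);   % fetch row i once per outer iteration
    for j = i+1:n
        % AND the two rows, then count the true entries.
        common(i, j) = nnz(row_i & matrix_logical(j, :));
    end
end

disp(common)
```

As in the question, only the entries above the diagonal are filled.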
Elaborating on my comment, a custom distance function can be passed to pdist() for this. In my experience, sum(sum()) is faster for logicals than nnz(), which is why the function uses it.
You would also need to use squareform() to reshape the output of pdist() appropriately.
Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.
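Neither the distance function nor the squareform() call survived on this page. The sketch below shows one way such a function could be written; it is an assumption rather than the answerer's code (in particular it uses sum(..., 2) instead of sum(sum())), jaccard_fun is a made-up name, and pdist() and squareform() require the Statistics and Machine Learning Toolbox.

```matlab
% Toy stand-in for matrix_logical, converted to double for pdist().
matrix_logical = logical([1 1 0 1;
                          0 1 0 1;
                          1 0 1 0]);
X = double(matrix_logical);

% Hypothetical custom distance function. pdist() calls it with ZI as a
% single row (1-by-n) and ZJ as several rows (m-by-n), and expects an
% m-by-1 result: here, intersection size divided by union size.
jaccard_fun = @(ZI, ZJ) sum(bsxfun(@and, ZI, ZJ), 2) ./ ...
                        sum(bsxfun(@or,  ZI, ZJ), 2);

% pdist() returns the pairwise values as a vector; squareform() turns it
% into a symmetric matrix with zeros on the diagonal.
jaccard_index = squareform(pdist(X, jaccard_fun));

disp(jaccard_index)
```

Because squareform() puts zeros on the diagonal, the diagonal here does not reflect the index of a row with itself.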