MATLAB loop optimization

Posted on 2024-11-26 15:56:50

I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (mostly false, with some true entries). I have to produce a matrix, inter(50000,50000), that, for each pair i,j of rows of matrix_logical, stores the number of columns in which rows i and j are both true.

Here is the code I wrote:

% store the nonzero column indices of each row in advance
% (named nzCols so the built-in nonzeros() is not shadowed)
nzCols = cell(50000,1);
for i=1:50000
    nzCols{i} = num2cell(find(matrix_logical(i,:)));
end

% named inter rather than intersect, so that the call to the
% built-in intersect() below is not shadowed by the variable
inter = zeros(50000,50000);

for i=1:49999
    a = cell2mat(nzCols{i});
    for j=(i+1):50000
        b = cell2mat(nzCols{j});
        inter(i,j) = numel(intersect(a,b));
    end
end

Is it possible to improve the performance further? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.

matrix_logical is sparse in content, but it is not stored as a sparse matrix in MATLAB, because with sparse storage the performance becomes far worse.

遮云壑 2024-12-03 15:56:50

Since the [i,j] entry counts the number of nonzero elements in the element-wise product of rows i and j, you can compute it by multiplying matrix_logical by its transpose (you should convert to a numeric data type first, e.g., matrix_logical = single(matrix_logical)):

inter = matrix_logical * matrix_logical';

This works for both sparse and full representations.
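
As a quick sanity check of this identity, here is a minimal sketch on a small made-up matrix. The names A, S and loopCount are illustrative and do not come from the question; the product is compared with a direct pairwise count.

```matlab
% Small hypothetical logical matrix: 4 rows, 6 columns
A = logical([1 0 1 0 0 1;
             0 0 1 1 0 1;
             1 1 0 0 0 0;
             0 0 0 0 0 0]);

% Convert to sparse double so the product is numeric and stays sparse
S = sparse(double(A));
inter = full(S * S');      % inter(i,j) = number of columns true in both rows

% Direct pairwise count, for comparison only
n = size(A, 1);
loopCount = zeros(n);
for i = 1:n
    for j = 1:n
        loopCount(i, j) = nnz(A(i, :) & A(j, :));
    end
end

assert(isequal(inter, loopCount));
disp(inter);
```

The diagonal holds each row's own count of true entries, so the strictly upper-triangular result built in the question corresponds to triu(inter, 1).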

EDIT

To calculate numel(intersect(a,b))/numel(union(a,b)) (as asked in your comment), you can use the fact that, for two sets a and b,

length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))

so you can do the following:

unLen = sum(matrix_logical,2);
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);
inter = matrix_logical * matrix_logical';
inter = inter ./ (tmp-inter);
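
A minimal sketch of the same computation on a small made-up matrix, using bsxfun instead of repmat to form the pairwise sums (on R2016b or later, rowLen + rowLen' should give the same result through implicit expansion). The names A, S, rowLen, unionLen and jac are illustrative. Note that a pair of all-false rows has an empty union and comes out as NaN (0/0), a case the formula above does not handle.

```matlab
% Small hypothetical matrix: 3 rows, 6 columns
A = logical([1 0 1 0 0 1;
             0 0 1 1 0 1;
             1 1 0 0 0 0]);
S = double(A);

rowLen   = sum(S, 2);                               % true entries per row
inter    = S * S';                                  % pairwise intersection sizes
unionLen = bsxfun(@plus, rowLen, rowLen') - inter;  % len(a) + len(b) - len(intersection)
jac      = inter ./ unionLen;                       % Jaccard index for every pair

disp(jac);
```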

水染的天色ゝ 2024-12-03 15:56:50

If I understood you correctly, you want a logical AND of the rows:

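intersct = zeros(50000, 50000);   % semicolon suppresses printing the matrix
for ii = 1:50000                  % include the last row so its diagonal entry is filled
    for jj = ii:50000
        % number of columns where rows ii and jj are both true
        intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
        intersct(jj, ii) = intersct(ii, jj);   % the result is symmetric
    end
end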

This doesn't avoid the double loop, but at least it works without the first loop and the slow find command.
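
One possible way to drop the inner loop while keeping the logical-AND idea is to compare row ii against all remaining rows in a single call. This is a sketch rather than the answer's own code: bsxfun is used so that it does not depend on implicit expansion, and the small made-up matrix A stands in for matrix_logical.

```matlab
% Small hypothetical matrix standing in for matrix_logical
A = logical([1 0 1 0 0 1;
             0 0 1 1 0 1;
             1 1 0 0 0 0]);
n = size(A, 1);
intersct = zeros(n);

for ii = 1:n
    % AND row ii with rows ii..n in one call, then count matches per row
    counts = sum(bsxfun(@and, A(ii:n, :), A(ii, :)), 2);
    intersct(ii:n, ii) = counts;     % fill the column from the diagonal down
    intersct(ii, ii:n) = counts';    % mirror into the row
end

disp(intersct);
```

At the size given in the question, the temporary created by bsxfun for the first rows would be several gigabytes, so the rows would probably have to be processed in blocks.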

吃不饱 2024-12-03 15:56:50

Elaborating on my comment, here is a distance function suitable for pdist():

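function out = distfun(xi,xj)
    % xi is a single row; xj holds one or more rows to compare against it
    out = zeros(size(xj,1),1);
    for i=1:size(xj,1)
        % size of the intersection divided by size of the union
        out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
    end
end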

In my experience, sum(sum()) is faster for logicals than nnz(), thus its appearance above.

You would also need to use squareform() to reshape the output of pdist() appropriately:

squareform(pdist(matrix_logical,@distfun));

Note that pdist() includes a 'jaccard' distance measure, but it computes the Jaccard distance (one minus the Jaccard coefficient), not the Jaccard index or coefficient itself, which is apparently the value you are after.
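
For 0/1 data, the index can presumably be recovered from the built-in distance without a custom function. A sketch, assuming the Statistics and Machine Learning Toolbox is available and using a small made-up matrix A (converted to double, since pdist expects numeric input):

```matlab
% Small hypothetical 0/1 matrix: 3 rows, 6 columns
A = double([1 0 1 0 0 1;
            0 0 1 1 0 1;
            1 1 0 0 0 0]);

D   = squareform(pdist(A, 'jaccard'));   % pairwise Jaccard distances, zero diagonal
jac = 1 - D;                             % Jaccard index; the diagonal becomes 1
disp(jac);
```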
