MATLAB loop optimization

Posted on 2024-11-26 15:56:50

I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (mostly false, with some true entries). I need to produce a 50000-by-50000 matrix that, for each pair of rows i, j of matrix_logical, stores the number of columns in which rows i and j are both true.

Here is the code I wrote:

% store in advance the nonzero columns of each row
% (variables renamed so they no longer shadow the built-in
% functions nonzeros() and intersect())
nzCols = cell(50000,1);
for i=1:50000
    nzCols{i} = find(matrix_logical(i,:));
end

inter = zeros(50000,50000);

for i=1:49999
    a = nzCols{i};
    for j=(i+1):50000
        b = nzCols{j};
        inter(i,j) = numel(intersect(a,b));
    end
end

Is it possible to improve the performance further? Computing the matrix takes too long. I would like to avoid the double loop in the second part of the code.

matrix_logical is sparse in content, but I do not store it as a MATLAB sparse matrix, because performance then becomes far worse.

遮云壑 2024-12-03 15:56:50

Since the [i,j] entry counts the nonzero elements in the element-wise product of rows i and j, you can compute it by multiplying matrix_logical by its transpose (convert to a numeric data type first, e.g., matrix_logical = single(matrix_logical)):

inter = matrix_logical * matrix_logical';

This works for both sparse and full representations.
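As a quick sanity check of this identity, here is a minimal sketch on a small made-up matrix. The 4-by-6 matrix `M` and the names `viaProduct`, `viaLoop`, and `viaSparse` are illustrative and do not come from the question. It compares the matrix product against a brute-force count, for both a full and a sparse copy:

```matlab
% Small illustrative logical matrix (hypothetical data, not from the question)
M = logical([1 0 1 1 0 0;
             0 1 1 0 0 1;
             1 1 1 0 0 0;
             0 0 0 0 1 0]);

% Pairwise intersection counts via a single matrix product
A = double(M);                 % convert to a numeric type first
viaProduct = A * A';

% Brute-force reference: count the columns where both rows are true
n = size(M, 1);
viaLoop = zeros(n);
for i = 1:n
    for j = 1:n
        viaLoop(i, j) = sum(M(i, :) & M(j, :));
    end
end

% The same product computed on a sparse copy of the data
S = sparse(A);
viaSparse = full(S * S');

disp(isequal(viaProduct, viaLoop));     % displays 1 (true)
disp(isequal(viaProduct, viaSparse));   % displays 1 (true)
```

Note that the product also fills the diagonal (each row's own count of true entries) and the lower triangle, whereas the loop in the question only fills the entries above the diagonal.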

EDIT

To calculate numel(intersect(a,b))/numel(union(a,b)) (as asked in your comment), you can use the fact that for two sets a and b:

length(union(a,b)) = length(a) + length(b) - length(intersect(a,b))

So you can do the following:

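unLen = sum(matrix_logical,2);            % number of true entries in each row
tmp = repmat(unLen, 1, length(unLen)) + repmat(unLen', length(unLen), 1);  % tmp(i,j) = length(a) + length(b)
inter = matrix_logical * matrix_logical'; % inter(i,j) = length(intersect(a,b))
inter = inter ./ (tmp-inter);             % divide by length(union(a,b))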
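
On large inputs, each `repmat` call allocates a full n-by-n temporary. The following is a minimal sketch of the same computation without them, assuming MATLAB R2016b or later for implicit expansion (on older releases, `bsxfun(@plus, rowCounts, rowCounts')` should play the same role). The small matrix and the names `rowCounts`, `unionSize`, and `jaccard` are illustrative:

```matlab
% Small illustrative logical matrix (hypothetical data, not from the question)
M = logical([1 0 1 1 0 0;
             0 1 1 0 0 1;
             1 1 1 0 0 0]);
A = double(M);

rowCounts = sum(A, 2);                       % true entries per row, n-by-1
inter = A * A';                              % intersection size for every pair of rows
unionSize = rowCounts + rowCounts' - inter;  % length(a) + length(b) - length(intersect(a,b))
jaccard = inter ./ unionSize;                % NaN for a pair of rows that are both all false

disp(jaccard);
```

For this input the off-diagonal values are 1/5 for rows 1 and 2, and 2/4 for the other two pairs; the diagonal is 1.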

水染的天色ゝ 2024-12-03 15:56:50

If I understood you correctly, you want a logical AND of the rows:

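intersct = zeros(50000, 50000);   % semicolon added to suppress output
for ii = 1:50000                  % run to the last row so its diagonal entry is filled
    for jj = ii:50000
        intersct(ii, jj) = sum(matrix_logical(ii, :) & matrix_logical(jj, :));
        intersct(jj, ii) = intersct(ii, jj);   % mirror into the lower triangle
    end
end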

This doesn't avoid the double loop, but at least it works without the first loop and the slow find command.
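
If the inner loop is the main cost, one option is to handle a whole row per iteration. This is a minimal sketch, not part of the answer above, on a small made-up matrix (the names `M` and `counts` are illustrative). For row `ii`, it selects the columns where that row is true and sums across them, which gives the overlap of row `ii` with every row at once:

```matlab
% Small illustrative logical matrix (hypothetical data, not from the question)
M = logical([1 0 1 1 0 0;
             0 1 1 0 0 1;
             1 1 1 0 0 0;
             0 0 0 0 1 0]);

n = size(M, 1);
counts = zeros(n);
for ii = 1:n
    % M(ii, :) is used as a logical column mask. Summing the selected
    % columns along dimension 2 counts, for every row jj, the columns
    % that are true in both row ii and row jj.
    counts(:, ii) = sum(M(:, M(ii, :)), 2);
end

disp(counts);
```

The result is symmetric, so assigning into column `ii` is equivalent to assigning into row `ii`. Whether this beats the matrix-product approach on the full-size data would need to be measured.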

吃不饱 2024-12-03 15:56:50

Elaborating on my comment, here is a distance function suitable for pdist():

function out = distfun(xi,xj)
    out = zeros(size(xj,1),1);
    for i=1:size(xj,1)
        out(i) = sum(sum( xi & xj(i,:) )) / sum(sum( xi | xj(i,:) ));
    end

In my experience, sum(sum()) is faster than nnz() for logicals, which is why it appears above.

You would also need to use squareform() to reshape the output of pdist() appropriately:

squareform(pdist(matrix_logical,@distfun));

Note that pdist() includes a 'jaccard' distance measure, but it is actually the Jaccard distance and not the Jaccard index or coefficient, which is the value you are apparently after.
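
Since the Jaccard distance is one minus the Jaccard index, the built-in measure can still be used by subtracting it from 1. A minimal sketch on a small made-up matrix, assuming the Statistics and Machine Learning Toolbox is installed and that the input is converted to double before calling pdist() (the names `M`, `D`, and `J` are illustrative):

```matlab
% Requires the Statistics and Machine Learning Toolbox (pdist, squareform)
% Small illustrative logical matrix (hypothetical data, not from the question)
M = logical([1 0 1 1 0 0;
             0 1 1 0 0 1;
             1 1 1 0 0 0]);

D = squareform(pdist(double(M), 'jaccard'));  % Jaccard distances, zeros on the diagonal
J = 1 - D;                                    % Jaccard index, ones on the diagonal

disp(J);
```

For this input the index is 1/5 for rows 1 and 2 and 2/4 for the other two pairs. Rows with no true entries at all are an edge case worth checking separately.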
