Vectorizing a loop in MATLAB - performance issues

Posted on 2024-09-02 19:53:12

This question is related to these two:
Introduction to vectorizing in MATLAB - any good tutorials?
filter that uses elements from two arrays at the same time

Based on the tutorials I read, I tried to vectorize a procedure that takes a very long time to run.

I've rewritten this:

function B = bfltGray(A,w,sigma_r)
dim = size(A);
B = zeros(dim);
for i = 1:dim(1)
    for j = 1:dim(2)

        % Extract local region.
        iMin = max(i-w,1);
        iMax = min(i+w,dim(1));
        jMin = max(j-w,1);
        jMax = min(j+w,dim(2));
        I = A(iMin:iMax,jMin:jMax);

        % Compute Gaussian intensity weights.
        F = exp(-0.5*(abs(I-A(i,j))/sigma_r).^2);
        B(i,j) = sum(F(:).*I(:))/sum(F(:));

    end
end

into this:

function B = rngVect(A, w, sigma)
W = 2*w+1;
I = padarray(A, [w,w],'symmetric');
I = im2col(I, [W,W]);
H = exp(-0.5*(abs(I-repmat(A(:)', size(I,1),1))/sigma).^2);
B = reshape(sum(H.*I,1)./sum(H,1), size(A, 1), []);

where:
A is a 512x512 matrix
w is half the window size, usually 5
sigma is a parameter in the range [0 1] (usually one of 0.1, 0.2 or 0.3)
So the I matrix has 512x512x121 = 31719424 elements

However, this version seems to be as slow as the first one, and in addition it uses a lot of memory and sometimes causes memory problems.

I suppose I've done something wrong, probably a logic mistake in the vectorization. In fact, I'm not surprised: this method creates really big matrices, and the computation time probably grows in proportion.

I have also tried to write it using nlfilter (similar to the second solution given by Jonas) but it seems to be hard since I use Matlab 6.5 (R13) (there are no sophisticated function handles available).

So once again, I'm not asking for a ready-made solution, but for some ideas that would help me solve this in reasonable time. Maybe you can point out what I did wrong.

Edit:
As Mikhail suggested, the results of profiling are as follows:
65% of time was spent in the line H= exp(...)
25% of time was used by im2col

Comments (1)

只涨不跌 2024-09-09 19:53:12

How big are I and H (i.e. numel(I)*8 bytes)? If you start paging, then the performance of your second solution is going to be affected very badly.
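As a rough check, the size can be worked out from the numbers given in the question. This is only a sketch: it assumes double-precision storage (8 bytes per element), and the variable names are made up for illustration.

```matlab
% Estimate the memory footprint of one W^2-by-(n*n) array such as I or H.
n = 512;                        % image is n-by-n (from the question)
w = 5;                          % half window size (from the question)
W = 2*w + 1;                    % full window width, here 11
numelI = n * n * W^2;           % number of elements in I (H has the same size)
bytesPerArray = numelI * 8;     % assumption: double precision, 8 bytes each
fprintf('numel(I)  = %d\n', numelI);
fprintf('one array = %.1f MB\n', bytesPerArray / 2^20);
```

With these values a single array comes to 242 MB, and several arrays of that size exist at the same time (I, H, the repmat result and the temporaries of the element-wise operations), so paging is plausible on a machine with limited RAM.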

To test whether the problem really comes from arrays that are too large, you can measure the speed of the calculation using tic and toc for arrays A of increasing size. If the execution time increases faster than the square of the size of A, or if it jumps at some size of A, you can try splitting the padded I into a number of sub-arrays and performing the calculation on each of them.
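A minimal sketch of the splitting idea, assuming the Image Processing Toolbox functions padarray and im2col already used in the question. The function name rngVectStrips and the parameter stripWidth are made up for illustration. It applies the same formula as rngVect to one band of image columns at a time, so only W^2 * size(A,1) * stripWidth elements are expanded at once.

```matlab
function B = rngVectStrips(A, w, sigma, stripWidth)
% Hypothetical strip-wise variant of rngVect (save as rngVectStrips.m).
% stripWidth = number of image columns expanded by im2col per iteration.
[nRows, nCols] = size(A);
W = 2*w + 1;
P = padarray(A, [w, w], 'symmetric');     % (nRows+2w)-by-(nCols+2w)
B = zeros(nRows, nCols);
for c0 = 1:stripWidth:nCols
    c1 = min(c0 + stripWidth - 1, nCols); % last image column of this strip
    k  = c1 - c0 + 1;                     % columns actually in this strip
    % Padded columns c0 .. c1+2w contain every window centred in the strip.
    I = im2col(P(:, c0:(c1 + 2*w)), [W, W]);   % W^2-by-(nRows*k)
    centre = A(:, c0:c1);
    C = repmat(centre(:)', W^2, 1);       % centre pixel repeated down each column
    % Gaussian intensity weights; abs() is redundant because of the square.
    H = exp(-0.5 * ((I - C) / sigma).^2);
    B(:, c0:c1) = reshape(sum(H .* I, 1) ./ sum(H, 1), nRows, k);
end
```

For example, B = rngVectStrips(A, 5, 0.2, 32) can be wrapped in tic and toc and compared against rngVect for the same A. The output should agree with rngVect up to rounding, since only the order of the work changes; whether it is faster depends on how much paging the full-size version causes.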

Otherwise, I don't see any obvious places where you could be losing lots of time. Well, maybe you could skip the reshape, by replacing B with A in your function (saves a little memory as well), and writing
A(:) = sum(H.*I,1)./sum(H,1);
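To illustrate why this works: assigning to A(:) overwrites the elements in column order and leaves the size of A unchanged, so no reshape is needed. A toy example with made-up values:

```matlab
% Assigning to A(:) keeps the shape of A and fills it column by column.
A = zeros(4, 4);      % stands in for the 512x512 image
v = 1:16;             % row vector with numel(A) elements, like sum(H.*I,1)./sum(H,1)
A(:) = v;             % no reshape needed; size(A) is still [4 4]
sameAsReshape = isequal(A, reshape(v, 4, 4));   % true if both approaches agree
```

Note that in rngVect this overwrite has to come after the last use of the original A (the repmat call), as it does in the line above.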

You may also want to look into upgrading to a more recent version of Matlab - they've worked hard on improving performance.
