For loop vs. vectorization in MATLAB

Posted on 2024-12-06 11:40:34

I was programming something in MATLAB and, as recommended, I always try to use vectorization. But in the end the program was quite slow. I then found that in one place the code is significantly faster when using loops (example below).

I would like to know whether I misinterpreted something or did something wrong, because performance is important in this case and I don't want to keep guessing whether vectorization or loops will be faster.

% data initialization

k = 8;
n = 2^k+1;
h = 1/(n-1);
cw = 0.1;

iter = 10000;

uloc = zeros(n);
fploc = uloc;
uloc(2:end-1,2:end-1) = 1;
vloc = uloc;
ploc = ones(n);

uloc2 = zeros(n);
fploc2 = uloc2;
uloc2(2:end-1,2:end-1) = 1;
vloc2 = uloc2;
ploc2 = ones(n);

%%%%%%%%%%%%%%%%%%%%%%
% vectorized version %
%%%%%%%%%%%%%%%%%%%%%%
tic
for it=1:iter
    il=2:4;
    jl=2:4;
    fploc(il,jl) = h/6*(-uloc(il-1,jl-1) + uloc(il-1,jl)...
        -2*uloc(il,jl-1)+2*uloc(il,jl+1)...
        -uloc(il+1,jl) + uloc(il+1,jl+1)...
        ...
        -vloc(il-1,jl-1) - 2*vloc(il-1,jl)...
        +vloc(il,jl-1) - vloc(il,jl+1)...
        + 2*vloc(il+1,jl) + vloc(il+1,jl+1))...
        ...
        +cw*h^2*(-ploc(il-1,jl)-ploc(il,jl-1)+4*ploc(il,jl)...
        -ploc(il+1,jl)-ploc(il,jl+1));
end
toc


%%%%%%%%%%%%%%%%%%%%%%
%    loop version    %
%%%%%%%%%%%%%%%%%%%%%%
tic
for it=1:iter
    for il=2:4
        for jl=2:4
            fploc2(il,jl) = h/6*(-uloc2(il-1,jl-1) + uloc2(il-1,jl)...
                -2*uloc2(il,jl-1)+2*uloc2(il,jl+1)...
                -uloc2(il+1,jl) + uloc2(il+1,jl+1)...
                ...
                -vloc2(il-1,jl-1) - 2*vloc2(il-1,jl)...
                +vloc2(il,jl-1) - vloc2(il,jl+1)...
                + 2*vloc2(il+1,jl) + vloc2(il+1,jl+1))...
                ...
                +cw*h^2*(-ploc2(il-1,jl)-ploc2(il,jl-1)+4*ploc2(il,jl)...
                -ploc2(il+1,jl)-ploc2(il,jl+1));
        end
    end
end
toc

Comments (5)

乄_柒ぐ汐 2024-12-13 11:40:34

I didn't go through your code, but the JIT compiler in recent versions of MATLAB has improved to the point where the situation you're facing is quite common: loops can be faster than vectorized code. It is difficult to know in advance which will be faster, so the best approach is to write the code in the most natural fashion, profile it, and then, if there is a bottleneck, try switching from loops to vectorized code (or the other way around).

假装爱人 2024-12-13 11:40:34

MATLAB's just-in-time (JIT) compiler has improved significantly over the last couple of years. You are right that one should generally vectorize code, but in my experience this holds only for certain operations and functions, and it also depends on how much data your functions are handling.

The best way to find out what works best is to profile your MATLAB code with and without vectorization.
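One way to act on this advice is `timeit`, which warms the code up and repeats the measurement instead of relying on a single `tic`/`toc` pair. The sketch below is not from the original post: the helper names `stencilVec` and `stencilLoop` are hypothetical, both return only the 3-by-3 block rather than writing into `fploc`, and the script assumes a release that accepts local functions at the end of a script file (R2016b or later).

```matlab
% Sketch: time the question's stencil both ways with timeit.
% stencilVec/stencilLoop are hypothetical helper names, not from the post.
n = 2^8 + 1;  h = 1/(n-1);  cw = 0.1;
u = zeros(n);  u(2:end-1,2:end-1) = 1;
v = u;  p = ones(n);
il = 2:4;  jl = 2:4;                  % same 3-by-3 block as in the question

% Check that both variants agree before comparing their timings.
d = stencilVec(u,v,p,h,cw,il,jl) - stencilLoop(u,v,p,h,cw,il,jl);
maxDiff = max(abs(d(:)));

tVec  = timeit(@() stencilVec(u,v,p,h,cw,il,jl));
tLoop = timeit(@() stencilLoop(u,v,p,h,cw,il,jl));
fprintf('max diff %.2g, vectorized %.3g s, loop %.3g s\n', maxDiff, tVec, tLoop);

function f = stencilVec(u, v, p, h, cw, il, jl)
    % Index-vector form: each indexed sub-expression is a temporary array.
    f = h/6*(-u(il-1,jl-1) + u(il-1,jl) ...
        - 2*u(il,jl-1) + 2*u(il,jl+1) ...
        - u(il+1,jl) + u(il+1,jl+1) ...
        - v(il-1,jl-1) - 2*v(il-1,jl) ...
        + v(il,jl-1) - v(il,jl+1) ...
        + 2*v(il+1,jl) + v(il+1,jl+1)) ...
        + cw*h^2*(-p(il-1,jl) - p(il,jl-1) + 4*p(il,jl) ...
        - p(il+1,jl) - p(il,jl+1));
end

function f = stencilLoop(u, v, p, h, cw, il, jl)
    % Scalar double loop over the same block, in the question's loop order.
    f = zeros(numel(il), numel(jl));
    for a = 1:numel(il)
        for b = 1:numel(jl)
            r = il(a);  c = jl(b);
            f(a,b) = h/6*(-u(r-1,c-1) + u(r-1,c) ...
                - 2*u(r,c-1) + 2*u(r,c+1) ...
                - u(r+1,c) + u(r+1,c+1) ...
                - v(r-1,c-1) - 2*v(r-1,c) ...
                + v(r,c-1) - v(r,c+1) ...
                + 2*v(r+1,c) + v(r+1,c+1)) ...
                + cw*h^2*(-p(r-1,c) - p(r,c-1) + 4*p(r,c) ...
                - p(r+1,c) - p(r,c+1));
        end
    end
end
```

For calls this short, `timeit` may warn that the measurement is unreliable; wrapping several evaluations in one timed function is a possible workaround. Which variant wins depends on the MATLAB release and the machine, so no outcome is claimed here.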

失眠症患者 2024-12-13 11:40:34

Maybe a matrix of a few elements is not a good test of vectorization efficiency. In the end, what works well depends on the application.

Also, vectorized code usually looks better (truer to the underlying model), but in many cases it does not, and it ends up hurting the implementation. What you did is great, as now you know what works best for you.
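To check whether the block size is what drives the result, the same comparison can be repeated for growing blocks. This is a rough sketch under stated assumptions: it uses only the five-point `ploc` term of the question's stencil to stay short, random data instead of the question's arrays, and the hypothetical helper names `lapVec` and `lapLoop`. Like any script with local functions, it needs R2016b or later.

```matlab
% Sketch: how the loop/vectorized timings change with block size.
% Only the five-point ploc term is used; lapVec/lapLoop are hypothetical names.
n = 2^8 + 1;
p = rand(n);
sizes = [3 15 63 255];                % block edge lengths; 255 is the full interior
tVec  = zeros(size(sizes));
tLoop = zeros(size(sizes));
for s = 1:numel(sizes)
    il = 2:sizes(s)+1;  jl = il;
    tVec(s)  = timeit(@() lapVec(p, il, jl));
    tLoop(s) = timeit(@() lapLoop(p, il, jl));
    fprintf('%3d x %-3d  vectorized %.3g s   loop %.3g s\n', ...
        sizes(s), sizes(s), tVec(s), tLoop(s));
end

function f = lapVec(p, il, jl)
    % Index-vector form of the five-point term.
    f = -p(il-1,jl) - p(il,jl-1) + 4*p(il,jl) - p(il+1,jl) - p(il,jl+1);
end

function f = lapLoop(p, il, jl)
    % Scalar double loop over the same block.
    f = zeros(numel(il), numel(jl));
    for a = 1:numel(il)
        for b = 1:numel(jl)
            r = il(a);  c = jl(b);
            f(a,b) = -p(r-1,c) - p(r,c-1) + 4*p(r,c) - p(r+1,c) - p(r,c+1);
        end
    end
end
```

If the fixed per-call cost of the vectorized form is what hurts on the 3-by-3 block, its time per element should drop as the block grows; whether and where the two variants cross over is something to measure rather than assume.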

半世晨晓 2024-12-13 11:40:34

I would not call this vectorisation.

You appear to be doing some sort of filtering operation. A truly vectorised version of such a filter is the original data multiplied by the filter matrix (that is, one matrix that represents the whole for-loop).

The problem with these matrices is that they are so sparse (only a few nonzero elements around the diagonal) that it is hardly ever efficient to use them. You can use the sparse command, but even then the elegance of the notation probably does not justify the extra memory required.
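To make the filter-matrix idea concrete, here is a minimal sketch for the five-point `ploc` part of the stencil only. The construction via `spdiags` and `kron` is one possible way to build such an operator and is not taken from the post; the names `D`, `A`, and `q` are illustrative, and random data stands in for `ploc`.

```matlab
% Sketch: the five-point term as one sparse matrix acting on p(:).
n = 2^8 + 1;
p = rand(n);

D = spdiags(ones(n,1)*[-1 2 -1], -1:1, n, n);   % 1-D second difference
I = speye(n);
A = kron(I, D) + kron(D, I);                    % acts on rows plus on columns

q = reshape(A*p(:), n, n);                      % apply the whole filter at once

% Compare with the explicit stencil on the interior points
% (boundary rows and columns of A are truncated, so they are excluded).
ii = 2:n-1;  jj = 2:n-1;
ref = -p(ii-1,jj) - p(ii,jj-1) + 4*p(ii,jj) - p(ii+1,jj) - p(ii,jj+1);
err = max(max(abs(q(ii,jj) - ref)));
fprintf('nnz(A) = %d of %d entries, max error = %.2g\n', nnz(A), numel(A), err);
```

The matrix has one row per grid point, so even in sparse storage it holds roughly five entries per point in addition to the data itself, which is the extra memory this answer refers to.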

MATLAB used to be bad at for loops because even the loop counters, etc., were still treated as complex matrices, so all the checks for such matrices were evaluated at every iteration. My guess is that inside your for loop, all those checks are still performed every time you apply the filter coefficients.

Perhaps the MATLAB functions filter and filter2 are useful here?
You may also want to read this post: "Improving MATLAB Matrix Construction Code : Or, code Vectorization for begginers"
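As a sketch of the `filter2` suggestion, the question's stencil can be written as three 3-by-3 kernels, one per field. The kernels below were read off the question's formula (rows correspond to offsets il-1, il, il+1 and columns to jl-1, jl, jl+1); the names `Ku`, `Kv`, `Kp` are illustrative, and random data is used so that the comparison with the index-vector form is meaningful.

```matlab
% Sketch: the question's stencil over the whole interior via filter2.
n = 2^8 + 1;  h = 1/(n-1);  cw = 0.1;
u = rand(n);  v = rand(n);  p = rand(n);

Ku = [-1  1  0;  -2  0  2;   0 -1  1];   % coefficients applied to uloc
Kv = [-1 -2  0;   1  0 -1;   0  2  1];   % coefficients applied to vloc
Kp = [ 0 -1  0;  -1  4 -1;   0 -1  0];   % coefficients applied to ploc

fp = zeros(n);
fp(2:end-1,2:end-1) = h/6*(filter2(Ku, u, 'valid') + filter2(Kv, v, 'valid')) ...
    + cw*h^2*filter2(Kp, p, 'valid');

% Reference: the index-vector form from the question on the same interior.
il = 2:n-1;  jl = 2:n-1;
ref = h/6*(-u(il-1,jl-1) + u(il-1,jl) - 2*u(il,jl-1) + 2*u(il,jl+1) ...
    - u(il+1,jl) + u(il+1,jl+1) ...
    - v(il-1,jl-1) - 2*v(il-1,jl) + v(il,jl-1) - v(il,jl+1) ...
    + 2*v(il+1,jl) + v(il+1,jl+1)) ...
    + cw*h^2*(-p(il-1,jl) - p(il,jl-1) + 4*p(il,jl) - p(il+1,jl) - p(il,jl+1));
err = max(max(abs(fp(il,jl) - ref)));
fprintf('max difference to the index-vector form: %.2g\n', err);
```

`filter2` computes a correlation, so the kernels can be written exactly as the stencil reads; with `conv2` they would have to be rotated by 180 degrees first. Whether this is faster than the loop for the question's small block is untested here.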

最丧也最甜 2024-12-13 11:40:34

One possible explanation is startup overhead. If a temporary matrix is created behind the scenes, expect memory allocations. Also, I guess MATLAB cannot deduce that your matrix is small, so there will be loop overhead. Your vectorized version may therefore end up as code like this:

double* tmp = (double*)malloc(n*sizeof(double));  /* heap-allocated temporary */
for (size_t k = 0; k < n; ++k)
    {
    //  Do stuff with elements
    }
free(tmp);

Compare this to a known number of operations:

double temp[2];
temp[0]=...;
temp[1]=...;

So the JIT-compiled loop may be faster when the malloc-loopcounter-free time is long compared to the workload for each computation.
