Can the following loop be vectorized?

Posted on 2024-08-22 15:28:44

I have a for-loop which performs the following function:

Take an M-by-8 matrix and:

  1. Split it into blocks of 512 elements each (X rows by 8 columns, with X * 8 == 512; the number of elements per block can be 128, 256, 512, 1024, or 2048).
  2. Reshape each block into a 1-by-512 (number of elements) row vector.
  3. Take the last 1/4 of that row vector and put it in front,
    e.g. Data = [Data(1,385:512),Data(1,1:384)];

The following is my code:

for i = 1 : NumOfBlock  
    if i == 1  
        Header = tempHeader(1:RowNeeded,:);  
        Header = reshape(Header,1,BlockSize); %BS  
        Header = [Header(1,385:512),Header(1,1:384)]; %CP  
        Data = tempData(1:RowNeeded,:);  
        Data = reshape(Data,1,BlockSize); %BS  
        Data = [Data(1,385:512),Data(1,1:384)]; %CP  
        start = RowNeeded + 1;  
        end1 = RowNeeded * 2;  
    else  
        temp = tempData(start:end1,:);  
        temp = reshape(temp,1,BlockSize); %BS  
        temp = [temp(1,385:512),temp(1,1:384)]; %CP  
        Data = [Data, temp];  
    end

    if i <= 127 && i > 1
        temp = tempHeader(start:end1,:);
        temp = reshape(temp,1,BlockSize); %BS
        temp = [temp(1,385:512),temp(1,1:384)]; %CP
        Header = [Header, temp];
    end

    start = end1 + 1;
    end1=end1 + RowNeeded;  
end

Running this loop on 5 million elements takes more than an hour. I need it to be as fast as possible (on the order of seconds). Can this loop be vectorized?

Comments (4)

一紙繁鸢 2024-08-29 15:28:44

Based on your function description, here's what I came up with:

M = 320;           %# M must be divisible by (numberOfElements/8)
A = rand(M,8);     %# input matrix

num = 512;         %# numberOfElements
rows = num/8;      %# rows needed

%# equivalent to taking the last 1/4 and putting it in front
A = [A(:,7:8) A(:,1:6)];

%# break the matrix in blocks of size (x-by-8==512) into the third dimension
B = permute(reshape(A',[8 rows M/rows]),[2 1 3]);

%# linearize everything
B = B(:);

This diagram might help in understanding the above:

[diagram]
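
As a sanity check, the vectorized version can be compared against a block-by-block loop on a small input. The sketch below is only an illustration: the variable names ref, blk, A2 and same are introduced here, and a deterministic input replaces rand so the comparison is repeatable.

```matlab
%# Sanity check on a small input (names ref, blk, A2, same are illustrative)
M = 320;  num = 512;  rows = num/8;        %# 5 blocks of 64-by-8
A = reshape(1:M*8, M, 8);                  %# deterministic input instead of rand

%# reference: block-by-block loop as described in the question,
%# writing into a preallocated row vector
ref = zeros(1, M*8);
for b = 1:M/rows
    blk = reshape(A((b-1)*rows+1 : b*rows, :), 1, num);
    ref((b-1)*num+1 : b*num) = [blk(3*num/4+1:num), blk(1:3*num/4)];
end

%# vectorized version from the answer
A2 = [A(:,7:8) A(:,1:6)];
B = permute(reshape(A2', [8 rows M/rows]), [2 1 3]);
B = B(:);

same = isequal(B(:).', ref);               %# compare as row vectors
disp(same)
```

Moving columns 7:8 to the front works because reshape reads each 64-by-8 block column by column, so the last quarter of the flattened block is exactly its last two columns.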

静谧 2024-08-29 15:28:44

Vectorizing may or may not help. What will help is knowing where the bottleneck is. Use the profiler as outlined here:

http://blogs.mathworks.com/videos/2006/10/19/profiler-to-find-code-bottlenecks/
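
A minimal sketch of a profiling session, assuming MATLAB's built-in profile command. The workload here is a stand-in, not the asker's data: it grows an array on every iteration in the same way as Data = [Data, temp], which is one plausible candidate for the bottleneck.

```matlab
profile on                       %# start collecting timing data
out = [];
for k = 1:2000
    out = [out, rand(1, 512)];   %# array grows on every iteration
end
profile off                      %# stop collecting
profile viewer                   %# open the report and look at the per-line times
```

If the concatenation line dominates the report, preallocating the output and writing into index ranges is usually the first thing to try.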

再可℃爱ぅ一点好了 2024-08-29 15:28:44

It would be nice if you told us what you are trying to do (my guess is some simulation in dynamical systems, but it's hard to tell).

Yes, of course it can be vectorized: each of your blocks is actually four sub-blocks; using your (extremely non-standard) indices:

1...128, 129...256, 257...384, 385...512

Every kernel/thread/whatever-you-call-it of the vectorization should do the following:

i = threadIdx, between 0 and 127
temp = data[1 + i]
data[1 + i] = data[385+i]
data[385 + i] = data[257+i]
data[257 + i] = data[129+i]
data[129 + i] = temp

You should of course also parallelize on blocks, not only vectorize.
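
The four-way swap above can be sketched in MATLAB by letting an index vector play the role of all threads at once. This is an illustration only: n, q, idx, expected and shifted are names introduced here, and circshift is shown as one built-in that appears to give the same rotation.

```matlab
n = 512;  q = n/4;                         %# block length and quarter length
data = 1:n;                                %# one block as a row vector
expected = [data(3*q+1:n), data(1:3*q)];   %# rotation as written in the question

%# the four-way swap from the pseudocode, for all i = 0..q-1 at once
idx = 1:q;
temp            = data(idx);
data(idx)       = data(3*q + idx);
data(3*q + idx) = data(2*q + idx);
data(2*q + idx) = data(q + idx);
data(q + idx)   = temp;

%# the same rotation with a built-in: shift right by q along dimension 2
shifted = circshift(1:n, [0 q]);

disp(isequal(data, expected) && isequal(shifted, expected))
```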

小霸王臭丫头 2024-08-29 15:28:44

Once again, I would like to thank Amro for giving me an idea of how to solve my problem. Sorry for not making myself clear in the question.

Here is my solution:

%# BS CDMA, block size 128, 512, 1024, 2048
  BlockSize = 512;
  RowNeeded = BlockSize / 8;
  TotalRows = size(tempData, 1);
  NumOfBlock = TotalRows / RowNeeded;
  CPSize = BlockSize / 4;

%# split into blocks
  Header = reshape(tempHeader',[RowNeeded,8, 128]);
  Data = reshape(tempData',[RowNeeded,8, NumOfBlock]);
  clear tempData tempHeader;

%# block spread & cyclic prefix
    K = zeros([1,BlockSize,128],'single');
    L = zeros([1,BlockSize,NumOfBlock],'single');
    for i = 1:NumOfBlock
        if i <= 128
            %# flatten block i into a temporary row, then rotate that row;
            %# linear indexing into K itself would always read block 1
            row = reshape(Header(:,:,i),[1,BlockSize]);
            K(:,:,i) = [row((CPSize*3)+1:BlockSize),row(1:CPSize*3)];
        end
        row = reshape(Data(:,:,i),[1,BlockSize]);
        L(:,:,i) = [row((CPSize*3)+1:BlockSize),row(1:CPSize*3)];
    end
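
The remaining loop could probably be dropped as well. The sketch below is an assumption-laden illustration, not the poster's code: it uses a small random tempData, introduces the names Lvec and Lref, and relies on circshift to rotate every block by CPSize along the second dimension. It compares the result with a per-block rotation of each flattened block.

```matlab
%# hypothetical small input so the sketch runs on its own
BlockSize = 512;  RowNeeded = BlockSize/8;  CPSize = BlockSize/4;
NumOfBlock = 4;
tempData = rand(NumOfBlock*RowNeeded, 8);

Data = reshape(tempData', [RowNeeded, 8, NumOfBlock]);

%# loop-free: flatten each block to a row, then rotate every row by CPSize
Lvec = single(reshape(Data, [1, BlockSize, NumOfBlock]));
Lvec = circshift(Lvec, [0 CPSize 0]);

%# per-block reference for comparison
Lref = zeros([1, BlockSize, NumOfBlock], 'single');
for i = 1:NumOfBlock
    row = reshape(Data(:,:,i), [1, BlockSize]);
    Lref(:,:,i) = [row(CPSize*3+1:BlockSize), row(1:CPSize*3)];
end

disp(isequal(Lvec, Lref))
```

The same two lines should apply to Header with 128 in place of NumOfBlock.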