Parallelize or vectorize an all-against-all operation on a large number of matrices?
I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.
In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so
list = who('data_matrix_prefix*');
H = cell(numel(list), numel(list));
for i = 1:numel(list)
    for j = 1:numel(list)
        if i ~= j
            % compare() is the conv2-based comparison from the linked question
            eval(['H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
        end
    end
end
is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s, 870 calls in ~2.5 s).
However, operating on all the data requires almost 25 million calls.
I have also tried using deal() to build a cell array in which every cell holds the same element of data, so that I could use cellfun() over all pairs in a single loop:
% who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
nextData = cell(k,1);
for i = 1:k
    [nextData{:}] = deal(data{i});   % every cell of nextData holds data{i}
    H(:,i) = cellfun(@compare, data, nextData, 'UniformOutput', false);   % paren-indexing assigns the whole column
end
Unfortunately, this is not really any faster, because virtually all of the time is spent inside compare(). Both of these code examples seem ill-suited for parallelization; I can't figure out how to make my variables sliced in the way parfor requires.
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
Does anyone see an (explicitly) parallelized solution, or a better vectorization of the problem?
Note
I realize both my examples are inefficient - the first would be twice as fast if it calculated a triangular cell array, and the second is still calculating the self comparisons, as well. But the time savings for a good parallelization are more like a factor of 16 (or 72 if I install MATLAB on everyone's machines).
Aside
There is also a memory issue. I used a couple of evals to append each column of H into a file, with names like H1, H2, etc. and then clear Hi. Unfortunately, the saves are very slow...
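A sketch of one eval-free alternative for the saves (the variable name colH and the file-name pattern are illustrative, not from the actual code): write each column of H to its own .mat file with save() and a generated file name, then release that column.

% Sketch: persist one column of H at a time without eval.
% 'colH' and the file-name pattern are illustrative names.
for i = 1:k
    colH = H(:,i);                                % k-by-1 cell, one column of results
    save(sprintf('H_col_%04d.mat', i), 'colH');   % one small .mat file per column
    H(:,i) = cell(k, 1);                          % release the memory held by that column
end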
Comments (3)
Does compare(A,B) give the same result as compare(B,A), and is compare(A,A) a known, trivial value? If so, change your inner loop from

for j = 1:numel(list)

to something like

for j = i:numel(list)

and deal with the symmetry and identity case. This will cut your calculation time by half.
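A minimal sketch of that triangular variant, assuming compare() really is symmetric and that the matrices sit in a 1-D cell array data as in the question's second example (the mirroring step and the handling of the diagonal are assumptions):

% Sketch of the triangular loop; assumes compare(A,B) == compare(B,A).
n = numel(data);
H = cell(n, n);
for i = 1:n
    for j = i+1:n                       % upper triangle only
        H{i,j} = compare(data{i}, data{j});
        H{j,i} = H{i,j};                % fill the lower triangle by symmetry, if needed
    end
    % H{i,i} (the identity case) can be set to a known constant if it is needed at all.
end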
The second example can easily be sliced for use with the Parallel Computing Toolbox, which distributes iterations of your code among up to 8 local workers. If you want to run the code on a cluster, you also need MATLAB Distributed Computing Server on the cluster side.
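A hedged sketch of what that slicing could look like with a parfor loop over the columns of H (the temporary column variable and the pool setup are assumptions; older releases use matlabpool instead of parpool):

% Sketch: distribute the comparisons over workers, one column of H per iteration.
% Assumes the k matrices are in the 1-D cell array 'data', as in the question.
% parpool;                              % open a pool first (matlabpool on older releases)
k = numel(data);
H = cell(k, k);
parfor i = 1:k
    Hcol = cell(k, 1);                  % temporary column, local to each worker
    for j = 1:k
        if i ~= j
            Hcol{j} = compare(data{j}, data{i});
        end
    end
    H(:,i) = Hcol;                      % H is a sliced output variable
end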
If I understand correctly, you have to perform on the order of 5000^2 matrix comparisons? Rather than trying to parallelise the compare function itself, perhaps you should think of your problem as being composed of 5000^2 independent tasks? The MATLAB Parallel Computing Toolbox supports task-based parallelism. Unfortunately, my experience with PCT is with parallelisation of large linear-algebra-type problems, so I can't really tell you much more than that. The documentation will undoubtedly help you more.
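For what it's worth, a rough sketch of that task-based route using the job/task API of the Parallel Computing Toolbox; compareColumn is a hypothetical helper, not part of the toolbox, and batching one task per column is an assumption to keep the task count at k rather than k^2:

% Sketch: one task per column of H, using the job/task API.
c   = parcluster();                     % local profile, or a cluster profile
job = createJob(c);
k   = numel(data);
for i = 1:k
    createTask(job, @compareColumn, 1, {data, i});
end
submit(job);
wait(job);
out = fetchOutputs(job);                % k-by-1 cell; out{i} is column i of H
H   = [out{:}];                         % assemble the full k-by-k cell array

% Hypothetical helper (in its own file, compareColumn.m); compare() and this file
% must be visible to the workers. Note that 'data' is serialized with every task
% here, which for 5,000 matrices you would probably want to avoid.
function col = compareColumn(data, i)
k   = numel(data);
col = cell(k, 1);
for j = 1:k
    if j ~= i
        col{j} = compare(data{j}, data{i});
    end
end
end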