How do I find opportunities for parallelism?

Posted 2024-10-31 06:18:28

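I have some serial code that I have started to parallelize using Intel's TBB. My first aim was to parallelize almost all the for loops in the code (I have even parallelized a for loop nested inside another for loop), and having done that I now get some speedup. I am looking for more places, ideas, or options to parallelize. I know this might sound a bit vague without much detail about the problem, but I am looking for generic ideas that I can explore in my code.

Overview of the algorithm (it is run over all levels of the image, starting with the smallest and increasing the width and height by 2 each time until the actual height and width are reached):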

For all image pairs starting with the smallest pair
    For height = 2 to image_height - 2
        Create a 5 by image_width ROI of both left and right images.
        For width = 2 to image_width - 2
            Create a 5 by 5 window of the left ROI centered around width and find best match in the right ROI using NCC
            Create a 5 by 5 window of the right ROI centered around width and find best match in the left ROI using NCC
            Disparity = current_width - best match
    The edge pixels that did not receive a disparity get the disparity of their neighbors
    For height = 0 to image_height
        For width = 0 to image_width
            Check smoothness, uniqueness and order constraints (parallelized separately)
    For height = 0 to image_height
        For width = 0 to image_width
            For disparity that failed constraints, use the average disparity of
            neighbors that passed the constraints
    Normalize all disparity and output to screen
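
For readers unfamiliar with the matching step, here is a minimal sketch of what "find best match using NCC" could look like for one 5 by 5 window. It assumes the zero-mean form of normalized cross-correlation and a simple row-major float image; the `Image` type and the names `ncc5x5` and `bestMatchX` are hypothetical and not taken from the original code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major grayscale image type (not from the original post).
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> pixels;  // width * height values

    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {}

    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
    float& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Zero-mean NCC between the 5x5 window centered at (ax, y) in `a` and the
// 5x5 window centered at (bx, y) in `b`. Returns a score in [-1, 1], or 0
// when either window is flat (zero variance).
float ncc5x5(const Image& a, int ax, const Image& b, int bx, int y) {
    assert(ax >= 2 && ax < a.width - 2 && bx >= 2 && bx < b.width - 2);
    assert(y >= 2 && y < a.height - 2 && y < b.height - 2);

    float meanA = 0.0f, meanB = 0.0f;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            meanA += a.at(ax + dx, y + dy);
            meanB += b.at(bx + dx, y + dy);
        }
    }
    meanA /= 25.0f;
    meanB /= 25.0f;

    float num = 0.0f, varA = 0.0f, varB = 0.0f;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            const float da = a.at(ax + dx, y + dy) - meanA;
            const float db = b.at(bx + dx, y + dy) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
    }
    const float denom = std::sqrt(varA * varB);
    return denom > 0.0f ? num / denom : 0.0f;
}

// Scans the same row of `right` and returns the column whose 5x5 window
// best matches the window centered at (lx, y) in `left`.
int bestMatchX(const Image& left, int lx, const Image& right, int y) {
    int best = lx;
    float bestScore = -2.0f;  // below the minimum possible NCC score
    for (int rx = 2; rx < right.width - 2; ++rx) {
        const float score = ncc5x5(left, lx, right, rx, y);
        if (score > bestScore) {
            bestScore = score;
            best = rx;
        }
    }
    return best;
}
```

The disparity for a pixel is then `lx - bestMatchX(left, lx, right, y)`. Each call reads the images and writes nothing shared, which is why the row and column loops in the overview can be run independently of one another.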

野稚 2024-11-07 06:18:28

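Just for some perspective: it might not always be worthwhile to parallelize something.

Just because you have a for loop in which every iteration can be done independently of the others doesn't always mean you should parallelize it.

TBB has some overhead for starting those parallel_for loops, so unless you are looping a large number of times, you probably shouldn't parallelize it.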
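
To make the overhead point concrete, here is a minimal sketch of a loop helper that only goes parallel when the range is large enough. It uses plain `std::thread` from the C++ standard library rather than TBB, so that it stays self-contained. The name `forEachIndex` and the cutoff `kMinParallelIterations` are hypothetical, and the cutoff value is a placeholder that would have to be measured for the actual workload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical cutoff: below this many iterations, thread startup cost is
// assumed to outweigh the benefit. The right value must be measured.
constexpr std::size_t kMinParallelIterations = 10000;

// Calls body(i) for every i in [begin, end). Small ranges run serially;
// large ranges are split into one contiguous chunk per hardware thread.
void forEachIndex(std::size_t begin, std::size_t end,
                  const std::function<void(std::size_t)>& body) {
    assert(begin <= end);
    const std::size_t count = end - begin;

    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;  // value may be unknown on some platforms

    if (count < kMinParallelIterations || workers == 1) {
        for (std::size_t i = begin; i < end; ++i) body(i);
        return;
    }

    const std::size_t chunk = (count + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = begin + w * chunk;
        if (lo >= end) break;
        const std::size_t hi = std::min(end, lo + chunk);
        threads.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (std::thread& t : threads) t.join();
}
```

In TBB itself, the grain size argument of `tbb::blocked_range` plays a similar role: it sets a lower bound on how finely a range is split, so very small pieces of work are not handed out as separate tasks.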

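But if each iteration is extremely expensive (as in CirrusFlyer's example), then feel free to parallelize it.

More specifically, look for cases where the overhead of running the computation in parallel is small relative to the amount of work being parallelized.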

Also, be careful with nested parallel_for loops, as they can get expensive. You may want to stick with parallelizing only the outer for loop.
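
As a rough illustration of the "outer loop only" advice applied to an image, the sketch below splits the rows across threads and leaves the per-row column loop serial. It again uses `std::thread` instead of TBB to stay self-contained. `processImage` and `perPixel` are hypothetical names, and `perPixel` is only a stand-in for real per-pixel work such as a constraint check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-pixel work; a placeholder for the real computation.
inline int perPixel(int x, int y) { return x + y; }

// Fills a width * height buffer. Rows (the outer loop) are divided into
// contiguous bands, one band per thread; columns (the inner loop) stay serial.
std::vector<int> processImage(int width, int height) {
    assert(width >= 0 && height >= 0);
    std::vector<int> out(static_cast<std::size_t>(width) * height, 0);
    if (out.empty()) return out;

    int workers = static_cast<int>(std::thread::hardware_concurrency());
    workers = std::max(1, std::min(workers, height));  // never more threads than rows
    const int rowsPerWorker = (height + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (int y0 = 0; y0 < height; y0 += rowsPerWorker) {
        const int y1 = std::min(height, y0 + rowsPerWorker);
        pool.emplace_back([y0, y1, width, &out] {
            for (int y = y0; y < y1; ++y) {          // this thread's band of rows
                for (int x = 0; x < width; ++x) {    // plain serial inner loop
                    out[static_cast<std::size_t>(y) * width + x] = perPixel(x, y);
                }
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return out;
}
```

Each thread writes to a disjoint set of rows, so no locking is needed. With TBB, the equivalent shape would be a single parallel_for over the rows with an ordinary for loop over the columns inside its body.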


等你爱我 2024-11-07 06:18:28

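The silly answer is: anything that is time-consuming or iterative. I use the Task Parallel Library (TPL) in Microsoft's .NET 4.0, and one of the interesting things about it is its "expressed parallelism", an interesting term for what amounts to "attempted parallelism". Your code may say "use the TPL here", but if the host platform doesn't have the necessary cores, it will simply run the old-fashioned serial code instead.

I have started using the TPL in all my projects, especially anywhere there are loops (which requires me to design my classes and methods so that there are no dependencies between loop iterations). But for anything that might once have been plain old-fashioned multithreaded code, I now look at whether I can place it on different cores.

My favorite so far is an application that downloads about 7,800 different URLs to analyze the contents of the pages and, if it finds the information it is looking for, does some additional processing. This used to take 26 to 29 minutes to complete. My Dell T7500 workstation, with dual quad-core 3 GHz Xeon processors, 24 GB of RAM, and 64-bit Windows 7 Ultimate, now completes the whole thing in about 5 minutes. That is a huge difference for me.

I also have a publish/subscribe communication engine that I have been refactoring to take advantage of the TPL, particularly for "pushing" data from the server to clients: you may have 10,000 client machines that have registered interest in specific things, and once such an event occurs I need to push the data to all of them. I haven't finished this yet, but I'm really looking forward to seeing the results.

Food for thought...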
