Is OpenMP suitable for parallelizing a block of code that runs many times per second?

Posted 2024-08-24 08:59:34

Say you have a typical game loop, running about 30 times a second. One particular function takes about 50% of the time and looks like a prime candidate for parallelization: say it's a big loop, or there are 4 distinct and independent strands of work going on. Assume we have already checked that the function itself parallelizes well in isolation to 2-4 cores.

Is OpenMP likely to give a speedup in such a case? I'd expect that naively creating 1-3 threads each frame to split the work would not be great, but I don't really know how much overhead thread creation/destruction brings, whether it's 10 ms or 100 ms. And I don't know if OpenMP is efficient at this kind of thing, or is only really suited to longer-running pieces of code.
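
To make the setup concrete, here is a minimal sketch of it in C. The names `update_items`, `run_frames`, `positions`, `velocities` and `N_ITEMS` are hypothetical and only for illustration; the relevant point is that the `#pragma omp parallel for` region is entered once per frame, so whatever it costs to start that region is paid about 30 times a second.

```c
/* Hypothetical size of the per-frame workload (the "big loop"). */
#define N_ITEMS 100000

static double positions[N_ITEMS];
static double velocities[N_ITEMS];

/* The expensive function: every iteration is independent, so the
   iterations can be split across the OpenMP thread team. */
void update_items(double dt)
{
    #pragma omp parallel for
    for (int i = 0; i < N_ITEMS; ++i) {
        positions[i] += velocities[i] * dt;
    }
}

/* Hypothetical game loop: the parallel region above is entered once
   per frame, i.e. roughly 30 times a second. */
void run_frames(int n_frames, double dt)
{
    for (int frame = 0; frame < n_frames; ++frame) {
        /* ... serial input handling and game logic ... */
        update_items(dt);
        /* ... serial rendering ... */
    }
}
```

Built without OpenMP support (for example without `-fopenmp` on GCC or Clang), the pragma is ignored and the loop simply runs serially, which gives a convenient baseline for comparison.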

Thoughts?

机场等船 2024-08-31 08:59:35

Many OpenMP implementations start up a team of threads at program startup and only shut it down at finalisation, i.e. they don't do a lot of thread destruction/construction during execution. However, I think this is implementation dependent, so you need to check your situation and documentation carefully.

Don't argue from first principles on this issue: test!
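
Following that advice, a minimal sketch of such a test: it enters and leaves a parallel region that does almost no work many times, so the average it reports is roughly the fixed cost a once-per-frame parallel region would pay. The function names are hypothetical; `omp_get_wtime()` is the standard OpenMP timer, and the C11 `timespec_get()` fallback only exists so the file also builds without OpenMP.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds: omp_get_wtime() when built with OpenMP,
   otherwise C11 timespec_get() as a fallback. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Enters and leaves an almost empty parallel region `repetitions` times and
   returns the average wall-clock cost of one region in microseconds.
   `*executions` receives how many times the region body ran in total
   (repetitions multiplied by the team size). */
double parallel_region_overhead_us(int repetitions, long *executions)
{
    long count = 0;
    double start = now_seconds();
    for (int r = 0; r < repetitions; ++r) {
        #pragma omp parallel
        {
            /* Minimal shared update so the region does some real work. */
            #pragma omp atomic
            count++;
        }
    }
    double elapsed = now_seconds() - start;
    *executions = count;
    return elapsed * 1e6 / repetitions;
}
```

Compare the result with the frame budget: at 30 frames per second that is about 33 ms per frame, so a per-region overhead in the microsecond range would be negligible, while one in the millisecond range would not. The figure depends on the runtime and the number of threads, so measure it on the target machine.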

EDIT: If you find that your implementation does start and stop threads during execution, you can probably wrap the whole program in an omp parallel construct and use master constructs to ensure that the single-threaded parts of the program are not parallelised. This is probably easier with an implementation of OpenMP 3.0 than with one of the earlier specifications.
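
A minimal sketch of the approach in the EDIT, assuming a frame loop that alternates a serial step with a parallelizable loop; `run_game`, `serial_step`, `frame_data` and `N_ITEMS` are hypothetical names. The team is created once by the outer `#pragma omp parallel`, every thread runs the frame loop, and only the master thread executes the serial part.

```c
/* Hypothetical size of the parallel part of each frame. */
#define N_ITEMS 1024

/* Hypothetical per-frame state: element 0 is written by the serial step,
   the rest by the parallel loop. */
static double frame_data[N_ITEMS];

/* Hypothetical serial step (input handling, game logic, ...). */
static void serial_step(int frame)
{
    frame_data[0] = (double)frame;
}

void run_game(int n_frames)
{
    /* The thread team is created once, for the whole game loop. */
    #pragma omp parallel
    {
        /* Every thread runs this loop; `frame` is private to each thread. */
        for (int frame = 0; frame < n_frames; ++frame) {
            /* Only the master thread runs the single-threaded part. */
            #pragma omp master
            serial_step(frame);

            /* `master` has no implied barrier: wait here until the serial
               step has finished before anyone reads frame_data[0]. */
            #pragma omp barrier

            /* Work-sharing loop: iterations are divided among the team,
               and its implicit barrier keeps all threads on the same frame. */
            #pragma omp for
            for (int i = 1; i < N_ITEMS; ++i) {
                frame_data[i] = frame_data[0] + i;
            }
        }
    }
}
```

Because `master` has no implied barrier, the explicit `#pragma omp barrier` is needed before the other threads read what the serial step wrote. `#pragma omp single` would provide that barrier implicitly, at the cost of letting any thread (not necessarily the master) run the serial code. OpenMP 5.1 deprecates `master` in favor of the equivalent `masked` construct.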

慈悲佛祖 2024-08-31 08:59:35

Creating and destroying threads every 1/30th of a second is probably not going to perform well. People will tell you to profile, but others with significant multithreading experience will tell you to reduce the number of system calls. In this case, it would be easier to create those threads once and find a way for them to execute requests from the main thread.

If that is all you are doing, you can probably just use #pragma omp task and #pragma omp taskwait.
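
A minimal sketch of the task-based version, assuming the per-frame work splits into four independent strands; `strand`, `results`, `N_STRANDS` and `run_frames_with_tasks` are hypothetical names. The parallel region, and with it the thread team, is created once; inside it a single thread runs the game loop and hands each frame's strands to the team as tasks, and `#pragma omp taskwait` waits for them before the frame continues.

```c
/* Hypothetical number of independent strands of per-frame work. */
#define N_STRANDS 4

/* Hypothetical per-strand results, one slot per strand. */
static long results[N_STRANDS];

/* Hypothetical strand of work: sums the integers 0 .. n-1. */
static void strand(int id, long n)
{
    long sum = 0;
    for (long k = 0; k < n; ++k) {
        sum += k;
    }
    results[id] = sum;   /* each strand writes only its own slot */
}

void run_frames_with_tasks(int n_frames, long work_per_strand)
{
    /* Create the thread team once; one thread then drives the game loop. */
    #pragma omp parallel
    #pragma omp single
    {
        for (int frame = 0; frame < n_frames; ++frame) {
            /* Hand this frame's independent strands to the team as tasks. */
            for (int id = 0; id < N_STRANDS; ++id) {
                #pragma omp task firstprivate(id)
                strand(id, work_per_strand);
            }
            /* Wait for all of this frame's strands before continuing. */
            #pragma omp taskwait
            /* ... serial per-frame work (rendering etc.) goes here ... */
        }
    }
}
```

The other threads of the team wait at the implicit barrier at the end of the `single` construct and execute the queued tasks while they wait, so no threads are created or destroyed per frame.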

删除会话 2024-08-31 08:59:35

Not much. MP = message passing. Those algorithms are optimized for highly parallel cluster systems (2000 computers working on the same thing), not for "small fragments, many times per second, in one process". Naturally, this only works efficiently if the problem requires significant computation.

Examples:

  • 3D rendering for movies, where a machine may take several minutes to calculate one frame and you need many tens of thousands of frames calculated.