How to build a C++ application that uses a multi-core processor
I am building an application that will do some object tracking from a video camera feed and use information from that to run a particle system in OpenGL. The code to process the video feed is somewhat slow, 200 - 300 milliseconds per frame right now. The system that this will be running on has a dual-core processor. To maximize performance I want to offload the camera processing to one core and just communicate relevant data back to the main application as it becomes available, while leaving the main application running on the other core.
What do I need to do to offload the camera work to the other processor and how do I handle communication with the main application?
Edit:
I am running Windows 7 64-bit.
Comments (5)
Basically, you need to multithread your application. Each thread of execution can only saturate one core. Separate threads tend to be run on separate cores. If you are insistent that each thread ALWAYS execute on a specific core, then each operating system has its own way of specifying this (affinity masks & such)... but I wouldn't recommend it.
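As a rough illustration of moving work onto a second thread, here is a minimal sketch. It assumes a C++11 compiler with std::thread (Boost.Thread offers a very similar interface); process_frame and run_on_worker are hypothetical names standing in for the real camera code. The operating system decides which core each thread runs on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the slow camera-processing step.
int process_frame(int frame_id) { return frame_id * 2; }

// Runs process_frame for frames 0..count-1 on a worker thread while the
// calling thread stays free, then returns the sum of the results.
int run_on_worker(int count) {
    assert(count >= 0);
    std::atomic<int> sum{0};
    std::thread worker([&sum, count] {
        for (int i = 0; i < count; ++i) sum += process_frame(i);
    });
    // The main thread could keep rendering here while the worker runs.
    worker.join();  // wait for the worker before reading the final value
    return sum.load();
}
```

Calling run_on_worker(4) processes four frames on the worker thread and returns once they are done.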
OpenMP is great, but it's a bit heavyweight, especially when joining back up after a parallel region. YMMV. It's easy to use, but not at all the best-performing option. It also requires compiler support.
If you're on Mac OS X 10.6 (Snow Leopard), you can use Grand Central Dispatch. It's interesting to read about, even if you don't use it, as its design implements some best practices. It also isn't optimal, but it's better than OpenMP, even though it also requires compiler support.
If you can wrap your head around breaking up your application into "tasks" or "jobs," you can shove these jobs down as many pipes as you have cores. Think of batching your processing as atomic units of work. If you can segment it properly, you can run your camera processing on both cores, and your main thread at the same time.
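One way to picture the "jobs" idea is to cut a frame into one contiguous chunk per core and process the chunks concurrently. The sketch below assumes C++11 (std::async) and an 8-bit grayscale frame stored in a std::vector; invert and process_frame_in_jobs are hypothetical names, and the per-pixel work is only a placeholder for the real tracking code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical per-pixel work: invert an 8-bit grayscale value.
inline unsigned char invert(unsigned char p) {
    return static_cast<unsigned char>(255 - p);
}

// Splits the frame into one contiguous job per core and runs the jobs concurrently.
void process_frame_in_jobs(std::vector<unsigned char>& frame,
                           unsigned jobs = std::thread::hardware_concurrency()) {
    if (jobs == 0) jobs = 2;  // hardware_concurrency() may return 0 when unknown
    const std::size_t chunk = (frame.size() + jobs - 1) / jobs;
    assert(chunk > 0 || frame.empty());  // a non-empty frame always gets chunks of at least 1
    std::vector<std::future<void>> pending;
    for (std::size_t begin = 0; begin < frame.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, frame.size());
        // Each job touches only its own [begin, end) range, so no locking is needed.
        pending.push_back(std::async(std::launch::async, [&frame, begin, end] {
            for (std::size_t i = begin; i < end; ++i) frame[i] = invert(frame[i]);
        }));
    }
    for (auto& job : pending) job.get();  // wait for every job to finish
}
```

Because the jobs never share data, the only synchronization is the final wait, which is what keeps this kind of coarse split simple.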
If communication is minimized for each unit of work, then your need for mutexes and other locking primitives will be minimized. Coarse-grained threading is much easier than fine-grained. And you can always use a library or framework to ease the burden. Consider Boost's Thread library if you take the manual approach. It provides portable wrappers and a nice abstraction.
It depends on how many cores you have. If you have only 2 cores (CPUs, processors, hyperthreads, you know what I mean), then OpenMP cannot give a tremendous increase in performance, but it will help. The maximum gain you can get is to divide your time by the number of processors, so it will still take 100 - 150 ms per frame.
The equation is
parallel time = (([total time to perform a task] - [code that cannot be parallelized]) / [number of cpus]) + [code that cannot be parallelized]
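To make the equation concrete, here is a small sketch that evaluates it. The function name parallel_time_ms and the 60 ms serial portion used in the note below are illustrative assumptions, not measurements from the question.

```cpp
#include <cassert>

// parallel time = ((total - serial) / cpus) + serial, all times in milliseconds.
double parallel_time_ms(double total_ms, double serial_ms, int cpus) {
    assert(cpus > 0);
    return (total_ms - serial_ms) / cpus + serial_ms;
}
```

For a 300 ms frame on 2 cores, a fully parallelizable workload gives parallel_time_ms(300, 0, 2), which is 150 ms. If 60 ms of it cannot be parallelized, the result is (240 / 2) + 60 = 180 ms.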
Basically, OpenMP rocks at parallel loop processing. It's rather easy to use: put a #pragma omp parallel for directive in front of a for loop and bang, your for is parallelized. It does not work for every case; not every algorithm can be parallelized this way, but many can be rewritten (hacked) to be compatible. The key principle is data parallelism, in the spirit of Single Instruction, Multiple Data (SIMD): applying the same convolution code to multiple pixels, for example.
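A parallelized loop of this kind might look like the sketch below. It assumes the compiler has OpenMP enabled (for example /openmp on MSVC or -fopenmp on GCC); without that switch the pragma is ignored and the loop simply runs serially with the same result. The brighten function is a hypothetical stand-in for real per-pixel work.

```cpp
#include <cassert>
#include <vector>

// Brightens every pixel by `amount`, clamping the result to the 0..255 range.
void brighten(std::vector<unsigned char>& pixels, int amount) {
    assert(amount >= -255 && amount <= 255);  // delta must fit the 8-bit range
    const long n = static_cast<long>(pixels.size());
    // Iterations are independent, so OpenMP may split them across threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        int v = pixels[i] + amount;
        if (v > 255) v = 255;
        if (v < 0) v = 0;
        pixels[i] = static_cast<unsigned char>(v);
    }
}
```

The loop index is a signed integer because older OpenMP implementations require that for parallel loops.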
But simply applying this cookbook recipe goes against the rules of optimization.
1-Benchmark your code
2-Find the REAL bottlenecks with "scientific" evidence (numbers) instead of simply guessing where you think there is a bottleneck
3-If it is really processing loops, then OpenMP is for you
Maybe simple optimizations on your existing code can give better results, who knows?
Another road would be to run OpenGL in one thread and the data processing in another. This will help a lot if OpenGL or your particle rendering system takes a lot of CPU time, but remember that threading can lead to other kinds of synchronization bottlenecks.
I would recommend against OpenMP. OpenMP is more for numerical codes than for the producer/consumer model that you seem to have.
I think you can do something simple using Boost threads: spawn a worker thread, share a common segment of memory (for communicating the acquired data), and use some notification mechanism to signal that your data is available (look into Boost thread interrupts).
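A worker thread, a shared piece of memory, and a notification mechanism could be sketched as follows. This version swaps in the C++11 standard library (std::thread, std::mutex, std::condition_variable) as the notification mechanism rather than Boost thread interrupts; Boost.Thread provides closely matching classes. TrackingResult, ResultQueue, and consume_frames are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical result of processing one camera frame.
struct TrackingResult {
    int frame_id;
    float x;
    float y;
};

// Shared memory plus notification: a mutex-protected queue and a condition variable.
class ResultQueue {
public:
    // Called by the camera thread when a frame has been processed.
    void push(const TrackingResult& result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(result);
        }
        ready_.notify_one();
    }

    // Blocks the caller until a result is available, then returns it.
    TrackingResult wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        TrackingResult result = queue_.front();
        queue_.pop();
        return result;
    }

    // Non-blocking variant for a render loop: returns false if nothing is ready.
    bool try_pop(TrackingResult& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<TrackingResult> queue_;
};

// Demo: a camera thread produces `count` results; the caller consumes them
// in order and returns the id of the last frame received.
int consume_frames(int count) {
    assert(count > 0);
    ResultQueue results;
    std::thread camera([&results, count] {
        for (int i = 0; i < count; ++i) {
            TrackingResult r = {i, static_cast<float>(i), static_cast<float>(i)};
            results.push(r);
        }
    });
    int last_frame = -1;
    for (int i = 0; i < count; ++i) last_frame = results.wait_and_pop().frame_id;
    camera.join();
    return last_frame;
}
```

In a render loop, the main thread would more likely call try_pop once per frame so that drawing never stalls while the camera thread is still working.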
I do not know what kind of processing you do, but you may want to take a look at Intel Threading Building Blocks (TBB) and Intel Integrated Performance Primitives (IPP). IPP has several functions for video processing which may be faster (assuming it covers the functionality you need).
You need some kind of framework for handling multiple cores. OpenMP seems a fairly simple choice.
Like Pestilence said, you just need your app to be multithreaded. Lots of frameworks like OpenMP have been mentioned, so here's another one:
Intel Threading Building Blocks
I've never used it before, but I hear great things about it.
Hope this helps!