Improving image processing speed

Posted on 2024-12-29 09:59:03

I am using C++ and OpenCV to process images from a webcam in real time, and I want to get the best speed I can from my system.

Other than changing the processing algorithm (assume, for now, that it can't be changed), is there anything I should be doing to maximize the processing speed?

I am thinking multithreading might help here, but I'm ashamed to say I don't really know the ins and outs (I have used multithreading before, but not in C++).

Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up? Or would the overhead of managing those threads cancel out the gain, given that I am aiming for a throughput of 20 fps? (I assume the frame rate affects the answer, since it indicates how much processing each thread will do.)

Would multithreading help here?

Are there any tips for increasing the speed of OpenCV specifically, or any pitfalls I might be falling into that reduce speed?

Thanks.

Comments (6)

╰沐子 2025-01-05 09:59:03

The easier way, I think, could be to pipeline the frame operations.

You could work with a thread pool: hand each frame's memory buffer, in sequence, to the first available thread, and release the buffer back to the pool when the algorithm step on that frame has completed.

This could leave your current (debugged :) algorithm practically unchanged, but it will require substantially more memory for buffering intermediate results.

Of course, without details about your task, it's hard to say if this is appropriate...

独﹏钓一江月 2025-01-05 09:59:03

One important way to increase speed in OpenCV is related to neither the processor nor the algorithm: avoid extra copying when dealing with matrices. Here is an example taken from the documentation:

"...by constructing a header for a part of another matrix. It can be a
single row, single column, several rows, several columns, rectangular
region in the matrix (called a minor in algebra) or a diagonal. Such
operations are also O(1), because the new header will reference the
same data. You can actually modify a part of the matrix using this
feature, e.g."

// add 5-th row, multiplied by 3 to the 3rd row
M.row(3) = M.row(3) + M.row(5)*3;

// now copy 7-th column to the 1-st column
// M.col(1) = M.col(7); // this will not work
Mat M1 = M.col(1);
M.col(7).copyTo(M1);

Maybe you already knew this, but I think it is important to highlight matrix headers in OpenCV as an efficient coding tool.

哭泣的笑容 2025-01-05 09:59:03

Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up?

Yes, although it very heavily depends on the particular algorithm being used, as well as your skill in writing threaded code to handle things like synchronization. You didn't really provide enough detail to make a better assessment than that.

Some algorithms are extremely easy to parallelize, like ones that have the form:

for (i=0; i < DATA_SIZE; i++)
{
   output[i] = f(input[i]);
}

for some function f. These are known as embarrassingly parallel; you can simply split the data into N blocks and have each of N threads process one block independently. Libraries like OpenMP make this kind of threading extremely simple.
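The block-splitting idea above can be sketched with nothing but the C++ standard library. This is only an illustration under stated assumptions: `f` here is a hypothetical per-pixel operation (inverting an 8-bit value), and `parallel_apply` is a made-up helper name, not an OpenCV or OpenMP function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-pixel operation standing in for f: invert an 8-bit value.
inline std::uint8_t f(std::uint8_t v) { return static_cast<std::uint8_t>(255 - v); }

// Apply f to every element, splitting the index range into at most
// num_threads contiguous blocks, each processed by its own thread.
void parallel_apply(const std::vector<std::uint8_t>& input,
                    std::vector<std::uint8_t>& output,
                    unsigned num_threads) {
    assert(num_threads > 0);
    output.resize(input.size());
    const std::size_t n = input.size();
    const std::size_t block = (n + num_threads - 1) / num_threads;  // ceil(n / num_threads)

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * block);
        const std::size_t end = std::min(n, begin + block);
        if (begin == end) break;  // nothing left for the remaining threads
        workers.emplace_back([&input, &output, begin, end] {
            // Each thread writes only its own slice, so no locking is needed.
            for (std::size_t i = begin; i < end; ++i) output[i] = f(input[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

`std::thread::hardware_concurrency()` can serve as a starting value for `num_threads`, though it is only a hint and may return 0. With OpenMP, the same effect is usually achieved by placing `#pragma omp parallel for` in front of the original loop. Note that this sketch creates and joins threads on every call; at 20 fps it is probably worth measuring whether that cost matters for your frame size.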

笛声青案梦长安 2025-01-05 09:59:03

Unless the particular algorithm you are using is already optimized for a multithreaded/parallel platform, throwing it at an x-core processor will do nothing for you. The algorithm has to be inherently threadable to benefit from multiple threads; if it wasn't designed with that in mind, it would have to be altered. On the other hand, many image processing algorithms are "embarrassingly parallel", at least in concept. Can you share more details about the algorithm you have in mind?

夏夜暖风 2025-01-05 09:59:03

If your threads can operate on different data, it would seem reasonable to thread it off, perhaps queueing each frame object to a thread pool. You may have to add sequence numbers to the frame objects to ensure that the processed frames emerging from the pool are delivered in the same order they went in.
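One possible shape for the sequence-number idea is a small reorder buffer that holds finished frames until every earlier frame has arrived. This is a sketch only: `Frame` and `Reorderer` are hypothetical names, and the `int payload` stands in for real image data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical processed-frame record; a real one would hold image data.
struct Frame {
    std::uint64_t seq;  // sequence number assigned at capture time
    int payload;        // stand-in for the processed image
};

// Collects frames that finish out of order and releases them in capture order.
class Reorderer {
public:
    // Accept a finished frame and return every frame that is now
    // deliverable, in ascending sequence order.
    std::vector<Frame> push(Frame f) {
        assert(f.seq >= next_ && "frame was already delivered");
        const std::uint64_t seq = f.seq;
        pending_.emplace(seq, f);

        std::vector<Frame> ready;
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        return ready;
    }

private:
    std::uint64_t next_ = 0;                  // next sequence number to deliver
    std::map<std::uint64_t, Frame> pending_;  // finished frames waiting for earlier ones
};
```

The class is not thread-safe as written; it assumes a single consumer thread calls `push`, or that the caller guards it with a mutex. A slow frame also stalls delivery of everything behind it, so the buffer can grow if one frame takes much longer than the others.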

岁月打碎记忆 2025-01-05 09:59:03

As example code for multi-threaded image processing with OpenCV, you might want to check out my code:

https://github.com/vmlaker/sherlock-cpp

It's what I came up with when I wanted to take advantage of an x-core CPU to improve object detection performance. The detect program is basically a parallel algorithm that distributes tasks among multiple threads, with a separate pipelined thread for every task:

  1. Allocation of frame memory and video capture.
  2. Object detection (one thread per Haar classifier).
  3. Augmenting output with the detection results and displaying the frame.
  4. Memory deallocation.
  4. Memory deallocation.

With memory for every captured frame shared between all threads, I got great performance and CPU utilization.
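For readers new to threading in C++, the general shape of a pipelined design like this can be sketched as two stage threads connected by a blocking queue. This is not the code from the linked repository; `BlockingQueue` and `run_pipeline` are hypothetical names, the frames are plain integers, and the "processing" step is a placeholder. It assumes a C++17 compiler for `std::optional`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue used to hand frames from one stage to the next.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }
    // Signal that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Block until an item is available; return std::nullopt once the
    // queue has been closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Run a capture -> process pipeline over num_frames placeholder frames and
// return the processed results in arrival order.
std::vector<int> run_pipeline(int num_frames) {
    assert(num_frames >= 0);
    BlockingQueue<int> captured;
    std::vector<int> results;

    std::thread capture([&] {
        for (int i = 0; i < num_frames; ++i) captured.push(i);  // stand-in for grabbing a frame
        captured.close();
    });
    std::thread process([&] {
        // Only this thread touches results until both threads are joined.
        while (auto frame = captured.pop()) results.push_back(*frame * 2);  // stand-in for detection
    });

    capture.join();
    process.join();
    return results;
}
```

Because the queue is unbounded, a capture stage that runs faster than the processing stage will accumulate frames in memory; a real pipeline would likely cap the queue size or drop stale frames.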
