OpenCV: looking for a less CPU-intensive way to capture a frame, resize it, and copy it into a buffer: how can I optimize my code?

Posted on 2024-10-02 13:54:58

So I created a function (C++):

void CaptureFrame(char* buffer, int w, int h, int bytespan)
{
 /* get a frame */
 if(!cvGrabFrame(capture)){              // capture a frame 
  printf("Could not grab a frame\n\7");
  //exit(0);
 }
 CVframe =cvRetrieveFrame(capture);           // retrieve the captured frame

 /* always check */
 if (!CVframe)
 {
  printf("No CV frame captured!\n");
  cin.get();
 }

 /* resize buffer for current frame */
 IplImage* destination = cvCreateImage(cvSize(w, h), CVframe->depth, CVframe->nChannels);

 //use cvResize to resize source to a destination image
 cvResize(CVframe, destination);

 IplImage* redchannel = cvCreateImage(cvGetSize(destination), 8, 1);
 IplImage* greenchannel = cvCreateImage(cvGetSize(destination), 8, 1);
 IplImage* bluechannel = cvCreateImage(cvGetSize(destination), 8, 1);

 cvSplit(destination, bluechannel, greenchannel, redchannel, NULL);
 for(int y = 0; y < destination->height; y++)
 {
  char* line = buffer + y * bytespan;
  for(int x = 0; x < destination->width; x++)
  {
   line[0] = cvGetReal2D(redchannel, y, x);
   line[1] = cvGetReal2D(greenchannel, y, x);
   line[2] = cvGetReal2D(bluechannel, y, x);
   line += 3;
  }
 }
 cvReleaseImage(&redchannel);
 cvReleaseImage(&greenchannel);
 cvReleaseImage(&bluechannel);
 cvReleaseImage(&destination);
}

So, in general, it captures a frame from the device, creates an image to resize into, and copies the result into the buffer (RGB or YUV420P is a requirement for me).

So I wonder what I am doing wrong, because my function is way too CPU-intensive. What can be done to fix it?

Update:

My function runs in a thread:

void ThreadCaptureFrame()
{
    while(1){
        t.restart();
        CaptureFrame((char *)frame->data[0], videoWidth, videoHeight, frame->linesize[0]);
        AVFrame* swap = frame;
        frame = readyFrame;
        readyFrame = swap;
        spendedTime = t.elapsed();
        if(spendedTime < desiredTime){
            Sleep(desiredTime - spendedTime);
        }
    }
}

The thread is started at the beginning of int main (after some initialization):

boost::thread workerThread(ThreadCaptureFrame);

So it runs 24 times per second if it can, and it eats 28% of a quad-core CPU. The camera resolution I capture is about 320x240. So: how can I optimize it?

Comments (3)

初熏 2024-10-09 13:54:58

Things you can do:

  • Instead of taking images from the camera at the default resolution, choose the resolution you want.
  • I think you can simply set buffer = destination->imageData.

These articles might be helpful:

悲喜皆因你 2024-10-09 13:54:58
  1. First, don't allocate and release the images on every frame!
    That probably takes the most time. Have all your IplImages pre-allocated and release them only when your app is done.
    You can use boost::shared_ptr with a custom deleter to avoid needing to remember to release the images.
  2. I don't get why you're splitting and why you're copying like that.
    If you must copy, then just copy the whole of destination->imageData into buffer.
    If it is the padding that is bugging you, then do it in a loop like you did, but directly from destination->imageData. You don't need to separate the color channels.
  3. Use cvResize with CV_INTER_NN. That will reduce the image quality but is faster.
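
The loop suggested in point 2 could look roughly like the sketch below. It is written against plain byte pointers so it compiles without OpenCV. The function name CopyBgrToRgb and its parameters are hypothetical, and it assumes an 8-bit, 3-channel image stored in BGR order (the order the cvSplit call in the question implies). With an IplImage, src would be destination->imageData (cast to an unsigned byte pointer) and srcStride would be destination->widthStep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies a packed 8-bit BGR image into a packed RGB
// buffer, one row at a time. srcStride and dstStride are the byte lengths of
// one row including any padding, so padded rows are handled correctly.
void CopyBgrToRgb(const std::uint8_t* src, std::size_t srcStride,
                  std::uint8_t* dst, std::size_t dstStride,
                  int width, int height)
{
    assert(src != nullptr && dst != nullptr);
    assert(width >= 0 && height >= 0);

    for (int y = 0; y < height; ++y) {
        const std::uint8_t* in = src + static_cast<std::size_t>(y) * srcStride;
        std::uint8_t* out = dst + static_cast<std::size_t>(y) * dstStride;
        for (int x = 0; x < width; ++x) {
            out[0] = in[2]; // red
            out[1] = in[1]; // green
            out[2] = in[0]; // blue
            in += 3;
            out += 3;
        }
    }
}
```

Compared with the code in the question, this reads each pixel directly instead of going through cvSplit and cvGetReal2D, so no per-channel images are needed at all.
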
药祭#氼 2024-10-09 13:54:58

I'm not familiar with OpenCV, but if I'm reading your code correctly, you're:

  1. reading from the camera's buffer into memory (1 copy)
  2. resizing the image (1 copy)
  3. splitting the image into RGB channels (3 copies)
  4. re-merging the channels into the buffer (1 copy)

I think that's a lot of unnecessary copying: for each frame you make 6 copies of the image. For example, if your image is 320x240 in 24-bit color at 24 fps, you'd be moving around at least 32 MB/sec; with 1000x1000 frames you're talking about half a gigabyte per second. Note that this is a very crude back-of-the-envelope underestimate: depending on the resizing algorithm, extra copying may be done, reading from or writing to non-aligned memory locations may incur some overhead, and so on.
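
The arithmetic behind that estimate can be reproduced with a small sketch. The function name CopyBytesPerSecond is hypothetical, and, like the estimate above, it treats each of the 6 copies as a full-size image. For 320x240 at 3 bytes per pixel it gives 33,177,600 bytes per second (about 32 MB/s), and for 1000x1000 it gives 432,000,000 bytes per second (a bit under half a gigabyte).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes moved per second if every frame of the given
// size is copied `copies` times at `fps` frames per second.
std::uint64_t CopyBytesPerSecond(int width, int height, int bytesPerPixel,
                                 int copies, int fps)
{
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    assert(copies > 0 && fps > 0);

    // Widen first so the product cannot overflow a 32-bit int.
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel
           * copies * fps;
}
```
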

You can probably skip step #3 and/or #4, though I'm not familiar enough with OpenCV to suggest how.
