OpenCV: looking for a less CPU-intensive way to capture frames, resize them, and copy them into a buffer: how can I optimize my code?
So I created a function (C++):
void CaptureFrame(char* buffer, int w, int h, int bytespan)
{
    /* get a frame */
    if(!cvGrabFrame(capture)){ // capture a frame
        printf("Could not grab a frame\n\7");
        //exit(0);
    }
    CVframe = cvRetrieveFrame(capture); // retrieve the captured frame
    /* always check */
    if (!CVframe)
    {
        printf("No CV frame captured!\n");
        cin.get();
    }
    /* resize buffer for current frame */
    IplImage* destination = cvCreateImage(cvSize(w, h), CVframe->depth, CVframe->nChannels);
    //use cvResize to resize source to a destination image
    cvResize(CVframe, destination);
    IplImage* redchannel = cvCreateImage(cvGetSize(destination), 8, 1);
    IplImage* greenchannel = cvCreateImage(cvGetSize(destination), 8, 1);
    IplImage* bluechannel = cvCreateImage(cvGetSize(destination), 8, 1);
    cvSplit(destination, bluechannel, greenchannel, redchannel, NULL);
    for(int y = 0; y < destination->height; y++)
    {
        char* line = buffer + y * bytespan;
        for(int x = 0; x < destination->width; x++)
        {
            line[0] = cvGetReal2D(redchannel, y, x);
            line[1] = cvGetReal2D(greenchannel, y, x);
            line[2] = cvGetReal2D(bluechannel, y, x);
            line += 3;
        }
    }
    cvReleaseImage(&redchannel);
    cvReleaseImage(&greenchannel);
    cvReleaseImage(&bluechannel);
    cvReleaseImage(&destination);
}
In short, it captures a frame from the device, creates an image to resize into, and copies the result into a buffer (I need RGB or YUV420P).
I wonder what I am doing wrong, because my function is way too CPU-intensive. What can be done to fix it?
Update:
My function runs in a thread:
void ThreadCaptureFrame()
{
    while(1){
        t.restart();
        CaptureFrame((char *)frame->data[0], videoWidth, videoHeight, frame->linesize[0]);
        AVFrame* swap = frame;
        frame = readyFrame;
        readyFrame = swap;
        spendedTime = t.elapsed();
        if(spendedTime < desiredTime){
            Sleep(desiredTime - spendedTime);
        }
    }
}
It is started at the beginning of int main (after some initialization):
boost::thread workerThread(ThreadCaptureFrame);
So it runs up to 24 times per second, and it eats 28% of a quad-core CPU. The camera resolution I capture is about 320x240. So: how do I optimize it?
Comments (3)
Things you can do:
buffer = destination->imageData
These articles might be helpful:
That probably takes the most time. Have all your IplImages pre-allocated and release them only when your app is done.
You can use boost::shared_ptr with a custom deleter to avoid needing to remember to release the images.
If you must copy, then just copy the whole of destination->imageData into buffer.
If it is the padding that is bugging you, then do it in a loop like you did, but directly from destination->imageData. You don't need to separate the color channels.
Use cvResize with CV_INTER_NN. That will reduce the image quality but is faster.
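A minimal sketch of copying directly from the interleaved image data, assuming an 8-bit, 3-channel image whose pixels are stored in BGR order (the order implied by the cvSplit call in the question). The function name copyBgrToRgb and its parameters are hypothetical: src and srcStep stand in for destination->imageData and destination->widthStep, while dst and dstStep stand in for buffer and bytespan. It uses plain pointers so that it compiles without OpenCV.

```cpp
#include <cassert>
#include <cstddef>

// Copy a packed 8-bit BGR image into a packed RGB buffer, one row at a time.
// Assumption: src/srcStep correspond to destination->imageData and
// destination->widthStep; dst/dstStep correspond to buffer and bytespan.
void copyBgrToRgb(const unsigned char* src, int srcStep,
                  unsigned char* dst, int dstStep,
                  int width, int height)
{
    // Each row must be able to hold width pixels of 3 bytes each.
    assert(srcStep >= width * 3 && dstStep >= width * 3);

    for (int y = 0; y < height; ++y) {
        // Rows may be padded, so step by the row stride, not by width * 3.
        const unsigned char* in = src + static_cast<std::size_t>(y) * srcStep;
        unsigned char* out = dst + static_cast<std::size_t>(y) * dstStep;
        for (int x = 0; x < width; ++x) {
            out[0] = in[2]; // red
            out[1] = in[1]; // green
            out[2] = in[0]; // blue
            in += 3;
            out += 3;
        }
    }
}
```

Compared with the loop in the question, this reads each pixel once from the resized image, so the three single-channel images, the cvSplit call, and the per-pixel cvGetReal2D calls are no longer needed. If the destination image is also allocated once and reused, nothing is created or released per frame.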
I'm not familiar with OpenCV, but if I'm reading your code correctly, you're:
I think that's a lot of unnecessary copying: for each frame you make 6 copies of the image. If your image is 320x240 in 24-bit color at 24 fps, you'd be moving around at least 32 MB/sec; with 1000x1000 frames you're talking about half a gigabyte per second. Note that this is a very crude back-of-the-envelope underestimate: depending on the resizing algorithm, extra copying may be done, reading from or writing to non-aligned memory locations may incur some overhead, and so on.
You can probably skip step #3 and/or #4, though I'm not familiar enough with OpenCV to suggest how.
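The back-of-the-envelope figure above can be reproduced with a few multiplications. This is only a sketch of the arithmetic: the helper name bytesPerSecond is hypothetical, and the count of 6 copies per frame is taken from the estimate in this answer, not measured.

```cpp
#include <cstdint>

// Bytes moved per second if every frame is copied `copies` times.
constexpr std::uint64_t bytesPerSecond(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t bytesPerPixel,
                                       std::uint64_t fps, std::uint64_t copies)
{
    return width * height * bytesPerPixel * fps * copies;
}

// 320x240, 24-bit color (3 bytes per pixel), 24 fps, 6 copies per frame.
static_assert(bytesPerSecond(320, 240, 3, 24, 6) == 33177600ULL,
              "about 33 million bytes per second");

// 1000x1000 under the same assumptions.
static_assert(bytesPerSecond(1000, 1000, 3, 24, 6) == 432000000ULL,
              "about 0.43 GB per second");
```

At 320x240 this comes to roughly 33 million bytes per second, in line with the "at least 32 MB/sec" figure. At 1000x1000 it is about 0.43 GB per second, which is the "half a gigabyte" order of magnitude.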