OpenCV performance on template matching

Posted 2024-11-30 13:34:14

I'm trying to do basic template matching in Java. I used a straightforward algorithm to find the match. Here is the code:

minSAD = VALUE_MAX;
// loop through the search image
for ( int x = 0; x <= S_rows - T_rows; x++ ) {
    for ( int y = 0; y <= S_cols - T_cols; y++ ) {
        SAD = 0.0;

        // loop through the template image
        for ( int i = 0; i < T_rows; i++ ) {
            for ( int j = 0; j < T_cols; j++ ) {
                pixel p_SearchIMG = S[x+i][y+j];
                pixel p_TemplateIMG = T[i][j];
                SAD += abs( p_SearchIMG.Grey - p_TemplateIMG.Grey );
            }
        }

        // save the best position found so far; this must run once per
        // (x, y) offset, so it belongs inside the y loop
        if ( minSAD > SAD ) {
            minSAD = SAD;
            position.bestRow = x;
            position.bestCol = y;
            position.bestSAD = SAD;
        }
    }
}

But this is a very slow approach. I tested 2 images (768 × 1280) with a sub-image (384 × 640), and it takes ages.
Does OpenCV perform template matching much faster with its ready-made function cvMatchTemplate()?
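
For reference, here is a minimal runnable sketch of the same exhaustive SAD search in plain Java. It assumes grey-level images stored as `int[row][col]` arrays; the class and method names (`SadTemplateMatch`, `match`) are made up for illustration and are not part of any library.

```java
final class SadTemplateMatch {

    /**
     * Exhaustive sum-of-absolute-differences search.
     * Images are assumed to be grey levels stored as int[row][col].
     * Returns {bestRow, bestCol, minSAD}.
     */
    static long[] match(int[][] search, int[][] templ) {
        int sRows = search.length, sCols = search[0].length;
        int tRows = templ.length, tCols = templ[0].length;

        long minSad = Long.MAX_VALUE;
        int bestRow = -1, bestCol = -1;

        // slide the template over every valid offset in the search image
        for (int x = 0; x <= sRows - tRows; x++) {
            for (int y = 0; y <= sCols - tCols; y++) {
                long sad = 0;
                for (int i = 0; i < tRows; i++) {
                    for (int j = 0; j < tCols; j++) {
                        sad += Math.abs(search[x + i][y + j] - templ[i][j]);
                    }
                }
                // keep the offset with the smallest SAD seen so far
                if (sad < minSad) {
                    minSad = sad;
                    bestRow = x;
                    bestCol = y;
                }
            }
        }
        return new long[] { bestRow, bestCol, minSad };
    }

    public static void main(String[] args) {
        int[][] search = {
            { 10, 10, 10, 10 },
            { 10, 50, 60, 10 },
            { 10, 70, 80, 10 },
            { 10, 10, 10, 10 } };
        int[][] templ = { { 50, 60 }, { 70, 80 } };
        long[] r = match(search, templ);
        System.out.println("row=" + r[0] + " col=" + r[1] + " SAD=" + r[2]);
    }
}
```

The four nested loops are what make this slow: the work grows with the number of offsets multiplied by the number of template pixels.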

明媚如初 2024-12-07 13:34:15

You will find that OpenCV's cvMatchTemplate() is much, much quicker than the method you have implemented. What you have created is a statistical template matching method. It is the most common and the easiest to implement, but it is extremely slow on large images. Let's look at the basic maths. Your image is 768 x 1280 and you visit every pixel minus the border that the template cannot cross, so roughly (768 - 384) x (1280 - 640) = 384 x 640 = 245'760 positions. At each position you loop through every pixel of your template (another 245'760 operations), so before you add any maths inside the loop you already have 245'760 x 245'760 = 60'397'977'600 operations. That is over 60 billion operations just to loop through your image. It's more surprising how quickly machines can do this.

Remember, however, that it is really 245'760 x (245'760 x maths operations), so there are many more operations than that.
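
The arithmetic above can be checked with a short Java sketch. Because the question's loops use `<=`, the exact number of offsets is 385 x 641 rather than 384 x 640, which gives about 60.6 billion inner-loop iterations. The `roughFftOps` figure is only an order-of-magnitude assumption (three transforms at N log2 N each, ignoring constants and padding), not a measurement of what OpenCV does; all names are illustrative.

```java
final class OperationCount {

    /** Number of (x, y) offsets visited by the loops in the question. */
    static long positions(int sRows, int sCols, int tRows, int tCols) {
        return (long) (sRows - tRows + 1) * (sCols - tCols + 1);
    }

    /** Inner-loop iterations of the brute-force search. */
    static long bruteForceOps(int sRows, int sCols, int tRows, int tCols) {
        return positions(sRows, sCols, tRows, tCols) * tRows * tCols;
    }

    /**
     * Very rough FFT cost: two forward transforms plus one inverse,
     * each counted as N * log2(N). Constants and padding are ignored.
     */
    static double roughFftOps(int sRows, int sCols) {
        double n = (double) sRows * sCols;
        return 3.0 * n * (Math.log(n) / Math.log(2.0));
    }

    public static void main(String[] args) {
        System.out.println("offsets:     " + positions(768, 1280, 384, 640));
        System.out.println("brute force: " + bruteForceOps(768, 1280, 384, 640));
        System.out.println("rough FFT:   " + (long) roughFftOps(768, 1280));
    }
}
```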

Now cvMatchTemplate() actually uses Fourier-analysis template matching. This works by applying a Fast Fourier Transform (FFT) to the image, which decomposes the changes in pixel intensity into their component waveforms. The method is hard to explain well, but the image is transformed into a frequency-domain representation made of complex numbers. If you wish to understand more, please search Google for the fast Fourier transform. The same operation is then performed on the template, and the signals that form the template are used to filter out any other signals from your image.

Put simply, it suppresses all features within the image that do not match the features of your template. The result is then converted back using an inverse fast Fourier transform to produce an image where high values mean a match and low values mean the opposite. This image is often normalised so that values of 1 represent a match and values of 0 or thereabouts mean the object is nowhere near.
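
To make the idea concrete, here is a small 1D Java sketch. It shows that multiplying the transform of the signal by the complex conjugate of the transform of the zero-padded template, then transforming back, gives the same values as direct (circular) cross-correlation. For brevity it uses a naive O(n^2) DFT rather than a real FFT, and it is 1D rather than 2D, so it illustrates the principle only and is not how OpenCV is implemented. All names are illustrative.

```java
import java.util.Arrays;

final class FrequencyDomainCorrelation {

    /** Naive DFT. sign = -1 for the forward transform, +1 for the unscaled inverse. */
    static double[][] dft(double[] re, double[] im, int sign) {
        int n = re.length;
        double[] outRe = new double[n], outIm = new double[n];
        for (int m = 0; m < n; m++) {
            for (int k = 0; k < n; k++) {
                double a = sign * 2 * Math.PI * m * k / n;
                double c = Math.cos(a), s = Math.sin(a);
                outRe[m] += re[k] * c - im[k] * s;
                outIm[m] += re[k] * s + im[k] * c;
            }
        }
        return new double[][] { outRe, outIm };
    }

    /** Circular cross-correlation computed in the frequency domain. */
    static double[] correlateViaDft(double[] signal, double[] templ) {
        int n = signal.length;
        double[] padded = Arrays.copyOf(templ, n);       // zero-pad the template
        double[][] f = dft(signal, new double[n], -1);
        double[][] g = dft(padded, new double[n], -1);
        double[] pRe = new double[n], pIm = new double[n];
        for (int m = 0; m < n; m++) {                    // F[m] * conj(G[m])
            pRe[m] = f[0][m] * g[0][m] + f[1][m] * g[1][m];
            pIm[m] = f[1][m] * g[0][m] - f[0][m] * g[1][m];
        }
        double[][] c = dft(pRe, pIm, +1);
        double[] out = new double[n];
        for (int k = 0; k < n; k++) out[k] = c[0][k] / n; // scale the inverse
        return out;
    }

    /** The same circular cross-correlation computed directly, for comparison. */
    static double[] correlateDirect(double[] signal, double[] templ) {
        int n = signal.length;
        double[] out = new double[n];
        for (int k = 0; k < n; k++)
            for (int j = 0; j < templ.length; j++)
                out[k] += signal[(k + j) % n] * templ[j];
        return out;
    }
}
```

Note that raw correlation like this favours bright regions, which is one reason the normalised variants are usually preferred.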

Be warned, though: if the object is not in the image and the result is normalised, false detections will occur, because the highest value calculated will be treated as a match. I could go on for ages about how the method works and its benefits or the problems that can occur, but...

The reason this method is so fast is: 1) OpenCV is highly optimised C++ code. 2) The FFT is easy for your processor to handle: general-purpose CPUs run it efficiently with vector instructions, and DSPs and FPGAs can perform it in hardware. GPUs can perform very large numbers of FFT operations every second, since this kind of calculation also matters in high-performance graphics and video encoding. 3) The number of operations required is far smaller.

In summary, the statistical template matching method is slow and takes ages, whereas the OpenCV FFT approach in cvMatchTemplate() is quick and highly optimised.

Statistical template matching will not produce false detections if the object is not there, whereas the OpenCV FFT method can unless care is taken in how it is applied.
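
One common way to take that care is to accept the best score only if it clears a threshold. The sketch below assumes you already have a normalised score map where higher means a better match; the class name, method name and the 0.8 threshold used in the example are hypothetical and would need tuning for real images.

```java
import java.util.Optional;

final class ThresholdedMatch {

    /**
     * Returns the {row, col} of the highest score, or empty if even the
     * best score is below the threshold (i.e. the object is probably absent).
     */
    static Optional<int[]> bestAbove(double[][] scores, double threshold) {
        double best = Double.NEGATIVE_INFINITY;
        int bestRow = -1, bestCol = -1;
        for (int r = 0; r < scores.length; r++) {
            for (int c = 0; c < scores[r].length; c++) {
                if (scores[r][c] > best) {
                    best = scores[r][c];
                    bestRow = r;
                    bestCol = c;
                }
            }
        }
        if (best >= threshold) {
            return Optional.of(new int[] { bestRow, bestCol });
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        double[][] scores = { { 0.10, 0.20 }, { 0.95, 0.30 } };
        // 0.8 is an arbitrary example threshold, not a recommended value
        System.out.println(bestAbove(scores, 0.8).isPresent());
    }
}
```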

I hope this gives you a basic understanding and answers your question.

Cheers

Chris

[EDIT]

To further answer your questions:

Hi,

cvMatchTemplate can work with CCOEFF_NORMED, CCORR_NORMED and SQDIFF_NORMED, as well as the non-normalised versions of these. The page below shows the kind of results you can expect and gives you code to play with.

http://dasl.mem.drexel.edu/~noahKuntz/openCVTut6.html#Step%202

All three methods are well cited and many papers are available through Google Scholar. I have provided a few papers below. Each one simply uses a different equation to measure the correlation between the FFT signals that form the template and the FFT signals present within the image. The correlation coefficient tends to yield better results in my experience and is easier to find references to. Sum of squared differences is another method that gives comparable results. I hope some of these help:

Fast normalized cross correlation for defect detection
Du-Ming Tsai; Chien-Ta Lin;
Pattern Recognition Letters
Volume 24, Issue 15, November 2003, Pages 2625-2631

Template Matching using Fast Normalised Cross Correlation
Kai Briechle; Uwe D. Hanebeck;

Relative performance of two-dimensional speckle-tracking techniques: normalized correlation, non-normalized correlation and sum-absolute-difference
Friemel, B.H.; Bohs, L.N.; Trahey, G.E.;
Ultrasonics Symposium, 1995. Proceedings., 1995 IEEE

A Class of Algorithms for Fast Digital Image Registration
Barnea, Daniel I.; Silverman, Harvey F.;
Computers, IEEE Transactions on Feb. 1972
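
As a rough illustration of how the three normalised scores mentioned above differ, here is a plain Java sketch that evaluates each one for a single template/patch pair, following the formulas given in the OpenCV documentation. It works on flattened `double[]` arrays of equal length; the class and method names are illustrative, and returning 0 for a flat (zero-variance) patch is an assumption made here to avoid dividing by zero.

```java
final class MatchScores {

    /** SQDIFF_NORMED: 0 means a perfect match, larger is worse. */
    static double sqdiffNormed(double[] t, double[] p) {
        double diff = 0, tt = 0, pp = 0;
        for (int i = 0; i < t.length; i++) {
            diff += (t[i] - p[i]) * (t[i] - p[i]);
            tt += t[i] * t[i];
            pp += p[i] * p[i];
        }
        double denom = Math.sqrt(tt * pp);
        return denom == 0 ? 0 : diff / denom;
    }

    /** CCORR_NORMED: 1 means a perfect match up to a brightness scale factor. */
    static double ccorrNormed(double[] t, double[] p) {
        double tp = 0, tt = 0, pp = 0;
        for (int i = 0; i < t.length; i++) {
            tp += t[i] * p[i];
            tt += t[i] * t[i];
            pp += p[i] * p[i];
        }
        double denom = Math.sqrt(tt * pp);
        return denom == 0 ? 0 : tp / denom;
    }

    /** CCOEFF_NORMED: like CCORR_NORMED, but with the means removed first. */
    static double ccoeffNormed(double[] t, double[] p) {
        double tMean = 0, pMean = 0;
        for (int i = 0; i < t.length; i++) {
            tMean += t[i];
            pMean += p[i];
        }
        tMean /= t.length;
        pMean /= p.length;
        double tp = 0, tt = 0, pp = 0;
        for (int i = 0; i < t.length; i++) {
            double a = t[i] - tMean, b = p[i] - pMean;
            tp += a * b;
            tt += a * a;
            pp += b * b;
        }
        double denom = Math.sqrt(tt * pp);
        return denom == 0 ? 0 : tp / denom;
    }
}
```

For an identical patch the scores are 0, 1 and 1 respectively. A patch with inverted contrast gives a CCOEFF_NORMED of -1, which is part of why the correlation coefficient tends to separate matches from non-matches more cleanly.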

It is often preferable to use the normalised versions of these methods, as anything that equals 1 is a match; however, if no object is present you can get false positives. The method is fast largely because of the way it is implemented. The operations involved suit the processor architecture, which means each operation can complete in a few clock cycles rather than spending several clock cycles shifting memory and information around. Processors have been solving FFT problems for many years now and, as I said, there is hardware designed to do so. Hardware implementations are generally faster than software, and the statistical method of template matching is basically software-based. Good reading on the hardware can be found here:

Digital signal processor
Although it is a Wiki page, the references are worth a look, and effectively this is the hardware that performs FFT calculations

A new Approach to Pipeline FFT Processor
Shousheng He; Mats Torkelson;
A favourite of mine, as it shows what is happening inside the processor

An Efficient Locally Pipelined FFT Processor
Liang Yang; Kewei Zhang; Hongxia Liu; Jin Huang; Shitan Huang;

These papers really show how complex the FFT is when implemented; however, the pipelining of the process is what allows the operation to be performed in a few clock cycles. This is the reason real-time vision systems use FPGAs (processors that you can design to implement a set task), as they can be made extremely parallel in their architecture and pipelining is easier to implement.

I must mention, though, that for the FFT of an image you are actually using FFT2, which is the FFT along the horizontal direction followed by the FFT along the vertical direction, just so there is no confusion when you find references to it. I cannot say I have expert knowledge of how the equations and the FFT are implemented. I have tried to find good guides, but a good guide is very hard to find, so much so that I haven't yet found one (not one I can understand, at least). One day I may understand them, but for now I have a good understanding of how they work and the kind of results that can be expected.

Other than this, I can't really help you more. If you want to implement your own version or understand how it works, it's time to hit the library. I warn you, though: the OpenCV code is so well optimised that you will struggle to improve on its performance. Still, who knows, you may figure out a way to get better results. All the best and good luck,
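
The row-then-column structure of FFT2 can be sketched in a few lines of Java. This version uses a naive O(n^2) 1D DFT in place of a real FFT to stay short, so it only demonstrates the decomposition, not the speed. The names `Dft2`, `dft1` and `dft2` are illustrative.

```java
final class Dft2 {

    /** Naive forward 1D DFT of a complex sequence; returns {re, im}. */
    static double[][] dft1(double[] re, double[] im) {
        int n = re.length;
        double[] outRe = new double[n], outIm = new double[n];
        for (int m = 0; m < n; m++) {
            for (int k = 0; k < n; k++) {
                double a = -2 * Math.PI * m * k / n;
                double c = Math.cos(a), s = Math.sin(a);
                outRe[m] += re[k] * c - im[k] * s;
                outIm[m] += re[k] * s + im[k] * c;
            }
        }
        return new double[][] { outRe, outIm };
    }

    /** 2D DFT of a real image: transform every row, then every column. */
    static double[][][] dft2(double[][] img) {
        int rows = img.length, cols = img[0].length;
        double[][] re = new double[rows][], im = new double[rows][];

        // pass 1: 1D transform along each row (horizontal direction)
        for (int r = 0; r < rows; r++) {
            double[][] t = dft1(img[r], new double[cols]);
            re[r] = t[0];
            im[r] = t[1];
        }

        // pass 2: 1D transform along each column (vertical direction)
        for (int c = 0; c < cols; c++) {
            double[] colRe = new double[rows], colIm = new double[rows];
            for (int r = 0; r < rows; r++) {
                colRe[r] = re[r][c];
                colIm[r] = im[r][c];
            }
            double[][] t = dft1(colRe, colIm);
            for (int r = 0; r < rows; r++) {
                re[r][c] = t[0][r];
                im[r][c] = t[1][r];
            }
        }
        return new double[][][] { re, im };
    }
}
```

The coefficient at index [0][0] is simply the sum of all pixel values, which makes a handy sanity check for any implementation.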

Chris
