OpenCV SURF: comparing descriptors

Posted on 2024-10-15 22:56:21

The following snippet is from OpenCV's find_obj.cpp, which is a demo of using SURF:


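double
compareSURFDescriptors( const float* d1, const float* d2, double best, int length )
{
    double total_cost = 0;
    assert( length % 4 == 0 );
    int i;
    for( i = 0; i < length; i += 4 )
    {
        double t0 = d1[i] - d2[i];
        double t1 = d1[i+1] - d2[i+1];
        double t2 = d1[i+2] - d2[i+2];
        double t3 = d1[i+3] - d2[i+3];
        total_cost += t0*t0 + t1*t1 + t2*t2 + t3*t3;
        if( total_cost > best )
            break;
    }
    return total_cost;
}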

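As far as I can tell, it is computing the Euclidean distance. What I do not understand is why it does this in groups of 4. Why not calculate the whole thing at once?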
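For context, the value accumulated here is the squared Euclidean distance (no square root is taken), and the `best` argument lets a caller abandon a candidate as soon as its partial sum exceeds the best distance found so far. The sketch below shows one way that could be used in a brute-force nearest-neighbour search. `nearestDescriptor` is a hypothetical helper written for illustration, not a function from find_obj.cpp, and it assumes every candidate has the same length as the query.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Squared Euclidean distance in groups of 4, with early exit once
// the running total exceeds `best`.
double compareSURFDescriptors(const float* d1, const float* d2, double best, int length)
{
    double total_cost = 0;
    assert(length % 4 == 0);
    for (int i = 0; i < length; i += 4)
    {
        double t0 = d1[i] - d2[i];
        double t1 = d1[i + 1] - d2[i + 1];
        double t2 = d1[i + 2] - d2[i + 2];
        double t3 = d1[i + 3] - d2[i + 3];
        total_cost += t0 * t0 + t1 * t1 + t2 * t2 + t3 * t3;
        if (total_cost > best)
            break;  // this candidate can no longer beat the current best
    }
    return total_cost;
}

// Hypothetical helper (an assumption, not OpenCV API): returns the index of
// the candidate closest to `query`, or -1 if there are no candidates.
int nearestDescriptor(const std::vector<float>& query,
                      const std::vector<std::vector<float>>& candidates)
{
    int bestIndex = -1;
    double best = std::numeric_limits<double>::max();
    const int length = static_cast<int>(query.size());
    for (std::size_t k = 0; k < candidates.size(); ++k)
    {
        // A partial sum returned after an early exit is always > best,
        // so it is rejected by the comparison below.
        double d = compareSURFDescriptors(query.data(), candidates[k].data(), best, length);
        if (d < best)
        {
            best = d;
            bestIndex = static_cast<int>(k);
        }
    }
    return bestIndex;
}
```

Because the early-exit test runs once per group of four, a poor candidate is typically rejected after only a few groups rather than after the full descriptor.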

鼻尖触碰 2024-10-22 22:56:21

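Usually things like this are done to make SSE optimizations possible. SSE registers are 128 bits wide and can hold 4 floats, so the 4 subtractions can be done in parallel with a single instruction.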
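To make the SSE point concrete, here is a minimal sketch of how one group of four could map onto SSE intrinsics. `compareSURFDescriptorsSSE` is a hypothetical name, not something from find_obj.cpp, and the `SKETCH_HAVE_SSE` macro is only an illustrative guard so the code still compiles on targets without SSE. Note one difference from the scalar code: the squares are computed in single precision here, so results can differ slightly in the last bits.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical SSE variant of the comparison function (illustration only).
double compareSURFDescriptorsSSE(const float* d1, const float* d2, double best, int length)
{
    assert(length % 4 == 0);
    double total_cost = 0;
    for (int i = 0; i < length; i += 4)
    {
#ifdef SKETCH_HAVE_SSE
        __m128 a = _mm_loadu_ps(d1 + i);    // unaligned load of 4 floats
        __m128 b = _mm_loadu_ps(d2 + i);
        __m128 diff = _mm_sub_ps(a, b);     // 4 subtractions in one instruction
        __m128 sq = _mm_mul_ps(diff, diff); // 4 squares in one instruction
        float lanes[4];
        _mm_storeu_ps(lanes, sq);
        // Accumulate in double, as the scalar version does.
        total_cost += static_cast<double>(lanes[0]) + lanes[1] + lanes[2] + lanes[3];
#else
        // Scalar fallback for targets without SSE.
        for (int j = 0; j < 4; ++j)
        {
            double t = d1[i + j] - d2[i + j];
            total_cost += t * t;
        }
#endif
        if (total_cost > best)
            break;
    }
    return total_cost;
}
```

The unaligned loads (`_mm_loadu_ps`) are used because nothing in the original snippet guarantees 16-byte alignment of the descriptor pointers.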

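Another upside: the loop counter has to be checked only after every fourth difference. That makes the code faster even if the compiler does not take the opportunity to generate SSE code. For example, VS2008 didn't, not even with -O2: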

    
    double t0 = d1[i] - d2[i];
00D91666  fld         dword ptr [edx-0Ch]
00D91669  fsub        dword ptr [ecx-4]
    double t1 = d1[i+1] - d2[i+1];
00D9166C  fld         dword ptr [ebx+ecx]
00D9166F  fsub        dword ptr [ecx]
    double t2 = d1[i+2] - d2[i+2];
00D91671  fld         dword ptr [edx-4]
00D91674  fsub        dword ptr [ecx+4]
    double t3 = d1[i+3] - d2[i+3];
00D91677  fld         dword ptr [edx]
00D91679  fsub        dword ptr [ecx+8]
    total_cost += t0*t0 + t1*t1 + t2*t2 + t3*t3;
00D9167C  fld         st(2)
00D9167E  fmulp       st(3),st
00D91680  fld         st(3)
00D91682  fmulp       st(4),st
00D91684  fxch        st(2)
00D91686  faddp       st(3),st
00D91688  fmul        st(0),st
00D9168A  faddp       st(2),st
00D9168C  fmul        st(0),st
00D9168E  faddp       st(1),st
00D91690  faddp       st(2),st
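The loop-counter point can be illustrated with a small instrumented sketch. Both functions below compute the same sum of squared differences, but count how often the loop condition `i < length` is evaluated. `LoopStats`, `sumSquaredRolled` and `sumSquaredUnrolled4` are hypothetical names introduced for this illustration, and the early exit on `best` is left out to keep the comparison simple.

```cpp
#include <cassert>

// Hypothetical instrumentation: result plus the number of times the
// loop condition was evaluated.
struct LoopStats
{
    double cost;
    int conditionChecks;
};

// One element per iteration: the condition is checked length + 1 times.
LoopStats sumSquaredRolled(const float* d1, const float* d2, int length)
{
    LoopStats s = {0.0, 0};
    for (int i = 0; (++s.conditionChecks, i < length); ++i)
    {
        double t = d1[i] - d2[i];
        s.cost += t * t;
    }
    return s;
}

// Four elements per iteration: the condition is checked length / 4 + 1 times.
LoopStats sumSquaredUnrolled4(const float* d1, const float* d2, int length)
{
    assert(length % 4 == 0);
    LoopStats s = {0.0, 0};
    for (int i = 0; (++s.conditionChecks, i < length); i += 4)
    {
        double t0 = d1[i] - d2[i];
        double t1 = d1[i + 1] - d2[i + 1];
        double t2 = d1[i + 2] - d2[i + 2];
        double t3 = d1[i + 3] - d2[i + 3];
        s.cost += t0 * t0 + t1 * t1 + t2 * t2 + t3 * t3;
    }
    return s;
}
```

For a 64-element descriptor, the rolled loop evaluates its condition 65 times and the unrolled one 17 times. How much that matters in practice depends on the compiler and CPU, since an optimizing compiler may unroll the simple loop on its own.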
