Grauman and Darrell's Pyramid Match Kernel
I am a student. My assignment is to pick a computer vision paper (from a provided list) and implement its algorithm. I chose Grauman and Darrell's "The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features" (IEEE, 2005).
I coded the thing up, but it is not matching images well. In fact, I can't even see conceptually how it could work at all, given that it matches using sets of feature descriptors.
The technique, as I understand it, is to build a pyramid of histograms for each of the two feature sets and then compute the (weighted) intersection of the corresponding histograms. The bin size is 1 at the first level and doubles at every subsequent level of the pyramid. The process stops when bin size >= max_element_in_feature_sets: if the bin size were any bigger, the integer division descriptor_value / bin_size would always return zero and everything would land in a single bin.
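To make sure I'm describing the same procedure I implemented, here is a minimal sketch of my understanding of the kernel. The names `histogram` and `pyramid_match` are my own, not from any library, and the weighting scheme is my reading of the paper, so treat this as an illustration rather than a reference implementation:

```python
from collections import Counter

def histogram(descriptors, bin_size):
    # Quantize each d-dimensional descriptor to a grid cell via integer
    # division, then count how many descriptors share each cell.
    cells = [tuple(v // bin_size for v in d) for d in descriptors]
    return Counter(cells)

def pyramid_match(X, Y, levels):
    # X and Y are lists of integer-valued descriptor vectors.
    score = 0.0
    prev = 0.0
    for i in range(levels):
        bin_size = 2 ** i
        hx, hy = histogram(X, bin_size), histogram(Y, bin_size)
        # Histogram intersection: matches found at this resolution.
        # (Counter returns 0 for cells absent from hy.)
        inter = sum(min(n, hy[cell]) for cell, n in hx.items())
        # Only matches that are *new* at this level count, weighted by
        # 1 / bin_size so coarser matches contribute less.
        score += (inter - prev) / bin_size
        prev = inter
    return score
```

With low-dimensional descriptors this behaves as intended: close pairs match at fine levels and distant pairs only at coarse ones. The problem described below is what happens when the descriptors are 128-dimensional.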
So here is where it falls apart for me. Imagine that bin size = 1/2 * max_element, so that every element of every feature goes into bin 0 or bin 1. With a feature vector of length 128, there are still 2^128 possible bins. What is the chance of two features landing in the same bin?
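A quick back-of-the-envelope check of that bin-count argument: with two bins per dimension, a d-dimensional descriptor can land in 2^d distinct cells, so two independent, uniformly random descriptors collide with probability 2^-d:

```python
# Number of cells and collision probability for a grid with two bins
# per dimension, at d = 10 and d = 128 dimensions.
for d in (10, 128):
    cells = 2 ** d
    p_collide = 2.0 ** -d
    print(f"d={d}: {cells} cells, collision probability {p_collide:.3e}")
```

At d = 128 the collision probability for random data is about 3e-39, which is why every descriptor ends up alone in its own bin.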
The answer depends, of course. If the features were random noise, the probability would be vanishingly low. The paper must tacitly assume that similar images produce similar features, but I'm not seeing that in my test runs. For example, I took a small grayscale image and blurred it with a 5x5 Gaussian kernel, then compared it to the original image.
Here is the output (explanation follows below; scroll down):
file named art487.jpg extracted features= 50
file named art487_blur.jpg extracted features= 7
Min value= 0, Max val= 164 (of any element)
levels in the pyramid= 8
SUMMARY OF PYRAMID art487.jpg
level= 0, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 1, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 2, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 3, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 4, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 5, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 6, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 7, # bins= 21, bins= 29, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, count = 50
SUMMARY OF PYRAMID art487_blur.jpg
level= 0, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 1, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 2, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 3, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 4, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 5, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 6, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 7, # bins= 4, bins= 4, 1, 1, 1, count = 7
raw score= 0
normalized score= 0
The list of numbers after "bins=" shows how many features fell into each particular bin. The results are exactly what I would expect from 128 dimensions: every feature gets its own bin, except at the coarsest level, where several zero vectors are grouped together. This produces a similarity score of 0, again just as I would expect.
I don't know how to make this pyramid match kernel useful.
The paper says good results were achieved using SIFT features, but nothing in the paper helps me understand how that was possible.
What is going wrong? Am I supposed to bin pixel intensities rather than feature descriptors?
Comments (2)
I spoke with a classmate. It turns out the authors used a SIFT feature detector but with 10-dimensional PCA-SIFT descriptors. I'll give that a whirl.
UPDATE: That solved the problem. The feature vectors should only be about 10 elements long.
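For anyone else hitting this: the paper used PCA-SIFT descriptors, but a plain PCA projection of standard 128-dim SIFT output down to ~10 dimensions is a simple substitute to try first. The `pca_reduce` helper below is my own sketch, not from any library:

```python
import numpy as np

def pca_reduce(descriptors, k=10):
    # Center the (n, 128) descriptor matrix, then project onto the
    # top-k principal components found via SVD (rows of Vt are the
    # principal directions, ordered by decreasing variance).
    X = np.asarray(descriptors, dtype=float)
    X -= X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T
```

Note the projected values are real and can be negative, so they still need to be shifted and quantized to nonnegative integers before the integer-division binning will work.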
If the bin size is 1/2 * max-size, you must add something to your pyramid. It's the same as when I look up a nearest neighbor in a quadtree and limit the query to 1/2 of the quadtree. But this method is new to me.