Grauman and Darrell's Pyramid Match Kernel
I am a student. My assignment is to pick a computer vision paper (from a provided list) and implement its algorithm. I chose Grauman and Darrell's "The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features" (IEEE, 2005).
I coded the thing up, but it is not matching images well. In fact, I can't even see conceptually how it could work at all, given that it matches using sets of feature descriptors.
The technique, as I understand it, is to build a pyramid of histograms for each of the two feature sets and then compute the (weighted) intersection of the corresponding histograms. The bin size is 1 at the first level and doubles at every subsequent level of the pyramid. The process stops when bin size >= max_element_in_feature_sets: if the bin size were any bigger, the integer division descriptor_value / bin_size would always return zero and everything would land in a single bin.
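To make sure I'm describing the same procedure I implemented, here is a minimal sketch of my understanding of the kernel. The names `histogram` and `pyramid_match` are my own, not from any library, and the weighting scheme is my reading of the paper, so treat this as an illustration rather than a reference implementation:

```python
from collections import Counter

def histogram(descriptors, bin_size):
    # Quantize each d-dimensional descriptor to a grid cell via integer
    # division, then count how many descriptors share each cell.
    cells = [tuple(v // bin_size for v in d) for d in descriptors]
    return Counter(cells)

def pyramid_match(X, Y, levels):
    # X and Y are lists of integer-valued descriptor vectors.
    score = 0.0
    prev = 0.0
    for i in range(levels):
        bin_size = 2 ** i
        hx, hy = histogram(X, bin_size), histogram(Y, bin_size)
        # Histogram intersection: matches found at this resolution.
        # (Counter returns 0 for cells absent from hy.)
        inter = sum(min(n, hy[cell]) for cell, n in hx.items())
        # Only matches that are *new* at this level count, weighted by
        # 1 / bin_size so coarser matches contribute less.
        score += (inter - prev) / bin_size
        prev = inter
    return score
```

With low-dimensional descriptors this behaves as intended: close pairs match at fine levels and distant pairs only at coarse ones. The problem described below is what happens when the descriptors are 128-dimensional.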
So here is where it falls apart for me. Imagine that bin size = 1/2 * max_element, so that every element of every feature goes into bin 0 or bin 1. With a feature vector of length 128, there are still 2^128 possible bins. What is the chance of two features landing in the same bin?
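A quick back-of-the-envelope check of that bin-count argument: with two bins per dimension, a d-dimensional descriptor can land in 2^d distinct cells, so two independent, uniformly random descriptors collide with probability 2^-d:

```python
# Number of cells and collision probability for a grid with two bins
# per dimension, at d = 10 and d = 128 dimensions.
for d in (10, 128):
    cells = 2 ** d
    p_collide = 2.0 ** -d
    print(f"d={d}: {cells} cells, collision probability {p_collide:.3e}")
```

At d = 128 the collision probability for random data is about 3e-39, which is why every descriptor ends up alone in its own bin.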
The answer depends, of course. If the features were random noise, the probability would be vanishingly low. The paper must tacitly assume that similar images produce similar features, but I'm not seeing that in my test runs. For example, I took a small grayscale image and blurred it with a 5x5 Gaussian kernel, then compared it to the original image.
Here is the output (explanation follows below; scroll down):
file named art487.jpg extracted features= 50
file named art487_blur.jpg extracted features= 7
Min value= 0, Max val= 164 (of any element)
levels in the pyramid= 8
SUMMARY OF PYRAMID art487.jpg
level= 0, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 1, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 2, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 3, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 4, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 5, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 6, # bins= 50, bins= 1, 1, 1,..., 1, count = 50
level= 7, # bins= 21, bins= 29, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, count = 50
SUMMARY OF PYRAMID art487_blur.jpg
level= 0, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 1, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 2, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 3, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 4, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 5, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 6, # bins= 7, bins= 1, 1, 1, 1, 1, 1, 1, count = 7
level= 7, # bins= 4, bins= 4, 1, 1, 1, count = 7
raw score= 0
normalized score= 0
The list of numbers after "bins=" shows how many features fell into each particular bin. The results are exactly what I would expect from 128 dimensions: every feature gets its own bin, except at the coarsest level, where several zero vectors are grouped together. This produces a similarity score of 0, again just as I would expect.
I don't know how to make this pyramid match kernel useful.
The paper says good results were achieved using SIFT features, but nothing in the paper helps me understand how that was possible.
What is going wrong? Am I supposed to bin pixel intensities rather than feature descriptors?
Comments (2)
I spoke with a classmate. It turns out the authors used a SIFT feature detector but with 10-dimensional PCA-SIFT descriptors. I'll give that a whirl.
UPDATE: That solved the problem. The feature vectors should only be about 10 elements long.
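For anyone else hitting this: the paper used PCA-SIFT descriptors, but a plain PCA projection of standard 128-dim SIFT output down to ~10 dimensions is a simple substitute to try first. The `pca_reduce` helper below is my own sketch, not from any library:

```python
import numpy as np

def pca_reduce(descriptors, k=10):
    # Center the (n, 128) descriptor matrix, then project onto the
    # top-k principal components found via SVD (rows of Vt are the
    # principal directions, ordered by decreasing variance).
    X = np.asarray(descriptors, dtype=float)
    X -= X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T
```

Note the projected values are real and can be negative, so they still need to be shifted and quantized to nonnegative integers before the integer-division binning will work.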
If the bin size is 1/2 * max-size, you must add something to your pyramid. It's the same as when I look up a nearest neighbor in a quadtree and limit the query to 1/2 of the quadtree. But this method is new to me.