I am trying to make a program that will find similar images from a dataset of images. The steps are:
- Extract SURF descriptors for all images
- Store the descriptors
- Apply kNN on the stored descriptors
- Match the stored descriptors to the query image's descriptors using kNN
Now, each image's SURF descriptors will be stored as a hierarchical k-means tree. Do I store each tree as a separate file, or is it possible to build some sort of single tree containing all the images' descriptors, updated as images are added to the dataset?
This is the paper I am basing the program on.
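The kNN matching step in the list above could look something like this. This is a minimal NumPy sketch using brute-force matching plus Lowe's ratio test; the descriptor arrays are placeholders, and in practice you would get them from a SURF (or other) feature extractor:

```python
import numpy as np

def knn_match(query_desc, stored_desc, k=2):
    """Brute-force kNN between two descriptor sets (rows = descriptors)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_stored)
    d2 = ((query_desc[:, None, :] - stored_desc[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]          # indices of k nearest stored descriptors
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist

def good_matches(idx, dist, ratio=0.75):
    """Lowe's ratio test: keep matches whose best distance is clearly
    smaller than the second-best distance."""
    keep = dist[:, 0] < ratio * dist[:, 1]
    return idx[keep, 0], np.nonzero(keep)[0]
```

For realistic dataset sizes you would replace the brute-force distance matrix with an approximate index (e.g. FLANN), but the interface stays the same: query descriptors in, matched stored-descriptor indices out.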
Are you quite sure you want to do it with SURF descriptors? I'm working on a similar application myself; it is based on this paper (Nister, Stewenius), and they swear SIFT descriptors are the way to go. But I guess you could do it with any other descriptors as well.
Looking at the paper you referenced, it is newer than the work I linked to, but it does not reference either the Nister paper or this work (Sivic, Zisserman), which are, to my knowledge, the foundational works for all content-based image retrieval problems.
For a better understanding of the problem, before I started implementing it, I first read Sivic, Zisserman to get the general idea behind the system. They apply only a simple clustering after extracting all the SIFT descriptors from all the features. They use two different types of features for better accuracy: Shape Adapted (centered on corner-like features) and Maximally Stable (corresponding to blobs of high contrast; you can look them up in this paper (Matas et al.)). The scalability of their system is not all that good because of the direct storage of each feature, but they introduced the concept of inverted files, a technique from text analytics (you can read about its basics here), which simplifies the look-up process significantly.
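As a toy illustration of the inverted-file idea (the names here are made up for the example): each visual word maps to the set of images that contain it, so a query only has to touch images sharing at least one word with it instead of scanning the whole database.

```python
from collections import defaultdict

def build_inverted_index(image_words):
    """Map each visual-word id -> set of image ids containing it.

    image_words: dict of image_id -> iterable of visual-word ids.
    """
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def candidate_images(index, query_words):
    """Return only the images that share at least one visual word
    with the query; everything else can be skipped entirely."""
    hits = set()
    for w in query_words:
        hits |= index.get(w, set())
    return hits
```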
After conquering that work, I recommend going on to Nister, Stewenius, where they introduce the concept of hierarchical k-means clustering in L levels for storing the features, as well as for the later search of the image database. Now, unless I am very, very much mistaken, you don't store each descriptor as a separate tree. Instead, you build one tree from all the existing features (where the cluster centers at each level are actually the representative, "central" features of each cluster). Once the tree is constructed to the required depth (they recommend 10 clusters over 6 levels), the cluster centers at the last level each represent a very small number of features, so you can actually forget all the original features (or at least their descriptors). Each original feature can be represented by its corresponding cluster center, and instead of descriptors, for each image you only need to store the information about which cluster centers (features) it contains. This is much easier, since you only need to store one or two integers per feature, encoding its path through the tree. The simplest way to see it: just encode the number of the cluster the feature belongs to at each level. There are 10 clusters per level (4 bits), and 6 levels, so 4*6 < 32 bits and the whole path fits into a single integer. You can of course implement the actual coding any way you see fit. Oh, and they also use SIFT descriptors on MSER regions.
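The path-through-the-tree encoding described above can be sketched like this. The constants follow the 10-clusters/6-levels recommendation, and this is just one possible bit packing, not the paper's exact scheme:

```python
BITS_PER_LEVEL = 4   # 10 clusters per level fit in 4 bits
NUM_LEVELS = 6       # 4 * 6 = 24 bits, so a path fits in one integer

def encode_path(path):
    """Pack the cluster index chosen at each level into one integer.
    path[0] is the cluster index at the top level."""
    code = 0
    for cluster in path:
        code = (code << BITS_PER_LEVEL) | cluster
    return code

def decode_path(code, num_levels=NUM_LEVELS):
    """Recover the per-level cluster indices from the packed integer."""
    path = []
    for _ in range(num_levels):
        path.append(code & ((1 << BITS_PER_LEVEL) - 1))
        code >>= BITS_PER_LEVEL
    return path[::-1]
```

Per image you then store just these integers (one per feature) instead of full descriptors.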
Also, if the images you use for building the vocabulary tree are representative (e.g. you're working on a dataset of open-space pictures, you build the tree from just a representative portion of the images, and you know there are no pictures of industrial plant workplaces in the rest of the dataset), you can add new pictures very, very fast. The only thing you need to do when adding a new picture to the dataset is determine which of the calculated cluster centers best represent the image's features (as mentioned before, the last level of cluster centers is quite precise) and store those cluster centers (the aforementioned integers). Looking up the cluster centers should be really fast: there are only 10 comparisons at each of the 6 levels.
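Adding a new image then reduces to descending the tree once per feature. A rough sketch, assuming the tree is stored as nested dicts of NumPy center arrays (a made-up layout for illustration, not the paper's data structure):

```python
import numpy as np

def quantize_feature(descriptor, tree):
    """Greedily descend the vocabulary tree: at each level, pick the
    nearest of the ~10 cluster centers and recurse into its subtree.
    Returns the per-level path of chosen cluster indices."""
    path = []
    node = tree
    while node is not None:
        d2 = ((node["centers"] - descriptor) ** 2).sum(axis=1)
        best = int(np.argmin(d2))
        path.append(best)
        children = node.get("children")
        node = children[best] if children else None
    return path
```

With 10 centers per level and 6 levels, each feature costs only about 60 distance computations, which is what makes adding new images cheap.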
Hope this is actually useful to somebody someday, since the question is just over a year old. :)
Use a KD-tree instead. You will be able to build the hierarchical K-dimensional tree; you just need to figure out what sort of information is sent down the tree to be stored. You can save the images' vectors/descriptors onto disk and load the KD-tree every time you start up your program. Newly computed vectors/descriptors can be sent to both the tree and the disk.
Hope this helps
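A minimal sketch of that workflow, assuming SciPy's `cKDTree` is acceptable and using a hypothetical `descriptors.npy` file for persistence (most KD-tree implementations, including this one, don't support in-place insertion, so adding descriptors rebuilds the tree):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical file used to persist descriptors between runs.
DESC_FILE = "descriptors.npy"

def save_descriptors(descriptors, path=DESC_FILE):
    """Persist the descriptor matrix (rows = descriptors) to disk."""
    np.save(path, descriptors)

def load_tree(path=DESC_FILE):
    """Rebuild the KD-tree from the descriptors stored on disk at start-up."""
    descriptors = np.load(path)
    return cKDTree(descriptors), descriptors

def add_descriptors(descriptors, new_desc, path=DESC_FILE):
    """Append newly computed descriptors, persist them, and rebuild the tree."""
    descriptors = np.vstack([descriptors, new_desc])
    np.save(path, descriptors)
    return cKDTree(descriptors), descriptors
```

Querying is then `tree.query(query_descriptor, k=...)`, which returns the distances and row indices of the nearest stored descriptors.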