Detecting whether an object in one image is present in another image with OpenCV


I have a sample image which contains an object, such as the earrings in the following image:

https://i.sstatic.net/N5w9a.jpg

I then have a large candidate set of images for which I need to determine which one most likely contains the object, e.g.:

https://i.sstatic.net/xYL90.jpg

So I need to produce a score for each image, where the highest score corresponds to the image which most likely contains the target object. Now, in this case, I have the following conditions/constraints to work with/around:

1) I can obtain multiple sample images at different angles.

2) The sample images are likely to be at different resolutions, angles, and distances than the candidate images.

3) There are a LOT of candidate images (> 10,000), so it must be reasonably fast.

4) I'm willing to sacrifice some precision for speed, so if it means we have to search through the top 100 instead of just the top 10, that's fine and can be done manually.

5) I can manipulate the sample images manually, such as outlining the object that I wish to detect; the candidate images cannot be manipulated manually as there are too many.

6) I have no real background in OpenCV or computer vision at all, so I'm starting from scratch here.

My initial thought is to start by drawing a rough outline around the object in the sample image. Then, I could identify corners in the object and corners in the candidate image. I could profile the pixels around each corner to see if they look similar and then rank by the sum of the maximum similarity scores of every corner. I'm also not sure how to quantify similar pixels. I guess just the Euclidean distance of their RGB values?
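In rough Python, I imagine something like the following (the file names, patch size, and corner-detector settings are just placeholders):

import cv2
import numpy as np

def corner_patches(img, max_corners=50, r=4):
    # detect corners, then cut a (2r+1)x(2r+1) color patch around each one
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, max_corners, 0.01, 10)
    patches = []
    for x, y in corners.reshape(-1, 2).astype(int):
        p = img[y - r:y + r + 1, x - r:x + r + 1]
        if p.shape[:2] == (2 * r + 1, 2 * r + 1):  # skip corners on the border
            patches.append(p.astype(np.float32))
    return patches

def similarity_score(sample_patches, candidate_patches):
    # for each sample corner, take its best match among the candidate's
    # corners (smallest Euclidean RGB distance), and sum those
    return -sum(min(np.linalg.norm(sp - cp) for cp in candidate_patches)
                for sp in sample_patches)

sample = cv2.imread('sample.jpg')
candidate = cv2.imread('candidate.jpg')
print(similarity_score(corner_patches(sample), corner_patches(candidate)))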

The problem there is that it kind of ignores the center of the object. In the above examples, if the corners of the earrings are all near the gold frame, then it would not consider the red, green, and blue stones inside the earring. I suppose I could improve this by then looking at all pairs of corners and determining similarity by sampling some points along the line between them.

So I have a few questions:

A) Does this line of thinking make sense in general or is there something I'm missing?

B) Which specific algorithms from OpenCV should I investigate using? I'm aware that there are multiple corner detection algorithms, but I only need one and if the differences are all optimizing on the margins then I'm fine with the fastest.

C) Any example code using the algorithms that would be helpful to aid in my understanding?

My options for languages are either Python or C#.


4 Answers

∞琼窗梦回ˉ 2024-12-18 09:56:46


Fortunately, the kind guys from OpenCV just did that for you. Check in your samples folder "opencv\samples\cpp\matching_to_many_images.cpp". Compile and give it a try with the default images.

The algorithm can be easily adapted to make it faster or more precise.

Mainly, object recognition algorithms are split into two parts: keypoint detection & description, and object matching. For both of them there are many algorithms/variants, which you can play with directly in OpenCV.

Detection/description can be done by: SIFT/SURF/ORB/GFTT/STAR/FAST and others.

For matching you have: brute force, Hamming distance, etc. (Some matchers are specific to a given detection algorithm.)

HINTS to start:

  • Crop your original image so the object of interest covers as much of the image area as possible, and use that as your training image.

  • SIFT is the most accurate descriptor, and also the slowest. FAST is a good combination of speed and accuracy. GFTT is old and quite unreliable. ORB is newly added to OpenCV and is very promising in both speed and accuracy.

  • The results depend on the pose of the object in the other image. If it is resized, rotated, squeezed, partly covered, etc., try SIFT. If it is a simple task (i.e. it appears at almost the same size/rotation/etc.), most of the descriptors will cope well.
  • ORB may not be in the OpenCV release yet. Try downloading the latest from the OpenCV trunk and compiling it: https://code.ros.org/svn/opencv/trunk

So, you can find the best combination for you by trial and error.
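For illustration, here is a minimal sketch of swapping detectors and matchers. It is written against a newer OpenCV build, so the constructor names differ from the 2.4-era API discussed here, and the file names are placeholders:

import cv2

detector = cv2.ORB_create(nfeatures=1000)  # swap in cv2.SIFT_create() to compare
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)  # use cv2.NORM_L2 for SIFT/SURF

img1 = cv2.imread('training.jpg', cv2.IMREAD_GRAYSCALE)   # cropped sample
img2 = cv2.imread('candidate.jpg', cv2.IMREAD_GRAYSCALE)

kp1, desc1 = detector.detectAndCompute(img1, None)
kp2, desc2 = detector.detectAndCompute(img2, None)

matches = matcher.knnMatch(desc1, desc2, k=2)
good = [m[0] for m in matches
        if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
print(len(good), 'ratio-test matches')  # usable as a crude per-candidate score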

For the details of every implementation, you should read the original papers/tutorials. Google Scholar is a good start.

a√萤火虫的光℡ 2024-12-18 09:56:46


Check out the SURF features, which are part of OpenCV. The idea here is that you have an algorithm for finding "interest points" in two images. You also have an algorithm for computing a descriptor of an image patch around each interest point. Typically this descriptor captures the distribution of edge orientations in the patch. You then try to find point correspondences, i.e. for each interest point in image A you try to find a corresponding interest point in image B. This is accomplished by comparing the descriptors and looking for the closest matches. Then, if you have a set of correspondences that are related by some geometric transformation, you have a detection.
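To make that last verification step concrete, here is a minimal sketch of scoring a candidate by RANSAC inliers (the function name and the 5.0-pixel threshold are my own illustrations):

import cv2

def inlier_score(pts_a, pts_b):
    # pts_a, pts_b: matched keypoint coordinates as Nx2 float32 arrays
    if len(pts_a) < 4:  # a homography needs at least 4 correspondences
        return 0
    H, mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)
    return 0 if mask is None else int(mask.sum())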

Of course, this is a very high-level explanation. The devil is in the details, and for those you should read some papers. Start with "Distinctive Image Features from Scale-Invariant Keypoints" by David Lowe, and then read the papers on SURF.

Also, consider moving this question to Signal and Image Processing Stack Exchange.

杯别 2024-12-18 09:56:46


In case someone comes along in the future, here's a small sample doing this with OpenCV. It's based on the OpenCV sample, but in my opinion this is a bit clearer, so I'm including it as well.

Tested with OpenCV 2.4.4.

#!/usr/bin/env python

'''
Uses SURF to match two images.
  Finds common features between two images and draws them

Based on the sample code from opencv:
  samples/python2/find_obj.py

USAGE
  find_obj.py <image1> <image2>
'''

from __future__ import print_function  # keeps the prints Python 2/3 compatible

import sys

import numpy
import cv2


###############################################################################
# Image Matching
###############################################################################

def match_images(img1, img2, img1_features=None, img2_features=None):
    """Given two images, returns the matches"""
    detector = cv2.SURF(3200)
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    if img1_features is None:
        kp1, desc1 = detector.detectAndCompute(img1, None)
    else:
        kp1, desc1 = img1_features

    if img2_features is None:
        kp2, desc2 = detector.detectAndCompute(img2, None)
    else:
        kp2, desc2 = img2_features

    # print('img1 - %d features, img2 - %d features' % (len(kp1), len(kp2)))

    raw_matches = matcher.knnMatch(desc1, trainDescriptors=desc2, k=2)
    kp_pairs = filter_matches(kp1, kp2, raw_matches)
    return kp_pairs


def filter_matches(kp1, kp2, matches, ratio=0.75):
    """Filters features that are common to both images"""
    mkp1, mkp2 = [], []
    for m in matches:
        if len(m) == 2 and m[0].distance < m[1].distance * ratio:
            m = m[0]
            mkp1.append(kp1[m.queryIdx])
            mkp2.append(kp2[m.trainIdx])
    kp_pairs = list(zip(mkp1, mkp2))  # list() so len() also works on Python 3
    return kp_pairs


###############################################################################
# Match Displaying
###############################################################################

def draw_matches(window_name, kp_pairs, img1, img2):
    """Draws the matches"""
    mkp1, mkp2 = zip(*kp_pairs)

    H = None
    status = None

    if len(kp_pairs) >= 4:
        p1 = numpy.float32([kp.pt for kp in mkp1])
        p2 = numpy.float32([kp.pt for kp in mkp2])
        H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)

    if len(kp_pairs):
        explore_match(window_name, img1, img2, kp_pairs, status, H)


def explore_match(win, img1, img2, kp_pairs, status=None, H=None):
    """Draws lines between the matched features"""
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]
    vis = numpy.zeros((max(h1, h2), w1 + w2), numpy.uint8)
    vis[:h1, :w1] = img1
    vis[:h2, w1:w1 + w2] = img2
    vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)

    if H is not None:
        corners = numpy.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]])
        reshaped = cv2.perspectiveTransform(corners.reshape(1, -1, 2), H)
        reshaped = reshaped.reshape(-1, 2)
        corners = numpy.int32(reshaped + (w1, 0))
        cv2.polylines(vis, [corners], True, (255, 255, 255))

    if status is None:
        status = numpy.ones(len(kp_pairs), numpy.bool_)
    p1 = numpy.int32([kpp[0].pt for kpp in kp_pairs])
    p2 = numpy.int32([kpp[1].pt for kpp in kp_pairs]) + (w1, 0)

    green = (0, 255, 0)
    red = (0, 0, 255)
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            col = green
            cv2.circle(vis, (x1, y1), 2, col, -1)
            cv2.circle(vis, (x2, y2), 2, col, -1)
        else:
            col = red
            r = 2
            thickness = 3
            cv2.line(vis, (x1 - r, y1 - r), (x1 + r, y1 + r), col, thickness)
            cv2.line(vis, (x1 - r, y1 + r), (x1 + r, y1 - r), col, thickness)
            cv2.line(vis, (x2 - r, y2 - r), (x2 + r, y2 + r), col, thickness)
            cv2.line(vis, (x2 - r, y2 + r), (x2 + r, y2 - r), col, thickness)
    for (x1, y1), (x2, y2), inlier in zip(p1, p2, status):
        if inlier:
            cv2.line(vis, (x1, y1), (x2, y2), green)

    cv2.imshow(win, vis)

###############################################################################
# Test Main
###############################################################################

if __name__ == '__main__':
    if len(sys.argv) < 3:
        print("No filenames specified")
        print("USAGE: find_obj.py <image1> <image2>")
        sys.exit(1)

    fn1 = sys.argv[1]
    fn2 = sys.argv[2]

    img1 = cv2.imread(fn1, 0)
    img2 = cv2.imread(fn2, 0)

    if img1 is None:
        print('Failed to load fn1:', fn1)
        sys.exit(1)

    if img2 is None:
        print('Failed to load fn2:', fn2)
        sys.exit(1)

    kp_pairs = match_images(img1, img2)

    if kp_pairs:
        draw_matches('find_obj', kp_pairs, img1, img2)
    else:
        print("No matches found")

    cv2.waitKey()
    cv2.destroyAllWindows()
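A note for anyone trying this on a newer OpenCV: cv2.SURF no longer exists under that name. The sketch below shows the usual replacements, assuming a modern build (the SURF variant needs an opencv-contrib build with the nonfree modules enabled):

import cv2

# SURF moved to the contrib package in OpenCV 3.x (assumption: your build
# ships the nonfree modules):
detector = cv2.xfeatures2d.SURF_create(hessianThreshold=3200)

# ORB ships with every build and is a patent-free alternative; its binary
# descriptors pair with a Hamming matcher rather than NORM_L2:
detector = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)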
橘虞初梦 2024-12-18 09:56:46


As said, algorithms like SIFT and SURF consist of a feature point detector, which is invariant to a number of distortions, and a descriptor, which aims to robustly model the feature point and its surroundings.

The latter is increasingly used for image categorization and identification in what is commonly known as the "bag of words" or "visual words" approach.

In its simplest form, one can collect all the descriptors from all images and cluster them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters. The centroids of those clusters, i.e. the visual words, can be used as a new descriptor for the image. These can then be used in an architecture with an inverted-file design.
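A hedged sketch of that clustering step with OpenCV's k-means; the names are illustrative, and all_desc is assumed to stack the float32 descriptors of every image:

import numpy as np
import cv2

def build_vocabulary(all_desc, k=100):
    # cluster all descriptors; the k cluster centers are the "visual words"
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, _, centers = cv2.kmeans(all_desc, k, None, criteria, 5,
                               cv2.KMEANS_RANDOM_CENTERS)
    return centers

def bow_histogram(desc, vocab):
    # assign each descriptor to its nearest visual word, then build a
    # normalized histogram: the image's new, fixed-length descriptor
    dists = np.linalg.norm(desc[:, None, :] - vocab[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(vocab))
    return hist.astype(np.float32) / max(hist.sum(), 1)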

This approach allows for soft matching and for a certain amount of generalization, e.g. retrieving all images with airplanes.

  • The VLFeat website contains, next to an excellent SIFT library, a nice demo of this approach classifying the Caltech 101 dataset.

  • Caltech itself offers Matlab/C++ software together with relevant publications.

  • The work by LEAR is also a good start.
