Finding duplicates among many images using SIFT

Published 2025-02-01 09:12:21

I have a source image and thousands of context images. I want to know whether the source image is a cropped, blurred, or rotated version of any of the destination images. I am using a SIFT-based image matching algorithm to identify duplicate images. It takes two images and reports whether one is a cropped, blurred, or rotated version of the other, like this:

import cv2
import numpy as np

def get_sift_results(img1_path, img2_path):
    MIN_MATCH_COUNT = 1000

    img1 = cv2.imread(img1_path, 0)  # queryImage, loaded as grayscale
    img2 = cv2.imread(img2_path, 0)  # trainImage, loaded as grayscale

    # Initiate SIFT detector (in OpenCV >= 4.4 this is cv2.SIFT_create();
    # older contrib builds use cv2.xfeatures2d.SIFT_create())
    sift = cv2.xfeatures2d.SIFT_create()

    # find the keypoints and descriptors with SIFT
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    FLANN_INDEX_KDTREE = 1  # the KD-tree enum value is 1; 0 selects linear search
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)

    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)

    # store all the good matches as per Lowe's ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)

    if len(good) > MIN_MATCH_COUNT:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        # robustly estimate the homography mapping img1 onto img2
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()  # inlier mask, only needed for drawing

        # project img1's corners into img2 and outline the matched region
        h, w = img1.shape
        pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, M)
        img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)

        print("Duplicate Images")
        return True
    else:
        print("Not enough matches are found - %d/%d" % (len(good), MIN_MATCH_COUNT))
        return False
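The core of the comparison is Lowe's ratio test: a query descriptor is trusted only if its nearest train descriptor is markedly closer than the second-nearest. That idea can be illustrated on synthetic descriptors with plain NumPy (an illustrative sketch, not OpenCV's implementation; the 8-D descriptors and the `ratio_test` helper are invented for the demo):

```python
import numpy as np

def ratio_test(query_desc, train_desc, ratio=0.7):
    """Brute-force Lowe's ratio test: keep a match only if the nearest
    train descriptor is clearly closer than the second-nearest."""
    good = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(train_desc - q, axis=1)  # distances to every train descriptor
        j1, j2 = np.argsort(d)[:2]                  # two nearest neighbours
        if d[j1] < ratio * d[j2]:
            good.append((i, j1))
    return good

# Synthetic 8-D descriptors: the two queries are near-copies of train[0] and train[1],
# so only those two pairs survive the ratio test.
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 8)).astype(np.float32)
query = train[:2] + 0.01
matches = ratio_test(query, train)
print(matches)
```

A true duplicate pair produces many such surviving matches, which is what the `MIN_MATCH_COUNT` threshold above counts.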

The solution works fine and returns a result in 5-6 seconds per pair of images. How can I optimize it to find duplicates among thousands of images? I have tried storing the descriptors for each image and passing them to FLANN, but that also takes a lot of time and space. Any help would be highly appreciated. Thanks!
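One standard direction for the many-images case is to compute each image's descriptors once, cache them, and match every query descriptor against a single pooled index of all train descriptors, voting per image, instead of running a full pairwise match. With OpenCV's matcher this corresponds to `flann.add(...)` / `flann.train()` followed by one `knnMatch` call against the trained index. Below is a NumPy-only toy of the pooling-and-voting idea (the random 8-D descriptors stand in for real SIFT output, which in practice you would cache on disk, e.g. with `np.savez`):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend descriptors for 3 "images" (in practice from sift.detectAndCompute).
image_descs = [rng.normal(size=(n, 8)).astype(np.float32) for n in (30, 40, 50)]

# Pool everything into one matrix plus a parallel image-id array, so each
# query descriptor is matched once against all images simultaneously.
pooled = np.vstack(image_descs)
image_ids = np.concatenate([np.full(len(d), i) for i, d in enumerate(image_descs)])

# Query: a noisy "crop" built from part of image 2's descriptors.
query = image_descs[2][:20] + 0.01

votes = np.zeros(len(image_descs), dtype=int)
for q in query:
    d = np.linalg.norm(pooled - q, axis=1)
    j1, j2 = np.argsort(d)[:2]
    if d[j1] < 0.7 * d[j2]:            # Lowe's ratio test, as in the question
        votes[image_ids[j1]] += 1      # credit the image the neighbour came from

print(votes.argmax(), votes)           # image 2 collects the votes
```

The image with the most votes is then the only candidate that needs the expensive homography check, so the per-pair RANSAC step runs once rather than thousands of times.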
