Finding duplicates among many images using SIFT

Published 2025-02-01 09:12:21

I have a source image and thousands of context images. I want to know whether the source image is a cropped, blurred, or rotated version of any of the destination images. I am using a SIFT-based image matching algorithm to identify duplicate images. It takes two images and reports whether one is a cropped, blurred, or rotated version of the other, like this:

import cv2
import numpy as np

def get_sift_results(img1_path, img2_path):
    MIN_MATCH_COUNT = 1000

    img1 = cv2.imread(img1_path, 0)  # queryImage, loaded as grayscale
    img2 = cv2.imread(img2_path, 0)  # trainImage, loaded as grayscale

    # Initiate SIFT detector (in OpenCV >= 4.4 this is cv2.SIFT_create();
    # older contrib builds use cv2.xfeatures2d.SIFT_create())
    sift = cv2.xfeatures2d.SIFT_create()

    # find the keypoints and descriptors with SIFT
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    FLANN_INDEX_KDTREE = 1  # the KD-tree enum value is 1; 0 selects linear search
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)

    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)

    # store all the good matches as per Lowe's ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)

    if len(good) > MIN_MATCH_COUNT:
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        # robustly estimate the homography mapping img1 onto img2
        M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        matchesMask = mask.ravel().tolist()  # inlier mask, only needed for drawing

        # project img1's corners into img2 and outline the matched region
        h, w = img1.shape
        pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, M)
        img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)

        print("Duplicate Images")
        return True
    else:
        print("Not enough matches are found - %d/%d" % (len(good), MIN_MATCH_COUNT))
        return False
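The core of the comparison is Lowe's ratio test: a query descriptor is trusted only if its nearest train descriptor is markedly closer than the second-nearest. That idea can be illustrated on synthetic descriptors with plain NumPy (an illustrative sketch, not OpenCV's implementation; the 8-D descriptors and the `ratio_test` helper are invented for the demo):

```python
import numpy as np

def ratio_test(query_desc, train_desc, ratio=0.7):
    """Brute-force Lowe's ratio test: keep a match only if the nearest
    train descriptor is clearly closer than the second-nearest."""
    good = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(train_desc - q, axis=1)  # distances to every train descriptor
        j1, j2 = np.argsort(d)[:2]                  # two nearest neighbours
        if d[j1] < ratio * d[j2]:
            good.append((i, j1))
    return good

# Synthetic 8-D descriptors: the two queries are near-copies of train[0] and train[1],
# so only those two pairs survive the ratio test.
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 8)).astype(np.float32)
query = train[:2] + 0.01
matches = ratio_test(query, train)
print(matches)
```

A true duplicate pair produces many such surviving matches, which is what the `MIN_MATCH_COUNT` threshold above counts.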

The solution works fine and returns a result in 5-6 seconds per pair of images. How can I optimize it to find duplicates among thousands of images? I have tried storing the descriptors for each image and passing them to FLANN, but that also takes a lot of time and space. Any help would be highly appreciated. Thanks!
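One standard direction for the many-images case is to compute each image's descriptors once, cache them, and match every query descriptor against a single pooled index of all train descriptors, voting per image, instead of running a full pairwise match. With OpenCV's matcher this corresponds to `flann.add(...)` / `flann.train()` followed by one `knnMatch` call against the trained index. Below is a NumPy-only toy of the pooling-and-voting idea (the random 8-D descriptors stand in for real SIFT output, which in practice you would cache on disk, e.g. with `np.savez`):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend descriptors for 3 "images" (in practice from sift.detectAndCompute).
image_descs = [rng.normal(size=(n, 8)).astype(np.float32) for n in (30, 40, 50)]

# Pool everything into one matrix plus a parallel image-id array, so each
# query descriptor is matched once against all images simultaneously.
pooled = np.vstack(image_descs)
image_ids = np.concatenate([np.full(len(d), i) for i, d in enumerate(image_descs)])

# Query: a noisy "crop" built from part of image 2's descriptors.
query = image_descs[2][:20] + 0.01

votes = np.zeros(len(image_descs), dtype=int)
for q in query:
    d = np.linalg.norm(pooled - q, axis=1)
    j1, j2 = np.argsort(d)[:2]
    if d[j1] < 0.7 * d[j2]:            # Lowe's ratio test, as in the question
        votes[image_ids[j1]] += 1      # credit the image the neighbour came from

print(votes.argmax(), votes)           # image 2 collects the votes
```

The image with the most votes is then the only candidate that needs the expensive homography check, so the per-pair RANSAC step runs once rather than thousands of times.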
