Straightening a document using keypoint feature matching + homography (Aadhaar)

Posted on 2025-01-14 21:30:47


Hi, I'm trying to create an OCR pipeline where the model should be able to read an uploaded document. However, a lot of the time the uploaded documents are skewed or tilted. I plan to straighten and/or resize the document based on a template.

To achieve this, I intend to use feature mapping and homography. However, whenever I compute the keypoints and descriptors (using ORB) and try to match them with brute-force matching, none of the features seem to match. Here's the code I've used so far and the results it produces. Can someone point me in the right direction if I'm missing something or doing something incorrectly?

import cv2
import numpy as np

def straighten_image(ORIG_IMG, IMG2):
    # Read both images
    orig_image = cv2.imread(ORIG_IMG)
    img_input = cv2.imread(IMG2)

    orig_gray_scale = cv2.cvtColor(orig_image, cv2.COLOR_BGR2GRAY)
    gray_scale_img = cv2.cvtColor(img_input, cv2.COLOR_BGR2GRAY)

    # Detect ORB features and compute descriptors
    MAX_NUM_FEATURES = 100
    orb = cv2.ORB_create(MAX_NUM_FEATURES)
    keypoints1, descriptors1 = orb.detectAndCompute(orig_gray_scale, None)
    keypoints2, descriptors2 = orb.detectAndCompute(gray_scale_img, None)

    # Display images with keypoints
    orig_with_descriptors = cv2.drawKeypoints(orig_gray_scale, keypoints1, outImage=np.array([]), color=(255, 0, 0), flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
    inp_with_descriptors = cv2.drawKeypoints(img_input, keypoints2, outImage=np.array([]), color=(255, 0, 0), flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)

    # Match features: Hamming distance is the right metric for binary ORB descriptors
    matcher = cv2.DescriptorMatcher_create(cv2.DESCRIPTOR_MATCHER_BRUTEFORCE_HAMMING)
    matches = matcher.match(descriptors1, descriptors2)

    # Sort matches by distance, best first. Recent OpenCV versions return a
    # tuple, so use sorted() instead of list.sort(). Without this sort,
    # the "top 10%" kept below would be an arbitrary subset.
    matches = sorted(matches, key=lambda x: x.distance)

    # Remove not-so-good matches: keep only the best 10%
    numGoodMatches = int(len(matches) * 0.1)
    matches = matches[:numGoodMatches]

    # Draw top matches
    im_matches = cv2.drawMatches(orig_gray_scale, keypoints1, gray_scale_img, keypoints2, matches, None)
    cv2.imshow("", im_matches)
    cv2.waitKey(0)

    # Collect the coordinates of the matched keypoints
    points1 = np.zeros((len(matches), 2), dtype=np.float32)
    points2 = np.zeros((len(matches), 2), dtype=np.float32)
    for i, match in enumerate(matches):
        points1[i, :] = keypoints1[match.queryIdx].pt
        points2[i, :] = keypoints2[match.trainIdx].pt

    # Find the homography; RANSAC rejects outlier correspondences
    h, mask = cv2.findHomography(points2, points1, cv2.RANSAC)

    # Use the homography to warp the input onto the template's frame
    height, width = orig_gray_scale.shape
    inp_reg = cv2.warpPerspective(gray_scale_img, h, (width, height), borderValue=255)

    return inp_reg


template = "template_aadhaar.jpg"
test = "test.jpeg"

str_img = straighten_image(template, test)

cv2.imshow("", str_img)
cv2.waitKey(0)

This is the template image

and the test image that needs to be straightened

Matched features

EDIT: If I use my own ID-card (perfectly straight) as the template and try to align the same ID-card that is tilted, it matches the features and re-aligns the tilted image perfectly. However, I need the model to be able to re-align any other ID-card based on the template. By any ID, I mean the details could be different but the location and font would be exactly the same.

EDIT#2: As suggested by @Olli, I tried using a template containing only the features that are the same for all Aadhaar cards. Image attached. But the feature matching is still somewhat arbitrary.

Template with changing values removed
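One standard way to reduce such arbitrary matches is Lowe's ratio test over knnMatch, which keeps a match only when the best candidate is clearly better than the runner-up. A minimal sketch, reusing the file names from above (the 0.75 ratio and the feature count are conventional defaults to tune, not values from this post):

import cv2

# More features than 100 gives the matcher more candidates to work with
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(cv2.imread("template_aadhaar.jpg", cv2.IMREAD_GRAYSCALE), None)
kp2, des2 = orb.detectAndCompute(cv2.imread("test.jpeg", cv2.IMREAD_GRAYSCALE), None)

# knnMatch returns the two best candidates per query descriptor
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in bf.knnMatch(des1, des2, k=2):
    # Keep the match only if it is clearly better than the second best
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(len(good), "matches survive the ratio test")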


Comments (1)

诗笺 2025-01-21 21:30:47


Feature mapping tries to detect the most significant features in an image and match them between images. This only works if the features really are the same; if they are similar but different, it will fail.

If you have some features that are always the same (e.g. the logo on the top left), you could try to create a template containing only these features, blanked out everywhere else, i.e. remove the person and the name and the QR code and...
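A minimal sketch of that masking step; the rectangle coordinates below are hypothetical placeholders that would have to be measured on the real template:

import cv2

template = cv2.imread("template_aadhaar.jpg")

# Hypothetical bounding boxes (x, y, w, h) of the regions that differ
# between cards: photo, name/details block, QR code. Measure these on
# the actual template before using them.
variable_regions = [(30, 120, 140, 170),   # photo
                    (190, 120, 300, 170),  # name and details
                    (520, 110, 160, 160)]  # QR code

for (x, y, w, h) in variable_regions:
    # Fill each variable region with white so ORB finds no keypoints there
    cv2.rectangle(template, (x, y), (x + w, y + h), (255, 255, 255), thickness=-1)

cv2.imwrite("template_masked.jpg", template)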

But because there are more differences ("Government of India" inside the green area on one image and above it on the other, ...) than similarities, I would try to find the rotation based on the corners and/or the edges of the card shape.
For example (a rough sketch follows the list):

  • convert to grayscale
  • perform Canny edge detection
  • detect the corners, e.g. using cv2.goodFeaturesToTrack. If some corners are hidden, try finding the sides using Hough lines instead.
  • undistort (perspective-correct the card from the four detected corners)
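A rough sketch of this pipeline. It swaps cv2.goodFeaturesToTrack for a largest-contour + approxPolyDP corner search; the Canny thresholds and the output size are assumptions to tune:

import cv2
import numpy as np

img = cv2.imread("test.jpeg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Edge detection; thresholds are guesses to tune per scan quality
edges = cv2.Canny(gray, 50, 150)
edges = cv2.dilate(edges, None)  # close small gaps in the card outline

# Take the largest contour and approximate it with a quadrilateral
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
card = max(contours, key=cv2.contourArea)
quad = cv2.approxPolyDP(card, 0.02 * cv2.arcLength(card, True), True)

if len(quad) == 4:
    # Order the corners: top-left, top-right, bottom-right, bottom-left
    pts = quad.reshape(4, 2).astype(np.float32)
    s = pts.sum(axis=1)            # smallest at top-left, largest at bottom-right
    d = np.diff(pts, axis=1).ravel()  # y - x: smallest at top-right, largest at bottom-left
    src = np.float32([pts[np.argmin(s)], pts[np.argmin(d)],
                      pts[np.argmax(s)], pts[np.argmax(d)]])

    # Output size roughly matching an Aadhaar card's aspect ratio (assumption)
    W, H = 856, 540
    dst = np.float32([[0, 0], [W - 1, 0], [W - 1, H - 1], [0, H - 1]])
    M = cv2.getPerspectiveTransform(src, dst)
    straightened = cv2.warpPerspective(img, M, (W, H))
    cv2.imwrite("straightened.jpg", straightened)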

If some images are rotated 90, 180 or 270 degrees after undistortion, you could use a filter to find the orange and green areas and rotate so that this area is at the top again.
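A minimal sketch of that orientation check, assuming the orange header band can be isolated with an HSV range (the range below is a guess to calibrate on real scans):

import cv2
import numpy as np

card = cv2.imread("straightened.jpg")  # output of the unwarping step above
hsv = cv2.cvtColor(card, cv2.COLOR_BGR2HSV)

# Rough HSV range for the orange band; calibrate before relying on it
mask = cv2.inRange(hsv, (5, 100, 100), (25, 255, 255))

h, w = mask.shape
ys, xs = np.nonzero(mask)
if len(ys) > 0:
    cy, cx = ys.mean(), xs.mean()
    # Rotate in 90-degree steps until the band's centroid is nearest the top edge
    if (h - 1 - cy) <= min(cx, w - 1 - cx, cy):      # nearest the bottom edge
        card = cv2.rotate(card, cv2.ROTATE_180)
    elif cx <= min(cy, h - 1 - cy, w - 1 - cx):      # nearest the left edge
        card = cv2.rotate(card, cv2.ROTATE_90_CLOCKWISE)
    elif (w - 1 - cx) <= min(cy, h - 1 - cy):        # nearest the right edge
        card = cv2.rotate(card, cv2.ROTATE_90_COUNTERCLOCKWISE)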
