OpenCV: snap points to a rectangle of specific dimensions

Posted on 2025-01-28 23:59:01

I am attempting to detect an image of a certain type on a page of degraded quality that has rotational and translational variance. I need to crop the detected image out of the page, so I need the rotation and coordinates of the detected image. An example would be an image that has been photocopied onto an A4 page.

I am using SIFT to detect objects on the scanned page. These images can be rotated and translated but are not sheared and have no perspective distortion. I am using the classic (SIFT, SURF, ORB, etc.) approach, but it assumes a perspective transform in order to create the 4 points of the bounding polygon. The issue is that the key points don't line up perfectly (due to varying image quality), so the projection assumes spatial distortion and the polygon is, rightfully, distorted.

The approach I want to try is to "snap" the detected polygon points to the dimensions/area of the input image. This should allow me to determine the angle of rotation and translation of the image on the page.
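One way to make the "snap" idea concrete is a least-squares rigid fit (rotation plus translation only, no scale, shear or perspective) between the matched key points. The sketch below is not from the original post; `fit_rigid_2d` and `apply_rigid_2d` are hypothetical helper names, and it assumes the matched pairs have already been filtered to inliers and that the page shows the image at the same scale as the template.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping src points onto dst points.

    src, dst: equal-length sequences of (x, y) pairs, matched by index.
    Returns (angle_radians, tx, ty) so that dst ~= R(angle) * src + (tx, ty).
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n

    # Sum the dot and cross products of the centred pairs; atan2 of the two
    # sums is the rotation that minimises the squared error (2-D Procrustes).
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    angle = math.atan2(cross, dot)

    # The translation moves the rotated source centroid onto the target centroid.
    c, s = math.cos(angle), math.sin(angle)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return angle, tx, ty


def apply_rigid_2d(points, angle, tx, ty):
    """Rotate, then translate, each (x, y) point."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]
```

Here `src` would be the template key points and `dst` the matching page key points. Applying the fit to the template corners `[(0, 0), (0, h), (w, h), (w, 0)]` gives a rectangle that keeps the template's exact dimensions. With the result written as the 2x3 matrix `[[cos, -sin, tx], [sin, cos, ty]]`, `cv2.warpAffine(frame, M, (w, h), flags=cv2.WARP_INVERSE_MAP)` should then pull the upright crop out of the page. If the copy may also be scaled, `cv2.estimateAffinePartial2D` fits rotation, uniform scale and translation with RANSAC and may be the better fit.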

Things I have tried (all of which failed):

Filtering the key points to remove outliers and minimise distortion.
Affine/rotation/etc. matrices; however, they assume the points from the samples are equidistant and don't do approximations.
ICP: would probably work, but there are not enough samples, and it seems to be more of an approach than a method. I am certain there is a better way.

import cv2
import numpy as np
import matplotlib.pyplot as plt

def detect(img, frame, detector):
    frame = frame.copy()  # draw on a copy so the caller's page is untouched
    kp1, desc1 = detector.detectAndCompute(img, None)
    kp2, desc2 = detector.detectAndCompute(frame, None)

    index_params = dict(algorithm=0, trees=5)
    search_params = dict()
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc1, desc2, k=2)

    # Ratio test: keep a match only if it is clearly better than the runner-up,
    # and stop once 20 good matches have been collected
    good_points = []
    for m, n in matches:
        if m.distance < 0.5 * n.distance:
            good_points.append(m)
            if len(good_points) == 20:
                break

    # out_img = cv2.drawMatches(img, kp1, frame, kp2, good_points, flags=2, outImg=None)
    # plt.figure(figsize=(6*4, 8*4))
    # plt.imshow(out_img)

    if len(good_points) > 10:  # require more than 10 good matches
        # Get the matching points
        query_pts = np.float32([kp1[m.queryIdx].pt for m in good_points]).reshape(-1, 1, 2)
        train_pts = np.float32([kp2[m.trainIdx].pt for m in good_points]).reshape(-1, 1, 2)

        # Homography from template to page, then project the template corners
        matrix, mask = cv2.findHomography(query_pts, train_pts, cv2.RANSAC, 5.0)
        matches_mask = mask.ravel().tolist()
        h, w = img.shape[:2]  # works for grayscale and colour templates
        pts = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, matrix)

        overlayImage = cv2.polylines(frame, [np.int32(dst)], True, (0, 0, 0), 3)
        plt.figure(figsize=(6*2, 8*2))
        plt.imshow(overlayImage)

sift = cv2.SIFT_create()
for frame in frames:
    detect(img, frame, sift)

This is an example of a page with the image we are trying to detect on it.
Blue line: rectangle with the correct size
Red line: polygon determined using the perspective transform


赴月观长安 2025-02-04 23:59:01

I stumbled on a post that shows how to extract the minimum bounding box from a set of points. This works really well, as it also discloses the rotation.

import cv2
import numpy as np
import matplotlib.pyplot as plt

def detect_ICP(img, frame, detector):
    frame = frame.copy()  # draw on a copy so the caller's page is untouched
    kp1, desc1 = detector.detectAndCompute(img, None)
    kp2, desc2 = detector.detectAndCompute(frame, None)

    index_params = dict(algorithm=0, trees=5)
    search_params = dict()
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(desc1, desc2, k=2)
    matches = sorted(matches, key=lambda x: x[0].distance + 0.5 * x[1].distance)

    # Ratio test: keep a match only if it is clearly better than the runner-up
    good_points = []
    for m, n in matches:
        if m.distance < 0.5 * n.distance:
            good_points.append(m)

    out_img = cv2.drawMatches(img, kp1, frame, kp2, good_points, flags=2, outImg=None)
    plt.figure(figsize=(6*4, 8*4))
    plt.imshow(out_img)

    if len(good_points) > 10:  # require more than 10 good matches
        # Get the matching points
        query_pts = np.float32([kp1[m.queryIdx].pt for m in good_points]).reshape(-1, 1, 2)
        train_pts = np.float32([kp2[m.trainIdx].pt for m in good_points]).reshape(-1, 1, 2)

        matrix, mask = cv2.findHomography(query_pts, train_pts, cv2.RANSAC, 5.0)
        # matches_mask = mask.ravel().tolist()
        h, w = img.shape[:2]  # works for grayscale and colour templates
        pts = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
        dst = cv2.perspectiveTransform(pts, matrix)

        # determine the minimum bounding box
        minAreaRect = cv2.minAreaRect(dst)    # This will have size and rotation information
        rotatedBox = cv2.boxPoints(minAreaRect)
        rotatedBox = np.float32(rotatedBox).reshape(-1, 1, 2)

        overlayImage = cv2.polylines(frame, [np.int32(rotatedBox)], True, (0, 0, 0), 3)
        plt.figure(figsize=(6*2, 8*2))
        plt.imshow(overlayImage)
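
The minimum-area box still takes its side lengths from the distorted polygon, so it will not necessarily match the template's size. A possible follow-up, sketched here as an assumption rather than as part of the answer above, is to keep only the centre and angle that `cv2.minAreaRect` reports and rebuild the corners from the template's own width and height. `rect_corners` and `snap_to_template` are hypothetical helper names, and the input tuple is assumed to have the `((cx, cy), (w, h), angle_in_degrees)` layout that `cv2.minAreaRect` returns.

```python
import math

def rect_corners(center, size, angle_deg):
    """Corners of a w x h rectangle centred at `center`, rotated by angle_deg.

    The rotation is applied with a standard 2-D rotation matrix in image
    coordinates (x right, y down); corners are returned in drawing order.
    """
    cx, cy = center
    w, h = size
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in half]


def snap_to_template(min_area_rect, template_size):
    """Keep the detected centre and angle, but force the template's size."""
    center, detected_size, angle_deg = min_area_rect
    tw, th = template_size
    # The detected box may report its sides in the opposite order from the
    # template, so line up the longer side with the longer side.
    if (detected_size[0] < detected_size[1]) != (tw < th):
        tw, th = th, tw
    return rect_corners(center, (tw, th), angle_deg)
```

For example, `snap_to_template(cv2.minAreaRect(dst), (w, h))` would return four corners that can be passed to `cv2.polylines` after conversion to `np.int32`. Two caveats: the angle convention of `cv2.minAreaRect` has differed between OpenCV versions, so it is worth checking the result against a known rotation, and a box alone cannot distinguish an image from its 180-degree-rotated copy.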
