How to crop an image to only the text section using Python OpenCV?

Posted on 2025-01-28 01:35:00



I want to crop the image to only extract the text sections. There are thousands of them with different sizes so I can't hardcode coordinates. I'm trying to remove the unwanted lines on the left and on the bottom. How can I do this?

Original vs. Expected (before/after images)


2 Answers

So要识趣 2025-02-04 01:35:00


Determine the minimum bounding box by finding all the non-zero points in the image, then crop your image using this bounding box. Finding contours is time-consuming and unnecessary here, especially because your text is axis-aligned. You can accomplish your goal by combining cv2.findNonZero and cv2.boundingRect.

Hope this works:

import numpy as np
import cv2

img = cv2.imread(r"W430Q.png")  # Read in the image
img = img[:-20, :-20]  # Pre-crop: drop the last 20 rows and columns
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
gray = 255 * (gray < 50).astype(np.uint8)  # Threshold: dark text becomes white
gray = cv2.morphologyEx(gray, cv2.MORPH_OPEN,
                        np.ones((2, 2), dtype=np.uint8))  # Noise filtering
coords = cv2.findNonZero(gray)  # Find all non-zero points (text)
x, y, w, h = cv2.boundingRect(coords)  # Minimum bounding box of the text
rect = img[y:y+h, x:x+w]  # Crop the original image, not the inverted one
cv2.imshow("Cropped", rect)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the code above, the fourth line is where I set a threshold below 50 to make the dark text white. Because the comparison outputs a boolean image, I convert it to uint8 and scale by 255; the text is effectively inverted.

Then, using cv2.findNonZero, we find all of the non-zero locations in this image. We pass them to cv2.boundingRect, which returns the top-left corner of the bounding box along with its width and height. Finally, we use this to crop the image. The crop is done on the original image, not the inverted version.

葵雨 2025-02-04 01:35:00


Here's a simple approach:

  1. Obtain binary image. Load the image, grayscale, Gaussian blur, then Otsu's threshold to obtain a binary black/white image.

  2. Remove horizontal lines. Since we're trying to only extract text, we remove horizontal lines to aid us in our next step so incorrect contours will not merge together.

  3. Merge text into a single contour. The idea is that characters which are adjacent to each other are part of the wall of text. So we can dilate individual contours together to obtain a single contour to extract.

  4. Find contours and extract ROI. We find contours, sort contours by area, then extract the largest contour ROI using Numpy slicing.


Here's the visualization of each step:

Binary image -> removed horizontal lines (highlighted in green): images 1 and 2

Dilate to combine into a single contour -> detected ROI to extract (in green): images 3 and 4

Result

Code

import cv2
import numpy as np

# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, 0, -1)

# Dilate to merge into a single contour
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,30))
dilate = cv2.dilate(thresh, vertical_kernel, iterations=3)

# Find contours, sort for largest contour and extract ROI
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 4)
    ROI = original[y:y+h, x:x+w]
    break

cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('thresh', thresh)
cv2.imshow('ROI', ROI)
cv2.waitKey()