Tesseract and OpenCV give error - Image too large
I have a Spring Boot application running in a Docker container that has Tesseract installed.
In the Java program, I am using OpenCV to preprocess an image as follows:
MatOfByte mat = new MatOfByte(myByteArraySource);
Mat adaptive = new Mat();
Imgproc.adaptiveThreshold(mat, adaptive, 255, Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 13, 7);
// convert to BufferedImage
MatOfByte matOfByte = new MatOfByte();
Imgcodecs.imencode(".png", adaptive, matOfByte);
BufferedImage bf = ImageIO.read(new ByteArrayInputStream(matOfByte.toArray()));
// run tesseract
tesseract.doOCR(bf);
However, the call to tesseract.doOCR(bf) fails with the error:
Image too large: (1, 146327)
Any ideas what I am doing wrong?
What's strange is that the file size is only 146 KB, so I don't know why Tesseract considers it too large.
Also, if I remove the adaptiveThreshold step and run imencode on mat directly, the Tesseract scan works.
I have tried both the openjdk:11 and openjdk:8-jdk-alpine images, and both give the same error.
Any help is appreciated.
Comments (1)
I realized my mistake: I was interpreting the (1, 146327) in the error as the file size, when it is really the image dimensions.
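To make the distinction concrete, here is a minimal sketch that uses only the JDK (no OpenCV). It encodes a blank 400x400 image as PNG and compares the length of the encoded byte array with the dimensions obtained after decoding. The class and method names are hypothetical and exist only for this illustration. Wrapping encoded bytes in a matrix without decoding them yields a one-column matrix whose height is the byte count, which appears to be what the (1, 146327) in the error describes.

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EncodedVsDecoded {

    // Build a blank grayscale image and return its PNG-encoded bytes.
    static byte[] encodePng(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            ImageIO.write(image, "png", out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decode the encoded bytes and return {width, height} of the actual image.
    static int[] decodedSize(byte[] encoded) {
        try {
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(encoded));
            return new int[] {image.getWidth(), image.getHeight()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] png = encodePng(400, 400);
        int[] size = decodedSize(png);
        // Treating the raw bytes as pixels gives a 1 x N "image".
        System.out.println("Undecoded view: (1, " + png.length + ")");
        // Decoding first recovers the real dimensions.
        System.out.println("Decoded view:   (" + size[0] + ", " + size[1] + ")");
    }
}
```

The byte count depends on the compression and has no relation to the width or height, so only the decoded view is meaningful as an image size.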
Instead of creating and using MatOfByte directly, I need to decode the bytes with the imdecode function:
Mat mat = Imgcodecs.imdecode(new MatOfByte(os.toByteArray()), ...)
The resulting Mat has the correct size (e.g. 400, 400 rather than 1, 146327).
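Putting the fix together with the preprocessing from the question, the pipeline might look like the sketch below. It is not the author's exact code: the class and method names are hypothetical, the IMREAD_GRAYSCALE flag is an assumption (the answer elides the imdecode arguments, and adaptiveThreshold expects an 8-bit single-channel input), and loading the native library through System.loadLibrary is also an assumption that depends on how OpenCV is installed in the container.

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfByte;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class OcrPreprocessor {

    static {
        // Assumption: the OpenCV native library is on java.library.path.
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    }

    // Hypothetical helper: decode, threshold, and convert to a BufferedImage.
    static BufferedImage thresholdForOcr(byte[] encodedImage) throws IOException {
        // Decode the encoded bytes into a real image matrix (rows x cols),
        // instead of using the MatOfByte of raw file bytes as if it were pixels.
        Mat gray = Imgcodecs.imdecode(new MatOfByte(encodedImage), Imgcodecs.IMREAD_GRAYSCALE);
        if (gray.empty()) {
            throw new IOException("Could not decode image bytes");
        }

        // Same thresholding parameters as in the question.
        Mat adaptive = new Mat();
        Imgproc.adaptiveThreshold(gray, adaptive, 255,
                Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C, Imgproc.THRESH_BINARY, 13, 7);

        // Re-encode as PNG and read it back as a BufferedImage for Tesseract.
        MatOfByte png = new MatOfByte();
        Imgcodecs.imencode(".png", adaptive, png);
        return ImageIO.read(new ByteArrayInputStream(png.toArray()));
    }
}
```

The returned image can then be passed to tesseract.doOCR(...) as in the question.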