Screen scraping with .NET
I have around 100K scanned images (PDF, TIF, and JPG) from which data needs to be read and then uploaded to a hard drive. I am planning to come up with a small application that will help automate the data entry work.
Is there a free screen scraping tool available on the market that will help automate the process?
My initial idea was to read each image one by one and feed the data in through an application. But viewing each image and entering its data by hand will take a long time, and there is a risk of human error while reading the images.
All ideas and methods would be very helpful.
I need to propose a solution by the start of next week.
Comments (2)
Screen scraping means downloading a web page and extracting information from it.
To extract text from an image, you need to perform Optical Character Recognition, or OCR for short. Many software products are available that will do this for you.
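Whichever OCR product you pick, the surrounding batch job looks much the same. Here is a minimal sketch of it, written in Python purely for illustration (the same structure carries over to a .NET application). The names `find_scans` and `run_batch` are hypothetical, and the `ocr` argument stands in for whatever OCR call your chosen product exposes; no real OCR API is assumed here.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, Tuple

# File types mentioned in the question (assumed extension spellings).
SCAN_EXTENSIONS = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg"}


def find_scans(root: Path) -> Iterator[Path]:
    """Yield every scanned file under root, in a stable sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SCAN_EXTENSIONS:
            yield path


def run_batch(
    paths: Iterable[Path],
    ocr: Callable[[Path], str],
) -> Tuple[Dict[Path, str], Dict[Path, str]]:
    """Run the supplied OCR callable on each file.

    Returns (results, failures). One unreadable scan should not stop a
    100K-file run, so errors are recorded for later manual review.
    """
    results: Dict[Path, str] = {}
    failures: Dict[Path, str] = {}
    for path in paths:
        try:
            results[path] = ocr(path)
        except Exception as exc:  # placeholder OCR may fail in any way
            failures[path] = repr(exc)
    return results, failures
```

You would call it as `run_batch(find_scans(Path("scans")), my_ocr)`, where `my_ocr` wraps your OCR tool. The files that land in `failures` are the ones that still need a person to look at them.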
PDF files created by scanning or faxing have image content (each page is a picture of the text). If your PDFs were created through a print driver from a text-based application (Word printed to PDF by, say, "Bullzip"), then they would have text content that could be 'scraped'. I have had a good experience with a previous version of PDFConverter, though there are other products that will do what you want.
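To sort a large pile of PDFs into "has text" and "needs OCR", a rough check is whether the file references any font at all. The sketch below is only a heuristic, written in Python for illustration, and `pdf_probably_has_text` is a hypothetical helper name. It looks for the `/Font` resource key in the raw bytes and inside any streams that inflate with zlib. It assumes unencrypted files, and it will miss fonts behind other filters. A scan that already carries a hidden OCR text layer will also report as having text.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def pdf_probably_has_text(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes reference a font resource.

    Text drawn on a page needs a font, so image-only scans usually
    contain no /Font key anywhere in the file.
    """
    if b"/Font" in data:
        return True
    # Newer PDFs may keep their dictionaries inside Flate-compressed
    # object streams, so try to inflate each stream and look again.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data, e.g. an embedded JPEG image
        if b"/Font" in inflated:
            return True
    return False
```

Usage would be `pdf_probably_has_text(Path("file.pdf").read_bytes())`. Files that return `False` are candidates for OCR, and it is worth spot-checking a sample by hand before trusting the split across all 100K files.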