High-volume scanning and OCR automation solution?

Posted on 2024-07-21 13:14:57

We need a high-volume scanning and OCR solution.

We are talking about digitizing about 4,000 documents a day and saving them as PDF files with OCR (with hidden text)...

The solution should let the operators scan a document and automatically save the files to a specific network resource, to be picked up by an app that uploads them to a DB...

We are evaluating an enterprise solution from Kofax: http://www.kofax.com/

What other products are you aware of?

Any experience with similar requirements?

Any open source (or at least accessible) solution?

COM, ActiveX API support?

Comments (5)

万劫不复 2024-07-28 13:14:57

There are many vendors of scanning products that can do what you want - scan, index, generate PDF with OCR overlay (personally, I prefer OCR underlay in a PDF). Those requirements are pretty trivial for a vendor that specializes in scanning. To name just a few other vendors/products in addition to Kofax:

  • EMC/Captiva's InputAccel product
  • Datacap
  • eCopy ShareScan
  • Verity/Cardiff/Autonomy

Many document management solutions also have built-in scanning front ends but they're typically not as functional as the specialized capture products. Nearly all of these solutions have COM/ActiveX API support. I don't know of any open source solutions for scanning but I haven't ever really searched for any either.

Most of the scanning software vendors do use a "volume" or "capacity" license. Typically the volume renews at the end of the term (e.g. 1M pages a year, auto-renewing each year without additional cost). Thus, you don't pay strictly "per page", in the sense that if you purchase a capacity of 1M images per year and only end up scanning 500K pages, you don't get a refund. It is possible, although much less common, to have a one-time volume that doesn't automatically renew; when it runs out, you are required to purchase additional volume. Most vendors are moving away from dongles to control the volume and toward software licensing.
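To size a capacity license against the workload in the question, a rough back-of-the-envelope sketch may help. The 4,000 documents a day and the 1M pages a year come from this thread; the pages per document and the number of working days are purely assumed values, so substitute your own figures.

```python
def annual_pages(docs_per_day, pages_per_doc, working_days_per_year):
    """Estimate how many pages are scanned in one license term (one year)."""
    return docs_per_day * pages_per_doc * working_days_per_year


def capacity_shortfall(pages_needed, licensed_capacity):
    """Pages that exceed the licensed capacity (0 if the license is large enough)."""
    return max(0, pages_needed - licensed_capacity)


if __name__ == "__main__":
    docs_per_day = 4000          # from the question
    pages_per_doc = 3            # assumption: not stated anywhere in the thread
    working_days = 250           # assumption: roughly one working year
    licensed_capacity = 1_000_000  # the "1M pages a year" example above

    needed = annual_pages(docs_per_day, pages_per_doc, working_days)
    print(f"Estimated pages per year: {needed:,}")
    print(f"Shortfall against license: {capacity_shortfall(needed, licensed_capacity):,}")
```

Even at a single page per document, 4,000 documents a day over 250 working days already reaches 1M pages, so the page count per document is the number to pin down before talking to a vendor.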

A side note about Kofax:

Kofax has historically been sold through a system of Value Added Resellers so the quality of various implementations can vary widely. In addition it is highly customizable and comes in a variety of flavors with lots of add-on modules so one customer's Kofax system can be significantly different from other systems.

Kofax is used in enterprise-grade systems for scanning and automatic capture of millions and millions of documents a year. It has a significant chunk of the document scanning market share. No, I'm not a Kofax fanboy; if I were, I wouldn't have mentioned competitive products. However, I am very familiar with it. Like the other products on the market, it has strengths and weaknesses. I realize that Michael was just relaying what he had heard, but I just couldn't let that sweeping generalization pass without comment. Saying a product that has a significant percentage of market share is "not useful or user friendly" for scanning is kind of like saying "Windows isn't a useful server operating system". It's just too broad a generalization.

Cheers,

Brian

抚你发端 2024-07-28 13:14:57

Kofax is not very useful or user-friendly (per my counterparts working with the County). It's adequate, but not good.

We use an all Adobe solution. Details to follow (I'm not in charge of running that area, so I have to gather some information for you).

Update: We use

  • Adobe Acrobat Capture 3.0
  • Two RICOH IS760D color scanners with ADF
  • Acrobat Standard or Professional (depending upon the user)

We have an extensive library (almost 6,000 documents) with hundreds of thousands of scanned pages available. The computer doing the scanning has a dongle on it that we purchased (250,000 scans until we need to purchase an 'update'); I don't have the cost available since the gentleman who handles that has gone home for the day, but I remember it being in the micro-cents per page.

We often scan documents with several hundred pages that need to be done that day and we have no problem completing that task.

A link to some of our efforts (a web front-end, of sorts, to our library) is available at http://acequia.ccrfcd.org/FileLibrary2/FileLibrary.aspx if you'd like to get an idea of what we've done.

As for putting these PDFs into a database, it'd be pretty easy to create an application (perhaps a service) to monitor a directory and grab each PDF that pops up there after Capture runs, copy the information to the database, then either delete it or move it to its new home.
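A minimal sketch of that directory-watching idea, assuming a simple polling loop and a SQLite database standing in for whatever DB you actually use. The table layout, the folder names and the function names (`init_db`, `ingest_once`, `watch`) are all hypothetical, chosen only for illustration.

```python
import shutil
import sqlite3
import tempfile
import time
from pathlib import Path


def init_db(conn):
    """Create the (hypothetical) table that stores each scanned PDF."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, filename TEXT NOT NULL, content BLOB NOT NULL)"
    )
    conn.commit()


def ingest_once(inbox, archive, conn):
    """Copy every PDF currently in `inbox` into the database, then move it to `archive`."""
    inbox, archive = Path(inbox), Path(archive)
    archive.mkdir(parents=True, exist_ok=True)
    ingested = []
    for pdf in sorted(inbox.glob("*.pdf")):
        conn.execute(
            "INSERT INTO documents (filename, content) VALUES (?, ?)",
            (pdf.name, pdf.read_bytes()),
        )
        conn.commit()
        # Move the file to its "new home" only after the insert is committed.
        shutil.move(str(pdf), str(archive / pdf.name))
        ingested.append(pdf.name)
    return ingested


def watch(inbox, archive, conn, interval_seconds=5.0):
    """Poll the inbox forever; a real service would add logging and error handling."""
    while True:
        ingest_once(inbox, archive, conn)
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Small self-contained demo using a temporary folder and an in-memory database.
    with tempfile.TemporaryDirectory() as tmp:
        inbox = Path(tmp) / "inbox"
        inbox.mkdir()
        (inbox / "scan-0001.pdf").write_bytes(b"%PDF-1.4 placeholder")
        conn = sqlite3.connect(":memory:")
        init_db(conn)
        print(ingest_once(inbox, Path(tmp) / "archive", conn))
```

One thing the sketch ignores: a PDF that is still being written when the poll runs. A common way around that is to have the scanning side write under a temporary name and rename the file to `.pdf` only when it is complete.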

末骤雨初歇 2024-07-28 13:14:57

PSIGEN makes a great alternative to Kofax; it is packed with features and reasonably priced.

Kofax Alternative Scanning and Capture Application

魂牵梦绕锁你心扉 2024-07-28 13:14:57

How good does your OCR need to be? Do you need all content to be human readable, or do you just need some content to be able to classify documents (customer number, type of document, barcodes...)?

http://www.irislink.com is a company that develops solutions for scanning and classifying documents.
Their software is included in several brands of multifunction devices and consumer scanners.
Their corporate product is aimed more at extracting information and using it (e.g. automatic input of invoices into accounting software).
My experience is that it handles the OCR'ed text better (correcting words etc.) than Kofax (we use both), though Kofax can be extended further to reach a better level (this means more setup work and more maintenance).

Both products are really useful for how they treat documents.
If your only wish is to scan the documents, convert them to PDF, and save them on a network share, it may be enough to buy a good scanner and use the included software.
You may also wish to check out the Tesseract project; it's an open source OCR engine with good results.
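For the "PDF with hidden text" requirement, the Tesseract command-line tool can write a searchable PDF directly when given the `pdf` output config. Below is a rough sketch that wraps that call from Python; it assumes the `tesseract` binary is installed and on the PATH, and the helper names and the `scan.tif` file are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the command line; `pdf` makes Tesseract write <output_base>.pdf
    containing the page image with an invisible text layer."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def ocr_to_searchable_pdf(image_path, output_dir, lang="eng"):
    """Run Tesseract on one scanned image and return the path of the resulting PDF."""
    image_path = Path(image_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    output_base = output_dir / image_path.stem
    subprocess.run(tesseract_pdf_command(image_path, output_base, lang), check=True)
    # Tesseract appends ".pdf" to the output base itself.
    return Path(str(output_base) + ".pdf")


if __name__ == "__main__":
    if shutil.which("tesseract") is None:
        print("tesseract is not installed; nothing to do")
    elif Path("scan.tif").exists():  # hypothetical input file
        print(ocr_to_searchable_pdf("scan.tif", "out"))
```

At around 4,000 documents a day you would probably want to run several such jobs in parallel and check the recognition quality on a sample of your own documents before committing to it.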

抽个烟儿 2024-07-28 13:14:57

You can try ChronoScan. It has free OCR through Tesseract and forms recognition options, and it's free for non-commercial use.

The software is in an advanced stage of development, and there is a forum where you can talk directly with the developers.

http://www.chronoscan.org
Short video reading forms
