Document/image database repository design question

Posted on 2024-07-07 04:10:47

Question:

Should I write my application to access a database image repository directly, or write a middleware piece to handle document requests?

Background:

I have a custom document imaging and workflow application that currently stores about 15 million documents/document images (90%+ are single-page Group 4 TIFFs; the rest are PDF, Word, and Excel documents). The image repository is a commercial third-party application that is very expensive and, frankly, has too much overhead. I just need a system to store and retrieve document images.

I'm considering moving the images directly into a SQL Server 2005 database. The indexing information is very limited - basically 2 index fields. It's a life insurance policy administration system, so I index images with a policy number and a system-wide unique ID number. There are other index values, but they're stored and maintained separately from the image data. Those index values let me look up the unique ID value for individual image retrieval.
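To make the storage layout concrete, here is a minimal sketch of a two-field image table with retrieval by unique ID. It is only an illustration: `sqlite3` stands in for SQL Server 2005 so the example runs anywhere, and the table name, column names, the `content_type` column, and the sample policy numbers are all hypothetical, not taken from the real system.

```python
import sqlite3

# ASSUMPTION: sqlite3 is a stand-in for SQL Server 2005.
# All table/column names below are hypothetical.


def create_repository(conn):
    """Create the image table with the two index fields from the post."""
    conn.execute(
        """
        CREATE TABLE document_image (
            image_id      INTEGER PRIMARY KEY,  -- system-wide unique ID
            policy_number TEXT NOT NULL,        -- life insurance policy number
            content_type  TEXT NOT NULL,        -- hypothetical: TIFF, PDF, Word, Excel
            content       BLOB NOT NULL         -- the document bytes
        )
        """
    )
    conn.execute(
        "CREATE INDEX ix_document_image_policy ON document_image (policy_number)"
    )


def store_image(conn, image_id, policy_number, content_type, content):
    """Insert one document image."""
    conn.execute(
        "INSERT INTO document_image (image_id, policy_number, content_type, content) "
        "VALUES (?, ?, ?, ?)",
        (image_id, policy_number, content_type, content),
    )


def fetch_image(conn, image_id):
    """Return (content_type, content) for one ID, or None if it does not exist."""
    return conn.execute(
        "SELECT content_type, content FROM document_image WHERE image_id = ?",
        (image_id,),
    ).fetchone()


def list_image_ids(conn, policy_number):
    """Return all image IDs stored for a policy, in ascending order."""
    rows = conn.execute(
        "SELECT image_id FROM document_image WHERE policy_number = ? ORDER BY image_id",
        (policy_number,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_repository(conn)
store_image(conn, 1001, "POL-0001", "image/tiff", b"fake tiff bytes")
store_image(conn, 1002, "POL-0001", "application/pdf", b"fake pdf bytes")
store_image(conn, 1003, "POL-0002", "image/tiff", b"another fake tiff")
```

Retrieval then follows the flow described above: the separately maintained index values yield a unique ID, and `fetch_image(conn, 1001)` returns that one image.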

The database server is a dual quad-core Windows 2003 box with SAN drives hosting the DB files. The current image repository size is about 650GB. I haven't done any testing to see how large the converted database will be. I'm not really asking about the database design - I'm working with our DBAs on that aspect. If that changes, I'll be back :-)

The current system to be replaced is obviously a middleware application, but it's a very heavyweight system spread across 3 Windows servers. If I go the middleware route, it would be a single-server system.
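For readers comparing the two options, the difference can be sketched as two implementations behind one interface. This is an illustration of the choice only, not a recommendation: the class names are hypothetical, a dict stands in for the database table, and a plain callable stands in for the network call to a middleware service.

```python
from typing import Callable, Dict, Optional, Protocol


class ImageStore(Protocol):
    """What the application needs, whichever option is chosen."""

    def get(self, image_id: int) -> Optional[bytes]: ...


class DirectStore:
    """Option 1: the application queries the image table itself.

    ASSUMPTION: a dict stands in for the database table.
    """

    def __init__(self, table: Dict[int, bytes]) -> None:
        self._table = table

    def get(self, image_id: int) -> Optional[bytes]:
        return self._table.get(image_id)


class MiddlewareStore:
    """Option 2: the application sends the request to a middleware service.

    ASSUMPTION: `send_request` is a hypothetical callable standing in
    for the network call to that service.
    """

    def __init__(self, send_request: Callable[[int], Optional[bytes]]) -> None:
        self._send_request = send_request

    def get(self, image_id: int) -> Optional[bytes]:
        return self._send_request(image_id)


def show_document(store: ImageStore, image_id: int) -> str:
    """Application code: identical for both options."""
    data = store.get(image_id)
    return "not found" if data is None else f"{len(data)} bytes"


table = {1001: b"fake tiff bytes"}
direct = DirectStore(table)
via_middleware = MiddlewareStore(lambda image_id: table.get(image_id))
```

In this sketch `show_document` does not know which option is behind `store`, so the application code is the same either way and the two options differ only in where the database access lives.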

My primary concerns are scalability and performance - heavily weighted toward performance. I have about 100 users, and usage growth will probably be slow for the next few years. Most users are primarily read users - they don't add images to the system very often. We have a department that handles scanning and otherwise adding images to the repository. We also have a few other applications that receive documents (via FTP) and insert them into the repository automatically as they are received, either with full index information or as "batches" that a user reviews and indexes.

Most (90%+) of the documents/images are very small: under 100K, and probably under 50K. So I believe that storing the images in the database file will be more efficient than getting SQL Server 2008 and using FILESTREAM.
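As a rough sanity check on that size estimate, the figures given earlier (about 650GB across about 15 million documents) can be divided out. This is back-of-the-envelope arithmetic only; it assumes 1GB = 1024^3 bytes and that the 650GB is mostly document data rather than repository overhead.

```python
# Figures quoted in the post; both are approximate.
REPOSITORY_BYTES = 650 * 1024**3   # ASSUMPTION: 1GB = 1024^3 bytes
DOCUMENT_COUNT = 15_000_000

average_bytes = REPOSITORY_BYTES / DOCUMENT_COUNT
average_kib = average_bytes / 1024

print(f"average size: {average_kib:.1f} KiB per document")
```

The average comes out at roughly 45 KiB per document. Since that average includes the larger PDF, Word, and Excel files, the typical single-page TIFF is probably smaller still, which fits the "probably under 50K" estimate.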
