Document Management System - Database Design
I'm writing my own Document Management System (DMS) in Java (the ones available don't satisfy my needs).
The documents shall be described by the Qualified Dublin Core metadata standard. The easiest way to do this, in my opinion, is to pack the key-value pairs into an RDF model with an XML representation.
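To make the "RDF model with an XML representation" concrete, here is a minimal sketch that serializes one document's Dublin Core properties as RDF/XML using only the JDK's DOM and `javax.xml.transform` APIs. The class name, the `file:///docs/report.pdf` URI and the sample values are hypothetical; a dedicated RDF library would be the more robust choice in a real system. Note that `dc:creator` appears twice, which is exactly the non-unique-key situation described below.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DublinCoreRdf {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Serializes the (possibly repeated) DC properties of one document as RDF/XML. */
    static String toRdfXml(String docUri, List<Map.Entry<String, String>> properties) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document xml = factory.newDocumentBuilder().newDocument();

        Element root = xml.createElementNS(RDF_NS, "rdf:RDF");
        root.setAttributeNS(XMLNS_NS, "xmlns:dc", DC_NS); // declare dc once, on the root
        xml.appendChild(root);

        // One rdf:Description per document, identified by its URI.
        Element desc = xml.createElementNS(RDF_NS, "rdf:Description");
        desc.setAttributeNS(RDF_NS, "rdf:about", docUri);
        root.appendChild(desc);

        // A list (not a Map) so the same property can occur more than once.
        for (Map.Entry<String, String> p : properties) {
            Element e = xml.createElementNS(DC_NS, "dc:" + p.getKey());
            e.setTextContent(p.getValue());
            desc.appendChild(e);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(xml), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<Map.Entry<String, String>> props = List.of(
                Map.entry("title", "Annual Report"),
                Map.entry("creator", "Alice"),
                Map.entry("creator", "Bob"));
        System.out.println(toRdfXml("file:///docs/report.pdf", props));
    }
}
```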
To store the metadata for all documents I have two ideas (the document files will be stored in the filesystem):
- Store all metadata of all documents in a single XML file
- Make an XML file for each document and store it either in the filesystem or in an RDBMS (like the H2 database engine for Java); a key-value database won't solve this because the keys for one document are not unique.
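The point about non-unique keys can be illustrated briefly. A plain key-value map silently keeps only the last value of a repeated property, whereas a row-per-statement layout keeps all of them. The sketch below assumes a hypothetical RDBMS table `metadata(doc_id, property, value)` with no uniqueness constraint on `(doc_id, property)`, modeled in memory with a Java record (Java 16+); the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MetadataRows {
    /** One row per statement, mirroring a hypothetical table metadata(doc_id, property, value). */
    record Row(String docId, String property, String value) {}

    /** Returns every value of one property for one document, in insertion order. */
    static List<String> values(List<Row> rows, String docId, String property) {
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.docId().equals(docId) && r.property().equals(property)) {
                out.add(r.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A plain key-value map: the second put overwrites the first creator.
        Map<String, String> kv = new HashMap<>();
        kv.put("dc:creator", "Alice");
        kv.put("dc:creator", "Bob");
        System.out.println(kv.get("dc:creator")); // Bob

        // Row-per-statement layout: both creators survive.
        List<Row> rows = List.of(
                new Row("report.pdf", "dc:title", "Annual Report"),
                new Row("report.pdf", "dc:creator", "Alice"),
                new Row("report.pdf", "dc:creator", "Bob"));
        System.out.println(values(rows, "report.pdf", "dc:creator")); // [Alice, Bob]
    }
}
```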
Since many documents are linked to each other, the first approach might be better for analysing the data, but the second approach might be much faster.
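Whichever storage variant is chosen, the links between documents can be loaded into an in-memory adjacency map and analysed there. The sketch below assumes, hypothetically, that links are recorded with a Dublin Core term such as `dcterms:references` and that file names serve as document IDs; it finds every document reachable from a starting document with a breadth-first search.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LinkedDocuments {
    /** Returns the start document plus every document reachable by following links (BFS). */
    static Set<String> reachable(Map<String, List<String>> links, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String doc = queue.poll();
            for (String target : links.getOrDefault(doc, List.of())) {
                if (seen.add(target)) { // add() returns false if already visited
                    queue.add(target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical dcterms:references links, keyed by document ID.
        Map<String, List<String>> links = Map.of(
                "a.pdf", List.of("b.pdf"),
                "b.pdf", List.of("c.pdf", "a.pdf"),
                "d.pdf", List.of("a.pdf"));
        System.out.println(reachable(links, "a.pdf")); // [a.pdf, b.pdf, c.pdf]
    }
}
```

The adjacency map only needs the link properties, so it can be built from one large XML file or from many per-document files alike.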
Which solution would you recommend? Or are there any better solutions?
Stefan
Answers (5)
I don't know how your analysis works, but if you need the complete graph in memory to do your analysis, then use variant 1 (store all metadata of all documents in a single XML file), because in that scenario variant 2 gives you no gain, only extra work.
Added:
If the extra work for variant 2 is not too much, then I recommend variant 2, because it can be more scalable.
Have you considered using MongoDB and GridFS? http://www.mongodb.org/display/DOCS/GridFS+Specification
You can store your documents directly in MongoDB as binary data and even store the associated metadata for each file in any format you want. It can store documents even if they have the same name, and it will generate its own unique IDs.
BTW, even though it is not strictly part of your question: have a look at a JCR (Java Content Repository) implementation like Jackrabbit. You could use it to store your documents and maybe your metadata too.
I'd look into a NoSQL document store like CouchDB to see if it could help you.
I don't like the file system solution; there's no abstraction whatsoever to help you there.
If you are always accessing all documents, neither approach will be slower than the other. But I would recommend the second approach. When it comes to analyzing the data, you'll need to read all documents anyway, so it makes no difference whether they are in separate files or in one file.
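With the second approach, "read all documents" amounts to a directory scan. A minimal sketch, assuming a hypothetical layout of one `<id>.rdf` metadata file per document in a single directory (Java 16+ for `Stream.toList()`); parsing the XML content is left out:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

class LoadAllMetadata {
    /** Reads every *.rdf file in dir into memory, keyed by file name without the extension. */
    static Map<String, String> loadAll(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) { // close the directory stream
            files = entries.filter(p -> p.getFileName().toString().endsWith(".rdf")).toList();
        }
        Map<String, String> byId = new TreeMap<>();
        for (Path p : files) {
            String name = p.getFileName().toString();
            byId.put(name.substring(0, name.length() - ".rdf".length()), Files.readString(p));
        }
        return byId;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dms-metadata");
        Files.writeString(dir.resolve("a.rdf"), "<rdf:RDF/>");
        Files.writeString(dir.resolve("b.rdf"), "<rdf:RDF/>");
        Files.writeString(dir.resolve("notes.txt"), "ignored");
        System.out.println(loadAll(dir).keySet()); // [a, b]
    }
}
```

Once loaded, the per-document contents can be combined into the same in-memory structure a single large file would produce, which is why the storage layout does not matter much for a full analysis.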