Ideas for building a document management system
A customer needs a document management system, and I'm gathering information about it.
I know about SharePoint and Alfresco, but in this case I'm evaluating what it would take to build one from scratch, so please refrain from suggesting either of them (we are evaluating them separately; this is all about development, not about implementing an existing solution).
These are the requirements:
- A very specific requirement for the legal management of documents that is particular to our local government, but apart from that:
- Operation similar to Google Docs from the end user's point of view
- Needs to store information for 200+ end users (UPDATE: it's actually 700+ end users)
- Mainly Office documents, PDF and text. I already have plain-text extraction from these binary files.
- No wiki, no portal creation, barely any workflow (and very simple at that); it's only file management
- Central repository, shared across the company, integrated with Active Directory
- Fast searching
- Transparent desktop integration
- Web interface
- Multiplatform, if possible
So, these are the things on top of my head:
- Storage: I know that SharePoint saves everything in the DB (Alfresco too?). That is a nightmare, IMHO. I prefer to put the metadata in a DB and the files on disk.
I'm thinking about forcing the use of ZFS in this case and leveraging its capabilities for versioning, snapshots and scaling. Or maybe using git as the storage backend (would git work fine?)
So, where can I learn more about handling a large pool of documents, in ZFS or any regular file system? For example, how should the folder structure be laid out for easy management, fast responses, easy backup, etc.?
- Metadata: I'm thinking of a regular DB here, but I wonder whether there is more merit in saving everything in Lucene (I have some experience with Lucene, but I worry because Lucene can't be federated, right?).
If I use a search engine as the metadata database I can save some work (no need for a second indexing pass), but a regular database engine is more standard.
- Tech: I'll probably build this with Django, PyLucene and Postgres, and do the shell integration for Windows (I have no problems doing that).
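On the folder-layout question under Storage: one common approach is not to mirror the user-visible folder tree on disk at all. The database holds names and folders, and the disk uses a fixed-depth sharded path derived from the document ID, so no single directory grows huge and backups can be split by shard. A minimal sketch, assuming integer document IDs and per-document version numbers; the root path `/srv/dms` and the naming scheme are hypothetical.

```python
from pathlib import PurePosixPath


def storage_path(root: str, doc_id: int, version: int) -> PurePosixPath:
    """Map a document ID and version number to a sharded on-disk path.

    The ID is zero-padded to 9 digits and split into two 3-digit directory
    levels, so for IDs below 10**9 no shard directory holds more than
    1000 entries, however many documents the repository contains.
    """
    if not (0 <= doc_id < 10**9) or version < 1:
        raise ValueError("doc_id must be in [0, 10**9) and version >= 1")
    padded = f"{doc_id:09d}"               # 1234567 -> "001234567"
    level1, level2 = padded[:3], padded[3:6]
    # Each document gets its own directory with one file per version.
    return PurePosixPath(root, level1, level2, padded, f"v{version:04d}.bin")


print(storage_path("/srv/dms", 1234567, 3))  # → /srv/dms/001/234/001234567/v0003.bin
```

Because the human-readable hierarchy lives only in the database in this design, renaming or moving a folder is a metadata update and never touches the files on disk.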
I would appreciate any hints or information on how to implement this solution properly.
Personally, I find the "similar to Google Docs" and "transparent desktop integration" requirements a bit vague. But judging from the question, you are more concerned about the backend and document storage, and are leaning toward a more open-source stack (with AD integration)?
Anyway, I personally use KnowledgeTree as our document management system. In their implementation all files reside in a file directory, and the database keeps track of the path, the corresponding metadata, access logs and versioning information. They basically keep several versions of the same file when a document is updated, which I think is a fair enough implementation choice considering that Microsoft Office documents were mostly binary (up until 2003).
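The scheme described above (files in a directory, the database tracking the path, metadata and versions, and a new copy of the file for every update) can be sketched with `sqlite3` standing in for the real database. This is not KnowledgeTree's actual schema; the table, column and class names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE versions (
    doc_id  INTEGER NOT NULL REFERENCES documents(id),
    version INTEGER NOT NULL,
    path    TEXT NOT NULL,          -- where the bytes live on disk
    author  TEXT NOT NULL,
    PRIMARY KEY (doc_id, version)
);
"""


class VersionStore:
    """Hypothetical store: metadata in the DB, one immutable file per version."""

    def __init__(self, db: sqlite3.Connection, root: Path):
        self.db, self.root = db, root
        self.db.executescript(SCHEMA)

    def create(self, title: str) -> int:
        cur = self.db.execute("INSERT INTO documents (title) VALUES (?)", (title,))
        return cur.lastrowid

    def save(self, doc_id: int, data: bytes, author: str) -> int:
        """Write a new file for this version and record it; never overwrite."""
        (latest,) = self.db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM versions WHERE doc_id = ?",
            (doc_id,),
        ).fetchone()
        version = latest + 1
        path = self.root / str(doc_id) / f"v{version}.bin"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        self.db.execute(
            "INSERT INTO versions VALUES (?, ?, ?, ?)",
            (doc_id, version, str(path), author),
        )
        self.db.commit()
        return version

    def read(self, doc_id: int, version: int) -> bytes:
        row = self.db.execute(
            "SELECT path FROM versions WHERE doc_id = ? AND version = ?",
            (doc_id, version),
        ).fetchone()
        if row is None:
            raise KeyError((doc_id, version))
        return Path(row[0]).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    store = VersionStore(sqlite3.connect(":memory:"), Path(tmp))
    doc = store.create("Budget 2010")
    store.save(doc, b"draft", "alice")
    store.save(doc, b"final", "bob")
    print(store.read(doc, 1), store.read(doc, 2))  # → b'draft' b'final'
```

Keeping whole copies per version trades disk space for simplicity, which fits binary formats where diffs are not meaningful anyway.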
You may want to find out how many documents they currently have and how many they expect to flow into the system on a daily basis. (Or, from a different angle, knowing what kind of documents they plan to store will generally hint at the load your server needs to handle.)
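Once you have those numbers, sizing the storage is simple arithmetic. A sketch with made-up inputs: the 700 users come from the question, while documents per user per day, average size, versions per document and 250 working days are pure assumptions to replace with the customer's real figures.

```python
def yearly_storage_gb(users: int, docs_per_user_per_day: float,
                      avg_doc_mb: float, versions_per_doc: float,
                      working_days: int = 250) -> float:
    """Rough yearly growth in GB, ignoring compression and deduplication."""
    docs_per_year = users * docs_per_user_per_day * working_days
    return docs_per_year * versions_per_doc * avg_doc_mb / 1024


# Hypothetical inputs: 700 users, 2 new docs each per day, 0.5 MB average, 3 versions.
print(round(yearly_storage_gb(700, 2, 0.5, 3), 1))  # → 512.7
```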
My guess is that you could most likely get away with local filesystems plus a database storing the metadata, unless you are sure the system will have to handle a massive load of documents every day (imagine being Flickr for documents ;) ).
1. SharePoint and Alfresco are platforms where you can do quite a bit of customization, so even using them really means you are building something.
2. SharePoint stores blobs in the DB by default, but has ways to put them on a filesystem.
3. If you build it yourself, support the FrontPage extensions that Office apps use to communicate with SharePoint and Alfresco, and serve the documents with the right headers to tell IE to start the app. This way you get the same Office integration that SharePoint has (users really love this feature); it's just a simple HTTP protocol.
4. If you go with SharePoint, my company has a free document previewer that can view PDF and will soon handle Office docs. We sell the underlying tech, but it's Windows only.
5. I love Django and use it for all my personal projects, but I really think .NET and Java will have more third-party support for the things you need, and much of your code will be portable to SharePoint or Alfresco if you decide to go that way later.
EDIT: More info on #3 as requested
http://blogs.msdn.com/mikefitz/archive/2005/03/14/395112.aspx
http://blogs.msdn.com/stcheng/archive/2008/12/17/wss-use-rpc-protocol-to-access-wss-v3-site.aspx
Official docs:
http://msdn.microsoft.com/en-us/library/ms442469.aspx
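On the header part of #3: the FrontPage RPC protocol itself is covered by the links above and is not reproduced here. The simpler half, serving a document with a MIME type the browser can hand to the Office application, can be sketched as a standard-library WSGI app. The in-memory `DOCUMENTS` store, the MIME table and the use of `Content-Disposition: inline` are assumptions for illustration; whether a given IE/Office combination then launches the desktop app depends on its configuration.

```python
from pathlib import PurePosixPath

# Registered MIME types for common Office formats and PDF.
OFFICE_TYPES = {
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pdf": "application/pdf",
}

# Hypothetical in-memory store standing in for the real repository.
DOCUMENTS = {"/docs/report.docx": b"PK\x03\x04 placeholder docx bytes"}


def app(environ, start_response):
    """Serve a stored document with its MIME type and an inline disposition."""
    path = environ.get("PATH_INFO", "")
    body = DOCUMENTS.get(path)
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    name = PurePosixPath(path).name
    ctype = OFFICE_TYPES.get(PurePosixPath(path).suffix.lower(),
                             "application/octet-stream")
    start_response("200 OK", [
        ("Content-Type", ctype),
        # "inline" asks the browser to open the file rather than save it.
        ("Content-Disposition", f'inline; filename="{name}"'),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app directly with a fake start_response to inspect the headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"], captured["headers"] = status, dict(headers)


body = b"".join(app({"PATH_INFO": "/docs/report.docx"}, fake_start_response))
print(captured["status"], captured["headers"]["Content-Type"])
# → 200 OK application/vnd.openxmlformats-officedocument.wordprocessingml.document
```

To try it in a browser, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the same app locally.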
Alfresco should be a great solution here. It supports every one of your requirements except the government-specific one.
But if you are building "from scratch", maybe at least borrow the ideas from it?
Storage: the file content is saved on the filesystem. Easy to manage, store, back up and so on. The files don't keep their names, though; only their content is saved in binary format, and each file is named with a hash (I guess a hash of the content?).
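Whether Alfresco itself hashes the content is only a guess in the answer above, so the sketch below just shows content-addressed naming as a general technique: the file name is the SHA-256 digest of the bytes, identical uploads are stored once, and the database would map node IDs to digests. The two-level directory fan-out and the function names are arbitrary choices for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def put(root: Path, data: bytes) -> str:
    """Store bytes under their SHA-256 digest and return the digest.

    Identical content maps to the same file, so it is stored only once.
    """
    digest = hashlib.sha256(data).hexdigest()
    target = root / digest[:2] / digest[2:4] / digest
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file in the same directory, then rename it into
        # place (atomic on POSIX), so readers never see a half-written file.
        fd, tmp_name = tempfile.mkstemp(dir=target.parent)
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_name, target)
    return digest


def get(root: Path, digest: str) -> bytes:
    return (root / digest[:2] / digest[2:4] / digest).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    a = put(root, b"same bytes")
    b = put(root, b"same bytes")
    print(a == b, get(root, a))  # → True b'same bytes'
```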
Metadata: stored in the database. Fast to access, change, update and so on. Each node has properties (name, title, description, dates, audit info, whatever you need). It's just information, and it's all saved in the "properties" table.
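A generic "properties" table like this is usually an entity-attribute-value layout: one row per (node, property name), so new properties need no schema change. A minimal sketch with `sqlite3` standing in for the real database; the table, column and function names are hypothetical and not Alfresco's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, content_ref TEXT NOT NULL);
CREATE TABLE properties (
    node_id INTEGER NOT NULL REFERENCES nodes(id),
    name    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (node_id, name)
);
""")


def create_node(content_ref: str, **props: str) -> int:
    """Insert a node plus an arbitrary set of named properties."""
    node_id = db.execute(
        "INSERT INTO nodes (content_ref) VALUES (?)", (content_ref,)
    ).lastrowid
    db.executemany(
        "INSERT INTO properties (node_id, name, value) VALUES (?, ?, ?)",
        [(node_id, k, v) for k, v in props.items()],
    )
    db.commit()
    return node_id


def properties(node_id: int) -> dict:
    """Return all properties of a node as a dict."""
    rows = db.execute(
        "SELECT name, value FROM properties WHERE node_id = ? ORDER BY name",
        (node_id,),
    )
    return dict(rows.fetchall())


n = create_node("store/0001.bin", title="Budget", author="alice")
print(properties(n))  # → {'author': 'alice', 'title': 'Budget'}
```

The trade-off is that every value is stored as text, so typed queries (date ranges, numeric filters) need casts or extra typed columns.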
Search: Alfresco uses Solr for search; it used to use Lucene. I've had pretty big installations, and if you put the Lucene index on an SSD it's blazing fast (Lucene is fast anyway). It indexes both file content and properties, so you get to the node ID very quickly.
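One index covering both the extracted text and the properties, with every hit resolving to a node ID, is the core structure Lucene and Solr maintain. A toy pure-Python inverted index shows the shape; it is not Lucene (no scoring, stemming or on-disk segments), and the `field:term` key convention is only loosely modeled on Lucene query syntax.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class InvertedIndex:
    """Map terms to the set of node IDs containing them."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, node_id: int, text: str, props: dict) -> None:
        # Full text is indexed unprefixed; properties are indexed as
        # "field:term" so a query can target a single property.
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(node_id)
        for field, value in props.items():
            for term in TOKEN.findall(str(value).lower()):
                self.postings[f"{field}:{term}"].add(node_id)

    def search(self, *terms: str) -> set:
        """Return node IDs matching ALL terms (boolean AND)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()


idx = InvertedIndex()
idx.add(1, "Annual budget for the water department", {"author": "Alice"})
idx.add(2, "Budget draft", {"author": "Bob"})
print(sorted(idx.search("budget")), idx.search("budget", "author:alice"))
# → [1, 2] {1}
```

Because text and properties land in the same index, one lookup answers a combined query, which is the "no second pass" advantage the question mentions; the database remains the authoritative copy of the metadata.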
Alfresco implements CIFS as well as WebDAV, FTP and more. The point is, you can simply mount it on users' desktops as folders or drives.
The web interface is there, central repository management is there: all the requirements. And since it's open source, you could take some of that source and use it in your project. Although it would be much better to take Alfresco Community and contribute back a bit, if you're comfortable with that.
Are you trying to build a document management system like Alfresco and SharePoint? Alfresco and SharePoint are project management solutions, not document management solutions. Alfresco is a DMS solution of sorts, but not a good one. For project management, though, it is good software.
I suggest you buy a document management solution that covers the legal management of documents and is also specific to the local government. There are document management system providers such as Laserfiche and OnBase whose products work similarly to Google Docs. You can create an account for every employee of the firm or business.
Yes, all the documents are in formats such as MS Word, MS Excel, PDF and PPT.
Workflow in a document management system is very efficient and easy to handle.
Yes, by using a DMS you can easily find a file within minutes (Laserfiche takes 10 minutes to retrieve a file or folder).
Laserfiche DMS is web-interface software. You can log in and reach files or folders easily from different locations.
Storage
In a DMS, all the data is secured and stored in cloud storage. You can easily reach a document just by logging in to your account. In case of loss or any mishap, you can get the lost data back from the company.
Metadata
The DMS uses a regular database engine, and all business data is secured in cloud storage on a regular basis.
Tech
There is no need to build anything; you only need to purchase DMS software. I recommend Laserfiche because we use their services.