MOSS 2007 as a large repository for PDF documents
I am examining the possibility of building a repository of PDF documents on MOSS 2007. There is no workflow, only a huge number of documents and access to the document libraries (which must also be searchable).
The question is whether such a solution is feasible, assuming that:
- there can be up to one million (!) PDF documents, loaded into document libraries and exposed externally over the web;
The proposed farm is:
- 1x front-end web server
- 2x index servers
- 1x query server
- 1x MS SQL Server
- 2x 12 TB storage
Is it possible to provide reasonable performance with such a huge number of files?
Has anyone had to build a similar digital library solution?
Chris's answer is not exactly correct. You can have a lot more than 2000 items in a list, as long as they are not all displayed in a single view.
In a document library (where you would store your PDF documents) you can have up to 5 million items, as long as you find a folder structure and views that stay within the limit of fewer than 2000 items per view.
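To make the folder arithmetic concrete, here is a minimal Python sketch, assuming documents are numbered sequentially from 0 as they are loaded. The function `folder_path` and its zero-padded folder names are hypothetical and not part of any SharePoint API; it only works out which folder a document would go into so that no folder (and not the library root either) holds more than 2000 direct children. With 1 million documents this gives a single level of 500 folders; with 5 million it adds a second level.

```python
def folder_path(doc_index: int, total_docs: int, limit: int = 2000) -> str:
    """Hypothetical helper: map the doc_index-th document (0-based) to a nested
    folder path so that no container holds more than `limit` direct children.

    Returns "" when every document fits directly in the library root.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    if not 0 <= doc_index < total_docs:
        raise ValueError("doc_index must be in [0, total_docs)")

    # Each extra folder level multiplies capacity by `limit`:
    # 0 levels -> limit docs, 1 level -> limit**2 docs, and so on.
    levels = 0
    while limit ** (levels + 1) < total_docs:
        levels += 1

    width = len(str(limit - 1))  # zero-pad so folder names sort naturally
    parts = []
    for level in range(levels, 0, -1):
        # The base-`limit` digit of doc_index at this level picks the sub-folder.
        parts.append(f"{(doc_index // limit ** level) % limit:0{width}d}")
    return "/".join(parts)


if __name__ == "__main__":
    print(folder_path(0, 1_000_000))          # 0000
    print(folder_path(999_999, 1_000_000))    # 0499 (last of 500 folders)
    print(folder_path(4_123_456, 5_000_000))  # 0001/0061 (two levels needed)
```

Numbering by load order keeps the buckets evenly filled; a naming scheme that means something to your users (by year, by department) is usually nicer to browse, but then you have to check that no bucket grows past the limit.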
So the question is, can you separate your documents in a way that makes sense to you? If so, I wouldn't worry about scalability.
The numbers I mention here all come from this TechNet article.
The TL;DR version: http://www.sharepointkings.com/2009/01/limitation-and-upper-boundaries-of_28.html
Something that I haven't seen mentioned so far is file size.
Assuming that each PDF is 1 MB in size on average, you will run into content database size limits well before the item-count and scope limits mentioned above.
Capacity planning is all about compromise: if you want to store 1 million documents, you will need to split the files across multiple content databases, and therefore across multiple site collections.
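Here is the same back-of-the-envelope arithmetic as a small Python sketch. The per-database cap is a parameter: the 100 GB default is an assumption based on commonly cited SharePoint 2007 sizing guidance, so check it against the TechNet boundaries article for your version, and `overhead` is a hypothetical allowance for versions and metadata. Because a site collection lives in exactly one content database, the database count is also a lower bound on the number of site collections you will need.

```python
import math


def plan_content_databases(doc_count: int, avg_doc_mb: float,
                           max_db_gb: float = 100.0,
                           overhead: float = 0.0) -> tuple[int, int]:
    """Estimate how many content databases a document set needs.

    max_db_gb: assumed size cap per content database (the 100 GB default is
               an assumption, not a hard product limit).
    overhead:  assumed extra fraction for versions, metadata, etc. (0.2 = 20%).
    Returns (number_of_databases, documents_per_database).
    """
    if doc_count <= 0 or avg_doc_mb <= 0 or max_db_gb <= 0 or overhead < 0:
        raise ValueError("invalid sizing parameters")
    total_gb = doc_count * avg_doc_mb * (1 + overhead) / 1024
    databases = max(1, math.ceil(total_gb / max_db_gb))
    return databases, math.ceil(doc_count / databases)


if __name__ == "__main__":
    # 1 million PDFs at ~1 MB each is roughly 977 GB of raw content.
    print(plan_content_databases(1_000_000, 1.0))                # (10, 100000)
    print(plan_content_databases(1_000_000, 1.0, overhead=0.2))  # (12, 83334)
```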
Whilst in some fringe cases Microsoft supports up to 1 TB of content per database in SharePoint 2010 (for static repositories), I am not aware of a similar support scenario for SharePoint 2007.
As for FileStream (I assume you are referring to RBS here), I would not recommend it in a production scenario without very careful consideration. I would view it primarily as a cost saver, and bear in mind that it can add significant complexity to your backup and disaster recovery (DR) strategy.
Hope that helps.
There are a couple of things going on here, and no one can answer all your questions with the facts you have given us.
First up, the amount of documents you propose can be handled by a single document library (or several document libraries) so long as you follow the advice above about storing items in folders. That is critical.
What we can't tell you is whether you have enough hardware. Sure, it is pretty easy to know if you have enough storage, but getting the right amount of SP hardware depends on your use cases and other factors.
Lastly, you mention that you want 2 index servers for MOSS 2007. While there are scenarios in MOSS 2007 that rely on multiple index boxes, they aren't redundant in the way you might think. More likely you'd have a single index box and multiple query boxes (or web servers that are also query servers).
You will run into performance issues if you put more than 2000 items in a single list. One strategy to get around this problem is to use folders as buckets with a limit of 2000 items in each one.
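If the buckets come from a natural key (say, one folder per month) rather than from a counter, it is worth checking a proposed scheme against the 2000-item limit before loading anything. Below is a minimal Python sketch, assuming you have the file names up front; the `YYYY-MM_` name prefix used in the example is hypothetical. Any bucket it reports would need to be split further, for example into numbered sub-folders.

```python
from collections import Counter
from collections.abc import Callable, Iterable


def overfull_buckets(names: Iterable[str],
                     bucket_of: Callable[[str], str],
                     limit: int = 2000) -> dict[str, int]:
    """Count documents per proposed folder and return the folders over `limit`."""
    counts = Counter(bucket_of(name) for name in names)
    return {bucket: n for bucket, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Hypothetical naming convention: "YYYY-MM_<title>.pdf", bucketed by month.
    names = [f"2009-01_doc{i}.pdf" for i in range(2500)]
    names += [f"2009-02_doc{i}.pdf" for i in range(800)]
    print(overfull_buckets(names, lambda n: n[:7]))  # {'2009-01': 2500}
```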
It would also be wise to consider separating into several Site Collections so that all of these documents are not in a single SQL database.
Updating and consolidating:
As Benjamin J Athawes points out, content sizing is also an important factor to consider. See his answer for details.
nRouteNPingMe suggests considering SharePoint 2010 instead, since this has been addressed in the newer version. If you're not tied to 2007, I would consider taking this route.