Is CouchDB suitable for handling lots of documents with file attachments across multiple servers?
I would love to hear your thoughts about CouchDB and whether it would handle my use case.
What I will do: I will have a database where I store documents of about 20 KB each, with an attachment of 1-10 MB on each.
Will Couch handle a database of 10 TB or more per server with my schema? (In a 4U case you can fit 24 2 TB drives; is that too much per Couch node? There will be very few reads, so I don't need speed.)
Will Couch be able to replicate all documents with their attachments?
How about splitting all the data across multiple servers (for example, 4 nodes)? Will it handle that many attachments?
What problems do you see here?
If you need more info, please ask :)
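For concreteness, here is roughly what each write (plus a replication trigger) would look like against CouchDB's HTTP API. This is a minimal sketch using Python's requests library; the database name, document id, file, and node addresses are all placeholders.

```python
import requests

COUCH = "http://localhost:5984"   # assumed local CouchDB node
DB = f"{COUCH}/archive"           # hypothetical database name

requests.put(DB)  # create the database (a 412 response means it already exists)

# Store the ~20 KB metadata document first...
doc = {"filename": "scan-0001.tif", "uploaded": "2011-03-20", "size_mb": 4}
rev = requests.put(f"{DB}/scan-0001", json=doc).json()["rev"]

# ...then attach the 1-10 MB binary under the returned revision.
with open("scan-0001.tif", "rb") as f:
    requests.put(f"{DB}/scan-0001/original?rev={rev}",
                 data=f, headers={"Content-Type": "image/tiff"})

# Replication copies documents and their attachments alike.
requests.post(f"{COUCH}/_replicate",
              json={"source": "archive",
                    "target": "http://other-node:5984/archive"})
```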
Comments (1)
I don't think you will hit a physical limitation with a 10 TB file; that is, I don't think Couch has some inbuilt "can't use files bigger than X" with X being less than 10 TB.
However.
The biggest issue is file compaction. In order to reclaim space, Couch wants to compact the file. This effectively means copying it. So, at some point at least, 10 TB needs to be 20 TB, as it duplicates the live data into the new copy.
If you are mostly appending to the file, that is, simply adding new data and not updating or overwriting old data, then this will be less of a problem, since compaction won't gain you that much. If your data is basically static, then I would build the file, compact it one final time, and be done with it.
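To put the disk-space point in concrete terms: compaction is just an HTTP call, and the node needs enough free space for a second copy of the live data while it runs. A minimal sketch with Python's requests, assuming a local node and a hypothetical database name:

```python
import time
import requests

COUCH = "http://localhost:5984"  # assumed local node
DB = "archive"                   # hypothetical database name

# Kick off compaction; Couch rewrites the live data into a new file,
# so the disk needs room for a second copy while this runs.
requests.post(f"{COUCH}/{DB}/_compact",
              headers={"Content-Type": "application/json"})

# The call returns immediately; poll the database info until the flag clears.
while requests.get(f"{COUCH}/{DB}").json().get("compact_running"):
    time.sleep(10)
```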
There are "3rd party" sharding solutions for Couch; Lounge is a popular one.
When I approach a Couch solution, the primary thing to consider is what your query criteria are. Couch is all about the views, really. What kind of views are you looking at? Frankly, if you're simply storing data by some simple key (file name, date, or whatever), you may well be better off just using a file system and an appropriate directory structure.
So I'd like to hear more about the views you plan to use, since you don't intend to do a lot of reading.
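For illustration, a view over one of those simple keys would look something like this. The design-document and field names are assumptions; the map function is ordinary JavaScript shipped as a string inside the design document:

```python
import requests

DB = "http://localhost:5984/archive"  # hypothetical database

# A design document holding one view, keyed by upload date.
design = {
    "views": {
        "by_date": {
            "map": "function(doc) { if (doc.uploaded) emit(doc.uploaded, doc.filename); }"
        }
    }
}
requests.put(f"{DB}/_design/archive", json=design)

# Query the view for one day; view keys are JSON-encoded.
rows = requests.get(f"{DB}/_design/archive/_view/by_date",
                    params={"key": '"2011-03-20"'}).json()["rows"]
```

Keep in mind that the first query against a new view has to build its index over every document, which on 10 TB of data is a serious job; that is why adding queries late gets harder as the dataset grows.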
Addenda:
You still haven't mentioned what kind of queries you're looking for. The queries are, effectively, THE design component, especially for CouchDB, since it gets more and more difficult to add new queries on large datasets.
When you said attachments, I assumed you meant attachments to the CouchDB payload (since it can handle attachments).
So, all that said, you could easily create a meta-data document capturing whatever information you want, and as part of that document add the path name of the actual file stored on the file system. This will reduce the overall size of the Couch file dramatically, which makes maintenance faster and more efficient. You lose some of the "self-contained" quality of having it all in a single document, of course.
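Sketched out, that pattern is just this; the blob directory, database name, and fields are hypothetical:

```python
import os
import shutil
import requests

DB = "http://localhost:5984/archive-meta"  # hypothetical metadata-only database
BLOB_ROOT = "/srv/blobs"                   # assumed filesystem home for the big files

def store(doc_id, src_path, meta):
    """Keep the 1-10 MB file on disk; put only a small pointer doc in Couch."""
    dest = os.path.join(BLOB_ROOT, doc_id)
    shutil.copyfile(src_path, dest)
    meta["path"] = dest  # the document carries the pathname, not the bytes
    requests.put(f"{DB}/{doc_id}", json=meta)

store("scan-0001", "scan-0001.tif", {"uploaded": "2011-03-20", "type": "scan"})
```

Compaction and replication then only ever have to move the small documents; the trade-off is that keeping the files themselves in sync across nodes becomes your job (rsync or similar) rather than Couch's.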