Tips for saving space in MongoDB
Various MongoDB hosting services meter by disk usage. What are some tips for saving space when working with MongoDB?
Thanks.
Comments (3)
This question is really rather vague. Some things which may or may not apply to you (in no particular order):
Shorten verbose field names
The idea is simple: a document whose field names are long and descriptive can be shortened by removing unnecessary wordiness from those names.
This will give you a very tiny saving in space per field, but it is multiplied by the number of fields in each document and by the number of documents, so it could become significant if you have millions of them. There is a superb post discussing the benefits and drawbacks of this method.
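As a rough illustration of how the saving adds up, here is a minimal sketch. The documents and field names are made up for the example, and the size of compact JSON is used only as a proxy for the stored size (BSON also stores every field name inside every document, so the difference in name length carries over).

```python
import json

# Hypothetical documents, invented for illustration only.
verbose_doc = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email_address": "ada@example.com",
}
short_doc = {
    "fn": "Ada",
    "ln": "Lovelace",
    "em": "ada@example.com",
}


def approx_size(doc):
    """Rough size proxy: bytes of compact UTF-8 JSON (not the exact BSON size)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Bytes saved in one document, and the total over a million such documents.
saved_per_doc = approx_size(verbose_doc) - approx_size(short_doc)
saved_per_million = saved_per_doc * 1_000_000
print(saved_per_doc, saved_per_million)
```

The trade-off is readability: names like `fn` and `em` are harder to work with, so a mapping layer in your application code may be worth considering if you go this way.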
Capped Collections
Capped collections allow you to put a fixed limit on how much a collection stores: a maximum size in bytes and, optionally, a maximum number of documents. They work in a first-in-first-out manner (the oldest documents are discarded). This is particularly applicable if you are logging and wish to keep the most recent x documents, while older ones have no relevance. There are some caveats to the use of capped collections. See the MongoDB docs for full details.
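The first-in-first-out behaviour can be pictured with a small in-memory analogy. This is not MongoDB code, just a sketch of the discard rule using Python's `deque` with a fixed length; the log entries are invented.

```python
from collections import deque

# In-memory analogy only: a deque with maxlen drops its oldest entry when a
# new one arrives, much like a capped collection limited to 3 documents.
recent_logs = deque(maxlen=3)

for i in range(1, 6):
    recent_logs.append({"seq": i, "msg": f"event {i}"})

# Only the three most recent entries remain; entries 1 and 2 were discarded.
kept = [entry["seq"] for entry in recent_logs]
print(kept)
```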
Consider your documents' relationships
Documents can either have embedded documents or relationships to other documents (in other collections) foreign-key style. The pros and cons of each approach are discussed frequently, but ultimately it is for you to choose which approach works for you.
Taking the example of a blog, it may be that each blog post has an author. You could either embed this author information within each post, or you might choose to put authors in their own `authors` or `users` collection. The latter approach would save space, particularly if many users often make many posts (rather than just one or two). Be aware that fetching the author then usually costs an extra database call, since ordinary queries do not join collections (the `$lookup` aggregation stage in newer MongoDB versions is the exception).
Edit: Expanding on Relationships
Relationships between documents can be done in a couple of ways in addition to embedding them. You could just store the ID of the related document in the post (reusing the blog example above), with the related document itself living in the `users` collection.
This is probably the simplest possible approach to relationships (besides embedding them), but it will be up to you to maintain it within your own code entirely. You will need to make the call to grab the related user when you need it, and to update it whenever that might be necessary. That said, I see nothing wrong with this approach, and have seen it used on a few occasions.
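A minimal sketch of such a manual reference, using plain Python structures as stand-ins for the two collections. The field name `author_id` and all values are hypothetical, chosen only for the example.

```python
# In-memory stand-ins for two collections; all names and values are hypothetical.
users = {
    1: {"_id": 1, "name": "alice", "email": "alice@example.com"},
}
posts = [
    {"_id": 100, "title": "First post", "author_id": 1},
    {"_id": 101, "title": "Second post", "author_id": 1},
]


def author_of(post):
    """Resolve the manual reference: this is the extra lookup your code must make."""
    return users[post["author_id"]]


# The author data is stored once, no matter how many posts point at it.
names = [author_of(p)["name"] for p in posts]
print(names)
```

With a real database, `author_of` would be a second query against the `users` collection, which is the extra call mentioned above.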
A similar approach is to use DBRef. This is a more formal method for describing a relationship like the above. Instead of just putting the ID of the other document in, you specify a DBRef, which is a formalized reference to another document. I hope that makes sense. Both approaches I have described here are discussed in detail in the MongoDB docs. It is worth noting that manual references will take up (slightly) less space than a DBRef, since a DBRef holds extra (possibly redundant) information, such as which collection is referred to. It does have the advantage of being supported natively by many of the driver libraries though, so it makes your life that little bit easier.
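To make the size difference concrete, here is a small sketch comparing the two shapes. A DBRef is stored as a subdocument with `$ref` (the collection name) and `$id` fields; the post fields and values are made up, and compact JSON length is again only a rough stand-in for the stored size.

```python
import json


def approx_size(doc):
    """Rough size proxy: bytes of compact UTF-8 JSON (not the exact BSON size)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Manual reference: just the ID of the related user (hypothetical field name).
manual = {"title": "First post", "author_id": 1}

# DBRef-style reference: the same ID plus the name of the collection it lives in.
dbref = {"title": "First post", "author": {"$ref": "users", "$id": 1}}

# Extra bytes per document paid for the more formal reference.
extra = approx_size(dbref) - approx_size(manual)
print(extra)
```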
Ultimately, which methods work and are relevant depends on what you are trying to do. Consider the options, weigh the tradeoffs, and make the call as to whether it's something you should do. And experiment.
Try to avoid duplicating data and possibly use some form of compression if storing large amounts of data that does not need to be searchable.
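For the compression part, a sketch along these lines might help. It assumes a large text field that never needs to be queried; the document, the field name `body_z`, and the choice of `zlib` are illustrative assumptions, not a recommendation of a specific codec.

```python
import zlib

# Hypothetical document with a large body that never needs to be searched.
doc = {"_id": 1, "title": "report", "body": "lorem ipsum dolor sit amet " * 500}

raw = doc["body"].encode("utf-8")
packed = zlib.compress(raw, level=9)

# Store the compressed bytes (as a binary field) instead of the plain text.
stored = {"_id": doc["_id"], "title": doc["title"], "body_z": packed}

# Reading it back means decompressing in application code.
restored = zlib.decompress(stored["body_z"]).decode("utf-8")
print(len(raw), len(packed))
```

The cost is that the database can no longer index or search inside `body_z`, which is why this only suits data that does not need to be searchable.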
I think a good approach is to use one document for related data. For example, if you have a users collection, you can give each user a single document and embed the related things in it, such as the avatar, the ACL, and so on.