
Tips for saving space in MongoDB

Posted 2024-10-05 07:57:27 · 67 characters · 7 views · 0 comments

Various MongoDB hosting services meter usage by disk space. What tips are there for saving space when using MongoDB?

Thanks.


玩套路吗 2024-10-12 07:57:27

This question is really rather vague. Some things which may or may not apply to you (in no particular order):

Shorten verbose field names

This is best illustrated with an example:

{
    surname: "Smith",
    forename: "John",
    location: { grid_e: 100.02, grid_n: 450.08 }
}

The previous document could be shortened by removing unnecessary wordiness in the various field names.

{
    sn: "Smith",
    fn: "John",
    loc: { e: 100.02, n: 450.08 }
}

The saving per field name is tiny, but MongoDB stores field names inside every document, so it is multiplied by the number of fields in each document and by the number of documents (which could become significant if you have millions). Here is a superb post discussing the benefits and drawbacks of this method.
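To make the arithmetic concrete, here is a rough sketch that counts the bytes spent on field names in the two example documents above. It assumes the BSON layout, where each field name is stored as a null-terminated UTF-8 string in every document. The helper name `fieldNameBytes` and the document count of one million are made up for illustration, and the figure is an uncompressed estimate: storage-engine compression may shrink the real on-disk difference.

```javascript
// Count the bytes BSON spends on field names in one document.
// Each name is stored as a null-terminated UTF-8 string.
function fieldNameBytes(doc) {
  let total = 0;
  for (const [key, value] of Object.entries(doc)) {
    total += Buffer.byteLength(key, "utf8") + 1; // +1 for the terminator
    // Recurse into embedded documents only (arrays are ignored in this sketch).
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      total += fieldNameBytes(value);
    }
  }
  return total;
}

const verboseDoc = {
  surname: "Smith",
  forename: "John",
  location: { grid_e: 100.02, grid_n: 450.08 },
};

const shortDoc = {
  sn: "Smith",
  fn: "John",
  loc: { e: 100.02, n: 450.08 },
};

const savedPerDoc = fieldNameBytes(verboseDoc) - fieldNameBytes(shortDoc);
const docCount = 1000000; // hypothetical collection size

console.log(savedPerDoc);            // 26 bytes per document
console.log(savedPerDoc * docCount); // 26000000 bytes before compression
```

About 26 MB across a million documents is modest, which is why this trick is usually weighed against the loss of readable field names.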

Capped Collections

Capped collections have a fixed maximum size: you specify a size in bytes and, optionally, a limit on how many documents you wish to store. They work in a first-in-first-out manner (once the limit is reached, the oldest documents are discarded to make room). This is particularly applicable if you are logging and wish to keep only the most recent x documents, because the old ones have no relevance.

There are some caveats to the use of capped collections. See the MongoDB docs for full details.
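As a minimal sketch, creating a capped collection could look like the following. The collection name `app_log`, the 5 MiB size, and the 5000-document limit are placeholder values, and `db` is assumed to be a database handle supplied by the caller (for example from the Node.js driver). Note that `size` is given in bytes and is required, while `max` is optional.

```javascript
// Options for a capped collection (values are illustrative only).
const cappedOptions = {
  capped: true,
  size: 5 * 1024 * 1024, // required: maximum size in bytes (5 MiB here)
  max: 5000,             // optional: maximum number of documents
};

// `db` is assumed to be a database handle passed in by the caller.
async function createLogCollection(db) {
  return db.createCollection("app_log", cappedOptions);
}
```

In the shell the equivalent call would be `db.createCollection("app_log", cappedOptions)`.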

Consider your documents' relationships

Documents can either have embedded documents or relationships to other documents (in other collections) foreign-key style. The pros and cons of each approach are discussed frequently, but ultimately it is for you to choose which approach works for you.

Taking the example of a blog, it may be that each blog post has an author. You could either embed this author information within each post, or you might choose to put authors in their own authors or users collection. The latter approach would save space, particularly if many users often make many posts (rather than just one or two). Be aware that you will incur an extra database call to fetch the author, since a plain query does not join collections (newer MongoDB versions do offer the $lookup aggregation stage for this).

Edit: Expanding on Relationships

Relationships between documents can be done in a couple of ways in addition to embedding them. You could just use the ID of the related document like so (reusing the blog example above):

{
    _id: <whatever>,
    title: "Document Relationships in MongoDB",
    body: "bla bla bla bla",
    // ...
    user_id: <id of the user document>
}

And in the users collection, that related document would exist:

{
    _id: <whatever>,
    name: "Mark Embling",
    email: "[email protected]",
    // ...
}

This is probably the simplest possible approach to relationships (besides embedding them), but it will be up to you to maintain it within your own code entirely. You will need to make the call to grab the related user when you need it, and to update it whenever that might be necessary. That said, I see nothing wrong with this approach, and have seen it used on a few occasions.
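A manual reference like this might be resolved in application code roughly as follows. This is only a sketch: the function name `getPostWithAuthor` and the collection names `posts` and `users` are assumptions, and `db` is assumed to be a database handle (for example from the Node.js driver) passed in by the caller.

```javascript
// Fetch a post, then follow its manual reference to the author.
// This costs two round trips: one per collection.
async function getPostWithAuthor(db, postId) {
  const post = await db.collection("posts").findOne({ _id: postId });
  if (post === null) {
    return null; // no such post, so there is nothing to resolve
  }
  const author = await db.collection("users").findOne({ _id: post.user_id });
  return { post, author };
}
```

The second `findOne` is the extra database call mentioned earlier, and keeping `user_id` valid when users are removed is left to your own code.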

A similar approach is to use DBRef. This is a more formal method for describing a relationship like the one above. Instead of just storing the ID of the other document, you store a DBRef, which is a formalized reference to another document. Both approaches I have described here are discussed in detail in the MongoDB docs. It is worth noting that manual references will take up (slightly) less space than a DBRef, since a DBRef holds extra (possibly redundant) information, such as which collection is referred to. DBRef has the advantage of being supported natively by many of the driver libraries though, so it makes your life that little bit easier.
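To show the difference in shape, here is a small sketch comparing the two styles. It uses plain objects with the conventional DBRef fields `$ref` (collection name) and `$id` (the referenced `_id`); many drivers provide their own DBRef type, which is left out here so the example stays self-contained. The id value, the field name `user`, and the helper `resolveDbRef` are hypothetical.

```javascript
const userId = "user-1"; // hypothetical id; real ones are usually ObjectIds

// Manual reference: only the id is stored.
const postWithManualRef = {
  title: "Document Relationships in MongoDB",
  user_id: userId,
};

// DBRef-style reference: the target collection is stored in every document too.
const postWithDbRef = {
  title: "Document Relationships in MongoDB",
  user: { $ref: "users", $id: userId },
};

// Resolve a DBRef-shaped value; `db` is assumed to be a database handle.
async function resolveDbRef(db, ref) {
  return db.collection(ref.$ref).findOne({ _id: ref.$id });
}
```

The repeated `$ref: "users"` string in every post is the extra space cost described above.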

Ultimately, which methods work and are relevant depends on what it is you are trying to do. Consider the options, weigh the tradeoffs, and make the call as to whether it's something you should do. And experiment.
