Understanding the MongoDB BSON document size limit

Posted 2024-10-12 05:29:02 · 356 words · 8 views · 0 comments

From MongoDB The Definitive Guide:

Documents larger than 4MB (when converted to BSON) cannot be
saved to the database. This is a somewhat arbitrary limit (and may be
raised in the future); it is mostly to prevent bad schema design and ensure
consistent performance.

I don't understand this limit. Does this mean that a document containing a blog post with a lot of comments, which just so happens to be larger than 4MB, cannot be stored as a single document?

Also, does this count the nested documents too?

What if I wanted a document that audits the changes to a value? (It may eventually grow beyond the 4MB limit.)

Hope someone explains this correctly.

I have just started reading about MongoDB (first nosql database I'm learning about).

Thank you.

水溶 2024-10-19 05:29:02

First off, this actually is being raised in the next version to 8MB or 16MB ... but I think to put this into perspective, Eliot from 10gen (who developed MongoDB) puts it best:

EDIT: The size has been officially 'raised' to 16MB

So, on your blog example, 4MB is actually a whole lot. For example, the full uncompressed text of "War of the Worlds" is only 364k (HTML): http://www.gutenberg.org/etext/36

If your blog post is that long with that many comments, I for one am not going to read it :)

For trackbacks, if you dedicated 1MB to them, you could easily have more than 10k (probably closer to 20k).

So except for truly bizarre situations, it'll work great. And in the exception case or spam, I really don't think you'd want a 20MB object anyway. I think capping trackbacks at 15k or so makes a lot of sense no matter what, for performance. Or at least special-casing it if it ever happens.

-Eliot
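Eliot's suggestion of capping trackbacks at around 15k can be enforced at write time. A minimal sketch, assuming a `posts` collection with a `trackbacks` array (both names are hypothetical): MongoDB's `$push` operator with the `$each` and `$slice` modifiers appends a new entry and keeps only the newest N elements, so the array cannot grow without bound. The local simulation is only there to make the trimming behaviour easy to check.

```javascript
// Hypothetical helper: builds a MongoDB update document that appends `entry`
// to the array `field` and keeps only the newest `cap` elements.
// A negative $slice keeps the last N elements of the array.
function cappedPush(field, entry, cap) {
  checkCap(cap);
  return { $push: { [field]: { $each: [entry], $slice: -cap } } };
}

// Local simulation of the same update, for reasoning without a server.
function applyCappedPush(array, entry, cap) {
  checkCap(cap);
  return [...array, entry].slice(-cap);
}

// A cap of 0 would mean "store nothing", which is almost never intended here.
function checkCap(cap) {
  if (!Number.isInteger(cap) || cap <= 0) {
    throw new RangeError(`cap must be a positive integer, got ${cap}`);
  }
}
```

In the mongo shell this would be used roughly as `db.posts.updateOne({ _id: postId }, cappedPush("trackbacks", trackback, 15000))`, where `postId` and `trackback` are placeholders.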

I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.

The main point of the limit is so you don't use up all the RAM on your server (as you need to load all MBs of the document into RAM when you query it.)

So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.

Note on Storing Files in MongoDB

If you need to store documents (or files) larger than 16MB you can use the GridFS API which will automatically break up the data into segments and stream them back to you (thus avoiding the issue with size limits/RAM.)

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
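To make the chunking concrete, here is a sketch of what those two collections hold, written as plain JavaScript with no database involved. It assumes the documented 255 KiB default chunk size and the `files_id` / `n` / `data` chunk fields; `toGridFsDocs` and `fromGridFsDocs` are hypothetical names, not driver functions. In a real Node.js application you would use the driver's `GridFSBucket` (`openUploadStream` / `openDownloadStream`) rather than building these documents by hand.

```javascript
// Default GridFS chunk size used by the official drivers: 255 KiB.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Splits `data` (a Buffer) into GridFS-style documents: one metadata document
// for the files collection and one document per chunk for the chunks collection.
function toGridFsDocs(fileId, filename, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    // subarray clamps the end, so the last chunk may be shorter.
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: fileId, filename, length: data.length, chunkSize, uploadDate: new Date() };
  return { file, chunks };
}

// Reassembles the original bytes by concatenating chunks in `n` order.
function fromGridFsDocs(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}
```

Because each chunk is its own small document, no single document ever approaches the 16MB limit, however large the file is.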

You can use this method to store images, files, videos, etc. in the database, much as you might in a SQL database. I have even used this to store multi-gigabyte video files.

淡莣 2024-10-19 05:29:02

Many in the community would prefer no limit with warnings about performance, see this comment for a well reasoned argument:
https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283

My take: the lead developers are stubborn about this issue because they decided it was an important "feature" early on. They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. It's another example of personality and politics detracting from a product in open source communities, but this is not really a crippling issue.

冰雪之触 2024-10-19 05:29:02

Posting a clarifying answer here for those who get directed here by Google.

The document size includes everything in the document including the subdocuments, nested objects etc.

So a document of:

{
  "_id": {},
  "na": [1, 2, 3],
  "naa": [
    { "w": 1, "v": 2, "b": [1, 2, 3] },
    { "w": 5, "b": 2, "h": [{ "d": 5, "g": 7 }, {}] }
  ]
}

has a maximum total size of 16 MB.

Subdocuments and nested objects are all counted towards the size of the document.
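To see why nesting counts, here is a minimal BSON size calculator for a subset of types, following the layout in the BSON specification: a document is an int32 length, its elements, and a trailing null byte, and each element is a type byte, the key as a null-terminated string, and the value. Embedded documents and arrays are themselves full documents, so their bytes add directly to the parent. The int32-versus-double choice for numbers is an assumption (the Node.js driver's default serializer behaves similarly); in practice, prefer a real implementation such as `calculateObjectSize` from the `bson` npm package or the legacy mongo shell's `Object.bsonsize()`.

```javascript
// Limit discussed above: 16 MiB for the whole document, nested parts included.
const MAX_BSON_SIZE = 16 * 1024 * 1024;

// Key names are stored as null-terminated UTF-8 strings.
function cstringSize(s) {
  return Buffer.byteLength(s, "utf8") + 1;
}

function valueSize(v) {
  if (v === null) return 0; // null has a type byte but no value bytes
  switch (typeof v) {
    case "boolean": return 1;
    // Assumption: 32-bit integers as int32, every other number as double.
    case "number": return Number.isInteger(v) && v >= -(2 ** 31) && v < 2 ** 31 ? 4 : 8;
    case "string": return 4 + Buffer.byteLength(v, "utf8") + 1; // int32 length + bytes + NUL
    case "object": return documentSize(v); // arrays are documents keyed "0", "1", ...
    default: throw new TypeError(`unsupported type: ${typeof v}`);
  }
}

function documentSize(doc) {
  let size = 4 + 1; // int32 length prefix + trailing 0x00
  for (const [key, value] of Object.entries(doc)) {
    size += 1 + cstringSize(key) + valueSize(value); // type byte + key + value
  }
  return size;
}
```

A check such as `documentSize(doc) <= MAX_BSON_SIZE` then applies to the whole document, including every subdocument and array inside it.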

绳情 2024-10-19 05:29:02

I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?

JSON is a text format, so if you access your data through JSON, binary files are especially costly: they have to be encoded in uuencode, hexadecimal, or Base64. The conversion path might look like

binary file <> JSON (encoded) <> BSON (encoded)

It would be more efficient to put the path (URL) to the data file in your document and keep the data itself in binary.
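The cost of the encoded text leg is easy to quantify. Base64, for example, maps every 3 input bytes to 4 output characters, so embedding a binary file as Base64 text in JSON makes it roughly a third larger. A small Node.js sketch, using a zero-filled stand-in for a real file (BSON itself has a native binary type, so this overhead applies to the JSON text path):

```javascript
// Base64 turns every 3 input bytes into 4 output characters (with padding),
// so the encoded size is 4 * ceil(n / 3).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const binary = Buffer.alloc(3000, 0xab); // stand-in for a binary file
const encoded = binary.toString("base64"); // what a JSON payload would carry
console.log(binary.length, encoded.length, base64Length(binary.length));
```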

If you really want to keep these files of unknown length in your DB, then you would probably be better off putting these in GridFS and not risking killing your concurrency when the large files are accessed.

终止放荡 2024-10-19 05:29:02

Nested Depth for BSON Documents:
MongoDB supports no more than 100 levels of nesting for BSON documents.

For more information, visit
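A rough way to check a document against the nesting limit before sending it is to measure its depth. This sketch assumes the top-level document counts as level 1 and each embedded document or array adds one level; the server's exact counting may differ by one, so treat the result as approximate. `nestingDepth` and `exceedsNestingLimit` are hypothetical helpers intended for plain JSON-like values.

```javascript
// Documented limit: no more than 100 levels of nesting.
const MAX_NESTING_DEPTH = 100;

// Depth of a JSON-like value: scalars are 0, each object or array adds 1.
function nestingDepth(value) {
  if (value === null || typeof value !== "object") return 0;
  let deepest = 0;
  for (const child of Object.values(value)) {
    deepest = Math.max(deepest, nestingDepth(child));
  }
  return 1 + deepest;
}

function exceedsNestingLimit(doc) {
  return nestingDepth(doc) > MAX_NESTING_DEPTH;
}
```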

惯饮孤独 2024-10-19 05:29:02

According to https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

If you expect that a blog post may exceed the 16MB document limit, you should extract the comments into a separate collection, reference the blog post from each comment, and do an application-level join.

// posts
[
  {
    _id: ObjectID('AAAA'),
    text: 'a post',
    ...
  }
]

// comments
[
  {
    text: 'a comment',
    post: ObjectID('AAAA')
  },
  {
    text: 'another comment',
    post: ObjectID('AAAA')
  }
]
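
The application-level join can be sketched as follows. Against a real database it is two queries, e.g. `db.posts.findOne({ _id: postId })` and `db.comments.find({ post: postId }).toArray()` in the mongo shell, or a server-side `$lookup` aggregation stage. Here the "collections" are plain arrays with string ids standing in for ObjectIds, and `joinCommentsToPosts` is a hypothetical helper, so the join logic runs on its own.

```javascript
// Groups comments by the post they reference, then attaches each group to its
// post. String() keys let ObjectIds and plain strings both work as ids.
function joinCommentsToPosts(posts, comments) {
  const byPost = new Map();
  for (const c of comments) {
    const key = String(c.post);
    if (!byPost.has(key)) byPost.set(key, []);
    byPost.get(key).push(c);
  }
  // Posts without comments get an empty array rather than undefined.
  return posts.map((p) => ({ ...p, comments: byPost.get(String(p._id)) ?? [] }));
}
```

Each post document stays small no matter how many comments accumulate, because the comments never live inside it.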
请持续率性 2024-10-19 05:29:02

Perhaps storing a blog post -> comments relation in a non-relational database is not really the best design.

You should probably store comments in a separate collection to blog posts anyway.

[edit]

See comments below for further discussion.
