Understanding the MongoDB BSON Document Size Limit
From MongoDB The Definitive Guide:
Documents larger than 4MB (when converted to BSON) cannot be
saved to the database. This is a somewhat arbitrary limit (and may be
raised in the future); it is mostly to prevent bad schema design and ensure
consistent performance.
I don't understand this limit. Does it mean that a document containing a blog post with a lot of comments, which just so happens to be larger than 4MB, cannot be stored as a single document?
Also, does this count the nested documents too?
What if I wanted a document that audits the changes to a value? (It may eventually grow past the 4MB limit.)
Hope someone explains this correctly.
I have just started reading about MongoDB (first nosql database I'm learning about).
Thank you.
Comments (7)
First off, this actually is being raised in the next version to 8MB or 16MB ... but I think to put this into perspective, Eliot from 10gen (who developed MongoDB) puts it best:
EDIT: The size has been officially 'raised' to 16MB.
I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.
The main point of the limit is so you don't use up all the RAM on your server (as you need to load all MBs of the document into RAM when you query it.) So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.
Note on Storing Files in MongoDB
If you need to store documents (or files) larger than 16MB, you can use the GridFS API, which will automatically break up the data into segments and stream them back to you (thus avoiding the issue with size limits/RAM). You can use this method to store images, files, videos, etc. in the database much as you might in a SQL database. I have even used this to store multi-gigabyte video files.
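To illustrate the idea behind that chunking, here is a minimal sketch in plain Python. It is not the GridFS API itself; a real driver does this work for you. The helper names `split_into_chunks` and `reassemble` are hypothetical, the field names `files_id`, `n`, and `data` mirror the layout GridFS uses for its chunk documents, and the 255 KB chunk size is GridFS's documented default, used here as an assumption.

```python
# Sketch only: mimics how a large payload can be cut into small chunk
# documents so that no single document approaches the size limit.
CHUNK_SIZE = 255 * 1024  # assumed chunk size (GridFS default is 255 KB)


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return a list of chunk documents covering `data` in order."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in sequence order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


if __name__ == "__main__":
    payload = bytes(600 * 1024)  # 600 KB of zero bytes as a stand-in file
    chunks = split_into_chunks("example-file", payload)
    print(len(chunks), "chunks; round trip ok:", reassemble(chunks) == payload)
```

Because each chunk is its own small document, the reader can stream chunks one at a time instead of loading the whole file into RAM.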
Many in the community would prefer no limit, with warnings about performance instead. See this comment for a well-reasoned argument:
https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283
My take: the lead developers are stubborn about this issue because they decided early on that it was an important "feature". They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. It is another example of personality and politics detracting from a product in open source communities, but this is not really a crippling issue.
To post a clarification answer here for those who get directed here by Google.
The document size includes everything in the document including the subdocuments, nested objects etc.
So a document of:
Has a maximum size of 16 MB.
Subdocuments and nested objects are all counted towards the size of the document.
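As a rough illustration of why nested content counts, the sketch below estimates the encoded size of a document by following the BSON layout: a 4-byte length prefix, one element per field (type byte, null-terminated key, value), and a trailing null byte, applied recursively to subdocuments and arrays. It is a simplified estimator written for this page, not a driver function; the names `bson_size` and `MAX_BSON_SIZE` are hypothetical, and only a handful of value types are handled.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # the 16 MB limit discussed above


def bson_size(document):
    """Estimate the BSON-encoded size in bytes of a dict."""
    return _document_size(document.items())


def _document_size(items):
    total = 4 + 1  # int32 length prefix + trailing 0x00
    for key, value in items:
        # type byte + key as a null-terminated UTF-8 string + value bytes
        total += 1 + len(str(key).encode("utf-8")) + 1 + _value_size(value)
    return total


def _value_size(value):
    if value is None:
        return 0
    if isinstance(value, bool):  # must be checked before int
        return 1
    if isinstance(value, int):
        return 4 if -2**31 <= value < 2**31 else 8  # int32 or int64
    if isinstance(value, float):
        return 8
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # length + bytes + 0x00
    if isinstance(value, dict):
        return _document_size(value.items())  # nested document counts in full
    if isinstance(value, (list, tuple)):
        # arrays are encoded as documents keyed "0", "1", ...
        return _document_size((str(i), v) for i, v in enumerate(value))
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


if __name__ == "__main__":
    post = {"title": "Hello", "comments": [{"user": "a", "text": "x" * 1000}]}
    size = bson_size(post)
    print(size, "bytes; fits in one document:", size <= MAX_BSON_SIZE)
```

Every comment appended to the `comments` array increases the size of the enclosing post document, which is exactly why an ever-growing embedded array eventually hits the limit.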
I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?
JSON is a text format. So the overhead is even greater if you are accessing your data through JSON and you have binary files, because they have to be encoded in uuencode, hexadecimal, or Base64. The conversion path might look like
binary file <> JSON (encoded) <> BSON (encoded)
It would be more efficient to put the path (URL) to the data file in your document and keep the data itself in binary.
If you really want to keep these files of unknown length in your DB, then you would probably be better off putting these in GridFS and not risking killing your concurrency when the large files are accessed.
Nested Depth for BSON Documents:
MongoDB supports no more than 100 levels of nesting for BSON documents.
For more info visit
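A quick way to check how deeply a document is nested before saving it is sketched below. The function name `nesting_depth` is hypothetical, and the counting convention (the top-level document is level 1, and each embedded document or array adds one level) is an assumption of this sketch; consult the MongoDB limits documentation for the exact rule.

```python
MAX_NESTING_DEPTH = 100  # the limit stated above


def nesting_depth(value):
    """Return how many levels of documents/arrays are nested in `value`.

    Scalars count as depth 0; a flat document counts as depth 1.
    """
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, (list, tuple)):
        children = value
    else:
        return 0
    return 1 + max((nesting_depth(child) for child in children), default=0)


if __name__ == "__main__":
    doc = {"post": {"comments": [{"replies": []}]}}
    depth = nesting_depth(doc)
    print(depth, "levels; within limit:", depth <= MAX_NESTING_DEPTH)
```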
According to https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
If you expect that a blog post may exceed the 16MB document limit, you should extract the comments into a separate collection, reference the blog post from each comment, and do an application-level join.
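A minimal sketch of that layout, using plain Python lists as stand-ins for the two collections so it runs without a database. The field name `post_id`, the sample data, and the helper `post_with_comments` are all hypothetical; with a real driver, each lookup would be a query against the corresponding collection.

```python
# Stand-ins for two collections: each comment references its post by id,
# so the post document itself never grows as comments are added.
posts = [
    {"_id": 1, "title": "Document size limits"},
    {"_id": 2, "title": "Schema design"},
]
comments = [
    {"_id": 101, "post_id": 1, "text": "Very helpful."},
    {"_id": 102, "post_id": 2, "text": "What about GridFS?"},
    {"_id": 103, "post_id": 1, "text": "Thanks!"},
]


def post_with_comments(post_id, posts, comments):
    """Application-level join: fetch the post, then its comments."""
    post = next((p for p in posts if p["_id"] == post_id), None)
    if post is None:
        return None
    related = [c for c in comments if c["post_id"] == post_id]
    # Build the combined view in memory; nothing is written back.
    return {**post, "comments": related}


if __name__ == "__main__":
    print(post_with_comments(1, posts, comments))
```

The trade-off is two lookups instead of one, in exchange for post documents that stay small no matter how many comments accumulate.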
Perhaps storing a blog post -> comments relation in a non-relational database is not really the best design.
You should probably store comments in a separate collection from blog posts anyway.
[edit]
See comments below for further discussion.