Why does ObjectId make sharding easier in MongoDB?

Posted on 2024-10-05 16:32:14

I keep reading that using an ObjectId as the unique key makes sharding easier, but I haven't seen a relatively detailed explanation as to why that is. Could someone shed some light on this?

The reason I ask is that I want to use an English string (which will obviously be unique) as the unique key, but I want to make sure that it won't tie my hands later on.


Comments (5)

眼波传意 2024-10-12 16:32:14

I've only recently been getting familiar with MongoDB myself, so take this with a grain of salt, but I suspect that sharding is probably more efficient when using ObjectId rather than your own key values, because part of the ObjectId points out which machine or shard the document was created on. The bottom of this page in the mongo docs explains what each portion of the ObjectId means.
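To make the "portions of the ObjectId" concrete, here is a minimal sketch that splits an ObjectId hex string into its fields without any MongoDB library. It assumes the 12-byte layout described in the current MongoDB documentation (4-byte timestamp, 5-byte per-process random value, 3-byte counter); older drivers filled the middle bytes with a machine identifier and process id instead. The function name `parse_object_id` and the sample id are made up for illustration. Note that in either layout the middle bytes identify the process that generated the id, not the shard that stores the document.

```python
from datetime import datetime, timezone


def parse_object_id(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into its three fields.

    Assumed layout: 4-byte big-endian timestamp, 5-byte random value,
    3-byte big-endian counter.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        # Creation time in whole seconds since the Unix epoch.
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        # Random value chosen once per generating process.
        "random": raw[4:9].hex(),
        # Incrementing counter that separates ids created in the same second.
        "counter": int.from_bytes(raw[9:12], "big"),
    }


# Hypothetical sample id, used only for illustration.
fields = parse_object_id("507f1f77bcf86cd799439011")
print(fields)
```

Because the timestamp occupies the leading bytes, ids created later compare greater than ids created earlier, which matters for the shard key discussion further down the thread.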

仲春光 2024-10-12 16:32:14

I asked this question on the Mongo user list, and basically the reply was that it's OK to generate your own _id values and that doing so will not make sharding more difficult. Sometimes I need numeric _id values, for example when I'm going to use them in a URL, so I generate my own _id in some collections.

黒涩兲箜 2024-10-12 16:32:14

ObjectId is designed to be globally unique. So when it is used as the primary key and a new record is added to the dataset without a primary key value, each shard can generate a new ObjectId without worrying about collisions with other shards. This somewhat simplifies life for everyone :)
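The "no coordination needed" point can be sketched with a small ObjectId-like generator. This is an illustration only, not the driver's actual implementation: the class name `ObjectIdLikeGenerator` is hypothetical, and the layout (timestamp, per-process random bytes, counter) is assumed from the MongoDB documentation. In practice the id is usually generated by the client driver before the insert is sent, but the idea is the same: every generator can work independently.

```python
import os
import threading
import time


class ObjectIdLikeGenerator:
    """Hypothetical stand-in for one process that creates ObjectId-style ids."""

    def __init__(self) -> None:
        # 5 random bytes chosen once per process keep separate processes apart.
        self._random = os.urandom(5)
        # The counter starts at a random value and wraps at 2**24.
        self._counter = int.from_bytes(os.urandom(3), "big")
        self._lock = threading.Lock()

    def new_id(self) -> str:
        with self._lock:
            self._counter = (self._counter + 1) % (1 << 24)
            counter = self._counter
        seconds = int(time.time()) & 0xFFFFFFFF
        raw = seconds.to_bytes(4, "big") + self._random + counter.to_bytes(3, "big")
        return raw.hex()


# Two generators stand in for two independent processes that never talk
# to each other.
process_a = ObjectIdLikeGenerator()
process_b = ObjectIdLikeGenerator()
ids = [gen.new_id() for gen in (process_a, process_b) for _ in range(1000)]
print(len(ids), len(set(ids)))
```

A collision between the two generators would require them to draw the same 5 random bytes, which is extremely unlikely, so no central id service is needed.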

淡淡的优雅 2024-10-12 16:32:14

A shard key does not have to be unique. We can't conclude that sharding a collection on the ObjectId is always efficient.

孤独陪着我 2024-10-12 16:32:14

Actually, ObjectID is probably a poor choice for a shard key.

From the docs (http://docs.mongodb.org/manual/core/sharded-cluster-internals/, the section on "Write Scaling"):

"[T]he most significant bits of [an ObjectID] represent a time stamp, which means that they increment in a regular and predictable pattern. [Therefore] all insert operations will be storing data into a single chunk, and therefore, a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster."

In other words, because every OID sorts "bigger" than the one created immediately before it, all inserts that are keyed by OID will land on the same machine, and the write I/O capacity of that one machine will be the total write I/O capacity of your entire cluster. (This is true not just of OIDs, but of any predictable key -- timestamps, autoincrementing numbers, etc.)

Contrariwise, if you chose a random string as your shard key, writes would tend to distribute evenly over the cluster, and your throughput would be the total I/O of the whole cluster.
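The contrast between the two kinds of key can be sketched with a toy range-partitioned cluster. Everything here is a simplifying assumption for illustration: four shards with fixed split points, integer keys standing in for shard key values, and no balancer. It is not how MongoDB itself is implemented, only a model of where writes land.

```python
import bisect
import random
from collections import Counter

KEY_SPACE = 1_000_000
# Three split points give four shards, numbered 0..3, each owning one key range.
SPLIT_POINTS = [250_000, 500_000, 750_000]


def shard_for(key: int, split_points: list) -> int:
    """Range partitioning: return the index of the shard whose range holds key."""
    return bisect.bisect_right(split_points, key)


def count_writes(keys, split_points: list) -> Counter:
    """Count how many inserts each shard receives."""
    return Counter(shard_for(key, split_points) for key in keys)


# Monotonically increasing keys (like fresh ObjectIds or timestamps):
# every new key is larger than all existing split points.
monotonic_keys = range(900_000, 901_000)

# Random keys spread over the whole key space (seeded for repeatability).
rng = random.Random(42)
random_keys = [rng.randrange(KEY_SPACE) for _ in range(1_000)]

monotonic_load = count_writes(monotonic_keys, SPLIT_POINTS)
random_load = count_writes(random_keys, SPLIT_POINTS)

print("monotonic:", dict(monotonic_load))
print("random:   ", dict(sorted(random_load.items())))
```

In this model all 1,000 monotonic inserts hit the last shard, while the random keys are split roughly evenly across the four shards, which is the effect the quoted documentation describes.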

(EDIT to be complete: with an OID shard key, as new records landed on the "rightmost" shard, the balancer would handle moving them elsewhere, so they would eventually end up on other machines. But that doesn't solve the I/O problem; it actually makes it worse.)
