Mongo sharding fails to split a large collection between shards

Posted 2024-09-18 16:15:35

I'm having problems with what seems to be a simple sharding setup in mongo.

I have two shards, a single mongos instance, and a single config server set up like this:

Machine A - 10.0.44.16 - config server, mongos
Machine B - 10.0.44.10 - shard 1
Machine C - 10.0.44.11 - shard 2

I have a collection called 'Seeds' that has a shard key 'SeedType' which is a field that is present on every document in the collection, and contains one of four values (take a look at the sharding status below). Two of the values have significantly more entries than the other two (two of them have 784,000 records each, and two have about 5,000).

The behavior I'm expecting to see is that records in the 'Seeds' collection with InventoryPOS will end up on one shard, and the ones with InventoryOnHand will end up on the other.

However, all records for both of the larger shard key values seem to end up on the primary shard.

Here's my sharding status text (other collections removed for clarity):

--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard0000", "host" : "10.44.0.11:27019" }
      { "_id" : "shard0001", "host" : "10.44.0.10:27017" }
  databases:
        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
        { "_id" : "TimMulti", "partitioned" : true, "primary" : "shard0001" }
                TimMulti.Seeds chunks:
                        { "SeedType" : { $minKey : 1 } } -->> { "SeedType" : "PBI.AnalyticsServer.KPI" } on : shard0000 { "t" : 2000, "i" : 0 }
                        { "SeedType" : "PBI.AnalyticsServer.KPI" } -->> { "SeedType" : "PBI.Retail.InventoryOnHand" } on : shard0001 { "t" : 2000, "i" : 7 }
                        { "SeedType" : "PBI.Retail.InventoryOnHand" } -->> { "SeedType" : "PBI.Retail.InventoryPOS" } on : shard0001 { "t" : 2000, "i" : 8 }
                        { "SeedType" : "PBI.Retail.InventoryPOS" } -->> { "SeedType" : "PBI.Retail.SKU" } on : shard0001 { "t" : 2000, "i" : 9 }
                        { "SeedType" : "PBI.Retail.SKU" } -->> { "SeedType" : { $maxKey : 1 } } on : shard0001 { "t" : 2000, "i" : 10 }

Am I doing anything wrong?

Semi-unrelated question:

What is the best way to atomically transfer an object from one collection to another without blocking the entire mongo service?

Thanks in advance,
-Tim

Comments (1)

海之角 2024-09-25 16:15:35

Sharding really isn't meant to be used this way. You should choose a shard key with some variation (or make a compound shard key) so that MongoDB can make reasonable-size chunks. One of the points of sharding is that your application doesn't have to know where your data is.
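To see why a key with only four values cannot spread out, here is a toy model in plain JavaScript (not MongoDB code). It assumes that a chunk boundary has to be a shard key value, so the number of distinct key values caps the number of chunks. The sample data, the scaled-down counts, and the compound key on SeedType plus _id are illustrative assumptions, not something taken from the original setup.

```javascript
// Toy model, not MongoDB code: a chunk boundary must be a shard key value,
// so the number of distinct key values caps the number of chunks.

// Serialize a shard key tuple so tuples can be compared for uniqueness.
function keyOf(doc, fields) {
  return JSON.stringify(fields.map((f) => doc[f]));
}

// Upper bound on chunks: one boundary per distinct key value, plus the
// range that starts at $minKey.
function maxChunks(docs, fields) {
  const distinct = new Set(docs.map((d) => keyOf(d, fields)));
  return distinct.size + 1;
}

// Hypothetical sample shaped like the Seeds collection (counts scaled down).
const counts = {
  "PBI.AnalyticsServer.KPI": 5,
  "PBI.Retail.InventoryOnHand": 784,
  "PBI.Retail.InventoryPOS": 784,
  "PBI.Retail.SKU": 5,
};
const docs = [];
let nextId = 0;
for (const [seedType, n] of Object.entries(counts)) {
  for (let i = 0; i < n; i++) docs.push({ _id: nextId++, SeedType: seedType });
}

console.log(maxChunks(docs, ["SeedType"]));        // 5, matching the five chunks in the status output
console.log(maxChunks(docs, ["SeedType", "_id"])); // 1579
```

With SeedType alone, each large value is stuck in a single chunk that can never be split further; adding a second, high-variation field gives the splitter many more places to cut.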

If you want to manually shard, you should do that: start unlinked MongoDB servers and route things yourself from the client side.

Finally, if you're really dedicated to this setup, you could migrate the chunk yourself (there's a moveChunk command).
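As a rough sketch, the moveChunk admin command takes the namespace, a query that identifies the chunk, and the destination shard. The namespace, key value, and shard name below are taken from the status output in the question; choosing the InventoryPOS chunk as the one to move is an assumption, and moveChunkCommand is a hypothetical helper that only builds the command document.

```javascript
// Build the admin command document for moving one chunk.
// moveChunkCommand is a hypothetical helper; the field names follow the
// moveChunk admin command (moveChunk, find, to).
function moveChunkCommand(namespace, find, toShard) {
  return { moveChunk: namespace, find: find, to: toShard };
}

// Assumption: move the chunk that holds the InventoryPOS records to shard0000.
const cmd = moveChunkCommand(
  "TimMulti.Seeds",
  { SeedType: "PBI.Retail.InventoryPOS" },
  "shard0000"
);
console.log(JSON.stringify(cmd));

// In a mongo shell connected to the mongos, this would be sent with:
//   db.adminCommand(cmd)
```

The find document only has to match a shard key value inside the target chunk; the whole chunk moves, not just the matching documents.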

The balancer moves chunks based on how much is mapped in memory (run serverStatus and look at the "mapped" field). It can take a while: MongoDB doesn't want your data flying all over the place in production, so it's pretty conservative.

Semi-unrelated answer: you can't do it atomically with sharding (eval isn't atomic across multiple servers). You'll have to do a findOne, insert, remove.
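A minimal sketch of that three-step move, assuming collection objects that expose the legacy mongo shell methods findOne, insert, and remove. The moveDocument and fakeCollection helpers and the pending/done names are hypothetical; the in-memory stand-in matches on _id only and exists so the sketch runs without a server.

```javascript
// Non-atomic move: copy the document to the target, then delete the original.
// `source` and `target` are assumed to expose the legacy mongo shell methods
// findOne, insert, and remove.
function moveDocument(source, target, query) {
  const doc = source.findOne(query);
  if (doc === null) return null;      // nothing matched
  target.insert(doc);                 // step 1: copy (keeps the same _id)
  source.remove({ _id: doc._id });    // step 2: delete the original
  return doc;
}

// Minimal in-memory stand-in for a collection (hypothetical, _id lookups only).
function fakeCollection(initial) {
  const docs = new Map(initial.map((d) => [d._id, d]));
  return {
    findOne: (q) => (docs.has(q._id) ? docs.get(q._id) : null),
    insert: (d) => { docs.set(d._id, d); },
    remove: (q) => { docs.delete(q._id); },
    count: () => docs.size,
  };
}

const pending = fakeCollection([{ _id: 1, SeedType: "PBI.Retail.SKU" }]);
const done = fakeCollection([]);
moveDocument(pending, done, { _id: 1 });
console.log(pending.count(), done.count()); // 0 1
```

Inserting before removing means a failure between the two steps leaves the document in both collections rather than in neither, so readers and retries have to tolerate a temporary duplicate.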
