MongoDB index state after a crash, and the phases of an index build

Posted on 2025-01-01 04:15:24, 893 words, 6 views, 0 comments

This question has two aspects, both related to indices.

I have a dataset with 530 million entries; each entry has an array of 10 elements. I am using a single mongod, and I am building an index on the array after a bulk insert. The array has two key-value pairs of type string-int.

I have already deduced/researched that creating the index before loading the data is what MongoDB is designed for, and that such large datasets cannot be indexed after insertion without a massive amount of RAM or swappable virtual memory.

One: phases of index construction

What are the phases of index construction? Watching the log, I saw the progress go from 0 to 100% once, only to start counting again once it reached 100% (something to do with sorting?). The second phase was MUCH slower than the first. Are there any more passes that need to be done?

Two: index state

I wasn't going to watch the index construction at this rate, and I have an indexed dataset as a backup (which I can't trust anymore, keep reading). So I kill -9'd the process. I started the process again, and the logs show the database acknowledging that an index build operation was in progress and ended incorrectly, but nothing beyond this. The index shows up in the db.<db-name>.getIndexes() list.

I find this VERY odd, especially the getIndexes bit. I know for a fact that index construction in this case never finished, and now I can't trust the backups I have, in which I believed indexing had finished OK.

I at least expect a database platform to be in a consistent state, or to get to one before it hands me control. So it should either roll back the index construction, finish it, or refuse to start without a recovery operation.

So how do I find out if my database is in a consistent state, specifically the indices?

心欲静而疯不止 2025-01-08 04:15:24

So how do I find out if my database is in a consistent state, specifically the indices?

For this, there is a validate command. It is a blocking command, like repair, but it looks like it has a few options.
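As a rough illustration of how the validate command could be run across a whole database, here is a minimal Python sketch. It assumes a `db` object that behaves like a PyMongo `Database` (offering `list_collection_names()` and `command()`); the helper names `validate_all` and `invalid_collections` are made up for this example, and the `errors` field is read defensively because the exact result fields can differ between server versions.

```python
from typing import Any, Dict, List


def validate_all(db: Any, full: bool = True) -> Dict[str, Dict[str, Any]]:
    """Run the `validate` command on every collection of `db`.

    `db` is assumed to behave like a PyMongo `Database`: it must provide
    `list_collection_names()` and `command(name, value, **kwargs)`.
    """
    report: Dict[str, Dict[str, Any]] = {}
    for name in db.list_collection_names():
        # Sends {"validate": <collection name>, "full": <bool>} to the server.
        result = db.command("validate", name, full=full)
        report[name] = {
            # A missing "valid" field is treated as a failure, to be safe.
            "valid": bool(result.get("valid", False)),
            "errors": list(result.get("errors", [])),
        }
    return report


def invalid_collections(report: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the sorted names of collections that did not validate."""
    return sorted(name for name, entry in report.items() if not entry["valid"])
```

With PyMongo installed and a server running, something like `validate_all(MongoClient()["mydb"])` would be the intended call, where `mydb` is a placeholder database name. Because validate blocks, on a dataset of this size it is probably best run during a maintenance window.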

So it should either roll back the index construction, finish it, or refuse to start without a recovery operation.

Agreed. And the logs should be crystal clear about the state of the DB when it is restarted. However, MongoDB is definitely not "there" yet.

The second phase was MUCH slower than the first. Are there any more passes that need to be done?

Indeed, once it finishes the second phase, the DB locks and performs a giant fsync as it flushes the newly created index to disk. It was probably at this point when you killed it.

The last time I watched this process happen, there was no log message during the fsync. Given the size of your data, this will represent gigs and gigs of data flushing to the disk. Run some math on the speed of your drives vs. the size of the index, but this phase could definitely represent a lot of waiting time.
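To make that math concrete, here is a back-of-the-envelope sketch in Python. The document count and array length come from the question; the bytes-per-key figure and the sustained disk write speed are pure assumptions for illustration, so plug in your own measurements. It relies on the fact that an index on an array field stores one key per array element.

```python
def multikey_index_keys(documents: int, array_elements: int) -> int:
    """An index on an array field holds one key per array element."""
    return documents * array_elements


def estimate_flush_seconds(index_bytes: float, disk_bytes_per_second: float) -> float:
    """Lower bound on the time needed to write `index_bytes` to disk."""
    if disk_bytes_per_second <= 0:
        raise ValueError("disk_bytes_per_second must be positive")
    return index_bytes / disk_bytes_per_second


# Figures from the question.
DOCUMENTS = 530_000_000
ARRAY_ELEMENTS = 10

# Assumptions, not measurements: adjust both to your own setup.
ASSUMED_BYTES_PER_KEY = 40
ASSUMED_DISK_BYTES_PER_SECOND = 100_000_000  # roughly 100 MB/s sustained

keys = multikey_index_keys(DOCUMENTS, ARRAY_ELEMENTS)
index_bytes = keys * ASSUMED_BYTES_PER_KEY
seconds = estimate_flush_seconds(index_bytes, ASSUMED_DISK_BYTES_PER_SECOND)

print(f"{keys:,} keys, about {index_bytes / 1e9:.0f} GB, "
      f"about {seconds / 60:.0f} minutes to flush")
```

With these made-up inputs the flush alone comes to roughly half an hour, and that is an optimistic floor: it ignores seek overhead and any other I/O competing for the disk.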
