MongoDB index state after a crash, and index build phases
This question has two aspects, both related to indices.
I have a dataset with 530 million entries; each entry has an array of 10 elements. I am using a single mongod. I am building an index on the array after a bulk insert. The array has two key-value pairs of type string - int.
I have already deduced/researched that creating the index before inserting the data is what MongoDB is designed for, and that such large datasets cannot be indexed after insertion without a massive amount of RAM/swappable virtual memory.
One: phases of index construction
What are the phases of index construction? I was looking at the log and saw it go from 0 to 100% once, only to start counting again once it reached 100% (something to do with sorting?). The second phase was MUCH slower than the first. Are there any more passes that need to be done?
Two: index state
I wasn't going to wait for the index construction at this rate, and I have an indexed dataset as a backup (which I can't trust anymore, keep reading). So I kill -9'd the process. I started the process again, and the logs show the database acknowledging that an index build operation was in progress and ended incorrectly, but nothing beyond this. The index shows up in the db.<db-name>.getIndexes() list.
I find this VERY odd, especially the getIndexes bit. I know for a fact that index construction in this case never finished, and now I can't trust the backups I have in which I believe indexing ended OK.
I at least expect a database platform to be in a consistent state, or to get to one before it passes me control. So it should either roll back the index construction, finish it, or refuse to start without a recovery operation.
So how do I find out if my database is in a consistent state, specifically the indices?
Comments (1)
For this, there is a validate command. The command is a blocking command, like repair, but it looks like it has a few options.
Agreed. And the logs should be crystal clear about the state of the DB when it is restarted. However, MongoDB is definitely not "there" yet.
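As a rough illustration of the validate suggestion, here is a minimal sketch of a helper that runs a full validation on one collection and condenses the result. It assumes a reasonably recent shell in which validate accepts an options document and returns a document with valid, errors and keysPerIndex fields; older versions report the result differently. The function name summarizeValidation and the collection name in the usage comment are hypothetical.

```javascript
// Sketch only: summarize the result of a full validate on one collection.
// `db` is the database handle provided by the mongo shell; `collName` is a
// collection name chosen by the caller (hypothetical in this example).
function summarizeValidation(db, collName) {
  // validate() is a blocking operation: it scans the collection's data
  // and its indexes, so expect it to take a while on a large dataset.
  const res = db.getCollection(collName).validate({ full: true });
  return {
    collection: collName,
    // Assumption: the result document carries a boolean `valid` field.
    valid: res.valid === true,
    // Assumption: problems found are listed in an `errors` array.
    errors: res.errors || [],
    // Assumption: `keysPerIndex` maps each index name to its key count.
    keysPerIndex: res.keysPerIndex || {},
  };
}

// Usage inside the shell (collection name is an example):
//   printjson(summarizeValidation(db, "entries"));
```

Comparing the index names in keysPerIndex with what getIndexes() lists is one way to see whether an index that appears in the catalog was actually checked and populated.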
Indeed, once the second phase is done, the DB locks and performs a giant fsync as it flushes the newly created index to disk. It was probably here when you killed it.
The last time I watched this process happen, there was no log message during the fsync. Given the size of your data, this will represent gigs and gigs of data flushing to the disk. Run some math on the speed of your drives vs. the size of the index, but this phase could definitely represent a lot of waiting time.
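The "run some math" step can be sketched as a back-of-the-envelope calculation. The entry count and array length come from the question; the bytes-per-key figure and the drive write speed are assumed round numbers for illustration only, and the function names are hypothetical. The estimate treats the flush as one sequential write, so it is a best case, not a prediction.

```javascript
// Sketch only: rough size of the index and time to flush it to disk.
// Assumes one index key per array element (multikey index).
function estimateIndexBytes(entries, keysPerEntry, bytesPerKey) {
  return entries * keysPerEntry * bytesPerKey;
}

// Time to write `indexBytes` at a sustained sequential write speed.
function estimateFlushSeconds(indexBytes, writeBytesPerSec) {
  if (!(writeBytesPerSec > 0)) {
    throw new RangeError("writeBytesPerSec must be positive");
  }
  return indexBytes / writeBytesPerSec;
}

// 530 million entries and 10 elements per array are from the question.
// 40 bytes per key and 100 MB/s are ASSUMED values, not measurements.
const indexBytes = estimateIndexBytes(530000000, 10, 40);
const flushSeconds = estimateFlushSeconds(indexBytes, 100 * 1000 * 1000);
console.log("index size (GB):", indexBytes / 1e9);          // 212
console.log("flush time (minutes):", flushSeconds / 60);
```

With these assumed figures the index is on the order of 200 GB and the flush alone takes over half an hour, with no log output in the meantime. Plug in the actual index size and measured drive throughput to get a figure for a real deployment.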