Are read and insert operations faster than a dump in MongoDB?
I need to clean a 200 TB MongoDB collection by deleting documents with older timestamps. I am trying to build a new collection from the current one and then run a delete query, since running a delete on the collection that is currently in use would slow down the other requests to it. I have thought of cloning a new collection either by taking a dump of the existing collection, or by creating a read-and-write script that reads from the present collection and writes to the cloned collection. My question is: is a batched read/write operation (e.g. 1000 documents read and written at a time) faster than a dump?

EDIT:
I found this, this and this article, and want to know whether writing a script in the above-mentioned way is the same as creating an ssh pipe of reads and writes. For example, is a node/python script that fetches 1000 rows from a collection and inserts them into a clone collection the same as ssh *** ". /etc/profile; mongodump -h sourceHost -d yourDatabase … | mongorestore -h targetHost -d yourDatabase"?
1 Answer
I would suggest this approach: use mongoexport/mongoimport to import the valid data, i.e. skip the outdated documents. Yes, in general mongodump/mongorestore might be faster; however, with mongoexport you can define a query and limit the data which is exported. It could look like this:
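Here is a minimal sketch, assuming the timestamp field is called ts, the cutoff is 2023-01-01, and yourCollection/yourCollectionClone are the source and target collection names (field name, cutoff, hosts, and collection names are all placeholders to adjust):

# export only the documents you want to keep (ts, cutoff, and names are placeholders)
mongoexport -h sourceHost -d yourDatabase -c yourCollection \
  --query '{ "ts": { "$gte": { "$date": "2023-01-01T00:00:00Z" } } }' \
| mongoimport -h targetHost -d yourDatabase -c yourCollectionClone

mongoexport streams the matching documents as JSON to stdout and mongoimport reads them from stdin, so no intermediate dump file is needed.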
Utilize the parameter numInsertionWorkers to run multiple insert workers; it will speed up your inserts.
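For example, the mongoimport side of the pipe above could become (8 workers is only an illustrative value; tune it to your hardware):

# run 8 parallel insertion workers (illustrative value)
mongoimport -h targetHost -d yourDatabase -c yourCollectionClone \
  --numInsertionWorkers 8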
So you run a sharded cluster? If yes, then you should use sh.splitAt() on the new collection; see How to copy a collection from one database to another in MongoDB.