How to delete files from MongoDB's GridFS using the Mongo console (in a transaction-like way)
I need to delete a bunch of files stored in Mongo's GridFS that are older than a given date. This implies deleting from both the fs.files and fs.chunks collections.
I was thinking of writing a function that finds all documents in fs.files that match the search criteria, iterates over them, and in the loop deletes all documents from fs.chunks whose files_id matches the _id of the corresponding document in fs.files. However, according to the docs:
MongoDB supports atomic operations on single documents. MongoDB does not support traditional locking and complex transactions.
This makes me think my suggested approach might not be the correct one.
I know I could use one of the client drivers to manipulate the GridFS directly. For example, using PHP I could implement it like this:
<?php
// Remove every file uploaded before 2011-02-01; the driver deletes the chunks as well.
$grid = $db->getGridFS('fs');
$grid->remove(array(
    "uploadDate" => array(
        '$lt' => new MongoDate(strtotime("2011-02-01 00:00:00"))
    )
));
...however, I wish to accomplish this using only the mongo console.
So, what is the preferred way to remove files that match certain criteria on a field in fs.files from both collections, using the mongo console or a JS file fed to the console?
Comments (1)
Your suggested approach is mostly correct. The only pointer to a file's contents is in fs.files, so if you delete that first and something goes wrong deleting the chunks, at least nothing will be hanging around that could reference those chunks.
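For illustration, the ordering described above (fs.files entry first, then its chunks) could be sketched for the mongo shell roughly as follows. This is a minimal sketch, not a definitive implementation: deleteFilesOlderThan is a hypothetical helper name, and it assumes the default fs bucket and a shell that provides deleteOne and deleteMany (on older shells, remove takes the same filter).

```javascript
// Hypothetical helper (not part of MongoDB): delete GridFS files uploaded
// before `cutoff`. `db` is the shell's database handle; the default "fs"
// bucket is assumed.
function deleteFilesOlderThan(db, cutoff) {
  var files = db.getCollection("fs.files");
  var chunks = db.getCollection("fs.chunks");

  // Collect the matching _ids up front so both deletes work from the same set.
  var ids = files
    .find({ uploadDate: { $lt: cutoff } }, { _id: 1 })
    .toArray()
    .map(function (doc) { return doc._id; });

  ids.forEach(function (id) {
    // Remove the fs.files entry first: once it is gone, nothing references
    // the chunks, so a failure below only leaves unreachable orphans.
    files.deleteOne({ _id: id });
    chunks.deleteMany({ files_id: id });
  });

  return ids.length; // number of files removed
}
```

In the shell this might be called as deleteFilesOlderThan(db, new Date("2011-02-01T00:00:00Z")). The two deletes are still not atomic; the ordering only ensures that an interruption leaves orphaned chunks rather than a file entry pointing at missing data.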
To be on the safe side, you could have a background job that goes through the files_id field of the chunks collection (which is indexed, so it should be fast) and makes sure all files_ids match up with docs in the fs.files collection. If they don't and they weren't created relatively recently (you can get the timestamp they were created at from the _id field), you can delete them. (You wouldn't want to delete recently created chunks, in case they are part of a file currently being inserted.)
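A background cleanup along those lines might look something like the sketch below. It rests on several assumptions: removeOrphanChunks, minAgeMs and now are hypothetical names introduced here, chunk _id values are taken to be ObjectIds (so getTimestamp() is available), and the set of distinct files_id values is assumed small enough to be returned in one call.

```javascript
// Hypothetical helper (not part of MongoDB): delete chunks whose files_id has
// no matching fs.files document, skipping chunks younger than `minAgeMs`.
// Assumes chunk _ids are ObjectIds, whose getTimestamp() gives creation time.
function removeOrphanChunks(db, minAgeMs, now) {
  var files = db.getCollection("fs.files");
  var chunks = db.getCollection("fs.chunks");
  var cutoff = now.getTime() - minAgeMs;
  var removed = 0;

  chunks.distinct("files_id").forEach(function (fileId) {
    // A chunk is only an orphan if no fs.files document owns it.
    if (files.findOne({ _id: fileId }) !== null) {
      return;
    }
    chunks.find({ files_id: fileId }, { _id: 1 }).toArray().forEach(function (chunk) {
      // Leave recent chunks alone: they may belong to an upload in progress
      // whose fs.files document has not been written yet.
      if (chunk._id.getTimestamp().getTime() < cutoff) {
        chunks.deleteOne({ _id: chunk._id });
        removed += 1;
      }
    });
  });

  return removed; // number of chunk documents deleted
}
```

Something like removeOrphanChunks(db, 24 * 60 * 60 * 1000, new Date()) would then leave any chunk created in the last day untouched; the right grace period depends on how long uploads can take.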