How to handle database purging in MongoDB
I use MongoDB to store 30 days of data, which arrives as a stream. I am looking for a purging mechanism that lets me throw away the oldest data to make room for new data. I used to use MySQL, where I handled this with partitions: I kept 30 date-based partitions, dropped the oldest one, and created a new partition to hold new data.
When I map the same idea to MongoDB, I am inclined to use date-based shards. The problem is that this distributes my data badly. If all the new data lands on the same shard, that shard will be very hot because many people access it, while the shards holding older data will see far less load.
I could purge per collection instead: keep 30 collections and drop the oldest one to accommodate new data. But there are a couple of problems: 1) if I make the collections smaller, I cannot benefit much from sharding, since sharding is done per collection; 2) my queries have to change to query all 30 collections and take a union.
Please suggest a good purging mechanism (if any) for this situation.
Comments (4)
There are really only three ways to do purging in MongoDB. It looks like you've already identified several of the trade-offs.
Option #1: single collection
pros
cons
Option #2: collection per day
pros
collection.drop() is very fast.
cons
Option #3: database per day
pros
cons
Now there is an option #4, but it is not a general solution. I know of some people who did "purging" by simply using Capped Collections. There are definitely cases where this works, but it has a bunch of caveats, so you really need to know what you're doing.
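As a rough illustration of option #2, the sketch below names one collection per day and works out which ones have aged past the 30-day window from the question. The `events_` prefix and the helper names are made up for this example, and `db` is assumed to be a pymongo `Database` (only `list_collection_names()` and `drop_collection()` are used). Treat it as a starting point, not a tested production routine.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30   # from the question: keep 30 days of data
PREFIX = "events_"    # hypothetical collection name prefix


def collection_name(day: date) -> str:
    """Name of the collection holding one day's data, e.g. events_20240131."""
    return PREFIX + day.strftime("%Y%m%d")


def is_daily_collection(name: str) -> bool:
    """True only for names of the form PREFIX + 8 digits."""
    suffix = name[len(PREFIX):]
    return name.startswith(PREFIX) and len(suffix) == 8 and suffix.isdigit()


def expired_collections(existing_names, today: date):
    """Return the daily collections that fall outside the retention window."""
    oldest_kept = collection_name(today - timedelta(days=RETENTION_DAYS - 1))
    # Suffixes are zero-padded YYYYMMDD, so string order matches date order.
    return sorted(
        name for name in existing_names
        if is_daily_collection(name) and name < oldest_kept
    )


def purge(db, today: date):
    """Drop every expired daily collection and return the dropped names.

    `db` is assumed to be a pymongo Database (or anything with the same
    list_collection_names() / drop_collection() methods).
    """
    dropped = expired_collections(db.list_collection_names(), today)
    for name in dropped:
        db.drop_collection(name)
    return dropped
```

Writers would insert into `db[collection_name(date.today())]`, and a daily job would call `purge(db, date.today())`. Queries spanning several days still have to visit each daily collection, which is the drawback the question already points out.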
From MongoDB 2.2 onward you can set a TTL (time to live) on a collection, using a TTL index on a date field. MongoDB will then expire old data from the collection for you.
See: http://docs.mongodb.org/manual/tutorial/expire-data/
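A minimal sketch of what that could look like from Python, assuming `collection` is a pymongo `Collection` and that each document carries a date field. The field name `createdAt` and the helper name are assumptions made for this example; the 30-day figure comes from the question.

```python
RETENTION_SECONDS = 30 * 24 * 60 * 60   # 30 days, as in the question


def ensure_ttl_index(collection, field="createdAt", seconds=RETENTION_SECONDS):
    """Create a TTL index so documents expire `seconds` after the date in `field`.

    `collection` is assumed to be a pymongo Collection; `field` (a hypothetical
    name here) must hold a BSON date for the TTL monitor to act on it.
    """
    return collection.create_index(field, expireAfterSeconds=seconds)
```

Expired documents are removed by a background task that runs periodically, so they can linger a little past their expiry time. The removals are ordinary deletes, not a cheap drop of a whole collection.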
I had a similar situation and this page helped me out, especially the "Helpful Scripts" section at the bottom. http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
Better to keep one server as an archive.
Purge at 15-day intervals.
Delete old data from the archive.
Build the archive with more data partitions.