MongoDB Map-Reduce is slow and runs out of memory
I would like to use MongoDB as the backend for the analytics system I am building.
One of the main advantages of using MongoDB is the built-in map reduce.
Since we are at "medium data" scale, we do not yet need the overhead of Hadoop.
For testing purposes I inserted 50 million rows of the type
{
  user_id: xxxx,
  thing_id: xxxx,
  time: xxx
}
with an index on user_id, on an EC2 Large instance. It is a single-instance MongoDB (not sharded).
db.user_thing_like.find({user_id: 37104857})
takes less than a second.
However, a mapReduce in which I wanted to count the number of user entries took all night and returned an out-of-memory error. Either I am doing something stupid, or MongoDB is not the right tool for what I want to do.
I am new to MongoDB and would appreciate any help. Thanks in advance.
ERROR:
Tue Aug 9 13:15:58 uncaught exception: map reduce failed:{
"assertion" : "invoke failed: JS Error: out of memory nofile_b:2",
"assertionCode" : 9004,
"errmsg" : "db assertion failure",
"ok" : 0
}
MAPREDUCE QUERY:
db.user_thing_like.mapReduce(map, reduce, {out: "tmp_test"}, {query: {"user_id" : 37104857 }});
MAP AND REDUCE:
map = function () {
  for (var key in this) {
    emit(key.user_id, {count: 1});
  }
};
reduce = function (key, emits) {
  total = 0;
  for (var i in emits) {
    total += emits[i].count;
  }
  return {"count": total};
}
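A note on the map function above: in `for (var key in this)`, `key` is a field-name string, so `key.user_id` is `undefined`, and the loop emits once per field rather than once per document. Every emit therefore lands under the same `undefined` key, which may contribute to the memory pressure. Below is a minimal sketch of a per-user count, under stated assumptions: the `emit` stand-in, the `runMapReduce` harness, and the sample documents are hypothetical and exist only so the logic can be run outside MongoDB.

```javascript
// Hypothetical in-memory stand-in for MongoDB's emit(); not a driver API.
var emitted = {};
function emit(key, value) {
  if (!Object.prototype.hasOwnProperty.call(emitted, key)) {
    emitted[key] = [];
  }
  emitted[key].push(value);
}

// Emit once per document, keyed by the document's own user_id.
var map = function () {
  emit(this.user_id, { count: 1 });
};

// Sum the partial counts. The return value has the same shape as the
// emitted values, so a result can safely be passed through reduce again.
var reduce = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
};

// Hypothetical harness: applies map to each document, then reduce per key.
function runMapReduce(docs) {
  emitted = {};
  docs.forEach(function (doc) { map.call(doc); });
  var results = {};
  Object.keys(emitted).forEach(function (key) {
    results[key] = reduce(key, emitted[key]);
  });
  return results;
}

// Sample documents invented for illustration only.
var sample = [
  { user_id: 1, thing_id: 10, time: 100 },
  { user_id: 1, thing_id: 11, time: 101 },
  { user_id: 2, thing_id: 10, time: 102 }
];
var counts = runMapReduce(sample);
```

Only the `map` and `reduce` functions would be handed to MongoDB; the rest is scaffolding for trying the logic locally.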
--- UPDATE ---
I realized that mapReduce was not applying my query filter with the syntax I used.
Here is the correct mapReduce query:
db.runCommand({mapreduce: "user_thing_like", map: map, reduce: reduce, out: "tmp_test", query: {"user_id" : 37104857 }});
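In the original call the query sat in a fourth argument, which the `mapReduce` shell helper does not read, so the job ran over all 50 million documents. As an alternative to `runCommand`, the helper takes all options in a single object as its third argument. A sketch, assuming `map` and `reduce` are defined as in the question; `buildOptions` is a hypothetical helper name, and the `sort` entry follows the suggestion in the comments to sort on the indexed `user_id` field.

```javascript
// Hypothetical helper: collects every mapReduce option in ONE object.
function buildOptions(userId) {
  return {
    out: "tmp_test",              // output collection, as in the question
    query: { user_id: userId },   // filter applied before map runs
    sort: { user_id: 1 }          // sort on the indexed emit key
  };
}

var options = buildOptions(37104857);

// In the mongo shell (not runnable outside it):
// db.user_thing_like.mapReduce(map, reduce, options);
```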
Comments (2)
Also, try specifying user_id as the sort key for the MapReduce, from the manual:

I realized that mapReduce was not applying my query filter with the syntax I used.
Here is the correct mapReduce query.