MongoDB Map-Reduce is slow and runs out of memory

Posted 2024-11-28 11:24:50

I would like to use MongoDB as the backend for the analytics system I am building.
One of the main advantages of using MongoDB is the built-in map reduce.
Since we are at "medium data" scale, we do not yet need the overhead of Hadoop.

For testing purposes I inserted 50 million rows of the form

{
 user_id: xxxx,
 thing_id:xxxx,
 time: xxx
}

with an index on user_id, on an EC2 Large instance. It's a single-instance MongoDB (not sharded).

db.user_thing_like.find({user_id: 37104857})

takes less than a second.

However, a mapreduce that was meant to count the number of entries for a user took all night and failed with an out-of-memory error. Either I am doing something stupid, or MongoDB is not the right tool for what I want to do.

I am new to MongoDB and would appreciate any help. Thanks in advance.

ERROR:

Tue Aug  9 13:15:58 uncaught exception: map reduce failed:{
        "assertion" : "invoke failed: JS Error: out of memory nofile_b:2",
        "assertionCode" : 9004,
        "errmsg" : "db assertion failure",
        "ok" : 0
}

MAPREDUCE QUERY:

db.user_thing_like.mapReduce(map, reduce, {out: "tmp_test"}, {query: {"user_id" : 37104857 }});

MAP AND REDUCE:

map = function () {
    for (var key in this) {
        emit(key.user_id, {count: 1});
    }
};

reduce = function (key, emits) {
    total = 0;
    for (var i in emits) {
        total += emits[i].count;
    }
    return {"count": total};
}

--- UPDATE ---

I realized that, with the syntax I used, the mapreduce was not applying my query filter.

Here is the correct mapreduce query.

db.runCommand({mapreduce: "user_thing_like", map: map, reduce: reduce, out: "tmp_test", query: {"user_id" : 37104857 }});


左岸枫 2024-12-05 11:24:50

map = function () {
    emit(this.user_id, {count: 1});
};

Also, try specifying user_id as the sort key for the MapReduce. From the manual:

sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>
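
To see why the map function from the question needs this change, here is a minimal sketch that runs both versions outside MongoDB. The emit stand-in, the runMap helper, and the sample documents are assumptions made up for illustration; they are not part of MongoDB.

```javascript
// Stand-in for MongoDB's emit(): records each key/value pair (illustration only).
var emitted = [];
function emit(key, value) {
    emitted.push({key: key, value: value});
}

// Hypothetical helper: run a map function over documents the way the server
// would, with `this` bound to each document in turn.
function runMap(map, docs) {
    emitted = [];
    docs.forEach(function (doc) {
        map.call(doc);
    });
    return emitted;
}

// Made-up sample documents in the shape described in the question.
var docs = [
    {user_id: 37104857, thing_id: 1, time: 100},
    {user_id: 37104857, thing_id: 2, time: 200}
];

// Map from the question: `key` is a field name (a string), so key.user_id
// is undefined and one pair is emitted per field, not per document.
var originalMap = function () {
    for (var key in this) {
        emit(key.user_id, {count: 1});
    }
};

// Corrected map: one pair per document, keyed by the user id.
var fixedMap = function () {
    emit(this.user_id, {count: 1});
};

var fromOriginal = runMap(originalMap, docs);
var fromFixed = runMap(fixedMap, docs);

console.log(fromOriginal.length, fromOriginal[0].key); // 6 undefined
console.log(fromFixed.length, fromFixed[0].key);       // 2 37104857
```

In a real collection each document also carries an _id field, so the original map would emit once more per document, still under the undefined key. Piling every value onto a single key plausibly adds to the memory pressure described in the question, though the sketch does not measure that.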


青衫负雪 2024-12-05 11:24:50

I realized that, with the syntax I used, the mapreduce was not applying my query filter.

Here is the correct mapreduce query.

db.runCommand({mapreduce: "user_thing_like", map: map, reduce: reduce, out: "tmp_test", query: {"user_id" : 37104857 }});
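
The shell helper can likely be used as well, as long as the query goes into the same options object as out rather than into a fourth argument. The sketch below assumes a legacy mongo shell where db.collection.mapReduce is still available (it is deprecated in recent MongoDB releases). The wrapper name countLikesForUser is hypothetical, and the sort option follows the suggestion in the other answer.

```javascript
// Hypothetical wrapper; `db` is passed in so the sketch does not depend on
// a live shell session.
function countLikesForUser(db, userId) {
    // One pair per matching document, keyed by the user id.
    var map = function () {
        emit(this.user_id, {count: 1});
    };

    // Sum the partial counts; the result has the same shape as the emitted
    // values, so it stays correct if reduce is applied more than once.
    var reduce = function (key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++) {
            total += values[i].count;
        }
        return {count: total};
    };

    // All options, including the query filter, go in the third argument.
    return db.user_thing_like.mapReduce(map, reduce, {
        out: "tmp_test",
        query: {user_id: userId},
        sort: {user_id: 1}
    });
}
```

In the shell this would be called as countLikesForUser(db, 37104857), with the result written to the tmp_test collection as in the runCommand form.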
