How do I group on more than 20,000 unique keys?
I have 2 examples:
results = coll.group(key={"ip": 1, "id": 1}, condition={}, initial={},
                     reduce="function(obj, prev) {}")
print len(results)
and:
from bson.code import Code

map = Code(
    "function () {"
    "    emit({id: this.id, ip: this.ip}, {count: 1});"
    "}"
)
reduce = Code("function (key, values) {}")
result = coll.map_reduce(map, reduce, "map_reduce_example")
print result.count()
Why is the second example so much slower than the first? I want to use the second approach instead of the first because the first does not work with more than 20,000 unique keys.
2 Answers
MongoDB's new aggregation framework will probably solve your problems, but until then I suggest running map/reduce as a regularly scheduled background job and querying the collection resulting from map/reduce in real-time. This will get the grouped count results much faster, but the counts may be slightly stale (depending on the last time the background map reduce was done.)
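A minimal sketch of that pattern with PyMongo, based on the code in the question; the output collection name grouped_counts and the summing reduce function are placeholders of mine, not something given in the answer:

from bson.code import Code

# Map/reduce job to run periodically (cron, a scheduler, etc.); it rebuilds
# the pre-computed counts each time it runs.
mapper = Code("function () { emit({id: this.id, ip: this.ip}, {count: 1}); }")
reducer = Code(
    "function (key, values) {"
    "    var total = 0;"
    "    values.forEach(function (v) { total += v.count; });"
    "    return {count: total};"
    "}"
)
# "grouped_counts" is a placeholder name for the output collection.
coll.map_reduce(mapper, reducer, out="grouped_counts")

# Real-time reads then hit the pre-computed collection instead of
# re-aggregating the raw data on every request.
grouped = coll.database["grouped_counts"]
print grouped.count()      # number of unique (id, ip) pairs
print grouped.find_one()   # one pre-computed group document

How stale the counts can get is then bounded by how often the scheduled job runs.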
Explanation:
MongoDB's map/reduce is much slower than group() for several reasons.
MongoDB's native C aggregation functions are much faster, but one of their limitations is that all output must fit within a single BSON document (currently 16MB). That is why there is a limit on the number of unique keys.
MongoDB's aggregation framework will combine the best of both methods. The framework is already documented and available in development versions of MongoDB, and it is scheduled for a production release in February 2012.
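For comparison, the grouped count from the question might look roughly like this with the aggregation framework (a sketch only: it assumes a MongoDB/PyMongo version that ships the framework, and the exact return shape of aggregate() varies by driver version):

# $group does the (id, ip) counting server-side in native code, with no
# JavaScript map/reduce pass involved.
pipeline = [
    {"$group": {"_id": {"id": "$id", "ip": "$ip"}, "count": {"$sum": 1}}}
]
result = coll.aggregate(pipeline)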
When you're running map/reduce, your map and reduce functions are executed in the JavaScript runtime (which is slower than native C++ code). This also involves some locking (JS locks, read locks, write locks). group, on the other hand, may be executed more efficiently (more native code, fewer locks, etc.). Note that in a sharded environment, map/reduce is your only option for now (in future versions you'll be able to use the Aggregation Framework).