How to group on more than 20,000 unique keys?

I have 2 examples:

results = coll.group(key={"ip": 1, "id": 1}, condition={}, initial={},
                     reduce="function(obj, prev) {}")
print len(results)

and:

from bson.code import Code

map = Code("function () {"
           "    emit({ id: this.id, ip: this.ip }, { count: 1 });"
           "}")

reduce = Code("function (key, values) {}")
result = coll.map_reduce(map, reduce, "map_reduce_example")
print result.count()

Why is the second example slower than the first? I want to use the second example instead of the first, because the first does not work with more than 20,000 unique keys.

2 Answers

抱猫软卧 2025-01-01 02:59:12

MongoDB's new aggregation framework will probably solve your problem, but until then I suggest running map/reduce as a regularly scheduled background job and querying the collection it produces in real time. This returns the grouped counts much faster, but the counts may be slightly stale (depending on when the background map/reduce last ran).
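
A minimal PyMongo sketch of that approach, using the legacy map_reduce API from the question (the database, collection, and output names are placeholders; unlike the empty reduce in the question, this one actually sums the per-key counts):

from pymongo import MongoClient
from bson.code import Code

coll = MongoClient().mydb.mycoll  # placeholder database/collection names

map = Code("function () {"
           "    emit({ id: this.id, ip: this.ip }, { count: 1 });"
           "}")

# sum the emitted counts so each output document holds a real per-key count
reduce = Code("function (key, values) {"
              "    var total = 0;"
              "    values.forEach(function (v) { total += v.count; });"
              "    return { count: total };"
              "}")

# run this as a scheduled background job (e.g. from cron)
coll.map_reduce(map, reduce, "grouped_counts")

# real-time readers query the precomputed (possibly slightly stale) counts
for doc in coll.database.grouped_counts.find():
    print doc["_id"], doc["value"]["count"]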


Explanation:

MongoDB's map/reduce is much slower than group() for several reasons:

  • Intermediate conversions: BSON -> JSON -> BSON -> JSON -> BSON (MongoDB stores data in binary BSON, but JavaScript map() and reduce() need to be fed textual JSON)
  • The JavaScript map() and reduce() functions must be interpreted by the single-threaded JavaScript engine

MongoDB's native C++ aggregation functions are much faster, but one of their limitations is that all output must fit within a single BSON document (currently 16 MB). That is why the number of unique keys is limited.

MongoDB's aggregation framework will combine the best of both methods:

  • Native execution for speed
  • No BSON conversions to/from JSON
  • Results can be sent to a collection, bypassing the single-document size limit.

The framework is already documented and available in development versions of MongoDB, and is scheduled for production release in February 2012.
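
For reference, the grouped count from the question becomes a single $group stage in the new framework. A sketch, assuming a driver version whose aggregate() returns a document with a "result" list (newer drivers return a cursor instead):

# one native $group stage replaces both the map and the reduce functions
pipeline = [{"$group": {
    "_id": {"id": "$id", "ip": "$ip"},  # compound grouping key
    "count": {"$sum": 1},               # documents per unique (id, ip) pair
}}]

out = coll.aggregate(pipeline)
print len(out["result"])  # number of unique keys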

欢你一世 2025-01-01 02:59:12

When you run map/reduce, your map and reduce functions are executed in the JavaScript runtime (which is slower than native C++ code). This also involves some locking (JS locks, read locks, write locks).

group, on the other hand, may be executed more efficiently (more native code, fewer locks, etc.).

Note that in a sharded environment, map/reduce is your only option for now (in future versions you will be able to use the aggregation framework).
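
For comparison, a group() call that actually accumulates a count per key might look like the sketch below (the legacy group() API from the question; it still fails once the number of unique keys exceeds the server's limit):

results = coll.group(
    key={"ip": 1, "id": 1},
    condition={},
    initial={"count": 0},  # per-key accumulator
    reduce="function (obj, prev) { prev.count++; }"  # JavaScript, run once per document
)
print len(results)  # caps out beyond ~20000 unique keys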
