MongoDB group by function

Posted 2024-12-25 04:28:53

In MySQL

select a,b,count(1) as cnt from list group by a, b having cnt > 2;
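For comparison, here is a minimal plain-JavaScript sketch of what this SQL computes, run over an in-memory array instead of a collection. `groupByHaving` and the sample rows are hypothetical. The sketch shows the order of operations: group on the (a, b) pair, count each group, and only then apply the HAVING filter to the groups.

```javascript
// Hypothetical in-memory equivalent of:
//   SELECT a, b, COUNT(1) AS cnt FROM list GROUP BY a, b HAVING cnt > 2;
function groupByHaving(rows, minCountExclusive) {
  const groups = new Map(); // JSON text of [a, b] -> { a, b, cnt }
  for (const row of rows) {
    const key = JSON.stringify([row.a, row.b]);
    if (!groups.has(key)) {
      groups.set(key, { a: row.a, b: row.b, cnt: 0 });
    }
    groups.get(key).cnt += 1; // COUNT(1)
  }
  // HAVING filters the aggregated groups, not the input rows.
  return [...groups.values()].filter((g) => g.cnt > minCountExclusive);
}

const rows = [
  { a: 1, b: 2 }, { a: 1, b: 2 }, { a: 1, b: 2 },
  { a: 2, b: 2 },
  { a: 3, b: 2 }, { a: 3, b: 2 },
];
console.log(groupByHaving(rows, 2)); // [ { a: 1, b: 2, cnt: 3 } ]
```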

I need to run this GROUP BY with a HAVING condition in MongoDB, but I get the following error. Please share your input.

In MongoDB

> res = db.list.group({key:{a:true,b:true},
...                      reduce: function(obj,prev) {prev.count++;},
...                      initial: {count:0}}).limit(10);

Sat Jan  7 16:36:30 uncaught exception: group command failed: {
        "errmsg" : "exception: group() can't handle more than 20000 unique keys",
        "code" : 10043,
        "ok" : 0
}

Once that works, I then need to run the following to apply the HAVING filter:

for (i in res) {if (res[i].count>2) printjson(res[i])};

Regards,
Kumaran

惟欲睡 2025-01-01 04:28:54

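Depending on the number of groups, you might find a simpler and faster solution than group or MapReduce by using distinct. Note that it runs one count() for every combination of a distinct a value and a distinct b value, so it only pays off when there are few of them: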

var res = [];
var aValues = db.list.distinct('a');  // distinct() returns an array, not a cursor
var bValues = db.list.distinct('b');
for (var i = 0; i < aValues.length; i++) {
  for (var j = 0; j < bValues.length; j++) {
    var cnt = db.list.count({'a': aValues[i], 'b': bValues[j]});
    if (cnt > 2)
      res.push({'a': aValues[i], 'b': bValues[j], 'cnt': cnt});
  }
}

It will be faster if you have a compound index on a and b:

db.list.ensureIndex({'a':1,'b':1})
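The following rough in-memory sketch of the same loop shows why the number of groups matters. `distinct`, `countWhere`, `groupViaDistinct` and the `queries` counter are hypothetical stand-ins for server calls, not the MongoDB API. The counter shows that the loop issues one count per pair of distinct `a` and `b` values, including pairs that never occur together. The sample data is made up.

```javascript
// In-memory stand-ins for db.list.distinct() and db.list.count(); not the MongoDB API.
let queries = 0; // each call below would be one round trip to the server

function distinct(docs, field) {
  queries += 1;
  return [...new Set(docs.map((d) => d[field]))];
}

function countWhere(docs, a, b) {
  queries += 1;
  return docs.filter((d) => d.a === a && d.b === b).length;
}

function groupViaDistinct(docs, minCountExclusive) {
  const res = [];
  const aValues = distinct(docs, "a");
  const bValues = distinct(docs, "b");
  for (const a of aValues) {
    for (const b of bValues) {
      const cnt = countWhere(docs, a, b);
      if (cnt > minCountExclusive) res.push({ a, b, cnt });
    }
  }
  return res;
}

// Only 3 real (a, b) groups, but 3 x 3 candidate pairs get counted.
const docs = [
  { a: 1, b: 1 }, { a: 1, b: 1 }, { a: 1, b: 1 },
  { a: 2, b: 2 }, { a: 3, b: 3 },
];
const result = groupViaDistinct(docs, 2);
console.log(result);  // [ { a: 1, b: 1, cnt: 3 } ]
console.log(queries); // 11 (2 distinct calls + 9 count calls)
```

With a and b each having many distinct values, the number of count calls grows with the product of the two, which is why this route suits only small numbers of groups.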

无人问我粥可暖 2025-01-01 04:28:53

MongoDB's group command is very limited in most cases. For instance:

- the result set must contain fewer than 20,000 unique keys (10,000 in older versions), which is the limit reported in the error above.
- it does not work in sharded environments.

So it's better to use MapReduce. The query would look like this:

~~map = function() { emit({a:true,b:true},{count:1}); }~~ (faulty, corrected in the EDIT below)

reduce = function(k, values) {
    var result = {count: 0};
    values.forEach(function(value) {
        result.count += value.count;
    });
    return result;
}

and then

db.list.mapReduce(map,reduce,{out: { inline : 1}})

This is an untested version; let me know if it works.

EDIT:

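The earlier map function was faulty. It emitted the constant key {a:true, b:true} for every document, so all documents collapsed into a single group. That's why you weren't getting the expected results. It should have been: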

map = function () {
    emit({a:this.a, b:this.b}, {count:1});
}

Test data:

> db.multi_group.insert({a:1,b:2})
> db.multi_group.insert({a:2,b:2})
> db.multi_group.insert({a:3,b:2})
> db.multi_group.insert({a:1,b:2})
> db.multi_group.insert({a:3,b:2})
> db.multi_group.insert({a:7,b:2})


> db.multi_group.mapReduce(map,reduce,{out: { inline : 1}})
{
    "results" : [
        {
            "_id" : {
                "a" : 1,
                "b" : 2
            },
            "value" : {
                "count" : 2
            }
        },
        {
            "_id" : {
                "a" : 2,
                "b" : 2
            },
            "value" : {
                "count" : 1
            }
        },
        {
            "_id" : {
                "a" : 3,
                "b" : 2
            },
            "value" : {
                "count" : 2
            }
        },
        {
            "_id" : {
                "a" : 7,
                "b" : 2
            },
            "value" : {
                "count" : 1
            }
        }
    ],
    "timeMillis" : 1,
    "counts" : {
        "input" : 6,
        "emit" : 6,
        "reduce" : 2,
        "output" : 4
    },
    "ok" : 1
}
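A minimal in-memory imitation of the map/reduce flow can reproduce the counts above. It also shows the effect of the faulty constant-key map. This is only a sketch, not MongoDB's engine. `runMapReduce`, `goodMap` and `badMap` are hypothetical names. `emit` is passed to map as an argument instead of being a global, and keys are compared by their JSON text. As in MongoDB, reduce is skipped for keys that received only one value. The test data is the same as in the answer.

```javascript
// Minimal in-memory imitation of MongoDB map/reduce (illustration only).
function runMapReduce(docs, map, reduce) {
  const buckets = new Map(); // JSON text of key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!buckets.has(id)) buckets.set(id, { key, values: [] });
    buckets.get(id).values.push(value);
  };
  // As in MongoDB, `this` is the current document inside map.
  for (const doc of docs) map.call(doc, emit);
  return [...buckets.values()].map(({ key, values }) => ({
    _id: key,
    // reduce is not called for a key with a single emitted value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Same logic as the answer's reduce.
function reduce(k, values) {
  const result = { count: 0 };
  values.forEach((value) => { result.count += value.count; });
  return result;
}

// Corrected map: one key per (a, b) pair.
function goodMap(emit) { emit({ a: this.a, b: this.b }, { count: 1 }); }
// Faulty map: the same constant key for every document.
function badMap(emit) { emit({ a: true, b: true }, { count: 1 }); }

const testData = [
  { a: 1, b: 2 }, { a: 2, b: 2 }, { a: 3, b: 2 },
  { a: 1, b: 2 }, { a: 3, b: 2 }, { a: 7, b: 2 },
];

for (const r of runMapReduce(testData, goodMap, reduce)) {
  console.log(r._id.a, r._id.b, r.value.count);
}
// 1 2 2
// 2 2 1
// 3 2 2
// 7 2 1
console.log(JSON.stringify(runMapReduce(testData, badMap, reduce)));
// [{"_id":{"a":true,"b":true},"value":{"count":6}}]
```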

EDIT2:

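Complete solution, including the HAVING step. Here it filters on count >= 2, which returns results for the small test data above. Use $gt : 2 to match the original HAVING cnt > 2: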

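map = function () {
    // emit _id as a one-element array so it has the same shape as reduce's output
    emit({a:this.a, b:this.b}, {count:1, _id:[this._id]});
}

reduce = function(k, values) {
    // reduce may be re-run on its own output, so concatenate arrays instead of pushing
    var result = {count: 0, _id:[]};
    values.forEach(function(value) {
        result.count += value.count;
        result._id = result._id.concat(value._id);
    });
    return result;
}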

>db.multi_group.mapReduce(map,reduce,{out: { replace : "multi_result"}})

> db.multi_result.find({'value.count' : {$gte : 2}})
{ "_id" : { "a" : 1, "b" : 2 }, "value" : { "_id" : [   ObjectId("4f0adf2884025491024f994c"),   ObjectId("4f0adf3284025491024f994f") ], "count" : 2 } }
{ "_id" : { "a" : 3, "b" : 2 }, "value" : { "_id" : [   ObjectId("4f0adf3084025491024f994e"),   ObjectId("4f0adf3584025491024f9950") ], "count" : 2 } }
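MongoDB may call reduce more than once for the same key and pass an earlier reduce result back in as one of the values. Reduce must therefore return a value with the same shape as the values map emits. The following minimal in-memory check tests that property. It assumes map emits `_id` as a one-element array and reduce concatenates the arrays. Plain strings stand in for ObjectIds, and the sample values are made up.

```javascript
// Values shaped like the emitted ones: { count: 1, _id: [someId] }.
function reduce(k, values) {
  const result = { count: 0, _id: [] };
  values.forEach((value) => {
    result.count += value.count;
    result._id = result._id.concat(value._id); // flattens one level, so nesting never builds up
  });
  return result;
}

const key = { a: 1, b: 2 };
const emitted = ["id1", "id2", "id3"].map((id) => ({ count: 1, _id: [id] }));

// Reduce everything in one pass...
const once = reduce(key, emitted);
// ...or reduce a partial batch first and feed that result back in (a re-reduce).
const partial = reduce(key, emitted.slice(0, 2));
const twice = reduce(key, [partial, emitted[2]]);

console.log(JSON.stringify(once));  // {"count":3,"_id":["id1","id2","id3"]}
console.log(JSON.stringify(twice)); // {"count":3,"_id":["id1","id2","id3"]}
```

If reduce instead pushed each value's `_id` into the result, a re-reduce would nest the earlier array inside the new one. The two results would then differ.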
