MongoDB MapReduce for grouping up to n per category using Mongoid

Posted 2025-01-02 01:43:35

I have a weird problem with MongoDB (2.0.2) map reduce.

So, the story goes like this:

I have an Ad model (see the source extract below) and I need to group up to n ads per category, so that I get a nicely ordered listing I can later use to do more interesting things.

# encoding: utf-8
class Ad  
  include Mongoid::Document
  cache
  include Mongoid::Timestamps

  field :title
  field :slug, :unique => true

  def self.aggregate_latest_active_per_category

    map = "function () {
        emit( this.category, { id: this._id });
    }"

    reduce = "function ( key, values ) {
      return { ads: values };
    }"

    self.collection.map_reduce(map, reduce, { :out => "categories" })

  end
end

All fun and games up until now.

What I expect is a result in a form which resembles this (mongo shell output of db.categories.findOne()):

    {
      "_id" : "category_name",
      "value" : {
          "ads" : [
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014ab")
                          },
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014b0")
                          },
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014b6")
                          },
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014b8")
                          },
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014bd")
                          },
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014c1")
                          },
                          {
                            "id" : ObjectId("4f2970e9e815f825a30014ca")
                          },
                          // ... and it goes on and on
          ]
        }
      }

Actually, it would be even better if value could contain only the array, but MongoDB complains that this is not supported yet. That can be handled later with a finalize function, so it is not the problem I want to ask about.

Now, back to the problem. What actually happens when I run the map reduce is that it spits out something like:

{
    "_id" : "category_name",
    "value" : {
        "ads" : [
            {
                "ads" : [
                    {
                        "ads" : [
                            {
                                "ads" : [
                                    {
                                        "ads" : [
                                            {
                                                "id" : ObjectId("4f2970d8e815f825a3000011")
                                            },
                                            {
                                                "id" : ObjectId("4f2970d8e815f825a3000017")
                                            },
                                            {
                                                "id" : ObjectId("4f2970d8e815f825a3000019")
                                            },
                                            {
                                                "id" : ObjectId("4f2970d8e815f825a3000022")
                                            },

   // ... on and on and on

... and while I could probably work out a way to use this, it just doesn't look like something I should get.

So, my questions (finally) are:

Am I doing something wrong, and if so, what is it?
Is there something wrong with MongoDB map reduce (I mean besides all the usual things when compared to Hadoop)?

何以笙箫默 2025-01-09 01:43:35

Yes, you're doing it wrong. The values emitted by map and the value returned by reduce should have the same shape, because map and reduce are meant to be executed in parallel and reduce might be run over partially reduced results, that is, over its own earlier output. Try these functions:

var map = function() {
  emit(this.category, {ads: [this._id]});
};

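var reduce = function(key, values) {
  // Same shape as the value emitted by map: {ads: [...]}
  var result = {ads: []};

  // Each value may come straight from map or from an earlier reduce call
  values.forEach(function(v) {
    v.ads.forEach(function(a) {
      result.ads.push(a);
    });
  });
  return result;
};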

This should produce documents like:

{_id: category, value: {ads: [ObjectId("4f2970d8e815f825a3000011"), 
                              ObjectId("4f2970d8e815f825a3000019"), 
                              ...]}}
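
To see why the shapes have to match, the re-reduce step can be imitated in plain JavaScript without a database. This is only a sketch under stated assumptions: the ids are made-up strings standing in for ObjectId values, the key "cars" is invented, and reduceInBatches is a hypothetical helper that feeds an earlier reduce result back in together with the next batch of values, which is roughly what happens when reduce runs over partially reduced results.

```javascript
// Plain-JavaScript simulation; no MongoDB required.
// The ids are made-up strings standing in for ObjectId values.

// Reduce from the question: wraps whatever it receives.
function brokenReduce(key, values) {
  return { ads: values };
}

// Reduce from the answer: returns the same shape that map emits.
function fixedReduce(key, values) {
  var result = { ads: [] };
  values.forEach(function (v) {
    v.ads.forEach(function (a) {
      result.ads.push(a);
    });
  });
  return result;
}

// Hypothetical helper: reduce the first batch, then pass that result
// back in together with each following batch, as a re-reduce would.
function reduceInBatches(reduce, key, batches) {
  var acc = reduce(key, batches[0]);
  for (var i = 1; i < batches.length; i++) {
    acc = reduce(key, [acc].concat(batches[i]));
  }
  return acc;
}

// Values shaped as the question's map emits them: { id: ... }
var brokenOut = reduceInBatches(brokenReduce, "cars", [
  [{ id: "a1" }, { id: "a2" }],
  [{ id: "a3" }]
]);

// Values shaped as the answer's map emits them: { ads: [...] }
var fixedOut = reduceInBatches(fixedReduce, "cars", [
  [{ ads: ["a1"] }, { ads: ["a2"] }],
  [{ ads: ["a3"] }]
]);

console.log(JSON.stringify(brokenOut));
// {"ads":[{"ads":[{"id":"a1"},{"id":"a2"}]},{"id":"a3"}]}
console.log(JSON.stringify(fixedOut));
// {"ads":["a1","a2","a3"]}
```

With the question's reduce, every extra pass wraps the previous result in another "ads" level, which matches the nesting shown in the question. With the answer's reduce the result stays flat no matter how the values are split into batches.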
