MongoDB map-reduce: "first/lowest" value?

Posted 2024-12-05 09:48:52

I have documents like this:

{
        "_id" : "someid",
        "name" : "somename",
        "action" : "do something",
        "date" : ISODate("2011-08-19T09:00:00Z")
}

I want to map reduce them into something like this:

{
        "_id" : "someid",
        "value" : {
            "count" : 100,
            "name" : "somename",
            "action" : "do something",
            "date" : ISODate("2011-08-19T09:00:00Z"),
            "firstEncounteredDate" : ISODate("2011-07-01T08:00:00Z")
        }
}

I want to group the map-reduced documents by "name", "action", and "date". But every document should also have a "firstEncounteredDate" field containing the earliest "date" among all documents that share its "name" and "action" (that is, a value grouped by "name" and "action" only).

If I group by name, action, and date, firstEncounteredDate would always equal date. That is why I'd like to know whether there is any way to get the earliest date (grouped by "name" and "action" across all the documents) while doing the map-reduce.

How can I do this in map reduce?

Edit: more detail on firstEncounteredDate (courtesy of @beny23)

雨巷深深 2024-12-12 09:48:52

Seems like a two-pass map-reduce would fit the bill, somewhat akin to this example: http://cookbook.mongodb.org/patterns/unique_items_map_reduce/

In pass #1, group the original "name"x"action"x"date" documents by just "name" and "action", collecting the various "date" values into a "dates" array during reduce. Use a 'finalize' function to find the minimum of the collected dates.

Untested code:

// phase i map function : 

function () {
  emit( { "name": this.name, "action": this.action } , 
        { "count": 1, "dates": [ this.date ] } );
}

// phase i reduce function : 

function( key, values ) {
  var result = { count: 0, dates: [ ] };

  // Merge partial results; reduce may be called more than once per key.
  values.forEach( function( value ) {
    result.count += value.count;
    result.dates = result.dates.concat( value.dates );
  } );

  return result;
}

// phase i finalize function : 

function( key, reduced_value ) {
  var earliest = new Date( Math.min.apply( Math, reduced_value.dates ) );
  reduced_value.firstEncounteredDate = earliest ;
  return reduced_value;
}
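
The finalize step relies on Math.min coercing Date objects to numbers. A minimal standalone sketch of that behaviour follows; the sample dates are made up for illustration and it runs in plain JavaScript, outside MongoDB:

```javascript
// Hypothetical sample dates, not taken from the original data set.
const dates = [
  new Date("2011-08-19T09:00:00Z"),
  new Date("2011-07-01T08:00:00Z"),
  new Date("2011-07-15T12:30:00Z"),
];

// Math.min converts each Date to its millisecond timestamp via valueOf(),
// so the result is a number and has to be wrapped in a Date again.
const earliest = new Date(Math.min.apply(Math, dates));

console.log(earliest.toISOString());
```

Note that an empty array would make Math.min return Infinity and produce an invalid Date, which should not arise here because every emitted value carries at least one date.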

In pass #2, use the documents generated in pass #1 as input. For each "name"x"action" document, emit a new "name"x"action"x"date" document for each collected date, along with the now determined minimum date common to that "name"x"action" pair. Group by "name"x"action"x"date", summing up the count for each individual date during reduce.

Equally untested code:

// phase ii map function : 

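function() {
  // Pass #1 output has the shape { _id: { name, action }, value: { ... } }.
  // Capture both up front: "this" is not the document inside the callback.
  var id = this._id;
  var value = this.value;
  value.dates.forEach( function( d ) {
    emit( { "name": id.name, "action": id.action, "date" : d } ,
          { "count": 1, "firstEncounteredDate" : value.firstEncounteredDate } );
  } );
}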

// phase ii reduce function : 

function( key, values ) {
  // note: values[i].firstEncounteredDate should all be identical, so ... 
  var result = { "count": 0, 
                 "firstEncounteredDate": values[0].firstEncounteredDate };

  // Sum the per-date counts emitted by the phase ii map function.
  values.forEach( function( value ) {
    result.count += value.count;
  } );

  return result;
}

Pass #2 does not do a lot of heavy lifting, obviously -- it's mostly copying each document N times, one for each unique date. We could easily build a map of unique dates to their incidence counts during the reduce step of pass #1. (In fact, if we don't do this, there's no real point in having a "count" field in the values from pass #1.) But doing the second pass is a fairly effortless way of generating a full target collection containing the desired documents.
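
The single-pass alternative mentioned above, keeping a map of unique dates to their counts during reduce, could look roughly like the following sketch. It is untested against MongoDB: mapDoc, run, the perDate field, and the sample data are hypothetical names introduced here, and the small in-memory driver only stands in for the map-reduce engine so the logic can be tried in plain JavaScript.

```javascript
// Hypothetical single-pass variant: group by name + action and keep a
// per-date counter instead of a raw "dates" array.
function mapDoc(doc) {
  const perDate = {};
  perDate[doc.date.toISOString()] = 1;
  return { key: doc.name + "|" + doc.action, value: { perDate: perDate } };
}

// Output has the same shape as the input values, so it can be re-reduced.
function reduce(key, values) {
  const result = { perDate: {} };
  values.forEach(function (value) {
    Object.keys(value.perDate).forEach(function (d) {
      result.perDate[d] = (result.perDate[d] || 0) + value.perDate[d];
    });
  });
  return result;
}

function finalize(key, reduced) {
  // ISO 8601 UTC strings of equal length sort chronologically.
  reduced.firstEncounteredDate = Object.keys(reduced.perDate).sort()[0];
  return reduced;
}

// Tiny in-memory driver standing in for MongoDB (an assumption for testing).
function run(docs) {
  const groups = {};
  docs.forEach(function (doc) {
    const emitted = mapDoc(doc);
    (groups[emitted.key] = groups[emitted.key] || []).push(emitted.value);
  });
  const out = {};
  Object.keys(groups).forEach(function (key) {
    out[key] = finalize(key, reduce(key, groups[key]));
  });
  return out;
}

// Made-up sample documents in the shape shown in the question.
const sample = [
  { name: "somename", action: "do something", date: new Date("2011-08-19T09:00:00Z") },
  { name: "somename", action: "do something", date: new Date("2011-07-01T08:00:00Z") },
  { name: "somename", action: "do something", date: new Date("2011-08-19T09:00:00Z") },
];
const result = run(sample);
console.log(JSON.stringify(result, null, 2));
```

This yields one document per "name"x"action" pair with the counts nested per date, rather than one document per "name"x"action"x"date" as the question asks for, so the second pass is still the simpler route when the flat target collection is required.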
