MapReduce aggregation based on attributes contained outside of the document

Posted on 2024-12-03 06:39:45

Say I have a collection of 'activities', each of which has a name, cost and location:

{_id : 1 , name: 'swimming', cost: '3.40', location: 'kirkstall'}
{_id : 2 , name: 'cinema', cost: '6.50', location: 'hyde park'}
{_id : 3 , name: 'gig', cost: '10.00', location: 'hyde park'}

I also have a people collection which records, for each activity, how many times they plan to do each in a year:

{_id : 1 , name: 'russell', activities : {1 : 9 , 2 : 4 , 3 : 21}}

I don't want to denormalise the activities' attributes by putting them in the person collection for a number of reasons.

First of all, this is about planning, so if the cost of an activity changes, it would need to change in the person collection too. So I'd have to update all person records.

Secondly, I will probably want to add some other attributes to the activity collection at some point, and want to avoid having to add them to every activity in every record in the person collection when I do.

However, now I want to do a MapReduce to find out how many activities are planned in total by all people, grouped by location.

This means that during a MapReduce on the person collection I need to know the location of the activities they have planned. Can anyone think of a nice way to do this?

My best shot at the moment (which is pretty rubbish) is to create a stored JavaScript function that accepts an array of activity_ids, queries the activity collection, and returns a map of activity_id to location. I'd then call this from the map function and look up locations from it. As I've said, though, this would be pretty rubbish, because the same query on the activities collection would be run once for every item in the people collection.
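
To make the cost concern concrete, here is a minimal in-memory sketch in plain JavaScript, with no MongoDB involved. The `findLocations` helper, its call counter, and the second person are hypothetical and exist only for illustration; the sketch also assumes `activities` on a person is a flat object of activity id to count. It simply counts how often the lookup runs when it is called from inside a per-person step.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const activities = [
  { _id: 1, name: 'swimming', cost: '3.40', location: 'kirkstall' },
  { _id: 2, name: 'cinema', cost: '6.50', location: 'hyde park' },
  { _id: 3, name: 'gig', cost: '10.00', location: 'hyde park' },
];
const people = [
  { _id: 1, name: 'russell', activities: { 1: 9, 2: 4, 3: 21 } },
  { _id: 2, name: 'sam', activities: { 1: 2, 3: 5 } }, // made-up second person
];

let lookupCalls = 0;

// Stand-in for "query the activity collection, return activity_id -> location".
function findLocations(activityIds) {
  lookupCalls += 1;
  const map = {};
  for (const a of activities) {
    if (activityIds.includes(a._id)) map[a._id] = a.location;
  }
  return map;
}

// Calling the lookup from inside the per-person step: one lookup per person.
for (const person of people) {
  const ids = Object.keys(person.activities).map(Number);
  findLocations(ids);
}
const callsPerPerson = lookupCalls;
console.log(callsPerPerson);
```

The number of lookups grows with the number of people, which is the overhead described above.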


Comments (1)

时光与爱终年不遇 2024-12-10 06:39:45

I did this by wrapping the MapReduce in some stored JavaScript.

function (query) {

  // Collect the activity ids planned by the person matching `query`.
  // Note: only this one person's activities are looked up, so an activity
  // planned only by other people ends up with an undefined location.
  var one = db.people.findOne(query);
  var activity_ids = [];
  for (var k in one.activities){
    activity_ids.push(parseInt(k, 10));
  }

  // Build the activity _id -> location lookup with a single query.
  var activity_location_map = {};
  db.activities.find({_id : {$in : activity_ids}}).forEach(function(a){
    activity_location_map[a._id] = a.location;
  });

  // The lookup is made available to the map function through `scope`.
  return db.people.mapReduce(
    function map(){
      for (var k in this.activities){
        emit({location : activity_location_map[k]} , { total: this.activities[k] });
      }
    },
    function reduce(key, values){
      var reduced = {total: 0};
      values.forEach(function(value){
        reduced.total += value.total;
      });

      return reduced;
    },
    {out : {inline: true}, scope : { activity_location_map : activity_location_map }}
  ).results;
}

Annoying, and messy, but it works, and I can't think of owt better.
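
For anyone wanting to check the logic without a database, the same idea can be sketched in plain JavaScript on the sample data from the question. This is only an in-memory illustration, not the stored function itself: the arrays stand in for the two collections, the variable names are made up, and `activities` on a person is assumed to be a flat object of activity id to count.

```javascript
// Sample data from the question, held in memory (assumed shapes).
const activities = [
  { _id: 1, name: 'swimming', cost: '3.40', location: 'kirkstall' },
  { _id: 2, name: 'cinema', cost: '6.50', location: 'hyde park' },
  { _id: 3, name: 'gig', cost: '10.00', location: 'hyde park' },
];
const people = [
  { _id: 1, name: 'russell', activities: { 1: 9, 2: 4, 3: 21 } },
];

// Build the activity _id -> location lookup once, up front.
const activityLocationMap = {};
for (const a of activities) {
  activityLocationMap[a._id] = a.location;
}

// "Map" step: one (location, total) pair per planned activity.
const emitted = [];
for (const person of people) {
  for (const [id, count] of Object.entries(person.activities)) {
    emitted.push({ location: activityLocationMap[id], total: count });
  }
}

// "Reduce" step: sum the totals per location.
const totalsByLocation = {};
for (const { location, total } of emitted) {
  totalsByLocation[location] = (totalsByLocation[location] || 0) + total;
}

console.log(totalsByLocation);
```

With the single sample person this gives 9 planned activities for kirkstall and 25 for hyde park (4 cinema plus 21 gigs).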
