MapReduce aggregation based on attributes held outside the document
Say I have a collection of 'activities', each of which has a name, cost and location:
{_id : 1 , name: 'swimming', cost: '3.40', location: 'kirkstall'}
{_id : 2 , name: 'cinema', cost: '6.50', location: 'hyde park'}
{_id : 3 , name: 'gig', cost: '10.00', location: 'hyde park'}
I also have a people collection which records, for each activity, how many times they plan to do it in a year:
{_id : 1 , name: 'russell', activities : { {1 : 9} , {2 : 4} , {3 : 21} }}
I don't want to denormalise the activities' attributes by putting them in the person collection for a number of reasons.
First of all, this is about planning, so if the cost of an activity changes, it would need to change in the person collection too. So I'd have to update all person records.
Secondly, I will probably want to add some other attributes to the activity collection at some point, and want to avoid having to add them to every activity in every record in the person collection when I do.
However, now I want to do a MapReduce to find out how many activities are planned in total by all people, grouped by location.
This means that during a MapReduce on the person collection I need to know the location of the activities they have planned. Can anyone think of a nice way to do this?
My best shot at the moment (which is pretty rubbish) is creating a stored JavaScript function that accepts an array of activity_ids, queries the activities collection, and returns a map of activity_id to location. I'd then stick this in the map function and look up locations from it. As I've said, this would be pretty rubbish, because the same query on the activities collection would be run once for every item in the people collection.
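For reference, the lookup helper described above could look roughly like this. It is only a sketch in plain JavaScript: the `activities` array is an in-memory stand-in for the real collection, and the name `locationsFor` is made up for illustration.

```javascript
// Hypothetical in-memory stand-in for the activities collection.
const activities = [
  { _id: 1, name: "swimming", cost: "3.40", location: "kirkstall" },
  { _id: 2, name: "cinema", cost: "6.50", location: "hyde park" },
  { _id: 3, name: "gig", cost: "10.00", location: "hyde park" }
];

// Return a map of activity _id -> location for the requested ids only.
// (locationsFor is an illustrative name, not an existing MongoDB function.)
function locationsFor(activityIds, activityDocs) {
  const wanted = new Set(activityIds.map(String));
  const lookup = {};
  for (const doc of activityDocs) {
    if (wanted.has(String(doc._id))) {
      lookup[doc._id] = doc.location;
    }
  }
  return lookup;
}

const lookup = locationsFor([1, 3], activities);
console.log(lookup);
```

Calling this from inside the map function is what causes the repeated work: it would scan (or query) the activities once per person.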
Comments (1)
I did this by wrapping the MapReduce in some stored JavaScript.
Annoying, and messy, but it works, and I can't think of owt better.
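The answer gives no code, so the following is only a guess at the shape of such a wrapper, written as plain JavaScript over in-memory arrays so it can be run anywhere. It assumes `activities` on a person is a flat map of activity id to planned count (the nested braces in the question are not valid as written), and `plannedByLocation`, `sampleActivities` and `samplePeople` are hypothetical names. The point is that the id-to-location lookup is built once, before map runs, rather than once per person. In MongoDB itself, the `scope` option of `mapReduce` is one place such a lookup could be handed to the map function.

```javascript
// Hypothetical wrapper: build the lookup once, then run map and reduce over people.
function plannedByLocation(activityDocs, peopleDocs) {
  // One pass over activities, instead of one query per person.
  const locationById = {};
  for (const activity of activityDocs) {
    locationById[activity._id] = activity.location;
  }

  // Minimal stand-in for MapReduce's emit(): collect values per key.
  const emitted = {};
  const emit = (key, value) => {
    (emitted[key] = emitted[key] || []).push(value);
  };

  // map: for each activity a person plans, emit (location, planned count).
  // Assumes person.activities is a flat { activityId: count } object.
  const map = (person) => {
    for (const [activityId, count] of Object.entries(person.activities)) {
      emit(locationById[activityId], count);
    }
  };

  // reduce: sum the planned counts emitted for one location.
  const reduce = (key, values) => values.reduce((sum, v) => sum + v, 0);

  peopleDocs.forEach(map);

  const totals = {};
  for (const [location, counts] of Object.entries(emitted)) {
    totals[location] = reduce(location, counts);
  }
  return totals;
}

const sampleActivities = [
  { _id: 1, name: "swimming", cost: "3.40", location: "kirkstall" },
  { _id: 2, name: "cinema", cost: "6.50", location: "hyde park" },
  { _id: 3, name: "gig", cost: "10.00", location: "hyde park" }
];
const samplePeople = [
  { _id: 1, name: "russell", activities: { 1: 9, 2: 4, 3: 21 } }
];

const totals = plannedByLocation(sampleActivities, samplePeople);
console.log(totals);
```

With the sample data from the question, russell's 9 swims land under kirkstall and the 4 cinema trips plus 21 gigs land under hyde park.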