使用 Mapreduce 进行递归计算
我正在研究地图归约程序,并正在考虑设计以下形式的计算,其中 a1, b1
是与键关联的值,
a1/b1, a1+a2/b1+b2, a1+a2+a3/b1+b2+b3 ...
因此在归约器的每个阶段我都需要以前的值。 如何将其设计为映射减少,因为在每个阶段只能读取与特定键关联的值。
如果您觉得问题不清楚,您能指导我回答这个一般性问题吗?
更一般的问题:如何在 MapReduce 中使用递归来开发斐波那契数列?
编辑
你能帮助我修改我的设计
key1, V1,V2,V3
Key2, V4,V5,V6
映射器输出
Key1_X V1
Key1_Y V2
Key2_X V4
Key2_Y V5
类似的减速器输出
Key1_X {V1,.....}
Key1_Y {V2,.....}
,现在处于下一个映射器阶段。我可以创建一个这样的列表:
key1 {V1,....} {V2,....}
Key2 {V4,....} {V5,....}
我这样做的原因是执行:
Key1 {V1/V2, V1+V6/V2+V7, V1+V6+..../V2+V7+.. , .........}
可以这样做吗?因为数据集很大,所以我觉得用map-reduce会更好。
改变设计是否有助于提高效率?
I am working on map reduce program and was thinking about designing computations of the form where a1, b1
are the values associated with a key
a1/b1, a1+a2/b1+b2, a1+a2+a3/b1+b2+b3 ...
So at every stage of reducer I would require the previous values.
How would one design this as a map reduce as at every stage only the values associated with a particular key can be read.
If you feel the question is not clear, can you guide me towards this general question?
More general question: How would one develop a Fibonacci series using recursion in map reduce?
EDIT
Can you help me with my modified design
key1, V1,V2,V3
Key2, V4,V5,V6
Mapper output
Key1_X V1
Key1_Y V2
Key2_X V4
Key2_Y V5
Reducer output
Key1_X {V1,.....}
Key1_Y {V2,.....}
similarly, now in the next mapper stage. Can I create a list like this:
key1 {V1,....} {V2,....}
Key2 {V4,....} {V5,....}
My reason to do this, is to perform:
Key1 {V1/V2, V1+V6/V2+V7, V1+V6+..../V2+V7+.. , .........}
Is it possible to do this? Because the data set is very large, so I think it will be better to use map reduce.
Will changing the design help make it more efficient?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
斐波那契的主要问题(正如您在具体问题中所指出的那样)是该系列中所有项之间的依赖性。
如果不先计算前面的项,就无法计算后面的项。
MapReduce 是非常好的 IFF,你可以将你的工作分割成独立的部分。
我看不出有什么简单的方法可以做到这一点。
因此,任何“强制”MapReduce 解决此问题的构造都会破坏可扩展性优势。因此,您最喜欢的编程语言中的简单的高度优化循环将优于任何 MapReduce 算法。
The main problem with Fibonacci (and as you indicated in your specific problem too) is the dependence between all terms in the series.
You cannot calculate the later terms without calculating the earlier terms first.
MapReduce is very good IFF you can split your job into independent pieces.
I don't see an easy way to do this.
So any construct "forcing" MapReduce to solve this will break the scalability advantages. Hence a simple highly optimized loop in your favorite programming language will outperform any MapReduce algorithm.
编写您的映射器/减速器来计算这三件事:
Write your mapper/reducer to calculate these three things: