Difference between Fork/Join and Map/Reduce
What is the key difference between Fork/Join and Map/Reduce?
Do they differ in the kind of decomposition and distribution (data vs. computation)?
One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.
F-J offers facilities to partition a task into several subtasks recursively: more tiers, the possibility of 'inter-fork' communication at this stage, and a much more traditional programming style. It does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core.
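To make the recursive splitting concrete, here is a minimal sketch using the standard `java.util.concurrent` fork/join classes. The class name `ForkJoinSum`, the helper `parallelSum`, and the `THRESHOLD` cutoff are my own illustrative choices, not something from the paper; the point is only the shape: split, fork one half, compute the other, join.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums an array by recursively splitting it in half.
class ForkJoinSum extends RecursiveTask<Long> {
    // Assumed cutoff below which we stop splitting and sum sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int lo;
    private final int hi; // exclusive

    ForkJoinSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        ForkJoinSum left = new ForkJoinSum(data, lo, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, hi);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    // Convenience entry point that runs the task on the shared common pool.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

Every subtask reads the same in-memory array, which is exactly why this model stays inside one JVM.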
M-R only does one big split, with the mapped splits not talking to each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.
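For contrast, here is a rough single-JVM sketch of the map/reduce shape, written with Java streams. It is not a distributed framework and not how a real cluster implementation works; the class `WordCountSketch` and its methods are hypothetical names. It only illustrates the structure: one flat split, independent map calls that share nothing, then a single reduce step that groups by key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example: word count in the map/reduce style, in one process.
class WordCountSketch {
    // Map phase: turns one input split into its words.
    // Each call sees only its own split and shares no state with the others.
    static List<String> map(String split) {
        return Arrays.stream(split.toLowerCase().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // Reduce phase: groups the mapped words by key and counts each group.
    static Map<String, Long> wordCount(List<String> splits) {
        return splits.parallelStream()
                .flatMap(split -> map(split).stream())
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("the quick fox", "the lazy dog", "The end");
        System.out.println(wordCount(splits));
    }
}
```

Because no map call depends on another, the splits could in principle live on different machines. In this sketch they do not: Java parallel streams run on the fork/join common pool within a single JVM.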
There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.
The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.
What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.
But there is a lot more to read there if you're up for it.