The difference between Fork/Join and Map/Reduce

Posted on 2024-08-26 13:35:17

What is the key difference between Fork/Join and Map/Reduce?

Do they differ in the kind of decomposition and distribution (data vs. computation)?

Comments (2)

命比纸薄 2024-09-02 13:35:17

One key difference is that F-J seems to be designed to work on a single Java VM, while M-R is explicitly designed to work on a large cluster of machines. These are very different scenarios.

F-J offers facilities to partition a task into several subtasks in a recursive fashion: more tiers, the possibility of 'inter-fork' communication at this stage, and a much more traditional programming style. It does not extend (at least in the paper) beyond a single machine. Great for taking advantage of your eight-core machine.
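
To make the recursive splitting concrete, here is a minimal sketch using the JDK's `ForkJoinPool` and `RecursiveTask`. The class name `ForkJoinSum`, the `sum` helper, and the `THRESHOLD` cutoff are assumptions chosen for illustration; they do not come from the answer or the paper.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: sums a long[] by recursively splitting the range.
class ForkJoinSum extends RecursiveTask<Long> {
    // Assumed cutoff below which the range is summed sequentially.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int lo; // inclusive
    private final int hi; // exclusive

    ForkJoinSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        // Small enough: do the work directly instead of splitting again.
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Otherwise split in two; each half may split again (multiple tiers).
        int mid = lo + (hi - lo) / 2;
        ForkJoinSum left = new ForkJoinSum(data, lo, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, hi);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    // Convenience entry point that runs the task on the JVM-wide common pool.
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }
}
```

Calling `ForkJoinSum.sum(array)` runs entirely inside one JVM and shares the array between subtasks, which is the single-machine, shared-memory setting this answer describes.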

M-R only does one big split, with the mapped splits not talking to each other at all, and then reduces everything together. A single tier, no inter-split communication until reduce, and massively scalable. Great for taking advantage of your share of the cloud.
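
The single-tier shape can be sketched in plain Java as well. This is only an in-process illustration of the map-then-reduce structure, not a real MapReduce framework: nothing here is distributed across machines, and the names `MapReduceSketch`, `mapSplit`, `reduce`, and `wordCount` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-process sketch of the map/reduce shape (word count).
class MapReduceSketch {
    // Map phase: each split is processed on its own and sees no other split.
    static Map<String, Integer> mapSplit(String split) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: the only point where results from different splits meet.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // One big split (the input list), one map pass, one reduce pass.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map<String, Integer>> partials = splits.parallelStream()
                .map(MapReduceSketch::mapSplit)
                .collect(Collectors.toList());
        return reduce(partials);
    }
}
```

Because `mapSplit` touches nothing but its own input, the splits could in principle be handed to separate machines, which is what makes the model scale; here a parallel stream merely stands in for that distribution.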

三人与歌 2024-09-02 13:35:17

There is a whole scientific paper on the subject, Comparing Fork/Join and MapReduce.

The paper compares the performance, scalability and programmability of three parallel paradigms: fork/join, MapReduce, and a hybrid approach.

What they find is basically that Java fork/join has low startup latency and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. On the other hand, MapReduce has significant startup latency (tens of seconds), but scales well for much larger inputs (>100MB) on a compute cluster.

But there is a lot more to read there if you're up for it.
