Why is Spark 100x faster than Hadoop MapReduce?
Why is Spark faster than Hadoop MapReduce?
As I understand it, if Spark is faster because of in-memory processing, then Hadoop also loads data into RAM before processing it. Every program is first loaded into RAM and then executed. So how can we say that Spark does in-memory processing, and why don't other big data technologies do the same? Could you please explain?

1 Answer
Spark was created out of all the lessons learned from MapReduce. It is not simply a second generation of the same system; it was redesigned around similar concepts, with explicit attention to what was missing or done poorly in MapReduce.
MapReduce partitions the data, reads it, runs a map, writes the intermediate output to disk, and sends it to a reducer, which writes it to disk, reads it back, reduces it, and writes the result to disk again. That is a lot of reading and writing. If you want to perform another operation, the whole cycle starts over (see the sketch below).
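To make that disk-bound cycle concrete, here is a minimal sketch. Hadoop's actual Java MapReduce API is considerably more verbose, so this emulates the same two-job pattern with Spark RDD calls: every stage materializes its full output to storage before the next stage can start. The input and output paths are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DiskBoundChain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DiskBoundChain").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Job 1: read, map, shuffle, reduce, then write everything to disk,
    // exactly as a MapReduce job must do between stages.
    sc.textFile("hdfs:///input/logs")                  // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///tmp/stage1")            // full write of intermediate data

    // Job 2: to do anything further, re-read stage 1's output from disk first.
    sc.textFile("hdfs:///tmp/stage1")
      .filter(_.contains("error"))
      .saveAsTextFile("hdfs:///tmp/stage2")            // and write to disk again

    spark.stop()
  }
}
```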
Spark tries to keep data in memory while it performs multiple maps and other operations. It still moves data to and from disk, but only when it has to, and it uses smart logic (transformations are evaluated lazily and assembled into a DAG) to work out how to optimize what you asked it to do. In-memory processing helps, but it is not the only thing Spark does.
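By contrast, here is the same two-step computation written the way Spark encourages, as a minimal sketch using the same hypothetical paths. The intermediate RDD is cached in memory, so the second operation never round-trips through disk, and nothing runs at all until an action forces the lazily built DAG to execute.

```scala
import org.apache.spark.sql.SparkSession

object InMemoryChain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("InMemoryChain").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Transformations are lazy: these lines only build a DAG, nothing executes yet.
    val words = sc.textFile("hdfs:///input/logs")      // hypothetical input path
      .flatMap(_.split("\\s+"))
      .cache()                                         // ask Spark to keep this RDD in memory

    // The first action runs the pipeline once and populates the cache.
    val total = words.count()

    // The second action reuses the in-memory RDD: no re-read from disk, no new cycle.
    val errors = words.filter(_.contains("error")).count()

    println(s"total=$total, errors=$errors")
    spark.stop()
  }
}
```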