Efficient MapReduce for a stream of queries against the same dataset
I have a massive, static dataset and a function f to apply to it.
f has the form reduce(map(f, dataset)), so I would use the MapReduce skeleton. However, I don't want to scatter the data on each request (and ideally I want to take advantage of indexing to speed up f). Is there a MapReduce implementation that addresses this general case?
I've taken a look at IterativeMapReduce, and it may do the job, but it seems to address a slightly different case, and the code isn't available yet.
Comments (1)
Hadoop's MapReduce (like all the other MapReduce skeletons inspired by Google's) doesn't scatter the data on every request.
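To make the "scatter once, query many times" idea concrete, here is a minimal single-process sketch. It is not Hadoop code, and the names `Partition`, `scatter_once`, and `run_query` are hypothetical, made up for illustration. It assumes the query is a range count over numeric records, so the per-partition index can be a plain sorted list; a real f would need its own map step and index.

```python
from bisect import bisect_left, bisect_right
from functools import reduce


class Partition:
    """One resident shard of the static dataset (hypothetical helper)."""

    def __init__(self, records):
        # Build the index once, when the shard is created: here, a sorted list.
        self.keys = sorted(records)

    def map_query(self, lo, hi):
        # Map step: count the records in [lo, hi] via binary search on the index.
        return bisect_right(self.keys, hi) - bisect_left(self.keys, lo)


def scatter_once(dataset, n_partitions):
    # Distribute the data a single time; the shards then stay where they are.
    return [Partition(dataset[i::n_partitions]) for i in range(n_partitions)]


def run_query(partitions, lo, hi):
    # Only the query parameters travel to the shards; the data does not move.
    partials = (p.map_query(lo, hi) for p in partitions)
    # Reduce step: combine the per-partition counts.
    return reduce(lambda a, b: a + b, partials, 0)


# Scatter once, then answer any number of queries against the same shards.
partitions = scatter_once(list(range(1000)), 4)
first = run_query(partitions, 10, 19)
second = run_query(partitions, 990, 2000)
```

In a distributed setting each `Partition` would live on a different node, and `run_query` would send only `lo` and `hi` over the network and collect the partial counts.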