Disco/MapReduce: using chain_reader on split data
My algorithm currently uses nr_reduces 1 because I need to ensure that the data for a given key is aggregated.
To pass input to the next iteration, one should use "chain_reader". However, the results from a mapper come back as a single result list, which appears to mean that the next map iteration runs as a single mapper! Is there a way to split the results so that multiple mappers are triggered?
I could give a long answer, but since this question is 3 years old: check out this page: http://discoproject.org/doc/disco/howto/dataflow.html#single-partition-map

In short: when there are N inputs to the mapper function, the output will be N, and by setting merge_partitions=False your reduce will output N blobs. Now, if you want to generate more outputs than inputs, you can pass partitions=N. But when your Disco job consists of just a mapper function and you want to generate partitioned output, add the simplest possible reduce phase, combined with the params stated above, to get that partitioned output.
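To see why partitioned output still keeps per-key aggregation intact (the original reason for nr_reduces 1), here is a plain-Python sketch of the hash-partitioning idea that a partitioned reduce relies on. This is an illustration of the concept, not Disco's own code; the partition count `N`, the `partition` helper, and the sample records are all made up for the example.

```python
# Illustrative sketch (plain Python, not Disco itself): hash-partitioning
# mapper output into N partitions keeps every record for a given key in
# the same partition, so per-key aggregation still works without
# collapsing everything into a single reduce.

N = 4  # hypothetical number of partitions

def partition(key, n):
    """Assign a key to one of n partitions (the framework does this internally)."""
    return hash(key) % n

# Hypothetical mapper output: (key, value) pairs.
mapper_output = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 5)]

# Route each record to its partition.
partitions = {i: [] for i in range(N)}
for key, value in mapper_output:
    partitions[partition(key, N)].append((key, value))

# Each non-empty partition can now be reduced (and later re-mapped by a
# separate mapper) independently, because no key is split across partitions.
for pid, records in partitions.items():
    if records:
        print(pid, records)
```

Because all occurrences of a key hash to the same partition, each of the N output blobs can be fed to its own mapper in the next iteration while the per-key aggregation guarantee is preserved.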