What is the input order of keys in the reduce() method?
I have a simple use case. For an input file, I need to calculate each word's share of the total number of words. For example, if word1 occurs 10 times, word2 occurs 5 times, and so on, and the total number of words is 100, then I need to display % word1 = 10%, % word2 = 5%, etc. So whenever I encounter a word I call context.write(word, 1) in map(), and in reduce() I sum up the individual counts. But to calculate the percentages I also need the total number of words, which I am computing as well.
So in reduce() I need to receive the total-word-count key before the keys for word1 or word2, so that I can calculate the percentage for every word. But in reduce() the total-words key arrives after some other keys, so I am unable to calculate the percentages.
I also tried to set the total count in the configuration from map() using context.getConfiguration().setFloat("total count", count); but in reduce() I am not able to read this value back from the configuration. It simply returns null.
Any suggestions are welcome.
Thank you.
Comments (1)
You need to first digest your document: make one pass over it, counting each word and the total number of words.
Then you can do a second pass over your document, doing what you like with the info you've collected, for example looking up each word's count and dividing it by the total.
Or you can simply print the frequencies.
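The two passes described in this answer can be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions, not the answerer's own code: the class name `WordPercentages` and the methods `digest`, `percentageOf`, `getTotal`, and `printFrequencies` are hypothetical, and the sketch assumes whitespace-separated words held in memory rather than a Hadoop job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordPercentages {
    // Per-word counts, kept in first-seen order (hypothetical names throughout).
    private final Map<String, Integer> counts = new LinkedHashMap<>();
    private int total = 0;

    // First pass: "digest" the text by counting every word and the grand total.
    public void digest(String text) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            counts.merge(word, 1, Integer::sum);
            total++;
        }
    }

    // Second pass helper: one word's share of the total, in percent.
    public double percentageOf(String word) {
        if (total == 0) {
            return 0.0; // nothing digested yet, avoid dividing by zero
        }
        return 100.0 * counts.getOrDefault(word, 0) / total;
    }

    public int getTotal() {
        return total;
    }

    // Print every word seen so far together with its percentage.
    public void printFrequencies() {
        for (String word : counts.keySet()) {
            System.out.printf("%% %s = %.1f%%%n", word, percentageOf(word));
        }
    }

    public static void main(String[] args) {
        WordPercentages wp = new WordPercentages();
        wp.digest("word1 word2 word1 word3 word1 word2");
        wp.printFrequencies();
    }
}
```

Calling `digest` on the whole input before any call to `percentageOf` is what guarantees the total is known when the percentages are computed, which is the ordering the question is struggling with.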