In what order are keys passed to the reduce() method?

Posted on 2024-11-12 05:47:00

I have a simple use case. For my input file, I need to calculate each word's share of the total word count. For example, if word1 appears 10 times, word2 appears 5 times, and the file contains 100 words in total, I want to output word1 = 10%, word2 = 5%, and so on. Whenever I encounter a word in map() I call context.write(word, 1), and in reduce() I sum the individual counts. To calculate the percentages I also need the total number of words, which I am computing as well.

So I need the total-word-count key to reach reduce() before the keys for word1, word2, and so on, so that I can compute each word's percentage. But in reduce() this total key arrives after some of the other keys, so I cannot calculate the percentages.
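A plain-Java sketch (not Hadoop code) of the ordering issue described above. It assumes a single reducer that receives its keys in sorted order. Under that assumption, a marker key that sorts before every word, here the hypothetical `!total`, would be seen first. `SortedKeyDemo` and `reduceInKeyOrder` are illustrative names, and a `TreeMap` stands in for the framework's sort step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedKeyDemo {
    // Hypothetical marker key; '!' sorts before ASCII letters and digits.
    static final String TOTAL_KEY = "!total";

    // Mimics one reducer: keys are visited in sorted order, so TOTAL_KEY comes first.
    static List<String> reduceInKeyOrder(Map<String, Integer> counts) {
        TreeMap<String, Integer> sorted = new TreeMap<>(counts);
        List<String> out = new ArrayList<>();
        int total = 0;
        for (Map.Entry<String, Integer> e : sorted.entrySet()) {
            if (e.getKey().equals(TOTAL_KEY)) {
                total = e.getValue(); // available before any real word is processed
            } else {
                out.add(e.getKey() + " = " + (100.0 * e.getValue() / total) + "%");
            }
        }
        return out;
    }
}
```

A key such as "total" would sort after words like "apple", which is one possible reason the total shows up late. With more than one reducer this sketch no longer applies as written, because the marker key would reach only one of them.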

I also tried setting the total in the configuration from the map side using context.getConfiguration().setFloat("total count", count); but in reduce() I cannot read this value back from the configuration. It simply returns null.

Any suggestions would be appreciated.

Thank you.

Comments (1)

温暖的光 2024-11-19 05:47:00

You need to first digest your document, like this:

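import java.util.HashMap;
import java.util.Map;

class WordCounter {
    // Number of occurrences of each distinct word.
    Map<String, Integer> totals = new HashMap<String, Integer>();
    // Total number of words seen across all digested documents.
    int wordCount;

    void digest(String document) {
        // Split on runs of non-word characters ("\\W+"); "\\w+" would discard the words themselves.
        for (String word : document.split("\\W+")) {
            if (word.isEmpty())
                continue; // a leading separator yields one empty token
            wordCount++;
            Integer count = totals.get(word);
            if (count == null)
                totals.put(word, 1);
            else
                totals.put(word, count + 1);
        }
    }
}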

Then you can do a second pass over your document doing what you like with the info you've collected, perhaps using something like this method on every word:

String decorateWithPercent(String word) {
    // Multiply by 100.0 first: plain int division would truncate to 0.
    return word + " (" + (100.0 * totals.get(word) / wordCount) + "%)";
}

Or to print the frequencies, something like:

void printFrequencies() {
    // The loop variable is named "entry" so it does not shadow the wordCount field.
    for (Map.Entry<String, Integer> entry : totals.entrySet()) {
        System.out.println(entry.getKey() + " " + entry.getValue());
    }
}
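Putting the pieces above together, here is a minimal self-contained sketch of the two-pass idea: count first, then turn the counts into percentages. The names `WordPercentages` and `percentages` are illustrative and not part of the answer's code. A `TreeMap` is used only to make the output order predictable.

```java
import java.util.Map;
import java.util.TreeMap;

class WordPercentages {
    // Pass 1 counts every word; pass 2 converts each count into a percentage.
    static Map<String, Double> percentages(String document) {
        Map<String, Integer> totals = new TreeMap<>();
        int wordCount = 0;
        for (String word : document.split("\\W+")) {
            if (word.isEmpty()) continue; // skip the empty token from a leading separator
            wordCount++;
            totals.merge(word, 1, Integer::sum);
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            // 100.0 forces floating-point arithmetic instead of int division.
            result.put(entry.getKey(), 100.0 * entry.getValue() / wordCount);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : percentages("a b a c").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue() + "%");
        }
    }
}
```

The percentages can only be computed once the whole input has been counted, which is why the total has to be known before any per-word output is produced.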