Why is the first output line of my MapReduce job null in Java?

Posted on 2025-02-05 13:46:47

I don't understand why the first line of my MapReduce job's output is a null (blank) key with a value of 0.
Each output line should have the form: url ; number of visits

Here is the mapper class:

public class WordCountMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> 
{
    public void map(LongWritable cle, Text valeur, Context sortie)
            throws IOException          
    {
        
        String url="";
        int nbVisites=0;
        Pattern httplogPattern = Pattern.compile("([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");
        String ligne = valeur.toString();

        if (ligne.length()>0) {
            Matcher matcher = httplogPattern.matcher(ligne);
            if (matcher.matches()) {
                url = matcher.group(1);
                nbVisites = Integer.parseInt(matcher.group(5));
            }           
        }
        
        Text urlText = new Text(url);
        IntWritable value = new IntWritable(nbVisites);
        try 
        {           
            sortie.write(urlText, value);   
            System.out.println(urlText + " ; " + value);
        } 
        catch (InterruptedException e) 
        {
            e.printStackTrace();
        }
    }
}

And the reducer:

public class WordCountReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> 
{
    public void reduce(Text key, Iterable<IntWritable> values, Context sortie) throws IOException, InterruptedException 
    {
        

        Iterator<IntWritable> it = values.iterator();
        int nb=0;
        while (it.hasNext()) {
            nb = nb + it.next().get();
        }

        try {
            sortie.write(key,  new IntWritable(nb));
            System.out.println(key.toString() + ";" + nb);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Each line of the input file looks like this:

199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245

And here is the output:

    0
04-dynamic-c.rotterdam.luna.net 4
06-dynamic-c.rotterdam.luna.net 1
10.salc.wsu.edu 3
11.ts2.mnet.medstroms.se    1
128.100.183.222 4
128.102.149.149 4

As you can see, the first line is a pair of null values.

Thank you

Comments (1)

梦毁影碎の 2025-02-12 13:46:47

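The first key is an empty string, not null. Your mapper initializes url to "" and nbVisites to 0, and it writes that pair even when the line is empty or does not match the pattern. The reducer then sums those zeros and emits the empty key with a total of 0. Because an empty string sorts before every other key, it shows up as the first line of the output.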

It works fine if you only write the output when the line actually matches the pattern.
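To see the effect without a cluster, here is a minimal sketch that reproduces it in plain Java, with no Hadoop involved. The class name `EmptyKeyDemo`, the `run` helper, the `skipNonMatching` flag and the malformed sample line are all made up for illustration. A `TreeMap` stands in for the shuffle and reduce phases, which is only an approximation of what Hadoop does. The pattern is the one from the question.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyKeyDemo {

  // Same pattern as in the question.
  static final Pattern HTTP_LOG_PATTERN = Pattern.compile(
      "([^\\s]+) - - \\[(.+)\\] \"([^\\s]+) (/[^\\s]*) HTTP/[^\\s]+\" [^\\s]+ ([0-9]+)");

  // In-memory stand-in for map + shuffle + reduce (an assumption, not Hadoop itself):
  // the TreeMap keeps keys sorted, and merge(...) sums the values per key.
  static Map<String, Integer> run(String[] lines, boolean skipNonMatching) {
    Map<String, Integer> totals = new TreeMap<>();
    for (String line : lines) {
      String key = "";  // defaults used by the mapper in the question
      int value = 0;
      Matcher m = HTTP_LOG_PATTERN.matcher(line);
      boolean matched = m.matches();
      if (matched) {
        key = m.group(1);
        value = Integer.parseInt(m.group(5));
      }
      // The question's mapper always writes; the fix writes only on a match.
      if (matched || !skipNonMatching) {
        totals.merge(key, value, Integer::sum);
      }
    }
    return totals;
  }

  public static void main(String[] args) {
    String[] lines = {
      "199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] \"GET /history/apollo/ HTTP/1.0\" 200 6245",
      "",                       // blank line
      "not an access log line"  // hypothetical malformed line
    };
    System.out.println(run(lines, false)); // always write, as in the question
    System.out.println(run(lines, true));  // write only when the line matches
  }
}
```

With the three sample lines, the first call prints `{=0, 199.72.81.55=6245}` and the second prints `{199.72.81.55=6245}`. The empty key with a total of 0 comes entirely from the two lines that did not match.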

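Here's a refactored version of your code:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.StreamSupport;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WebLogDriver extends Configured implements Tool {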

  public static final String APP_NAME = WebLogDriver.class.getSimpleName();

  public static void main(String[] args) throws Exception {
    final int status = ToolRunner.run(new Configuration(), new WebLogDriver(), args);
    System.exit(status);
  }

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    Job job = Job.getInstance(conf, APP_NAME);
    job.setJarByClass(WebLogDriver.class);

    // outputs for mapper and reducer
    job.setOutputKeyClass(Text.class);

    // setup mapper
    job.setMapperClass(WebLogDriver.WebLogMapper.class);
    job.setMapOutputValueClass(IntWritable.class);

    // setup reducer
    job.setReducerClass(WebLogDriver.WebLogReducer.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    final Path outputDir = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outputDir);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  static class WebLogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    static final Pattern HTTP_LOG_PATTERN = Pattern.compile("(\\S+) - - \\[(.+)] \"(\\S+) (/\\S*) HTTP/\\S+\" \\S+ (\\d+)");

    final Text keyOut = new Text();
    final IntWritable valueOut = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
      String line = value.toString();
      if (line.isEmpty()) return;
      Matcher matcher = HTTP_LOG_PATTERN.matcher(line);

      if (matcher.matches()) {
        keyOut.set(matcher.group(1));
        try {
          valueOut.set(Integer.parseInt(matcher.group(5)));

          context.write(keyOut, valueOut);
        } catch (NumberFormatException e) {
          e.printStackTrace();
        }
      }
    }
  }

  static class WebLogReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

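    // Reused for every output record; an instance field, so it is not shared across reducer instances.
    final IntWritable valueOut = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
      // Sequential stream on purpose: Hadoop reuses one IntWritable instance while iterating,
      // so a parallel stream could buffer references to the same object and sum wrong values.
      int nb = StreamSupport.stream(values.spliterator(), false)
          .mapToInt(IntWritable::get)
          .sum();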
      valueOut.set(nb);
      context.write(key, valueOut);
    }
  }
}