MapReduce job fails with error: File file job.jar does not exist

Posted on 2022-09-07 21:32:46 · 3221 characters · 15 views · 0 comments

I tried to submit a job to YARN by running the main method directly from Java, but got the following error:

2018-08-26 10:25:37,544 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1535213323614_0010 failed with state FAILED due to: Application application_1535213323614_0010 failed 2 times due to AM Container for appattempt_1535213323614_0010_000002 exited with  exitCode: -1000 due to: File file:/tmp/hadoop-yarn/staging/nasuf/.staging/job_1535213323614_0010/job.jar does not exist
.Failing this attempt.. Failing the application.

Also, the log directory under HADOOP_HOME contains no logs at all for this job.

The mapper code is as follows:

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // Split each input line into words and emit (word, 1) pairs
        String line = value.toString();
        String[] words = StringUtils.split(line, " ");

        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }

}

The reducer code is as follows:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        // Sum the counts for each word
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }

        context.write(key, new LongWritable(count));
    }

}

The main method is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WCRunner {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.jar", "wc.jar");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "hdcluster01");
        conf.set("yarn.nodemanager.aux-services", "mapreduce_shuffle");
        Job job = Job.getInstance(conf);

        // Specify which jar contains the classes used by this job
        job.setJarByClass(WCRunner.class);

        // The mapper and reducer classes used by this job
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);

        // Reducer output key/value types (if the mapper output types below
        // were omitted, these would apply to both mapper and reducer output)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Mapper output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Location of the source data
        FileInputFormat.setInputPaths(job, new Path("hdfs://hdcluster01:9000/wc/srcdata"));

        // Output path for the results
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hdcluster01:9000/wc/output3"));

        // Submit the job to the cluster and wait for it to finish
        job.waitForCompletion(true);
    }

}

I am running the code locally on macOS under the username nasuf; the remote Hadoop deployment is pseudo-distributed, with HDFS and YARN on the same server, owned by the user parallels.
I checked the path mentioned in the log, /tmp/hadoop-yarn/staging/nasuf/.staging/job_1535213323614_0010/job.jar, and it indeed does not exist; there is no /hadoop-yarn directory under /tmp at all.

What is causing this problem?
Thanks, everyone.


Comments (1)

神也荒唐 2022-09-14 21:32:46

Problem solved. Either copy core-site.xml onto the classpath, or add the following configuration:

conf.set("hadoop.tmp.dir", "/home/parallels/app/hadoop-2.4.1/data/");
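To sketch why this helps: without the cluster's core-site.xml on the classpath, the client falls back to the local filesystem defaults, so the staging directory resolves to file:/tmp/... on the Mac while the NodeManagers look for job.jar on the remote machine. A minimal sketch of the client-side configuration, assuming the pseudo-distributed cluster from the question (hostname hdcluster01, NameNode on port 9000; the core-site.xml path below is a placeholder, not from the original post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class WCConfigSketch {

    // Builds a client Configuration whose staging directory resolves on the
    // remote cluster rather than on the local filesystem.
    static Configuration clusterConf() {
        Configuration conf = new Configuration();

        // Option 1: load the cluster's core-site.xml explicitly instead of
        // copying it onto the classpath (placeholder path):
        // conf.addResource(new Path("/path/to/core-site.xml"));

        // Option 2: set the relevant keys by hand. With fs.defaultFS pointing
        // at HDFS, paths without an explicit scheme resolve to hdfs://...
        // instead of file:/..., so job.jar is uploaded somewhere the
        // NodeManagers can actually see.
        conf.set("fs.defaultFS", "hdfs://hdcluster01:9000");
        conf.set("hadoop.tmp.dir", "/home/parallels/app/hadoop-2.4.1/data/");
        return conf;
    }
}
```

This is a configuration sketch under the assumptions above, not a verified fix beyond what the answer states; the key point is that the client and the cluster must agree on where the staging directory lives.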