Running a Java Hadoop job on a local/remote cluster
I'm trying to run a Hadoop job on a local/remote cluster. In the future this job will be executed from a web application. I'm trying to execute this piece of code from Eclipse:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TestHadoop {

    private final static String host = "localhost";

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        run();
    }

    static void run() throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();

        // run on other machine/cluster
        conf.set("fs.default.name", "hdfs://" + host + ":8020");
        conf.set("mapred.job.tracker", "hdfs://" + host + ":8021");

        Job job = new Job(conf, "Wordcount");
        job.setJarByClass(TestHadoop.class);

        FileInputFormat.addInputPath(job, new Path("/user/hue/jobsub/sample_data/midsummer.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hadoop-out2"));

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.waitForCompletion(true);
    }

    static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
However, I get the following errors:
2011-09-30 16:32:39,000 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.mmm.hadoop.TestHadoop$Map
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassNotFoundException: org.mmm.hadoop.TestHadoop$Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:994)
... 8 more
16:33:01.209 [LeaseChecker] DEBUG org.apache.hadoop.hdfs.DFSClient - LeaseChecker is interrupted.
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method) [na:1.7.0]
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1167) ~[hadoop-core-0.20.2-cdh3u1.jar:na]
at java.lang.Thread.run(Thread.java:722) [na:1.7.0]
I'm using CDH3 with Hue. The job appears in the job list with the above "Error running child" failure.
3 Answers
You have to bundle your custom mapper/reducer implementations in a jar. job.setJarByClass(TestHadoop.class) will then look up that jar and transfer it to the cluster.
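In practice that means submitting from a jar rather than from Eclipse's raw class files. A minimal sketch, assuming the compiled classes are packaged as testhadoop.jar (a hypothetical path); if setJarByClass cannot locate a jar on the driver's own classpath, the jar can also be named explicitly through the mapred.jar property:

// Package the compiled classes first, e.g. from the project root:
//   jar cf testhadoop.jar -C bin/ org/mmm/hadoop
// Then either run it with `hadoop jar testhadoop.jar org.mmm.hadoop.TestHadoop`,
// or name the jar explicitly when submitting from plain Java code:
Configuration conf = new Configuration();
conf.set("mapred.jar", "/path/to/testhadoop.jar"); // hypothetical location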
I know I'm probably way too late, but try declaring Map and Reduce as public, too.
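For reference, a sketch of the declarations that suggestion implies; the framework instantiates these classes reflectively on the task nodes, so only the declarations change and the method bodies stay exactly as in the question:

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    // ... map() body unchanged from the question ...
}

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // ... reduce() body unchanged from the question ...
}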
The mapred.job.tracker value should not be an hdfs:// URL (it is a plain host:port pair, not a filesystem URL)...
And make the Mapper and Reducer public.
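A minimal sketch of the corrected settings, keeping the ports from the question; in Hadoop 0.20 / CDH3, fs.default.name takes an hdfs:// URL while mapred.job.tracker takes just host:port:

Configuration conf = new Configuration();
// NameNode address: an hdfs:// URL
conf.set("fs.default.name", "hdfs://" + host + ":8020");
// JobTracker address: plain host:port, no scheme
conf.set("mapred.job.tracker", host + ":8021");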