Hadoop Mahout: org.apache.mahout.classifier.df.mapreduce.TestForest error

Posted 2021-12-02 09:05:06 · 7902 characters · 645 views · 1 comment

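I set up three CentOS 7 virtual machines, installed and configured hadoop-3.0.0, and want to run Mahout's random forest algorithm to train a machine-learning classifier. Step one generates the descriptor file (/des.info); step two trains the forest model (/user/hadoop/forest); step three tests it.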

This is what HDFS looks like:

[hadoop@hadoop1 ~]$ hadoop fs -ls /

Found 5 items
-rw-r--r--   1 hadoop supergroup    8807688 2018-04-29 19:59 /des.info
-rw-r--r--   1 hadoop supergroup   79736192 2018-04-29 19:55 /features.txt
-rw-r--r--   1 hadoop supergroup        278 2018-05-01 03:50 /test.txt
drwx------   - hadoop supergroup          0 2018-04-29 20:05 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-04-29 20:05 /user

features.txt is the training data and test.txt is the test data.

The first two steps seem to have gone fine, but the third step, testing the data, fails with an error:

[hadoop@hadoop1 ~]$ hadoop jar /opt/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i /test.txt -ds /des.info -m /user/hadoop/forest -a -mr -o prediction
2018-05-01 03:53:21,595 INFO mapreduce.Classifier: Adding the dataset to the DistributedCache
2018-05-01 03:53:21,597 INFO mapreduce.Classifier: Adding the decision forest to the DistributedCache
2018-05-01 03:53:21,600 INFO mapreduce.Classifier: Configuring the job...
2018-05-01 03:53:21,605 INFO mapreduce.Classifier: Running the job...
2018-05-01 03:53:21,702 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.80.100:8032
2018-05-01 03:53:22,073 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1525056498669_0002
2018-05-01 03:53:22,507 INFO input.FileInputFormat: Total input files to process : 1
2018-05-01 03:53:23,009 INFO mapreduce.JobSubmitter: number of splits:1
2018-05-01 03:53:23,170 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-01 03:53:23,353 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525056498669_0002
2018-05-01 03:53:23,355 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-05-01 03:53:23,663 INFO conf.Configuration: resource-types.xml not found
2018-05-01 03:53:23,663 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-05-01 03:53:23,813 INFO impl.YarnClientImpl: Submitted application application_1525056498669_0002
2018-05-01 03:53:23,870 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1525056498669_0002/
2018-05-01 03:53:23,871 INFO mapreduce.Job: Running job: job_1525056498669_0002
2018-05-01 03:53:31,029 INFO mapreduce.Job: Job job_1525056498669_0002 running in uber mode : false
2018-05-01 03:53:31,030 INFO mapreduce.Job:  map 0% reduce 0%
2018-05-01 03:53:34,089 INFO mapreduce.Job: Task Id : attempt_1525056498669_0002_m_000000_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
 at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
 at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
 at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
 at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
 at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-05-01 03:53:46,822 INFO mapreduce.Job: Task Id : attempt_1525056498669_0002_m_000000_1, Status : FAILED
2018-05-01 03:53:51,342 INFO mapreduce.Job: Task Id : attempt_1525056498669_0002_m_000000_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
 at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
 at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
 at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
 at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
 at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-05-01 03:54:05,463 INFO mapreduce.Job:  map 100% reduce 0%
2018-05-01 03:54:06,479 INFO mapreduce.Job: Job job_1525056498669_0002 failed with state FAILED due to: Task failed task_1525056498669_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0

2018-05-01 03:54:06,556 INFO mapreduce.Job: Counters: 12
 Job Counters
  Failed map tasks=4
  Launched map tasks=4
  Other local map tasks=3
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=28056
  Total time spent by all reduces in occupied slots (ms)=0
  Total time spent by all map tasks (ms)=28056
  Total vcore-milliseconds taken by all map tasks=28056
  Total megabyte-milliseconds taken by all map tasks=28729344
 Map-Reduce Framework
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
Exception in thread "main" java.lang.IllegalStateException: Job failed!
 at org.apache.mahout.classifier.df.mapreduce.Classifier.run(Classifier.java:127)
 at org.apache.mahout.classifier.df.mapreduce.TestForest.mapreduce(TestForest.java:188)
 at org.apache.mahout.classifier.df.mapreduce.TestForest.testForest(TestForest.java:174)
 at org.apache.mahout.classifier.df.mapreduce.TestForest.run(TestForest.java:146)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
 at org.apache.mahout.classifier.df.mapreduce.TestForest.main(TestForest.java:315)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
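
One observation that may help narrow this down: the out-of-range index 946827879 in the stack trace is suspiciously regular. Java's `DataInput.readInt()` reads four bytes as a big-endian integer, so the value can be turned back into the bytes that `Node.read` consumed. The sketch below does that in Python; `decode_index` and `ASSUMED_KEY_CLASS` are illustrative names, and the class name is only a guess at what the file might contain, not something confirmed by the log.

```python
import struct


def decode_index(value: int) -> bytes:
    """Return the 4 big-endian bytes that readInt() would have consumed."""
    return struct.pack(">i", value)


# The index reported by ArrayIndexOutOfBoundsException in the log above.
raw = decode_index(946827879)
print(raw.hex())  # 386f7267
print(raw)        # b'8org'

# First byte read as a length prefix, remaining bytes as ASCII text.
length_prefix = raw[0]
text_start = raw[1:].decode("ascii")
print(length_prefix, text_start)  # 56 org

# ASSUMPTION: a candidate class name whose length matches the prefix.
# This is a guess for illustration only.
ASSUMED_KEY_CLASS = "org.apache.mahout.classifier.df.mapreduce.partial.TreeID"
print(len(ASSUMED_KEY_CLASS) == length_prefix)  # True
```

The bytes decode to `8org`, that is, a byte with value 56 followed by the start of a Java package name. A Hadoop SequenceFile header has this shape: after the `SEQ` magic and a version byte comes the key class name, written as a length followed by the name. So it is possible, though not certain, that `DecisionForest.load` is being handed a SequenceFile (or some other file) under /user/hadoop/forest rather than a serialized forest. Listing that directory with `hadoop fs -ls /user/hadoop/forest` and checking which files it contains would be a reasonable next step.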

 


Comments (1)

累赘 2021-12-05 17:37:03

Mahout isn't used much these days, I think. Spark MLlib seems to be more common.
