Mahout LDA gives a FileNotFound exception
I created my term vectors as described here:
~/Scripts/Mahout/trunk/bin/mahout seqdirectory --input /home/ben/Scripts/eipi/files --output /home/ben/Scripts/eipi/mahout_out -chunk 1
~/Scripts/Mahout/trunk/bin/mahout seq2sparse -i /home/ben/Scripts/eipi/mahout_out -o /home/ben/Scripts/eipi/termvecs -wt tf -seq
Then I run
~/Scripts/Mahout/trunk/bin/mahout lda -i /home/ben/Scripts/eipi/termvecs -o /home/ben/Scripts/eipi/lda_working -k 2 -v 100
and I get:
MAHOUT-JOB: /home/ben/Scripts/Mahout/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
11/09/04 16:28:59 INFO common.AbstractJob: Command line arguments: {--endPhase=2147483647, --input=/home/ben/Scripts/eipi/termvecs, --maxIter=-1, --numTopics=2, --numWords=100, --output=/home/ben/Scripts/eipi/lda_working, --startPhase=0, --tempDir=temp, --topicSmoothing=-1.0}
11/09/04 16:29:00 INFO lda.LDADriver: LDA Iteration 1
11/09/04 16:29:01 INFO input.FileInputFormat: Total input paths to process : 4
11/09/04 16:29:01 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-ben/mapred/staging/ben692167368/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/home/ben/Scripts/eipi/termvecs/tokenized-documents/data does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:919)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
at org.apache.mahout.clustering.lda.LDADriver.runIteration(LDADriver.java:426)
at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:226)
at org.apache.mahout.clustering.lda.LDADriver.run(LDADriver.java:174)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.lda.LDADriver.main(LDADriver.java:90)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
It's right: that file doesn't exist. How am I supposed to create it?
1 Answer
The vectors might be empty because something went wrong while they were being created. Check whether the vectors were actually written to their folders (i.e. the files are not 0 bytes). This error can occur if your input folder is missing some files; in that case, the two preparation steps will still run but will not produce valid output.
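For example, a quick check from the shell (only a sketch, reusing the paths from the question above; the error message itself says the LDA job is looking for termvecs/tokenized-documents/data):
ls -lR /home/ben/Scripts/eipi/mahout_out   # SequenceFiles written by seqdirectory
ls -lR /home/ben/Scripts/eipi/termvecs     # seq2sparse output; the missing tokenized-documents/data should live here
find /home/ben/Scripts/eipi/termvecs -type f -size 0   # lists any zero-byte files, i.e. vectors that were not created properly
If either directory is empty or the find command lists zero-byte files, rerun seqdirectory and seq2sparse and watch their console output for errors before running the lda step again.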