IOException when crawling with Nutch
After one day of crawling with Nutch 1.4, I finally got the exception below:
...
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1204)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1240)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1213)
...
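`Job failed!` only reports that the Hadoop job behind the fetcher failed; the underlying exception is usually written to the log file rather than to the console. A minimal sketch for pulling exception lines out of that log, assuming Nutch 1.x running in local mode and logging to `runtime/local/logs/hadoop.log` (that path is an assumption; adjust it to your installation). The helper names `exception_lines` and `scan_log` are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumed location: Nutch 1.x in local mode typically logs to logs/hadoop.log
# under its runtime directory. Adjust to match your installation.
DEFAULT_LOG = Path("runtime/local/logs/hadoop.log")

# Matches "Caused by:" markers or dotted Java class names ending in
# Exception/Error, e.g. "java.io.IOException" or "java.lang.OutOfMemoryError".
EXCEPTION_RE = re.compile(r"(Caused by:|\b[\w.]+(Exception|Error)\b)")


def exception_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention a Java exception/error or a 'Caused by:' marker."""
    return [line.rstrip("\n") for line in lines if EXCEPTION_RE.search(line)]


def scan_log(path: Path = DEFAULT_LOG) -> List[str]:
    """Read a log file (replacing undecodable bytes) and return its exception lines."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return exception_lines(fh)
```

Usage: `for hit in scan_log(Path("logs/hadoop.log")): print(hit)`. The first hit logged shortly before the `Job failed!` timestamp is usually the most informative one.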
I have 20 news sites, and the Nutch input arguments are: depth 3 and topN -1.
I have enough space in the root directory of my Linux machine and about 4 GB of RAM.
How can I solve this issue?
Thanks.
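Free space on `/` is not necessarily where Hadoop writes its temporary data: in Hadoop 0.20/1.x (the line Nutch 1.4 builds on), `hadoop.tmp.dir` defaults to `/tmp/hadoop-${user.name}`, and local job data goes underneath it unless configured otherwise. A minimal sketch, assuming a Linux host and that default, for checking free space on each relevant filesystem; the helper names `free_gib` and `hadoop_tmp_default` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable


def hadoop_tmp_default(user: str) -> str:
    """Assumed default for hadoop.tmp.dir in Hadoop 0.20/1.x: /tmp/hadoop-<user>."""
    return f"/tmp/hadoop-{user}"


def free_gib(paths: Iterable[str]) -> Dict[str, float]:
    """Map each existing path to the free space, in GiB, of the filesystem holding it.

    Paths that do not exist are skipped rather than raising an error.
    """
    report: Dict[str, float] = {}
    for p in paths:
        if Path(p).exists():
            report[p] = shutil.disk_usage(p).free / 1024 ** 3
    return report
```

Usage: `free_gib(["/", "/tmp", hadoop_tmp_default(getpass.getuser())])` (after `import getpass`). If `/tmp` sits on a small separate partition, pointing `hadoop.tmp.dir` at a larger disk is one option to consider.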

I think that you might have this problem: http://wiki.apache.org/nutch/NutchGotchas
The answer provided there states: