Setting up Nutch 1.3 and Hadoop
I am new to Nutch and Hadoop and am trying to follow the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial.
So I started with the Nutch 1.3 release.
Even though Hadoop is included in Nutch, after the build I did not see any of the .sh or .xml files that the tutorial refers to under /nutch/search/conf.
I was wondering whether I have to set up Hadoop first in the same directory structure, or copy over the Hadoop config files, before proceeding with the Nutch setup.
Can anyone please point me in the right direction? I am pretty sure that I am lost :-(
Thanks very much in advance.
Comments (1)
Well, Hadoop is not included in Nutch anymore since 1.3 ... I have complained about it on the mailing list. But the goal of the Nutch group seems to have changed to providing a crawler component only. To make use of it you need to install Hadoop (here is a good tutorial) and Solr (for search).
Some people have announced that they are going to fix that, but only for Nutch 1.4. Not sure where it will end up.