Nutch: No agents listed in 'http.agent.name'

Posted on 2024-11-18 16:03:25

Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
        at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Every time I run ./nutch crawl urls -dir crawl -depth 3 -topN 5, Nutch throws this error. I have the following set in both my nutch-site.xml and nutch-default.xml:

<property>
  <name>http.agent.name</name>
  <value>blah</value>
</property>

I took the description element out to make it easier to read, but I fail to see where else the agent name can be specified. If anybody has any advice, I would be grateful.

Comments (2)

暖树树初阳… 2024-11-25 16:03:25

Are you using Nutch 1.3? If so, make sure you changed nutch-site.xml (and not nutch-default.xml) in runtime/local/conf.
Changes made to the conf in NUTCH_HOME/conf won't be copied to the runtime directories unless you rebuild with ant.
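
To make that concrete, here is a minimal sketch of what runtime/local/conf/nutch-site.xml might contain, assuming a Nutch 1.3 layout. The value blah is just the asker's placeholder, and the comment inside the file is mine, not part of any shipped configuration.

```xml
<?xml version="1.0"?>
<!-- Sketch of runtime/local/conf/nutch-site.xml (assumed Nutch 1.3 layout).
     Overrides placed here take precedence over nutch-default.xml. -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- "blah" is the placeholder from the question; use your crawler's name. -->
    <value>blah</value>
  </property>
</configuration>
```

If you would rather keep editing NUTCH_HOME/conf, rebuilding with ant should copy the change into the runtime directories, as noted above.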

挽手叙旧 2024-11-25 16:03:25

Try giving the agent name in http.robots.agents as well. It worked for me; I didn't get that message afterwards.
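
As a rough sketch of that suggestion, the two properties could sit side by side in nutch-site.xml as below. The value blah is the placeholder from the question, and the blah,* ordering (own agent name first, wildcard last) is my assumption based on the convention described in nutch-default.xml for Nutch 1.x; check the description in your own copy.

```xml
<!-- Sketch only: place these inside the <configuration> element of nutch-site.xml. -->
<property>
  <name>http.agent.name</name>
  <!-- Placeholder agent name taken from the question. -->
  <value>blah</value>
</property>
<property>
  <name>http.robots.agents</name>
  <!-- Assumed convention: the http.agent.name value first, then the "*" wildcard. -->
  <value>blah,*</value>
</property>
```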
