Setting up Nutch 1.3 and Solr 3.1

I am trying to get Nutch 1.3 and Solr 3.1 working together.

Note: I am using Windows and have Cygwin installed.

I have Nutch installed and did a basic crawl (running from runtime/local):

bin/nutch crawl urls -dir crawl -depth 3

This seems to have worked, based on the logs (crawl.log):
...
LinkDb: finished at 2011-10-24 14:22:47, elapsed: 00:00:02
crawl finished: crawl
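
For context, the urls argument in the crawl command above is just a directory of plain-text seed files that Nutch reads, one URL per line. A minimal sketch of that setup (the file name and seed URL here are only illustrative):

mkdir -p urls
echo "http://nutch.apache.org/" > urls/seed.txt
bin/nutch crawl urls -dir crawl -depth 3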

I have Solr installed and verified the install at localhost:8983/solr/admin.
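
(The same check can also be scripted; assuming the stock example solrconfig.xml, which ships with a ping handler, a quick sanity check is:)

curl http://localhost:8983/solr/admin/ping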

I copied the Nutch schema.xml file to the example\solr\conf folder.
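
Roughly, that copy step amounts to the following, where $NUTCH_HOME and $SOLR_HOME are placeholders for the default Nutch and Solr layouts; Solr then has to be restarted so the new schema is loaded:

cp $NUTCH_HOME/runtime/local/conf/schema.xml $SOLR_HOME/example/solr/conf/schema.xml
cd $SOLR_HOME/example
java -jar start.jar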

When I run the following command

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*

I get the following error (hadoop.log)

2011-10-24 15:39:26,467 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: ERROR:unknown field 'content'

ERROR:unknown field 'content'
request: http://localhost:8983/solr/update?wt=javabin&version=2
...
org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-10-24 15:39:26,676 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

What am I missing?

Comments (1)

恍梦境° 2024-12-18 19:29:14


It seems the content field definition is missing from the schema.xml.

e.g.

<field name="content" type="text" stored="false" indexed="true"/>

The example schema.xml @ http://svn.apache.org/viewvc/nutch/branches/branch-1.3/conf/schema.xml?view=markup seems to have it. You may want to check the schema.xml you copied over.
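
Once the field is in place, restart Solr so the schema change is picked up and rerun the index step from the question (here $SOLR_HOME is a placeholder for wherever the Solr example lives, and the Nutch command is run from runtime/local as before):

cd $SOLR_HOME/example
java -jar start.jar
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*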
