Nutch crawler does not index HTML content

Posted 2024-12-17 08:01:27 | 900 characters | 4 views | 0 comments

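I am trying to build a search feature where I enter a city name and get back the weather conditions for that city.
I have set up Nutch-1.3 and Solr-3.4.0 on my system. I am crawling the website linked here and passing the index to Solr for searching. Now, when I query for delhi, I want to retrieve the information displayed on this link.

How can I achieve this? Do I need to write a plugin? The document currently in the Solr index for that page has empty content and title fields: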

<doc>
  <float name="score">1.0</float>
  <float name="boost">0.1879294</float>
  <str name="content"/>
  <str name="digest">d41d8cd98f00b204e9800998ecf8427e</str>
  <str name="id">http://www.imd.gov.in/section/nhac/distforecast/delhi.htm</str>
  <str name="segment">20111118153543</str>
  <str name="title"/>
  <date name="tstamp">2011-11-18T10:06:45.604Z</date>
  <str name="url">http://www.imd.gov.in/section/nhac/distforecast/delhi.htm</str>
</doc>
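
One detail in that document may be worth checking before writing any plugin: the digest value d41d8cd98f00b204e9800998ecf8427e is the MD5 hash of empty input. The Python sketch below is only a sanity check on the XML shown above. It assumes the digest field is an MD5 over whatever Nutch fetched for the page, and the helper names `field` and `diagnose` are hypothetical, not part of Nutch or Solr.

```python
import hashlib
import xml.etree.ElementTree as ET

# The <doc> element from the question, trimmed to the fields being checked.
SOLR_DOC = """
<doc>
  <str name="content"/>
  <str name="digest">d41d8cd98f00b204e9800998ecf8427e</str>
  <str name="id">http://www.imd.gov.in/section/nhac/distforecast/delhi.htm</str>
  <str name="title"/>
</doc>
"""

# MD5 of zero bytes; a digest equal to this means the hashed input was empty.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def field(doc, name):
    """Return the text of the named Solr field, or '' if missing or empty."""
    node = doc.find(f"./*[@name='{name}']")
    return (node.text or "").strip() if node is not None else ""


def diagnose(doc_xml):
    """Hypothetical helper: report which signs of an empty document are present."""
    doc = ET.fromstring(doc_xml)
    return {
        "content_empty": field(doc, "content") == "",
        "title_empty": field(doc, "title") == "",
        # Assumption: the digest field holds an MD5 of the fetched content.
        "digest_is_empty_md5": field(doc, "digest") == EMPTY_MD5,
    }


if __name__ == "__main__":
    print(diagnose(SOLR_DOC))
```

If all three flags come back true, the page probably reached Solr with no text at all, which would point at the fetch or parse step of the crawl rather than at the search query. That is an inference from the output, not something the output proves.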

