Have you indexed Nutch crawl results with Elasticsearch before?
Has anyone had any luck writing a custom indexer for Nutch that indexes the crawl results with Elasticsearch? Or do you know of one that already exists?
Comments (4)
I wrote an Elasticsearch plugin that mocks the Solr API. With this plugin and the standard Nutch Solr indexer, you can easily send crawled data into Elasticsearch. The plugin, along with an example of how to use it with Nutch, can be found on GitHub:
https://github.com/mattweber/elasticsearch-mocksolrplugin
I know that Nutch will be adding pluggable backends, and I'm glad to see it. I needed to integrate Elasticsearch with Nutch 1.3, so I piggybacked off the Solr indexer code (src/java/org/apache/nutch/indexer/solr). The code is posted here:
https://github.com/ctjmorgan/nutch-elasticsearch-indexer
I haven't done it, but it is definitely doable. It would require piggybacking on the Solr indexer code (src/java/org/apache/nutch/indexer/solr) and adapting it to Elasticsearch. It would be a nice contribution to Nutch, by the way.
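To give a rough idea of what "adapting it to Elasticsearch" involves on the wire: where a Solr-style writer posts documents to Solr's update handler, an Elasticsearch writer can batch them into a newline-delimited JSON body for the `_bulk` endpoint. The sketch below is only an illustration in Python, not Nutch code; the function name `to_bulk_body`, the field names (`url`, `title`, `content`), and the index name `nutch` are all assumptions made for the example. Older Elasticsearch releases also used a `_type` field in the action line, which is omitted here.

```python
import json


def to_bulk_body(docs, index_name):
    """Serialize documents into a newline-delimited JSON body for _bulk.

    Hypothetical helper: each doc is assumed to be a dict with a "url" key,
    which is reused as the document ID so re-crawled pages overwrite
    their earlier versions instead of creating duplicates.
    """
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch to index the document that follows.
        action = {"index": {"_index": index_name, "_id": doc["url"]}}
        lines.append(json.dumps(action))
        # Source line: the document fields themselves.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    crawled = [
        {"url": "http://example.com/", "title": "Example", "content": "Hello"},
        {"url": "http://example.com/a", "title": "Page A", "content": "World"},
    ]
    print(to_bulk_body(crawled, "nutch"), end="")
```

The resulting string would be sent as the body of a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header. The actual Nutch side (reading segments, mapping fields, committing) is what the Solr indexer code already handles and is not shown.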
Time has passed, and Nutch is now well integrated with Elasticsearch. Here is a nice tutorial.