PySolr RSS data import

Posted 2024-08-19 01:51:47

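I'm using PySolr to run my searches. I'd like to index an RSS feed and was wondering whether that can be done with PySolr and, if so, how.

I found instructions for doing this in Solr itself at http://wiki.apache.org/solr/DataImportHandler#HttpDataSource_Example

but I can't find anything on how to do the equivalent in PySolr.

Thanks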

生生不灭 2024-08-26 01:51:47

You probably don't need to do the equivalent in PySolr. If you already have Solr indexing the feed, as per the example, then you just use PySolr to query that index. Something like:

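from pysolr import Solr

# Connect to the Solr core that already indexes the feed
solr = Solr('http://localhost:8983/solr/rss/')
response = solr.search('some query string')
print(response.hits)  # total number of matching documents
for result in response.docs:
    do_stuff_with(result)  # placeholder for your own handling code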

If you really want to do it from the Python side, you'll need to fetch and parse the RSS there using another library (e.g. Universal Feed Parser). PySolr only wraps the interaction with Solr; it doesn't “do” data sources.
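
A minimal sketch of the Python-side route, under a few assumptions: the feed is plain RSS 2.0, the field names (`id`, `title`, `link`, `description`) exist in your Solr schema, and the helper names `rss_to_docs` and `index_feed` are made up for illustration. To stay self-contained it parses the feed with the standard library's `xml.etree.ElementTree` rather than Universal Feed Parser, which would cope better with messy real-world feeds.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def rss_to_docs(rss_xml):
    """Convert RSS 2.0 XML (str or bytes) into a list of dicts for Solr."""
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        docs.append({
            # Field names are assumptions; they must match your Solr schema.
            # Fall back to the link when the item has no <guid>.
            "id": item.findtext("guid", default=link),
            "title": item.findtext("title", default=""),
            "link": link,
            "description": item.findtext("description", default=""),
        })
    return docs


def index_feed(feed_url, solr_url):
    """Fetch a feed and push its items to Solr through PySolr."""
    # Imported here so rss_to_docs can be used without PySolr installed.
    import pysolr

    with urlopen(feed_url) as resp:
        docs = rss_to_docs(resp.read())
    solr = pysolr.Solr(solr_url)
    solr.add(docs)
    solr.commit()  # make the new documents visible to searches
    return len(docs)
```

Something like `index_feed('http://example.com/feed.xml', 'http://localhost:8983/solr/rss/')` (both URLs are placeholders) could then be run on a schedule to keep the index current.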

You may want to check out Haystack, which uses PySolr (and can use other engines) and neatly abstracts the job of creating search index entries and shipping them off to Solr for indexing.
