Getting text snippets from a search index generated by Solr and Nutch
I have just configured Nutch and Solr to crawl and index the text of a web site, following the getting-started tutorials. Now I am trying to build a search page by modifying the example Velocity templates.
Now to my question: how can I tell Solr to return a relevant text snippet from the content of each hit? I only get the following fields with each hit:
score, boost, digest, id, segment, title, date, tstamp and url.
The content really is indexed, because I can search for words that I know occur only in the full text, but I still don't get the full text back with the hit.
Comments (1)
Don't forget: indexed is not the same as stored.
You can search for words in a document if its fields are indexed, even when no field is stored.
To get the content of a specific field back, that field must also have stored="true" in schema.xml.
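To check which fields are affected, a small script can list the fields that are searchable but never returned. This is only a sketch: the schema fragment below is hypothetical (the field names and types are assumptions, not taken from your actual schema.xml), and it treats a missing indexed or stored attribute as false, whereas Solr takes such defaults from the field type.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; field names and types are assumptions.
SCHEMA_FRAGMENT = """
<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="content" type="text" indexed="true" stored="false"/>
</fields>
"""


def searchable_but_not_returned(schema_xml):
    """Return the names of fields that are indexed but not stored."""
    root = ET.fromstring(schema_xml)
    names = []
    for field in root.iter("field"):
        # Simplification: a missing attribute is treated as "false".
        indexed = field.get("indexed", "false") == "true"
        stored = field.get("stored", "false") == "true"
        if indexed and not stored:
            names.append(field.get("name"))
    return names


print(searchable_but_not_returned(SCHEMA_FRAGMENT))
```

Any field this reports can be matched by a query but cannot come back in the results until it is set to stored="true" and the documents are re-indexed.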
If your fulltext field is stored, then the default field list setting probably does not include it.
You can add it by using the fl parameter: ... This example assumes your fulltext is stored in a field called mytext.
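As a rough illustration of the fl parameter, the following sketch builds a select URL that asks for the mytext field explicitly. The base URL, the query term and the extra field names are assumptions for illustration only; adjust them to your own installation.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields):
    """Build a Solr select URL that requests only the given fields via fl."""
    params = {
        "q": query,
        # fl takes a comma-separated list of field names.
        "fl": ",".join(fields),
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical base URL and query term; mytext is the field name used above.
url = build_select_url(
    "http://localhost:8983/solr",
    "nutch",
    ["mytext", "title", "url", "score"],
)
print(url)
```

The commas are percent-encoded as %2C in the printed URL, which Solr decodes back into a normal comma-separated field list.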
Finally, if you want only a snippet of the text containing the search terms (not the whole text), look at the highlighting component of Solr/Lucene.
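A minimal sketch of a highlighting request might look like the code below. It uses the standard hl, hl.fl, hl.snippets and hl.fragsize request parameters and assumes the fulltext lives in a stored field called mytext (the highlighter needs the stored text to build snippets). The sample response is made-up illustrative data that only mimics the shape of Solr's JSON output; it is not real output.

```python
from urllib.parse import urlencode


def build_highlight_query(query, field, snippets=1, fragsize=150):
    """Build a query string that asks Solr to highlight matches in one field."""
    return urlencode({
        "q": query,
        "wt": "json",             # ask for a JSON response
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field(s) to build snippets from
        "hl.snippets": snippets,  # maximum snippets per field
        "hl.fragsize": fragsize,  # approximate snippet length in characters
    })


def snippets_by_doc(response, field):
    """Map each document key to its list of highlighted snippets."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id: fields.get(field, [])
        for doc_id, fields in highlighting.items()
    }


# Illustrative response only; the keys mimic Solr's "highlighting" section,
# which is keyed by each document's unique key.
SAMPLE_RESPONSE = {
    "highlighting": {
        "http://example.com/a": {"mytext": ["crawled with <em>nutch</em>"]},
        "http://example.com/b": {},
    }
}

print(build_highlight_query("nutch", "mytext"))
print(snippets_by_doc(SAMPLE_RESPONSE, "mytext"))
```

In a Velocity template you would read the same highlighting section of the response instead of the stored field itself, so each hit shows only the matching fragment.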