A Google-like fragmenter for Solr?
I am implementing a Solr application that had originally used KinoSearch.
I have now moved everything to Solr, including the results page, but I notice a difference in the results. Specifically, the highlighting is not quite the same.
With KinoSearch, there is the KinoSearch::Highlight::Highlighter object, which appears to produce fragments similar to Google's: it tries to break at sentence boundaries and, if a fragment breaks mid-sentence, adds an ellipsis (...) separated by a space.
Does anybody have any suggestions for a way to implement something similar with Solr? I have tried the regex fragmenter to break at sentences, but it seems to apply the regular expression in reverse: fragments start with the period from the previous sentence.
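For context, the regex fragmenter setup described above can be sketched as request parameters. This is only a rough sketch: the `hl.*` names are the standard parameters of Solr's original highlighter, but the field name `content`, the numeric values, and the pattern itself are assumptions chosen for illustration, not the configuration actually used here.

```python
from urllib.parse import urlencode


def build_highlight_params(query, field="content"):
    """Return Solr request parameters that enable the regex fragmenter.

    The field name and all values are illustrative assumptions.
    """
    return {
        "q": query,
        "hl": "true",
        "hl.fl": field,              # field to highlight (assumed name)
        "hl.fragmenter": "regex",    # use the regex fragmenter instead of "gap"
        "hl.fragsize": 100,          # target fragment size in characters
        "hl.regex.slop": 0.5,        # how far fragsize may vary to fit the pattern
        # Assumed pattern: runs of word characters and light punctuation,
        # so fragments tend to stop at sentence-ending punctuation.
        "hl.regex.pattern": r'''[-\w ,/\n"']{20,200}''',
        "hl.snippets": 2,            # maximum fragments per field
    }


params = build_highlight_params("solr highlighting")
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to a select handler URL; the same parameters could instead be set as defaults in the request handler configuration.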
I can add the ellipsis logic in the view code. I'm just wondering if anybody has encountered something similar and how they handled it.
Thanks!
Comments (1)
My question had two parts. The first issue, where the fragmenter seemed not to follow the regular expression and put a period at the start of every fragment, is addressed here:
http://lucene.472066.n3.nabble.com/Basic-sentence-parsing-with-the-regex-highlighter-fragmenter-td505749.html
As for the second issue, the ellipsis, I am going to implement it in the front-end code.
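A minimal sketch of what that front-end ellipsis logic might look like, assuming fragments arrive as strings wrapped in HTML highlight tags such as `<em>`. The function name `add_ellipses` is hypothetical, and the sentence-boundary test is a crude heuristic (a capital first letter means a sentence start, closing punctuation means a sentence end), not what KinoSearch does internally.

```python
import re

# Matches any HTML-style tag, e.g. the <em>...</em> highlight markers.
TAG = re.compile(r"<[^>]+>")


def add_ellipses(fragment):
    """Add "... " or " ..." when a fragment appears to start or end mid-sentence."""
    text = fragment.strip()
    plain = TAG.sub("", text).strip()  # ignore markup when inspecting the edges
    if not plain:
        return text
    if not plain[0].isupper():         # heuristic: no capital, so mid-sentence start
        text = "... " + text
    if plain[-1] not in ".!?":         # heuristic: no closing punctuation
        text = text + " ..."
    return text


print(add_ellipses("the <em>regex</em> fragmenter breaks at sentences."))
print(add_ellipses("<em>Solr</em> supports several"))
```

The heuristic will misjudge fragments that begin with a digit or a mid-sentence proper noun; comparing the fragment against the stored field text would be more reliable if that text is available in the view.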
I will leave this question open as I'm still curious if a better solution exists.