Effect of filters on search results in Solr
When I query for "elegant" in Solr, I get results for "elegance" too.
I use these filters for index analysis:
WhitespaceTokenizerFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SynonymFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
ReversedWildcardFilterFactory
and these for query analysis:
WhitespaceTokenizerFactory
SynonymFilterFactory
StopFilterFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
I want to know which filter is affecting my search results.
Comments (1)
EnglishPorterFilterFactory
That's the short answer ;)
A little more information:
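English Porter refers to the Porter stemming algorithm for English. According to the stemmer (which is a heuristic word-root builder), elegant and elegance have the same stem. Since the stemmer runs in both your index and your query analysis, a query for "elegant" is reduced to that stem and matches documents where "elegance" was indexed under the same stem.
You can verify this online, e.g. here. Basically you will see that "elegant" and "elegance" are both stemmed to the same stem: eleg.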
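To see why both words end up at the same stem, here is a minimal Python sketch of just one rule of the Porter algorithm: a suffix is stripped when the measure m of the remaining stem is greater than 1. This is not the Solr or Snowball code. The function names are made up for illustration, and only four suffixes are handled, so treat it as a rough approximation rather than a full stemmer.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Porter's definition: 'y' is a consonant only at the start or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions: the m in Porter's [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


# Only a small subset of the suffixes the real algorithm knows about.
SUFFIXES = ("ance", "ence", "ant", "ent")


def strip_suffix(word):
    """Strip one of SUFFIXES if the remaining stem has measure > 1."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return stem if measure(stem) > 1 else word
    return word


for w in ("elegant", "elegance"):
    print(w, "->", strip_suffix(w))
```

Both words lose their suffix because the remaining "eleg" has measure 2 (el, eg), so they are indexed and queried as the same term.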
From the Solr source:
This is exactly where the protwords file comes into play:
That's the part that affects the stemming. There you can see the invocation of the Snowball library.
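The idea behind the protwords file can be sketched like this in Python: tokens found in the protected set are passed through untouched, everything else goes to the stemmer. This is only an illustration of the principle, not Solr's code. `load_protected`, `toy_stem` and `analyze` are hypothetical names, and `toy_stem` is a crude stand-in for the real stemmer. The sketch assumes the file has one word per line with `#` comment lines.

```python
def load_protected(text):
    """Parse protwords-style text: one word per line, '#' starts a comment line."""
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line)
    return words


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: knows two suffixes only."""
    for suffix in ("ance", "ant"):
        if token.endswith(suffix) and len(token) > len(suffix) + 3:
            return token[: -len(suffix)]
    return token


def analyze(tokens, protected):
    """Stem every token except those listed as protected."""
    return [t if t in protected else toy_stem(t) for t in tokens]


protwords = """
# words listed here are not stemmed
elegance
"""

protected = load_protected(protwords)
print(analyze(["elegant", "elegance"], protected))  # with protection
print(analyze(["elegant", "elegance"], set()))      # without protection
```

So if you don't want "elegant" to match "elegance", protecting one of them keeps it from being reduced to the shared stem. The protected word has to be treated the same way at index time and at query time, and you need to reindex after changing it.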