Solr indexing, search stemming
I have an index on a set of staff records.
The full-text index is based on the person's name and position.
I can search for a name like "john" without an issue, and searching for part of a name like "anthon" also works.
However, some names won't search correctly: "anthony" returns no results, but "anth" returns all the Anthonys. Likewise, searching for "carly" returns nothing, but "car" does.
Comments (1)
As Maurico commented, stemming is not recommended for person names.
Stemming causes a lot of unexpected results, at least for person names.
Also, it would be worth checking your schema.xml and the analysis applied to the field.
This issue can occur if you are using different analysis at index time and query time.
From http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
From the examples you mentioned, you seem to have a stemmer on the field at index time, but the same stemmer does not seem to be applied during query-time analysis.
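To illustrate why an index-time-only stemmer could produce exactly these symptoms, here is a minimal Python sketch. It is not Solr code and does not use any Solr API. It assumes (this is a guess, not something confirmed in the question) that the field uses a Porter-style stemmer that rewrites a trailing "y" to "i", and that the searches behave like prefix matches against the indexed terms. The names `toy_stem`, `build_index`, and `prefix_search` are hypothetical helpers made up for this example.

```python
def toy_stem(token: str) -> str:
    """Toy stand-in for a stemmer (assumption): only mimics the Porter-style
    rule that turns a trailing 'y' into 'i' when an earlier vowel exists."""
    if token.endswith("y") and any(ch in "aeiou" for ch in token[:-1]):
        return token[:-1] + "i"
    return token


def identity(token: str) -> str:
    """Analyzer that leaves the token unchanged (no stemming)."""
    return token


def build_index(names, analyzer):
    """Map each analyzed token to the set of record ids that contain it."""
    index = {}
    for doc_id, name in enumerate(names):
        for token in name.lower().split():
            index.setdefault(analyzer(token), set()).add(doc_id)
    return index


def prefix_search(index, query, analyzer):
    """Return ids of records with an indexed term starting with the analyzed query."""
    term = analyzer(query.lower())
    hits = set()
    for indexed_term, doc_ids in index.items():
        if indexed_term.startswith(term):
            hits |= doc_ids
    return hits


names = ["Anthony Smith", "Carly Jones", "John Doe"]

# Index time: the stemmer is applied, so "anthony" is stored as "anthoni".
index = build_index(names, toy_stem)

# Query time without the stemmer: the full name no longer matches.
print(prefix_search(index, "anthony", identity))
print(prefix_search(index, "anth", identity))

# Query time with the same stemmer as index time: the full name matches again.
print(prefix_search(index, "anthony", toy_stem))
```

In this sketch the stored term is "anthoni", so the unstemmed query "anthony" finds nothing while "anth" and "anthon" still match as prefixes. Using the same analysis on both sides, or dropping the stemmer from the name field altogether, removes the mismatch.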