Apache Solr: inconsistent query results
I'm new to Apache Solr and trying to make a query using search terms against a field called "normalizedContents" and of type "text".
All of the search terms must exist in the field. Problem is, I'm getting inconsistent results.
For example, the solr index has only one document with normalizedContents field with value = "EDOUARD SERGE WILFRID EDOS0004 UNE MENTION COMPLEMENTAIRE"
I tried these queries in solr's web interface:
- normalizedContents:(edouard AND une) returns the result
- normalizedContents:(edouar* AND une) returns the result
- normalizedContents:(EDOUAR* AND une) doesn't return anything
- normalizedContents:(edouar AND une) doesn't return anything
- normalizedContents:(edouar* AND un) returns the result (although there's no "un" word)
- normalizedContents:(edouar* AND uned) returns the result (although there's no "uned" word)
Here's the declaration of normalizedContents in schema.xml:
<field name="normalizedContents" type="text" indexed="true" stored="true" multiValued="false"/>
So, wildcards and the AND operator do not behave as I expected. What am I doing wrong?
Thanks.
Comments (1)
By default, the field type text applies stemming to the content (solr.SnowballPorterFilterFactory). Thus 'un' and 'uned' match une. You also might not have the solr.LowerCaseFilterFactory filter on both the query and the index analyzer, which is why EDOUAR* does not match. The 4th query doesn't match because edouard is not stemmed to edouar. If you want exact matches, you should copy the data into another field whose type has a more limited set of filters, e.g. only a solr.WhitespaceTokenizerFactory.
Posting the <fieldType name="text"> section from your schema might be helpful to understand everything.
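The "copy the data into another field" suggestion could look roughly like the sketch below. The names text_exact and normalizedContentsExact are hypothetical, not taken from the poster's schema, and the solr.LowerCaseFilterFactory line is an assumption added so that matching is case-insensitive; drop it if you want only the whitespace tokenizer.

```xml
<!-- Hypothetical field type: whitespace tokenization only, no stemming -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: lowercase at index and query time; remove for case-sensitive matching -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical second field that receives a copy of normalizedContents -->
<field name="normalizedContentsExact" type="text_exact" indexed="true" stored="false" multiValued="false"/>
<copyField source="normalizedContents" dest="normalizedContentsExact"/>
```

After reindexing, a query such as normalizedContentsExact:(edouard AND une) would be matched against unstemmed tokens, so 'un' and 'uned' should no longer match. Depending on the Solr version, wildcard terms may bypass the query analyzer, so it is worth testing EDOUAR* separately.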