Apache Solr: inconsistent query results

Posted on 2024-11-17 03:34:17

I'm new to Apache Solr and am trying to query a field called "normalizedContents", of type "text", with a set of search terms.

All of the search terms must exist in the field. The problem is that I'm getting inconsistent results.

For example, the Solr index has only one document, whose normalizedContents field has the value "EDOUARD SERGE WILFRID EDOS0004 UNE MENTION COMPLEMENTAIRE"

I tried these queries in Solr's web interface:

normalizedContents:(edouard AND une) returns the result
normalizedContents:(edouar* AND une) returns the result
normalizedContents:(EDOUAR* AND une) doesn't return anything
normalizedContents:(edouar AND une) doesn't return anything
normalizedContents:(edouar* AND un) returns the result (although there's no "un" word)
normalizedContents:(edouar* AND uned) returns the result (although there's no "uned" word)

Here's the declaration of normalizedContents in schema.xml:

<field name="normalizedContents" type="text" indexed="true" stored="true" multiValued="false"/>

So wildcards and the AND operator do not behave as I expected. What am I doing wrong?

Thanks.


表情可笑 2024-11-24 03:34:17

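By default, the field type text applies stemming to the content (solr.SnowballPorterFilterFactory). That is why 'un' and 'uned' match une. You also probably do not have a solr.LowerCaseFilterFactory filter on both the query and index analyzers, which is why EDOUAR* does not match. The fourth query does not match because the indexed term is edouard, not edouar. If you want exact matches, you should copy the data into another field whose type has a more limited set of filters, for example only a solr.WhitespaceTokenizerFactory.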
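To illustrate that last suggestion, here is a minimal sketch of what such an exact-match field could look like in schema.xml. The names text_exact and normalizedContentsExact are made up for this example, and the placement comments assume a schema.xml that has separate types and fields sections, so adapt it to your own schema.

```xml
<!-- Hypothetical field type: whitespace tokenization only, no stemming, no lowercasing. -->
<!-- Assumed to live next to the other fieldType declarations. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field, declared next to normalizedContents. -->
<field name="normalizedContentsExact" type="text_exact" indexed="true" stored="false" multiValued="false"/>

<!-- Copies the raw value of normalizedContents into the exact-match field at index time. -->
<copyField source="normalizedContents" dest="normalizedContentsExact"/>
```

Since copyField is applied at index time, existing documents would need to be reindexed. With only a whitespace tokenizer, matching is case-sensitive, so a query against this field would have to use the casing of the indexed text, for example normalizedContentsExact:(EDOUAR* AND UNE), and un or uned should no longer match.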

Posting the relevant part of your schema might help make sense of all this.
