Why is this Lucene query a "contains" instead of a "startsWith"?
string q = "a";
Query query = new QueryParser("company", new StandardAnalyzer()).Parse(q + "*");
results in a PrefixQuery: company:a*
Still, I get results like "Fleet Africa", where the A is obviously not at the start of the value, so these are results I do not want.
Query query = new TermQuery(new Term("company", q + "*"));
results in a TermQuery: company:a* and returns nothing, probably because it treats the query as an exact match and none of my values is the literal "a*".
Query query = new WildcardQuery(new Term("company", q + "*"));
returns the same results as the PrefixQuery.
What am I doing wrong?
Comments (3)
StandardAnalyzer will tokenize "Fleet Africa" into "fleet" and "africa". Your a* search will match the latter term.
If you want to treat "Fleet Africa" as a single term, use an analyzer that does not split the string on whitespace. KeywordAnalyzer is one example, but you will probably still want to lowercase your data so that queries are case-insensitive.
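To make the difference concrete, here is a small model in plain C# with no Lucene.Net dependency. It is only a sketch: `PrefixDemo`, `TokenizedTerms` and `SingleTerm` are hypothetical names, and the whitespace split is a rough stand-in for what StandardAnalyzer really does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model, NOT Lucene.Net: it only shows which terms a prefix query gets to see.
public static class PrefixDemo
{
    // Rough stand-in for StandardAnalyzer: split on whitespace and lowercase each token.
    public static List<string> TokenizedTerms(string value) =>
        value.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
             .Select(t => t.ToLowerInvariant())
             .ToList();

    // Rough stand-in for KeywordAnalyzer plus lowercasing: the whole value is one term.
    public static List<string> SingleTerm(string value) =>
        new List<string> { value.ToLowerInvariant() };

    // A prefix query matches a document if ANY of its indexed terms starts with the prefix.
    public static bool PrefixMatches(IEnumerable<string> terms, string prefix) =>
        terms.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));

    public static void Main()
    {
        // Tokenized: terms are "fleet" and "africa", so "a" matches the second term.
        Console.WriteLine(PrefixMatches(TokenizedTerms("Fleet Africa"), "a")); // True
        // Single term "fleet africa": "a" no longer matches, "f" does.
        Console.WriteLine(PrefixMatches(SingleTerm("Fleet Africa"), "a"));     // False
        Console.WriteLine(PrefixMatches(SingleTerm("Fleet Africa"), "f"));     // True
    }
}
```

The point is that a prefix is tested against each indexed term, not against the original field value, so the choice of analyzer decides what "starts with" means.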
The short answer: none of your queries constrains the search to the start of the field.
You need an EdgeNGramTokenFilter or something like it.
See this question for an implementation of autocomplete in Lucene.
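For readers unfamiliar with edge n-grams, the sketch below shows in plain C# what such a filter conceptually emits for one token. It is an illustration, not the Lucene filter itself: `EdgeNGramDemo` and `EdgeNGrams` are hypothetical names, and it assumes the field value reaches the filter as a single lowercased token (if the value were split on whitespace first, "africa" would yield its own n-grams and "a" would match again).

```csharp
using System;
using System.Collections.Generic;

// Conceptual model of front edge n-grams; not the Lucene.Net EdgeNGramTokenFilter.
public static class EdgeNGramDemo
{
    // Returns the leading substrings of term with lengths minGram..maxGram (capped at the term length).
    public static List<string> EdgeNGrams(string term, int minGram, int maxGram)
    {
        if (minGram < 1) throw new ArgumentOutOfRangeException(nameof(minGram));
        var grams = new List<string>();
        int upper = Math.Min(maxGram, term.Length);
        for (int len = minGram; len <= upper; len++)
        {
            grams.Add(term.Substring(0, len));
        }
        return grams;
    }

    public static void Main()
    {
        // Assumption: the whole value was kept as one lowercased token before n-gramming.
        List<string> grams = EdgeNGrams("fleet africa", 1, 5);
        Console.WriteLine(string.Join(", ", grams)); // f, fl, fle, flee, fleet

        // With these n-grams indexed, what the user typed is looked up as an exact term.
        Console.WriteLine(grams.Contains("fl")); // True
        Console.WriteLine(grams.Contains("a"));  // False
    }
}
```

Because the prefixes are stored at index time, the lookup at query time becomes an exact term match instead of a prefix or wildcard scan, at the cost of a larger index.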
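Another solution could be to store the data (for example "Fleet Africa") in a StringField
and then use a WildcardQuery. Now F* would give results, but A* or a* won't. For f* to match as well, lowercase the value before indexing it and lowercase the query term, since a WildcardQuery built directly from a Term is not analyzed.
StringField is indexed but not tokenized.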
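A minimal sketch of the StringField approach follows. It assumes Lucene.Net 4.8 (the `Lucene.Net` and `Lucene.Net.Analysis.Common` packages), which is newer than the API used in the question. `StringFieldPrefixDemo`, `CountPrefixHits` and the `companyDisplay` field are hypothetical names chosen for illustration. The value is lowercased by hand because a StringField is not passed through an analyzer.

```csharp
using System;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Sketch only; assumes Lucene.Net 4.8.
public static class StringFieldPrefixDemo
{
    // Indexes one company name as a single untokenized term and counts prefix hits.
    public static int CountPrefixHits(string companyName, string typedPrefix)
    {
        using (var dir = new RAMDirectory())
        {
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, new KeywordAnalyzer());
            using (var writer = new IndexWriter(dir, config))
            {
                var doc = new Document();
                // Lowercased copy for searching: indexed as ONE term, not tokenized.
                doc.Add(new StringField("company", companyName.ToLowerInvariant(), Field.Store.NO));
                // Original casing kept in a stored-only field for display.
                doc.Add(new StoredField("companyDisplay", companyName));
                writer.AddDocument(doc);
            }

            using (var reader = DirectoryReader.Open(dir))
            {
                var searcher = new IndexSearcher(reader);
                // Lowercase the user input too, since this query bypasses analysis.
                var query = new PrefixQuery(new Term("company", typedPrefix.ToLowerInvariant()));
                return searcher.Search(query, 10).TotalHits;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountPrefixHits("Fleet Africa", "F")); // expected: 1
        Console.WriteLine(CountPrefixHits("Fleet Africa", "a")); // expected: 0
    }
}
```

A PrefixQuery is used here instead of a WildcardQuery with a trailing `*`; for a pure "starts with" search the two are equivalent, and the prefix form avoids having to escape wildcard characters in user input.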