Lucene scoring: TermQuery with & without term vectors
Does TermQuery.ExtractTerms return a higher count when term vectors/positions/offsets are turned on (assuming there is more than one occurrence of a match)? Conversely, with the inverted file info turned off, does ExtractTerms always return one and only one term?
EDIT: How and where does turning on termvectors in the schema affect scoring?
Comments (1)
TermQuery.ExtractTerms extracts the terms in the query, not the result. So a search for "foo:bar" will always return exactly one term, regardless of what's in the index. It sounds to me like you want to know about highlighting, not Query.ExtractTerms.
EDIT: Based on your comment, it sounds like you are asking: "How is scoring affected by term vectors?" The answer to that is: not at all. The term frequency, norm, etc. are calculated at index time, so it doesn't matter what you store.
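To make both points concrete, here is a minimal toy model in Python. It is not Lucene code: Term, TermQuery, ToyIndex, and the raw-frequency score are hypothetical stand-ins invented for illustration. It only sketches the idea that term extraction looks at the query alone, and that a term-frequency score reads the postings rather than any stored term vector.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str
    text: str


@dataclass(frozen=True)
class TermQuery:
    term: Term

    def extract_terms(self):
        # Looks only at the query itself; no index is consulted.
        return {self.term}


class ToyIndex:
    """Toy inverted index (hypothetical, NOT Lucene's implementation)."""

    def __init__(self, store_term_vectors):
        self.store_term_vectors = store_term_vectors
        self.postings = {}      # Term -> {doc_id: term frequency}
        self.term_vectors = {}  # doc_id -> Counter, filled only if enabled

    def add(self, doc_id, field, text):
        counts = Counter(text.lower().split())
        for word, freq in counts.items():
            self.postings.setdefault(Term(field, word), {})[doc_id] = freq
        if self.store_term_vectors:
            self.term_vectors[doc_id] = counts

    def score(self, query, doc_id):
        # Simplified score: raw term frequency taken from the postings.
        # The stored term vectors are never read here.
        return self.postings.get(query.term, {}).get(doc_id, 0)


query = TermQuery(Term("foo", "bar"))
with_tv = ToyIndex(store_term_vectors=True)
without_tv = ToyIndex(store_term_vectors=False)
for index in (with_tv, without_tv):
    index.add(1, "foo", "bar baz bar bar")

print(len(query.extract_terms()))
print(with_tv.score(query, 1), without_tv.score(query, 1))
```

In this sketch the extracted term set has one element no matter how often "bar" occurs in the document, and the two indexes give the same score because the term-vector flag only controls what is stored alongside the postings.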
The major exception is PhraseQuery with slop, which uses the term positions. A minor exception is that custom scoring classes can use whatever data they want, so not only term vectors but also payloads etc. can potentially affect the score.
If you're just doing TermQuery searches though, what you store should have no effect.
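As a rough illustration of why a sloppy phrase match needs positions while a single-term match does not, the following Python sketch checks a two-term phrase against recorded token positions. The helper names positions and sloppy_match are hypothetical, and the rule is deliberately simplified: it accepts only in-order matches with at most slop tokens in between, whereas Lucene's real sloppy matching also allows reordered terms.

```python
def positions(text):
    """Map each token to the list of positions where it occurs."""
    result = {}
    for pos, token in enumerate(text.lower().split()):
        result.setdefault(token, []).append(pos)
    return result


def sloppy_match(doc_positions, first, second, slop):
    """True if `second` follows `first` with at most `slop` tokens between them."""
    for p1 in doc_positions.get(first, []):
        for p2 in doc_positions.get(second, []):
            if 0 <= p2 - p1 - 1 <= slop:
                return True
    return False


doc = positions("quick brown fox")
print(sloppy_match(doc, "quick", "fox", 0))
print(sloppy_match(doc, "quick", "fox", 1))
```

Without the per-token positions there would be no way to tell how far apart "quick" and "fox" are, which is why this kind of query depends on positional data and a plain term lookup does not.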