How does Lucene calculate multi-field scores?
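Here's the Lucene scoring equation:
$$\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)$$
What about multi-field scoring?
Does the score get directly summed, or averaged, or something else?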
Comments (4)
You can read the details of scoring in the documentation of the Similarity class. The parameters in this equation are described in terms of the document, but they actually apply to a field. So the term frequency is the frequency of the term in the given field of the document. This automatically takes care of queries on multiple fields.
KenE's answer above is incorrect. (There is no MAX operator in the equation.) The scores of the individual field queries add up to the final score. For the query (name:bill OR gender:male), the result is the sum of the scores for (name:bill) and (gender:male). Typically, documents that satisfy both criteria score higher (because of the sum) and rank nearer the top.
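To make the per-field reading concrete, the sum for the example query can be written out with each term tied to its own field. This is only a sketch based on the equation quoted in the question. The notation d.name and d.gender for the two fields of the document is introduced here for illustration, and all boosts are assumed to be 1.

```latex
\begin{aligned}
\mathrm{score}(q,d) ={}& \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot \Big(
  \mathrm{tf}(\text{bill in } d.\text{name}) \cdot \mathrm{idf}(\text{name:bill})^2 \cdot \mathrm{norm}(\text{name}, d) \\
 &+ \mathrm{tf}(\text{male in } d.\text{gender}) \cdot \mathrm{idf}(\text{gender:male})^2 \cdot \mathrm{norm}(\text{gender}, d) \Big)
\end{aligned}
```

Each summand uses the statistics of its own field, so a document that matches in both fields collects both contributions.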
It depends on the operation. If you are doing an OR as in (name:bill OR gender:male), it takes the max of the two. If you are doing an AND, it will do a sum.
Shashikant Kore is correct to say that the scores for each field are summed. This, however, is only true before the contribution of the queryNorm and coord factors, meaning the final scores are unlikely to add up.

Each score is multiplied by the queryNorm factor, which is calculated per query and hence differs for each of (name:bill), (gender:male), and (name:bill OR gender:male). Nor is the queryNorm for the combined query merely the sum of the queryNorms for the two single-term queries. So the scores only sum if you divide each score by the queryNorm factor for that query.

The coord factor may also play a part: the default scorer multiplies the score by the proportion of query terms that were matched. So you can only rely on summation after accounting for queryNorm, and only where all terms match (or coord is disabled).

You can see exactly how a score is calculated using the explain functionality, available in Solr through the debugQuery=true parameter.
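The effect of queryNorm and coord on the summation can be checked with a small toy model. The sketch below is not Lucene code. It assumes the classic TF-IDF definitions (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1)), queryNorm as 1 over the square root of the sum of squared idf values, and coord as matched clauses over total clauses) with all boosts equal to 1. The function names, document frequencies, and field norms are made up for illustration.

```python
import math

def idf(doc_freq, num_docs):
    # Assumed classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def query_norm(term_idfs):
    # Assumed: 1 / sqrt(sum of squared term weights), all boosts = 1
    return 1.0 / math.sqrt(sum(w * w for w in term_idfs))

def raw_term_score(freq, term_idf, field_norm):
    # tf * idf^2 * norm, with tf = sqrt(freq)
    return math.sqrt(freq) * term_idf ** 2 * field_norm

def score(query, doc):
    """query maps a term to its idf; doc maps a term to (freq, field_norm)."""
    matched = [t for t in query if t in doc]
    if not matched:
        return 0.0
    coord = len(matched) / len(query)  # fraction of query clauses matched
    raw = sum(raw_term_score(doc[t][0], query[t], doc[t][1]) for t in matched)
    return coord * query_norm(query.values()) * raw

# Hypothetical index statistics: 1000 docs, "bill" is rare, "male" is common.
NUM_DOCS = 1000
idfs = {"name:bill": idf(9, NUM_DOCS), "gender:male": idf(499, NUM_DOCS)}
doc_both = {"name:bill": (1, 1.0), "gender:male": (1, 1.0)}
doc_name_only = {"name:bill": (1, 1.0)}

s_name = score({"name:bill": idfs["name:bill"]}, doc_both)
s_gender = score({"gender:male": idfs["gender:male"]}, doc_both)
s_both = score(idfs, doc_both)

# Final scores do not add up...
print("sum of single-field scores:", s_name + s_gender)
print("combined query score:      ", s_both)

# ...but they do once each is divided by its own queryNorm.
qn_name = query_norm([idfs["name:bill"]])
qn_gender = query_norm([idfs["gender:male"]])
qn_both = query_norm(idfs.values())
print("denormalized sum:     ", s_name / qn_name + s_gender / qn_gender)
print("denormalized combined:", s_both / qn_both)

# A partial match is additionally scaled down by coord (1 of 2 clauses here).
print("partial match score:", score(idfs, doc_name_only))
```

In this model the combined score differs from the plain sum of the single-field scores, while the denormalized values agree, which is the behaviour the explanation above describes.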
Using Lucene's default similarity score, I ran a boolean query and got the final formula as follows: (sorry, it is in LaTeX)