Uniform boosting across fields of different lengths
I've got a text field that can potentially have multiple values.
doc 1:
field a:"X Y"
doc 2:
field a:"X"
I want to be able to run:
a:X^5
And have both doc 1 and 2 get an identical score.
I've been messing around with all the field options, but I always end up with doc 2 getting double the score of doc 1.
I've tried setting multiValued="true", but get the same result.
Is there some way to set up my search or the field definition so that the boost depends only on the presence of the search term and is not affected by the rest of the field's contents?
Comments (2)
Disable norms by setting omitNorms=true on the field in your schema, then reindex. This should turn off length normalization for the field and give you the desired results. For more details on what omitNorms does, see this.
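As a rough illustration of the omitNorms setting mentioned above, a field definition in schema.xml might look like the following. The field name a comes from the question; the type name text and the indexed/stored values are assumptions, so adapt them to your own schema. The field still has to be reindexed after the change.

```xml
<!-- Hypothetical field definition: type="text" is an assumed field type name -->
<field name="a" type="text" indexed="true" stored="true" omitNorms="true"/>
```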
Field a of doc 2 has only one term, whereas in doc 1 it has two.
Solr's DefaultSimilarity implementation takes the length norm, which is based on the number of terms in the field, into account when calculating the score.
lengthNorm is
1.0 / Math.sqrt(numTerms)
lengthNorm makes shorter fields score higher.
You can provide your own implementation of the Similarity class that does not take lengthNorm into account. Check the computeNorm method implementation.
You can also turn off norms using omitNorms=true.
Norms enable index-time boosts and field length normalization: they let you add boosts to fields at index time and make shorter fields score higher.
So you would lose both of these if you omit norms.
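To make the length norm formula quoted above concrete, here is a minimal Java sketch that evaluates 1.0 / Math.sqrt(numTerms) for the two documents in the question. The class and method names (LengthNormDemo, lengthNorm) are made up for illustration and are not part of Solr or Lucene. The sketch also ignores the lossy encoding Lucene applies when it stores norms, as well as the other factors in the final score, so the ratio it prints will not necessarily match the ratio seen in real query results.

```java
// Sketch of the length normalization factor quoted above: 1.0 / Math.sqrt(numTerms).
// LengthNormDemo and lengthNorm are hypothetical names, not Solr/Lucene APIs.
class LengthNormDemo {
    // numTerms is the number of indexed terms in the field (assumed >= 1).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        double doc1 = lengthNorm(2); // field a: "X Y" has two terms
        double doc2 = lengthNorm(1); // field a: "X" has one term
        System.out.printf("doc 1 norm: %.4f%n", doc1);
        System.out.printf("doc 2 norm: %.4f%n", doc2);
        System.out.printf("doc 2 / doc 1: %.4f%n", doc2 / doc1);
    }
}
```

With norms omitted, this factor no longer differs between the two fields, which is why a match on X alone can then score the same in both documents.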