Applying weights to KNN dimensions
When doing KNN searches in ES/OS, it seems to be recommended to normalize the data in the knn vectors to prevent a single dimension from overpowering the final score.
In my current example I have a 3-dimensional vector where all values are normalized to between 0 and 1:
[0.2, 0.3, 0.2]
From the perspective of Euclidean-distance-based scoring, this seems to give equal weight to all dimensions.
In my particular example I am using the l2 space type:
"method": {
  "name": "hnsw",
  "space_type": "l2",
  "engine": "nmslib"
}
However, if I want to give more weight to one of my dimensions (say by a factor of 2), would it be acceptable to single out that dimension and normalize it to the range 0-2 instead of the base range of 0-1?
Example:
[0.2, 0.3, 1.2] // Third dimension is now between 0-2
The distance computation for this term would now be (2 * (xi - yi))^2, which leads to bigger diffs compared to the other dimensions. As a result, the overall score would be more sensitive to differences in this particular term.
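To make the effect of that rescaling concrete, here is a minimal sketch in plain Python. It is not the engine's own implementation, and the helper names `squared_l2` and `scale_dimension` and the sample vectors are made up for illustration. It compares the squared Euclidean distance before and after doubling the third dimension of both the document and query vectors.

```python
def squared_l2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def scale_dimension(vec, index, factor):
    """Return a copy of vec with one dimension multiplied by factor (hypothetical helper)."""
    out = list(vec)
    out[index] *= factor
    return out


# Illustrative vectors that differ only in the third dimension.
doc = [0.2, 0.3, 0.6]
query = [0.2, 0.3, 0.2]

base = squared_l2(doc, query)
scaled = squared_l2(
    scale_dimension(doc, 2, 2.0),
    scale_dimension(query, 2, 2.0),
)

# The third dimension's contribution grows by factor**2, not by factor.
print(base, scaled, scaled / base)
```

Since (2 * (xi - yi))^2 = 4 * (xi - yi)^2, normalizing a dimension to 0-2 gives it roughly four times the weight in the squared distance, not two. The same scaling has to be applied to both indexed vectors and query vectors.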
In OS the score is calculated as 1 / (1 + Distance Function),
so the higher the value returned from the distance function, the lower the score will be.
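As a small illustration of that formula, the sketch below evaluates 1 / (1 + distance) for a few distances. The function name `knn_score` is hypothetical, and it assumes the distance passed in is whatever the `l2` space type returns (the squared L2 distance, as far as I understand the OpenSearch documentation).

```python
def knn_score(distance):
    """Score as described in the post: 1 / (1 + distance). Hypothetical helper."""
    return 1.0 / (1.0 + distance)


# A distance of 0 (identical vectors) gives the maximum score of 1.0;
# larger distances push the score toward 0.
for d in (0.0, 0.16, 0.64, 4.0):
    print(d, knn_score(d))
```

Because the score is a decreasing function of the distance, inflating one dimension's contribution to the distance lowers the score more quickly for documents that differ in that dimension.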
Is there a method for deciding what the weighting range should be? Would setting the range too high make that dimension too dominant?
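One way to reason about the range, offered as a sketch and not an established rule: if a dimension should count w times as much in the squared distance, scale it by sqrt(w), because the scale factor gets squared. The names `apply_weights` and `weighted_squared_l2` below are illustrative and not part of any ES/OS API.

```python
import math


def squared_l2(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def weighted_squared_l2(x, y, weights):
    """Squared distance where each dimension's term is multiplied by its weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


def apply_weights(vec, weights):
    """Pre-scale each dimension by sqrt(weight) so plain l2 behaves as weighted l2."""
    return [v * math.sqrt(w) for v, w in zip(vec, weights)]


weights = [1.0, 1.0, 2.0]  # third dimension should count twice as much
doc = [0.2, 0.3, 0.6]
query = [0.2, 0.3, 0.2]

direct = weighted_squared_l2(doc, query, weights)
prescaled = squared_l2(apply_weights(doc, weights), apply_weights(query, weights))
print(direct, prescaled)  # the two values agree up to floating-point error
```

Under this view, a weight of 2 corresponds to a range of about 0-1.41, not 0-2. How large a weight is still reasonable depends on the data, so it is probably best tuned against relevance judgments for your own queries.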