Applying weights to KNN dimensions

Posted on 2025-01-27 07:25:03

When doing k-NN searches in Elasticsearch/OpenSearch (ES/OS), it seems to be recommended to normalize the data in the k-NN vectors to prevent any single dimension from overpowering the final score.

In my current example, I have a 3-dimensional vector where all values are normalized to the range 0 to 1:

[0.2, 0.3, 0.2]
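A minimal sketch of the per-dimension min-max normalization described above. The function name min_max_normalize and the raw sample values are hypothetical; it assumes the full set of raw values is available so each dimension's minimum and maximum can be computed up front.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` (equal-length lists) to the range [0, 1].

    Returns the normalized rows plus the per-dimension mins and maxs, which
    are needed later to scale query vectors the same way.
    """
    dims = len(rows[0])
    mins = [min(r[d] for r in rows) for d in range(dims)]
    maxs = [max(r[d] for r in rows) for d in range(dims)]
    normalized = []
    for r in rows:
        out = []
        for d in range(dims):
            span = maxs[d] - mins[d]
            # Constant column: map every value to 0.0 to avoid division by zero.
            out.append((r[d] - mins[d]) / span if span else 0.0)
        normalized.append(out)
    return normalized, mins, maxs


# Hypothetical raw features on very different scales.
raw = [[10.0, 200.0, 3.0], [20.0, 400.0, 5.0], [30.0, 300.0, 4.0]]
normalized, mins, maxs = min_max_normalize(raw)
print(normalized)  # → [[0.0, 0.0, 0.0], [0.5, 1.0, 1.0], [1.0, 0.5, 0.5]]
```

Query vectors would need to be scaled with the same mins and maxs; otherwise documents and queries sit on different scales and their distances are not comparable.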

From the perspective of Euclidean distance-based scoring, this seems to give equal weight to all dimensions.

In my particular example, the vector field uses the l2 space type:

"method": {
  "name": "hnsw",
  "space_type": "l2",
  "engine": "nmslib"
}
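For context, the method block above sits inside a knn_vector field mapping. A sketch of a complete index body as a Python dict, assuming a hypothetical field name my_vector and a dimension of 3 to match the example vectors; the key names should be checked against the k-NN documentation for the OpenSearch version in use.

```python
import json

# Sketch of an OpenSearch k-NN index body using the method from the question.
# Assumption: "my_vector" is a hypothetical field name; "dimension": 3 matches
# the 3-dimensional example vectors.
index_body = {
    "settings": {"index": {"knn": True}},  # enable k-NN search for this index
    "mappings": {
        "properties": {
            "my_vector": {
                "type": "knn_vector",
                "dimension": 3,
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "nmslib",
                },
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The printed JSON is what would be sent as the body of the create-index request (PUT /<index-name>).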

However, if I want to give more weight to one of my dimensions (say, by a factor of 2), would it be acceptable to single out that dimension and normalize it to the range 0-2 instead of the base range of 0-1?

Example:

[0.2, 0.3, 1.2] // The third dimension is now between 0 and 2
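The reweighting idea can be sketched as a transform applied before indexing. The names WEIGHTS and apply_weights are hypothetical. One point worth making explicit: the query vector has to be scaled by the same weights, otherwise its third component stays in 0-1 while the indexed values span 0-2.

```python
# Hypothetical per-dimension multipliers: [1.0, 1.0, 2.0] doubles the
# range of the third dimension, turning 0-1 into 0-2.
WEIGHTS = [1.0, 1.0, 2.0]


def apply_weights(vector, weights=WEIGHTS):
    """Return a copy of `vector` with each component multiplied by its weight."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    return [v * w for v, w in zip(vector, weights)]


doc = apply_weights([0.2, 0.3, 0.6])     # value stored in the index
query = apply_weights([0.25, 0.3, 0.4])  # scaled the same way at query time
print(doc, query)
```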

The distance term for this dimension would now be (2 * (xi - yi))^2 = 4 * (xi - yi)^2, which leads to bigger differences than for the other dimensions. As a result, the overall score would be more sensitive to differences in this particular dimension.
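A quick numerical check of the term above, as a sketch with made-up vectors: plain squared L2 on the pre-scaled vectors equals a weighted squared L2 on the original vectors, where each term is multiplied by the square of its weight (w_i^2), not by w_i.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sum of (a_i - b_i)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_squared_l2(a, b, weights):
    """Sum of (w_i * (a_i - b_i))^2, i.e. each term scaled by w_i^2."""
    return sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights))


weights = [1.0, 1.0, 2.0]  # hypothetical: double the third dimension
a = [0.2, 0.3, 0.6]        # made-up normalized document vector
b = [0.25, 0.3, 0.4]       # made-up normalized query vector
scaled_a = [w * x for x, w in zip(a, weights)]
scaled_b = [w * x for x, w in zip(b, weights)]

# Plain L2 on pre-scaled vectors matches weighted L2 on the originals.
print(math.isclose(squared_l2(scaled_a, scaled_b), weighted_squared_l2(a, b, weights)))
# Ratio of the third term's scaled contribution to its unscaled contribution.
print(weighted_squared_l2([0.6], [0.4], [2.0]) / squared_l2([0.6], [0.4]))
```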

In OpenSearch, the l2 score is calculated as 1 / (1 + distance), so the larger the value returned by the distance function, the lower the score.
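The score mapping can be sketched as follows. It assumes, as the OpenSearch k-NN documentation describes for the l2 space, that the distance is the squared Euclidean distance; the helper name l2_score and the vectors are hypothetical.

```python
def l2_score(a, b):
    """Score = 1 / (1 + d), with d the squared Euclidean distance (assumed for l2)."""
    d = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d)


query = [0.2, 0.3, 0.4]
identical = [0.2, 0.3, 0.4]
farther = [0.2, 0.3, 0.9]

print(l2_score(query, identical))  # → 1.0 (zero distance gives the maximum score)
print(l2_score(query, farther))    # lower: only the third component differs
```

Because the score is a monotonically decreasing function of the distance, reweighting a dimension changes rankings only through its effect on the distance.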

Is there a principled way to decide what the weighting range should be? Setting the range too high would likely make that dimension too dominant, right?
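One way to make "too dominant" measurable, offered as a heuristic rather than an established method: compute what fraction of the total squared L2 distance each dimension contributes across pairs of the normalized, weighted vectors, then pick weights that give the emphasized dimension the share you want. The function distance_share and the toy data are hypothetical.

```python
from itertools import combinations


def distance_share(vectors, weights):
    """Each dimension's fraction of the total squared L2 distance,
    summed over all pairs of vectors after applying per-dimension weights."""
    totals = [0.0] * len(weights)
    for a, b in combinations(vectors, 2):
        for i, (x, y, w) in enumerate(zip(a, b, weights)):
            totals[i] += (w * (x - y)) ** 2
    grand_total = sum(totals)
    if grand_total == 0.0:
        return totals  # all vectors identical: no distance to attribute
    return [t / grand_total for t in totals]


# Hypothetical normalized vectors (values in 0-1).
vectors = [[0.2, 0.3, 0.2], [0.8, 0.1, 0.5], [0.4, 0.9, 0.7], [0.6, 0.5, 0.1]]
for weights in ([1.0, 1.0, 1.0], [1.0, 1.0, 2.0]):
    shares = distance_share(vectors, weights)
    print(weights, [round(s, 3) for s in shares])
```

When every dimension has the same spread, weights [1, 1, 2] give the third dimension 4/6 (about two thirds) of the distance rather than one half, which is the w^2 effect at work. Comparing all pairs is O(n^2), so for a real corpus a random sample of pairs would be used instead.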
