Applying weights to KNN dimensions

Posted on 2025-01-27 07:25:03 · 688 characters · 2 views · 0 comments

When doing KNN searches in ES/OS, it seems to be recommended to normalize the data in the knn vectors to prevent any single dimension from overpowering the final score.

In my current example I have a 3-dimensional vector where all values are normalized to the range 0 to 1:

[0.2, 0.3, 0.2]

From the perspective of Euclidean-distance-based scoring, this seems to give equal weight to all dimensions.

In my particular example I am using the l2 space type:

"method": {
  "name": "hnsw",
  "space_type": "l2",
  "engine": "nmslib"
}

However, if I want to give more weight to one of my dimensions (say, by a factor of 2), would it be acceptable to single out that dimension and normalize it to the range 0-2 instead of the base range of 0-1?

Example:

[0.2, 0.3, 1.2] // Third dimension is now between 0-2

The distance contribution of this term would now be (2 * (xi - yi))^2 = 4 * (xi - yi)^2, which produces larger differences than the other terms. As a result, the overall score would be more sensitive to differences in this particular dimension.
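To make the effect concrete, here is a minimal sketch in plain Python. It assumes the l2 space compares squared Euclidean distance; the helper `squared_l2` and the sample vectors are made up for illustration and are not part of any ES/OS API. It shows that scaling one dimension by w multiplies that dimension's squared contribution by w^2, so scaling by 2 quadruples it, while scaling by sqrt(2) doubles it.

```python
import math

def squared_l2(x, y, weights=None):
    """Squared Euclidean distance after scaling each dimension by its weight.

    Hypothetical helper for illustration only; scaling the stored and query
    vectors before indexing has the same effect on the distance.
    """
    if weights is None:
        weights = [1.0] * len(x)
    return sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, y))

query = [0.2, 0.3, 0.2]
doc = [0.2, 0.3, 0.5]  # differs from the query only in the third dimension

plain = squared_l2(query, doc)
scaled_by_2 = squared_l2(query, doc, [1.0, 1.0, 2.0])
scaled_by_sqrt2 = squared_l2(query, doc, [1.0, 1.0, math.sqrt(2)])

print(plain, scaled_by_2, scaled_by_sqrt2)
```

Under that assumption, a range of 0-2 gives the third dimension four times the influence on the squared distance, not two.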

In OS the score is calculated as 1 / (1 + Distance Function), so the higher the value returned by the distance function, the lower the score will be.
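The score formula can be sketched the same way. The function name `l2_score` and the distance values below are illustrative assumptions; only the 1 / (1 + distance) form comes from the description above.

```python
def l2_score(distance):
    """Convert a non-negative distance into a score in (0, 1]."""
    return 1.0 / (1.0 + distance)

# Illustrative distances: the same document before and after one
# dimension's difference is scaled up.
unweighted_distance = 0.09
weighted_distance = 0.36

unweighted_score = l2_score(unweighted_distance)
weighted_score = l2_score(weighted_distance)

print(unweighted_score, weighted_score)
```

A larger distance always maps to a lower score, so amplifying one dimension pushes documents that differ in that dimension further down the ranking.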

Is there a principled way to decide what the weighting range should be? Presumably, setting the range too high would make that dimension too dominant?
