An appropriate similarity index for vectors
I'm trying to adjust cosine similarity to determine how similar two vectors are with respect to their entries. Since the measure is invariant under scaling of the vectors (for example, (0, 1, 2) and (0, 2, 4) have a cosine similarity of 1), how could the similarity measure be extended to account for the original vector scale? I thought of multiplying by min{|v1|, |v2|}/max{|v1|, |v2|}, where |v| denotes the norm of a vector v, to preserve the bounds of -1 and 1. Any suggestions are highly appreciated.
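To make the proposal concrete, here is a minimal sketch of it in plain Python. The function names are made up for illustration, the Euclidean norm is assumed for |v|, and zero vectors are not handled because the cosine is undefined for them.

```python
import math

def norm(v):
    # Euclidean norm |v| (assumed here; the question does not name a specific norm)
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(v1, v2):
    # Cosine of the angle between v1 and v2; undefined if either vector is zero
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (norm(v1) * norm(v2))

def scaled_cosine_similarity(v1, v2):
    # Cosine similarity multiplied by min(|v1|, |v2|) / max(|v1|, |v2|).
    # The ratio lies in (0, 1], so the result stays within [-1, 1].
    n1, n2 = norm(v1), norm(v2)
    return cosine_similarity(v1, v2) * min(n1, n2) / max(n1, n2)
```

With (0, 1, 2) and (0, 2, 4) the plain cosine similarity is 1 (up to floating-point rounding), while the scaled version gives about 0.5, because one vector is twice as long as the other.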
Comments (1)
Well, cosine similarity is based on the angle between the two vectors (which doesn't depend on the length of the vectors). If you need something that takes the length of the vectors into account, then you need to think about how vector length influences similarity in your context.
Also note that you can always post-process a similarity or distance measure if you need to stay within certain bounds (like [-1, 1]). A popular function for such transforms is the arctan. For example, instead of extending the cosine similarity, you could try the Euclidean distance with an appropriate transformation.
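One possible transformation, given only as a sketch, is 1 - (4/π)·arctan(d), where d is the Euclidean distance between the vectors. This particular formula and the function name below are assumptions chosen for illustration, not necessarily the transformation intended above; it maps d = 0 to 1 and approaches -1 as d grows.

```python
import math

def euclidean_arctan_similarity(v1, v2):
    # Euclidean distance between the two vectors (math.dist needs Python 3.8+)
    d = math.dist(v1, v2)
    # arctan maps [0, inf) to [0, pi/2); rescale so that d = 0 gives 1
    # and large distances approach -1 (assumed mapping, see the text above)
    return 1.0 - (4.0 / math.pi) * math.atan(d)
```

Unlike cosine similarity, this measure does react to scale: (0, 1, 2) and (0, 2, 4) no longer score 1, because their distance is nonzero. How fast the score should fall off with distance still depends on the context, so d may need rescaling before the transform.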
But as I said, the "correct" formula depends on your context.