Cosine similarity code (non-term vectors)

Posted on 2024-12-06 21:26:57

I am trying to find the cosine similarity between 2 vectors (x,y Points), and I am making some silly error that I cannot nail down. Pardon me, I am a newbie, and sorry if I am making a very simple error (which I very likely am).

Thanks for your help

  public static double GetCosineSimilarity(List<Point> V1, List<Point> V2)
    {
        double sim = 0.0d;
        int N = 0;
        N = ((V2.Count < V1.Count)?V2.Count : V1.Count);
        double dotX = 0.0d; double dotY = 0.0d;
        double magX = 0.0d; double magY = 0.0d;
        for (int n = 0; n < N; n++)
        {
            dotX += V1[n].X * V2[n].X;
            dotY += V1[n].Y * V2[n].Y;
            magX += Math.Pow(V1[n].X, 2);
            magY += Math.Pow(V1[n].Y, 2);
        }

        return (dotX + dotY)/(Math.Sqrt(magX) * Math.Sqrt(magY));
    }

Edit: Apart from syntax, my question is also about the logical construct, given that I am dealing with vectors of differing lengths. Also, how can the above be generalized to vectors of m dimensions? Thanks

Comments (3)

柳若烟 2024-12-13 21:26:57

If you are in 2-dimensions, then you can have vectors represented as (V1.X, V1.Y) and (V2.X, V2.Y), then use

public static double GetCosineSimilarity(Point V1, Point V2) {
    // Dot product divided by the product of the two magnitudes.
    return (V1.X*V2.X + V1.Y*V2.Y)
         / ( Math.Sqrt( Math.Pow(V1.X,2)+Math.Pow(V1.Y,2))
             * Math.Sqrt( Math.Pow(V2.X,2)+Math.Pow(V2.Y,2))
           );
}

If you are in higher dimensions then you can represent each vector as List<double>. So, in 4-dimensions the first vector would have components V1 = (V1[0], V1[1], V1[2], V1[3]).
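
For reference, the m-dimensional case is the same formula as the 2-D one with the sums running over all components. In the sketch below, a_i and b_i are symbols introduced here for illustration; they stand for V1[i] and V2[i]:

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{m} a_i b_i}
         {\sqrt{\sum_{i=1}^{m} a_i^{2}}\;\sqrt{\sum_{i=1}^{m} b_i^{2}}}
```

With m = 2 this reduces to the Point version above. Note that the formula assumes both vectors have the same number of components.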

public static double GetCosineSimilarity(List<double> V1, List<double> V2)
{
    int N = 0;
    N = ((V2.Count < V1.Count) ? V2.Count : V1.Count);
    double dot = 0.0d;
    double mag1 = 0.0d;
    double mag2 = 0.0d;
    for (int n = 0; n < N; n++)
    {
        dot += V1[n] * V2[n];
        mag1 += Math.Pow(V1[n], 2);
        mag2 += Math.Pow(V2[n], 2);
    }

    return dot / (Math.Sqrt(mag1) * Math.Sqrt(mag2));
}
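
The function above quietly truncates to the shorter list when the two counts differ, which may or may not be what the question intends, since cosine similarity is only defined for vectors of the same dimension. As a minimal sketch of the alternative, the hypothetical helper below (the names VectorMath and CosineSimilarityStrict are made up for illustration, not from any library) rejects mismatched lengths and zero vectors instead of returning a misleading number or NaN:

```csharp
using System;
using System.Collections.Generic;

public static class VectorMath
{
    // Hypothetical helper, not part of the original answer: same formula,
    // but it refuses inputs for which the cosine is not well defined.
    public static double CosineSimilarityStrict(IReadOnlyList<double> v1, IReadOnlyList<double> v2)
    {
        if (v1 == null) throw new ArgumentNullException(nameof(v1));
        if (v2 == null) throw new ArgumentNullException(nameof(v2));
        if (v1.Count != v2.Count)
            throw new ArgumentException("Vectors must have the same number of components.");

        double dot = 0.0, mag1 = 0.0, mag2 = 0.0;
        for (int i = 0; i < v1.Count; i++)
        {
            dot  += v1[i] * v2[i];   // numerator: dot product
            mag1 += v1[i] * v1[i];   // squared magnitude of v1
            mag2 += v2[i] * v2[i];   // squared magnitude of v2
        }

        // Dividing by zero here would yield NaN, so fail loudly instead.
        if (mag1 == 0.0 || mag2 == 0.0)
            throw new ArgumentException("Cosine similarity is undefined for a zero vector.");

        return dot / (Math.Sqrt(mag1) * Math.Sqrt(mag2));
    }
}
```

Both List<double> and double[] can be passed, since each implements IReadOnlyList<double>. If vectors of different lengths really must be compared, one option (an assumption about the data, not a general rule) is to pad the shorter one with zeros before calling the helper.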

醉南桥 2024-12-13 21:26:57

Update Dec 30, 2023


As pointed out by Bellarmine Head, the latest version of the Microsoft.SemanticKernel.Core NuGet package (version 1.0.1) no longer has a CosineSimilarity function. You can instead use the static method TensorPrimitives.CosineSimilarity from Microsoft's System.Numerics.Tensors NuGet package. It accepts two read-only spans (ReadOnlySpan<float> x, ReadOnlySpan<float> y).

Using float arrays the usage looks something like this:

using System.Numerics.Tensors;

float[] vector1 = new[] { 0.018007852f, 0.031938456f, -0.046234965f, };
float[] vector2 = new[] { 0.01055384f, 0.0020128791f, 0.013548848f };
double result = TensorPrimitives.CosineSimilarity(vector1, vector2);

Console.WriteLine($"The result was: {result}");

The closer the result is to 1 the more similar the two items are.

Older post


Microsoft provides an extension method called CosineSimilarity in the Microsoft.SemanticKernel.Core NuGet package (currently at version 1.0.0-beta1). It has three overloads:

  1. Array of numbers (float or double)
  2. Span<T>
  3. ReadOnlySpan<T>

Using float arrays the usage looks something like this:

float[] vector1 = new[] {  0.018007852f, 0.031938456f, -0.046234965f, };
float[] vector2 = new[] {  0.01055384f, 0.0020128791f, 0.013548848f };
double result = vector1.CosineSimilarity(vector2);

The closer the result is to 1 the more similar the two items are.

喜爱纠缠 2024-12-13 21:26:57

The last line should be

return (dotX + dotY)/(Math.Sqrt(magX) * Math.Sqrt(magY))