What is the fastest way to compute the cosine similarity between the rows of two matrices of the same shape?
For example, I have two 2D arrays as follows:

```python
import numpy as np

X = np.array([[4, 4, 4, 2],
              [3, 1, 2, 2],
              [1, 3, 3, 3],
              [1, 3, 1, 2]])
Y = np.array([[2, 1, 1, 4],
              [2, 1, 1, 1],
              [4, 1, 4, 4],
              [4, 2, 3, 4]])
```

I want to calculate the cosine similarity between corresponding rows of X and Y, for example:

```python
def cos(feats1, feats2):
    """
    Cosine similarity between two 1-D feature vectors.
    """
    cos = np.dot(feats1, feats2) / (np.linalg.norm(feats1) * np.linalg.norm(feats2))
    return cos

# Compare row i of X with row i of Y.
for i in range(X.shape[0]):
    print(cos(X[i, :], Y[i, :]))
```
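For reference, this is what `cos` computes for the first pair of rows, with x_0 = X[0] = [4, 4, 4, 2] and y_0 = Y[0] = [2, 1, 1, 4]:

```latex
\cos(\mathbf{x}_0, \mathbf{y}_0)
  = \frac{\mathbf{x}_0 \cdot \mathbf{y}_0}{\lVert \mathbf{x}_0 \rVert \, \lVert \mathbf{y}_0 \rVert}
  = \frac{4 \cdot 2 + 4 \cdot 1 + 4 \cdot 1 + 2 \cdot 4}
         {\sqrt{4^2 + 4^2 + 4^2 + 2^2} \, \sqrt{2^2 + 1^2 + 1^2 + 4^2}}
  = \frac{24}{\sqrt{52}\,\sqrt{22}} \approx 0.7096
```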
Right now, I use a for loop to compute the cosine similarity between each pair of row vectors. But X and Y have a shape of about (1200000000, 512), so computing this with a plain for loop takes a really long time.
My question is how I can use linear algebra and NumPy to speed up this process.
Alternatively, is there any other method that performs this calculation more efficiently?
Thanks
Answers (2)
This is possible in a single line: the trick is simply to specify the axis over which to compute the norm and the dot product.
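Below is a minimal sketch of such a one-liner. It reuses the example arrays from the question and assumes `np.linalg.norm(..., axis=1)` for the row norms:

```python
import numpy as np

X = np.array([[4, 4, 4, 2],
              [3, 1, 2, 2],
              [1, 3, 3, 3],
              [1, 3, 1, 2]])
Y = np.array([[2, 1, 1, 4],
              [2, 1, 1, 1],
              [4, 1, 4, 4],
              [4, 2, 3, 4]])

# Row-wise dot products divided by the product of the row-wise L2 norms:
# one cosine similarity per row pair (X[i], Y[i]).
cos_sim = (X * Y).sum(axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))
print(cos_sim)
```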
The first part, `(X * Y).sum(axis=1)`, takes care of computing the dot product: `axis=1` specifies that the sum runs over the columns, i.e. we get one result per row (per data point). The second part simply computes the norm of each row vector in the same way, by specifying the axis.
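At the (1200000000, 512) size mentioned in the question, `X * Y` allocates an array as large as the inputs, which may not fit in memory. The sketch below processes the rows in blocks instead. It also swaps in `np.einsum('ij,ij->i', ...)` as an alternative way to write the row-wise dot product. The function name `rowwise_cosine_blocked` and the `block_rows` default are hypothetical, illustrative choices. If the data lives on disk, one possible setup (also an assumption) is to open X and Y as `np.memmap` arrays so that each block is read when it is sliced:

```python
import numpy as np


def rowwise_cosine_blocked(X, Y, block_rows=100_000):
    """Cosine similarity between X[i] and Y[i] for every row i.

    Rows are processed in blocks of `block_rows` (a hypothetical,
    illustrative default), so temporary arrays stay block-sized.
    """
    if X.shape != Y.shape:
        raise ValueError("X and Y must have the same shape")
    n_rows = X.shape[0]
    out = np.empty(n_rows, dtype=np.float64)
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        # Convert one block to float64; with np.memmap inputs this reads these rows.
        xb = np.asarray(X[start:stop], dtype=np.float64)
        yb = np.asarray(Y[start:stop], dtype=np.float64)
        # 'ij,ij->i' multiplies element-wise and sums each row: row-wise dot products.
        dots = np.einsum('ij,ij->i', xb, yb)
        norms_x = np.sqrt(np.einsum('ij,ij->i', xb, xb))
        norms_y = np.sqrt(np.einsum('ij,ij->i', yb, yb))
        # Rows with zero norm give nan/inf here (NumPy emits a RuntimeWarning).
        out[start:stop] = dots / (norms_x * norms_y)
    return out


# Usage on small random data (shapes are illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 512)).astype(np.float32)
B = rng.standard_normal((10, 512)).astype(np.float32)
print(rowwise_cosine_blocked(A, B, block_rows=4))
```

Block size trades memory for per-call overhead. Larger blocks mean fewer Python-level iterations, while smaller blocks keep the float64 copies small.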
If you only want to use `numpy`, make good use of broadcasting: