Two ways of computing a t-SNE plot with cosine similarity give different plots, although the methods seem to be the same
I have been looking at this for the past hour but cannot seem to find the problem...
I have a list of articles and want to see which of them are similar to each other.
I did this by computing the cosine similarities between the TF-IDF vectors of the articles and making a t-SNE plot of the result. I did it in two ways, but to my surprise the plots are very different from each other, and I cannot tell which one is correct.
In the examples, tfdoc is the TF-IDF.
from sklearn.metrics.pairwise import cosine_similarity
from sklearn import manifold
X = cosine_similarity(tfdoc, tfdoc)
model = manifold.TSNE(random_state=1, metric="precomputed")
Y = model.fit_transform(X)
When plotted, this results in:
But when I use this code:
from sklearn.manifold import TSNE
tsne = TSNE(random_state=1, metric="cosine")
embs = tsne.fit_transform(tfdoc)
It results in:
Does anyone know what exactly the difference is here?
Thanks in advance!!
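The first snippet passes a cosine-similarity matrix to t-SNE, whereas the second uses cosine distance. With metric="precomputed", TSNE treats X as a distance matrix, so the similarities in the first snippet are read as distances and the most similar articles are treated as the farthest apart. A larger cosine distance means a smaller cosine similarity (scikit-learn defines cosine distance as 1 minus cosine similarity), so the two plots are not expected to match. Passing distances instead, for example from cosine_distances(tfdoc), should make the first approach consistent with the second.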
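As a rough sketch of that point, the snippet below feeds a precomputed cosine-distance matrix to t-SNE and compares it with the metric="cosine" route. The random `tfdoc` matrix, its shape, and the `perplexity=5` setting are placeholders chosen only so the example runs on a tiny input; they are not from the question. `init="random"` is set explicitly because recent scikit-learn versions reject the PCA initialisation together with metric="precomputed".

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_distances, cosine_similarity

# Hypothetical stand-in for the TF-IDF matrix: 40 "documents", 20 "terms",
# non-negative like real TF-IDF weights.
rng = np.random.default_rng(0)
tfdoc = rng.random((40, 20))

sim = cosine_similarity(tfdoc)    # large value = similar documents
dist = cosine_distances(tfdoc)    # scikit-learn's 1 - similarity, clipped to be non-negative

# Way 1, corrected: hand t-SNE a distance matrix, not a similarity matrix.
model = TSNE(random_state=1, metric="precomputed", init="random", perplexity=5)
Y = model.fit_transform(dist)

# Way 2, as in the question: let t-SNE compute cosine distances itself.
tsne = TSNE(random_state=1, metric="cosine", init="random", perplexity=5)
embs = tsne.fit_transform(tfdoc)

print(np.allclose(dist, 1.0 - sim))
print(Y.shape, embs.shape)
```

Both calls now work from the same notion of distance, so the two embeddings should tell a comparable story, although small numerical differences between the two code paths mean they need not be identical point for point.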