LSA - Steps after finding the SVD
I have read quite a few tutorials since this morning. My problem involves finding the similarity between two documents. I am planning to use LSA in Java for this purpose.
I understand how the term-document matrix is created and that SVD is then applied to it (reducing the dimensionality), which yields three matrices. This might sound stupid, but I have been stuck on this for quite a while. Now, if I have to find the similarity between two documents, what do I have to do?
Comments (1)
After calculating the three matrices using SVD, you need to calculate the correlation between the vectors of the two documents you want to compare. You can use Spearman's correlation.
Another way is to use the cosine similarity (the cosine distance is one minus that value).
You will find more details at LSA; there is a full example with explanation.
You might also search for some Java libraries for LSA.
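To make the comparison step concrete, here is a minimal sketch in plain Java. It assumes the term-document matrix had terms as rows and documents as columns, so that A ≈ U Σ Vᵀ and document j is represented by row j of V scaled by the singular values. The class name `LsaSimilarity`, its method names, and the numbers in `main` are all made up for illustration; in practice `sigma` and `v` would come from whatever SVD routine you use, and some libraries hand back Vᵀ rather than V, in which case you would transpose first.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of any LSA library.
class LsaSimilarity {

    // Row j of the result is document j in the reduced space: v[j][i] * sigma[i].
    // Assumes v has one row per document and one column per kept singular value.
    static double[][] documentVectors(double[] sigma, double[][] v) {
        double[][] docs = new double[v.length][sigma.length];
        for (int j = 0; j < v.length; j++) {
            for (int i = 0; i < sigma.length; i++) {
                docs[j][i] = v[j][i] * sigma[i];
            }
        }
        return docs;
    }

    // Cosine similarity; returns 0 if either vector has zero length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0.0;
        return dot / Math.sqrt(na * nb);
    }

    // 1-based ranks; tied values share the average of their positions.
    static double[] ranks(double[] x) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < x.length; i++) idx[i] = i;
        Arrays.sort(idx, (p, q) -> Double.compare(x[p], x[q]));
        double[] r = new double[x.length];
        int i = 0;
        while (i < idx.length) {
            int j = i;
            while (j + 1 < idx.length && x[idx[j + 1]] == x[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    // Pearson correlation; returns 0 if either input is constant.
    static double pearson(double[] a, double[] b) {
        double ma = Arrays.stream(a).average().orElse(0);
        double mb = Arrays.stream(b).average().orElse(0);
        double cov = 0, va = 0, vb = 0;
        for (int i = 0; i < a.length; i++) {
            cov += (a[i] - ma) * (b[i] - mb);
            va += (a[i] - ma) * (a[i] - ma);
            vb += (b[i] - mb) * (b[i] - mb);
        }
        if (va == 0 || vb == 0) return 0.0;
        return cov / Math.sqrt(va * vb);
    }

    // Spearman's correlation is Pearson's correlation applied to the ranks.
    static double spearman(double[] a, double[] b) {
        return pearson(ranks(a), ranks(b));
    }

    public static void main(String[] args) {
        // Illustrative values only, not the output of a real decomposition.
        double[] sigma = {3.0, 2.0, 1.0};
        double[][] v = {
            {0.6, 0.1, 0.3},   // document 0
            {0.5, 0.2, 0.4},   // document 1
            {0.1, 0.9, -0.2}   // document 2
        };
        double[][] docs = documentVectors(sigma, v);
        System.out.println("cosine(0,1)   = " + cosine(docs[0], docs[1]));
        System.out.println("cosine(0,2)   = " + cosine(docs[0], docs[2]));
        System.out.println("spearman(0,1) = " + spearman(docs[0], docs[1]));
    }
}
```

One thing to keep in mind: Spearman's correlation only looks at the ordering of the components, so with a very small number of kept dimensions it carries little information. Cosine similarity on the scaled document vectors is usually the easier starting point.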