Doubts about LSA
I have to find the similarity between a reference document and the set of documents in a repository.
Method:
1. I find the term-document matrix for all the documents, including the reference document.
2. The SVD is calculated for this matrix.
3. I take the V array (the third result).
4. I transpose this matrix so that each row represents a document.
5. The first row represents the reference document.
6. I find the cosine similarity between this row and each of the remaining rows.
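Steps 4 to 6 can be sketched in plain Java as follows. This is a minimal sketch, assuming the document vectors are already available as a `double[][]` with one row per document and the reference document in row 0; the class and method names are hypothetical. With JAMA, such rows could be taken from `new Matrix(termDoc).svd().getV().getArray()` (where `termDoc` is a hypothetical name for the term-document array). Note that JAMA's `getV()` returns V itself (A = U * S * V'), so row j of V already corresponds to column j (document j) of the term-document matrix; whether a transpose is needed depends on which convention your SVD routine follows.

```java
import java.util.Arrays;

/**
 * Steps 4-6: given one vector per document (reference document in row 0),
 * compute the cosine similarity between the reference row and every other row.
 */
final class LsaCosine {

    /** Cosine similarity of two equal-length vectors; 0 if either is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // undefined for a zero vector; treated here as "no similarity"
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Similarity of row 0 (the reference document) to rows 1..n-1. */
    static double[] similaritiesToReference(double[][] docRows) {
        double[] ref = docRows[0];
        double[] sims = new double[docRows.length - 1];
        for (int j = 1; j < docRows.length; j++) {
            sims[j - 1] = cosine(ref, docRows[j]);
        }
        return sims;
    }

    public static void main(String[] args) {
        // Hypothetical 2-dimensional document vectors, reference first.
        double[][] docs = {
            {1.0, 0.0},  // reference
            {2.0, 0.0},  // same direction as the reference
            {0.0, 3.0},  // orthogonal to the reference
            {1.0, 1.0}   // 45 degrees away from the reference
        };
        System.out.println(Arrays.toString(similaritiesToReference(docs)));
    }
}
```

Cosine similarity ignores vector length, so the second document scores as high as an identical copy of the reference would.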
My doubts:
Since I have around 7 documents in my DB, I get only an 8×8 V array (document matrix). So will I get a correct result if I find the cosine similarity using these 8 values alone?
Is such a method generally adopted?
I use Java to code this. I make use of the JAMA package to find the SVD.
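On the 8×8 V array: a minimal sketch, assuming the term-document matrix has at least as many terms as documents (the case JAMA's SVD is documented for), in which case `getV()` returns a full n×n orthogonal matrix. The rows of any square orthogonal matrix are orthonormal, so the cosine between two different rows is 0 (up to rounding) no matter what the documents contain. The matrix below is a hypothetical stand-in for V (a 4×4 Hadamard matrix scaled by 1/2), not output from real data.

```java
/**
 * Illustration: rows of a square orthogonal matrix are orthonormal, so the
 * cosine similarity between any two distinct rows is 0. A full n-by-n V
 * from an SVD is such a matrix.
 */
final class OrthogonalRows {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Largest |cosine| between any two distinct rows. */
    static double maxOffDiagonalCosine(double[][] rows) {
        double max = 0;
        for (int i = 0; i < rows.length; i++) {
            for (int j = i + 1; j < rows.length; j++) {
                max = Math.max(max, Math.abs(cosine(rows[i], rows[j])));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for V: 4x4 Hadamard matrix scaled by 1/2,
        // which is orthogonal (V * V^T = I).
        double[][] v = {
            {0.5,  0.5,  0.5,  0.5},
            {0.5, -0.5,  0.5, -0.5},
            {0.5,  0.5, -0.5, -0.5},
            {0.5, -0.5, -0.5,  0.5}
        };
        System.out.println("max |cosine| between distinct rows = "
                + maxOffDiagonalCosine(v));
    }
}
```

Running it prints `max |cosine| between distinct rows = 0.0`. This is why LSA usually keeps only the first k columns of V (k smaller than the number of documents), often scaled by the singular values, before comparing documents.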
To calculate your cosine similarity, you will need the last matrix that you get from this calculation:
A = U * S * V^t.
You can read an example of LSA here.
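A minimal sketch, assuming "the last matrix" means the reduced-rank reconstruction A_k = U_k * S_k * V_k^t: keep only the k largest singular values, rebuild A_k, and compare documents (columns of A_k) by cosine. Because U_k has orthonormal columns, the cosine between columns i and j of A_k equals the cosine between rows i and j of V_k * S_k, so in practice the reconstruction never has to be formed. The factors below are hand-picked hypothetical values (U = V = a 4×4 Hadamard matrix scaled by 1/2, singular values 4, 2, 1, 0.5), and k = 2 is arbitrary; with JAMA, the real factors would come from `getU()`, `getSingularValues()` and `getV()` of the `SingularValueDecomposition`.

```java
import java.util.Locale;

/**
 * Sketch of rank-k LSA similarity. Given SVD factors of a terms x documents
 * matrix A = U * S * V^T, keep the k largest singular values, rebuild
 * A_k = U_k * S_k * V_k^T and compare documents (columns of A_k) by cosine.
 * The same cosines come out of the rows of V_k * S_k.
 */
final class RankKLsa {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** A_k = U_k * S_k * V_k^T; u is m x r, s has length r, v is n x r, k <= r. */
    static double[][] reconstruct(double[][] u, double[] s, double[][] v, int k) {
        int m = u.length, n = v.length;
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                for (int t = 0; t < k; t++) {
                    a[i][j] += u[i][t] * s[t] * v[j][t];
                }
            }
        }
        return a;
    }

    /** Column j of a (document j in a terms x documents matrix). */
    static double[] column(double[][] a, int j) {
        double[] c = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i][j];
        }
        return c;
    }

    /** Row j of V_k * S_k: the k-dimensional LSA vector of document j. */
    static double[] docVector(double[] s, double[][] v, int j, int k) {
        double[] d = new double[k];
        for (int t = 0; t < k; t++) {
            d[t] = v[j][t] * s[t];
        }
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical hand-picked factors: U = V = 4x4 Hadamard matrix / 2
        // (orthogonal), singular values in descending order.
        double[][] h = {
            {0.5,  0.5,  0.5,  0.5},
            {0.5, -0.5,  0.5, -0.5},
            {0.5,  0.5, -0.5, -0.5},
            {0.5, -0.5, -0.5,  0.5}
        };
        double[] s = {4.0, 2.0, 1.0, 0.5};
        int k = 2; // number of latent dimensions kept (arbitrary here)

        double[][] ak = reconstruct(h, s, h, k);
        for (int j = 1; j < h.length; j++) {
            double viaAk = cosine(column(ak, 0), column(ak, j));
            double viaVs = cosine(docVector(s, h, 0, k), docVector(s, h, j, k));
            System.out.printf(Locale.ROOT, "doc0 vs doc%d: %.4f (A_k), %.4f (V_k*S_k)%n",
                    j, viaAk, viaVs);
        }
    }
}
```

It should print 0.6000 for documents 1 and 3 and 1.0000 for document 2, the same by both routes, whereas the rows of the full, unscaled 4×4 V would all give 0. Choosing k for a real collection is a tuning decision that this sketch does not address.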