Is there a Latent Semantic Indexing (LSI) implementation?
Is there an open source implementation of LSI in Java? I want to use such a library in my project. I have seen jLSI, but it implements a different variant of LSI. I want the standard model.
Comments (6)
Have you considered LDA (Latent Dirichlet allocation)? I haven't looked into it closely myself, but I recently ran into the same problem with LSI (patents). From what I understand, LDA is a related and more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.
A Google search for "java LSI" leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know whether it is any closer to standard LSI than the jLSI implementation is.
That thread also mentions that LSI is patented and that there aren't many implementations of it. So if you need a standard implementation, you may have to use a language other than Java.
The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in their output.) It's a fairly scalable approach that uses the thin SVD. I've used it to run LSI on all of Wikipedia with no issues (after removing infrequent terms with fewer than 5 occurrences).
As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to using the same thin SVD (SVDLIBJ), so you might check it out if you haven't already.
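For readers who want to see what "standard" LSI computes, here is a minimal sketch in plain Java. It is not the S-Space or SemanticVectors API: the class `LsiSketch` and all of its methods are hypothetical names made up for illustration, and the truncated SVD is approximated by power iteration on the Gram matrix A^T A rather than by SVDLIBJ, which only makes sense for a handful of documents. It builds a term-document count matrix, drops terms below a minimum count, and returns rank-k document vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of any library mentioned in this thread.
class LsiSketch {

    /** Term-document count matrix (rows = terms, columns = documents),
     *  keeping only terms that occur at least minCount times in the corpus. */
    static double[][] termDocumentMatrix(List<String> docs, int minCount) {
        int n = docs.size();
        Map<String, double[]> counts = new TreeMap<>();
        for (int d = 0; d < n; d++) {
            for (String token : docs.get(d).toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.computeIfAbsent(token, t -> new double[n])[d] += 1.0;
                }
            }
        }
        List<double[]> rows = new ArrayList<>();
        for (double[] row : counts.values()) {
            if (Arrays.stream(row).sum() >= minCount) rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    /** Rank-k document vectors: row d holds (sigma_1 * v_1[d], ..., sigma_k * v_k[d]),
     *  where sigma_i and v_i are the top singular values and right singular vectors of a.
     *  Assumption: power iteration with deflation stands in for a real thin-SVD routine. */
    static double[][] documentVectors(double[][] a, int k) {
        int n = a[0].length;
        double[][] g = new double[n][n]; // Gram matrix G = A^T A (documents x documents)
        for (double[] row : a)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    g[i][j] += row[i] * row[j];

        double[][] vectors = new double[n][k];
        for (int c = 0; c < k; c++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = 1.0 + 0.1 * i; // deterministic start vector
            double start = Math.sqrt(dot(v, v));
            for (int i = 0; i < n; i++) v[i] /= start;

            double lambda = 0.0; // converges to the largest remaining eigenvalue of G
            for (int iter = 0; iter < 1000; iter++) {
                double[] w = new double[n];
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        w[i] += g[i][j] * v[j];
                lambda = Math.sqrt(dot(w, w));
                if (lambda < 1e-12) break; // nothing left to extract
                for (int i = 0; i < n; i++) v[i] = w[i] / lambda;
            }
            double sigma = Math.sqrt(lambda); // singular value of A
            for (int i = 0; i < n; i++) vectors[i][c] = sigma * v[i];
            // Deflate so the next pass finds the next singular direction.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    g[i][j] -= lambda * v[i] * v[j];
        }
        return vectors;
    }

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    /** Cosine similarity; returns 0 if either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double nx = Math.sqrt(dot(x, x)), ny = Math.sqrt(dot(y, y));
        return (nx == 0.0 || ny == 0.0) ? 0.0 : dot(x, y) / (nx * ny);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "latent semantic indexing uses the singular value decomposition",
            "the singular value decomposition factors a term document matrix",
            "lucene is a java search library",
            "a java library for search built on lucene");
        double[][] a = termDocumentMatrix(docs, 2); // drop terms seen fewer than 2 times
        double[][] vecs = documentVectors(a, 2);    // 2 latent dimensions
        System.out.println("sim(doc0, doc1) = " + cosine(vecs[0], vecs[1]));
        System.out.println("sim(doc0, doc2) = " + cosine(vecs[0], vecs[2]));
    }
}
```

A real pipeline would typically add term weighting (for example log-entropy or tf-idf) before the decomposition and would use a sparse SVD solver; both are left out here to keep the sketch short.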
A Google search for NLP tools turned up these slides, which I think may help ...
I believe LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.
Have you tried the SemanticVectors package?
http://code.google.com/p/semanticvectors/