Is there an open source Latent Semantic Indexing (LSI) implementation in Java?

Posted 2024-08-11 13:50:12

Is there any open source implementation of LSI in Java? I want to use such a library in my project. I have seen jLSI, but it implements a different variant of LSI. I want the standard model.

Comments (6)

那支青花 2024-08-18 13:50:12

Have you considered LDA (Latent Dirichlet allocation)? I haven't really either, but I encountered the same problem with LSI recently (patents). From what I understand LDA is a related/more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.

感情旳空白 2024-08-18 13:50:12

A Google search for "java LSI" leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know whether it is closer to standard LSI than the jLSI implementation.

That thread also mentions that LSI is patented and that there aren't many implementations of it. So if you need a standard implementation, you may have to use a language other than Java.

浮生未歇 2024-08-18 13:50:12

The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in the output.) It's a fairly scalable approach that uses a thin SVD. I've used it to run LSI on all of Wikipedia with no issues (after removing infrequent terms with fewer than 5 occurrences).

As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to using the same thin SVD (SVDLIBJ), so you might want to check it out if you haven't already.
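
To make the preprocessing step above concrete, here is a minimal sketch of building a term-document count matrix while dropping infrequent terms. It only illustrates the idea and is not code from S-Space or SemanticVectors: the class name `TermDocumentMatrix`, the `build` method, and the simple regex tokenizer are all assumptions made for this example. The resulting matrix is what a thin-SVD routine would then factorize.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of S-Space or SemanticVectors.
final class TermDocumentMatrix {
    final List<String> terms; // row labels, in alphabetical order
    final int[][] counts;     // counts[i][j] = occurrences of terms.get(i) in document j

    private TermDocumentMatrix(List<String> terms, int[][] counts) {
        this.terms = terms;
        this.counts = counts;
    }

    // Builds the matrix, keeping only terms whose total corpus count
    // is at least minOccurrences.
    static TermDocumentMatrix build(List<String> documents, int minOccurrences) {
        // Pass 1: tokenize and count total occurrences of each term.
        Map<String, Integer> totals = new TreeMap<>();
        List<String[]> tokenized = new ArrayList<>();
        for (String doc : documents) {
            String[] tokens = doc.toLowerCase().split("\\W+"); // naive tokenizer (assumption)
            tokenized.add(tokens);
            for (String t : tokens) {
                if (!t.isEmpty()) {
                    totals.merge(t, 1, Integer::sum);
                }
            }
        }

        // Assign a row index to every term that is frequent enough.
        List<String> kept = new ArrayList<>();
        Map<String, Integer> rowOf = new HashMap<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() >= minOccurrences) {
                rowOf.put(e.getKey(), kept.size());
                kept.add(e.getKey());
            }
        }

        // Pass 2: fill in per-document counts for the kept terms only.
        int[][] counts = new int[kept.size()][documents.size()];
        for (int j = 0; j < tokenized.size(); j++) {
            for (String t : tokenized.get(j)) {
                Integer row = rowOf.get(t);
                if (row != null) {
                    counts[row][j]++;
                }
            }
        }
        return new TermDocumentMatrix(kept, counts);
    }
}
```

Calling `TermDocumentMatrix.build(docs, 5)` would mirror the threshold mentioned above, dropping every term that occurs fewer than 5 times in the whole corpus. A dense `int[][]` is used only to keep the sketch short; a corpus the size of Wikipedia would need a sparse representation.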

瞄了个咪的 2024-08-18 13:50:12

A Google search for NLP tools turns up these slides, which I think will help ...

衣神在巴黎 2024-08-18 13:50:12

I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.

谈场末日恋爱 2024-08-18 13:50:12

Have you tried the SemanticVectors package?

http://code.google.com/p/semanticvectors/
