Is there a Java implementation of the Porter2 stemmer?
Do you know of any Java implementation of the Porter2 stemmer (or any better stemmer written in Java)? I know that there is a Java version of Porter (not Porter2) here:
http://tartarus.org/~martin/PorterStemmer/java.txt
but on http://tartarus.org/~martin/PorterStemmer/ the author mentions that Porter is a bit outdated and recommends using Porter2, available at
http://snowball.tartarus.org/algorithms/english/stemmer.html
However, my problem is that this Porter2 is written in Snowball (I had never heard of it before, so I don't know anything about it). What I am looking for is a Java version of it.
Thanks. Your help will be highly appreciated.
Comments (7)
The Snowball algo is available as a Java download
And from snowball.tartarus.org:
This is what you want, right?
You can create an instance of it like so:
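A minimal sketch of what that can look like, assuming the Snowball Java classes are on the classpath. The class name org.tartarus.snowball.ext.englishStemmer is an assumption based on the Snowball Java distribution (some builds, such as the copy bundled with Lucene, capitalise it as EnglishStemmer), so the sketch loads it by reflection and tries both names. The helper class SnowballReflectionSketch is hypothetical and only for illustration.

```java
import java.util.Optional;

class SnowballReflectionSketch {

    // Assumed class names; adjust to whatever your Snowball build actually ships.
    static final String[] CANDIDATES = {
        "org.tartarus.snowball.ext.englishStemmer",
        "org.tartarus.snowball.ext.EnglishStemmer"
    };

    // Returns the stem, or Optional.empty() if no Snowball class could be loaded.
    static Optional<String> stem(String word) {
        for (String name : CANDIDATES) {
            try {
                Class<?> cls = Class.forName(name);
                Object stemmer = cls.getDeclaredConstructor().newInstance();
                // Snowball stemmers follow a setCurrent / stem / getCurrent cycle.
                cls.getMethod("setCurrent", String.class).invoke(stemmer, word);
                cls.getMethod("stem").invoke(stemmer);
                return Optional.of((String) cls.getMethod("getCurrent").invoke(stemmer));
            } catch (ReflectiveOperationException | LinkageError e) {
                // Class missing or not usable under this name: try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String word = args.length > 0 ? args[0] : "running";
        System.out.println(stem(word)
                .map(s -> word + " -> " + s)
                .orElse("No Snowball stemmer class found on the classpath"));
    }
}
```

If you compile against the Snowball jar directly, you can drop the reflection and call setCurrent, stem and getCurrent on the stemmer object itself.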
It is available as a part of MG4J.
See the documentation for EnglishStemmer, i.e. Porter2. Use the method processTerm(MutableString ms).
MG4J also gives you Java versions of other stemmers. See the snowball package. All these stemmers can be used independently.
Maybe not a direct answer, but there are stemmers in many NLP toolkits - see http://en.wikipedia.org/wiki/Natural_language_processing_toolkits.
There's a related question here, "Tokenizer, Stop Word Removal, Stemming in Java", with several answers that might be useful.
We use OpenNLP which is written in Java and may provide the functionality. I wouldn't expect the variation between stemmers to be critical if you are working in English.
It seems that Lucene integrates, in one form or another, some stemming algorithms. You may find what you're looking for starting at the package org.apache.lucene.analysis. However, I fear the stemming code is deeply integrated into the analysis components, which would make extracting it quite hard.
The following link contains the Snowball stemmer API. It includes the Porter2 stemmer implementation.
http://preciselyconcise.com/apis_and_installations/snowball_stemmer.php
Here is a lightweight wrapper I made that is easy to re-use and available on Maven Central.