We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
A Hindi analyzer with a stemmer is available in Lucene. It is based on this algorithm (PDF).
hindi_stemmer is a Python implementation of the Hindi stemmer described in "A Lightweight Stemmer for Hindi" (http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf) by Ananthakrishnan Ramanathan and Durgesh D Rao.
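To make the idea concrete, here is a minimal sketch of the longest-suffix-first stripping that a light stemmer of this kind performs. It is not the hindi_stemmer source: the name `light_stem`, the tiny `SUFFIXES` table, and the rule of keeping at least two code points of stem are assumptions made for illustration, and the real suffix list is much larger. The suffixes are written as `\u` escapes so the example does not depend on the file's encoding.

```python
# Illustrative subset only; the real suffix table is much larger.
# Assumption of this sketch: keys are suffix lengths in Unicode code points.
SUFFIXES = {
    2: ["\u0947\u0902", "\u094b\u0902", "\u093e\u0901"],  # ें, ों, ाँ
    1: ["\u093e", "\u0940", "\u0947", "\u094b"],          # ा, ी, े, ो
}


def light_stem(word):
    """Strip the longest matching suffix, keeping a stem of at least two code points."""
    # Try longer suffixes first so the two-character endings win over one-character ones.
    for length in sorted(SUFFIXES, reverse=True):
        # Skip this length if stripping it would leave fewer than two code points.
        if len(word) > length + 1:
            for suffix in SUFFIXES[length]:
                if word.endswith(suffix):
                    return word[:-length]
    # No suffix matched (or the word is too short): return it unchanged.
    return word
```

With this table, `light_stem("किताबें")` returns `"किताब"` and `light_stem("घरों")` returns `"घर"`, while a short word such as `"का"` is left unchanged because stripping would leave too little stem.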
/**
 * Hindi light stemmer: removes number, gender and case suffixes from nouns and adjectives
}
We created a Java version of the original Python Hindi Stemmer code.
It is not entirely clear how the authors of the original Hindi stemmer used the variable L, but here is complete code that works:
import java.util.ArrayList;
import org.apache.commons.lang.StringUtils;
As you can see, some UTF-8 characters were not captured correctly. Look at the original Python code and copy the suffix values from there.
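One way to spot suffixes that were damaged in copying is to check that every character falls in the Devanagari Unicode block (U+0900 to U+097F). The sketch below is only an illustration: it assumes the suffix table maps a suffix length to a list of suffix strings, and both `damaged_suffixes` and that table layout are hypothetical rather than taken from the posted code.

```python
def damaged_suffixes(table):
    """Return (length, suffix) pairs that look corrupted.

    Assumes `table` maps a suffix length (in code points) to a list of suffixes.
    """
    problems = []
    for length, suffixes in table.items():
        for suffix in suffixes:
            # Every character of a Hindi suffix should be in the Devanagari block.
            in_block = all("\u0900" <= ch <= "\u097f" for ch in suffix)
            # A mangled suffix usually also has the wrong number of characters.
            if not in_block or len(suffix) != length:
                problems.append((length, suffix))
    return problems
```

An empty result does not prove the table is correct, but any entry it reports is worth re-copying from the Python source.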