We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
The following Java libraries offer multiple string comparison algorithms (Levenshtein, Jaro-Winkler, ...): Apache Commons Lang and Simmetrics.
Both libraries have Javadoc documentation (Apache Commons Lang Javadoc, Simmetrics Javadoc).
The Levenshtein distance is a measure of how similar two strings are. Or, more precisely, of how many single-character edits (insertions, deletions, or substitutions) have to be made to turn one into the other.
The algorithm is available in pseudo-code on Wikipedia. Converting that to Java shouldn't be much of a problem, but it isn't built into the Java standard library.
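As a rough illustration of what such a conversion might look like, here is a minimal sketch of the two-row dynamic-programming variant. The class and method names (`LevenshteinSketch`, `distance`) are made up for this example and are not taken from any library; the code compares UTF-16 `char` values, so characters outside the Basic Multilingual Plane count as two units.

```java
import java.util.Objects;

/** Minimal Levenshtein sketch; names are illustrative, not from a library. */
final class LevenshteinSketch {

    /**
     * Returns the minimum number of single-character insertions,
     * deletions, and substitutions needed to turn a into b.
     */
    static int distance(String a, String b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");

        // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning i chars of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,  // insertion
                                 prev[j] + 1),     // deletion
                        prev[j - 1] + cost);       // substitution (or match)
            }
            // Reuse the two arrays: the row just computed becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
        System.out.println(distance("flaw", "lawn"));      // prints 2
    }
}
```

Keeping only two rows brings the memory use down to O(length of b) while the running time stays O(length of a × length of b).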
Wikipedia has some more algorithms that measure similarity of strings.
Yeah, that's a good metric. You could use StringUtils.getLevenshteinDistance() from Apache Commons Lang.
You can find implementations of Levenshtein and other string similarity/distance measures on
https://github.com/tdebatty/java-string-similarity
If your project uses Maven, installation is as simple as adding the library as a dependency.
Then you can use, for example, its Levenshtein implementation.
Shameless plug, but I also wrote a library:
https://github.com/vickumar1981/stringdistance
It has all these functions, plus a few for phonetic similarity (whether one word "sounds like" another; these return true or false, unlike the other fuzzy similarities, which return a number between 0 and 1).
It also includes DNA sequence alignment algorithms like Smith-Waterman and Needleman-Wunsch, which can be seen as generalized versions of Levenshtein.
I plan, in the near future, to make this work with any array and not just strings (arrays of characters).
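To make the "generalized Levenshtein" remark concrete, here is a small sketch of Needleman-Wunsch global alignment scoring. It is not the code from the library above; the class name `NeedlemanWunschSketch` and the `match`, `mismatch`, and `gap` parameters are assumptions chosen for illustration. With a match score of 0 and mismatch and gap scores of -1, the best alignment score is simply the negated Levenshtein distance.

```java
import java.util.Objects;

/** Needleman-Wunsch scoring sketch; names and parameters are illustrative. */
final class NeedlemanWunschSketch {

    /**
     * Returns the best global alignment score of a and b, where aligned equal
     * characters add `match`, aligned different characters add `mismatch`,
     * and every character aligned against a gap adds `gap`.
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");

        int[][] dp = new int[a.length() + 1][b.length() + 1];

        // Aligning a prefix against the empty string costs one gap per character.
        for (int i = 1; i <= a.length(); i++) {
            dp[i][0] = i * gap;
        }
        for (int j = 1; j <= b.length(); j++) {
            dp[0][j] = j * gap;
        }

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int pair = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                int diagonal = dp[i - 1][j - 1] + pair; // align a[i-1] with b[j-1]
                int up = dp[i - 1][j] + gap;            // a[i-1] against a gap
                int left = dp[i][j - 1] + gap;          // b[j-1] against a gap
                dp[i][j] = Math.max(diagonal, Math.max(up, left));
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // With match = 0, mismatch = -1, gap = -1 the score is minus the Levenshtein distance.
        System.out.println(score("kitten", "sitting", 0, -1, -1)); // prints -3
    }
}
```

Other scoring parameters (for example, rewarding matches or penalizing gaps more heavily) give alignment scores that plain Levenshtein cannot express, which is the sense in which the algorithm generalizes it.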