Fast Levenshtein distance in R?
Is there a package that contains a Levenshtein distance function implemented in C or Fortran? I have many strings to compare, and stringMatch from MiscPsycho is too slow for this.
Comments (4)
stringdist in the stringdist package does this too, and is even faster than levenshteinDist under certain conditions (1).
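
As a rough illustration of the stringdist suggestion, the sketch below computes Levenshtein distances both element-wise and as a full matrix. It assumes the stringdist package is installed from CRAN; the vectors a and b are made-up sample data, not from the question.

```r
# Assumes the stringdist package is installed: install.packages("stringdist")
library(stringdist)

# Hypothetical sample data; substitute your own character vectors.
a <- c("kitten", "flaw", "saturday")
b <- c("sitting", "lawn", "sunday")

# Element-wise Levenshtein distance between a[i] and b[i].
d_pairs <- stringdist(a, b, method = "lv")
print(d_pairs)  # 3 2 3

# Full distance matrix: every element of a against every element of b.
d_matrix <- stringdistmatrix(a, b, method = "lv")
print(d_matrix)
```

The matrix form is usually what you want when comparing many strings against each other, since the looping happens in compiled code rather than in R.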

levenshteinDist (from the RecordLinkage package) calls compiled C code. Give it a try.
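
A minimal sketch of the RecordLinkage suggestion, assuming the package is installed from CRAN. The input strings are made-up sample data; levenshteinDist accepts two character vectors and compares them element by element.

```r
# Assumes the RecordLinkage package is installed: install.packages("RecordLinkage")
library(RecordLinkage)

# Distance between a single pair of strings.
d_single <- levenshteinDist("kitten", "sitting")
print(d_single)  # 3

# Vectorised call on hypothetical sample vectors: compares element i with element i.
d_vec <- levenshteinDist(c("kitten", "flaw"), c("sitting", "lawn"))
print(d_vec)  # 3 2

# The same package also provides a similarity score scaled to [0, 1].
s_single <- levenshteinSim("kitten", "sitting")
print(s_single)
```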

You could try stringDist from Biostrings as well.

You could also use levenshtein_distance() from the textTinyR package. I got 'calloc' memory errors with all the other packages when it came to larger character vectors of around 30k characters. Only textTinyR worked for me!