Calculating Levenshtein distance with word lists
First, I want to say that I am a newbie in Python. I am trying to calculate the Levenshtein distance for many lists of words. So far I have managed to write the code for a pair of words, but I'm having problems doing it for lists. I have two lists with the words one below the other, like this:
carlos
stiv
peter
I want to use the Levenshtein distance for a similarity approach. Could somebody tell me how I can load the lists and then use a function to calculate the distance?
I'd appreciate it!
Here is my code, which only works for two strings:
#!/usr/bin/env python
# -*- coding=utf-8 -*-

def lev_dist(source, target):
    if source == target:
        return 0

    #words = open(test_file.txt,'r').read().split();

    # Prepare matrix: dist[i][j] is the distance between the first i
    # characters of source and the first j characters of target
    slen, tlen = len(source), len(target)
    dist = [[0 for i in range(tlen+1)] for x in range(slen+1)]
    for i in range(slen+1):
        dist[i][0] = i
    for j in range(tlen+1):
        dist[0][j] = j

    # Counting distance
    for i in range(slen):
        for j in range(tlen):
            cost = 0 if source[i] == target[j] else 1
            dist[i+1][j+1] = min(
                dist[i][j+1] + 1,   # deletion
                dist[i+1][j] + 1,   # insertion
                dist[i][j] + cost   # substitution
            )
    return dist[-1][-1]

if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print('Usage: You have to enter a source_word and a target_word')
        sys.exit(-1)
    source, target = sys.argv[1], sys.argv[2]
    print(lev_dist(source, target))
Comments (2)
I finally got the code working with some help from a friend :)
You can compute the Levenshtein distance against every word in the second list by changing the last line of the script, i.e. print(list1[0], list2[i]), so that the first word of list1 is compared with every word in list2.
Thanks
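For anyone landing here later, a rough sketch of that idea might look like the following. It is not the exact code from above: lev_dist here is a compact two-row variant of the function in the question, load_words and compare_lists are made-up helper names, and the second word list is invented for the demo. It assumes each list is stored as a plain-text file with one word per line.

```python
def lev_dist(source, target):
    """Levenshtein distance, keeping only the previous row of the matrix."""
    if source == target:
        return 0
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution
            ))
        prev = curr
    return prev[-1]


def load_words(path):
    """Hypothetical helper: read one word per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def compare_lists(list1, list2):
    """Return (word1, word2, distance) for every pair across the two lists."""
    return [(w1, w2, lev_dist(w1, w2)) for w1 in list1 for w2 in list2]


# In-memory lists for the demo; the second list is made up.
list1 = ['carlos', 'stiv', 'peter']
list2 = ['carlo', 'steve', 'pedro']
for w1, w2, dist in compare_lists(list1, list2):
    print(w1, w2, dist)
```

To work from files instead, something like list1 = load_words('list1.txt') should do, where the file name is only a placeholder for wherever the words actually live.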
Don't reinvent the wheel:
http://pypi.python.org/pypi/python-Levenshtein/
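A minimal sketch of how that could be used, assuming the package is installed (for example with pip install python-Levenshtein) and imported as the Levenshtein module with its distance function. The pure-Python fallback and the closest_match helper are my own additions for illustration, and the second word list is made up.

```python
try:
    # Provided by the python-Levenshtein package (C implementation).
    from Levenshtein import distance
except ImportError:
    # Fallback so the sketch still runs when the package is not installed.
    def distance(source, target):
        prev = list(range(len(target) + 1))
        for i, s_char in enumerate(source, start=1):
            curr = [i]
            for j, t_char in enumerate(target, start=1):
                cost = 0 if s_char == t_char else 1
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
            prev = curr
        return prev[-1]


def closest_match(word, candidates):
    """Hypothetical helper: the candidate with the smallest distance to word."""
    return min(candidates, key=lambda candidate: distance(word, candidate))


list1 = ['carlos', 'stiv', 'peter']
list2 = ['carlo', 'steve', 'pedro']  # made-up second list
for word in list1:
    print(word, closest_match(word, list2))
```

Picking the candidate with the smallest distance is only one way to turn the distance into a similarity measure; for lists with words of very different lengths, normalising by word length may be worth considering.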