Return list of words with match score > x in fuzzywuzzy python
I'm working on a client-side web application and I want to retrieve an answer to a trivia question from the user, but I want to deem it a correct answer even if the spelling is slightly off. I'm wondering if there is a good way of using fuzzywuzzy (when I wrangle the data originally in Python) to return a list of words that would have a matching score greater than .9, for instance. So, if I pass "chicken" and .9 to the function, it returns all words that have a similarity score of over .9 for "chicken" ("chickenn", "chiken", etc.). Any thoughts would be helpful, thank you!
Comments (1)
This sounds like a valid use, given that your webapp is casual and just for fun.

If it is for serious business, I think it is better not to use fuzzywuzzy. The first issue is that you have to come up with a pre-determined threshold for the match percentage, which can lead to some weird behavior from the users' perspective. For example, fuzz.ratio("he", "she") has a similarity score of 80 (and fuzz.partial_ratio scores the pair 100, since "he" is a substring of "she"), yet the two words have pretty different meanings.

From what I see in similar applications, you can either store a list of accepted words and compare user input against it, e.g. ['chicken', 'chickens', 'Chicken', 'Chickens', 'CHICKEN', 'CHICKENS'], or you can do some cleaning of the user input with various Python built-in string methods, such as strip() for trimming whitespace, and upper() and lower() for converting all letters to upper/lowercase. After all, it is the user's fault for not getting marks if they write 'ch!cK3n ~:)'.
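To make the options above concrete, here is a minimal sketch that combines input cleaning, an accepted-word list, and a score threshold. It is only an illustration under some assumptions: the helper names (similarity, normalize, words_above, is_correct) are made up for this example, the threshold is on a 0-100 scale (so the question's .9 becomes 90), and the score comes from the standard library's difflib.SequenceMatcher rather than fuzzywuzzy itself, so the snippet runs without extra installs. If fuzzywuzzy is installed, fuzz.ratio could stand in for similarity, and process.extractBests with its score_cutoff argument does a similar filtering job.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stand-in for fuzz.ratio)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def normalize(text: str) -> str:
    """Trim surrounding whitespace and lowercase the input."""
    return text.strip().lower()


def words_above(target: str, candidates: list[str], threshold: int) -> list[str]:
    """Return the candidates that score at least `threshold` against `target`."""
    return [w for w in candidates if similarity(target, w) >= threshold]


def is_correct(user_input: str, accepted: set[str], threshold: int = 90) -> bool:
    """Accept an exact match after cleaning, else fall back to a fuzzy match."""
    cleaned = normalize(user_input)
    if cleaned in accepted:
        return True
    return any(similarity(cleaned, word) >= threshold for word in accepted)


if __name__ == "__main__":
    # Hypothetical vocabulary, only for demonstration.
    vocabulary = ["chickenn", "chiken", "kitchen"]
    print(words_above("chicken", vocabulary, 90))
    print(is_correct("  Chiken ", {"chicken", "chickens"}))
    print(is_correct("ch!cK3n ~:)", {"chicken"}))
```

Storing only lowercase accepted words and normalizing the input avoids listing every capitalization by hand, and the fuzzy fallback only kicks in when the cleaned input is not an exact match.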