Fuzzy string matching using Python
I have a training dataset, for example:
Letter Word
A Apple
B Bat
C Cat
D Dog
E Elephant
and I need to check a dataframe such as:
AD Apple Dog
AE Applet Elephant
DC Dog Cow
EB Elephant Bag
AED Apple Elephant Dog
D Door
ABC All Bat Cat
The instances AD, AE and EB are almost accurate (Apple and Applet are considered close to each other, and similarly Bat and Bag), but DC doesn't match.
Output Required:
Letters  Words               Status
AD       Apple Dog           Accept
AE       Applet Elephant     Accept
DC       Dog Cow             Reject
EB       Elephant Bag        Accept
AED      Apple Elephant Dog  Accept
D        Door                Reject
ABC      All Bat Cat         Accept
ABC is accepted because 2 of its 3 words match.
Accepted words need to match at 70% or more (fuzzy match), although the threshold is subject to change.
How can I find these matches using Python?
Comments (1)
You can use thefuzz to solve your problem.
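For illustration, here is a minimal sketch of one way this could be done. It is not the code from the answer above. To stay dependency-free it uses difflib.SequenceMatcher from the standard library as a stand-in for thefuzz's fuzz.ratio, and plain tuples instead of a pandas DataFrame. The names REFERENCE, similarity and status are hypothetical. The question does not say how the per-word scores should be combined, so the sketch assumes a row is accepted when the mean similarity across its words reaches the threshold.

```python
from difflib import SequenceMatcher

# Reference ("training") data from the question: letter -> expected word.
REFERENCE = {"A": "Apple", "B": "Bat", "C": "Cat", "D": "Dog", "E": "Elephant"}


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0 and 100."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def status(letters: str, words: str, threshold: float = 70) -> str:
    """Accept a row when its mean per-word similarity reaches the threshold.

    Assumption: averaging the scores is only one possible acceptance rule;
    the question does not specify how scores are combined.
    """
    candidates = words.split()
    # Assumption: every letter must be paired with exactly one word.
    if not letters or len(candidates) != len(letters):
        return "Reject"
    scores = [
        # Letters missing from the reference data score 0.
        similarity(word, REFERENCE[letter]) if letter in REFERENCE else 0.0
        for letter, word in zip(letters, candidates)
    ]
    return "Accept" if sum(scores) / len(scores) >= threshold else "Reject"


rows = [
    ("AD", "Apple Dog"),
    ("AE", "Applet Elephant"),
    ("DC", "Dog Cow"),
    ("EB", "Elephant Bag"),
    ("AED", "Apple Elephant Dog"),
    ("D", "Door"),
    ("ABC", "All Bat Cat"),
]

for letters, words in rows:
    print(letters, words, status(letters, words))
```

With the default threshold of 70, this rule gives the same Accept/Reject result as the Status column in the question for all seven sample rows. A strict per-word cutoff would behave differently: Bag against Bat scores about 67 here, so EB only passes because the scores are averaged. To change the threshold, pass it explicitly, for example status("EB", "Elephant Bag", threshold=85).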