Rating the quality of string matches
What would be the best way to compare a pattern with a set of strings, one by one, while rating how closely the pattern matches each string? In my limited experience with regex, matching strings against patterns seems to be a pretty binary operation: no matter how complicated the pattern is, in the end it either matches or it doesn't. I am looking for something more capable than plain matching. Is there a good technique or algorithm for this?
Here's an example:
Let's say I have a pattern foo bar and I want to find the string that most closely matches it out of the following strings:
foo for
foo bax
foo buo
fxx bar
Now, none of these actually match the pattern, but which non-match is the closest to being a match? In this case, foo bax would be the best choice, since it matches 6 out of the 7 characters.
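To make the "6 out of the 7 characters" scoring concrete, here is a minimal Python sketch that counts position-by-position matches. The function name positional_matches is made up for illustration, and the sketch assumes the pattern is a plain literal string compared at the same offsets, which is only one possible reading of "closest":

```python
def positional_matches(pattern: str, candidate: str) -> int:
    """Count positions where both strings hold the same character.

    zip() stops at the shorter string, so extra trailing characters
    are simply ignored in this sketch.
    """
    return sum(p == c for p, c in zip(pattern, candidate))


pattern = "foo bar"
candidates = ["foo for", "foo bax", "foo buo", "fxx bar"]

# Score every candidate, then pick the one with the most matching positions.
scores = {c: positional_matches(pattern, c) for c in candidates}
best = max(candidates, key=scores.get)
```

With these inputs, foo bax scores 6 and the other three candidates score 5 each.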
Apologies if this is a duplicate question; I didn't really know what to search for when I checked whether it had already been asked.
Comments (2)
This one works; I checked it against the Wikipedia example: the distance between "kitten" and "sitting" is 3.
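Assuming the distance meant here is the Levenshtein (edit) distance, which is the metric the kitten/sitting example on Wikipedia illustrates, a minimal Python sketch could look like the following. The function name levenshtein is my own choice, and this is a plain dynamic-programming version rather than a tuned implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep a match)
            ))
        prev = curr
    return prev[-1]


pattern = "foo bar"
candidates = ["foo for", "foo bax", "foo buo", "fxx bar"]

# A smaller distance means a closer match.
closest = min(candidates, key=lambda c: levenshtein(pattern, c))
```

For the strings in the question, foo bax is at distance 1 from foo bar and the other three are at distance 2, so it comes out as the closest.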
That's an interesting question! The first thing that came to mind is that one common way to match regular expressions is to build a DFA. If you had direct access to the DFA built for a given regex (or just built it yourself!), you could run the input through it and measure the distance from the last state you reached to the nearest accept state, using the shortest path as a measure of how close the input was to being accepted. I'm not aware of any libraries that would let you do that easily, though, and even this measure probably wouldn't map exactly onto your intuition in a number of cases.
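As a rough illustration of the idea, here is a Python sketch with a hand-built DFA. Everything in it is an assumption for the sake of the example: the helper names (literal_dfa, last_state, distance_to_accept, score) are hypothetical, the DFA only handles a literal pattern rather than a general regex, and the run simply stops at the first character that has no transition:

```python
from collections import deque


def literal_dfa(pattern):
    """Hypothetical helper: DFA for a literal string.
    State i means the first i characters of the pattern have matched."""
    transitions = {i: {ch: i + 1} for i, ch in enumerate(pattern)}
    transitions[len(pattern)] = {}  # the accept state has no outgoing edges
    return transitions, {len(pattern)}


def last_state(transitions, text, start=0):
    """Run text through the DFA and return the last state reached
    before a character with no transition (or the end of the input)."""
    state = start
    for ch in text:
        nxt = transitions[state].get(ch)
        if nxt is None:
            break
        state = nxt
    return state


def distance_to_accept(transitions, accepting, state):
    """Breadth-first search: fewest transitions from state to any accept state."""
    seen = {state}
    queue = deque([(state, 0)])
    while queue:
        current, dist = queue.popleft()
        if current in accepting:
            return dist
        for nxt in transitions[current].values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no accept state is reachable


transitions, accepting = literal_dfa("foo bar")


def score(text):
    return distance_to_accept(transitions, accepting, last_state(transitions, text))
```

With the strings from the question, score gives 1 for foo bax, 2 for foo buo, 3 for foo for, and 6 for fxx bar. The last one shows where this measure drifts from intuition: fxx bar shares five characters with the pattern, but the run fails at the second character, so it is rated the worst.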