How can I use Jaro-Winkler to find the closest value in a table?
I have an implementation of the Jaro-Winkler algorithm in my database. I did not write this function. It compares two values and returns the probability of a match.
So jaro(string1, string2, matchnoofchars) returns a single score.
Instead of comparing two strings, I want to pass one string together with matchnoofchars and get back a result set of every value whose match probability is higher than 95%.
For example, the current function returns 97.62% for jaro("Philadelphia","Philadelphlaa",9).
I want to tweak this function so that I can find "Philadelphia" for an input of "Philadelphlaa". What changes do I need to make for this to happen?
I am using Oracle 9i.
Do you have a table with a list of words that includes words like "Philadelphia"?
And who wrote that function?
Oracle has the package utl_match for fuzzy text comparison (the link points to the 11g Release 2 documentation, so it may not be available in 9i): http://download.oracle.com/docs/cd/E14072_01/appdev.112/e10577/u_match.htm
Can't you just run
select w1.word
from words w1
where jaro(w1.word,'Philadelphlaa', 9) >= 0.95
?
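This will select 'Philadelphia' if that word is present in the table words. If your function returns a percentage such as 97.62 rather than a fraction, compare against 95 instead of 0.95.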
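Since the question asks for the closest value, the same query can be extended to rank the candidates by score. This is only a sketch: it reuses the words table and word column from the query above, the alias similarity and the limit of 5 rows are made up for illustration, and jaro is assumed to return a fraction between 0 and 1.

```sql
-- Sketch only: "words"/"word" come from the query above; "similarity" and the
-- limit of 5 are illustrative choices. jaro() is the asker's existing function,
-- assumed here to return a value between 0 and 1.
select word, similarity
from (
  select w1.word,
         jaro(w1.word, 'Philadelphlaa', 9) as similarity
  from words w1
  order by similarity desc      -- best matches first
)
where similarity >= 0.95        -- use 95 if jaro() returns a percentage
  and rownum <= 5;              -- keep only the top few matches
```

The ordering is done in the inline view so that rownum is applied to rows that are already sorted, which is the usual way to write a top-N query on older Oracle releases.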
A little dirty, but faster (untested!).
Let's assume that the first three characters are the same and that the lengths are approximately the same.
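The query for this suggestion is not shown in the post. A minimal sketch of the idea, again assuming the words table from the earlier answer and a length tolerance of 2 characters (both are assumptions, not the answerer's actual code), might look like this:

```sql
-- Sketch only, untested: cheap prefix and length checks narrow the candidates
-- before the expensive jaro() call. The tolerance of 2 is an arbitrary choice.
select w1.word
from words w1
where w1.word like substr('Philadelphlaa', 1, 3) || '%'   -- same first three characters
  and length(w1.word) between length('Philadelphlaa') - 2
                          and length('Philadelphlaa') + 2 -- roughly the same length
  and jaro(w1.word, 'Philadelphlaa', 9) >= 0.95;          -- use 95 if jaro() returns a percentage
```

Oracle does not promise to evaluate the predicates in the order they are written, so the speed-up is most likely to appear when there is an index on word that the prefix condition can use. The trade-off is that a typo within the first three characters, or a candidate that differs in case, will be missed.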