How do I find out which set of lists is most similar to the original lists?
I am trying to get numbers out of some ordered lists. For example, one list could be
'A B C D E', and another:
'C B E A D H G F I J K'
I have some trusted data: a set of ordered lists, each associated with a string. I am trying to evaluate which automated method is best at retrieving the same list for a given string, so that the retrieved list matches (or is as similar as possible to) the list for that string in my trusted data.
I don't have a strong background in statistics, so I was hoping you could point me to methods I can use, along with links or resources that would help me understand how to implement them.
Comments (1)
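The solution depends on your notion of similarity. One popular measure is the Levenshtein distance: the minimum number of insertions, deletions, and substitutions of letters required to obtain one string from another. A smaller distance means the two are more similar, and the same idea applies to ordered lists if you treat each list item as a letter.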
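As a rough illustration, here is a minimal Python sketch of the Levenshtein distance applied to lists of items rather than letters. The function names (`levenshtein`, `similarity`, `mean_similarity`) and the normalisation by the longer list's length are my own assumptions for the example, not something from the question, and the candidate data layout (a dict from string to list) is only a guess at how your trusted data might be stored.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into an empty list costs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical lists, 0.0 when
    every position has to be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def mean_similarity(trusted, retrieved):
    """Hypothetical helper: average similarity over all strings in the
    trusted data. Both arguments map a string to an ordered list."""
    scores = [similarity(trusted[key], retrieved.get(key, []))
              for key in trusted]
    return sum(scores) / len(scores) if scores else 0.0


trusted_list = "A B C D E".split()
candidate_list = "C B E A D H G F I J K".split()
print(levenshtein(trusted_list, candidate_list))
print(similarity(trusted_list, candidate_list))
```

To compare several automated methods, you could compute `mean_similarity` for each method's output against the trusted data and prefer the one with the highest average. Whether this is the right measure still depends on what "similar" should mean for your lists; for instance, Levenshtein distance penalises reordered items quite heavily.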