Best match between two strings when the order or number of times a word appears is not important?
What is the best algorithm to match or compute the distance between two strings in C# when the order or number of times a word appears is not important?
Best means:
- Would mostly agree with a human match
- Elegant
- Efficient
- Scalable, so that an input string could be matched to a potentially large collection of other strings
Some notes:
- Because of the order and occurrence independence, the inputs can be thought of as sets of unique words, not strings in the sense of arrays of characters
- Not specifically looking for a database solution, although one would be interesting
- I'm way too old for this to be a homework problem ;)
Comments (2)
This looks like a canonical case to apply standard information retrieval algorithms. Cosine distance is what first comes to mind, but there might be better matches to your particular case. This is a good link to start digging on that route:
http://www.miislita.com/information-retrieval-tutorial/cosine-similarity-tutorial.html
Implementation example:
How do I calculate the cosine similarity of two vectors?
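To make the cosine approach concrete for this question, here is a minimal C# sketch, assuming each input is reduced to a set of unique lowercase words so that every word has weight 1. The class name `WordSetSimilarity`, the method names, and the separator list are illustrative choices of mine, not something taken from the linked tutorial. With binary weights, cosine similarity reduces to the number of shared words divided by the square root of the product of the two set sizes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSetSimilarity
{
    // Assumed tokenization: lowercase, split on whitespace and basic punctuation,
    // and keep each word once (order and repeat counts are discarded).
    public static HashSet<string> ToWordSet(string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' };
        return new HashSet<string>(
            text.ToLowerInvariant().Split(separators, StringSplitOptions.RemoveEmptyEntries));
    }

    // Cosine similarity of two binary word vectors:
    // |A ∩ B| / sqrt(|A| * |B|), which lies in [0, 1].
    public static double Cosine(string a, string b)
    {
        HashSet<string> setA = ToWordSet(a);
        HashSet<string> setB = ToWordSet(b);
        if (setA.Count == 0 || setB.Count == 0) return 0.0;

        int shared = setA.Count(word => setB.Contains(word));
        return shared / Math.Sqrt((double)setA.Count * setB.Count);
    }
}
```

For example, `WordSetSimilarity.Cosine("a b", "b c")` shares one word out of two on each side, giving 1 / sqrt(2 * 2) = 0.5. When matching one input against a large collection, the word sets for the collection can be computed once up front rather than on every comparison.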
Search for a method called "Double Metaphone", which I believe is the best available for word-by-word comparison. It accounts for different languages as well, which is quite amazing.
If you are comparing whole strings, maybe you can use it along with cosine similarity. That should yield excellent results.
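One way to combine the two ideas, sketched in C# under some assumptions: each word is first mapped to a phonetic key, and cosine similarity is then computed over the sets of keys instead of the raw words. The `encodeWord` parameter is a hypothetical hook; you would pass in whichever Double Metaphone implementation you use. The class and method names here are made up for illustration, and the sketch does not implement Double Metaphone itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhoneticWordSetSimilarity
{
    // encodeWord is a hypothetical hook: supply a function that maps a word
    // to its phonetic key (for example, the primary Double Metaphone code).
    public static double Cosine(string a, string b, Func<string, string> encodeWord)
    {
        HashSet<string> keysA = ToKeySet(a, encodeWord);
        HashSet<string> keysB = ToKeySet(b, encodeWord);
        if (keysA.Count == 0 || keysB.Count == 0) return 0.0;

        // Binary-weight cosine: shared keys over the geometric mean of set sizes.
        int shared = keysA.Count(key => keysB.Contains(key));
        return shared / Math.Sqrt((double)keysA.Count * keysB.Count);
    }

    // Split on whitespace, encode each word, and keep each key once.
    private static HashSet<string> ToKeySet(string text, Func<string, string> encodeWord)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };
        string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        return new HashSet<string>(words.Select(encodeWord));
    }
}
```

As a stand-in while no phonetic encoder is wired up, `PhoneticWordSetSimilarity.Cosine("Smith John", "john smith", w => w.ToLowerInvariant())` only normalizes case; swapping in a real encoder would let differently spelled but similar-sounding words count as shared.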