As far as comparing names is concerned, you might want to take a look at the Levenshtein distance algorithm. Given two strings, it calculates the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other, and that distance can be used as a basis for catching duplicates.
I personally have used it in a tool I developed for an application with a rather large database that had a large number of duplicates in it. Using it in conjunction with some other data comparisons relevant to my domain, I was able to point my tool at the application database and quickly find many of the duplicated records. Not going to lie, I thought it was pretty darn cool to see in action.
It's even quick to implement. Here's a C# version:
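public int CalculateDistance(string s, string t) {
    int n = s.Length; // length of s
    int m = t.Length; // length of t

    // Step 1: if either string is empty, the distance is the length of the other
    if (n == 0) return m;
    if (m == 0) return n;

    // d[i, j] is the distance between the first i chars of s and the first j chars of t
    int[,] d = new int[n + 1, m + 1];

    // Step 2: reaching or leaving the empty string costs one edit per character
    for (int i = 0; i <= n; i++) d[i, 0] = i;
    for (int j = 0; j <= m; j++) d[0, j] = j;

    // Step 3: walk each character of s
    for (int i = 1; i <= n; i++) {
        // Step 4: walk each character of t
        for (int j = 1; j <= m; j++) {
            // Step 5: substitution is free when the characters match
            int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
            // Step 6: take the cheapest of deletion, insertion, and substitution
            d[i, j] = System.Math.Min(
                System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                d[i - 1, j - 1] + cost);
        }
    }
    // Step 7: the bottom-right cell is the distance between the full strings
    return d[n, m];
}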
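To turn the raw distance into a duplicate check, one option is to normalize it by the length of the longer string and flag pairs whose similarity is above some threshold. The sketch below is only an illustration of that idea: the `NameMatcher` class, the `Similarity` and `FindLikelyDuplicates` helpers, the trim/lowercase normalization, and the threshold are all assumptions, not what my tool actually did. It uses a two-row variant of the distance computation so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class NameMatcher
{
    // Two-row variant of the Levenshtein computation: only the previous
    // and current rows of the matrix are kept in memory.
    public static int Distance(string s, string t)
    {
        if (s.Length == 0) return t.Length;
        if (t.Length == 0) return s.Length;

        int[] prev = new int[t.Length + 1];
        int[] curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            // The row just filled becomes the "previous" row for the next pass.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.Length];
    }

    // Hypothetical helper: 1.0 means identical after normalization.
    public static double Similarity(string a, string b)
    {
        // Assumed normalization; real data may need more (accents, punctuation).
        string x = a.Trim().ToLowerInvariant();
        string y = b.Trim().ToLowerInvariant();
        int longest = Math.Max(x.Length, y.Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)Distance(x, y) / longest;
    }

    // Hypothetical helper: compares every pair, so it is quadratic in the number of names.
    public static List<Tuple<string, string>> FindLikelyDuplicates(IList<string> names, double threshold)
    {
        var pairs = new List<Tuple<string, string>>();
        for (int i = 0; i < names.Count; i++)
            for (int j = i + 1; j < names.Count; j++)
                if (Similarity(names[i], names[j]) >= threshold)
                    pairs.Add(Tuple.Create(names[i], names[j]));
        return pairs;
    }
}
```

With a threshold of 0.8, for example, "Jon Smith" and "John Smith" would be flagged (one edit over ten characters), while "Alice Jones" would not pair with either. The all-pairs loop is fine for a quick audit but would likely need some form of blocking or pre-grouping on a large table.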
I wrote a similar answer here: Music Recognition and Signal Processing.
In the research community, the problem of finding similarity between two signals (up to environmental distortions such as noise or mild variations in tempo, pitch, or bitrate) is known as audio (or music) fingerprinting. This topic has been studied heavily for at least a decade. This early (and oft-cited) paper by Haitsma and Kalker clearly describes the problem and proposes a simple solution.
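As a rough illustration of the fingerprinting idea: as I recall the Haitsma and Kalker scheme, each short audio frame is reduced to a 32-bit sub-fingerprint taken from the signs of energy differences across neighbouring frequency bands and consecutive frames, and two clips are compared by the fraction of bits that differ (the bit error rate). The sketch below assumes the per-frame band energies (33 bands per frame) have already been computed elsewhere. The `FingerprintSketch` class and its method names are made up for illustration, and this is not a reference implementation of the paper.

```csharp
using System;

public static class FingerprintSketch
{
    // energies[n][m]: energy of band m in frame n (assumed precomputed, 33 bands per frame).
    // Returns one 32-bit sub-fingerprint per frame, starting from the second frame.
    public static uint[] SubFingerprints(double[][] energies)
    {
        if (energies.Length < 2) return new uint[0];
        var result = new uint[energies.Length - 1];
        for (int n = 1; n < energies.Length; n++)
        {
            uint bits = 0;
            for (int m = 0; m < 32; m++)
            {
                // Difference between adjacent bands, differenced again over time.
                double diff = (energies[n][m] - energies[n][m + 1])
                            - (energies[n - 1][m] - energies[n - 1][m + 1]);
                if (diff > 0) bits |= 1u << m;
            }
            result[n - 1] = bits;
        }
        return result;
    }

    // Fraction of differing bits between two equally long fingerprint blocks.
    public static double BitErrorRate(uint[] a, uint[] b)
    {
        if (a.Length != b.Length || a.Length == 0)
            throw new ArgumentException("Blocks must be non-empty and of equal length.");
        long differing = 0;
        for (int i = 0; i < a.Length; i++)
        {
            uint x = a[i] ^ b[i];
            while (x != 0) { x &= x - 1; differing++; } // clear the lowest set bit
        }
        return (double)differing / (32.0 * a.Length);
    }
}
```

A low bit error rate between two blocks suggests the same recording. If I remember the paper correctly, it compares blocks of 256 sub-fingerprints against a threshold of about 0.35, but check those numbers against the paper before relying on them.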
The problem of finding musical similarity between two versions of the same song is known as cover song identification. This problem is also studied heavily but is still considered open.
Perhaps the two most popular commercial solutions for content-based musical search are Midomi and Shazam.
I believe this addresses your question. Check Google Scholar for recent solutions to these problems. The ISMIR proceedings are available for free online.