Matching two series of MFCC coefficients

Posted on 2024-11-27 21:55:36

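I extracted two series of MFCC coefficients from two audio files, each about 30 seconds long, that contain the same speech content. The files were recorded at the same location but from different sources. I need to estimate whether the two recordings contain the same conversation or different ones. So far I have tried computing the correlation between the two MFCC series, but the results are not very reasonable. Is there a best practice for this situation?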



却一份温柔 2024-12-04 21:55:36

Since the two vectors are effectively histograms, you might want to try calculating the chi-squared distance between the vectors (a common distance measure for histograms).

d = sum((x - y).^2 ./ (2 * (x + y)));

A good (mex) implementation can be found in this toolbox:

http://www.mathworks.com/matlabcentral/fileexchange/15935-computing-pairwise-distances-and-metrics

Call as follows:

d = slmetric_pw(X, Y, 'chisq');
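For readers without MATLAB, a minimal NumPy sketch of the same chi-squared formula is below. The helper name chi_squared_distance is hypothetical (it is not part of the toolbox above), and bins where both values are zero are skipped to avoid 0/0. Keep in mind that the chi-squared distance assumes non-negative bins, while raw MFCC coefficients are often negative, so this sketch rejects negative input instead of silently returning a meaningless value.

```python
import numpy as np


def chi_squared_distance(x, y, eps=1e-12):
    """Hypothetical helper: sum_i (x_i - y_i)^2 / (2 * (x_i + y_i)).

    Assumes x and y are non-negative histograms of equal length.
    Bins where x_i + y_i == 0 contribute nothing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    if np.any(x < 0) or np.any(y < 0):
        raise ValueError("chi-squared distance expects non-negative bins")
    num = (x - y) ** 2
    denom = 2.0 * (x + y)
    # Skip empty bins (both values zero) to avoid dividing by zero.
    mask = denom > eps
    return float(np.sum(num[mask] / denom[mask]))
```

For example, chi_squared_distance([1, 0], [0, 1]) gives 1.0, and identical histograms give 0.0.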

寄居人 2024-12-04 21:55:36

I ran into the same problem, and the solution was to use the dynamic time warping (DTW) algorithm to match the two MFCC arrays.

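After computing the MFCCs, you should have, for each of the two signals, an array in which each element contains the MFCCs of one frame (an array of arrays). The first step is to compute the "distance" between every element of one array and every element of the other, i.e. the distance between every pair of MFCC sets (you can try the Euclidean distance).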

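This leaves you with a 2D array (let's call it "dist") in which element (i,j) is the distance between the MFCCs of the i-th frame of your first signal and the MFCCs of the j-th frame of your second signal.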
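A minimal sketch of this first step in Python, assuming each signal's MFCCs are stored as a matrix with one row per frame and one column per coefficient. frame_distance_matrix is a hypothetical helper; it uses SciPy's cdist with the Euclidean metric. Python indices are 0-based, so dist[i, j] here corresponds to dist(i+1, j+1) in the notation above.

```python
import numpy as np
from scipy.spatial.distance import cdist


def frame_distance_matrix(mfcc_a, mfcc_b):
    """Hypothetical helper: dist[i, j] is the Euclidean distance between
    frame i of mfcc_a and frame j of mfcc_b.

    mfcc_a: shape (n_frames_a, n_coeffs)
    mfcc_b: shape (n_frames_b, n_coeffs)
    """
    a = np.asarray(mfcc_a, dtype=float)
    b = np.asarray(mfcc_b, dtype=float)
    # Result has shape (n_frames_a, n_frames_b).
    return cdist(a, b, metric="euclidean")
```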

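On this array you can now apply the DTW algorithm (treat any cell outside the array, i.e. with an index of 0, as infinity):

dtw(1,1) = dist(1,1)
dtw(i,j) = min(dtw(i-1, j-1), dtw(i-1, j), dtw(i, j-1)) + dist(i,j)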

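The value representing the "difference" between the two files is dtw(n,m), where n is the number of frames in the first signal and m is the number of frames in the second signal.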
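The recurrence above can be sketched in Python as a straightforward O(n*m) implementation (not an optimized library routine). dtw_distance is a hypothetical name; it assumes the same frames-by-coefficients layout and the Euclidean frame distance suggested in the answer, and it pads the table with an extra row and column of infinity so the first row and column need no special case.

```python
import numpy as np
from scipy.spatial.distance import cdist


def dtw_distance(mfcc_a, mfcc_b):
    """Hypothetical helper: DTW cost between two MFCC sequences
    (frames x coefficients), using Euclidean frame distances.

    Table cell dtw[i, j] holds the answer's dtw(i, j) (1-based);
    row 0 and column 0 are +inf padding, except dtw[0, 0] = 0.
    """
    dist = cdist(np.asarray(mfcc_a, dtype=float), np.asarray(mfcc_b, dtype=float))
    n, m = dist.shape
    dtw = np.full((n + 1, m + 1), np.inf)
    dtw[0, 0] = 0.0  # makes dtw[1, 1] equal dist[0, 0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dtw[i, j] = dist[i - 1, j - 1] + min(
                dtw[i - 1, j - 1],  # advance in both sequences
                dtw[i - 1, j],      # advance in mfcc_a only
                dtw[i, j - 1],      # advance in mfcc_b only
            )
    return float(dtw[n, m])
```

Because the pure-Python double loop visits every cell, it can be slow for long recordings (a 30-second file can easily have thousands of frames). The raw cost also grows with the number of frames, so if you compare it against a threshold you may want to normalize it, for example by dividing by n + m.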

For further reading, this paper may give you an overview of applying DTW to MFCCs, and a demonstration of the DTW algorithm may also be helpful.

ぽ尐不点ル 2024-12-04 21:55:36

I faced the same problem recently. The best way I found is to use the audio library MIRtoolbox, which is very powerful in terms of audio processing.

After adding this library to your MATLAB path, the distance between two MFCCs (for example, ones computed with mirmfcc) can be obtained with a single call; a lower distance means a closer match:

dist = mirgetdata(mirdist(mfcc1, mfcc2));

心头的小情儿 2024-12-04 21:55:36

I know this question has been around for almost 10 years, but I was looking for the same thing just now, and I personally found the suggestions above too complicated.
For anyone else still searching, you can start by simply using SciPy to compute the distance between two matrices of MFCC data:

>>> from scipy.spatial import minkowski_distance
>>> a = [[-2.231413e+01,-5.495589e+01,-2.177988e+01,-1.719458e+01,-1.513321e+01,1.324277e+01,-9.265136e-01,1.542478e+01,1.007597e+01,7.356851e-01,1.106412e+01,-9.447377e+00,-1.325694e+00 ],[-2.294377e+01,-5.487790e+01,-2.152807e+01,-1.725173e+01,-1.500316e+01,1.287956e+01,-7.995839e-01,1.540848e+01,1.040512e+01,3.215451e-01,1.113061e+01,-9.390820e+00,-1.065433e+00 ], [-2.251059e+01,-5.475804e+01,-2.188462e+01,-1.709198e+01,-1.516142e+01,1.278525e+01,-7.952995e-01,1.602424e+01,9.981795e+00,4.940354e-01,1.081703e+01,-9.485857e+00,-7.487018e-01 ]]
>>> b = [[-2.231413e+01,-5.495589e+01,-2.177988e+01,-1.719458e+01,-1.513321e+01,1.324277e+01,-9.265136e-01,1.542478e+01,1.007597e+01,7.356851e-01,1.106412e+01,-9.447377e+00,-1.325694e+00 ], [-2.294327e+01,-5.488413e+01,-2.152952e+01,-1.724601e+01,-1.500094e+01,1.287461e+01,-8.023301e-01,1.541246e+01,1.040808e+01,3.185866e-01,1.112774e+01,-9.388848e+00,-1.062943e+00], [-2.250507e+01,-5.481581e+01,-2.189883e+01,-1.704281e+01,-1.514221e+01,1.274256e+01,-8.183736e-01,1.606115e+01,1.000806e+01,4.662135e-01,1.079070e+01,-9.468561e+00,-7.260294e-01 ]]
>>> minkowski_distance(a, b)
array([0.        , 0.01274899, 0.11421053])

https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.minkowski_distance.html

To get the detailed MFCC data I used Yaafe (packaged in a Docker container):
http://yaafe.github.io/Yaafe/manual/install.html

Here is how to work around the installation issue: https://github.com/Yaafe/Yaafe/issues/52
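Note that minkowski_distance compares row i of a only with row i of b, which is why the example returns one value per frame. A minimal sketch for reducing that to a single score is below; mean_frame_distance is a hypothetical helper, and it assumes the two recordings are already time-aligned, truncating the longer one to the shorter length. If the recordings are not aligned, a DTW-based comparison like the one described in the earlier answer is likely more robust.

```python
import numpy as np
from scipy.spatial import minkowski_distance


def mean_frame_distance(mfcc_a, mfcc_b, p=2):
    """Hypothetical helper: average frame-by-frame Minkowski distance
    between two MFCC matrices (frames x coefficients).

    Assumes the recordings are time-aligned; extra frames in the longer
    matrix are dropped (an assumption, not part of the original answer).
    """
    a = np.asarray(mfcc_a, dtype=float)
    b = np.asarray(mfcc_b, dtype=float)
    n = min(len(a), len(b))
    # One distance per frame pair (row i of a vs. row i of b).
    per_frame = minkowski_distance(a[:n], b[:n], p=p)
    return float(per_frame.mean())
```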
