Comparing wav files

Posted 2024-10-05 12:53:35

I have a (mostly) working program that compares two wav files to see if the smaller one occurs inside the bigger one. This is done in Java.

I do this by first making sure both wav files are in canonical wave format. I then get a byte array of data out of each using AudioInputStream, and take the data out in chunks of a fixed size (currently 4096 bytes). I take the first chunk of the smaller input and step through chunks of the same size in the larger input.
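
In sketch form, the reading step looks something like this (a simplified sketch rather than my exact code; it assumes canonical 16-bit signed little-endian PCM, and the names are illustrative):

```java
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

// Read a wav file and convert its 16-bit little-endian PCM bytes to doubles.
static double[] readSamples(File wav) throws Exception {
    AudioInputStream in = AudioSystem.getAudioInputStream(wav);
    byte[] bytes = in.readAllBytes();
    double[] samples = new double[bytes.length / 2];
    for (int i = 0; i < samples.length; i++) {
        int lo = bytes[2 * i] & 0xFF;     // low byte, unsigned
        int hi = bytes[2 * i + 1];        // high byte, sign-extended
        samples[i] = ((hi << 8) | lo) / 32768.0; // normalize to [-1, 1)
    }
    return samples;
}
```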

I copy these chunks into double arrays, take their FFTs, and use a correlation function to find a peak in the resulting array of correlation coefficients. I then move to the next chunk of the smaller input and check whether a similar peak appears.
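
The correlate step, roughly, works like this (again a simplified sketch rather than my exact code: a textbook radix-2 FFT plus circular cross-correlation, zero-padded to twice the chunk length so the chunks can slide past each other; the index of the largest output value is the lag at which one chunk best lines up with the other):

```java
import java.util.Arrays;

// Circular cross-correlation via the FFT: corr = IFFT(FFT(a) * conj(FFT(b))).
// Assumes b.length <= a.length; the peak index is the lag where b best matches a.
static double[] crossCorrelate(double[] a, double[] b) {
    int n = Integer.highestOneBit(2 * a.length - 1) << 1; // next power of two >= 2*a.length
    double[] ar = Arrays.copyOf(a, n), ai = new double[n];
    double[] br = Arrays.copyOf(b, n), bi = new double[n];
    fft(ar, ai, false);
    fft(br, bi, false);
    for (int i = 0; i < n; i++) {              // A[i] * conj(B[i])
        double re = ar[i] * br[i] + ai[i] * bi[i];
        double im = ai[i] * br[i] - ar[i] * bi[i];
        ar[i] = re;
        ai[i] = im;
    }
    fft(ar, ai, true);                         // inverse FFT; real part is the correlation
    return ar;
}

// In-place iterative radix-2 Cooley-Tukey FFT (length must be a power of two).
static void fft(double[] re, double[] im, boolean inverse) {
    int n = re.length;
    for (int i = 1, j = 0; i < n; i++) {       // bit-reversal permutation
        int bit = n >> 1;
        for (; (j & bit) != 0; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) {
            double t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
    for (int len = 2; len <= n; len <<= 1) {   // butterfly passes
        double ang = (inverse ? 2 : -2) * Math.PI / len;
        double wr = Math.cos(ang), wi = Math.sin(ang);
        for (int i = 0; i < n; i += len) {
            double cr = 1, ci = 0;
            for (int k = 0; k < len / 2; k++) {
                int u = i + k, v = i + k + len / 2;
                double tr = re[v] * cr - im[v] * ci;
                double ti = re[v] * ci + im[v] * cr;
                re[v] = re[u] - tr; im[v] = im[u] - ti;
                re[u] += tr;        im[u] += ti;
                double t = cr * wr - ci * wi;  // advance the twiddle factor
                ci = cr * wi + ci * wr;
                cr = t;
            }
        }
    }
    if (inverse) {
        for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
    }
}
```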

This works: the peaks are obvious when the files are the same, and most of the time the results are correct. I do not get false positives. I do, however, get false negatives.

This is because I'm not sure how to "align" the data. The smaller file could come from any point in the larger file. Most of the time this is caught by the chunking method I use, but sometimes it is as if the files were different and no peak is found, even though they should return a high correlation.

If I take one of the files that produces a false negative (no peak), tweak it a bit by snipping a few thousand bytes off the end or the beginning, and run the program again, it suddenly finds the peak and it is a very clear match. So the approach does work; it is just somehow not finding the peak where the correlation should be obvious. My correlation function shifts the FFTs so that they match, so I would have thought that would cover everything, but clearly I am not covering all of the data.

I'm not sure how to "align" the chunk of the smaller file to wherever it occurs in the larger file so that the correlation function picks up on where the correlation occurs. Everything else works; I just need to eliminate the false negatives. Any advice?

Comments (4)

£冰雨忧蓝° 2024-10-12 12:53:35

Use a convolution filter to compare two waveforms. It will tell you if and where a match occurs. Fast algorithms to compute convolutions are available.
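
In its direct form that is just a sliding dot product; convolving with the time-reversed template computes the same scores, and FFT-based convolution does it much faster. A rough sketch (the energy normalization and the names are illustrative, not a specific library API):

```java
// Slide the template across the signal; the offset with the highest
// energy-normalized dot product is where the match occurs.
// Direct O(n*m) form; FFT-based convolution gives the same result faster.
static int bestOffset(double[] signal, double[] template) {
    int best = -1;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int off = 0; off + template.length <= signal.length; off++) {
        double dot = 0, energy = 0;
        for (int k = 0; k < template.length; k++) {
            dot += signal[off + k] * template[k];
            energy += signal[off + k] * signal[off + k];
        }
        double score = dot / (Math.sqrt(energy) + 1e-12); // guard against silence
        if (score > bestScore) { bestScore = score; best = off; }
    }
    return best; // compare bestScore against a threshold to decide whether it's a match
}
```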

情话墙 2024-10-12 12:53:35

This is called a matched filter. Your implementation is suffering because of the chunking. Traditionally, you take the input as a continuous stream, extract a chunk starting at each sample, and do the correlation. So if your input is 10k samples long, you end up running the filter 10k times, each time taking 4k samples into the filter (in your example). However, this is slow. There are a couple of ways to speed things up:

  1. Use small chunks, like 256 points, to make the FFT computations faster. Your correlations probably won't look quite as nice, leading to more false positives, but maybe you can make a list of possible matches and go back and look with bigger chunks.

  2. Rather than taking buffers starting from every sample in the input, take 4k buffers starting at every 512th sample, say, and do the correlations (similar to Marcelo Cantos's suggestion in his comment). Then look for peaks within the 512 samples around the middle, since the time shift will cause the spike to shift. Also, the extra non-correlated samples at the edges will cause the peak to not be full-valued, so you'll need to relax that constraint if you have it. Again, this might lead to more false positives, so you again have to resort to a list approach.

On the implementation detail side of things, I assume you already pre-compute the chunks from the smaller file? Also, you don't say whether you check the correlation in the time or frequency domains. You could look for flat magnitude in the frequency domain, which would equate to a spike in the time domain, to save yourself the inverse FFT. You'll have to do some experiments to determine how flat the spectrum is, but this might cut the time down quite a bit.
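
As a rough sketch of option 2 (the window size, hop, and threshold are illustrative, and it reuses a `crossCorrelate` helper like the one sketched in the question):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

static final int WINDOW = 4096; // samples per buffer
static final int HOP = 512;     // take a buffer at every 512th sample

// Correlate the template against a window taken every HOP samples. The true
// alignment falls within HOP samples of some window start, so scan the first
// HOP lags of each correlation instead of expecting the peak at lag 0, and
// keep the threshold forgiving since partial overlap shrinks the peak.
static List<Integer> candidateOffsets(double[] big, double[] template, double threshold) {
    List<Integer> hits = new ArrayList<>();
    for (int start = 0; start + WINDOW <= big.length; start += HOP) {
        double[] window = Arrays.copyOfRange(big, start, start + WINDOW);
        double[] corr = crossCorrelate(window, template); // assumes template fits the window
        for (int lag = 0; lag < HOP; lag++) {
            if (corr[lag] > threshold) {
                hits.add(start + lag); // candidate: re-check later with bigger chunks
                break;
            }
        }
    }
    return hits;
}
```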

心房的律动 2024-10-12 12:53:35

I'm not sure I completely grasp the algorithm you are using, but here's a thought: If you can get waves to be recognized by manually snipping away bits at the beginning and end, isn't that a possible solution for your algorithm also?

淡淡離愁欲言轉身 2024-10-12 12:53:35

You can have a look at this paper. It explains the algorithm used by the Shazam service, which identifies music from a sample of a few seconds.
Another method, described here, uses self-organizing maps to cluster similar music. Not exactly what you want to do, but it may give you ideas.
