Calculating the difference between two time spans (DSP)
This might be a broad question, but I would like to see answers and discuss this with SO users.
So far my understanding is that an audio file (WAV) has a sample rate, usually 44100 or 48000 Hz (those are the two I've seen most). From that we can determine that a single second of the file (say 00:00:01) contains exactly 44100 integer values, which means we effectively have an Int[]. So if the file's duration is 5 seconds, it has 5 * 44100 integers (samples).
So my question is: how can we calculate the difference (or similarity) of content between two time spans, for example Audio1.wav and Audio2.wav at 00:00:01, with the same sample rate?
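To make the arithmetic concrete, here is a minimal Python sketch that maps a time span to sample indices and computes one naive difference measure (root-mean-square difference) between two equally long spans. The function names `span_to_indices` and `rms_difference` are made up for illustration, and the RMS measure is only a simple baseline that assumes mono samples already aligned in time; it is not something proposed in this thread.

```python
import math


def span_to_indices(start_s, end_s, sample_rate):
    """Convert a time span in seconds to a [start, end) range of sample indices."""
    return int(round(start_s * sample_rate)), int(round(end_s * sample_rate))


def rms_difference(a, b):
    """Root-mean-square difference between two sample sequences.

    Only the overlapping part is compared. 0.0 means the spans are identical;
    larger values mean they differ more. This is very sensitive to any time
    offset between the two recordings.
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both spans must contain at least one sample")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)


# The second that starts at 00:00:01, assuming a 44100 Hz sample rate.
start, end = span_to_indices(1.0, 2.0, 44100)
print(start, end, end - start)
```

With the samples of each file in a list, the comparison would then be `rms_difference(samples1[start:end], samples2[start:end])`.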
Comments (2)
There are a couple of assumptions in your reasoning:
1. The file contains raw uncompressed (PCM-encoded) data.
2. There is only one channel (mono).
It's better to start by reading some format descriptions and sample implementations, then search for audio comparison algorithms (1, 2, 3).
Linked Q: Compare two spectogram to find the offset where they match algorithm
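Both assumptions can be checked before comparing anything. The sketch below uses Python's standard `wave` module, which reads PCM WAV files and raises `wave.Error` for formats it does not understand. The helper names `read_pcm16` and `to_mono` are hypothetical, and the code assumes 16-bit little-endian samples; other sample widths would need different unpacking.

```python
import struct
import wave


def read_pcm16(source):
    """Read a 16-bit PCM WAV file (path or file object).

    Returns (sample_rate, channels), where channels is a list holding one
    list of integer samples per channel.
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        n_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    # WAV stores samples little-endian, interleaved frame by frame (L, R, L, R, ...).
    interleaved = struct.unpack("<%dh" % count, raw)
    channels = [list(interleaved[c::n_channels]) for c in range(n_channels)]
    return sample_rate, channels


def to_mono(channels):
    """Average all channels into a single list of samples."""
    return [sum(frame) / len(channels) for frame in zip(*channels)]
```

Something like `rate, channels = read_pcm16("Audio1.wav")` followed by `to_mono(channels)` gives one sample list per file, so that a stereo file is not mistaken for a mono file with twice as many samples.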
One way to do this would be to resample the 44100 Hz signal to 48000 Hz so that both signals have the same sample rate, and then perform a cross-correlation. The shape of the cross-correlation can serve as a measure of similarity: you could look at the height of the peak, or at the ratio of the energy in the peak to the total energy.
Note, however, that when the signal repeats itself, you will get multiple cross-correlation peaks.
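A rough pure-Python sketch of that idea follows. The function names are made up for illustration, and two simplifications are assumed: resampling uses plain linear interpolation as a stand-in for a proper band-limited resampler, and the cross-correlation is computed directly in the time domain, which is only practical for short spans. In practice a numerical library would normally be used for both steps.

```python
import math


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (crude; no anti-aliasing filter)."""
    if len(samples) < 2 or src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def cross_correlation(a, b, max_lag):
    """Normalized cross-correlation for lags in [-max_lag, max_lag].

    A positive lag means b is a delayed copy of a (a[i] lines up with
    b[i + lag]). Values lie in [-1, 1]; 1.0 means a perfect match.
    """
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot correlate an all-zero signal")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                total += x * b[j]
        result[lag] = total / norm
    return result


def best_lag(a, b, max_lag):
    """Return (lag, peak height) of the highest cross-correlation peak."""
    corr = cross_correlation(a, b, max_lag)
    lag = max(corr, key=corr.get)
    return lag, corr[lag]


# Toy example: b is a delayed by two samples.
a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 2, 1, 0]
print(best_lag(a, b, max_lag=4))
```

The peak height returned by `best_lag` is the similarity measure described above. For a repeating signal, several lags in the `cross_correlation` result would show comparable peaks, so the reported lag alone should not be trusted in that case.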