Audio fingerprinting and normalization

Posted on 2024-12-28 07:31:03

I've written an application that does audio fingerprinting using the method described here. It basically converts an mp3 to a wav and then creates a bunch of hash codes in a database. I then make a recording with my iPhone, which has some noise in it, compare the hash codes, and get matches as documented in the link. Wow, it's cool!!

I'm now recording radio samples using a USB radio receiver. I get the sound data in a byte[] array, run it through exactly the same process I use to store the hash codes, and then try to match it. This time it doesn't work.

My feeling is that the mp3 has been normalized (had compression applied to it) and this might be the difference. I can't think of any other differences, as both the mp3 and the radio sample are converted to wav format (16-bit).

I guess my question is twofold:

  1. If I compress the radio sample, do you think it'll work?

  2. To do this I need to apply a compression function, which means I need to make the soft sounds louder and the loud sounds softer.

I've started writing a function that takes a byte array (the wav data in 16-bit format) and cycles through it, adjusting the sample values to do the compression, but I'm struggling with this:

    List<short> ints = new List<short>();
    // Stop one byte early so an odd trailing byte is never read as half a sample.
    // (Use .Length instead of .Count if byteArray is a byte[] rather than a List<byte>.)
    for (int j = 0; j + 1 < byteArray.Count; j += 2)
    {
        // 16-bit PCM in a wav file is little-endian: low byte first, then high byte.
        // Combining the bytes directly avoids allocating a 2-byte buffer per sample.
        short sample16 = (short)(byteArray[j] | (byteArray[j + 1] << 8));

        // At this point, change the sample according to the compression needed.
        ints.Add(sample16);

        // Back again to test it.
        byte[] buffer11 = BitConverter.GetBytes(sample16);
    }
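
For what it's worth, here is a minimal sketch of what "make the soft sounds louder and the loud sounds softer" could look like at the sample level. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the `threshold`, `ratio`, and `makeupGain` defaults are arbitrary. It applies a fixed curve to each sample independently, which is a simplification; real compressors usually track a smoothed level over time, so treat this as a starting point rather than a proper compressor.

```csharp
using System;

public static class CompressorSketch
{
    // Hypothetical helper: decode 16-bit little-endian PCM bytes into samples.
    // A trailing odd byte, if any, is ignored.
    public static short[] ToSamples(byte[] pcm)
    {
        var samples = new short[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
            samples[i] = (short)(pcm[2 * i] | (pcm[2 * i + 1] << 8));
        return samples;
    }

    // Per-sample compression curve (parameter values are illustrative assumptions).
    // The part of the magnitude above `threshold` is divided by `ratio`,
    // then `makeupGain` lifts the whole signal, so quiet samples end up louder
    // and loud samples end up quieter relative to full scale.
    public static short Compress(short sample, double threshold = 0.25,
                                 double ratio = 4.0, double makeupGain = 2.0)
    {
        double x = sample / 32768.0;              // scale to [-1, 1)
        double mag = Math.Abs(x);
        if (mag > threshold)
            mag = threshold + (mag - threshold) / ratio;

        double y = Math.Sign(x) * mag * makeupGain;
        y = Math.Max(-1.0, Math.Min(1.0, y));     // clamp to avoid overflow
        return (short)Math.Round(y * 32767.0);
    }
}
```

With these defaults a sample at about 10% of full scale comes out at about 20%, while a full-scale sample comes out at about 87.5%.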

Comments (2)

錯遇了你 2025-01-04 07:31:03

As sblom already stated in his comments, frequency-domain hashing is not affected by dynamic range. From the information you've given, I would suspect that some frequencies are present in one of your inputs and missing from the other. Note that MP3 uses a psychoacoustic model based on human perception, which deliberately discards or masks some frequencies. So your radio source may include, or lack, frequencies that matter for recognizing your input correctly.
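
A small sketch may make the point concrete. It assumes the fingerprint is built from the positions of the strongest spectral peaks, which the thread does not actually confirm, and it uses a naive DFT on a synthetic two-tone signal; the class name, the test signal, and all parameters are hypothetical. It only demonstrates an overall level change, which is a simplification of what a real compressor does.

```csharp
using System;

public static class SpectrumSketch
{
    // Naive O(n^2) DFT magnitudes for bins 0..n/2. Fine for a short demo,
    // far too slow for real audio (use an FFT library there).
    public static double[] Magnitudes(double[] x)
    {
        int n = x.Length;
        var mags = new double[n / 2 + 1];
        for (int k = 0; k < mags.Length; k++)
        {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.Cos(angle);
                im -= x[t] * Math.Sin(angle);
            }
            mags[k] = Math.Sqrt(re * re + im * im);
        }
        return mags;
    }

    // Index of the strongest frequency bin.
    public static int PeakBin(double[] x)
    {
        double[] mags = Magnitudes(x);
        int best = 0;
        for (int k = 1; k < mags.Length; k++)
            if (mags[k] > mags[best]) best = k;
        return best;
    }

    // Synthetic test signal (an assumption for illustration): a strong tone
    // at bin 5 plus a weaker tone at bin 12, scaled by `gain`.
    // Set includeLowTone to false to mimic a source that lacks the bin-5 component.
    public static double[] Tone(int n, double gain, bool includeLowTone)
    {
        var x = new double[n];
        for (int t = 0; t < n; t++)
        {
            double low = includeLowTone ? Math.Sin(2 * Math.PI * 5 * t / n) : 0.0;
            double high = 0.5 * Math.Sin(2 * Math.PI * 12 * t / n);
            x[t] = gain * (low + high);
        }
        return x;
    }
}
```

`PeakBin(Tone(64, 1.0, true))` and `PeakBin(Tone(64, 0.1, true))` pick the same bin, because changing the level scales every bin equally. `PeakBin(Tone(64, 1.0, false))` picks a different bin, because a missing frequency component changes which peak is strongest.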

吹泡泡o 2025-01-04 07:31:03

There's a ton of important background in order to get this right. What you're specifically trying to do is called Dynamic Range Compression.

I think what you'll want to do is measure the average amplitude over a segment of the samples (probably using root mean square), and then divide all the samples in that segment by that RMS amplitude. This gives every segment the same RMS amplitude across the entire song.

You'll have to experiment with what the right length is for each segment. Probably, if it's 10-40 ms, it'll be short enough that volume changes won't sound too jarring and long enough that you'll get a good RMS measurement.
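
A rough sketch of that per-segment idea, under a few assumptions that are not in the answer: the names are hypothetical, the samples are already decoded to `short`, and each segment is scaled to a `targetRms` value (dividing 16-bit samples by their RMS alone would shrink them to around ±1). Near-silent segments are left untouched so that noise is not amplified to full level.

```csharp
using System;

public static class RmsNormalizeSketch
{
    // Root mean square of samples[start .. start + count - 1].
    public static double Rms(short[] samples, int start, int count)
    {
        double sumSquares = 0;
        for (int i = start; i < start + count; i++)
        {
            double v = samples[i];
            sumSquares += v * v;
        }
        return Math.Sqrt(sumSquares / count);
    }

    // Scale each fixed-length segment so its RMS equals targetRms.
    // segmentLength is in samples; at 44100 Hz, 20 ms would be 882 samples.
    // targetRms and the silence cutoff of 1.0 are arbitrary illustrative values.
    public static short[] NormalizeSegments(short[] samples, int segmentLength,
                                            double targetRms = 3000.0)
    {
        var output = new short[samples.Length];
        for (int start = 0; start < samples.Length; start += segmentLength)
        {
            int count = Math.Min(segmentLength, samples.Length - start);
            double rms = Rms(samples, start, count);
            double gain = rms > 1.0 ? targetRms / rms : 1.0; // skip near-silence

            for (int i = start; i < start + count; i++)
            {
                double scaled = samples[i] * gain;
                // Clamp so loud peaks cannot overflow the 16-bit range.
                scaled = Math.Max(short.MinValue, Math.Min(short.MaxValue, scaled));
                output[i] = (short)Math.Round(scaled);
            }
        }
        return output;
    }
}
```

Because the gain jumps at every segment boundary, the output can have audible steps; smoothing the gain between segments would be a natural next experiment.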
