Analyzing a "whistle" sound for pitch/notes
I am trying to build a system that can process a recording of someone whistling and output the notes.
Can anyone recommend an open-source platform that I can use as the basis for note/pitch recognition and analysis of wave files?
Thanks in advance.
Comments (7)
As many others have already said, FFT is the way to go here. I've written a little example in Java using FFT code from http://www.cs.princeton.edu/introcs/97data/. To run it, you will also need the Complex class from that page (see the source for the exact URL).
The code reads in a file, goes over it window by window, and does an FFT on each window. For each FFT it looks for the maximum coefficient and outputs the corresponding frequency. This works very well for clean signals like a sine wave, but for an actual whistle sound you probably have to add more. I've tested it with a few whistling files I recorded myself (using the integrated mic of my laptop). The code does get the idea of what's going on, but more needs to be done to get actual notes.
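As a rough illustration of the per-window peak search described above, here is a minimal sketch. It is not the code from this answer: to stay self-contained it uses a naive O(N²) DFT as a stand-in for the Princeton FFT, and the class name `PeakFrequency`, its method names, and the synthetic sine input are all assumptions made for the example.

```java
import java.util.Arrays;

/** Sketch: dominant frequency per window, using a naive DFT in place of an FFT. */
class PeakFrequency {

    /** Returns the frequency in Hz of the strongest DFT bin in one window. */
    static double dominantFrequency(double[] window, double sampleRate) {
        int n = window.length;
        int bestBin = 1;
        double bestPower = -1.0;
        // For real input only bins 1..n/2 are independent; bin 0 (DC) is skipped.
        for (int k = 1; k <= n / 2; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += window[t] * Math.cos(angle);
                im -= window[t] * Math.sin(angle);
            }
            double power = re * re + im * im;
            if (power > bestPower) {
                bestPower = power;
                bestBin = k;
            }
        }
        // Bin k corresponds to k * sampleRate / n Hz.
        return bestBin * sampleRate / n;
    }

    /** Cuts the signal into consecutive, non-overlapping windows and analyses each one. */
    static double[] track(double[] samples, int windowSize, double sampleRate) {
        int count = samples.length / windowSize;
        double[] result = new double[count];
        for (int w = 0; w < count; w++) {
            double[] window = Arrays.copyOfRange(samples, w * windowSize, (w + 1) * windowSize);
            result[w] = dominantFrequency(window, sampleRate);
        }
        return result;
    }

    /** Hypothetical test input: a pure sine of the given frequency. */
    static double[] sine(double freq, double sampleRate, int length) {
        double[] s = new double[length];
        for (int i = 0; i < length; i++) {
            s[i] = Math.sin(2.0 * Math.PI * freq * i / sampleRate);
        }
        return s;
    }

    public static void main(String[] args) {
        double rate = 8000.0;
        double[] tone = sine(1000.0, rate, 2048);
        for (double f : track(tone, 512, rate)) {
            System.out.println(f + " Hz");
        }
    }
}
```

With an 8000 Hz sample rate and 512-sample windows each bin is 15.625 Hz wide, so the 1000 Hz test tone falls exactly on bin 64 and every window reports 1000.0 Hz. A tone between bins is reported as the nearest bin centre, which is one reason a real whistle needs more work.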
1) You might need a more intelligent windowing technique. My code currently uses a simple rectangular window. Since the FFT assumes that the input signal can be continued periodically, additional frequencies are detected when the first and the last sample in the window don't match. This is known as spectral leakage ( http://en.wikipedia.org/wiki/Spectral_leakage ). The usual remedy is a window that down-weights the samples at the beginning and the end of the window ( http://en.wikipedia.org/wiki/Window_function ). Although the leakage shouldn't cause the wrong frequency to be detected as the maximum, using a window will increase the detection quality.
2) To match the frequencies to actual notes, you could use an array containing the note frequencies (like 440 Hz for a') and then look for the entry that's closest to the frequency that has been identified. However, if the whistling is off standard tuning, this won't work any more. If the whistling is still correct but only tuned differently (like a guitar or other musical instrument that can be tuned differently and still sound "good", as long as the tuning is consistent across all strings), you could still find notes by looking at the ratios of the identified frequencies; in equal temperament, one semitone corresponds to a frequency ratio of 2^(1/12). You can read http://en.wikipedia.org/wiki/Pitch_%28music%29 as a starting point on that. This is also interesting: http://en.wikipedia.org/wiki/Piano_key_frequencies
3) Moreover, it might be interesting to detect the points in time when each individual tone starts and stops. This could be added as a pre-processing step. You could then do an FFT for each individual note. However, if the whistler doesn't stop but just bends between notes, this would not be that easy.
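Points 1 and 3 are both pre-processing steps, so they can be sketched together. The following is only an illustration under stated assumptions: the Hann window is one common choice among many window functions, the note segmentation uses a simple RMS energy threshold (a swapped-in heuristic, not something this answer's code does), and the class name `Preprocess`, the frame size, and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: Hann windowing (point 1) and energy-based note segmentation (point 3). */
class Preprocess {

    /** Returns a copy of the frame multiplied by a Hann window, which tapers both ends to zero. */
    static double[] hann(double[] frame) {
        int n = frame.length;
        if (n < 2) {
            return frame.clone();
        }
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            out[i] = frame[i] * w;
        }
        return out;
    }

    /** Root-mean-square level of samples[from, to). */
    static double rms(double[] samples, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += samples[i] * samples[i];
        }
        return Math.sqrt(sum / (to - from));
    }

    /**
     * Returns {start, end} sample indices (end exclusive) of runs of frames
     * whose RMS level exceeds the threshold. The threshold is an assumed,
     * recording-dependent value that would need tuning.
     */
    static List<int[]> segments(double[] samples, int frameSize, double threshold) {
        List<int[]> result = new ArrayList<>();
        int frames = samples.length / frameSize;
        int start = -1;
        for (int f = 0; f < frames; f++) {
            boolean loud = rms(samples, f * frameSize, (f + 1) * frameSize) > threshold;
            if (loud && start < 0) {
                start = f * frameSize;           // a tone begins
            } else if (!loud && start >= 0) {
                result.add(new int[] {start, f * frameSize});  // the tone ended
                start = -1;
            }
        }
        if (start >= 0) {
            result.add(new int[] {start, frames * frameSize});
        }
        return result;
    }

    public static void main(String[] args) {
        // Silence, then a 440 Hz tone, then silence again (8000 Hz sample rate).
        double[] signal = new double[4000];
        for (int i = 1000; i < 3000; i++) {
            signal[i] = 0.5 * Math.sin(2.0 * Math.PI * 440.0 * i / 8000.0);
        }
        for (int[] seg : segments(signal, 500, 0.05)) {
            System.out.println("tone from sample " + seg[0] + " to " + seg[1]);
        }
    }
}
```

A window from `hann` would be applied to each frame before the transform. The energy threshold only separates notes that have a pause between them; as noted in point 3, a whistler who glides from one note to the next would show up as a single segment.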
Definitely have a look at the libraries the others suggested. I don't know any of them, but they may already contain functionality for what I've described above.
And now to the code. Please let me know what worked for you; I find this topic pretty interesting.
Edit: I updated the code to include overlapping and a simple mapper from frequencies to notes. It works only for "tuned" whistlers though, as mentioned above.
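The updated code itself is not reproduced here, so the following is only a sketch of what overlapping windows and a frequency-to-note mapper could look like. It assumes twelve-tone equal temperament with A4 = 440 Hz and uses the standard MIDI note numbering (A4 = 69); the class name `NoteMapper` and its methods are made up for the example.

```java
/** Sketch: overlapping window positions and a nearest-note mapper (A4 = 440 Hz assumed). */
class NoteMapper {

    private static final String[] NAMES =
        {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};

    /** Nearest MIDI note number for a frequency, with A4 = 440 Hz = note 69. */
    static int midiNote(double frequency) {
        if (frequency <= 0.0) {
            throw new IllegalArgumentException("frequency must be positive");
        }
        return (int) Math.round(69.0 + semitones(440.0, frequency));
    }

    /** Note name with octave, for example "A4" for 440 Hz. */
    static String noteName(double frequency) {
        int midi = midiNote(frequency);
        return NAMES[Math.floorMod(midi, 12)] + (Math.floorDiv(midi, 12) - 1);
    }

    /**
     * Interval from one frequency to another in semitones. It depends only on
     * the ratio, so it still works when the whistler is consistently off standard tuning.
     */
    static double semitones(double fromHz, double toHz) {
        return 12.0 * Math.log(toHz / fromHz) / Math.log(2.0);
    }

    /** Start indices of windows of the given size, advancing by hop samples (hop < size gives overlap). */
    static int[] windowStarts(int length, int windowSize, int hop) {
        if (length < windowSize) {
            return new int[0];
        }
        int count = (length - windowSize) / hop + 1;
        int[] starts = new int[count];
        for (int i = 0; i < count; i++) {
            starts[i] = i * hop;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(noteName(440.0));            // A4
        System.out.println(noteName(261.63));           // C4
        System.out.println(semitones(440.0, 880.0));    // one octave, about 12 semitones
        System.out.println(windowStarts(4096, 1024, 512).length + " half-overlapping windows");
    }
}
```

`noteName` snaps each frequency to the nearest standard note, which is the part that only works for "tuned" whistlers. For everyone else, `semitones` between consecutive detected frequencies gives the melody's intervals independent of the absolute tuning.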
I think this open-source platform would suit you:
http://code.google.com/p/musicg-sound-api/
Well, you could always use FFTW to perform the Fast Fourier Transform. It's a very well-respected library. Once you've got an FFT of your signal, you can analyze the resulting array for peaks. A simple histogram-style analysis should give you the frequencies with the greatest volume. Then you just have to compare those frequencies to the frequencies that correspond to the different pitches.
In addition to the other great options:
You might want to consider Python(x,y). It's a scientific programming framework for Python in the spirit of Matlab, and it has easy functions for working in the FFT domain.
If you use Java, have a look at the TarsosDSP library. It has a pretty good ready-to-go pitch detector.
Here is an example for Android, but I think it doesn't require too many modifications to use it elsewhere.
I'm a fan of the FFT, but for the monophonic and fairly pure sinusoidal tones of whistling, a zero-cross detector would do a far better job of determining the actual frequency at a much lower processing cost. Zero-cross detection is used in electronic frequency counters that measure the clock rate of whatever is being tested.
If you are going to analyze anything other than pure sine wave tones, then FFT is definitely the way to go.
A very simple implementation of zero cross detection in Java on GitHub
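For readers without access to that GitHub code, here is an independent minimal sketch of the idea (it is not the linked implementation). It counts upward zero crossings and, as an added refinement that is an assumption of this sketch, linearly interpolates the crossing position between samples. The class and method names are hypothetical.

```java
/** Sketch: frequency estimate from upward zero crossings of a clean, single-tone signal. */
class ZeroCrossing {

    /**
     * Estimates the frequency in Hz, or returns 0.0 when fewer than two
     * upward crossings are found (for example, silence).
     */
    static double estimateFrequency(double[] samples, double sampleRate) {
        double first = 0.0;   // position of the first upward crossing, in samples
        double last = 0.0;    // position of the most recent upward crossing
        int count = 0;
        for (int i = 1; i < samples.length; i++) {
            double prev = samples[i - 1];
            double cur = samples[i];
            if (prev < 0.0 && cur >= 0.0) {
                // Linear interpolation of where the signal crosses zero between i-1 and i.
                double position = (i - 1) + (-prev) / (cur - prev);
                if (count == 0) {
                    first = position;
                }
                last = position;
                count++;
            }
        }
        if (count < 2) {
            return 0.0;
        }
        // (count - 1) full periods lie between the first and the last crossing.
        return (count - 1) * sampleRate / (last - first);
    }

    public static void main(String[] args) {
        double rate = 44100.0;
        double[] tone = new double[4410];  // 0.1 s of a 440 Hz sine
        for (int i = 0; i < tone.length; i++) {
            tone[i] = Math.sin(2.0 * Math.PI * 440.0 * i / rate);
        }
        System.out.println(estimateFrequency(tone, rate) + " Hz");
    }
}
```

This runs in a single pass with no transform, which is where the low processing cost comes from. It assumes a clean tone: noise or strong overtones near the zero line would add spurious crossings, so a real recording would likely need filtering or some hysteresis first.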