Real-time pitch detection
I'm trying to do real-time pitch detection of a user's singing, but neither FFT nor autocorrelation gives good results, and I can't find any C/C++ methods for it.
The microphone input data is correct, and with a sine wave the result is more or less the correct pitch. I'm visualizing the autocorrelation by taking each value out of the results array and plotting its index on the X axis and the value on the Y axis (both divided by 100,000; I'm using OpenGL, and VST plugins aren't an option). It looks like random dots. How can I visualize the raw audio and the autocorrelation data?
Taking a step back... To get this working you MUST figure out a way to plot intermediate steps of this process. What you're trying to do is not particularly hard, but it is error prone and fiddly. Clipping, windowing, bad wiring, aliasing, DC offsets, reading the wrong channels, the weird FFT frequency axis, impedance mismatches, frame size errors... who knows. But if you can plot the raw data, and then plot the FFT, all will become clear.
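As a rough illustration of plotting the intermediate steps, here is a minimal sketch that normalizes the autocorrelation by its zero-lag value (so every point lies in [-1, 1] and no ad hoc scaling such as dividing by 100,000 is needed) and dumps any buffer as "index,value" text for an external plotting tool. The helper names `normalizedAutocorrelation` and `toCsv` are hypothetical, and the input is assumed to be mono float samples.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Direct (O(N * maxLag)) autocorrelation, divided by the zero-lag value so
// that r[0] == 1 and every other lag lies in [-1, 1].
std::vector<float> normalizedAutocorrelation(const std::vector<float>& x,
                                             std::size_t maxLag) {
    if (x.empty()) return {};
    if (maxLag >= x.size()) maxLag = x.size() - 1;
    std::vector<float> r(maxLag + 1, 0.0f);
    for (std::size_t lag = 0; lag <= maxLag; ++lag) {
        double sum = 0.0;
        for (std::size_t i = 0; i + lag < x.size(); ++i) {
            sum += static_cast<double>(x[i]) * x[i + lag];
        }
        r[lag] = static_cast<float>(sum);
    }
    const float r0 = r[0];
    if (r0 > 0.0f) {
        for (float& v : r) v /= r0;
    }
    return r;
}

// Formats a buffer as "index,value" lines so it can be plotted with any
// external tool (gnuplot, a spreadsheet, ...).
std::string toCsv(const std::vector<float>& buffer) {
    std::ostringstream out;
    out << "index,value\n";
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        out << i << ',' << buffer[i] << '\n';
    }
    return out.str();
}
```

Writing `toCsv(frame)` and `toCsv(normalizedAutocorrelation(frame, maxLag))` to two files for the same frame lets you compare the raw data and the autocorrelation side by side. For a periodic input the second plot should show a clear peak near the lag of one period.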
I found several open source implementations of real-time pitch tracking:
dywapitchtrack uses a wavelet-based algorithm.
"Realtime C# Pitch Tracker" uses a modified autocorrelation approach. It has since been removed from Codeplex, so try searching on GitHub.
aubio (mentioned by piem; several algorithms are available).
There are also some pitch trackers out there which might not be designed for real-time use, but may be usable that way for all I know, and could also be useful as a reference to compare your real-time tracker to:
Praat is an open source package sometimes used for pitch extraction by linguists, and you can find the algorithm documented at http://www.fon.hum.uva.nl/paul/praat.html
Snack and WaveSurfer also contain a pitch extractor.
I know this answer isn't going to make everyone happy but here goes.
This stuff is hard, very hard. First, go read as many tutorials as you can find on FFT, autocorrelation and wavelets. Although I'm still struggling with DSP, I did get some insights from the following.
https://www.coursera.org/course/audio the course isn't running at the moment but the videos are still available.
http://miracle.otago.ac.nz/tartini/papers/Philip_McLeod_PhD.pdf thesis about the development of a pitch recognition algorithm.
http://dsp.stackexchange.com a whole site dedicated to digital signal processing.
If, like me, you didn't do enough maths to completely follow the tutorials, don't give up: some of the diagrams and examples still helped me to understand what was going on.
Next is test data and testing. Write yourself a library that generates test files to use in checking your algorithm(s).
1) A super simple pure sine wave generator. Say you are looking at writing YAT (Yet Another Tuner): use your sine generator to create a series of files around 440Hz, say from 420-460Hz in varying increments, and see how sensitive and accurate your code is. Can it resolve to within 5Hz, 1Hz, finer still?
2) Then upgrade your sine wave generator so that it adds a series of weaker harmonics to the signal.
3) Next are real-world variations on harmonics. For most stringed instruments you'll see a series of harmonics as simple multiples of the fundamental frequency F0, but for instruments like clarinets and flutes, because of the way the air behaves in the chamber, the even harmonics will be missing or very weak. And for some instruments F0 is missing but can be determined from the distribution of the other harmonics, F0 being what the human ear perceives as pitch.
4) Throw in some deliberate distortion by shifting the harmonic peak frequencies up and down in an irregular manner.
The point is that if you are creating files with known results, then it's easier to verify that what you are building actually works, bugs aside of course.
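Steps 1 to 3 above could be sketched roughly as follows. This is only a minimal generator under my own assumptions: the function name `makeTestTone` and its parameters are made up for illustration, the output is a mono float buffer rather than a file, and the irregular frequency shifts of step 4 are not covered.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Generates a test tone: a fundamental at f0 plus harmonics at integer
// multiples of f0. harmonicAmps[0] is the relative amplitude of the
// fundamental, harmonicAmps[1] that of the 2nd harmonic, and so on.
// Setting an entry to 0 models a missing harmonic (e.g. the weak even
// harmonics mentioned above, or a missing fundamental).
std::vector<float> makeTestTone(double f0, double sampleRate,
                                std::size_t numSamples,
                                const std::vector<double>& harmonicAmps) {
    const double twoPi = 6.283185307179586;

    // Scale by the sum of amplitudes so the output cannot exceed [-1, 1].
    double total = 0.0;
    for (double a : harmonicAmps) total += std::fabs(a);
    if (total <= 0.0) total = 1.0;

    std::vector<float> out(numSamples, 0.0f);
    for (std::size_t n = 0; n < numSamples; ++n) {
        const double t = n / sampleRate;
        double s = 0.0;
        for (std::size_t h = 0; h < harmonicAmps.size(); ++h) {
            const double freq = f0 * (h + 1);
            if (freq >= sampleRate / 2.0) break;  // would alias: stop here
            s += harmonicAmps[h] * std::sin(twoPi * freq * t);
        }
        out[n] = static_cast<float>(s / total);
    }
    return out;
}
```

For example, `makeTestTone(440.0, 44100.0, 44100, {1.0})` gives one second of a pure 440Hz sine, and `makeTestTone(440.0, 44100.0, 44100, {1.0, 0.0, 0.5, 0.0, 0.25})` gives a tone with only odd harmonics. Looping f0 from 420 to 460 produces the series of known-answer inputs described in step 1.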
There are also a number of "libraries" out there containing sound samples.
https://freesound.org from the Coursera series mentioned above.
http://theremin.music.uiowa.edu/MIS.html
Next, be aware that your microphone is not perfect and, unless you have spent thousands of dollars on it, will have a fairly variable frequency response. In particular, if you are working with low notes, cheaper microphones (read: the inbuilt ones in your PC or phone) have significant rolloff starting at around 80-100Hz. With reasonably good external ones you might get down to 30-40Hz. Go find the data on your microphone.
You can also check what happens by playing the tone through speakers and then recording it with your favourite microphone. But of course now we are talking about two sets of frequency response curves.
When it comes to performance there are a number of freely available libraries out there although do be aware of the various licensing models.
Above all don't give up after your first couple of tries. Best of luck.
Here's the C++ source code for an unusual two-stage algorithm that I devised, which can do real-time pitch detection on polyphonic MP3 files while they are being played on Windows. This free application (PitchScope Player, available on the web) is frequently used to detect the notes of a guitar or saxophone solo in an MP3 recording. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within an MP3 music file. Note onsets are inferred from a significant change in the most dominant pitch during the recording.
When a single key is pressed on a piano, what we hear is not just one frequency of sound vibration, but a composite of multiple sound vibrations occurring at different, mathematically related frequencies. The elements of this composite of vibrations at differing frequencies are referred to as harmonics or partials. For instance, if we press the Middle C key on the piano, the individual frequencies of the composite's harmonics will start at 261.6 Hz as the fundamental frequency, 523 Hz would be the 2nd harmonic, 785 Hz would be the 3rd harmonic, 1046 Hz would be the 4th harmonic, etc. The later harmonics are integer multiples of the fundamental frequency, 261.6 Hz (e.g. 2 x 261.6 ≈ 523, 3 x 261.6 ≈ 785, 4 x 261.6 ≈ 1046). Linked at the bottom is a snapshot of the actual harmonics which occur during a polyphonic MP3 recording of a guitar solo.
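The arithmetic above can be written down in a few lines. This is just a worked illustration of the integer-multiple relationship; the function names `harmonicSeries` and `nearestHarmonic` are made up for the example and are not taken from the PitchScope source.

```cpp
#include <cmath>
#include <vector>

// Frequencies of the first `count` harmonics of f0: f0, 2*f0, 3*f0, ...
std::vector<double> harmonicSeries(double f0, int count) {
    std::vector<double> freqs;
    for (int k = 1; k <= count; ++k) {
        freqs.push_back(k * f0);
    }
    return freqs;
}

// Which harmonic of f0 a measured peak frequency is closest to
// (1 = fundamental, 2 = 2nd harmonic, ...). Assumes f0 > 0.
int nearestHarmonic(double peakHz, double f0) {
    return static_cast<int>(std::lround(peakHz / f0));
}
```

With f0 = 261.6, `harmonicSeries` yields 261.6, 523.2, 784.8 and 1046.4 Hz for the first four harmonics, and a measured peak at 785 Hz maps to harmonic 3.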
Instead of an FFT, I use a modified DFT transform with logarithmic frequency spacing to first detect these possible harmonics by looking for frequencies with peak levels (see diagram below). Because of the way that I gather data for my modified Log DFT, I do NOT have to apply a windowing function to the signal, nor do I have to overlap-add. I have also created the DFT so that its frequency channels are logarithmically spaced, in order to align directly with the frequencies where harmonics are created by the notes on a guitar, saxophone, etc.
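To make the idea of logarithmically spaced frequency channels concrete, here is a generic sketch that evaluates one DFT-style bin per equal-tempered semitone. It is not the PitchScope implementation: the function names and the one-bin-per-semitone layout are my assumptions, and unlike the approach described above this naive version works on a plain rectangular frame, so it will show some spectral leakage.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude of the correlation of x with a complex sinusoid at freqHz,
// i.e. a single DFT-style bin evaluated at an arbitrary frequency.
// Scaled so that a full-scale sine at freqHz gives roughly 1.0.
double binMagnitude(const std::vector<float>& x, double freqHz,
                    double sampleRate) {
    if (x.empty()) return 0.0;
    const double twoPi = 6.283185307179586;
    double re = 0.0, im = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double phase = twoPi * freqHz * n / sampleRate;
        re += x[n] * std::cos(phase);
        im -= x[n] * std::sin(phase);
    }
    return 2.0 * std::sqrt(re * re + im * im) / x.size();
}

// One bin per semitone, starting at lowestHz: bin b is centred on
// lowestHz * 2^(b/12), so bins line up with note frequencies.
std::vector<double> semitoneSpectrum(const std::vector<float>& x,
                                     double sampleRate, double lowestHz,
                                     int numBins) {
    if (numBins <= 0) return {};
    std::vector<double> mags(static_cast<std::size_t>(numBins));
    for (int b = 0; b < numBins; ++b) {
        const double f = lowestHz * std::pow(2.0, b / 12.0);
        mags[static_cast<std::size_t>(b)] = binMagnitude(x, f, sampleRate);
    }
    return mags;
}
```

With `lowestHz = 110.0` (A2) and 36 bins, bin 24 is centred on 440 Hz, so a 440 Hz test tone should produce its largest value there. The cost is O(frame length x number of bins), which stays manageable because only a few dozen bins are needed.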
Now that I'm retired, I have decided to release the source code for my pitch detection engine within a free demonstration app called PitchScope Player. PitchScope Player is available on the web, and you can download the executable for Windows to see my algorithm at work on an MP3 file of your choosing. The GitHub link below leads to my full source code, where you can see how I detect the harmonics with a custom logarithmic DFT transform, and then look for partials (harmonics) whose frequencies satisfy the correct integer relationship which defines a 'pitch'.
My pitch detection algorithm is actually a two-stage process: a) first the ScalePitch is detected ('ScalePitch' has 12 possible pitch values: {E, F, F#, G, G#, A, A#, B, C, C#, D, D#}), and b) after the ScalePitch is determined, the octave is calculated by examining all the harmonics for the 4 possible octave-candidate notes. The algorithm is designed to detect the most dominant pitch (a musical note) at any given moment in time within a polyphonic MP3 file. That usually corresponds to the notes of an instrumental solo. Those interested in the C++ source code for my two-stage pitch detection algorithm might want to start at the Estimate_ScalePitch() function within the SPitchCalc.cpp file on GitHub.
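To illustrate what the two outputs of such a process look like, the sketch below splits a frequency into one of the 12 pitch names and an octave, using the standard equal-tempered mapping with A4 = 440 Hz (MIDI note 69). This only shows the pitch-class/octave decomposition. It is not how Estimate_ScalePitch() works, and the `Note` struct and `frequencyToNote` name are hypothetical.

```cpp
#include <cmath>
#include <string>

struct Note {
    std::string pitchClass;  // one of the 12 names, e.g. "A" or "C#"
    int octave;              // scientific pitch notation, A4 = 440 Hz
};

// Maps a frequency to the nearest equal-tempered note.
// Assumes freqHz > 0 and a reference tuning of A4 = 440 Hz.
Note frequencyToNote(double freqHz) {
    static const char* kNames[12] = {"C",  "C#", "D",  "D#", "E",  "F",
                                     "F#", "G",  "G#", "A",  "A#", "B"};
    // MIDI note number: 69 is A4, one unit per semitone.
    const int midi =
        static_cast<int>(std::lround(69.0 + 12.0 * std::log2(freqHz / 440.0)));
    const int pitchClass = ((midi % 12) + 12) % 12;  // safe for negatives
    const int octave = static_cast<int>(std::floor(midi / 12.0)) - 1;
    return {kNames[pitchClass], octave};
}
```

Notes an octave apart share a pitch class and differ only in the octave number: 261.6 Hz and 523.2 Hz both map to "C", in octaves 4 and 5 respectively. That is why the octave can be treated as a separate second decision.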
https://github.com/CreativeDetectors/PitchScope_Player
Below is an image of a logarithmic DFT (created by my C++ software) for 3 seconds of a guitar solo on a polyphonic MP3 recording. It shows how the harmonics appear for individual notes on a guitar while playing a solo. For each note on this logarithmic DFT we can see its multiple harmonics extending vertically, because each harmonic has the same time-width. Once the octave of the note is determined, we know the frequency of the fundamental.
I had a similar problem with microphone input on a project I did a few years back - turned out to be due to a DC offset.
Make sure you remove any bias before attempting FFT or whatever other method you are using.
It is also possible that you are running into headroom or clipping problems.
Graphs are the best way to diagnose most problems with audio.
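A minimal sketch of both checks, assuming mono float samples nominally in the range [-1, 1]. The function names and the 0.999 clipping threshold are my own choices for illustration. Subtracting the per-frame mean is the simplest form of bias removal; a high-pass filter is the usual alternative for continuous streams.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Subtracts the frame's mean from every sample and returns the mean that
// was removed, so the offset itself can be logged or plotted.
float removeDcOffset(std::vector<float>& frame) {
    if (frame.empty()) return 0.0f;
    double sum = 0.0;
    for (float v : frame) sum += v;
    const float mean = static_cast<float>(sum / frame.size());
    for (float& v : frame) v -= mean;
    return mean;
}

// Counts samples at or very near full scale. More than a handful per
// frame suggests the input is clipping and the gain should be reduced.
std::size_t countClipped(const std::vector<float>& frame,
                         float threshold = 0.999f) {
    std::size_t clipped = 0;
    for (float v : frame) {
        if (std::fabs(v) >= threshold) ++clipped;
    }
    return clipped;
}
```

Run `countClipped` on the raw frame before `removeDcOffset`, since removing the offset shifts clipped samples away from full scale and can hide them.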
Take a look at this sample application:
http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx
I realize the app is in C# and you need C++, and I realize this is .NET/Windows and you're on a Mac... But I figured his FFT implementation might be a starting reference point. Try comparing your FFT implementation to his. (His is the iterative, breadth-first version of the Cooley-Tukey FFT.) Are they similar?
Also, the "random" behavior you're describing might be because you're using the data returned by your sound card directly, without properly assembling the values from the byte array. Did you ask your sound card to sample 16-bit values, and then give it a byte array to store the values in? If so, remember that two consecutive bytes in the returned array make up one 16-bit audio sample.
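For reference, assembling the samples might look like the sketch below. It assumes signed 16-bit little-endian PCM, which is the common case but should be checked against whatever format your audio API actually reports. The function name `bytesToSamples` is made up for the example. For stereo input the result is still interleaved (L, R, L, R, ...) and needs to be split per channel afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts raw bytes holding signed 16-bit little-endian PCM into floats
// in [-1, 1). Each pair of consecutive bytes forms one sample: low byte
// first, then high byte. A trailing odd byte is ignored.
std::vector<float> bytesToSamples(const std::vector<std::uint8_t>& bytes) {
    std::vector<float> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Combine the two bytes into an unsigned 16-bit value...
        int value = bytes[i] | (bytes[i + 1] << 8);
        // ...then reinterpret it as two's-complement signed.
        if (value >= 32768) value -= 65536;
        samples.push_back(value / 32768.0f);
    }
    return samples;
}
```

Treating each byte as its own sample, or swapping the byte order, turns even a clean sine wave into something that looks like noise, which would match the random dots described in the question.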
Here are some open source libraries that implement pitch detection: