I've never been able to understand how audio data is stored. However, I'd like to know a way to find the pitch of PCM data. Let's say, for example, that I recorded a single key being struck on a piano, in 16-bit mono PCM format at a given sample rate. How could I find the frequency, in hertz, of the audio? Simple code to get the average frequency works for me, but a more detailed explanation of the format itself would be ideal.
Thanks!
PCM audio is not stored as a series of pitches. To work out the pitch, you need a Fast Fourier Transform (FFT). See https://stackoverflow.com/search?q=pitch+detection; there are dozens of posts about this already.
Think of an audio waveform. PCM encoding simply samples that wave a fixed number of times per second, using a fixed number of bits per sample.
Image from http://en.wikipedia.org/wiki/Pulse-code_modulation
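To make the sampling idea concrete, here is a minimal Python sketch that samples a sine wave a fixed number of times per second and quantizes each sample to a signed 16-bit integer. The 440 Hz tone, the 0.5 amplitude, and the helper names `synthesize_pcm16` and `to_bytes` are illustrative assumptions. The little-endian byte order matches WAV files, but other formats may store bytes differently.

```python
import math
import struct

def synthesize_pcm16(freq_hz, sample_rate, duration_s, amplitude=0.5):
    """Sample a sine wave sample_rate times per second and quantize each
    sample to a signed 16-bit integer in the range -32768..32767."""
    n_samples = int(sample_rate * duration_s)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                                      # time of sample n, in seconds
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)  # continuous value in [-1, 1]
        q = round(value * 32767)                                 # quantize to an integer step
        samples.append(max(-32768, min(32767, q)))               # clamp to the 16-bit range
    return samples

def to_bytes(samples):
    """Pack samples as signed 16-bit little-endian integers (2 bytes each)."""
    return struct.pack("<%dh" % len(samples), *samples)

samples = synthesize_pcm16(440.0, 44100, 1.0)  # one second of a 440 Hz tone
raw = to_bytes(samples)
print(len(samples), len(raw))  # 44100 88200
```

One second of audio at 44,100 samples per second is 44,100 samples, and at 2 bytes per sample that is 88,200 bytes.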
16-bit mono PCM at 44.1 kHz means that 44,100 times per second, a 16-bit value (2 bytes) is stored that represents the waveform's amplitude at the instant the sample was taken. A 44.1 kHz sample rate can represent frequencies up to just under 22.05 kHz, half the sample rate (see Nyquist frequency).
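Reading the data back is the reverse: every 2 bytes is one sample. A minimal sketch, assuming the input is raw signed 16-bit little-endian mono PCM with no file header (WAV uses this layout; AIFF, for example, is big-endian). `decode_pcm16_mono` is a hypothetical helper. For a .wav file, Python's standard `wave` module can skip the header for you via `wave.open(path, "rb").readframes(n)`.

```python
import struct

def decode_pcm16_mono(raw: bytes):
    """Interpret raw bytes as signed 16-bit little-endian mono samples."""
    if len(raw) % 2:
        raise ValueError("16-bit PCM needs an even number of bytes")
    count = len(raw) // 2                           # 2 bytes per sample
    return list(struct.unpack("<%dh" % count, raw))

SAMPLE_RATE = 44100
bytes_per_second = SAMPLE_RATE * 2   # 1 channel * 2 bytes per sample
nyquist_hz = SAMPLE_RATE / 2         # highest frequency the rate can represent

raw = bytes([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0x01, 0x00])
print(decode_pcm16_mono(raw))        # [0, 32767, -32768, 1]
print(bytes_per_second, nyquist_hz)  # 88200 22050.0
```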
An FFT converts those samples from the time domain to the frequency domain. That is, it tells you the level of each frequency band over a particular span of time. The more bands you look at, the more computationally intensive it is.
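A minimal pure-Python sketch of the FFT approach: a recursive radix-2 FFT, a Hann window, and a search for the strongest bin. Each bin is `sample_rate / n` Hz wide, so the estimate is only as fine as that spacing, and a larger `n` buys finer bands at more computation. The helper names are hypothetical, and in practice `numpy.fft.rfft` would replace the hand-written `fft`. Note that this finds the dominant frequency, not necessarily the pitch: for a piano note the strongest peak can be a harmonic rather than the fundamental, which is why dedicated pitch-detection methods exist.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples, sample_rate):
    """Return the centre frequency (Hz) of the strongest FFT bin, ignoring DC."""
    if len(samples) < 2:
        raise ValueError("need at least 2 samples")
    n = 1
    while n * 2 <= len(samples):
        n *= 2                    # largest power of two that fits
    frame = samples[:n]
    # Hann window: tapers the frame edges to reduce spectral leakage
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    # For real input, bins above n/2 mirror the lower half; bin 0 is DC
    mags = [abs(c) for c in spectrum[: n // 2 + 1]]
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / n  # bin index -> Hz

tone = [math.sin(2 * math.pi * 440.0 * i / 44100) for i in range(8192)]
print(dominant_frequency(tone, 44100))  # within one bin (~5.38 Hz) of 440
```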