What are linear PCM values?
I am working with audio on iPhone OS and am a bit confused.
I am currently getting input from my audio buffer in the form of 16-bit PCM values ranging from
-32768 to 32767. I am hoping to perform a dB SPL conversion using the formula 20 * log10(p / pRef).
I am aware that pRef is 0.00002 pascals, and would like to convert the PCM values to pascals.
My questions are:
a) What exactly do these PCM values represent?
b) How do I convert these values to pascals?
Thanks so much
Comments (3)
You can't do this conversion without additional information. The mapping of PCM values to physical units of pressure (pascals) depends on the volume setting, characteristics of the output device (earbuds? a PA system?), and the position of the observer with respect to the output device (right next to the speaker? 100 meters away?).
To answer the first part of your question: if you were to graph the sound pressure
versus time for, say, a 1 kHz sine wave tone, the linear-quantized PCM values at the
sample times would be roughly proportional to the sound pressure variations from ambient
at that instant. ("Roughly", because input and output devices seldom have absolutely flat
response curves over the entire audio frequency range.)
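The "additional information" usually takes the form of a calibration measurement. The following is a minimal Python sketch, not a definitive implementation. It assumes you have measured, for your particular device and setup, a hypothetical calibration_offset_db: the dB SPL reading that corresponds to a full-scale (0 dBFS) signal, obtained for example by recording a reference source of known level. The function names and the offset value are illustrative and do not come from any iPhone API.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample
P_REF = 20e-6         # reference pressure in pascals (0.00002 Pa, from the question)


def rms_dbfs(samples):
    """RMS level of 16-bit PCM samples, in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum((s / FULL_SCALE) ** 2 for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def dbfs_to_pascals(dbfs, calibration_offset_db):
    """Convert a dBFS level to RMS pressure in pascals.

    calibration_offset_db is an ASSUMED, separately measured value:
    the dB SPL that a 0 dBFS signal corresponds to on this device.
    """
    db_spl = dbfs + calibration_offset_db
    # Invert dB SPL = 20 * log10(p / P_REF)
    return P_REF * 10.0 ** (db_spl / 20.0)
```

With a made-up offset of 120 dB, a buffer measuring -26 dBFS would map to 94 dB SPL, which is about 1 Pa. Without a measured offset, only the relative level (dBFS) is meaningful.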
Let's get some intuition for the question.
Audio is simply a curve which fluctuates above and below a zero line. If the curve sits at or very near the zero line for a long enough period of time, this maps to silence: neither your speaker surface nor your eardrum does any wobbling. Alternatively, if the audio curve swings violently from its maximum value to its minimum value for most of a stretch of time, you have maximum volume and hence a greater value in pascals.
Audio in the wild, which your ear hears, is analog. To digitize it, the analog curve must be converted into binary data by sampling the raw audio curve, recording the curve height a fixed number of times per second. The fundamental digital format for audio is PCM, which simply maps that continuous, unbroken analog curve onto distinct points on a graph. PCM audio still looks like a curve, yet when you zoom in it is just distinct points. Each point has an X and a Y value, where X represents time (going left to right) and Y represents amplitude (going up and down). However, only the Y values are stored; the X values are implied, because successive Y samples are by definition separated by a fixed time interval determined by the sample rate. So with a sample rate of 44100 Hz you will have 44100 values of Y per second of recording (per channel).
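The sampling idea above can be sketched in a few lines of Python. This is an illustration only: sample_sine and sample_time are hypothetical helper names, and the sine tone stands in for the analog curve. Note that only the Y values are stored in the list, and the time of each sample is recovered from its index.

```python
import math


def sample_sine(freq_hz, sample_rate, duration_s, peak=32767):
    """Sample a sine tone into a list of 16-bit PCM Y values."""
    n = int(round(sample_rate * duration_s))
    return [
        int(round(peak * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    ]


def sample_time(index, sample_rate):
    """Implied X value (time in seconds) of the sample at a given index."""
    return index / sample_rate


# One second of a 1 kHz tone at 44100 Hz yields 44100 stored Y values.
samples = sample_sine(1000.0, 44100, 1.0)
```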
The number of X measurements per second we call the sample rate (often 44,100 per second). The number of bits used to record the fidelity of Y we call the bit depth. If we devote 3 bits, every possible Y value must fit into one of the 2^3 = 8 levels that 3 bits can encode.
With only 8 distinct values the result sounds very distorted, since the stored curve is far from continuous. This is why CD-quality audio uses two bytes (16 bits) to record the curve height Y, which gives 2^16 == 65536 distinct Y values. That matches the scale in your question: a signed 16-bit sample spans -32768 to 32767. The original continuous analog audio curve is now digitized into 2^16 possible height values ranging from the bottom to the top of the curve, which to the human ear is indistinguishable from the source. When doing audio calculations in floating point, the Y value is often normalized into a range of about -1.0 to +1.0 instead of -32768 to 32767.
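To make the bit-depth and normalization points concrete, here is a small Python sketch. The helper names quantize and normalize are illustrative, and rounding followed by clamping is only one of several reasonable quantization conventions.

```python
def quantize(y, bits):
    """Map y in [-1.0, 1.0] to a signed integer code of the given bit depth."""
    half = 2 ** (bits - 1)           # e.g. 32768 for 16 bits, 4 for 3 bits
    lowest, highest = -half, half - 1
    code = int(round(y * half))
    return max(lowest, min(highest, code))  # clamp so +1.0 fits in range


def normalize(code, bits):
    """Map a signed integer code back to a float in [-1.0, 1.0)."""
    return code / (2 ** (bits - 1))
```

With 3 bits every input lands on one of 8 codes (-4 to 3), while with 16 bits the codes span -32768 to 32767. Dividing a 16-bit sample by 32768 gives the normalized floating-point value.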
So by now it should be clear that the heart of your question concerning pascals (a unit of pressure) is orthogonal to the Y value range (bit depth). It is instead a function of the shape of the audio curve together with the sheer area of the speaker surface. For a given choice of frequency or frequencies, the more closely the audio curve follows the canonical sine curve of that frequency while using the full range of possible Y values, the greater the amplitude (volume), and that is what drives the pascal value.
Your question is neither “iphone”, “objective-c” nor “objective-c++”. But it can be answered very simply: http://en.wikipedia.org/wiki/Pulse-code_modulation
Greetings