PCM audio amplitude values?
I am starting out with audio recording using my Android smartphone.
I successfully saved voice recordings to a PCM file. When I parse the data and print out the signed, 16-bit values, I can create a graph like the one below. However, I do not understand the amplitude values along the y-axis.
What exactly are the units for the amplitude values? The values are signed 16-bit, so they must range from -32K to +32K. But what do these values represent? Decibels?
If I use 8-bit values, then the values must range from -128 to +128. How would that get mapped to the volume/"loudness" of the 16-bit values? Would you just use a 16-to-1 quantisation mapping?
Why are there negative values? I would think that complete silence would result in values of 0.
If someone can point me to a website with information on what's being recorded, I would appreciate it. I found webpages on the PCM file format, but not what the data values are.
Comments (5)
Think of the surface of the microphone. When it's silent, the surface is motionless at position zero. When you talk, the air around your mouth vibrates. Vibrations are spring-like and move in both directions: back and forth, up and down, or in and out. The vibrations in the air make the microphone surface vibrate as well. When it moves down, that might be sampled as a positive value. When it moves up, that might be sampled as a negative value. (Or it could be the opposite.) When you stop talking, the surface settles back to the zero position.
What numbers you get from your PCM recording data depends on the gain of the system. With common 16-bit samples, the range is from -32768 to 32767, which covers the largest excursion of a vibration that can be recorded without distortion, clipping, or overflow. Usually the gain is set a bit lower so that the maximum values aren't right on the edge of clipping.
ADDED:
8-bit PCM audio is often an unsigned data type with a range of 0..255, where a value of 128 indicates "silence". So to convert between 8-bit and 16-bit PCM waveforms you have to add or subtract this bias, as well as scale by 256.
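The bias-and-scale conversion described above can be sketched roughly as follows. The function names are made up for illustration, and the sketch assumes unsigned 8-bit samples (0..255, silence at 128) and signed 16-bit samples; a real converter might also add dither when reducing to 8 bits.

```python
def u8_to_s16(sample_u8):
    """Convert one unsigned 8-bit PCM sample (0..255) to signed 16-bit."""
    if not 0 <= sample_u8 <= 255:
        raise ValueError("8-bit sample must be in 0..255")
    # Remove the 128 bias so silence becomes 0, then scale by 256.
    return (sample_u8 - 128) * 256


def s16_to_u8(sample_s16):
    """Convert one signed 16-bit PCM sample to unsigned 8-bit (0..255)."""
    if not -32768 <= sample_s16 <= 32767:
        raise ValueError("16-bit sample must be in -32768..32767")
    # Keep the top 8 bits (floor division by 256), then add the bias back.
    return (sample_s16 >> 8) + 128


if __name__ == "__main__":
    for value in (0, 128, 255):
        print(value, "->", u8_to_s16(value))
```

Note that 255 maps to 32512 rather than 32767: the low 8 bits simply are not there to recover, which is part of why 8-bit audio sounds noisier.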
The raw numbers are an artefact of the quantization process used to convert an analog audio signal into digital. It makes more sense to think of an audio signal as a vibration around 0, extending as far as +1 and -1 for maximum excursion of the signal. Outside that, you get clipping, which distorts the harmonics and sounds terrible.
However, computers don't work all that well in terms of fractions, so discrete integers are used to map that range: 16 bits give 65,536 distinct values, stored as signed integers from -32768 to +32767. In most applications like this, +32767 is considered the maximum positive excursion of the microphone's or speaker's diaphragm. There is no fixed correspondence between a sample value and a sound pressure level unless you start factoring in the characteristics of the recording (or playback) circuits.
(BTW, 16-bit audio is very standard and widely used. It is a good balance of signal-to-noise ratio and dynamic range. 8-bit is noisy unless you do some funky non-standard scaling.)
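To make the "+1 to -1" view concrete, here is a small sketch that rescales signed 16-bit samples to that range and reports the peak level in dBFS (decibels relative to full scale). The helper names are invented for this example, and dBFS is only a relative measure: it says how far below the converter's maximum a sample sits, not how loud the sound was in the room.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def normalize(sample):
    """Map a signed 16-bit sample to a float in [-1.0, 1.0)."""
    return sample / FULL_SCALE


def peak_dbfs(samples):
    """Peak level of a block of 16-bit samples, in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(peak / FULL_SCALE)


if __name__ == "__main__":
    block = [0, 1200, -16384, 8000]
    print([round(normalize(s), 4) for s in block])
    print(round(peak_dbfs(block), 2), "dBFS")
```

A peak at half of full scale comes out at about -6 dBFS, which is the usual rule of thumb that halving the amplitude costs roughly 6 dB.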
Lots of good answers here, but they don't directly address your questions in an easy to read way.
The values have no unit. They simply represent a number that has come out of an analog-to-digital converter. The numbers from the A/D converter are a function of the microphone and pre-amplifier characteristics.
I don't understand the 8-bit question. If you are recording 8-bit audio, your values will be 8 bits wide. Are you converting 8-bit audio to 16-bit?
The diaphragm on a microphone vibrates in both directions and as a result creates positive and negative voltages. A value of 0 is silence as it indicates that the diaphragm is not moving. See how microphones work
For more details on how sound is represented digitally, see here.
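Since the samples are just numbers from the A/D converter, reading them back is only a matter of unpacking bytes. A minimal sketch, assuming the file is headerless, mono, signed 16-bit, little-endian PCM (the byte order is an assumption, so check how your recorder wrote the data); `decode_pcm16` is a made-up helper name.

```python
import struct


def decode_pcm16(raw):
    """Unpack raw little-endian signed 16-bit PCM bytes into a list of ints."""
    usable = len(raw) - (len(raw) % 2)  # ignore a dangling odd byte, if any
    count = usable // 2
    # "<" = little-endian, "h" = signed 16-bit integer
    return list(struct.unpack("<%dh" % count, raw[:usable]))


if __name__ == "__main__":
    raw = bytes([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0xFF, 0xFF])
    print(decode_pcm16(raw))
```

If the values you plot look like noise with huge jumps, the byte order is probably the other way around; swapping "<" for ">" is the quick test.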
Small clarification: The position of the diaphragm is being recorded. Silence occurs when there is no vibration, when there is no change in position. So the vibration you are seeing is what is pushing the air and creating changes in air pressure over time. The air is no longer being pushed at the top and bottom peaks of any vibration, so the peaks are when silence occurs. The loudest part of the signal is when the position changes the fastest which is somewhere in the middle of the peaks. The speed with which the diaphragm moves from one peak to another determines the amount of pressure that's generated by the diaphragm. When the top and bottom peaks are reduced to zero (or some other number they share) then there is no vibration and no sound at all. Also as the diaphragm slows down so that there's a greater space of time between peaks, there is less sound pressure being generated or recorded.
I recommend the Yamaha Sound Reinforcement Handbook for more in-depth reading. A grasp of basic calculus also helps in understanding audio and vibration.
The 16-bit numbers are the A/D converter values from your microphone (you knew this). Know also that the amplifier between your microphone and the A/D converter has an Automatic Gain Control (AGC). The AGC actively changes the amplification of the microphone signal to keep too much voltage from hitting the A/D converter (usually < 2 V DC). There is also a DC offset that places the input signal in the middle of the A/D converter's range (say 1 V DC).
So, when there is no sound hitting the microphone, the AGC amplifier is sending a flat-line 1.0 Vdc signal to the A/D converter. When sound waves hit the microphone, they create a corresponding AC voltage wave. The AGC amp takes the AC voltage wave, centers it at 1.0 Vdc, and sends it to the A/D converter. The A/D samples the voltage (say 44,100 times per second) and outputs signed 16-bit values. So -32,768 = 0.0 Vdc and +32,767 is just under 2.0 Vdc. Each step is 2.0 V / 65,536, or about 30.5 µV, so a value of +100 means about 1.00305 Vdc and -100 means about 0.99695 Vdc hitting the A/D converter.
Positive values are above 1.0 Vdc; negative values are below 1.0 Vdc.
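The code-to-voltage arithmetic can be checked with a few lines. This is only a sketch of the illustrative setup in this answer: a signed 16-bit converter (codes -32768 to 32767) spanning 0 to 2 V with the signal centered at 1 V. Those voltages are example figures, not the specification of any particular phone, and the names are invented here.

```python
V_SPAN = 2.0     # assumed input range of the converter, in volts
V_BIAS = 1.0     # assumed DC level that corresponds to code 0
CODES = 65536    # number of distinct 16-bit codes

VOLTS_PER_STEP = V_SPAN / CODES  # about 30.5 microvolts


def code_to_volts(code):
    """Voltage at the converter input for a signed 16-bit sample value."""
    if not -32768 <= code <= 32767:
        raise ValueError("code must fit in a signed 16-bit integer")
    return V_BIAS + code * VOLTS_PER_STEP


if __name__ == "__main__":
    for code in (-32768, -100, 0, 100, 32767):
        print(code, "->", round(code_to_volts(code), 5), "V")
```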
Note that some audio systems apply a logarithmic curve to the audio wave, because the human ear responds to level roughly logarithmically; telephony codecs with µ-law or A-law companding are the usual example, whereas plain linear PCM stores the samples unmodified. In digital audio systems (with ADCs), digital signal processing puts this curve on the signal. DSP chips are big business: TI has made a fortune selling them for all kinds of applications, not just audio processing. A DSP can apply very complicated math to a real-time stream of data that would choke a phone's general-purpose ARM processor. Say you are sending 2 MHz pulses to an array of 256 ultrasound sensors/receivers: you get the idea.
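One well-known logarithmic curve in audio is µ-law companding. The sketch below uses the continuous textbook formula with µ = 255 on samples already scaled to [-1, 1]; actual telephony codecs (G.711) use a segmented 8-bit approximation of this curve, and nothing here is claimed to be what a given phone does to its recordings.

```python
import math

MU = 255.0  # companding constant used by North American/Japanese telephony


def mu_law_compress(x):
    """Apply the mu-law curve to a sample in [-1.0, 1.0]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must be normalized to [-1, 1]")
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Invert mu_law_compress, recovering the linear sample."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("value must be in [-1, 1]")
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


if __name__ == "__main__":
    for x in (0.001, 0.01, 0.1, 1.0):
        print(x, "->", round(mu_law_compress(x), 4))
```

The effect is that quiet samples are spread over far more of the output range than loud ones, which is how 8 bits can be made to sound better than a straight linear 8-bit encoding.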