Extracting amplitude data from linear PCM on the iPhone

Posted 2024-09-25 18:19:47

I'm having difficulty extracting amplitude data from linear PCM on the iPhone, stored in an audio.caf file.

My questions are:

  1. Linear PCM stores amplitude samples as 16-bit values. Is this correct?
  2. How is amplitude stored in packets returned by AudioFileReadPacketData()? When recording mono linear PCM, isn't each sample (one frame, one packet) just an SInt16, so that the buffer is an array of SInt16? What is the byte order (big endian vs. little endian)?
  3. What does each step in linear PCM amplitude mean physically?
  4. When linear PCM is recorded on the iPhone, is the center point 0 (SInt16) or 32768 (UInt16)? What do the max and min values mean in terms of the physical waveform/air pressure?

And a bonus question: are there sound/air pressure waveforms that the iPhone mic can't measure?

My code follows:

// get the audio file proxy object for the audio
AudioFileID fileID;
AudioFileOpenURL((CFURLRef)audioURL, kAudioFileReadPermission, kAudioFileCAFType, &fileID);

// get the number of packets of audio data contained in the file
UInt64 totalPacketCount = [self packetCountForAudioFile:fileID];

// get the size of each packet for this audio file
UInt32 maxPacketSizeInBytes = [self packetSizeForAudioFile:fileID];

// setup to extract the audio data
Boolean inUseCache = false;
UInt32 numberOfPacketsToRead = 4410; // 0.1 seconds of data
UInt32 ioNumPackets = numberOfPacketsToRead;
UInt32 ioNumBytes = maxPacketSizeInBytes * ioNumPackets;
char *outBuffer = malloc(ioNumBytes);
memset(outBuffer, 0, ioNumBytes);

SInt16 signedMinAmplitude = -32768;
SInt16 signedCenterpoint = 0;
SInt16 signedMaxAmplitude = 32767;

SInt16 minAmplitude = signedMaxAmplitude;
SInt16 maxAmplitude = signedMinAmplitude;

// process each and every packet
for (UInt64 packetIndex = 0; packetIndex < totalPacketCount; packetIndex = packetIndex + ioNumPackets)
{
   // reset the number of packets to get
   ioNumPackets = numberOfPacketsToRead;

   AudioFileReadPacketData(fileID, inUseCache, &ioNumBytes, NULL, packetIndex, &ioNumPackets, outBuffer);

   for (UInt32 batchPacketIndex = 0; batchPacketIndex < ioNumPackets; batchPacketIndex++)
   {
      SInt16 packetData = outBuffer[batchPacketIndex * maxPacketSizeInBytes];
      SInt16 absoluteValue = abs(packetData);

      if (absoluteValue < minAmplitude) { minAmplitude = absoluteValue; }
      if (absoluteValue > maxAmplitude) { maxAmplitude = absoluteValue; }
   }
}

NSLog(@"minAmplitude: %hi", minAmplitude);
NSLog(@"maxAmplitude: %hi", maxAmplitude);

With this code I almost always get a min of 0 and a max of 128! That makes no sense to me.

I'm recording the audio using the AVAudioRecorder as follows:

// specify mono, 44.1 kHz, Linear PCM with Max Quality as recording format
NSDictionary *recordSettings = [[NSDictionary alloc] initWithObjectsAndKeys:
   [NSNumber numberWithFloat: 44100.0], AVSampleRateKey,
   [NSNumber numberWithInt: kAudioFormatLinearPCM], AVFormatIDKey,
   [NSNumber numberWithInt: 1], AVNumberOfChannelsKey,
   [NSNumber numberWithInt: AVAudioQualityMax], AVEncoderAudioQualityKey,
   nil];

// store the sound file in the app doc folder as calibration.caf
NSString *documentsDir = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject];
NSURL *audioFileURL = [NSURL fileURLWithPath:[documentsDir stringByAppendingPathComponent: @"audio.caf"]];

// create the audio recorder
NSError *createAudioRecorderError = nil;
AVAudioRecorder *newAudioRecorder = [[AVAudioRecorder alloc] initWithURL:audioFileURL settings:recordSettings error:&createAudioRecorderError];
[recordSettings release];

if (newAudioRecorder)
{
   // record the audio
   self.recorder = newAudioRecorder;
   [newAudioRecorder release];

   self.recorder.delegate = self;
   [self.recorder prepareToRecord];
   [self.recorder record];
}
else
{
   NSLog(@"%@", [createAudioRecorderError localizedDescription]);
}

Thanks for any insight you can offer. This is my first project using Core Audio, so feel free to tear apart my approach!

P.S. I have tried to search the Core Audio list archives, but the request keeps giving an error: ( http://search.lists.apple.com/?q=linear+pcm+amplitude&cmd=Search%21&ul=coreaudio-api )

P.P.S. I have looked at:

http://en.wikipedia.org/wiki/Sound_pressure

http://en.wikipedia.org/wiki/Linear_PCM

http://wiki.multimedia.cx/index.php?title=PCM

Get the amplitude at a given time within a sound file?

http://music.columbia.edu/pipermail/music-dsp/2002-April/048341.html

I have also read the entirety of the Core Audio Overview and most of the Audio Session Programming Guide, but my questions remain.


Comments (2)

小瓶盖 2024-10-02 18:19:47

1) The OS X/iPhone file read routines allow you to determine the sample format, typically one of SInt8, SInt16, SInt32, Float32, Float64, or contiguous 24-bit signed int for LPCM.

2) For int formats, MIN_FOR_TYPE represents the maximum amplitude in the negative phase, and MAX_FOR_TYPE represents the maximum amplitude in the positive phase. 0 equals silence. Floating-point formats modulate between [-1...1], and zero is silence there as well. When reading, writing, recording, or working with a specific format, endianness will matter: a file may require a specific format, and you typically want to manipulate the data in the native endianness. Some routines in the Apple audio file libs allow you to pass a flag denoting source endianness, rather than converting it manually. CAF is a bit more complicated: it acts like a meta wrapper for one or more audio files, and supports many types.

3) The amplitude representation for LPCM is just a brute-force linear amplitude representation (no conversion/decoding is required for playback, and the amplitude steps are equal).

4) See #2. The values are not related to air pressure; they are related to 0 dBFS. For example, if you're outputting the stream straight to a DAC, then the int max (or -1/1 if floating point) represents the level at which an individual sample will clip.

Bonus) It, like every ADC and component chain, has limits to what it can handle on input in terms of voltage. Additionally, the sampling rate defines the highest frequency that may be captured (the highest being half of the sampling rate). The ADC may use a fixed or selectable bit depth, but the max input voltage does not generally change when choosing another bit depth.
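
To make the 0 dBFS reference in point 4 concrete, the level of a single 16-bit sample relative to full scale can be written as below. This is a sketch that assumes full scale is taken as 32768 (the magnitude of the most negative SInt16 value); some tools use 32767 instead, which changes the results only negligibly.

```latex
\begin{align*}
L_{\mathrm{dBFS}} &= 20 \log_{10}\!\left(\frac{|x|}{32768}\right) \\
|x| = 32768 &\;\Rightarrow\; L_{\mathrm{dBFS}} = 0\ \mathrm{dB} \\
|x| = 16384 &\;\Rightarrow\; L_{\mathrm{dBFS}} \approx -6.02\ \mathrm{dB} \\
|x| = 1 &\;\Rightarrow\; L_{\mathrm{dBFS}} \approx -90.31\ \mathrm{dB}
\end{align*}
```

These are levels relative to digital full scale only. Relating them to sound pressure would require a calibrated microphone and input chain.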

One mistake you're making at the code level: you're manipulating `outBuffer` as chars, not SInt16.
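
A minimal C sketch of that fix follows, assuming the buffer holds native-endian signed 16-bit mono samples. The names `AmplitudeRange` and `scan_s16` are hypothetical and not part of Core Audio. Indexing a `char *` reads a single byte per sample, so on a platform where `char` is signed the values are confined to -128...127, which would be consistent with the reported maximum of 128. The sketch also widens each sample to 32 bits before taking the magnitude, since the absolute value of -32768 does not fit in an SInt16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest sample magnitudes found in a buffer (hypothetical helper type). */
typedef struct {
    int32_t min_abs;
    int32_t max_abs;
} AmplitudeRange;

/* Scan a buffer of native-endian signed 16-bit samples.
   byte_count is the number of valid bytes in the buffer.
   For an empty buffer the result stays at its initial values {32768, 0}. */
static AmplitudeRange scan_s16(const void *buffer, size_t byte_count)
{
    assert(buffer != NULL);

    /* Treat the raw bytes as 16-bit samples, not as individual chars. */
    const int16_t *samples = (const int16_t *)buffer;
    size_t sample_count = byte_count / sizeof(int16_t);

    AmplitudeRange range = { 32768, 0 };
    for (size_t i = 0; i < sample_count; i++) {
        /* Widen first so that the magnitude of -32768 is representable. */
        int32_t value = samples[i];
        int32_t magnitude = value < 0 ? -value : value;

        if (magnitude < range.min_abs) { range.min_abs = magnitude; }
        if (magnitude > range.max_abs) { range.max_abs = magnitude; }
    }
    return range;
}
```

In the question's loop this could be called as `scan_s16(outBuffer, ioNumBytes)` after each read, merging the result into the running minimum and maximum. Note also that `ioNumBytes` is an in/out parameter of AudioFileReadPacketData() (buffer size on input, bytes read on output), so it likely needs to be reset to the buffer size before every call.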

旧情别恋 2024-10-02 18:19:47
  1. If you ask for 16-bit samples in your recording format, then you get 16-bit samples. But other formats do exist in many Core Audio record/play APIs, and in the possible CAF file formats.

  2. In mono, you just get an array of signed 16-bit ints. You can specifically ask for big or little endian in some of the Core Audio recording APIs.

  3. Unless you want to calibrate for your particular device model's mic or your external mic (and make sure audio processing/AGC is turned off), you might want to consider the audio levels to be arbitrarily scaled. Plus, the response varies with mic directionality and audio frequency as well.

  4. The center point for 16-bit audio samples is commonly 0 (range about -32k to 32k). No bias.
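
To illustrate points 2 and 4, here is a small C sketch of how two raw bytes map to one signed 16-bit sample under either byte order, and how that sample maps onto the floating-point range. The helper names `decode_s16` and `s16_to_float` are hypothetical, and the sketch assumes the file's byte order is already known (for example from its format description). It is not taken from Core Audio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one signed 16-bit sample from two raw bytes.
   big_endian != 0 means the most significant byte comes first. */
static int16_t decode_s16(const uint8_t *bytes, int big_endian)
{
    assert(bytes != NULL);

    uint16_t hi = big_endian ? bytes[0] : bytes[1];
    uint16_t lo = big_endian ? bytes[1] : bytes[0];
    uint16_t raw = (uint16_t)((hi << 8) | lo);

    /* Interpret the bit pattern as two's complement without relying on
       an implementation-defined unsigned-to-signed cast. */
    return (raw & 0x8000u) ? (int16_t)((int32_t)raw - 65536) : (int16_t)raw;
}

/* Map a 16-bit integer sample onto the floating-point range [-1.0, 1.0);
   0 stays 0, so silence is the center point in both representations. */
static float s16_to_float(int16_t sample)
{
    return (float)sample / 32768.0f;
}
```

With the same two bytes `{0x12, 0x34}`, the big-endian reading gives 0x1234 and the little-endian reading gives 0x3412, which is why the byte order has to be known before interpreting the buffer.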
