Mixing two files together using Extended Audio File Services

Posted on 2024-10-01 16:29:05

I am doing some custom audio post-processing using audio units. I have two files that I am merging together (links below), but am coming up with some weird noise in the output. What am I doing wrong?

I have verified that before this step, the 2 files (workTrack1 and workTrack2) are in a proper state and sound good. No errors are hit during the process either.

Buffer Processing code:

- (BOOL)mixBuffersWithBuffer1:(const int16_t *)buffer1 buffer2:(const int16_t *)buffer2 outBuffer:(int16_t *)mixbuffer outBufferNumSamples:(int)mixbufferNumSamples {
    BOOL clipping = NO;

    for (int i = 0 ; i < mixbufferNumSamples; i++) {
        int32_t s1 = buffer1[i];
        int32_t s2 = buffer2[i];
        int32_t mixed = s1 + s2;

        if ((mixed < -32768) || (mixed > 32767)) {
            clipping = YES; // don't break here because we dont want to lose data, only to warn the user
        }

        mixbuffer[i] = (int16_t) mixed;
    }
    return clipping;
}

Mixdown code:

////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////      PHASE 4      ////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// In phase 4, open workTrack1 and workTrack2 for reading,
// mix together, and write out to outfile.

// open the outfile for writing -- this will erase the infile if they are the same, but its ok cause we are done with it
err = [self openExtAudioFileForWriting:outPath audioFileRefPtr:&outputAudioFileRef numChannels:numChannels];
if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

// setup vars
framesRead = 0;
totalFrames = [self totalFrames:mixAudioFile1Ref]; // the long one.
NSLog(@"Mix-down phase, %d frames (%0.2f secs)", totalFrames, totalFrames / RECORD_SAMPLES_PER_SECOND);

moreToProcess = YES;
while (moreToProcess) {

    conversionBuffer1.mBuffers[0].mDataByteSize = LOOPER_BUFFER_SIZE;
    conversionBuffer2.mBuffers[0].mDataByteSize = LOOPER_BUFFER_SIZE;

    UInt32 frameCount1 = framesInBuffer;
    UInt32 frameCount2 = framesInBuffer;

    // Read a buffer of input samples up to AND INCLUDING totalFrames
    int numFramesRemaining = totalFrames - framesRead; // Todo see if we are off by 1 here.  Might have to add 1
    if (numFramesRemaining == 0) {
        moreToProcess = NO; // If no frames are to be read, then this phase is finished

    } else {
        if (numFramesRemaining < frameCount1) { // see if we are near the end
            frameCount1 = numFramesRemaining;
            frameCount2 = numFramesRemaining;
            conversionBuffer1.mBuffers[0].mDataByteSize = (frameCount1 * bytesPerFrame);
            conversionBuffer2.mBuffers[0].mDataByteSize = (frameCount2 * bytesPerFrame);
        }

        NSLog(@"Attempting to read %d frames from mixAudioFile1Ref", (int)frameCount1);
        err = ExtAudioFileRead(mixAudioFile1Ref, &frameCount1, &conversionBuffer1);
        if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

        NSLog(@"Attempting to read %d frames from mixAudioFile2Ref", (int)frameCount2);
        err = ExtAudioFileRead(mixAudioFile2Ref, &frameCount2, &conversionBuffer2);
        if (err) { [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err]; return NO; }

        NSLog(@"Read %d frames from mixAudioFile1Ref in mix-down phase", (int)frameCount1);
        NSLog(@"Read %d frames from mixAudioFile2Ref in mix-down phase", (int)frameCount2);

        // If no frames were returned, phase is finished
        if (frameCount1 == 0) {
            moreToProcess = NO;

        } else { // Process pcm data

            // if buffer2 was not filled, fill with zeros
            if (frameCount2 < frameCount1) {
                bzero(inBuffer2 + frameCount2, (frameCount1 - frameCount2));
                frameCount2 = frameCount1;
            }

            const int numSamples = (frameCount1 * bytesPerFrame) / sizeof(int16_t);

            if ([self mixBuffersWithBuffer1:(const int16_t *)inBuffer1
                                    buffer2:(const int16_t *)inBuffer2
                                  outBuffer:(int16_t *)outBuffer
                        outBufferNumSamples:numSamples]) {
                NSLog(@"Clipping");
            }
            // Write pcm data to the main output file
            conversionOutBuffer.mBuffers[0].mDataByteSize = (frameCount1 * bytesPerFrame);
            err = ExtAudioFileWrite(outputAudioFileRef, frameCount1, &conversionOutBuffer);

            framesRead += frameCount1;
        } // frame count
    } // else

    if (err) {
        moreToProcess = NO;
    }
} // while moreToProcess

// Check for errors
TTDASSERT(framesRead == totalFrames);
if (err) {
    if (error) *error = [NSError errorWithDomain:kUAAudioSelfCrossFaderErrorDomain
                                            code:UAAudioSelfCrossFaderErrorTypeMixDown
                                        userInfo:[NSDictionary dictionaryWithObjectsAndKeys:[NSNumber numberWithInt:err],@"Underlying Error Code",[self commonExtAudioResultCode:err],@"Underlying Error Name",nil]];
    [self cleanupInBuffer1:inBuffer1 inBuffer2:inBuffer2 outBuffer:outBuffer err:err];
    return NO;
}
NSLog(@"Done with mix-down phase");

ASSUMPTIONS

  • mixAudioFile1Ref is always longer than mixAudioFile2Ref
  • After mixAudioFile2Ref runs out of bytes, the outputAudioFileRef should sound exactly the same as mixAudioFile1Ref

The expected sound is supposed to be mixing a fade-in over a fade-out in the beginning to produce a self-crossfade when the track is looped. Please listen to the output, look at the code and let me know where I am going wrong.

Source tone sound: http://cl.ly/2g2F2A3k1r3S36210V23
Resulting tone sound: http://cl.ly/3q2w3S3Y0x0M3i2a1W3v

Comments (2)

老街孤人 2024-10-08 16:29:05

Turns out there were two problems here.

Buffer Processing Code

int32_t mixed = s1 + s2; was causing clipping. A better way is to divide by the number of channels mixed: int32_t mixed = (s1 + s2) / 2; and then normalize in another pass later.
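
For reference, the averaging approach can be sketched in plain C (which also compiles inside an Objective-C file). This is only an illustration, not the code from my project: the function name mix_average_int16 is made up, and it assumes both inputs are 16-bit signed PCM with the same number of samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mix two 16-bit PCM buffers by averaging each
 * pair of samples. The sum is formed in 32 bits so it cannot overflow,
 * and half of that sum always fits back into an int16_t. */
static void mix_average_int16(const int16_t *a, const int16_t *b,
                              int16_t *out, size_t numSamples)
{
    for (size_t i = 0; i < numSamples; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        out[i] = (int16_t)(sum / 2); /* integer division truncates toward zero */
    }
}
```

The trade-off is that every sample is attenuated, even where the plain sum would have fit, which is why a later normalization pass is needed.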

Frames != bytes
When zeroing out the second track's buffers when the sound ran out, I was incorrectly setting the offset and duration as frames not bytes. This produced garbage in the buffer and created the noise you hear periodically. Easy to fix:

if (frameCount2 < frameCount1) {
    bzero(inBuffer2 + (frameCount2 * bytesPerFrame), (frameCount1 - frameCount2) * bytesPerFrame);
    frameCount2 = frameCount1;
}

Now the sample is great: http://cl.ly/1E2q1L441s2b3e2X2z0J
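
To make the frames-versus-bytes conversion harder to get wrong, the zero fill could be wrapped in a small helper. The sketch below is an assumption-laden illustration: zero_pad_frames is a hypothetical name, and it assumes the buffer is addressed as bytes (as the pointer arithmetic in the fix above implies) with a fixed bytesPerFrame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a buffer was asked to hold framesWanted frames
 * but only framesRead of them are valid. Zero the remainder, converting
 * both the offset and the length from frames to bytes. */
static void zero_pad_frames(uint8_t *buffer, size_t framesRead,
                            size_t framesWanted, size_t bytesPerFrame)
{
    if (framesRead >= framesWanted) {
        return; /* buffer is already full, nothing to pad */
    }
    memset(buffer + framesRead * bytesPerFrame, 0,
           (framesWanted - framesRead) * bytesPerFrame);
}
```

In the mix-down loop this would be called as zero_pad_frames(inBuffer2, frameCount2, frameCount1, bytesPerFrame), assuming inBuffer2 is a byte pointer.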

千紇 2024-10-08 16:29:05

Your posted answer looks good; I can only see one minor problem. Your solution for the clipping, dividing by two, will help, but it is also the equivalent of applying a 50% gain reduction (roughly -6 dB). That is not the same as normalization. Normalization is the process of looking through an entire audio file, finding the highest peak, and applying a gain change so that this peak hits a certain level (usually 0.0 dB). The result of always halving is that under normal (i.e., non-clipping) circumstances, the output signal will be lower than it needs to be and will have to be boosted again.
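
To make the distinction concrete, here is a rough sketch of peak normalization in C. It assumes the whole signal is available in memory as 16-bit samples; for a file on disk the same idea takes two passes, one to find the peak and one to apply the gain. The names peak_int16 and normalize_int16 are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the largest absolute sample value (0..32768). */
static int32_t peak_int16(const int16_t *samples, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = samples[i];
        if (v < 0) v = -v;
        if (v > peak) peak = v;
    }
    return peak;
}

/* Hypothetical helper: scale every sample in place so that the loudest
 * one lands on targetPeak (32767 corresponds to full scale). */
static void normalize_int16(int16_t *samples, size_t n, int32_t targetPeak)
{
    int32_t peak = peak_int16(samples, n);
    if (peak == 0) {
        return; /* pure silence: there is nothing to scale */
    }
    for (size_t i = 0; i < n; i++) {
        /* |sample| <= peak, so the result never exceeds targetPeak */
        int32_t scaled = (int32_t)samples[i] * targetPeak / peak;
        samples[i] = (int16_t)scaled;
    }
}
```

Unlike a fixed divide by two, the gain here depends on the material: a quiet mix is boosted and a hot one is reduced.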

During your mixdown, you no doubt encountered an overflow which caused distortion, since the value would wrap around and cause a jump in the signal. What you want to do instead is to apply a technique called a "brick-wall limiter", which basically applies a hard ceiling to samples which are clipping. The simplest way to do this is:

int32_t mixed = s1 + s2;
if(mixed >= 32767) {
  mixed = 32767;
}
else if(mixed <= -32767) {
  mixed = -32767;
}

The result of this technique is that you will hear a bit of distortion around samples which are clipping, but the sound will not be completely mangled as would be the case with integer overflow. The distortion, although present, doesn't destroy the listening experience.
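
Put together as a complete mixing routine, the hard-clipping idea might look like the sketch below. It is a minimal illustration with invented names (clamp_int16, mix_clamped_int16), and it clamps to the full int16_t range (-32768..32767) rather than the symmetric -32767 used in the snippet above; either choice avoids the wrap-around.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a 32-bit intermediate to the int16_t range. */
static int16_t clamp_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper: sum two 16-bit PCM buffers with hard clipping.
 * Returns the number of samples that had to be clamped, so the caller
 * can warn the user without losing any data. */
static size_t mix_clamped_int16(const int16_t *a, const int16_t *b,
                                int16_t *out, size_t numSamples)
{
    size_t clipped = 0;
    for (size_t i = 0; i < numSamples; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX || sum < INT16_MIN) {
            clipped++;
        }
        out[i] = clamp_int16(sum);
    }
    return clipped;
}
```

Returning a count instead of a BOOL also gives a rough idea of how severe the clipping is.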
