pyalsaaudio recording allows huge delay without overrunning the buffer

Posted on 2025-02-07 10:16:05

I want to record audio in realtime on Ubuntu and pyalsaaudio seems to work best for detecting my input devices correctly. I started off with the included recordtest.py script, and wanted to experiment with latency to see when the buffer would fill up and give me an error (or at least return -EPIPE) - as per the pyalsaaudio documentation for PCM.read():

In case of an overrun, this function will return a negative size: -EPIPE. This indicates that data was lost, even if the operation itself succeeded. Try using a larger periodsize.
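
To make the documented return values concrete, here is a minimal sketch of how the length returned by PCM.read() could be interpreted. The helper name classify_read is hypothetical and not part of pyalsaaudio. It assumes, as the quoted documentation says, that an overrun is reported as -EPIPE, and that a non-blocking read with no new period available returns a length of 0.

```python
import errno


def classify_read(length):
    """Interpret the frame count returned by PCM.read() (hypothetical helper)."""
    if length == -errno.EPIPE:
        # Overrun: the read itself worked, but captured data was lost.
        return "overrun"
    if length < 0:
        # Any other negative value is treated as a generic error here.
        return "error"
    if length == 0:
        # Non-blocking mode: no new period was available yet.
        return "no data"
    return "data"
```

In the loop below, a check such as `classify_read(l) == "overrun"` is the condition I was expecting to hit.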

However, a tiny buffer size caused no problems, so to investigate further I added long time.sleep() calls between the calls to read() in recordtest.py:

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 
    channels=1, rate=44100, format=alsaaudio.PCM_FORMAT_S16_LE, 
    periodsize=160, device=device)

loops_with_data = 3000 #3000*160/44100 = 10.9 seconds of audio
first_time = True
while loops_with_data > 0:
    # Read data from device
    l, data = inp.read()
    print("l:",l)

    if l:
        f.write(data)
        if first_time:
            #big delay after first data read
            time.sleep(100)
            first_time = False
        else:
            #smaller delay otherwise, still longer than one period length
            time.sleep(.01)
        loops_with_data-=1

I would have expected this to overrun the buffer. However, the value of l returned by read() is never negative, and is almost always 160. When I play back the audio, I get a perfect recording of the first 10.9 seconds of what I said into the microphone. Somehow the buffer being used seems to be huge, storing over 100 seconds' worth of audio, so that when read() is called 100 seconds later it can still access all the old periods of frames. The problem is that if my application runs a function between calls to read() that takes too long, the audio will fall further and further behind and I'll be none the wiser, since nothing indicates that this is happening.
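
For scale, 100 seconds of mono S16_LE audio at 44100 Hz is 4,410,000 frames, or 8,820,000 bytes. Since read() gives no hint of the backlog, one way to notice it from the application side is to compare the frames consumed so far with the frames that should have been captured according to a wall clock. The following is only a sketch under stated assumptions: the class name LagMonitor is hypothetical, it assumes the device's sample clock roughly agrees with time.monotonic(), and it assumes no frames are dropped along the way.

```python
import time


class LagMonitor:
    """Estimate how far the reader is behind the capture device (hypothetical helper)."""

    def __init__(self, rate, clock=time.monotonic):
        self.rate = rate          # sample rate in frames per second
        self.clock = clock        # injectable clock, useful for testing
        self.start = None         # estimated time at which capture began
        self.frames_read = 0      # total frames consumed so far

    def update(self, frames):
        """Record frames just returned by read(); return the estimated lag in seconds."""
        now = self.clock()
        if self.start is None:
            # On the first read, assume capture began just early enough
            # to have produced exactly these frames.
            self.start = now - frames / self.rate
        self.frames_read += frames
        captured = (now - self.start) * self.rate
        return max(0.0, (captured - self.frames_read) / self.rate)
```

Calling `monitor.update(l)` after every successful read would report a lag of nearly 100 seconds right after the long sleep in the loop above, even though read() itself signals nothing.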

I've tried digging into alsaaudio.c and have discovered some weirdness: no matter what I do, the PCM object always seems to think it has a buffer size of a reasonable number of frames (assuming frames = audio samples), but the buffer time and the number of periods per buffer always show up as 0. I've tried printing these using inp.info() in Python, and printing them in the C file itself. It's extra weird because the C file is clearly trying to set 4 periods per buffer using snd_pcm_hw_params_set_periods_near():

dir = 0;
unsigned int periods = 4;
snd_pcm_hw_params_set_periods_near(self->handle, hwparams, &periods, &dir);

But after the following line, periods gets set to 0:

/* Query current settings. These may differ from the requested values, 
which should therefore be synced with actual values */

snd_pcm_hw_params_current(self->handle, hwparams);

I've tried all sorts of other functions (like snd_pcm_hw_params_set_periods_min() and snd_pcm_hw_params_set_periods_max()) with no luck.
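
As a sanity check on the reported numbers, the periods per buffer and the buffer time can be derived from the buffer size, period size and rate alone. This is plain arithmetic rather than anything pyalsaaudio provides; the function name derive_buffer_layout is hypothetical, and the buffer size of 640 frames used in the usage note is an illustrative value, not one I measured.

```python
def derive_buffer_layout(buffer_size, period_size, rate):
    """Return (periods, buffer_time_us) implied by sizes in frames and a rate in Hz."""
    if period_size <= 0 or rate <= 0:
        raise ValueError("period_size and rate must be positive")
    # Number of periods that fit into the buffer.
    periods = buffer_size / period_size
    # Duration of the whole buffer in microseconds.
    buffer_time_us = buffer_size * 1_000_000 / rate
    return periods, buffer_time_us
```

For example, `derive_buffer_layout(640, 160, 44100)` gives 4 periods and a buffer time of roughly 14.5 milliseconds, which is nowhere near the 100 seconds of audio that the recording apparently retains.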

一指流沙 2025-02-14 10:16:05

The function snd_pcm_drop allows you to drop the contents of the buffer. This function is already available from pyalsaaudio as the drop method for a PCM device.

After:

#big delay after first data read
            time.sleep(100)

you can simply add

            inp.drop()

All input that arrived before calling drop() will be ignored. (But there is still some sound from the start of the script in the script's own data variable.)
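
The same idea can be applied conditionally, so that drop() is only called when the gap since the previous read was too long. The following is a rough sketch, not a tested recipe: read_fresh and max_gap are hypothetical names, and the only things assumed about the pcm object are the read() and drop() methods mentioned above.

```python
import time


def read_fresh(pcm, last_read_time, max_gap, clock=time.monotonic):
    """Read from pcm, discarding buffered input first if the last read was too long ago.

    Returns (length, data, read_time, dropped).
    """
    now = clock()
    dropped = False
    if last_read_time is not None and now - last_read_time > max_gap:
        # Too much time has passed: throw away everything captured in the gap.
        pcm.drop()
        dropped = True
    length, data = pcm.read()
    return length, data, clock(), dropped
```

The audio captured during the gap is lost on purpose, which trades completeness for low latency. Right after a drop(), a non-blocking read may return no data yet, so the caller should still handle a length of 0.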

More subtle solutions seem possible, but would require adding snd_pcm_forward and perhaps snd_pcm_forwardable to the pyalsaaudio interface.

Here is the complete modified script I used for analysis and testing. (I shortened the big delay to 4 seconds.) I also used soundfile for WAV file creation, as Audacity wasn't happy with the other method of creating WAV files.

import time
import alsaaudio
import numpy as np
import struct
import soundfile as sf

conversion_dicts = {
        alsaaudio.PCM_FORMAT_S16_LE: {'dtype': np.int16, 'endianness': '<', 'formatchar': 'h', 'bytewidth': 2},
}

def get_conversion_string(audioformat, noofsamples):
    conversion_dict = conversion_dicts[audioformat]
    conversion_string = f"{conversion_dict['endianness']}{noofsamples}{conversion_dict['formatchar']}"
    return conversion_string

device = 'default'
fs = 44100

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 
    channels=1, rate=fs, format=alsaaudio.PCM_FORMAT_S16_LE, 
    periodsize=160, device=device)

print(inp.info())

f = sf.SoundFile("test.wav", 'wb', samplerate=fs, channels=1)

dtype = np.int16 

loops_with_data = 3000 #3000*160/44100 = 10.9 seconds of audio
first_time = True

while loops_with_data > 0:
    # Read data from device
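    l, rawdata = inp.read()

    if l > 0:
        # Convert only when frames were returned: l is 0 when no data is
        # available and negative (e.g. -EPIPE) on overrun, which would
        # produce an invalid struct format string
        conversion_string = get_conversion_string(alsaaudio.PCM_FORMAT_S16_LE, l)
        data = np.array(struct.unpack(conversion_string, rawdata), dtype=dtype)
        print(f"\r{loops_with_data:4}", end='')
        f.write(data)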
        if first_time:
            #big delay after first data read
            time.sleep(4)
            inp.drop()
            first_time = False
        else:
            #smaller delay otherwise, still longer than one period length
            time.sleep(.01)
        loops_with_data-=1
    else:
        print(".", end='')
        
f.close()
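
If pulling in numpy and soundfile is not desirable, a different technique is to write the raw bytes with the standard-library wave module, which also produces a proper WAV header. This is an alternative I am sketching here, not what the script above does; the function name write_wav is hypothetical, and it assumes the chunks are raw S16_LE bytes as returned by read() (WAV stores 16-bit samples as little-endian signed integers, so no conversion is needed).

```python
import wave


def write_wav(path, chunks, rate=44100, channels=1, sample_width=2):
    """Write raw little-endian PCM byte chunks to a WAV file with a proper header."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample, 2 for S16_LE
        wav.setframerate(rate)
        for chunk in chunks:
            # The header's frame count is fixed up when the file is closed.
            wav.writeframes(chunk)
```

In the loop above, the rawdata values from successful reads could be collected in a list and passed as chunks once recording is finished.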
