Audio frame not converting to ndarray

Posted on 2025-02-11 03:57:22

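I am trying to run a Colab notebook that trains OpenAI's Jukebox, but when the function that loads the audio runs, I get this error:

File "/content/jukebox/jukebox/data/files_dataset.py", line 82, in get_song_chunk
data, sr = load_audio(filename, sr=self.sr, offset=offset, duration=self.sample_length)
File "/content/jukebox/jukebox/utils/io.py", line 48, in load_audio
frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
AttributeError: 'list' object has no attribute 'to_ndarray'

It seems to be treating frame as a list, which looks like this when printed:

[<av.AudioFrame 0, pts=None, 778 samples at 22050Hz, stereo, fltp at 0x7fd03dd64150>]

When I try to change to frame = resampler.resample(frame) I get this error:

TypeError: 'av.audio.frame.AudioFrame' object cannot be interpreted as an integer

I don't know much about audio files, so I'm not sure how to debug this and would appreciate help.

The full code to load the audio is below.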

def load_audio(file, sr, offset, duration, resample=True, approx=False, time_base='samples', check_duration=True):
    if time_base == 'sec':
        offset = offset * sr
        duration = duration * sr
    # Loads at target sr, stereo channels, seeks from offset, and stops after duration
    container = av.open(file)
    audio = container.streams.get(audio=0)[0] # Only first audio stream
    audio_duration = audio.duration * float(audio.time_base)
    if approx:
        if offset + duration > audio_duration*sr:
            # Move back one window. Cap at audio_duration
            offset = np.min(audio_duration*sr - duration, offset - duration)
    else:
        if check_duration:
            assert offset + duration <= audio_duration*sr, f'End {offset + duration} beyond duration {audio_duration*sr}'
    if resample:
        resampler = av.AudioResampler(format='fltp',layout='stereo', rate=sr)
    else:
        assert sr == audio.sample_rate
    offset = int(offset / sr / float(audio.time_base)) #int(offset / float(audio.time_base)) # Use units of time_base for seeking
    duration = int(duration) #duration = int(duration * sr) # Use units of time_out ie 1/sr for returning
    sig = np.zeros((2, duration), dtype=np.float32)
    container.seek(offset, stream=audio)
    total_read = 0
    for frame in container.decode(audio=0): # Only first audio stream
        if resample:
            frame.pts = None
            frame = resampler.resample(frame)
        frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
        read = frame.shape[-1]
        if total_read + read > duration:
            read = duration - total_read
        sig[:, total_read:total_read + read] = frame[:, :read]
        total_read += read
        if total_read == duration:
            break
    assert total_read <= duration, f'Expected {duration} frames, got {total_read}'
    return sig, sr

南街女流氓 2025-02-18 03:57:22

If your variable frame is being interpreted as a list, you can replace frame = resampler.resample(frame) with frame = resampler.resample(frame)[0]. Once I made that edit, your code ran without error.
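As a hedged sketch of why indexing helps: the output printed in the question shows resample returning a list, while older PyAV releases, as far as I know, returned a single frame or None. The helper below, whose name as_frame_list is purely illustrative and not part of PyAV, normalizes all three cases so the loop can handle zero, one, or several frames.

```python
def as_frame_list(result):
    """Normalize a resampler result to a list of frames.

    Assumption: result is None, a single frame, or a list/tuple of frames.
    """
    if result is None:
        return []
    if isinstance(result, (list, tuple)):
        return list(result)
    return [result]


# Inside load_audio, the resample step could then look like this
# (mirrors the calls used in the question, not tested against every PyAV version):
#
#     for out in as_frame_list(resampler.resample(frame)):
#         arr = out.to_ndarray(format='fltp')
#         ...copy arr into sig as before...
```

Unlike a bare [0], this does not raise IndexError on an empty list and does not drop any extra frames.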

極樂鬼 2025-02-18 03:57:22

Try replacing frame = frame.to_ndarray(format='fltp') with a direct assignment of the variable frame:

import numpy as np

#frame = frame.to_ndarray(format='fltp') #Original line
frame = np.ndarray(frame)

If you want it to be a specific data type, you can change the dtype argument of the ndarray function:

frame = np.ndarray(frame, dtype=np.float32)
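One caveat, offered as a sketch rather than a definitive diagnosis: NumPy's np.ndarray constructor takes a shape as its first argument, not data to convert, so passing a frame object raises a TypeError much like the one quoted in the question. The snippet below uses a hypothetical FakeFrame class in place of av.AudioFrame so that it runs without PyAV; frame_to_array is likewise an illustrative name.

```python
import numpy as np


class FakeFrame:
    """Hypothetical stand-in for av.AudioFrame, so the sketch runs without PyAV."""

    def __init__(self, samples):
        self._samples = np.asarray(samples, dtype=np.float64)

    def to_ndarray(self):
        return self._samples


def frame_to_array(frame, dtype=np.float32):
    # Ask the frame for its samples, then cast to the wanted dtype.
    return np.asarray(frame.to_ndarray(), dtype=dtype)


frame = FakeFrame([[0.0, 0.5], [1.0, -1.0]])

# np.ndarray(frame) tries to read the frame as a shape and fails.
try:
    np.ndarray(frame)
    constructor_failed = False
except TypeError:
    constructor_failed = True

samples = frame_to_array(frame)
```

The point is that the conversion has to go through the frame's own to_ndarray method; the dtype cast can then be done with np.asarray.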

屌丝范 2025-02-18 03:57:22

Try: frame = frame[0].to_ndarray(format='fltp')
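Taking only element [0] assumes the list holds exactly one frame. A minimal sketch of the copy step that accepts any number of per-frame arrays is shown below; copy_frames and the sample data are hypothetical, standing in for the arrays that to_ndarray would produce for each returned frame.

```python
import numpy as np


def copy_frames(arrays, duration, channels=2):
    """Copy arrays of shape (channels, n) into one buffer of shape (channels, duration)."""
    sig = np.zeros((channels, duration), dtype=np.float32)
    total_read = 0
    for arr in arrays:
        # Never copy past the end of the buffer.
        read = min(arr.shape[-1], duration - total_read)
        sig[:, total_read:total_read + read] = arr[:, :read]
        total_read += read
        if total_read == duration:
            break
    return sig, total_read


# Hypothetical data: two frames of 3 and 4 samples, buffer of 5 samples.
chunks = [
    np.ones((2, 3), dtype=np.float32),
    2 * np.ones((2, 4), dtype=np.float32),
]
sig, total_read = copy_frames(chunks, duration=5)
```

This follows the same truncation logic as the loop in the question, just applied to every frame instead of the first one.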

About the author

阳光下的泡沫是彩色的
