Audio frames not converting to ndarray
I am trying to run a Colab notebook that trains OpenAI's Jukebox. However, when I run the function that loads the audio, I get an error:
  File "/content/jukebox/jukebox/data/files_dataset.py", line 82, in get_song_chunk
    data, sr = load_audio(filename, sr=self.sr, offset=offset, duration=self.sample_length)
  File "/content/jukebox/jukebox/utils/io.py", line 48, in load_audio
    frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
AttributeError: 'list' object has no attribute 'to_ndarray'
It seems to be interpreting the frame input as a list, which when printed looks like this:
[<av.AudioFrame 0, pts=None, 778 samples at 22050Hz, stereo, fltp at 0x7fd03dd64150>]
When I try to change to frame = resampler.resample(frame)
I get this error:
TypeError: 'av.audio.frame.AudioFrame' object cannot be interpreted as an integer
I don't know much about audio files, so I'm not sure how to debug this and would appreciate any help.
The full code to load the audio is below.
import av
import numpy as np

def load_audio(file, sr, offset, duration, resample=True, approx=False, time_base='samples', check_duration=True):
    if time_base == 'sec':
        offset = offset * sr
        duration = duration * sr
    # Loads at target sr, stereo channels, seeks from offset, and stops after duration
    container = av.open(file)
    audio = container.streams.get(audio=0)[0] # Only first audio stream
    audio_duration = audio.duration * float(audio.time_base)
    if approx:
        if offset + duration > audio_duration*sr:
            # Move back one window. Cap at audio_duration
            offset = np.min(audio_duration*sr - duration, offset - duration)
    else:
        if check_duration:
            assert offset + duration <= audio_duration*sr, f'End {offset + duration} beyond duration {audio_duration*sr}'
    if resample:
        resampler = av.AudioResampler(format='fltp',layout='stereo', rate=sr)
    else:
        assert sr == audio.sample_rate
    offset = int(offset / sr / float(audio.time_base)) #int(offset / float(audio.time_base)) # Use units of time_base for seeking
    duration = int(duration) #duration = int(duration * sr) # Use units of time_out ie 1/sr for returning
    sig = np.zeros((2, duration), dtype=np.float32)
    container.seek(offset, stream=audio)
    total_read = 0
    for frame in container.decode(audio=0): # Only first audio stream
        if resample:
            frame.pts = None
            frame = resampler.resample(frame)
        frame = frame.to_ndarray(format='fltp') # Convert to floats and not int16
        read = frame.shape[-1]
        if total_read + read > duration:
            read = duration - total_read
        sig[:, total_read:total_read + read] = frame[:, :read]
        total_read += read
        if total_read == duration:
            break
    assert total_read <= duration, f'Expected {duration} frames, got {total_read}'
    return sig, sr
Comments (3)
If your variable frame is interpreted as a list, you could replace frame = resampler.resample(frame) with frame = resampler.resample(frame)[0]. Your code ran without errors once I made this edit.
Try replacing
frame = frame.to_ndarray(format='fltp')
with a direct assignment of the variable frame. If you want it to be a specific data type, you can change the dtype argument of the ndarray function.
Try:
frame = frame[0].to_ndarray(format='fltp')
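The suggestions above take the first element of the list that resample() returns. If that list ever holds more than one frame, indexing with [0] would skip the rest. A minimal sketch of a more tolerant loop is below. It rests on an assumption, not on anything in the question: depending on the installed PyAV version, resample() may return a list of frames, a single frame, or None. The helper name iter_resampled and the stand-in classes are hypothetical and exist only so the sketch runs without PyAV or an audio file.

```python
from typing import Any, Iterable, Iterator


def iter_resampled(frames: Iterable[Any], resampler: Any) -> Iterator[Any]:
    """Yield every frame produced by resampler.resample(), one at a time.

    Assumption: resample() may return a list of frames, a single frame,
    or None, depending on the PyAV version in use.
    """
    for frame in frames:
        frame.pts = None  # same reset as in the question's loop
        out = resampler.resample(frame)
        if out is None:
            continue  # nothing was produced for this input frame
        if isinstance(out, (list, tuple)):
            yield from out  # keep all returned frames, not only out[0]
        else:
            yield out  # a single frame was returned directly


# Hypothetical stand-ins (not part of PyAV) used only to exercise the helper.
class FakeFrame:
    def __init__(self, name: str) -> None:
        self.name = name
        self.pts = 0


class ListResampler:
    def resample(self, frame: FakeFrame) -> list:
        return [FakeFrame(frame.name + "-a"), FakeFrame(frame.name + "-b")]


class SingleResampler:
    def resample(self, frame: FakeFrame) -> FakeFrame:
        return FakeFrame(frame.name + "-x")


class NoneResampler:
    def resample(self, frame: FakeFrame) -> None:
        return None


names = [f.name for f in iter_resampled([FakeFrame("f0"), FakeFrame("f1")], ListResampler())]
print(names)  # ['f0-a', 'f0-b', 'f1-a', 'f1-b']
```

In load_audio, the resample=True path could then iterate with for frame in iter_resampled(container.decode(audio=0), resampler): and leave the frame.to_ndarray(format='fltp') line and the copy into sig unchanged. The resample=False path would keep iterating over container.decode(audio=0) directly.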