Filtering audio signals in TensorFlow
I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation I want to apply is a RIR (room impulse response). I am working with Python 3.9.5 and TensorFlow 2.8.
In Python, if the RIR is given as a finite impulse response (FIR) of n taps, the standard way to do this is with SciPy's lfilter:
import numpy as np
from scipy import signal
import soundfile as sf
h = np.load("rir.npy")
x, fs = sf.read("audio.wav")
y = signal.lfilter(h, 1, x)
Running this in a loop over all the files may take a long time. Doing it with the TensorFlow map utility on TensorFlow datasets:
# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label
# apply it via TF map on dataset
aug_ds = ds.map(h_filt)
Using tf.numpy_function:
tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])
# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)
I have two questions:
- Is this approach correct and fast enough (less than a minute for 50,000 files)?
- Is there a faster way to do it? E.g. replace the SciPy function with a built-in TensorFlow function. I didn't find an equivalent of lfilter or SciPy's convolve.
Here is one way you could do it.
Notice that the TensorFlow function is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the size of the batch, I the number of input channels, F the filter width, L the input width and O the number of output channels. Using padding='SAME', it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).
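The shapes described above match tf.nn.conv1d, so the following is a minimal sketch assuming that is the function meant. It is not the answer's original code. Two details are assumptions made here to reproduce signal.lfilter(h, 1, x): the taps are reversed, because tf.nn.conv1d computes a cross-correlation rather than a convolution, and the input is left-padded with F - 1 zeros and run with padding='VALID', because padding='SAME' centers the filter instead of making it causal. The helper name fir_filter_tf and the random stand-in data are hypothetical.

```python
import numpy as np
import tensorflow as tf


def fir_filter_tf(x, h):
    """Causal FIR filtering of a batch of mono signals.

    x: float32 tensor of shape (N, L); h: float32 tensor of shape (F,)
    with a statically known length. Returns a tensor of shape (N, L).
    """
    x = tf.convert_to_tensor(x, tf.float32)
    h = tf.convert_to_tensor(h, tf.float32)
    num_taps = h.shape[0]  # assumes the tap count is known statically
    # conv1d does not flip the kernel, so reverse the taps ourselves.
    # Kernel shape is (F, I, O) with I = O = 1 for mono audio.
    kernel = tf.reshape(tf.reverse(h, axis=[0]), [-1, 1, 1])
    # Left-pad with F - 1 zeros so output sample n only sees x[n], x[n-1], ...
    x_padded = tf.pad(x, [[0, 0], [num_taps - 1, 0]])
    y = tf.nn.conv1d(x_padded[..., tf.newaxis], kernel, stride=1, padding="VALID")
    return y[..., 0]


# Hypothetical stand-in data: 4 clips of equal length and a 16-tap response.
rng = np.random.default_rng(0)
h = rng.standard_normal(16).astype(np.float32)  # in practice: np.load("rir.npy")
audio = rng.standard_normal((4, 200)).astype(np.float32)
labels = np.array(["a", "b", "c", "d"])

h_tf = tf.constant(h)
ds = tf.data.Dataset.from_tensor_slices((audio, labels)).batch(2)
# The filter runs inside the tf.data graph, so no tf.numpy_function is needed.
aug_ds = ds.map(
    lambda a, l: (fir_filter_tf(a, h_tf), l),
    num_parallel_calls=tf.data.AUTOTUNE,
)
filtered = np.concatenate([a.numpy() for a, _ in aug_ds])
```

Batching before the map lets one conv1d call process many clips at once, but it requires clips of equal length (or padding them to a common length first). For very long impulse responses an FFT-based convolution may be faster than a direct one; that is not measured here.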