在张量流中过滤音频信号

发布于 2025-01-20 21:50:06 字数 1420 浏览 0 评论 0原文

我正在建立一个基于音频的深度学习模型。作为预访问的一部分,我想在数据集中增加音频。我想做的一种增强是应用RIR(房间脉冲响应)功能。我正在使用Python 3.9.5Tensorflow 2.8

在Python中,这样做的标准方法是,如果RIR作为 n taps的有限脉冲响应(FIR)给出,则使用 scipy lfilter

import numpy as np
from scipy import signal
import soundfile as sf

h = np.load("rir.npy")
x, fs = sf.read("audio.wav")

y = signal.lfilter(h, 1, x)

在所有文件上循环运行可能需要很长时间。使用TensorFlow MAP TensorFlow数据集上的实用程序:

# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label

# apply it via TF map on dataset
aug_ds = ds.map(h_filt)

使用tf.numpy_function

tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])

# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)

我有两个问题:

  1. 这种方式正确且足够快(对于50,000个文件少于一分钟) ?
  2. 有更快的方法吗?例如,用内置的TensForflow功能替换Scipy函数。我没有找到等效的lfilter Scipy的卷曲

I am building an audio-based deep learning model. As part of the preporcessing I want to augment the audio in my datasets. One augmentation that I want to do is to apply RIR (room impulse response) function. I am working with Python 3.9.5 and TensorFlow 2.8.

In Python the standard way to do it is, if the RIR is given as a finite impulse response (FIR) of n taps, is using SciPy lfilter

import numpy as np
from scipy import signal
import soundfile as sf

h = np.load("rir.npy")
x, fs = sf.read("audio.wav")

y = signal.lfilter(h, 1, x)

Running in loop on all the files may take a long time. Doing it with TensorFlow map utility on TensorFlow datasets:

# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label

# apply it via TF map on dataset
aug_ds = ds.map(h_filt)

Using tf.numpy_function:

tf_h_filt = tf.numpy_function(h_filt, [audio, label], [tf.float32, tf.string])

# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)

I have two questions:

  1. Is this way correct and fast enough (less than a minute for 50,000 files)?
  2. Is there a faster way to do it? E.g. replace the SciPy function with a built-in TensforFlow function. I didn't find the equivalent of lfilter or SciPy's convolve.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

夜灵血窟げ 2025-01-27 21:50:06

这是您可以执行的一种方法

请注意,张量流函数设计用于接收具有多个通道的批量输入,并且过滤器可以具有多个输入通道和多个输出通道。令 N 为批次大小 I、输入通道数、F 滤波器宽度、L输入宽度和输出通道数。使用 padding='SAME' 它映射形状 (N, L, I) 的输入和形状 (F, I, O) 的过滤器code> 到形状 (N, L, O) 的输出。

import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

# h
y_lfilt = signal.lfilter(h, 1, x)

# Since the denominator of your filter transfer function is 1
# the output of lfiler matches the convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1), 
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1), # a 1x1 matrix of filters
    stride=1, 
    padding='SAME', 
    data_format='NWC')

assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])

Here is one way you could do

Notice that tensor flow function is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the size of the batch I, the number of input channels, F the filter width, L the input width and O the number of output channels. Using padding='SAME' it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).

import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

# h
y_lfilt = signal.lfilter(h, 1, x)

# Since the denominator of your filter transfer function is 1
# the output of lfiler matches the convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1), 
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1), # a 1x1 matrix of filters
    stride=1, 
    padding='SAME', 
    data_format='NWC')

assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文