Using Octave, construct an algorithm that cuts off pauses in a speech signal

Posted on 2025-01-17 02:09:29

Can you please explain how to do this? (Using Octave, construct an algorithm that cuts off pauses for a speech signal)

The audio file: https://www.dropbox.com/s/34ait9wo4b1j1ld/test1.ogg?dl=1

Here is my plan:

  1. Read the audio file
  2. Create a filter kernel to filter out high-frequency noise and "peaks" in the signal, especially during pauses
  3. Apply this filter using convolution
  4. Set a threshold for the filtered audio signal: pauses to 0, the rest to 1.
  5. By combining the original and filtered signals, construct a new signal without pauses.

The issue is that I don't know how to start / do it in Octave. I only know a bit of theory.

┾廆蒐ゝ 2025-01-24 02:09:29

A partial answer, to get you started.

For step 1, Octave's basic audio processing features are described in the manual.

To check your file, use the audioinfo command (this example assumes the .ogg file is in Octave's current working directory):

info = audioinfo ('test1.ogg')

which should give you a struct with the audio metadata (compression method, number of channels, sample rate, etc.).
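As a rough illustration of reading individual fields from that struct, here is a minimal sketch. It writes a synthetic one-second tone to a temporary .wav file (a stand-in, so the example does not depend on test1.ogg); the 8000 Hz rate and the 440 Hz tone are assumptions made only for this example.

```octave
% Write a short synthetic tone so the example does not depend on test1.ogg.
fs = 8000;                          % assumed sample rate in Hz
t = (0:fs-1)' / fs;                 % one second of sample times
fname = [tempname() '.wav'];        % hypothetical temporary file name
audiowrite (fname, 0.5 * sin (2*pi*440*t), fs);

% Query the metadata and read a few fields from the returned struct.
info = audioinfo (fname);
printf ("channels: %d, rate: %d Hz, duration: %.2f s\n", ...
        info.NumChannels, info.SampleRate, info.Duration);

delete (fname);                     % clean up the temporary file
```

With your own recording you would pass 'test1.ogg' instead of fname.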

To read the file, use the audioread command:

[y, fs] = audioread ('test1.ogg');

where y is a column vector if you have a single channel (or a matrix with one column per channel if there are several), and fs is the sampling frequency in Hz.
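If the recording turns out to have more than one channel, one common option (an assumption here, not something the question asks for) is to average the channels into a single column before filtering. A small sketch with a synthetic stereo file standing in for the real recording:

```octave
% Build a synthetic two-channel file as a stand-in for a real recording.
fs = 8000;                                   % assumed sample rate in Hz
t = (0:fs-1)' / fs;
stereo = [0.5*sin(2*pi*440*t), 0.5*sin(2*pi*660*t)];
fname = [tempname() '.wav'];                 % hypothetical temporary file name
audiowrite (fname, stereo, fs);

[y, fs_read] = audioread (fname);            % y has one column per channel
delete (fname);

y_mono = mean (y, 2);                        % average across channels
t_axis = (0:rows (y_mono) - 1)' / fs_read;   % time of each sample in seconds
printf ("%d samples, %d channel(s) before mixdown\n", rows (y), columns (y));
```

The t_axis vector is handy for plotting, for example plot (t_axis, y_mono), to see where the pauses are.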

To play the audio, use:

player = audioplayer (y, fs); 
play (player);

For steps 2 and 3 you'll need the signal processing package; see how to install packages here.

I understand you want a low-pass filter, such as an FIR filter or a Butterworth filter. Both design functions produce a low-pass filter by default.
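For reference, installing from Octave Forge usually looks like the following. This assumes network access and a working build environment; some systems ship the package through their own package manager instead, and the signal package may pull in the control package as a dependency.

```octave
% One-time download and install from Octave Forge.
pkg install -forge signal

% Needed in every new session before calling butter or fir1.
pkg load signal
```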

For example, for an n-th order Butterworth filter with cutoff pi*Wc rad/sample (Wc lies between 0 and 1, where 1 is the Nyquist frequency), create and apply the filter:

[b, a] = butter (n, Wc);
yf = filter (b, a, y);
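To make the parameters concrete, here is a sketch with made-up numbers: a 4th-order filter with a 1000 Hz cutoff at an 8000 Hz sample rate, applied to a synthetic mix of a 200 Hz and a 3000 Hz tone. The order, cutoff, and test signal are illustrative assumptions, not values tuned for speech.

```octave
pkg load signal                      % provides butter

fs = 8000;                           % assumed sample rate in Hz
t = (0:fs-1)' / fs;
low  = sin (2*pi*200*t);             % component we want to keep
high = sin (2*pi*3000*t);            % component we want to suppress
y = low + high;

n  = 4;                              % filter order (illustrative)
fc = 1000;                           % cutoff in Hz (illustrative)
Wc = fc / (fs/2);                    % normalized cutoff: 1 is the Nyquist frequency
[b, a] = butter (n, Wc);
yf = filter (b, a, y);

% Filter the 3 kHz tone alone to see how much of it survives,
% skipping the first samples where the filter is still settling.
high_f = filter (b, a, high);
printf ("rms of 3 kHz tone after filtering: %.4f\n", ...
        sqrt (mean (high_f(1001:end).^2)));
```

Converting from Hz with fc / (fs/2) is the step that is easiest to get wrong, since butter expects the normalized value.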

Whereas for an n-th order FIR filter with cutoff pi*Wc:

b = fir1 (n, Wc);
yf = filter (b, 1, y);
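A comparable sketch for the FIR case, again with illustrative numbers (order 64, 1000 Hz cutoff, 8000 Hz sample rate, synthetic two-tone input). fir1 returns n+1 coefficients and uses a Hamming window unless told otherwise.

```octave
pkg load signal                      % provides fir1

fs = 8000;                           % assumed sample rate in Hz
n  = 64;                             % filter order (illustrative)
fc = 1000;                           % cutoff in Hz (illustrative)
Wc = fc / (fs/2);                    % normalized cutoff: 1 is the Nyquist frequency
b  = fir1 (n, Wc);                   % n+1 symmetric taps, Hamming window by default

t = (0:fs-1)' / fs;
high = sin (2*pi*3000*t);            % tone well above the cutoff
y = sin (2*pi*200*t) + high;
yf = filter (b, 1, y);

% A symmetric FIR filter delays every frequency by n/2 samples.
delay = n / 2;
high_f = filter (b, 1, high);
printf ("taps: %d, delay: %d samples, rms of 3 kHz tone: %.4f\n", ...
        numel (b), delay, sqrt (mean (high_f(n+1:end).^2)));
```

The constant delay matters later: if you threshold yf to find pauses, the mask is shifted by n/2 samples relative to the original y.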

I guess that for FIR filters both filter and conv work similarly, but for using conv you need to take the input's polynomial coefficients (see here).
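A quick way to check the relationship yourself: for an FIR kernel b, filter (b, 1, y) gives the first numel (y) samples of the full convolution of y with b. The kernel and signal below are arbitrary small examples.

```octave
b = [0.25 0.5 0.25];                 % small illustrative FIR kernel
y = [1 2 3 4 5 6]';                  % short test signal

yf = filter (b, 1, y);               % same length as y
yc = conv (y(:), b(:));              % full convolution: numel(y) + numel(b) - 1 samples
yc_trim = yc(1:numel (y));           % drop the tail to match filter's output

max_diff = max (abs (yf(:) - yc_trim));
printf ("largest difference: %g\n", max_diff);
```

So step 3 of the plan can use either function; conv just returns numel (b) - 1 extra samples at the end.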

There are many tricky details, which depend strongly on your goals and on what your data looks like. Things can be as complicated as described here, see this code.
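For steps 4 and 5 of the plan, one simple possibility is sketched below. It is only a starting point under stated assumptions: the input is synthetic (a tone standing in for speech, with low-level noise as the pause), the 20 ms window and the threshold of 10% of the peak envelope are arbitrary choices, and it smooths abs (y) rather than y itself, because a low-pass filtered waveform still crosses zero inside speech and cannot be thresholded directly.

```octave
fs = 8000;                                   % assumed sample rate in Hz
t = (0:fs/2-1)' / fs;                        % half a second of sample times
tone = 0.5 * sin (2*pi*300*t);               % stand-in for a speech segment
gap  = 0.001 * randn (fs/2, 1);              % low-level noise as the pause
y = [tone; gap; tone];

% Step 2/3: smooth the rectified signal with a moving-average kernel.
win = round (0.02 * fs);                     % 20 ms window (illustrative)
kernel = ones (win, 1) / win;
env = conv (abs (y), kernel, "same");        % amplitude envelope, aligned with y

% Step 4: threshold the envelope into a 0/1 mask.
thresh = 0.1 * max (env);                    % illustrative; tune per recording
mask = env > thresh;                         % 1 = keep, 0 = pause

% Step 5: keep only the original samples where the mask is 1.
y_cut = y(mask);

printf ("kept %d of %d samples\n", numel (y_cut), numel (y));
```

On real speech you would likely also want to ignore very short pauses and keep a little padding around each segment, so that words are not clipped.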
