Explanation of vector quantization in speech processing

Posted 2024-08-21 21:03:41

I'm having trouble determining from this research paper exactly how I can reproduce the Standard Vector Quantization algorithm to determine the language of an unidentified speech input, based on a training set of data. Here's some basic info:

Abstract info
Language recognition (e.g. Japanese, English, German, etc.) using acoustic features is an important yet difficult problem for current speech technology. ... The speech data base used in this paper contains 20 languages: 16 sentences uttered twice by 4 males and 4 females. The duration of each sentence is about 8 seconds. The first algorithm is based on the standard Vector Quantization (VQ) technique. Every language is characterized by its own VQ codebook, $alt text$ .

Recognition Algorithms
The first algorithm is based on the standard Vector Quantization (VQ) technique. Every language, k, is characterized by its own VQ codebook, $alt text$ . In the recognition stage, input speech is quantized by $alt text$ and the accumulated quantization distortion, d_k, is calculated. The language that has the minimal distortion is recognized. To calculate VQ distortion, several LPC spectral distortion measures are applied...in this case, the WLR (weighted likelihood ratio) distance.
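The recognition stage described above can be sketched roughly as follows. This is a minimal sketch, not the paper's implementation: the function names (`sq_euclidean`, `accumulated_distortion`, `recognize`) are hypothetical, feature vectors are plain lists of floats, and squared Euclidean distance stands in for WLR. Summing the nearest-codeword distances over all frames is an assumed reading of "accumulated quantization distortion", since the paper's exact formula is not reproduced here.

```python
def sq_euclidean(x, y):
    """Stand-in distance (assumption); the paper uses WLR instead."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def accumulated_distortion(frames, codebook, dist=sq_euclidean):
    """Quantize each frame to its nearest codeword and sum the distortions."""
    return sum(min(dist(x, c) for c in codebook) for x in frames)


def recognize(frames, codebooks, dist=sq_euclidean):
    """Return the language whose codebook gives the smallest d_k.

    `codebooks` maps a language label k to a list of codewords.
    """
    scores = {k: accumulated_distortion(frames, cb, dist)
              for k, cb in codebooks.items()}
    return min(scores, key=scores.get), scores


# Toy 2-D example with made-up codebooks of two codewords each.
codebooks = {
    "english": [[0.0, 0.0], [1.0, 1.0]],
    "german": [[5.0, 5.0], [6.0, 6.0]],
}
frames = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.2]]
best, scores = recognize(frames, codebooks)
```

Any distance function that takes two feature vectors could be passed as `dist`, so a WLR routine could replace the stand-in without changing the rest of the sketch.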

Standard VQ Algorithm:
A codebook, $alt text$, for each language is generated using training sentences. The accumulated distance for input vector in sentence, ![alt text][4], is defined as: [![alt text][5]][5]

The distance d can be any distance which corresponds to the acoustic features and it must be the same as the one used for codebook generation. Each language is characterized by its VQ codebook, $alt text$ .
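Codebook generation with a chosen distance can be sketched as a Lloyd-style (k-means) loop. This is only an illustration under stated assumptions: the source does not say which training procedure the paper uses, the names `train_codebook` and `mean_vector` are hypothetical, squared Euclidean distance stands in for WLR, and the arithmetic mean is used as the centroid, which is the right update for squared Euclidean distance but may not be for WLR.

```python
import random


def sq_euclidean(x, y):
    """Stand-in distance (assumption); swap in WLR to follow the paper."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_vector(vectors):
    """Centroid for squared Euclidean distance; WLR may need another rule."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_codebook(frames, size=16, dist=sq_euclidean, iters=20, seed=0):
    """Lloyd-style (k-means) codebook training on one language's frames."""
    rng = random.Random(seed)
    # Initialise the codewords with randomly chosen training frames.
    codebook = [list(v) for v in rng.sample(frames, size)]
    for _ in range(iters):
        # Assignment step: group each frame with its nearest codeword.
        cells = [[] for _ in codebook]
        for x in frames:
            nearest = min(range(size), key=lambda i: dist(x, codebook[i]))
            cells[nearest].append(x)
        # Update step: move each codeword to the centroid of its cell;
        # an empty cell keeps its old codeword.
        codebook = [mean_vector(cell) if cell else old
                    for cell, old in zip(cells, codebook)]
    return codebook


# Toy data: 2-D "frames" scattered around four made-up centres.
rng = random.Random(1)
centres = [(0, 0), (0, 10), (10, 0), (10, 10)]
frames = [[cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)]
          for cx, cy in centres for _ in range(50)]
codebook = train_codebook(frames, size=4)
```

For the setup in the question, `frames` would presumably hold the per-frame feature vectors pooled from all 50 English training sentences, with `size=16`, and the same `dist` would then be reused when scoring an unknown input against the codebook.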

My question is, how exactly do I do this? I have a set of 50 sentences in English. In MATLAB, I can easily calculate the WLR for any given signal. But how do I build a codebook, given that I must use the WLR for "codebook generation" for English? I'm also curious how to compare a VQ codebook of size 16 (which was found to be the best size) to a given input signal. If anyone could help distill this paper down for me, I'd appreciate it greatly.

Thanks!
