Vector Quantization in Speech Processing

Posted on 2024-08-21 21:03:41

I'm having trouble determining from this research paper exactly how I can reproduce the Standard Vector Quantization algorithm to determine the language of an unidentified speech input, based on a training set of data. Here's some basic info:

Abstract info
Language recognition (e.g. Japanese, English, German, etc.) using acoustic features is an important yet difficult problem for current speech technology. ... The speech database used in this paper contains 20 languages: 16 sentences uttered twice by 4 males and 4 females. The duration of each sentence is about 8 seconds. The first algorithm is based on the standard Vector Quantization (VQ) technique. Every language is characterized by its own VQ codebook, V_k.

Recognition Algorithms
The first algorithm is based on the standard Vector Quantization (VQ) technique. Every language, k, is characterized by its own VQ codebook, V_k. In the recognition stage the input speech is quantized by each V_k and the accumulated quantization distortion, d_k, is calculated. The language with the minimal distortion is recognized. To calculate the VQ distortion, several LPC spectral distortion measures are applied... in this case, the WLR (weighted likelihood ratio) distance:

[WLR distance formula image not reproduced]
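
For orientation only, since the formula is not reproduced in this excerpt: the weighted likelihood ratio distortion between two LPC-analysed frames x and y is commonly written in terms of their cepstral coefficients c_i and autocorrelation coefficients r_i as

d_WLR(x, y) = Σ_{i=1..p} (c_i(x) - c_i(y)) * (r_i(x) - r_i(y)),

with p the analysis order; check the paper for the exact form it uses.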

Standard VQ Algorithm:
A codebook, V_k, for each language is generated using the training sentences. The accumulated distance d_k for the input vectors x_i of a sentence is defined as:

d_k = Σ_i min_j d(x_i, V_k_j)

The distance d can be any distance that corresponds to the acoustic features, and it must be the same as the one used for codebook generation. Each language is characterized by its VQ codebook, V_k.
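
To make the recognition stage concrete, here is a minimal MATLAB sketch under my own assumptions about the data layout (the function name vq_recognize, the cell array codebooks and the frames matrix are all hypothetical, not from the paper); dist_fn stands for whatever WLR implementation you already have:

```matlab
% Hypothetical sketch (names are mine, not from the paper).
% codebooks{k} : M-by-p matrix whose rows are the M code vectors of language k
% frames       : T-by-p matrix of feature vectors from the input sentence
% dist_fn      : handle to the frame distance, e.g. your WLR implementation
function lang = vq_recognize(frames, codebooks, dist_fn)
    K = numel(codebooks);
    d = zeros(K, 1);                      % accumulated distortion d_k per language
    for k = 1:K
        M = size(codebooks{k}, 1);
        for t = 1:size(frames, 1)
            dists = zeros(M, 1);          % distance of frame t to every code vector
            for j = 1:M
                dists(j) = dist_fn(frames(t, :), codebooks{k}(j, :));
            end
            d(k) = d(k) + min(dists);     % quantize frame t, accumulate min distortion
        end
    end
    [~, lang] = min(d);                   % language with minimal accumulated distortion
end
```

The inner min over j is the quantization step, and summing those minima over the frames gives the accumulated distortion d_k described above.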

My question is, how exactly do I do this? I have a set of 50 sentences in English. In MATLAB, I can easily calculate the WLR for any given signal. But how do I formulate a codebook, since I must use the WLR for "codebook generation" for English? I'm also curious as to how to compare a VQ codebook of size 16 (which was found to be the best size) to a given input signal. If anyone could help distill this paper down for me, I'd appreciate it greatly.

Thanks!

Comments (1)

dawn曙光 2024-08-28 21:03:41

The second question (comparing the codebook to a given signal) is the easier one: for each codebook entry V_k_j you must calculate the distance d to the input signal. The 'j' with the smallest distance 'd' will correspond to the best-fitting codebook entry. As the distance function you can use the WLR.
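
As a rough MATLAB sketch of that per-frame lookup (names are mine; dist_fn is assumed to be your WLR function):

```matlab
% Hypothetical sketch: best-fitting codebook entry for one feature vector x.
% codebook is an M-by-p matrix of code vectors V_k_j, dist_fn is the WLR distance.
function [j_best, d_best] = nearest_code(x, codebook, dist_fn)
    d_best = inf;
    j_best = 0;
    for j = 1:size(codebook, 1)
        dj = dist_fn(x, codebook(j, :));
        if dj < d_best
            d_best = dj;                  % smallest distance so far
            j_best = j;                   % index of the best-fitting entry
        end
    end
end
```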

Building the codebook (training) is a bit more complicated. You must divide your sentences into vectors of length N (16) and then use some clustering algorithm (like k-means) to cluster these vectors. Then find the mean of every cluster. These means will be the codebook entries. That is the first thing that comes to mind.
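
A minimal MATLAB sketch of that training loop, under my own assumptions (train_codebook and its arguments are made up for illustration; plain Euclidean k-means is used here, whereas the question's excerpt says the codebook-generation distance should match the recognition distance, so you may want to substitute the WLR):

```matlab
% Hypothetical sketch of codebook generation by plain k-means.
% feats  : N-by-p matrix, one training feature vector per row (all sentences pooled)
% M      : codebook size (16 in the question)
% n_iter : number of k-means iterations
function codebook = train_codebook(feats, M, n_iter)
    N = size(feats, 1);
    codebook = feats(randperm(N, M), :);       % initialise with M random training vectors
    for it = 1:n_iter
        % assignment step: nearest code vector for every training vector (Euclidean here;
        % swap this for the WLR distance to match the paper's requirement)
        assign = zeros(N, 1);
        for i = 1:N
            diffs = codebook - feats(i, :);    % implicit expansion (MATLAB R2016b+)
            [~, assign(i)] = min(sum(diffs .^ 2, 2));
        end
        % update step: each code vector becomes the mean of its cluster
        for j = 1:M
            members = feats(assign == j, :);
            if ~isempty(members)
                codebook(j, :) = mean(members, 1);
            end
        end
    end
end
```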

Another algorithm (which, I believe, will work better) can be found here.
Also, two simple training algorithms are described on Wikipedia.
