How to calculate the covariance matrix of each cluster, e.g. from k-means?
I've searched everywhere, but I've only found how to create a covariance matrix from one vector to another vector, like cov(xi, xj). What confuses me is how to get a covariance matrix from a cluster. Each cluster has many vectors, so how do I combine them into one covariance matrix? Any suggestions?
Info:
Input: the vectors in a cluster, Xi = (x0, x1, ..., xt), where each vector is a column vector, e.g. x0 = {5 1 2 3 4}
(Actually these are MFCC feature vectors with 12 coefficients each. After clustering them with k-means into 8 clusters, I now want the covariance matrix of each cluster to use as that component's covariance matrix in a Gaussian Mixture Model.)
Output: an n x n covariance matrix, where n is the dimension of each vector
The question you are asking is: given a set of N points of dimension D (e.g. the points you initially clustered as "speaker1"), fit a D-dimensional Gaussian to those points (which we will call "the Gaussian that represents speaker1"). To do so, simply calculate the sample mean and sample covariance: http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Estimation_of_parameters or http://en.wikipedia.org/wiki/Sample_mean_and_covariance
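The sample-mean/sample-covariance recipe can be sketched in plain Python as follows. For N points x_1, ..., x_N of dimension D, mu = (1/N) * sum(x_i) and Sigma = 1/(N-1) * sum((x_i - mu)(x_i - mu)^T), which is a D x D matrix (12 x 12 for 12-coefficient MFCCs). This is a minimal sketch, not a definitive implementation: the function names are hypothetical, NumPy is deliberately avoided, and the unbiased 1/(N-1) normalization is an assumption (the maximum-likelihood estimate divides by N instead).

```python
from typing import List

Vector = List[float]


def sample_mean(points: List[Vector]) -> Vector:
    """Component-wise mean of N points of dimension D."""
    n = len(points)
    d = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(d)]


def sample_covariance(points: List[Vector]) -> List[List[float]]:
    """Unbiased D x D sample covariance (divides by N - 1).

    Entry [a][b] is the covariance between coefficient a and
    coefficient b across all points in the cluster.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate a covariance")
    mu = sample_mean(points)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[k] - mu[k] for k in range(d)]
        # accumulate the outer product (x - mu)(x - mu)^T
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b]
    return [[c / (n - 1) for c in row] for row in cov]


pts = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(sample_mean(pts))        # [3.0, 6.0]
print(sample_covariance(pts))  # [[4.0, 8.0], [8.0, 16.0]]
```

If NumPy is available, `numpy.cov(points, rowvar=False)` returns the same unbiased D x D estimate, treating each row of `points` as one vector.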
Repeat this for each of the other speakers (k=8 in total). I believe you may be able to remove your assumption of k=8 speakers by using a "non-parametric" stochastic process, or by modifying the algorithm (e.g. running it a few times on many speakers). Note that the standard k-means clustering algorithm (like other common algorithms such as EM) is very fickle: it will give you different answers depending on how you initialize it, so you may wish to apply appropriate regularization to penalize "bad" solutions as you discover them.
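Repeating the fit for every cluster and packaging the result as initial GMM parameters might look like the sketch below. Assumptions: `labels[i]` is the k-means cluster index of `vectors[i]`; each component's mixture weight is taken as the fraction of vectors assigned to its cluster, which is a common way to initialize a GMM from k-means but is not stated in the answer; `mean_and_cov` and `gmm_init_from_kmeans` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_and_cov(points: List[Vector]) -> Tuple[Vector, Matrix]:
    """Sample mean and unbiased (N - 1) sample covariance of one cluster."""
    n, d = len(points), len(points[0])
    mu = [sum(p[k] for p in points) / n for k in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov


def gmm_init_from_kmeans(
    vectors: List[Vector], labels: List[int]
) -> Dict[int, Tuple[float, Vector, Matrix]]:
    """Return {cluster: (weight, mean, covariance)} for every k-means cluster."""
    clusters: Dict[int, List[Vector]] = defaultdict(list)
    for v, lab in zip(vectors, labels):
        clusters[lab].append(v)
    total = len(vectors)
    params = {}
    for lab, pts in sorted(clusters.items()):
        if len(pts) < 2:
            raise ValueError(f"cluster {lab} has too few points for a covariance")
        mu, cov = mean_and_cov(pts)
        # hypothetical weight choice: fraction of all vectors in this cluster
        params[lab] = (len(pts) / total, mu, cov)
    return params


vecs = [[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 4.0]]
labs = [0, 0, 1, 1]
for lab, (w, mu, cov) in gmm_init_from_kmeans(vecs, labs).items():
    print(lab, w, mu, cov)
```

A cluster with D or fewer points always yields a singular covariance (its rank is at most N - 1), which a GMM cannot invert, so with 12-dimensional MFCCs each cluster needs more than 12 vectors, and in practice many more.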
(Below is my answer from before you clarified your question.)
Covariance is a property of two random variables: a rough measure of how much they vary together, i.e. how much one tends to change when the other does.
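For two scalar random variables the definition is cov(X, Y) = E[XY] - E[X]E[Y]. Estimated from paired samples, it can be sketched as below: a positive value means the two tend to move together, a negative value means they tend to move in opposite directions. The function name is illustrative, and the simple 1/n (population) normalization is an assumption made for brevity.

```python
from typing import List


def covariance(xs: List[float], ys: List[float]) -> float:
    """Population covariance E[XY] - E[X]E[Y] of paired samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - mean_x * mean_y


print(covariance([1.0, 3.0], [2.0, 6.0]))  # 2.0  (move together)
print(covariance([1.0, 3.0], [6.0, 2.0]))  # -2.0 (move oppositely)
```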
A covariance matrix is merely a representation of the N x M separate covariances cov(x_i, y_j), one for each pair of elements from the sets X = (x1, x2, ..., xN) and Y = (y1, y2, ..., yM).
So the question boils down to: what are you actually trying to do with this "covariance matrix" you are searching for? Mel-Frequency Cepstral Coefficients... does each coefficient correspond to a note of the octave? Have you chosen k=12 as the number of clusters? Are you basically trying to pick out notes in music?
I'm not sure how covariance generalizes to vectors, but I would guess that the covariance between two vectors x and y is just E[x dot y] - (E[x] dot E[y]) (basically replacing multiplication with the dot product), which would give you a scalar, one scalar per element of your covariance matrix. Then you would just put this computation inside two for-loops. Or perhaps you could find the covariance matrix for each dimension separately. Without knowing exactly what you're doing, though, I cannot give more advice than that.
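The guessed formula E[x dot y] - (E[x] dot E[y]) can be compared with the "two for-loops" construction of a cross-covariance matrix. By linearity of expectation, E[sum_k x_k y_k] - sum_k E[x_k]E[y_k] = sum_k cov(x_k, y_k), so for equal-length vectors the guessed scalar is the trace (sum of the diagonal) of the cross-covariance matrix: it collapses all the covariances into one number, which is why a GMM needs the full matrix from the sample covariance instead. The sketch below checks this numerically; the function names are hypothetical and both use the 1/n normalization.

```python
from typing import List

Vector = List[float]


def cross_covariance(xs: List[Vector], ys: List[Vector]) -> List[List[float]]:
    """D x M matrix whose [i][j] entry is cov(x_i, y_j) over paired samples.

    Built with two for-loops, one scalar covariance per entry (1/n).
    """
    n = len(xs)
    d, m = len(xs[0]), len(ys[0])
    mx = [sum(x[i] for x in xs) / n for i in range(d)]
    my = [sum(y[j] for y in ys) / n for j in range(m)]
    out = [[0.0] * m for _ in range(d)]
    for i in range(d):
        for j in range(m):
            out[i][j] = sum((x[i] - mx[i]) * (y[j] - my[j])
                            for x, y in zip(xs, ys)) / n
    return out


def dot_covariance(xs: List[Vector], ys: List[Vector]) -> float:
    """The guessed scalar E[x dot y] - (E[x] dot E[y])."""
    n = len(xs)
    dim = len(xs[0])
    e_dot = sum(sum(a * b for a, b in zip(x, y)) for x, y in zip(xs, ys)) / n
    ex = [sum(x[k] for x in xs) / n for k in range(dim)]
    ey = [sum(y[k] for y in ys) / n for k in range(dim)]
    return e_dot - sum(a * b for a, b in zip(ex, ey))


xs = [[1.0, 0.0], [3.0, 2.0]]
ys = [[2.0, 1.0], [4.0, 5.0]]
c = cross_covariance(xs, ys)
print(c)                                     # [[1.0, 2.0], [1.0, 2.0]]
print(dot_covariance(xs, ys), c[0][0] + c[1][1])  # 3.0 3.0
```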