Matlab principal component analysis (eigenvalue order)

Posted on 2024-10-17 02:13:05

I want to use the "princomp" function of Matlab, but this function gives the eigenvalues in a sorted array. This way I can't find out which column corresponds to which eigenvalue.
For Matlab,

m = [1,2,3;4,5,6;7,8,9];
[pc,score,latent] = princomp(m);

is the same as

m = [2,1,3;5,4,6;8,7,9];
[pc,score,latent] = princomp(m);

That is, swapping the first two columns does not change anything. The result (eigenvalues) in latent will be (27, 0, 0).
The information (which eigenvalue corresponds to which original (input) column) is lost.
Is there a way to tell Matlab not to sort the eigenvalues?

Comments (2)

枫林﹌晚霞¤ 2024-10-24 02:13:05

With PCA, each principal component returned will be a linear combination of the original columns/dimensions. Perhaps an example might clear up any misunderstanding you have.

Let's consider the Fisher-Iris dataset, comprising 150 instances and 4 dimensions, and apply PCA to the data. To make things easier to understand, I first zero-center the data before calling the PCA function:

load fisheriris
X = bsxfun(@minus, meas, mean(meas));    %# so that mean(X) is the zero vector

[PC score latent] = princomp(X);

Let's look at the first returned principal component (the first column of the PC matrix):

>> PC(:,1)
      0.36139
    -0.084523
      0.85667
      0.35829

This is expressed as a linear combination of the original dimensions, i.e.:

PC1 =  0.36139*dim1 + -0.084523*dim2 + 0.85667*dim3 + 0.35829*dim4

Therefore, to express the same data in the new coordinate system formed by the principal components, the new first dimension should be a linear combination of the original ones according to the formula above.

We can compute this simply as X*PC, which is exactly what is returned in the second output of PRINCOMP (score). To confirm this, try:

>> all(all( abs(X*PC - score) < 1e-10 ))
    1

Finally, the importance of each principal component can be determined by how much of the data's variance it explains. This is returned by the third output of PRINCOMP (latent).
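
For instance, the fraction of variance explained by each component can be read directly off latent (a minimal sketch; latent also equals the per-column variance of score, since princomp works from the sample covariance):

%# percent of total variance explained by each principal component
explained = 100 * latent ./ sum(latent)

%# latent is simply the variance of each column of score
all( abs(latent - var(score)') < 1e-10 )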


We can compute the PCA of the data ourselves without using PRINCOMP:

[V E] = eig( cov(X) );
[E order] = sort(diag(E), 'descend');
V = V(:,order);

The eigenvectors of the covariance matrix V are the principal components (the same as PC above, although their signs may be flipped), and the corresponding eigenvalues E represent the amount of variance explained (the same as latent). Note that it is customary to sort the principal components by their eigenvalues. As before, to express the data in the new coordinates, we simply compute X*V (which should be the same as score above, provided the signs are matched).
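
As a quick check (a minimal sketch, assuming PC, score and latent from the princomp call above are still in the workspace), the two approaches agree up to those sign flips:

%# same explained variances
all( abs(E - latent) < 1e-10 )

%# same components, up to a possible sign flip per column
all(all( abs(abs(V) - abs(PC)) < 1e-10 ))

%# same projected data, again up to sign flips
all(all( abs(abs(X*V) - abs(score)) < 1e-10 ))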

菩提树下叶撕阳。 2024-10-24 02:13:05

"The information (which eigenvalue corresponds to which original (input) column) is lost."

Since each principal component is a linear function of all input variables, each principal component (eigenvector, eigenvalue) corresponds to all of the original input columns. Ignoring possible changes of sign, which are arbitrary in PCA, re-ordering the input variables will not change the PCA results.
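
A minimal sketch of that point (using random data rather than the rank-one matrix from the question, so the components are non-degenerate): swapping two input columns leaves the eigenvalues untouched and merely swaps the corresponding rows of the coefficient matrix, which is where the variable-to-component correspondence lives:

A = randn(100, 4);              %# toy data with 4 input variables
B = A(:, [2 1 3 4]);            %# same data, columns 1 and 2 swapped

[pcA, scoreA, latentA] = princomp(A);
[pcB, scoreB, latentB] = princomp(B);

%# the eigenvalues (explained variances) are identical
all( abs(latentA - latentB) < 1e-10 )

%# the coefficient rows follow the input columns (up to possible sign flips):
%# row i of pcA holds the loadings of input variable i on every component
all(all( abs(abs(pcA([2 1 3 4], :)) - abs(pcB)) < 1e-10 ))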

"Is there a way to tell matlab to not to sort the eigenvalues?"

I doubt it: PCA (and eigen analysis in general) conventionally sorts the results by variance, though I'd note that princomp() sorts from greatest to least variance, while eig() sorts in the opposite direction.
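
For example (a minimal sketch reusing the centered data X and latent from the answer above), the two orderings are simply reverses of each other:

[V, E] = eig( cov(X) );      %# for a symmetric matrix, eig returns the eigenvalues in ascending order
diag(E)'                     %# smallest variance first
latent'                      %# largest variance first (princomp convention)

%# reversing one ordering recovers the other
all( abs(flipud(diag(E)) - latent) < 1e-10 )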

For more explanation of PCA using MATLAB illustrations, with or without princomp(), see:

Principal Components Analysis
