Feature extraction via PCA
I'm trying to select a subset of features from a dataset that contains 2000 features for 63 samples. I know how to do PCA in MATLAB: I used 'pcacov', and it returns the eigenvectors and the eigenvalues. However, I don't know how to select the features I want. If the features aren't labeled, how can I select my features? Or are they returned in the same order?
2 Answers
If you call it as in the sketch below, then the principal components are the vectors in the first return argument, with their variances in the second return argument. They are in correspondence and sorted from most significant to least significant.
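A minimal sketch of that call, assuming your 63-by-2000 data matrix is named X (pcacov takes a covariance matrix, not the raw data):

```matlab
% X: 63-by-2000 data matrix, samples in rows, features in columns (assumed name)
C = cov(X);                    % 2000-by-2000 covariance matrix of the features
[pc, variances] = pcacov(C);   % pc: principal component coefficients (eigenvectors),
                               % variances: the corresponding eigenvalues
```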
You can assume this if the function help says so; otherwise it is not safe to assume, and you can sort the outputs yourself, as in the sketch below.
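For example, a sketch of sorting the outputs by variance (pc and variances as returned above; varsorted, pcsorted, and ix are just illustrative names):

```matlab
% Sort the variances in descending order and reorder the components to match
[varsorted, ix] = sort(variances, 'descend');
pcsorted = pc(:, ix);   % columns of pc permuted so column k matches varsorted(k)
```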
Then varsorted and pcsorted will be in order from most to least significant.

Edit, 7 years later: re-reading the question, I realized my answer doesn't actually answer it. I thought what was being asked was whether the principal components are sorted. Don Reba's answer addresses the actual question asked. I can't delete a selected answer, though.
PCA does not tell you which features are the most significant, but which combinations of features keep the most variance.
What PCA does is rotate your dataset so that it has the most variance along the first dimension, the second most along the second, and so on. So, when you multiply your feature vectors by the first N eigenvectors, you rotate the set and keep the first N dimensions, transforming your vectors into a lower-dimensional representation that retains most of the variance.
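A hedged illustration of that projection, assuming X is the 63-by-2000 data matrix and pc is the eigenvector matrix from pcacov above, with N = 10 chosen arbitrarily:

```matlab
N = 10;                       % number of principal components to keep
Xc = X - mean(X, 1);          % center each feature (column) first
Xreduced = Xc * pc(:, 1:N);   % 63-by-N representation retaining most of the variance
```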