State-of-the-art dimensionality reduction algorithms
We know there are algorithms, such as PCA and Isomap, that reduce the dimensionality of data sets.
- What is the state of the art in dimensionality reduction?
- Do you have an example, maybe in MATLAB?
Let's say we have a data set with 100,000 attributes, such as the Dorothea data set. In Dorothea, chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. It is one of the 5 datasets of the NIPS 2003 feature selection challenge.
Data Set Characteristics: Multivariate
Number of Instances: 1950
Area: Life
Attribute Characteristics: Integer
Number of Attributes: 100000
Date Donated: 2008-02-29
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 17103
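As a concrete starting point for the PCA mentioned in the question, here is a minimal Python/NumPy sketch. It uses Python rather than MATLAB, and the function name `pca_svd` is hypothetical. It computes PCA through a thin SVD of the centered data, which suits a shape like Dorothea's (1,950 rows, 100,000 columns) because it never builds the 100,000 x 100,000 covariance matrix. It assumes the data fit in memory as a dense array. A dense float64 copy of Dorothea would take roughly 1.6 GB, so for the real sparse data a sparse truncated SVD such as `scipy.sparse.linalg.svds` may be a better fit (it does not center the data for you).

```python
import numpy as np

def pca_svd(X, k):
    """Project the rows of X onto the top-k principal components.

    Hypothetical helper. Uses a thin SVD of the centered data, so the cost
    is driven by min(n_samples, n_features) and no d x d covariance matrix
    is ever formed.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                          # center each attribute
    # full_matrices=False: U is n x r, S has length r, Vt is r x d, r = min(n, d)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                    # k x d principal directions
    scores = Xc @ components.T             # n x k projected data
    explained = S[:k] ** 2 / np.sum(S ** 2)  # fraction of variance per component
    return scores, components, mean, explained

# Usage on synthetic binary data (NOT the real Dorothea files)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5000))
scores, comps, mean, explained = pca_svd(X, 10)
print(scores.shape)  # (200, 10)
```

The number of components `k` is a free choice here. Looking at the cumulative sum of `explained` is one common way to pick it.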
Answers (2)
Specific to MATLAB, you can take some ideas from the manual of its Statistics Toolbox.
Look for the Feature Selection and Feature Transformation sections. I would also try SVD, FastMap, and RobustMap. You'll need to read a bit about each one and decide which is most suitable for your data.
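Dorothea comes from a feature selection challenge, so keeping a subset of the original attributes is an alternative to transforming them. The sketch below is a simple univariate filter, not the Statistics Toolbox's own method. It scores each feature with a Fisher-style ratio of squared between-class mean difference to summed within-class variance and keeps the top k. The names `fisher_scores` and `select_top_k`, the 0/1 label encoding, and the `eps` guard are assumptions for illustration.

```python
import numpy as np

def fisher_scores(X, y, eps=1e-12):
    """Score each column of X by how well it separates two classes.

    score_j = (mean_pos_j - mean_neg_j)^2 / (var_pos_j + var_neg_j + eps)
    Larger is better. Assumes y holds exactly two labels encoded as 0 and 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0) + eps  # eps avoids 0/0 on constant columns
    return num / den

def select_top_k(X, y, k):
    """Return the column indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]

# Usage on synthetic data: column 3 is made identical to the label
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300)
X = rng.integers(0, 2, size=(300, 1000))
X[:, 3] = y
print(select_top_k(X, y, 5)[0])  # 3
```

Univariate filters ignore interactions between features, so they are typically used to cut 100,000 attributes down to a manageable number before a more expensive selection method or one of the transformations named above.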
Maximum Variance Unfolding is a particularly popular technique these days. A similar approach called Structure Preserving Embedding won the best paper award at ICML 2009. Other techniques include Laplacian Eigenmaps, Locally Linear Embedding, and Kernel PCA.
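Of the techniques listed, Kernel PCA is the simplest to sketch, since it needs only an eigendecomposition rather than the semidefinite programming behind Maximum Variance Unfolding. The minimal NumPy version below assumes an RBF (Gaussian) kernel with a hand-picked width `gamma`. The kernel choice, the `gamma` value, and the function name `kernel_pca` are all assumptions. The n x n kernel matrix is manageable for 1,950 samples but grows quadratically with n.

```python
import numpy as np

def kernel_pca(X, k, gamma=1.0):
    """Embed the rows of X in k dimensions with RBF-kernel PCA (hypothetical helper).

    K_ij = exp(-gamma * ||x_i - x_j||^2). K is centered in feature space, and
    the top-k eigenvectors scaled by sqrt(eigenvalue) give the training embedding.
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                          # clip tiny negatives from rounding
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    Kc = J @ K @ J                                    # double-centered kernel
    vals, vecs = np.linalg.eigh(Kc)                   # eigenvalues in ascending order
    vals = np.maximum(vals[::-1][:k], 0.0)            # top-k eigenvalues, descending
    vecs = vecs[:, ::-1][:, :k]                       # matching unit eigenvectors
    return vecs * np.sqrt(vals)                       # n x k embedding

# Usage on random data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
embedding = kernel_pca(X, 2, gamma=0.05)
print(embedding.shape)  # (100, 2)
```

With a linear kernel (K = X X^T) the same centering and eigendecomposition reproduce ordinary PCA scores up to sign; the nonlinear RBF kernel is what lets the embedding follow curved structure in the data.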