Sequential feature selection in Matlab

Posted 2024-12-18 09:50:51

Can somebody explain how to use the function "sequentialfs" in Matlab?

It looks straightforward, but I don't know how to design a function handle for it.

Any clue?

Comments (1)

两人的回忆 2024-12-25 09:50:51

Here's a simpler example than the one in the documentation.

First let's create a very simple dataset. We have some class labels y. 500 are from class 0, and 500 are from class 1, and they are randomly ordered.

>> y = [zeros(500,1); ones(500,1)];
>> y = y(randperm(1000));

And we have 100 variables x that we want to use to predict y. 99 of them are just random noise, but one of them is highly correlated with the class label.

>> x = rand(1000,99);
>> x(:,100) = y + rand(1000,1)*0.1;
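
If you want to verify that only column 100 carries any signal, here's a quick sanity check (just an illustrative aside; corr is from the Statistics Toolbox):

>> corr(x(:,100), y)              % close to 1: strongly correlated with the labels
>> max(abs(corr(x(:,1:99), y)))   % small: the 99 noise columns are uncorrelated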

Now let's say we want to classify the points using linear discriminant analysis. If we were to do this directly without applying any feature selection, we would first split the data up into a training set and a test set:

>> xtrain = x(1:700, :); xtest = x(701:end, :);
>> ytrain = y(1:700); ytest = y(701:end);

Then we would classify them:

>> ypred = classify(xtest, xtrain, ytrain);

And finally we would measure the error rate of the prediction:

>> sum(ytest ~= ypred)
ans =
     0

and in this case we get perfect classification.

To make a function handle to be used with sequentialfs, just put these pieces together:

>> f = @(xtrain, ytrain, xtest, ytest) sum(ytest ~= classify(xtest, xtrain, ytrain));
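
One detail worth knowing: per the sequentialfs documentation, the handle should return the raw count of misclassifications (as above), not a fraction; sequentialfs itself sums the values returned across test sets and divides by the total number of test observations to get the mean criterion.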

And pass all of them together into sequentialfs:

>> fs = sequentialfs(f,x,y)
fs =
  Columns 1 through 16
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 17 through 32
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 33 through 48
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 49 through 64
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 65 through 80
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 81 through 96
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 97 through 100
     0     0     0     1

The final 1 in the output indicates that variable 100 is, as expected, the best predictor of y among the variables in x.
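
Since fs is a logical row vector over the columns of x, you can read off the indices of the selected features with find:

>> find(fs)
ans =
   100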

The example in the documentation for sequentialfs is a little more complex, mostly because the predicted class labels are strings rather than numerical values as above, so ~strcmp is used to calculate the error rate rather than ~=. In addition it makes use of cross-validation to estimate the error rate, rather than direct evaluation as above.
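
For reference, here's a minimal sketch of that cross-validated variant, assuming the same f, x and y as above (cvpartition and statset are both in the Statistics Toolbox):

>> c = cvpartition(y, 'KFold', 10);     % 10-fold stratified partition of the labels
>> opts = statset('Display', 'iter');   % print the criterion value at each step
>> fs = sequentialfs(f, x, y, 'cv', c, 'options', opts);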
