Visually separating data into two classes in MATLAB

Posted 2024-08-15 19:17:45

I have two clusters of data. Each point has x,y coordinates and a value giving its type (1 for class 1, 2 for class 2). I have plotted the data, but I would like to split the classes visually with a boundary. What function does this? I tried contour, but it did not help.

顾挽 2024-08-22 19:17:45

Consider this classification problem (using the Iris dataset):

[Figure: scatter plot of the data points]

As you can see, unless the clusters are easily separable and you know the equation of the boundary beforehand, finding the boundary is not a trivial task.

One idea is to use the discriminant analysis function classify to find the boundary (you can choose between a linear and a quadratic boundary).
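For data shaped like the question's (x, y and a numeric class of 1 or 2), the idea might be sketched as below. This is only a minimal sketch under assumptions: the two Gaussian clusters, their centers and the variable names xy, cls and trainAcc are made up for illustration and are not from the original post. It fits a linear boundary with classify and draws it as a straight line.

```matlab
% Synthetic stand-in for the question's data: x, y and a numeric class (1 or 2).
% The cluster centers and sizes are assumptions made for illustration.
rng(0);                                   % reproducible random numbers
n = 50;
xy  = [randn(n,2) ; randn(n,2) + 4];      % two Gaussian clusters
cls = [ones(n,1) ; 2*ones(n,1)];          % class labels 1 and 2

% Linear discriminant analysis (Statistics Toolbox).
% coeff(1,2) describes the boundary between class 1 and class 2:
%   0 = K + L(1)*x + L(2)*y
[pred, ~, ~, ~, coeff] = classify(xy, xy, cls, 'linear');
K = coeff(1,2).const;
L = coeff(1,2).linear;

% Plot the points, then the boundary solved for y.
figure, hold on
gscatter(xy(:,1), xy(:,2), cls, 'rb', '.', 15);
xs = [min(xy(:,1)) max(xy(:,1))];
ys = -(K + L(1)*xs) / L(2);
plot(xs, ys, 'k-', 'LineWidth', 2)
axis([min(xy(:,1)) max(xy(:,1)) min(xy(:,2)) max(xy(:,2))])
hold off

% Fraction of training points that fall on the correct side of the line.
trainAcc = mean(pred == cls);
```

Solving for y assumes L(2) is not zero, which holds here because the clusters differ in both coordinates. With real data, xy and cls would simply be replaced by the plotted coordinates and their class values.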

The following is a complete example to illustrate the procedure. The code requires the Statistics Toolbox:

%# load Iris dataset (make it binary-class with 2 features)
load fisheriris
data = meas(:,1:2);
labels = species;
labels(~strcmp(labels,'versicolor')) = {'non-versicolor'};

NUM_K = numel(unique(labels));      %# number of classes
numInst = size(data,1);             %# number of instances

%# visualize data
figure(1)
gscatter(data(:,1), data(:,2), labels, 'rb', '*o', ...
    10, 'on', 'sepal length', 'sepal width')
title('Iris dataset'), box on, axis tight

%# params
classifierType = 'quadratic';       %# 'quadratic', 'linear'
npoints = 100;
clrLite = [1 0.6 0.6 ; 0.6 1 0.6 ; 0.6 0.6 1];
clrDark = [0.7 0 0 ; 0 0.7 0 ; 0 0 0.7];

%# discriminant analysis
%# classify the grid space of these two dimensions
mn = min(data); mx = max(data);
[X,Y] = meshgrid( linspace(mn(1),mx(1),npoints) , linspace(mn(2),mx(2),npoints) );
X = X(:); Y = Y(:);
[C,err,P,logp,coeff] = classify([X Y], data, labels, classifierType);

%# find incorrectly classified training data
[CPred,err] = classify(data, data, labels, classifierType);
bad = ~strcmp(CPred,labels);

%# plot grid classification color-coded
figure(2), hold on
image(X, Y, reshape(grp2idx(C),npoints,npoints))
axis xy, colormap(clrLite)

%# plot data points (correctly and incorrectly classified)
gscatter(data(:,1), data(:,2), labels, clrDark, '.', 20, 'on');

%# mark incorrectly classified data
plot(data(bad,1), data(bad,2), 'kx', 'MarkerSize',10)
axis([mn(1) mx(1) mn(2) mx(2)])

%# draw decision boundaries between pairs of clusters
for i=1:NUM_K
    for j=i+1:NUM_K
        if strcmp(coeff(i,j).type, 'quadratic')
            K = coeff(i,j).const;
            L = coeff(i,j).linear;
            Q = coeff(i,j).quadratic;
            f = sprintf('0 = %g + %g*x + %g*y + %g*x.^2 + %g*x.*y + %g*y.^2',...
                K,L,Q(1,1),Q(1,2)+Q(2,1),Q(2,2));
        else
            K = coeff(i,j).const;
            L = coeff(i,j).linear;
            f = sprintf('0 = %g + %g*x + %g*y', K,L(1),L(2));
        end
        h2 = ezplot(f, [mn(1) mx(1) mn(2) mx(2)]);
        set(h2, 'Color','k', 'LineWidth',2)
    end
end

xlabel('sepal length'), ylabel('sepal width')
title( sprintf('accuracy = %.2f%%', 100*(1-sum(bad)/numInst)) )

hold off

[Figure: classification boundaries with the quadratic discriminant function]
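Since the question mentions trying contour, here is a hedged sketch of how contour could be made to draw the boundary: classify a grid as in the example above, convert the predicted classes to the indices 1 and 2, and contour that grid at the 1.5 level. The names Xg, Yg and Z are introduced here for illustration; this is an alternative to the coeff-based curve, not part of the original answer.

```matlab
% Same binary Iris setup as in the example above (Statistics Toolbox).
load fisheriris
data = meas(:,1:2);
labels = species;
labels(~strcmp(labels,'versicolor')) = {'non-versicolor'};

% Classify a regular grid covering the data range.
npoints = 100;
mn = min(data); mx = max(data);
[Xg,Yg] = meshgrid(linspace(mn(1),mx(1),npoints), ...
                   linspace(mn(2),mx(2),npoints));
C = classify([Xg(:) Yg(:)], data, labels, 'quadratic');

% Turn the predicted class names into indices 1 and 2, shaped like the grid.
Z = reshape(grp2idx(C), npoints, npoints);

% The boundary is where the index jumps between 1 and 2, i.e. the 1.5 level.
figure, hold on
gscatter(data(:,1), data(:,2), labels, 'rb', '.', 15);
contour(Xg, Yg, Z, [1.5 1.5], 'LineColor', 'k', 'LineWidth', 2)
xlabel('sepal length'), ylabel('sepal width')
hold off
```

The curve is only as smooth as the grid, so a larger npoints gives a finer boundary. The upside of this approach is that it does not depend on the coefficient structure, so it should carry over to any classifier that can label grid points.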
