Clustering and MATLAB

Published 2024-12-09 10:00:50

I'm trying to cluster some data from the KDD Cup 1999 dataset.

A record in the file looks like this:

0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.

There are about 48,000 records in that format. I have cleaned the data and removed the text fields, keeping only the numbers. The output now looks like this:

[screenshot: the cleaned, numeric-only data]

I created a comma-delimited file in Excel, saved it as a CSV file, and created a data source from the CSV in MATLAB. I've tried running it through the fcm toolbox in MATLAB (findcluster reports 38 data types, which is expected with 38 columns).

However, the clusters don't look like clusters, or the tool isn't accepting the data and working the way I need it to.

Could anyone help me find the clusters? I'm new to MATLAB, so I have no experience with it, and I'm also new to clustering.

The method:

  1. Choose the number of clusters (K)
  2. Initialize the centroids (K patterns chosen randomly from the data set)
  3. Assign each pattern to the cluster with the closest centroid
  4. Recompute the mean of each cluster as its new centroid
  5. Repeat from step 3 until a stopping criterion is met (no pattern moves to another cluster)
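The five steps above are standard (hard) K-means. For reference, here is a minimal sketch of them in Python/NumPy (the question uses MATLAB, but the algorithm is the same); `kmeans` and its parameters are illustrative names, not part of any toolbox:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means, following the five steps listed above.
    Illustrative helper; this is not the MATLAB fcm routine."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids as K patterns chosen at random from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each pattern to the cluster with the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop when no pattern moves to another cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: recompute each cluster's mean as its new centroid
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Note that `fcm` is *fuzzy* c-means, which assigns soft membership degrees (the `U` matrix in the code below) rather than the hard assignments of step 3.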

This is what I'm trying to achieve:

[image: the desired clustering result]

This is what I'm getting:

[image: the actual clustering result]

load kddcup1.dat
plot(kddcup1(:,1),kddcup1(:,2),'o')  
[center,U,objFcn] = fcm(kddcup1,2);
Iteration count = 1, obj. fcn = 253224062681230720.000000
Iteration count = 2, obj. fcn = 241493132059137410.000000
Iteration count = 3, obj. fcn = 241484544542298110.000000
Iteration count = 4, obj. fcn = 241439204971005280.000000
Iteration count = 5, obj. fcn = 241090628742523840.000000
Iteration count = 6, obj. fcn = 239363408546874750.000000
Iteration count = 7, obj. fcn = 238580863900727680.000000
Iteration count = 8, obj. fcn = 238346826370420990.000000
Iteration count = 9, obj. fcn = 237617756429912510.000000
Iteration count = 10, obj. fcn = 226364785036628320.000000
Iteration count = 11, obj. fcn = 94590774984961184.000000
Iteration count = 12, obj. fcn = 2220521449216102.500000
Iteration count = 13, obj. fcn = 2220521273191876.200000
Iteration count = 14, obj. fcn = 2220521273191876.700000
Iteration count = 15, obj. fcn = 2220521273191876.700000

figure
plot(objFcn)
title('Objective Function Values')
xlabel('Iteration Count')
ylabel('Objective Function Value')

maxU = max(U);
index1 = find(U(1,:) == maxU);
index2 = find(U(2,:) == maxU);
figure
line(kddcup1(index1,1), kddcup1(index1,2), 'linestyle', ...
    'none', 'marker', 'o', 'color', 'g');
line(kddcup1(index2,1), kddcup1(index2,2), 'linestyle', ...
    'none', 'marker', 'x', 'color', 'r');
hold on
plot(center(1,1), center(1,2), 'ko', 'markersize', 15, 'LineWidth', 2)
plot(center(2,1), center(2,2), 'kx', 'markersize', 15, 'LineWidth', 2)



Answer by 夜雨飘雪, 2024-12-16 10:00:50

Since you are new to machine learning / data mining, you shouldn't tackle such an advanced problem right away. After all, the data you are working with was used in a competition (KDD Cup '99), so don't expect it to be easy!

Besides, the data was intended for a classification task (supervised learning), where the goal is to predict the correct class (bad/good connection). You seem to be interested in clustering (unsupervised learning), which is generally harder.

This sort of dataset requires a lot of preprocessing and clever feature extraction. People usually employ domain knowledge (network intrusion detection) to obtain better features from the raw data. Directly applying simple algorithms like K-means will generally yield poor results.

For starters, you need to normalize the attributes to the same scale: when computing the Euclidean distance in step 3 of your method, features with values such as 239 and 486 will dominate features with small values such as 0.05, thus distorting the result.
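The usual fix is z-score standardization: subtract each column's mean and divide by its standard deviation. In MATLAB the Statistics Toolbox's `zscore` function does this; as a language-neutral sketch (the function name here is just illustrative):

```python
import numpy as np

def zscore_normalize(X):
    """Rescale every column to zero mean and unit variance, so that
    large-valued features (e.g. 239, 486) no longer dominate the
    Euclidean distance over small-valued ones (e.g. 0.05)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant columns alone instead of dividing by zero
    return (X - mu) / sigma
```

After this step, every feature contributes on a comparable scale to the distances computed during clustering.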

Another point to remember is that too many attributes can be a bad thing (the curse of dimensionality). You should therefore look into feature selection or dimensionality reduction techniques.
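PCA is the most common dimensionality-reduction starting point (MATLAB's Statistics Toolbox provides a `pca` function). A minimal SVD-based sketch, with `pca_project` as an illustrative name:

```python
import numpy as np

def pca_project(X, n_components):
    """Project centered data onto its top principal components via SVD.
    Illustrative only; reduces 38 KDD columns to a handful of directions
    that capture most of the variance."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by variance explained
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

Projecting the standardized data down to two or three components also makes a 2-D scatter plot of the clusters meaningful, unlike plotting two arbitrary raw columns as in the question's code.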

Finally, I suggest you familiarize yourself with a simpler dataset...

