如何对时间序列数据进行 K 均值聚类？

发布于 2024-09-15 02:22:51 字数 1311 浏览 9 评论 0 原文

如何对时间序列数据进行 K 均值聚类？我了解当输入数据是一组点时这是如何工作的，但我不知道如何使用 1XM 对时间序列进行聚类，其中 M 是数据长度。特别是，我不确定如何更新时间序列数据的集群平均值。

我有一组带标签的时间序列，我想使用 K-means 算法来检查是否会得到相似的标签。我的 X 矩阵将是 NXM，其中 N 是时间序列的数量，M 是数据长度，如上所述。

有谁知道该怎么做？例如，我如何修改这个 k-means MATLAB 代码以便它适用于时间序列数据？另外，我希望能够使用欧几里德距离之外的不同距离度量。

为了更好地说明我的疑问，这是我针对时间序列数据修改的代码：

% Check if second input is centroids
if ~isscalar(k) 
    c=k;
    k=size(c,1);
else
    c=X(ceil(rand(k,1)*n),:); % assign centroid randomly at start
end

% allocating variables
g0=ones(n,1); 
gIdx=zeros(n,1);
D=zeros(n,k);

% Main loop converge if previous partition is the same as current
while any(g0~=gIdx)
%     disp(sum(g0~=gIdx))
    g0=gIdx;
    % Loop for each centroid
    for t=1:k
        %  d=zeros(n,1);
        % Loop for each dimension
        for s=1:n
            D(s,t) = sqrt(sum((X(s,:)-c(t,:)).^2)); 
        end
    end
    % Partition data to closest centroids
    [z,gIdx]=min(D,[],2);
    % Update centroids using means of partitions
    for t=1:k

        % Is this how we calculate new mean of the time series?
        c(t,:)=mean(X(gIdx==t,:));

    end
end

原文

How can I do K-means clustering of time series data?
I understand how this works when the input data is a set of points, but I don't know how to cluster a time series with 1XM, where M is the data length. In particular, I'm not sure how to update the mean of the cluster for time series data.

I have a set of labelled time series, and I want to use the K-means algorithm to check whether I will get back a similar label or not. My X matrix will be N X M, where N is number of time series and M is data length as mentioned above.

Does anyone know how to do this? For example, how could I modify this k-means MATLAB code so that it would work for time series data? Also, I would like to be able to use different distance metrics besides Euclidean distance.

To better illustrate my doubts, here is the code I modified for time series data:

% Check if second input is centroids
if ~isscalar(k) 
    c=k;
    k=size(c,1);
else
    c=X(ceil(rand(k,1)*n),:); % assign centroid randomly at start
end

% allocating variables
g0=ones(n,1); 
gIdx=zeros(n,1);
D=zeros(n,k);

% Main loop converge if previous partition is the same as current
while any(g0~=gIdx)
%     disp(sum(g0~=gIdx))
    g0=gIdx;
    % Loop for each centroid
    for t=1:k
        %  d=zeros(n,1);
        % Loop for each dimension
        for s=1:n
            D(s,t) = sqrt(sum((X(s,:)-c(t,:)).^2)); 
        end
    end
    % Partition data to closest centroids
    [z,gIdx]=min(D,[],2);
    % Update centroids using means of partitions
    for t=1:k

        % Is this how we calculate new mean of the time series?
        c(t,:)=mean(X(gIdx==t,:));

    end
end

分享到QQ

分享到微博