How to perform clustering on a dataset that includes both time series and discrete-valued variables?
I am trying to cluster a dataset that includes both time series (e.g. a sensor recording over a few seconds) and discrete-valued variables (e.g. age). I have already tried using PCA to combine the original variables and then applying standard clustering, which effectively solves the problem of having both time series and discrete-valued variables. I would now like to perform time-series clustering using dynamic time warping (DTW) distance, but I am not sure how to incorporate the discrete-valued variables.
My first attempt was to calculate the DTW distance for the time-series variables and the Euclidean distance for the discrete variables, and then combine these distances into a single distance matrix. The issue is that, because of the way DTW is calculated (the sum of all the Euclidean distances between optimally matched points in the two time series), the scale of the DTW distance is much larger than that of the discrete variables, even after standardising the variables. If I then apply clustering to the resulting distance matrix, the discrete variables have almost no influence on the result, which does not reflect their real-world importance.
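To make the setup concrete, here is a minimal sketch of the combination described above, plus one possible way to bring the two blocks onto a comparable scale. The toy data, the helper names (`dtw_distance`, `pairwise`, `combine`) and the weight `w_ts` are all hypothetical. Rescaling each distance matrix to [0, 1] by its maximum before a weighted sum is an assumption on my part (similar in spirit to a Gower-style distance), not something established in this post.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW using |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def pairwise(items, dist):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    k = len(items)
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            out[i, j] = out[j, i] = dist(items[i], items[j])
    return out

def combine(d_ts, d_static, w_ts=0.5):
    """Rescale each matrix to [0, 1] by its maximum, then take a weighted sum.

    Assumption: max-scaling is only one of several possible normalisations.
    """
    def unit(d):
        top = d.max()
        return d / top if top > 0 else d
    return w_ts * unit(d_ts) + (1.0 - w_ts) * unit(d_static)

# Hypothetical toy data: six sensor recordings of different lengths, plus age.
rng = np.random.default_rng(0)
series = [rng.normal(size=rng.integers(40, 60)) for _ in range(6)]
ages = np.array([23, 31, 45, 52, 38, 29], dtype=float)
ages_std = (ages - ages.mean()) / ages.std()

d_ts = pairwise(series, dtw_distance)                # DTW block
d_age = pairwise(ages_std, lambda x, y: abs(x - y))  # Euclidean block (1-D)
d_mix = combine(d_ts, d_age, w_ts=0.5)               # both blocks now in [0, 1]
```

The combined matrix can then be passed to any clustering method that accepts precomputed distances. The weight `w_ts` makes the relative importance of the time series explicit, rather than leaving it to whatever scale DTW happens to produce.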
I have looked for similar examples in the literature and across the Stack Exchange sites, but I have not had much luck. I thought about:
- scaling the DTW distance by the length of the series, but that can be tricky when the time series have different lengths, and in my initial attempts it seemed to shrink the distances for the time-series variables a lot.
- converting the discrete variable into a time series of constant values, but I am not sure this is a great idea either.
Does anyone know of any examples, or have any clever ideas?
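The two ideas above can be sketched as follows. This is only an illustration under stated assumptions: dividing by `len(a) + len(b)` is one convention for length-normalising DTW (it bounds the warping-path length from above), and dividing by the actual path length would be another. The function names and the toy signals are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW using |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def dtw_normalised(a, b):
    """DTW cost divided by len(a) + len(b) (assumed normalisation convention)."""
    return dtw_distance(a, b) / (len(a) + len(b))

# Idea 1: length scaling for two series of different lengths.
short = np.sin(np.linspace(0.0, 2.0 * np.pi, 30))
long_ = np.sin(np.linspace(0.0, 2.0 * np.pi, 90)) + 0.5
raw = dtw_distance(short, long_)
norm = dtw_normalised(short, long_)   # per-step cost, much smaller than raw

# Idea 2: a discrete variable (age) encoded as a constant series.
age_a = np.full(50, 30.0)
age_b = np.full(50, 45.0)
const_dtw = dtw_distance(age_a, age_b)  # equals 50 * |30 - 45|
```

For two constant series of equal length, the DTW cost is just the series length times the absolute difference. In this univariate setting, encoding a discrete variable as a constant series therefore amounts to multiplying its distance by the series length, which is an implicit weighting rather than a principled combination.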
Thanks
Comments (1)
You should be able to leverage any generic stock ticker analysis to get what you want. Here is a link that shows a simple time series analysis of stock data, as well as a few clustering exercises.
https://github.com/ASH-WICUS/Notebooks/blob/master/Clustering%20-%20Historical%20Stock%20Prices.ipynb