One piece of 'good' advice (:P) I can give you, based on my experience, is that it is NOT a good idea to treat time like a spatial feature. So beware of solutions that do this. A good place to start is the literature on outlier detection for time-series data.
You really should use a different representation for your data.
If you want to detect outliers, why don't you use an actual outlier detection method?
Other than that, just read through some of the literature. k-means, for example, is known to have problems with outliers. DBSCAN, on the other hand, is designed for data with "noise" (the N in DBSCAN), and those noise points are essentially outliers.
Still, the way you are representing your data will keep any of these from working very well.
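To make the k-means versus DBSCAN point concrete, here is a rough sketch in R on made-up 2-D points (not your data, and not a fix for the representation problem). It assumes the CRAN package "dbscan" is installed; the toy data and the eps and minPts values are purely illustrative choices of mine.

```r
# Assumption: install.packages("dbscan") has been run; the answer above
# does not name a particular DBSCAN implementation.
library(dbscan)

set.seed(42)
# Hypothetical toy data: two tight blobs of 50 points each, plus one far-away point.
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2),
  c(20, 20)
)

# k-means has no notion of noise: every point, including the far one,
# is assigned to one of the clusters.
km <- kmeans(pts, centers = 2)
km$cluster[nrow(pts)]

# DBSCAN keeps a separate noise label; in this package noise points get cluster 0.
db <- dbscan(pts, eps = 1, minPts = 5)
which(db$cluster == 0)
```

The last line lists the row indexes DBSCAN treats as noise, which is the closest thing either method has to an explicit outlier flag.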
You should use a time-series-based outlier detection method because of the nature of your data (it has its own seasonality, trend, autocorrelation, etc.). Time series outliers come in different kinds (additive outliers, AO; innovative outliers, IO; etc.), and it gets kind of complicated, but there are tools that make it easy to implement.
Download the latest build of R from http://cran.r-project.org/ and install the packages "forecast" and "TSA".
Use the auto.arima function of the forecast package to derive the best-fitting model for your data, then hand the fitted model to the detectAO and detectIO functions of the TSA package. These functions will report any outliers present in the data along with their time indexes.
R is also easy to integrate with other applications, or you can simply run it as a batch job. Hope that helps.
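The steps above could look roughly like this. It is only a sketch: the series y is simulated toy data with a spike injected by hand, and refitting with TSA's own arima() before calling the detectors is my assumption about the safest way to pass the model along, not something stated above.

```r
# Assumes install.packages(c("forecast", "TSA")) has already been run.
library(forecast)

set.seed(1)
# Hypothetical toy series: AR(1) noise with one spike injected at t = 100.
y <- arima.sim(model = list(ar = 0.6), n = 200)
y[100] <- y[100] + 8

# Let auto.arima pick the (p, d, q) order.
# seasonal = FALSE keeps the order to three numbers for this toy series.
best <- auto.arima(y, seasonal = FALSE)
ord  <- unname(arimaorder(best)[c("p", "d", "q")])

# Refit with TSA's arima() using the selected order (assumption: the detectors
# are meant for fits of this kind), then run both tests.
fit <- TSA::arima(y, order = ord)
TSA::detectAO(fit)   # additive outliers
TSA::detectIO(fit)   # innovative outliers
```

For your own data, replace y with a ts object built from your measurements. If the series is seasonal, drop seasonal = FALSE and pass the seasonal part of the order to arima() as well.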