Best practices for efficient multiple time series analysis

Posted 2024-12-11 01:41:59

I have a large number of time series (>100) that differ in sampling frequency and in the time period for which they are available. Each series has to be tested for unit roots, seasonally adjusted, and put through other preliminary transformations and checks.

Since so many series have to be checked routinely, how can this be done efficiently? The concern is to save time on the routine aspects and to keep track of the series and the analysis results. Unit root testing, for example, involves some subjective judgment. How much of this type of analysis can be automated, and how?

I have already read the questions on statistical workflow, which suggest having a common script to run on each series.

I am asking for something more specific, based on experience handling a dataset of many time series. The focus is on minimizing errors when dealing with so many series and on automating repetitive tasks.

Comments (1)

丶情人眼里出诗心の 2024-12-18 01:42:00

I assume the series will be examined independently, since you haven't mentioned any inter-relationships in the models. I'm not sure what kind of object you're looking to use or which tests you plan to run, but the basic goal of "best practices" is independent of the actual package used.

The simplest approach in R is to load the objects into a list and analyze each series with a simple iterator such as lapply, or with a multicore method such as mclapply or foreach. In Matlab, you can operate over cell arrays; the Parallel Computing Toolbox provides parfor ("parallel for"), which is similar to foreach in R. For my money, I'd recommend R, as it's cheaper (free) and has much richer functionality for statistical analysis. Matlab has better documentation and help tools, but these tend to matter less over time as you become more familiar with the tools and methods of your research (and as your bookshelf of references grows).
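The list-then-iterate pattern can be sketched roughly as follows. This is only an illustration, written in Python with the standard library rather than in R. `analyze_series` is a hypothetical placeholder for whatever checks are actually run (unit root test, seasonal adjustment, and so on), and the series names and values are made up. The `map` call plays the role of `lapply`. Each series is processed on its own, so series of different length or frequency can sit in the same container, and a failure in one series is recorded instead of stopping the whole run.

```python
from statistics import mean, stdev


def analyze_series(values):
    """Hypothetical placeholder for the real per-series checks.

    A real version would run the unit root test, seasonal adjustment,
    etc. and return their outputs; here we return simple summaries.
    """
    if len(values) < 3:
        raise ValueError("series too short to analyze")
    diffs = [b - a for a, b in zip(values, values[1:])]
    return {
        "n_obs": len(values),
        "mean": mean(values),
        "sd_of_first_difference": stdev(diffs),
    }


def safe_analyze(item):
    """Run analyze_series on one (name, values) pair, capturing any error."""
    name, values = item
    try:
        return name, {"status": "ok", **analyze_series(values)}
    except Exception as exc:  # record the failure and keep going
        return name, {"status": "failed", "error": str(exc)}


# Made-up example data: series may differ in length and frequency.
series_by_name = {
    "gdp_quarterly": [100.0, 101.5, 102.1, 103.8, 104.2],
    "cpi_monthly": [1.0, 1.1, 1.3, 1.2, 1.4, 1.5, 1.7],
    "too_short": [5.0],
}

# map() over the items is the Python counterpart of lapply over a list.
results = dict(map(safe_analyze, series_by_name.items()))

for name, outcome in results.items():
    print(name, outcome)
```

Keeping the results in one dictionary keyed by series name gives a single place to look up which series passed, which failed, and why, which helps with tracking when there are more than a hundred of them.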

It's good to become accustomed to multicore tools in general, as they can substantially reduce the time it takes to analyze a large number of independent small objects.
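Because the series are independent, the per-series work can be spread across cores. The sketch below uses Python's standard `concurrent.futures` module as a rough counterpart of `mclapply` or `foreach`. The function name `analyze_all` and its `executor_cls` parameter are illustrative choices, not part of any library, and the worker function passed in is assumed to be defined at module level so that it can be sent to worker processes.

```python
from concurrent.futures import ProcessPoolExecutor


def analyze_all(series_by_name, analyze, max_workers=None,
                executor_cls=ProcessPoolExecutor):
    """Apply `analyze` to every series in parallel; return {name: result}.

    `analyze` takes one list of values. With a process pool it must be a
    module-level function so it can be pickled. Errors are stored per
    series instead of aborting the whole batch.
    """
    results = {}
    with executor_cls(max_workers=max_workers) as pool:
        # Submit one independent task per series.
        futures = {name: pool.submit(analyze, values)
                   for name, values in series_by_name.items()}
        # Collect each outcome under the name of its series.
        for name, future in futures.items():
            try:
                results[name] = {"status": "ok", "value": future.result()}
            except Exception as exc:
                results[name] = {"status": "failed", "error": str(exc)}
    return results
```

On platforms that start workers by launching a fresh interpreter, `analyze_all` should be called from under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` keeps the same interface for quick checks, though in standard CPython threads will not speed up CPU-bound pure-Python work. For very small series the cost of starting processes can outweigh the gain, so it is worth timing both the sequential and the parallel version.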
