C# 时间序列的移动中位数计算SortedList- 提高性能?

发布于 2024-10-21 00:54:32 字数 3720 浏览 9 评论 0原文

我有一种计算时间序列的移动中值的方法。与移动平均线一样,它使用固定的窗口或周期(有时称为回顾周期)。 如果周期为 10,它将创建前 10 个值(0-9)的数组,然后找到它们的中值。它将重复此操作,将窗口增加 1 步(现在值为 1-10),依此类推......因此是其移动部分。这个过程与移动平均线完全相同。

通过以下方式找到中值:

  1. 对数组的值进行排序
  2. 如果数组中的值数量为奇数,则取中间值。由 5 个值组成的排序数组的中位数将是第三个值。
  3. 如果数组中有偶数个值,则取中间两侧的两个值并对其进行平均。 6 个值的排序数组的中位数为 (2nd + 3rd) / 2。

我创建了一个函数,通过填充 List 并调用 List<> 来计算此值;.Sort(),然后找到合适的值。

计算它是正确的,但我想知道是否有办法提高这个计算的性能。也许通过在 double[] 上手动排序而不是使用列表。

我的实现如下:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Moving_Median_TimeSeries
{
    class Program
    {
        static void Main(string[] args)
        {
            // created a a sample test time series of 10 days
            DateTime Today = DateTime.Now;
            var TimeSeries = new SortedList<DateTime, double>();
            for (int i = 0; i < 10; i++)
                TimeSeries.Add(Today.AddDays(i), i * 10);

            // write out the time series
            Console.WriteLine("Our time series contains...");
            foreach (var item in TimeSeries) 
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);

            // calculate an even period moving median 
            int period = 6;
            var TimeSeries_MovingMedian = MovingMedian(TimeSeries, period);

            // write out the result of the calculation
            Console.WriteLine("\nThe moving median time series of {0} periods contains...", period);
            foreach (var item in TimeSeries_MovingMedian)
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);

            // calculate an odd period moving median 
            int period2 = 5;
            var TimeSeries_MovingMedian2 = MovingMedian(TimeSeries, period);

            // write out the result of the calculation
            Console.WriteLine("\nThe moving median time series of {0} periods contains...", period2);
            foreach (var item in TimeSeries_MovingMedian2)
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);
        }

        public static SortedList<DateTime, double> MovingMedian(SortedList<DateTime, double> TimeSeries, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < TimeSeries.Count(); i++)
            {
                if (i >= period - 1)
                {
                    // add all of the values used in the calc to a list... 
                    var values = new List<double>();
                    for (int x = i; x > i - period; x--)
                        values.Add(TimeSeries.Values[x]);

                    // ... and then sort the list <- there might be a better way than this
                    values.Sort();

                    // If there is an even number of values in the array (example 10 values), take the two mid values
                    // and average them.  i.e. 10 values = (5th value + 6th value) / 2. 
                    double median;
                    if (period % 2 == 0) // is any even number
                        median = (values[(int)(period / 2)] + values[(int)(period / 2 - 1)]) / 2;
                    else // is an odd period
                    // Median equals the middle value of the sorted array, if there is an odd number of values in the array
                        median = values[(int)(period / 2 + 0.5)];

                    result.Add(TimeSeries.Keys[i], median);
                }
            }
            return result;
        }

    }
}

I have a method that calculates the moving median value of a time series. Like a moving average, it use a fixed window or period (sometimes referred to as the look back period).
If the period is 10, it will created an array of the first 10 values (0-9), then find the median value of them. It will repeat this, incrementing the window by 1 step (values 1-10 now) and so on... hence the moving part of this. This is process is exactly the same as a moving average.

The median value is found by:

  1. Sorting the values of an array
  2. If there is an odd number of values in the array, take the mid value. The median of a sorted array of 5 values would be the 3rd value.
  3. If there is an even number of values in the array, take the two values on each side of the mid and average them. The median of a sorted array of 6 values would be the (2nd + 3rd) / 2.

I have created a function that calculates this by populating a List<double>, calling List<>.Sort(), and then finding the appropriate values.

Computational it is correct, but I was wonder if there ws a way to improve the performance of this calculation. Perhaps by hand-rolling a sort on an double[] rather than using a list.

My implementation is as follows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Moving_Median_TimeSeries
{
    class Program
    {
        static void Main(string[] args)
        {
            // created a a sample test time series of 10 days
            DateTime Today = DateTime.Now;
            var TimeSeries = new SortedList<DateTime, double>();
            for (int i = 0; i < 10; i++)
                TimeSeries.Add(Today.AddDays(i), i * 10);

            // write out the time series
            Console.WriteLine("Our time series contains...");
            foreach (var item in TimeSeries) 
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);

            // calculate an even period moving median 
            int period = 6;
            var TimeSeries_MovingMedian = MovingMedian(TimeSeries, period);

            // write out the result of the calculation
            Console.WriteLine("\nThe moving median time series of {0} periods contains...", period);
            foreach (var item in TimeSeries_MovingMedian)
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);

            // calculate an odd period moving median 
            int period2 = 5;
            var TimeSeries_MovingMedian2 = MovingMedian(TimeSeries, period);

            // write out the result of the calculation
            Console.WriteLine("\nThe moving median time series of {0} periods contains...", period2);
            foreach (var item in TimeSeries_MovingMedian2)
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);
        }

        public static SortedList<DateTime, double> MovingMedian(SortedList<DateTime, double> TimeSeries, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < TimeSeries.Count(); i++)
            {
                if (i >= period - 1)
                {
                    // add all of the values used in the calc to a list... 
                    var values = new List<double>();
                    for (int x = i; x > i - period; x--)
                        values.Add(TimeSeries.Values[x]);

                    // ... and then sort the list <- there might be a better way than this
                    values.Sort();

                    // If there is an even number of values in the array (example 10 values), take the two mid values
                    // and average them.  i.e. 10 values = (5th value + 6th value) / 2. 
                    double median;
                    if (period % 2 == 0) // is any even number
                        median = (values[(int)(period / 2)] + values[(int)(period / 2 - 1)]) / 2;
                    else // is an odd period
                    // Median equals the middle value of the sorted array, if there is an odd number of values in the array
                        median = values[(int)(period / 2 + 0.5)];

                    result.Add(TimeSeries.Keys[i], median);
                }
            }
            return result;
        }

    }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

摘星┃星的人 2024-10-28 00:54:32

可能有比这更好的方法

你是对的 - 如果你想要的只是中位数,你不需要对整个列表进行排序。请点击此维基百科页面中的链接了解更多信息。

there might be a better way than this

You are right about this - you don't need to sort the whole list if all you want is the median. Follow links from this wikipedia page for more.

丿*梦醉红颜 2024-10-28 00:54:32

对于包含 N 个项目和周期 P 的列表,为每个项目重新排序列表的算法为 O(N * P lgP)。有一个 O(N * lg P) 算法,它使用 2 个

它使用包含高于中位数的 P/2 个项目的最小堆,以及包含小于或等于中位数的 PP/2 个项目的最大堆。每当您获得新的数据项时,请用新的数据项替换最旧的数据项,然后进行向上筛选或向下筛选以将其移动到正确的位置。如果新项目到达任一堆的根,请将其与另一个堆的根进行比较,并在需要时交换和筛选。对于奇数 P,中位数位于最大堆的根部。对于偶数 P,它是两个根的平均值。

这个问题中有一个c实现。实施过程中一个棘手的部分是
有效地跟踪最旧的项目。该部分的开销可能会使速度增益对于小 P 来说微不足道。

For a list of N items and a period P, your algorithm which re-sorts the list for every item is O(N * P lgP). There is an O(N * lg P) algorithm, which uses 2 heaps.

It uses a min-heap which contains P/2 items above the median, and a max-heap with the P-P/2 items less than or equal to it. Whenever you get a new data item, replace the oldest item with the new one, then do a sift-up or sift-down to move it to the correct place. If the new item reaches the root of either heap, compare it to the root of the other and swap and sift-down when needed. For odd P, the median is at the root of the max-heap. For even P, it is the average of both roots.

There is a c implementation in this question. One tricky part in implementing it is
tracking the oldest item efficiently. The overhead in that part may make the speed gains insignificant for small P.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文