C# 时间序列的移动中位数计算SortedList- 提高性能?
我有一种计算时间序列的移动中值的方法。与移动平均线一样,它使用固定的窗口或周期(有时称为回顾周期)。 如果周期为 10,它将创建前 10 个值(0-9)的数组,然后找到它们的中值。它将重复此操作,将窗口增加 1 步(现在值为 1-10),依此类推......因此是其移动部分。这个过程与移动平均线完全相同。
通过以下方式找到中值:
- 对数组的值进行排序
- 如果数组中的值数量为奇数,则取中间值。由 5 个值组成的排序数组的中位数将是第三个值。
- 如果数组中有偶数个值,则取中间两侧的两个值并对其进行平均。 6 个值的排序数组的中位数为 (2nd + 3rd) / 2。
我创建了一个函数,通过填充 List
并调用 List<> 来计算此值;.Sort()
,然后找到合适的值。
计算它是正确的,但我想知道是否有办法提高这个计算的性能。也许通过在 double[] 上手动排序而不是使用列表。
我的实现如下:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Moving_Median_TimeSeries
{
class Program
{
static void Main(string[] args)
{
// created a a sample test time series of 10 days
DateTime Today = DateTime.Now;
var TimeSeries = new SortedList<DateTime, double>();
for (int i = 0; i < 10; i++)
TimeSeries.Add(Today.AddDays(i), i * 10);
// write out the time series
Console.WriteLine("Our time series contains...");
foreach (var item in TimeSeries)
Console.WriteLine(" {0}, {1}", item.Key.ToShortDateString(), item.Value);
// calculate an even period moving median
int period = 6;
var TimeSeries_MovingMedian = MovingMedian(TimeSeries, period);
// write out the result of the calculation
Console.WriteLine("\nThe moving median time series of {0} periods contains...", period);
foreach (var item in TimeSeries_MovingMedian)
Console.WriteLine(" {0}, {1}", item.Key.ToShortDateString(), item.Value);
// calculate an odd period moving median
int period2 = 5;
var TimeSeries_MovingMedian2 = MovingMedian(TimeSeries, period);
// write out the result of the calculation
Console.WriteLine("\nThe moving median time series of {0} periods contains...", period2);
foreach (var item in TimeSeries_MovingMedian2)
Console.WriteLine(" {0}, {1}", item.Key.ToShortDateString(), item.Value);
}
public static SortedList<DateTime, double> MovingMedian(SortedList<DateTime, double> TimeSeries, int period)
{
var result = new SortedList<DateTime, double>();
for (int i = 0; i < TimeSeries.Count(); i++)
{
if (i >= period - 1)
{
// add all of the values used in the calc to a list...
var values = new List<double>();
for (int x = i; x > i - period; x--)
values.Add(TimeSeries.Values[x]);
// ... and then sort the list <- there might be a better way than this
values.Sort();
// If there is an even number of values in the array (example 10 values), take the two mid values
// and average them. i.e. 10 values = (5th value + 6th value) / 2.
double median;
if (period % 2 == 0) // is any even number
median = (values[(int)(period / 2)] + values[(int)(period / 2 - 1)]) / 2;
else // is an odd period
// Median equals the middle value of the sorted array, if there is an odd number of values in the array
median = values[(int)(period / 2 + 0.5)];
result.Add(TimeSeries.Keys[i], median);
}
}
return result;
}
}
}
I have a method that calculates the moving median value of a time series. Like a moving average, it use a fixed window or period (sometimes referred to as the look back period).
If the period is 10, it will created an array of the first 10 values (0-9), then find the median value of them. It will repeat this, incrementing the window by 1 step (values 1-10 now) and so on... hence the moving part of this. This is process is exactly the same as a moving average.
The median value is found by:
- Sorting the values of an array
- If there is an odd number of values in the array, take the mid value. The median of a sorted array of 5 values would be the 3rd value.
- If there is an even number of values in the array, take the two values on each side of the mid and average them. The median of a sorted array of 6 values would be the (2nd + 3rd) / 2.
I have created a function that calculates this by populating a List<double>
, calling List<>.Sort()
, and then finding the appropriate values.
Computational it is correct, but I was wonder if there ws a way to improve the performance of this calculation. Perhaps by hand-rolling a sort on an double[]
rather than using a list.
My implementation is as follows:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Moving_Median_TimeSeries
{
class Program
{
static void Main(string[] args)
{
// created a a sample test time series of 10 days
DateTime Today = DateTime.Now;
var TimeSeries = new SortedList<DateTime, double>();
for (int i = 0; i < 10; i++)
TimeSeries.Add(Today.AddDays(i), i * 10);
// write out the time series
Console.WriteLine("Our time series contains...");
foreach (var item in TimeSeries)
Console.WriteLine(" {0}, {1}", item.Key.ToShortDateString(), item.Value);
// calculate an even period moving median
int period = 6;
var TimeSeries_MovingMedian = MovingMedian(TimeSeries, period);
// write out the result of the calculation
Console.WriteLine("\nThe moving median time series of {0} periods contains...", period);
foreach (var item in TimeSeries_MovingMedian)
Console.WriteLine(" {0}, {1}", item.Key.ToShortDateString(), item.Value);
// calculate an odd period moving median
int period2 = 5;
var TimeSeries_MovingMedian2 = MovingMedian(TimeSeries, period);
// write out the result of the calculation
Console.WriteLine("\nThe moving median time series of {0} periods contains...", period2);
foreach (var item in TimeSeries_MovingMedian2)
Console.WriteLine(" {0}, {1}", item.Key.ToShortDateString(), item.Value);
}
public static SortedList<DateTime, double> MovingMedian(SortedList<DateTime, double> TimeSeries, int period)
{
var result = new SortedList<DateTime, double>();
for (int i = 0; i < TimeSeries.Count(); i++)
{
if (i >= period - 1)
{
// add all of the values used in the calc to a list...
var values = new List<double>();
for (int x = i; x > i - period; x--)
values.Add(TimeSeries.Values[x]);
// ... and then sort the list <- there might be a better way than this
values.Sort();
// If there is an even number of values in the array (example 10 values), take the two mid values
// and average them. i.e. 10 values = (5th value + 6th value) / 2.
double median;
if (period % 2 == 0) // is any even number
median = (values[(int)(period / 2)] + values[(int)(period / 2 - 1)]) / 2;
else // is an odd period
// Median equals the middle value of the sorted array, if there is an odd number of values in the array
median = values[(int)(period / 2 + 0.5)];
result.Add(TimeSeries.Keys[i], median);
}
}
return result;
}
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
你是对的 - 如果你想要的只是中位数,你不需要对整个列表进行排序。请点击此维基百科页面中的链接了解更多信息。
You are right about this - you don't need to sort the whole list if all you want is the median. Follow links from this wikipedia page for more.
对于包含 N 个项目和周期 P 的列表,为每个项目重新排序列表的算法为 O(N * P lgP)。有一个 O(N * lg P) 算法,它使用 2 个 堆 。
它使用包含高于中位数的 P/2 个项目的最小堆,以及包含小于或等于中位数的 PP/2 个项目的最大堆。每当您获得新的数据项时,请用新的数据项替换最旧的数据项,然后进行向上筛选或向下筛选以将其移动到正确的位置。如果新项目到达任一堆的根,请将其与另一个堆的根进行比较,并在需要时交换和筛选。对于奇数 P,中位数位于最大堆的根部。对于偶数 P,它是两个根的平均值。
这个问题中有一个c实现。实施过程中一个棘手的部分是
有效地跟踪最旧的项目。该部分的开销可能会使速度增益对于小 P 来说微不足道。
For a list of N items and a period P, your algorithm which re-sorts the list for every item is O(N * P lgP). There is an O(N * lg P) algorithm, which uses 2 heaps.
It uses a min-heap which contains P/2 items above the median, and a max-heap with the P-P/2 items less than or equal to it. Whenever you get a new data item, replace the oldest item with the new one, then do a sift-up or sift-down to move it to the correct place. If the new item reaches the root of either heap, compare it to the root of the other and swap and sift-down when needed. For odd P, the median is at the root of the max-heap. For even P, it is the average of both roots.
There is a c implementation in this question. One tricky part in implementing it is
tracking the oldest item efficiently. The overhead in that part may make the speed gains insignificant for small P.