LINQ 计算 SortedList的移动平均值

发布于 2024-10-20 01:53:53 字数 1749 浏览 10 评论 0原文

我有一个 SortedList 形式的时间序列。我想计算这个系列的移动平均值。我可以使用简单的 for 循环来做到这一点。我想知道是否有更好的方法使用 linq 来做到这一点。

我的版本：

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var mySeries = new SortedList<DateTime, double>();
            mySeries.Add(new DateTime(2011, 01, 1), 10);
            mySeries.Add(new DateTime(2011, 01, 2), 25);
            mySeries.Add(new DateTime(2011, 01, 3), 30);
            mySeries.Add(new DateTime(2011, 01, 4), 45);
            mySeries.Add(new DateTime(2011, 01, 5), 50);
            mySeries.Add(new DateTime(2011, 01, 6), 65);

            var calcs = new calculations();
            var avg = calcs.MovingAverage(mySeries, 3);
            foreach (var item in avg)
            {
                Console.WriteLine("{0} {1}", item.Key, item.Value);                
            }
        }
    }
    class calculations
    {
        public SortedList<DateTime, double> MovingAverage(SortedList<DateTime, double> series, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < series.Count(); i++)
            {
                if (i >= period - 1)
                {
                    double total = 0;
                    for (int x = i; x > (i - period); x--)
                        total += series.Values[x];
                    double average = total / period;
                    result.Add(series.Keys[i], average);  
                }

            }
            return result;
        }
    }
}

原文

I have a time series in the form of a SortedList<dateTime,double>. I would like to calculate a moving average of this series. I can do this using simple for loops. I was wondering if there is a better way to do this using linq.

my version:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var mySeries = new SortedList<DateTime, double>();
            mySeries.Add(new DateTime(2011, 01, 1), 10);
            mySeries.Add(new DateTime(2011, 01, 2), 25);
            mySeries.Add(new DateTime(2011, 01, 3), 30);
            mySeries.Add(new DateTime(2011, 01, 4), 45);
            mySeries.Add(new DateTime(2011, 01, 5), 50);
            mySeries.Add(new DateTime(2011, 01, 6), 65);

            var calcs = new calculations();
            var avg = calcs.MovingAverage(mySeries, 3);
            foreach (var item in avg)
            {
                Console.WriteLine("{0} {1}", item.Key, item.Value);                
            }
        }
    }
    class calculations
    {
        public SortedList<DateTime, double> MovingAverage(SortedList<DateTime, double> series, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < series.Count(); i++)
            {
                if (i >= period - 1)
                {
                    double total = 0;
                    for (int x = i; x > (i - period); x--)
                        total += series.Values[x];
                    double average = total / period;
                    result.Add(series.Keys[i], average);  
                }

            }
            return result;
        }
    }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

狼性发作 2024-10-27 01:53:53

为了实现 O(n) 的渐近性能（如手动编码解决方案所做的那样），您可以使用 Aggregate 函数，如

series.Skip(period-1).Aggregate(
  new {
    Result = new SortedList<DateTime, double>(), 
    Working = List<double>(series.Take(period-1).Select(item => item.Value))
  }, 
  (list, item)=>{
     list.Working.Add(item.Value); 
     list.Result.Add(item.Key, list.Working.Average()); 
     list.Working.RemoveAt(0);
     return list;
  }
).Result;

累积值中的函数（实现为匿名类型）包含两个字段： Result 包含到目前为止构建的结果列表。 Working 包含最后一个 period-1 元素。聚合函数将当前值添加到工作列表中，构建当前平均值并将其添加到结果中，然后从工作列表中删除第一个（即最旧的）值。

“种子”（即累积的起始值）是通过将第一个 period-1 元素放入 Working 并将 Result 初始化为一个空列表。

因此，聚合从元素 period 开始（通过在开头跳过 (period-1) 元素）

在函数式编程中，这是聚合（或 >fold) 函数，顺便说一句。

两点注释：

该解决方案在“功能上”并不干净，因为在每个步骤中都重复使用相同的列表对象（Working 和 Result）。我不确定如果未来的编译器尝试自动并行化聚合函数是否会导致问题（另一方面我也不确定，这是否可能......）。纯功能性解决方案应该在每一步“创建”新列表。

另请注意，C# 缺乏强大的列表表达式。在一些假设的 Python-C# 混合伪代码中，我们可以编写这样的聚合函数

(list, item)=>
  new {
    Result = list.Result + [(item.Key, (list.Working+[item.Value]).Average())], 
    Working=list.Working[1::]+[item.Value]
  }

，在我看来，这样会更优雅:)

In order to achieve an asymptotical performance of O(n) (as the hand-coded solution does), you could use the Aggregate function like in

series.Skip(period-1).Aggregate(
  new {
    Result = new SortedList<DateTime, double>(), 
    Working = List<double>(series.Take(period-1).Select(item => item.Value))
  }, 
  (list, item)=>{
     list.Working.Add(item.Value); 
     list.Result.Add(item.Key, list.Working.Average()); 
     list.Working.RemoveAt(0);
     return list;
  }
).Result;

The accumulated value (implemented as anonymous type) contains two fields: Result contains the result list build up so far. Working contains the last period-1 elements. The aggregate function adds the current value to the Working list, builds the current average and adds it to the result and then removes the first (i.e. oldest) value from the working list.

The "seed" (i.e. the starting value for the accumulation) is build by putting the first period-1 elements into Working and initializing Result to an empty list.

Consequently tha aggregation starts with element period (by skipping (period-1) elements at the beginning)

In functional programming this is a typical usage pattern for the aggretate (or fold) function, btw.

Two remarks:

The solution is not "functionally" clean in that the same list objects (Working and Result) are reused in every step. I'm not sure if that might cause problems if some future compilers try to parallellize the Aggregate function automatically (on the other hand I'm also not sure, if that's possible after all...). A purely functional solution should "create" new lists at every step.

Also note that C# lacks powerful list expressions. In some hypothetical Python-C#-mixed pseudocode one could write the aggregation function like

(list, item)=>
  new {
    Result = list.Result + [(item.Key, (list.Working+[item.Value]).Average())], 
    Working=list.Working[1::]+[item.Value]
  }

which would be a bit more elegant in my humble opinion :)

回复收藏 0 原文

吻风 2024-10-27 01:53:53

对于使用 LINQ 计算移动平均线的最有效方法，您不应该使用 LINQ！

相反，我建议创建一个辅助类，它以最有效的方式计算移动平均值（使用循环缓冲区和因果移动平均滤波器），然后是一个扩展方法它可以通过 LINQ 访问。

首先，移动平均线

public class MovingAverage
{
    private readonly int _length;
    private int _circIndex = -1;
    private bool _filled;
    private double _current = double.NaN;
    private readonly double _oneOverLength;
    private readonly double[] _circularBuffer;
    private double _total;

    public MovingAverage(int length)
    {
        _length = length;
        _oneOverLength = 1.0 / length;
        _circularBuffer = new double[length];
    }       

    public MovingAverage Update(double value)
    {
        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Maintain totals for Push function
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled)
        {
            _current = double.NaN;
            return this;
        }

        // Compute the average
        double average = 0.0;
        for (int i = 0; i < _circularBuffer.Length; i++)
        {
            average += _circularBuffer[i];
        }

        _current = average * _oneOverLength;

        return this;
    }

    public MovingAverage Push(double value)
    {
        // Apply the circular buffer
        if (++_circIndex == _length)
        {
            _circIndex = 0;
        }

        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Compute the average
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled && _circIndex != _length - 1)
        {
            _current = double.NaN;
            return this;
        }
        else
        {
            // Set a flag to indicate this is the first time the buffer has been filled
            _filled = true;
        }

        _current = _total * _oneOverLength;

        return this;
    }

    public int Length { get { return _length; } }
    public double Current { get { return _current; } }
}

该类提供了一个非常快速且轻量级的 MovingAverage 过滤器实现。它创建一个长度为 N 的循环缓冲区，并对每个附加数据点计算一次加法、一次减法和一次乘法，这与暴力实现中每个点计算 N 次乘法加法相反。

接下来，对其进行 LINQ 化！

internal static class MovingAverageExtensions
{
    public static IEnumerable<double> MovingAverage<T>(this IEnumerable<T> inputStream, Func<T, double> selector, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(selector(item));
            yield return ma.Current;
        }
    }

    public static IEnumerable<double> MovingAverage(this IEnumerable<double> inputStream, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(item);
            yield return ma.Current;
        }
    }
}

上述扩展方法包装 MovingAverage 类并允许插入到 IEnumerable 流中。

现在就使用它！

int period = 50;

// Simply filtering a list of doubles
IEnumerable<double> inputDoubles;
IEnumerable<double> outputDoubles = inputDoubles.MovingAverage(period);   

// Or, use a selector to filter T into a list of doubles
IEnumerable<Point> inputPoints; // assuming you have initialised this
IEnumerable<double> smoothedYValues = inputPoints.MovingAverage(pt => pt.Y, period);

For the most efficient way possible to compute a Moving Average with LINQ, you shouldn't use LINQ!

Instead I propose creating a helper class which computes a moving average in the most efficient way possible (using a circular buffer and causal moving average filter), then an extension method to make it accessible to LINQ.

First up, the moving average

public class MovingAverage
{
    private readonly int _length;
    private int _circIndex = -1;
    private bool _filled;
    private double _current = double.NaN;
    private readonly double _oneOverLength;
    private readonly double[] _circularBuffer;
    private double _total;

    public MovingAverage(int length)
    {
        _length = length;
        _oneOverLength = 1.0 / length;
        _circularBuffer = new double[length];
    }       

    public MovingAverage Update(double value)
    {
        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Maintain totals for Push function
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled)
        {
            _current = double.NaN;
            return this;
        }

        // Compute the average
        double average = 0.0;
        for (int i = 0; i < _circularBuffer.Length; i++)
        {
            average += _circularBuffer[i];
        }

        _current = average * _oneOverLength;

        return this;
    }

    public MovingAverage Push(double value)
    {
        // Apply the circular buffer
        if (++_circIndex == _length)
        {
            _circIndex = 0;
        }

        double lostValue = _circularBuffer[_circIndex];
        _circularBuffer[_circIndex] = value;

        // Compute the average
        _total += value;
        _total -= lostValue;

        // If not yet filled, just return. Current value should be double.NaN
        if (!_filled && _circIndex != _length - 1)
        {
            _current = double.NaN;
            return this;
        }
        else
        {
            // Set a flag to indicate this is the first time the buffer has been filled
            _filled = true;
        }

        _current = _total * _oneOverLength;

        return this;
    }

    public int Length { get { return _length; } }
    public double Current { get { return _current; } }
}

This class provides a very fast and lightweight implementation of a MovingAverage filter. It creates a circular buffer of Length N and computes one add, one subtract and one multiply per data-point appended, as opposed to the N multiply-adds per point for the brute force implementation.

Next, to LINQ-ify it!

internal static class MovingAverageExtensions
{
    public static IEnumerable<double> MovingAverage<T>(this IEnumerable<T> inputStream, Func<T, double> selector, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(selector(item));
            yield return ma.Current;
        }
    }

    public static IEnumerable<double> MovingAverage(this IEnumerable<double> inputStream, int period)
    {
        var ma = new MovingAverage(period);
        foreach (var item in inputStream)
        {
            ma.Push(item);
            yield return ma.Current;
        }
    }
}

The above extension methods wrap the MovingAverage class and allow insertion into an IEnumerable stream.

Now to use it!

int period = 50;

// Simply filtering a list of doubles
IEnumerable<double> inputDoubles;
IEnumerable<double> outputDoubles = inputDoubles.MovingAverage(period);   

// Or, use a selector to filter T into a list of doubles
IEnumerable<Point> inputPoints; // assuming you have initialised this
IEnumerable<double> smoothedYValues = inputPoints.MovingAverage(pt => pt.Y, period);

回复收藏 0 原文

合久必婚 2024-10-27 01:53:53

您已经有了一个答案，告诉您如何可以使用 LINQ，但坦率地说，我不会在这里使用 LINQ，因为与您当前的解决方案相比，它很可能表现不佳，而且您现有的代码已经很清晰了。

不过，您可以保留运行总计并在每次迭代时进行调整，而不是在每个步骤中计算之前 period 元素的总和。也就是说，将以下内容更改

total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];

为：

if (i >= period) {
    total -= series.Values[i - period];
}
total += series.Values[i];

这将意味着无论 period 的大小如何，您的代码都将花费相同的时间来执行。

You already have an answer showing you how you can use LINQ but frankly I wouldn't use LINQ here as it will most likely perform poorly compared to your current solution and your existing code already is clear.

However instead of calculating the total of the previous period elements on every step, you can keep a running total and adjust it on each iteration. That is, change this:

total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];

to this:

if (i >= period) {
    total -= series.Values[i - period];
}
total += series.Values[i];

This will mean that your code will take the same amount of time to execute regardless of the size of period.

回复收藏 0 原文

情绪失控 2024-10-27 01:53:53

该块

double total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];
double average = total / period;

可以重写为：

double average = series.Values.Skip(i - period + 1).Take(period).Sum() / period;

您的方法可能如下所示：

series.Skip(period - 1)
    .Select((item, index) =>
        new 
        {
            item.Key,            
            series.Values.Skip(index).Take(period).Sum() / period
        });

正如您所看到的，linq 非常具有表现力。我建议从一些教程开始，例如 LINQ 简介和 101 个 LINQ 示例。

This block

double total = 0;
for (int x = i; x > (i - period); x--)
    total += series.Values[x];
double average = total / period;

can be rewritten as:

double average = series.Values.Skip(i - period + 1).Take(period).Sum() / period;

Your method may look like:

series.Skip(period - 1)
    .Select((item, index) =>
        new 
        {
            item.Key,            
            series.Values.Skip(index).Take(period).Sum() / period
        });

As you can see, linq is very expressive. I recommend to start with some tutorial like Introducing LINQ and 101 LINQ Samples.

回复收藏 0 原文

九歌凝 2024-10-27 01:53:53

要以更实用的方式执行此操作，您需要一个存在于 Rx 中但不存在于 LINQ 中的 Scan 方法。

让我们看看如果我们有一个扫描方法，它会是什么样子

var delta = 3;
var series = new [] {1.1, 2.5, 3.8, 4.8, 5.9, 6.1, 7.6};

var seed = series.Take(delta).Average();
var smas = series
    .Skip(delta)
    .Zip(series, Tuple.Create)
    .Scan(seed, (sma, values)=>sma - (values.Item2/delta) + (values.Item1/delta));
smas = Enumerable.Repeat(0.0, delta-1).Concat(new[]{seed}).Concat(smas);

这是扫描方法，取自并调整自此处：

public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> accumulator
)
{
    if (source == null) throw new ArgumentNullException("source");
    if (seed == null) throw new ArgumentNullException("seed");
    if (accumulator == null) throw new ArgumentNullException("accumulator");

    using (var i = source.GetEnumerator())
    {
        if (!i.MoveNext())
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }
        var acc = accumulator(seed, i.Current);

        while (i.MoveNext())
        {
            yield return acc;
            acc = accumulator(acc, i.Current);
        }
        yield return acc;
    }
}

这应该比具有更好的性能暴力法，因为我们使用累计来计算 SMA。

这里发生了什么？

首先，我们需要计算第一个周期，我们在此将其称为种子。然后，我们根据累积的种子值计算每个后续值。为此，我们需要旧值（即 t-delta）和最新值，我们将序列压缩在一起，一次从头开始，一次按 delta 移动。

最后，我们通过为第一个周期的长度添加零并添加初始种子值来进行一些清理。

To do this in a more functional way, you'd need a Scan method which exists in Rx but not in LINQ.

Let's look how it would look like if we'd have a scan method

var delta = 3;
var series = new [] {1.1, 2.5, 3.8, 4.8, 5.9, 6.1, 7.6};

var seed = series.Take(delta).Average();
var smas = series
    .Skip(delta)
    .Zip(series, Tuple.Create)
    .Scan(seed, (sma, values)=>sma - (values.Item2/delta) + (values.Item1/delta));
smas = Enumerable.Repeat(0.0, delta-1).Concat(new[]{seed}).Concat(smas);

And here's the scan method, taken and adjusted from here:

public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> accumulator
)
{
    if (source == null) throw new ArgumentNullException("source");
    if (seed == null) throw new ArgumentNullException("seed");
    if (accumulator == null) throw new ArgumentNullException("accumulator");

    using (var i = source.GetEnumerator())
    {
        if (!i.MoveNext())
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }
        var acc = accumulator(seed, i.Current);

        while (i.MoveNext())
        {
            yield return acc;
            acc = accumulator(acc, i.Current);
        }
        yield return acc;
    }
}

This should have better performance than the brute force method since we are using a running total to calculate the SMA.

What's going on here?

To start we need to calculate the first period which we call seed here. Then, every subsequent value we calculate from the accumulated seed value. To do that we need the old value (that is t-delta) and the newest value for which we zip together the series, once from the beginning and once shifted by the delta.

At the end we do some cleanup by adding zeroes for the length of the first period and adding the initial seed value.

回复收藏 0 原文

情话已封尘 2024-10-27 01:53:53

另一种选择是使用 MoreLINQ 的 Windowed 方法，这可以简化代码显著地：

var averaged = mySeries.Windowed(period).Select(window => window.Average(keyValuePair => keyValuePair.Value));

Another option is to use MoreLINQ's Windowed method, which simplifies the code significantly:

var averaged = mySeries.Windowed(period).Select(window => window.Average(keyValuePair => keyValuePair.Value));

回复收藏 0 原文

苄①跕圉湢 2024-10-27 01:53:53

我使用此代码来计算 SMA：

private void calculateSimpleMA(decimal[] values, out decimal[] buffer)
{
    int period = values.Count();     // gets Period (assuming Period=Values-Array-Size)
    buffer = new decimal[period];    // initializes buffer array
    var sma = SMA(period);           // gets SMA function
    for (int i = 0; i < period; i++)
        buffer[i] = sma(values[i]);  // fills buffer with SMA calculation
}

static Func<decimal, decimal> SMA(int p)
{
    Queue<decimal> s = new Queue<decimal>(p);
    return (x) =>
    {
        if (s.Count >= p)
        {
            s.Dequeue();
        }
        s.Enqueue(x);
        return s.Average();
    };
}

I use this code to calculate SMA:

private void calculateSimpleMA(decimal[] values, out decimal[] buffer)
{
    int period = values.Count();     // gets Period (assuming Period=Values-Array-Size)
    buffer = new decimal[period];    // initializes buffer array
    var sma = SMA(period);           // gets SMA function
    for (int i = 0; i < period; i++)
        buffer[i] = sma(values[i]);  // fills buffer with SMA calculation
}

static Func<decimal, decimal> SMA(int p)
{
    Queue<decimal> s = new Queue<decimal>(p);
    return (x) =>
    {
        if (s.Count >= p)
        {
            s.Dequeue();
        }
        s.Enqueue(x);
        return s.Average();
    };
}

回复收藏 0 原文

野却迷人 2024-10-27 01:53:53

这是一个扩展方法：

public static IEnumerable<double> MovingAverage(this IEnumerable<double> source, int period)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    if (period < 1)
    {
        throw new ArgumentOutOfRangeException(nameof(period));
    }

    return Core();

    IEnumerable<double> Core()
    {
        var sum = 0.0;
        var buffer = new double[period];
        var n = 0;
        foreach (var x in source)
        {
            n++;
            sum += x;
            var index = n % period;
            if (n >= period)
            {
                sum -= buffer[index];
                yield return sum / period;
            }

            buffer[index] = x;
        }
    }
}

Here is an extension method:

public static IEnumerable<double> MovingAverage(this IEnumerable<double> source, int period)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }

    if (period < 1)
    {
        throw new ArgumentOutOfRangeException(nameof(period));
    }

    return Core();

    IEnumerable<double> Core()
    {
        var sum = 0.0;
        var buffer = new double[period];
        var n = 0;
        foreach (var x in source)
        {
            n++;
            sum += x;
            var index = n % period;
            if (n >= period)
            {
                sum -= buffer[index];
                yield return sum / period;
            }

            buffer[index] = x;
        }
    }
}

回复收藏 0 原文

~没有更多了~