Standard deviation of generic list?
I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is the code that is relevant to it, without getting into too much detail:
namespace ValveTesterInterface
{
    public class ValveDataResults
    {
        private List<ValveData> m_ValveResults;
        public ValveDataResults()
        {
            if (m_ValveResults == null)
            {
                m_ValveResults = new List<ValveData>();
            }
        }
        public void AddValveData(ValveData valve)
        {
            m_ValveResults.Add(valve);
        }
Here is the function where the standard deviation needs to be calculated:
        public float LatchStdev()
        {
            float sumOfSqrs = 0;
            float meanValue = 0;
            foreach (ValveData value in m_ValveResults)
            {
                meanValue += value.LatchTime;
            }
            meanValue = (meanValue / m_ValveResults.Count) * 0.02f;
            for (int i = 0; i <= m_ValveResults.Count; i++)
            {
                sumOfSqrs += Math.Pow((m_ValveResults - meanValue), 2);
            }
            return Math.Sqrt(sumOfSqrs /(m_ValveResults.Count - 1));
        }
    }
}
Ignore what's inside the LatchStdev() function, because I'm sure it's not right. It's just my poor attempt to calculate the standard deviation. I know how to do it for a list of doubles, but not for a generic list of custom data objects. If someone has experience with this, please help.
Comments (4)
The example above is slightly incorrect and could produce a divide-by-zero error if your data set contains only one element. The following code is somewhat simpler and gives the "population standard deviation" result. (http://en.wikipedia.org/wiki/Standard_deviation)
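A minimal sketch of a population standard deviation, for illustration only. The class and method names are assumptions, not taken from this answer, and returning 0 for an empty sequence is a choice made here. Because it divides by N rather than N - 1, a single-element list gives 0 instead of a division by zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PopulationStats
{
    // Population standard deviation: divides by N, not N - 1,
    // so a one-element sequence yields 0 rather than dividing by zero.
    public static double PopulationStdDev(this IEnumerable<double> values)
    {
        // Materialize once so the source is enumerated a single time.
        double[] data = values.ToArray();
        if (data.Length == 0)
        {
            return 0.0; // assumption: an empty sequence has no spread
        }

        double mean = data.Average();
        double sumOfSquares = data.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquares / data.Length);
    }
}
```

With this in scope, `new double[] { 2, 4, 4, 4, 5, 5, 7, 9 }.PopulationStdDev()` evaluates to 2.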
This article should help you. It creates a function that computes the deviation of a sequence of double values. All you have to do is supply a sequence of appropriate data elements.
The resulting function is:
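A minimal sketch of such a function, assuming the sample (N - 1) form of the standard deviation. The names `StatisticsExtensions` and `StdDev` are illustrative and not necessarily those used in the article, and returning 0 for fewer than two values is a choice made here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatisticsExtensions
{
    // Sample standard deviation of a sequence of doubles.
    public static double StdDev(this IEnumerable<double> values)
    {
        // Copy to an array so the source is only enumerated once.
        double[] data = values.ToArray();
        if (data.Length < 2)
        {
            return 0.0; // assumption: not enough data for a sample deviation
        }

        double mean = data.Average();
        double sumOfSquares = data.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquares / (data.Length - 1));
    }
}
```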
This is easy enough to adapt for any generic type, so long as we provide a selector for the value being computed. LINQ is great for that: the Select function allows you to project from your generic list of custom types a sequence of numeric values for which to compute the standard deviation:
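A sketch of the selector idea, under stated assumptions: `ValveData` below is a hypothetical stand-in with only a `LatchTime` property (the real class is not shown in the question), and `SelectorStatistics`, `StdDevBy` and `LatchStats` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's type; only LatchTime is assumed.
public class ValveData
{
    public float LatchTime { get; set; }
}

public static class SelectorStatistics
{
    // Sample standard deviation of the values picked out by the selector.
    public static double StdDevBy<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        // Select projects each item to a number; ToArray enumerates once.
        double[] data = source.Select(selector).ToArray();
        if (data.Length < 2)
        {
            return 0.0; // assumption: not enough data for a sample deviation
        }

        double mean = data.Average();
        double sumOfSquares = data.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquares / (data.Length - 1));
    }
}

public static class LatchStats
{
    // How LatchStdev() from the question might look with the helper above.
    public static double LatchStdev(List<ValveData> valveResults)
    {
        return valveResults.StdDevBy(v => v.LatchTime);
    }
}
```

Any other numeric property can be handled the same way by passing a different lambda.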
Even though the accepted answer seems mathematically correct, it is wrong from a programming perspective: it enumerates the same sequence 4 times. This might be OK if the underlying object is a list or an array, but if the input is a filtered or aggregated LINQ expression, or if the data is coming directly from a database or a network stream, it will perform much worse.
I would highly recommend not reinventing the wheel and instead using one of the better open-source math libraries, Math.NET. We have been using that library in our company and are very happy with its performance.
See http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html for more information.
Lastly, for those who want the fastest possible result and are willing to sacrifice some precision, read about the "one-pass" algorithm: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
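As a rough illustration of a one-pass calculation, here is a sketch using Welford's online update, one of the single-pass approaches. It keeps rounding error lower than the naive running sum of squares. The names are illustrative, and the sample (N - 1) form is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class OnePassStatistics
{
    // Sample standard deviation computed in a single enumeration.
    public static double StdDevOnePass(this IEnumerable<double> values)
    {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared distances from the mean

        foreach (double x in values)
        {
            count++;
            double delta = x - mean;
            mean += delta / count;      // update the running mean
            m2 += delta * (x - mean);   // uses both the old and the new mean
        }

        // assumption: fewer than two values gives 0
        return count < 2 ? 0.0 : Math.Sqrt(m2 / (count - 1));
    }
}
```

Because the source is walked only once, this also works on lazily evaluated sequences such as a LINQ query or a stream of readings.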
I see what you're doing, and I use something similar. It seems to me you're not going far enough. I tend to encapsulate all data processing into a single class, so that I can cache the calculated values until the list changes.
For instance:
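A minimal sketch of that caching idea, assuming a list of doubles and the sample (N - 1) standard deviation. The class and member names are illustrative, not the original answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CachedStatistics
{
    private readonly List<double> m_Values = new List<double>();
    private double m_Average;
    private double m_StdDev;
    private bool m_AverageValid; // flag: cached average is up to date
    private bool m_StdDevValid;  // flag: cached deviation is up to date

    public void Add(double value)
    {
        m_Values.Add(value);
        // Any change to the list invalidates the cached results.
        m_AverageValid = false;
        m_StdDevValid = false;
    }

    public double Average
    {
        get
        {
            if (!m_AverageValid)
            {
                m_Average = m_Values.Count == 0 ? 0.0 : m_Values.Average();
                m_AverageValid = true;
            }
            return m_Average;
        }
    }

    public double StdDev
    {
        get
        {
            if (!m_StdDevValid)
            {
                double mean = Average; // reuses the cached average if valid
                double sumOfSquares = m_Values.Sum(v => (v - mean) * (v - mean));
                m_StdDev = m_Values.Count < 2
                    ? 0.0
                    : Math.Sqrt(sumOfSquares / (m_Values.Count - 1));
                m_StdDevValid = true;
            }
            return m_StdDev;
        }
    }
}
```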
You'll notice that with this method, only the first request for the average actually computes it. After that, as long as we don't add anything to the list (or remove or modify anything, though those operations aren't shown), we can get the average for basically nothing.
Additionally, since the average is used in the algorithm for the standard deviation, computing the standard deviation first gives us the average for free, and computing the average first gives us a small performance boost in the standard deviation calculation, assuming we remember to check the flag.
Furthermore, places like the average function, where you're already looping through every value anyway, are a great spot to cache things like the minimum and maximum values. Of course, requests for this information need to first check whether the values have been cached, and that can cause a relative slowdown compared to just finding the max from the list, since it does all the extra work of setting up every related cache, not just the one you're accessing.