Spike removal algorithm
I have an array of values ranging from 30 to 300. I want to compute some kind of weighted average such that, if I have 5 values and one is much bigger than the rest (a spike), it won't influence the result as much as it would in a simple arithmetic mean, e.g. (n1+n2+n3+n4+n5)/5.
Does anyone have an idea how to write a simple algorithm that does just that, or where to look?
Comments (6)
Sounds like you're looking to discard data that falls outside some range you've specified. You could do it by computing the median/mode and ignoring values that fall outside a tolerance range around it when computing your mean. You'll have to adjust the divisor accordingly, of course, to account for the number of discarded values. What this "tolerable" range should be is ultimately up to you to decide, and will likely depend on your specific application needs.
Alternatively, you could try something like eliminating items that are more than r% away from your overall average. Something like this (in JavaScript):
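The snippet that followed is not preserved here, so what follows is only a minimal sketch of the two ideas described above, not the answerer's code. The helper names (`mean`, `median`, `meanWithin`) and the way the tolerance is derived from `r` are assumptions made for illustration.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Median of a non-empty array (average of the two middle values if even).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: average only the values within r percent of `center`.
// The divisor is the number of kept values, not the original length.
function meanWithin(values, center, r) {
  const tolerance = Math.abs(center) * (r / 100);
  const kept = values.filter((v) => Math.abs(v - center) <= tolerance);
  // If every value was rejected, fall back to the center itself.
  return kept.length > 0 ? mean(kept) : center;
}

const samples = [40, 42, 38, 41, 300];
console.log(meanWithin(samples, mean(samples), 60));   // 40.25
console.log(meanWithin(samples, median(samples), 20)); // 40.25
```

Note that the spike drags the plain mean up to 92.2, so the mean-centered version needs a wide tolerance (60% here) just to keep the ordinary values. Centering on the median allows a much tighter range.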
You could try a median filter rather than a mean filter. It's often used in image processing to mitigate spurious pixel values (as opposed to white noise).
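As a rough illustration, a one-dimensional sliding-window median filter might look like the sketch below. The function name, the default window size of 3, and the choice to shrink the window at the array edges are all assumptions; other edge policies (padding, mirroring) are equally valid.

```javascript
// Replace each value with the median of its neighbourhood.
// windowSize is assumed to be a positive odd integer.
function medianFilter(values, windowSize = 3) {
  const half = Math.floor(windowSize / 2);
  return values.map((_, i) => {
    // Shrink the window at the edges instead of padding.
    const start = Math.max(0, i - half);
    const end = Math.min(values.length, i + half + 1);
    const neighbors = values.slice(start, end).sort((a, b) => a - b);
    const mid = Math.floor(neighbors.length / 2);
    return neighbors.length % 2 === 1
      ? neighbors[mid]
      : (neighbors[mid - 1] + neighbors[mid]) / 2;
  });
}

console.log(medianFilter([40, 42, 300, 41, 38], 3));
// [ 41, 42, 42, 41, 39.5 ]
```

The output is a new array of the same length with the isolated spike replaced. A spike wider than half the window would survive, so the window size has to match the longest spike you expect.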
As you have noticed, the mean is susceptible to skewing by spikes. Perhaps the median or mode would be a better statistic, as they tend to be less skewed.
This should be a comment, but JS seems to be broken for me at the moment: it's not quite clear whether you are after a single number that is characteristic of your array (i.e. an average) or a new array with the spikes removed (median filter).
In response to that, I'd suggest you first check whether the median or mode is more appropriate as a statistic. If not, apply a median filter (very good at removing spikes) and then average.
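To make the difference concrete, here is a small comparison of the two statistics on made-up readings with one spike. The sample values and function names are illustrative assumptions.

```javascript
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

const readings = [40, 42, 38, 41, 300]; // hypothetical data, one spike
console.log(mean(readings));   // 92.2
console.log(median(readings)); // 41
```

A single spike moves the mean by more than 50 units here, while the median stays among the ordinary values.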
A Kalman filter is often used in similar applications. I don't know if it qualifies as "simple," but it's robust and well known.
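For a single, roughly constant quantity, the filter reduces to a few lines. The sketch below is a scalar Kalman filter under assumed noise variances (`processVar`, `measurementVar`). The `gate` that skips implausible measurements is an extra added here for spike rejection, not part of the basic filter, and all names and default values are illustrative.

```javascript
// Scalar Kalman filter for a slowly varying value, with a simple gate
// that ignores measurements far outside the predicted uncertainty.
function kalmanWithGate(values, options = {}) {
  const { processVar = 1, measurementVar = 25, gate = 3 } = options;
  let x = values[0];      // current state estimate
  let p = measurementVar; // variance of that estimate
  const estimates = [];
  const rejected = [];    // indices of measurements treated as spikes

  values.forEach((z, i) => {
    p += processVar;                // predict: uncertainty grows
    const innovation = z - x;       // measurement minus prediction
    const s = p + measurementVar;   // variance of the innovation
    if (Math.abs(innovation) > gate * Math.sqrt(s)) {
      rejected.push(i);             // too unlikely: skip the update
    } else {
      const k = p / s;              // Kalman gain
      x += k * innovation;
      p *= 1 - k;
    }
    estimates.push(x);
  });
  return { estimates, rejected };
}

const result = kalmanWithGate([40, 42, 38, 41, 300, 39]);
console.log(result.rejected); // [ 4 ]
```

The gate is expressed in standard deviations of the innovation, so tightening or loosening it trades missed spikes against discarded genuine changes. If the signal can really jump, a gated-out reading should probably be re-examined rather than silently dropped.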
Lots of ways of doing this: You could implement a low-pass digital filter.
Or, if you're just concerned about removing outliers from a statistical summary, you could just remove the highest and lowest N% of your data values from the dataset before averaging.
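Both suggestions can be sketched briefly. `lowPass` below is a first-order exponential smoother, just one of many possible low-pass filters, and `trimmedMean` drops the lowest and highest N% of the sorted values before averaging. The function names and the rounding-down of the trim count are assumptions.

```javascript
// First-order low-pass filter: y[i] = y[i-1] + alpha * (x[i] - y[i-1]).
// alpha is between 0 and 1; smaller alpha means heavier smoothing.
function lowPass(values, alpha) {
  const out = [];
  let y = values[0]; // start from the first sample
  for (const x of values) {
    y += alpha * (x - y);
    out.push(y);
  }
  return out;
}

// Mean after discarding the lowest and highest `percent` of the values.
function trimmedMean(values, percent) {
  const sorted = [...values].sort((a, b) => a - b);
  const drop = Math.floor((sorted.length * percent) / 100);
  const kept = sorted.slice(drop, sorted.length - drop);
  if (kept.length === 0) return NaN; // everything was trimmed away
  return kept.reduce((sum, v) => sum + v, 0) / kept.length;
}

console.log(trimmedMean([40, 42, 38, 41, 300], 20)); // 41
console.log(lowPass([0, 8], 0.5));                   // [ 0, 4 ]
```

With only five values, any N below 20 rounds down to dropping nothing, so the trimmed mean is more useful on larger batches. The low-pass filter only attenuates a spike and spreads it over the following samples; it does not remove it.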
"Robust statistics" is the search term that will get you into the literature. An advantage of a Kalman filter is that you have a running estimate of the variability of the data, and this allows you eventually to "discard observations that are more than x% likely to be spurious given the whole set of observations so far".