Efficient data structure/algorithm for maintaining weighted moving averages

Posted on 2024-12-16 23:36:56

I'd like to keep moving averages for a number of different categories when storing log records. Imagine a service that saves web server logs one entry at a time. Let's further imagine that we don't have access to the logged records: we see each one once but can't access it again later.

For different pages, I'd like to know

  • the total number of hits (easy)
  • a "recent" average (like one month or so)
  • a "long term" average (over a year)

Is there any clever algorithm/data model that allows saving such moving averages without having to recalculate them by summing up huge quantities of data?

I don't need an exact average (exactly 30 days or so) but just trend indicators. So some fuzziness is not a problem at all. It should just make sure that newer entries are weighted higher than older ones.

One solution probably would be to auto-create statistics records for each month. However, I don't even need past month statistics, so this seems like overkill. And it wouldn't give me a moving average but rather swap to new values from month to month.


Comments (3)

近箐 2024-12-23 23:36:56

An easy solution would be to keep an exponentially decaying total.

It can be calculated using the following formula:

newX = oldX * (p ^ (newT - oldT)) + delta

where oldX is the old value of your total (at time oldT), newX is the new value of your total (at time newT); delta is the contribution of new events to the total (for example the number of hits today); p is less than or equal to 1 and is the decay factor. If we take p = 1, then we have the total number of hits. By decreasing p, we effectively shorten the interval our total describes.
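A minimal sketch of such a decaying counter in Python (the class and parameter names are my own, not from the answer):

```python
class DecayingTotal:
    """Exponentially decaying total, updated as
    new_x = old_x * p ** (new_t - old_t) + delta."""

    def __init__(self, p):
        self.p = p          # decay factor, 0 < p <= 1 (p = 1 gives the plain total)
        self.value = 0.0
        self.last_t = None  # time of the most recent update

    def add(self, t, delta):
        """Fold in `delta` hits observed at time `t` (t must not decrease)."""
        if self.last_t is not None:
            # Decay the stored total by how much time has passed.
            self.value *= self.p ** (t - self.last_t)
        self.last_t = t
        self.value += delta
        return self.value
```

Two counters fed from the same stream, one with p close to 1 ("long term") and one with a smaller p ("recent"), give both trend indicators without storing any history.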

一梦浮鱼 2024-12-23 23:36:56

If all you really want is a smoothed value with a given time constant then the easiest thing is to use a single pole recursive IIR filter (aka AR or auto-regressive filter in time series analysis). This takes the form:

X_new = k * X_old + (1 - k) * x

where X_old is the previous smoothed value, X_new is the new smoothed value, x is the current data point, and k is a factor which determines the time constant (in this form, a k closer to 1 gives a longer time constant; the effective memory is roughly 1/(1 - k) samples). You may need to determine the two k values (one for "recent" and a larger one, closer to 1, for "long term") empirically, based on your sample rate, which ideally should be reasonably constant, e.g. one update per day.
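A sketch of this filter in Python, assuming one update per day with that day's hit count (the class name and k values are illustrative, not from the answer):

```python
class SmoothedValue:
    """Single-pole recursive IIR filter: x_new = k * x_old + (1 - k) * x.

    In this form, k closer to 1 means heavier smoothing, i.e. a longer
    time constant (roughly 1 / (1 - k) samples of memory)."""

    def __init__(self, k, initial=0.0):
        self.k = k
        self.value = initial

    def update(self, x):
        """Fold in the current data point x (e.g. today's hit count)."""
        self.value = self.k * self.value + (1.0 - self.k) * x
        return self.value
```

For example, `SmoothedValue(k=0.97)` updated daily has roughly a month of memory, and `SmoothedValue(k=0.997)` roughly a year; feed both from the same daily totals.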

走过海棠暮 2024-12-23 23:36:56

This may be a solution for you.

You can aggregate the data into intermediate storage, grouped by hour or day. The grouping function will then run very fast, because it only needs to group a small number of records, and inserts will be fast as well. The precision trade-off is up to you.

It can be better than the auto-regressive/exponential algorithms because it is easier to understand what you are calculating, and it doesn't require math at each step.

For the recent data you can use capped collections with a limited number of records. They are supported natively by some databases, for example MongoDB.
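A sketch of the bucketing idea in Python, using a bounded deque in place of a database capped collection (the per-day granularity and all names are my own choices, not from the answer):

```python
from collections import deque


class DailyBuckets:
    """Aggregate hits into per-day buckets, keeping only the newest
    max_days buckets -- analogous to a capped collection."""

    def __init__(self, max_days):
        self.max_days = max_days
        self.buckets = deque()  # (day, hits) pairs, oldest first

    def add(self, day, hits=1):
        """Record `hits` for `day` (days must arrive in order)."""
        if self.buckets and self.buckets[-1][0] == day:
            d, h = self.buckets[-1]
            self.buckets[-1] = (d, h + hits)
        else:
            self.buckets.append((day, hits))
            if len(self.buckets) > self.max_days:
                self.buckets.popleft()  # drop the oldest bucket

    def average(self):
        """Average hits per day over the retained window."""
        if not self.buckets:
            return 0.0
        return sum(h for _, h in self.buckets) / len(self.buckets)
```

Two instances, e.g. one with `max_days=30` and one with `max_days=365`, would give the "recent" and "long term" averages while storing at most 395 small records per page.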
