Efficient data structure/algorithm for maintaining weighted moving averages

Posted on 2024-12-16 23:36:56

I'd like to keep moving averages for a number of different categories when storing log records. Imagine a service that saves web server logs one entry at a time. Let's further imagine that we don't have access to the logged records: we see each one once but can't access it again later.

For different pages, I'd like to know

  • the total number of hits (easy)
  • a "recent" average (like one month or so)
  • a "long term" average (over a year)

Is there any clever algorithm/data model that allows saving such moving averages without having to recalculate them by summing up huge quantities of data?

I don't need an exact average (exactly 30 days or so) but just trend indicators. So some fuzziness is not a problem at all. It should just make sure that newer entries are weighted higher than older ones.

One solution probably would be to auto-create statistics records for each month. However, I don't even need past month statistics, so this seems like overkill. And it wouldn't give me a moving average but rather swap to new values from month to month.


Comments (3)

近箐 2024-12-23 23:36:56

An easy solution would be to keep an exponentially decaying total.

It can be calculated using the following formula:

newX = oldX * (p ^ (newT - oldT)) + delta

where oldX is the old value of your total (at time oldT), newX is the new value of your total (at time newT); delta is the contribution of new events to the total (for example the number of hits today); p is less than or equal to 1 and is the decay factor. If we take p = 1, then we have the total number of hits. By decreasing p, we effectively shorten the interval our total describes.
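A minimal sketch of such a decaying counter in Python (the class and parameter names are my own, not from the answer):

```python
class DecayingTotal:
    """Exponentially decaying total, updated as
    new_x = old_x * p ** (new_t - old_t) + delta."""

    def __init__(self, p):
        self.p = p          # decay factor, 0 < p <= 1 (p = 1 gives the plain total)
        self.value = 0.0
        self.last_t = None  # time of the most recent update

    def add(self, t, delta):
        """Fold in `delta` hits observed at time `t` (t must not decrease)."""
        if self.last_t is not None:
            # Decay the stored total by how much time has passed.
            self.value *= self.p ** (t - self.last_t)
        self.last_t = t
        self.value += delta
        return self.value
```

Two counters fed from the same stream, one with p close to 1 ("long term") and one with a smaller p ("recent"), give both trend indicators without storing any history.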

一梦浮鱼 2024-12-23 23:36:56

If all you really want is a smoothed value with a given time constant then the easiest thing is to use a single pole recursive IIR filter (aka AR or auto-regressive filter in time series analysis). This takes the form:

X_new = k * X_old + (1 - k) * x

where X_old is the previous smoothed value, X_new is the new smoothed value, x is the current data point, and k is a factor which determines the time constant (in this form, a k closer to 1 gives a longer time constant; the effective memory is roughly 1/(1 - k) samples). You may need to determine the two k values (one for "recent" and a larger one, closer to 1, for "long term") empirically, based on your sample rate, which ideally should be reasonably constant, e.g. one update per day.
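A sketch of this filter in Python, assuming one update per day with that day's hit count (the class name and k values are illustrative, not from the answer):

```python
class SmoothedValue:
    """Single-pole recursive IIR filter: x_new = k * x_old + (1 - k) * x.

    In this form, k closer to 1 means heavier smoothing, i.e. a longer
    time constant (roughly 1 / (1 - k) samples of memory)."""

    def __init__(self, k, initial=0.0):
        self.k = k
        self.value = initial

    def update(self, x):
        """Fold in the current data point x (e.g. today's hit count)."""
        self.value = self.k * self.value + (1.0 - self.k) * x
        return self.value
```

For example, `SmoothedValue(k=0.97)` updated daily has roughly a month of memory, and `SmoothedValue(k=0.997)` roughly a year; feed both from the same daily totals.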

走过海棠暮 2024-12-23 23:36:56

This may be a solution for you.

You can aggregate the data into intermediate storage, grouped by hour or day. The grouping function will then run very fast, because it only needs to group a small number of records, and inserts will be fast as well. The precision trade-off is up to you.

It can be better than the auto-regressive/exponential algorithms because it is easier to understand what you are calculating, and it doesn't require math at each step.

For the recent data you can use capped collections with a limited number of records. They are supported natively by some databases, for example MongoDB.
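A sketch of the bucketing idea in Python, using a bounded deque in place of a database capped collection (the per-day granularity and all names are my own choices, not from the answer):

```python
from collections import deque


class DailyBuckets:
    """Aggregate hits into per-day buckets, keeping only the newest
    max_days buckets -- analogous to a capped collection."""

    def __init__(self, max_days):
        self.max_days = max_days
        self.buckets = deque()  # (day, hits) pairs, oldest first

    def add(self, day, hits=1):
        """Record `hits` for `day` (days must arrive in order)."""
        if self.buckets and self.buckets[-1][0] == day:
            d, h = self.buckets[-1]
            self.buckets[-1] = (d, h + hits)
        else:
            self.buckets.append((day, hits))
            if len(self.buckets) > self.max_days:
                self.buckets.popleft()  # drop the oldest bucket

    def average(self):
        """Average hits per day over the retained window."""
        if not self.buckets:
            return 0.0
        return sum(h for _, h in self.buckets) / len(self.buckets)
```

Two instances, e.g. one with `max_days=30` and one with `max_days=365`, would give the "recent" and "long term" averages while storing at most 395 small records per page.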
