Counting messages per second in a rolling window?

I have messages coming into my program with millisecond resolution (anywhere from zero to a couple hundred messages a millisecond).

I'd like to do some analysis. Specifically, I want to maintain multiple rolling windows of the message counts, updated as messages come in. For example,

  • # of messages in last second
  • # of messages in last minute
  • # of messages in last half-hour divided by # of messages in last hour

I can't just maintain a simple count like "1,017 messages in last second", since I won't know when a message is older than 1 second and therefore should no longer be in the count...

I thought of maintaining a queue of all the messages, searching for the youngest message that's older than one second, and inferring the count from the index. However, this seems like it would be too slow, and would eat up a lot of memory.

What can I do to keep track of these counts in my program so that I can efficiently get these values in real-time?

5 Answers

北斗星光 2024-08-28 20:24:50

This is easiest handled by a cyclic buffer.

A cyclic buffer has a fixed number of elements, and a pointer into it. You can add an element to the buffer, and when you do, you increment the pointer to the next element. If you run past the end of the fixed-length buffer, you start over from the beginning. It's a space- and time-efficient way to store the "last N" items.

Now in your case you could have one cyclic buffer of 1,000 counters, each one counting the number of messages during one millisecond. Adding up all 1,000 counters gives you the total count for the last second. Of course you can optimize the reporting part by incrementally updating the count, i.e., when you insert, subtract the number you overwrite from the running total and then add the new number.

You can then have another cyclic buffer with 60 slots that counts the aggregate number of messages in whole seconds; once a second, you take the total count of the millisecond buffer and write it into the buffer with one-second resolution, and so on.

Here's C-like pseudocode:

int msecbuf[1000]; // per-millisecond message counts for the last second (zero-initialized)
int secbuf[60];    // per-second totals for the last minute (zero-initialized)
int msecptr = 0, secptr = 0;
int count = 0;           // messages received during the current millisecond
int msec_total_ctr = 0;  // running sum of msecbuf: messages in the last second
void msg_received() { count++; }
void every_msec() {  // called once per millisecond
  msec_total_ctr -= msecbuf[msecptr];  // drop the slot about to be overwritten
  msecbuf[msecptr] = count;
  msec_total_ctr += count;             // add the fresh slot
  count = 0;
  msecptr = (msecptr + 1) % 1000;
}
void every_sec() {   // called once per second
  secbuf[secptr] = msec_total_ctr;     // snapshot: messages in the last second
  secptr = (secptr + 1) % 60;
}
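
The reporting side then reduces to sums over these buffers. A minimal sketch of the query helpers (the function names are illustrative, not from the answer), assuming every_msec and every_sec are driven by a timer:

int last_second_count() {   // messages in the last second
  return msec_total_ctr;
}
int last_minute_count() {   // messages in the last minute
  int total = 0;
  for (int i = 0; i < 60; i++) total += secbuf[i];
  return total;
}
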
铜锣湾横着走 2024-08-28 20:24:50

You want exponential smoothing, otherwise known as an exponentially weighted moving average. Take an EWMA of the time since the last message arrived, and then divide that time into one second to get a rate. You can run several of these with different weights to cover effectively longer time intervals. You're then using an infinitely long window, so you don't have to worry about expiring data; the decaying weights do that for you.
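
A minimal C sketch of this idea; the alpha value and the caller-supplied timestamps are illustrative assumptions, not anything from the answer:

double ewma_gap = 0.0;        /* smoothed time between messages, in seconds */
double last_arrival = -1.0;   /* timestamp of the previous message */
const double alpha = 0.05;    /* weight: smaller alpha ~ longer effective window */

void on_message(double now_seconds) {
  if (last_arrival >= 0.0) {
    double gap = now_seconds - last_arrival;
    /* seed with the first gap, then blend new gaps in */
    ewma_gap = (ewma_gap == 0.0) ? gap : alpha * gap + (1.0 - alpha) * ewma_gap;
  }
  last_arrival = now_seconds;
}

double estimated_rate(void) { /* estimated messages per second */
  return ewma_gap > 0.0 ? 1.0 / ewma_gap : 0.0;
}

Running two of these with a fast and a slow alpha could stand in for the half-hour/hour ratio from the question.
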

灯角 2024-08-28 20:24:50

For the current millisecond, keep a count. When the millisecond slice rolls over to the next one, append the count to a millisecond rolling buffer array and reset it. If you keep this buffer cumulative, you can extract the # of messages per second with a fixed amount of memory.

When a 0.1-second slice (or some other small value relative to 1 minute) is done, sum up the last 0.1 * 1000 items from the rolling buffer array and place that in the next rolling buffer. This way you can keep the millisecond rolling buffer small (1,000 items for a 1 s max lookup) and the buffer for minute lookups small as well (600 items).

You can apply the same trick again, building whole minutes out of 0.1-minute intervals. Every window in the question can then be answered by summing a few integers (or, when using cumulative buffers, by subtracting two values).

The only disadvantage is that the last-second value changes every millisecond, the minute value only every 0.1 s, and the hour value (and derived ratios such as last half-hour over last hour) every 0.1 minute. But at least you keep your memory usage at bay.
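
A sketch of the cumulative idea at the millisecond tier; with prefix sums, any window up to a second is the difference of two entries, and the 0.1 s and minute tiers repeat the same pattern. The names and the 1 ms timer are assumptions:

#define MS_SLOTS 1000

long long cum[MS_SLOTS];   /* total messages ever seen, snapshotted once per ms */
long long total = 0;       /* running grand total */
int ptr = 0;               /* next slot to write */

void msg_received(void) { total++; }

void every_msec(void) {    /* driven by a 1 ms timer */
  cum[ptr] = total;
  ptr = (ptr + 1) % MS_SLOTS;
}

/* Messages in the last k milliseconds (0 < k < MS_SLOTS); during the
   first second the buffer is still warming up and old slots read as zero. */
long long count_last_ms(int k) {
  int newest = (ptr - 1 + MS_SLOTS) % MS_SLOTS;
  int oldest = (ptr - 1 - k + 2 * MS_SLOTS) % MS_SLOTS;
  return cum[newest] - cum[oldest];
}
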

野生奥特曼 2024-08-28 20:24:50

Your rolling display window can only update so fast; let's say you want to update it 10 times a second, so for 1 second's worth of data you would need 10 values. Each value would contain the number of messages that showed up in that 1/10 of a second. Let's call these values bins; each bin holds 1/10 of a second's worth of data. Every 100 milliseconds, the oldest bin gets discarded and a new bin is set to the number of messages that showed up in those 100 milliseconds.

You would need an array of 36,000 bins to hold an hour's worth of information about your message rate if you wanted to preserve a precision of 1/10 of a second for the whole hour. But that seems like overkill.

I think it would be more reasonable to have the precision drop off as the time interval gets larger.

Maybe you keep 1 second's worth of data accurate to 100 milliseconds, 1 minute's worth accurate to the second, 1 hour's worth accurate to the minute, and so on.
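
A sketch of the 10-bin arrangement for the one-second window, with the running total kept incrementally; the 100 ms timer hookup is assumed:

#define NBINS 10           /* ten 100 ms bins cover the last second */

int bins[NBINS];           /* messages per 100 ms slice (zero-initialized) */
int head = 0;              /* bin currently being filled */
int last_second = 0;       /* running sum of all bins */

void msg_received(void) { bins[head]++; last_second++; }

void every_100ms(void) {   /* driven by a 100 ms timer */
  head = (head + 1) % NBINS;
  last_second -= bins[head];  /* the slot being reused is the oldest bin */
  bins[head] = 0;
}

The coarser tiers (seconds for the minute, minutes for the hour) would be the same structure with a longer tick.
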

沧桑㈠ 2024-08-28 20:24:50

I thought of maintaining a queue of all the messages, searching for the youngest message that's older than one second, and inferring the count from the index. However, this seems like it would be too slow, and would eat up a lot of memory.

A better idea would be maintaining a linked list of the messages, adding new messages to the head (with a timestamp), and popping them from the tail as they expire. Or even not popping them - just keep a pointer to the oldest message that arrived within the desired timeframe, and advance it towards the head when that message expires (this allows you to keep track of multiple timeframes with one list).

You could compute the count when needed by walking from the tail to the head, or just store the count separately, incrementing it whenever you add a value to the head and decrementing it whenever you advance the tail pointer.
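
A sketch of the one-list, many-windows variant; the Window struct and the lazy expiry on each arrival are illustrative assumptions (a periodic sweep could expire just as well):

#include <stdlib.h>

typedef struct Node {
  double ts;            /* arrival timestamp, seconds */
  struct Node *next;    /* next newer message (towards the head) */
} Node;

typedef struct {
  double span;          /* window length in seconds, e.g. 1.0, 60.0, 1800.0 */
  Node  *oldest;        /* oldest message still inside this window */
  long   count;         /* messages currently inside this window */
} Window;

Node *head = NULL;      /* newest message */

void add_message(double now, Window *w, int nwin) {
  Node *n = malloc(sizeof *n);
  n->ts = now;
  n->next = NULL;
  if (head) head->next = n;
  head = n;
  for (int i = 0; i < nwin; i++) {
    if (!w[i].oldest) w[i].oldest = n;
    w[i].count++;
    /* lazily expire: advance past messages older than this window */
    while (w[i].oldest && now - w[i].oldest->ts > w[i].span) {
      w[i].oldest = w[i].oldest->next;
      w[i].count--;
    }
  }
}
/* Nodes behind the widest window's oldest pointer can be freed from the tail. */
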
