Is it possible to get accurate per-minute metrics with Prometheus?

Posted on 2025-02-13 20:37:09

Goal

Track RPM (requests per minute) and uptime via Grafana and Prometheus

Situation

We are using

django-prometheus -> emits the metrics
fluent-bit -> scrapes the Django metrics every 15s and pushes them to Prometheus
prometheus -> 2 shards running via the Prometheus Operator on k8s

Problem

When we compare our Grafana dashboard with the AWS target group request metrics, the numbers don't match.
We have tried all of the options below:

Expr: sum by(service) (irate(django_http_requests_before_middlewares_total{namespace="name"}[5m]))
Expr: sum by(service) (increase(django_http_requests_before_middlewares_total{namespace="name"}[5m]))
Expr: sum by(service) (rate(django_http_requests_before_middlewares_total{namespace="name"}[5m]))
django_http_requests_before_middlewares_total -> This is a Counter.
This counter never resets because each series has unique dimensions:
- container_id
- service_name
- namespace

Q. Is it possible to create a dashboard in Grafana that resembles the AWS target group metrics?

Ideally increase() should work, but it takes the difference continuously, and that might be giving incorrect results.

Thanks in advance.

Comments (2)

鹤舞 2025-02-20 20:37:09

In theory the following query should return the exact number of per-service requests for the last minute:

sum(
  increase(django_http_requests_before_middlewares_total[1m])
) by (service)

But in practice Prometheus may return unexpected results for this query:

  • It can return fractional results over the integer counter because of extrapolation. See this issue for details.
  • It can return lower than expected results, since Prometheus ignores the counter increase between the last raw sample just before the lookbehind window specified in square brackets (e.g. [1m] in the query above) and the first raw sample in the lookbehind window.
  • It can return an empty result if the lookbehind window specified in square brackets contains fewer than two raw samples. For example, if the interval between raw samples is one minute or longer, then increase(m[d]) may return empty results for d <= 1m.
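The first and third points above can be illustrated with a small Python sketch. It is a simplified model of extrapolation written for this discussion, not Prometheus source code: the function name `approx_increase`, the sample values, and the 1.1 threshold factor are assumptions, and counter resets and the "extrapolate to zero" limit are deliberately ignored.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def approx_increase(samples: Sequence[Sample], start: float, end: float) -> Optional[float]:
    """Simplified model of an extrapolated increase over the window [start, end].

    Assumes samples are sorted by time and the counter does not reset.
    """
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None  # fewer than two raw samples in the window: empty result

    (t_first, v_first), (t_last, v_last) = window[0], window[-1]
    raw_delta = v_last - v_first      # the only increase actually observed
    sampled = t_last - t_first        # time covered by the observed samples
    avg_gap = sampled / (len(window) - 1)

    to_start = t_first - start        # uncovered time at the window start
    to_end = end - t_last             # uncovered time at the window end
    threshold = avg_gap * 1.1         # assumed cutoff for "too far from the edge"
    if to_start >= threshold:
        to_start = avg_gap / 2
    if to_end >= threshold:
        to_end = avg_gap / 2

    # Scale the observed delta up to the extrapolated duration.
    return raw_delta * (sampled + to_start + to_end) / sampled


# Four scrapes 15s apart inside a 60s window: the observed delta is 10,
# but it is scaled by 60/45, so the result is not an integer.
samples = [(5.0, 100.0), (20.0, 103.0), (35.0, 107.0), (50.0, 110.0)]
print(approx_increase(samples, 0.0, 60.0))

# A single sample in the window gives no result at all.
print(approx_increase([(5.0, 100.0)], 0.0, 60.0))
```

In this model the samples only cover 45 of the 60 seconds, so the integer delta of 10 is stretched to the full window, which is where fractional values come from.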

Prometheus developers are aware of these issues and are going to fix them - see this design doc.

In the meantime you can try the increase() function in VictoriaMetrics, a Prometheus-like monitoring solution I work on. Its increase() function is free from the issues mentioned above.

An important note: both Prometheus and VictoriaMetrics calculate query results independently for each point displayed on the graph. So, if you need to display the per-minute number of requests using the query above, you need to set the interval between points on the graph (aka step) to one minute.
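As a rough illustration of the step setting, the sketch below builds a range query URL for the Prometheus HTTP API (`/api/v1/query_range`) with a 60-second step, so that each returned point covers one minute. The host name `prometheus.example`, the helper name `build_range_query_url`, and the timestamps are hypothetical placeholders; in Grafana the equivalent is a per-query minimum step or interval setting rather than a hand-built URL.

```python
from urllib.parse import urlencode

# The query from the answer above, with a 1m lookbehind window.
QUERY = "sum(increase(django_http_requests_before_middlewares_total[1m])) by (service)"


def build_range_query_url(base_url: str, query: str, start: int, end: int,
                          step_seconds: int = 60) -> str:
    """Build a query_range URL whose step matches the 1m lookbehind window."""
    params = {
        "query": query,
        "start": start,        # Unix timestamp, seconds
        "end": end,            # Unix timestamp, seconds
        "step": step_seconds,  # one point per minute
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


# Hypothetical server address and a one-hour time range.
url = build_range_query_url("http://prometheus.example:9090", QUERY,
                            start=1739478000, end=1739481600)
print(url)
```

With a step shorter than the window, adjacent points overlap and count the same requests more than once; with a longer step, some requests fall between points and are never shown.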

横笛休吹塞上声 2025-02-20 20:37:09

tl;dr - no, Prometheus does not keep enough data to give perfectly precise values.

To see why, let's assume that 1 minute ago Prometheus scraped a value of 10 for the metric http_requests, and just now it has been updated to 40.

It's already clear that with 1m sampling you don't know exactly when during the last minute these 30 requests happened. Was it a short spike, or were they distributed evenly? Regardless of that, rate(http_requests[1m]) will give you (40-10)/60s = 0.5 requests per second. increase() works in the same fashion: it's rate()*interval, or 0.5*60 = 30.
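The arithmetic above can be written out as a short Python sketch. The function names are illustrative only, and the calculation assumes exactly two samples that sit on the window boundaries, which is the idealized case described in this answer.

```python
def rate_per_second(old_value: float, new_value: float, interval_seconds: float) -> float:
    """Average per-second growth of a counter between two samples."""
    return (new_value - old_value) / interval_seconds


def increase_over(old_value: float, new_value: float, interval_seconds: float) -> float:
    """Increase expressed as rate * interval, mirroring the explanation above."""
    return rate_per_second(old_value, new_value, interval_seconds) * interval_seconds


# Counter went from 10 to 40 over 60 seconds.
print(rate_per_second(10, 40, 60))  # 0.5 requests per second
print(increase_over(10, 40, 60))    # 30.0 requests in the minute
```

Both numbers are averages over the minute: a burst of 30 requests in one second and a steady trickle produce the same result.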

Although the example above shows precise values, it should be clear that you won't be able to achieve perfect precision with this math. The error is generally insignificant unless you are dealing with slow-moving counters (which update once every several minutes).
