How to aggregate all results of a Prometheus PromQL query into a single average value

Posted on 2025-01-12 06:58:26

I have a Kubernetes cluster running many apps. Each app is a single pod in its own namespace, and only that pod (the app pod) runs in the namespace. An example looks like this:

(Note that id<x> is a completely random string, so id1 is not what a real id looks like.)
namespace: app-id1, only-pod-running-in-this-namespace: app-id1   
namespace: app-id2, only-pod-running-in-this-namespace: app-id2   
namespace: app-id3, only-pod-running-in-this-namespace: app-id3  
namespace: app-id4, only-pod-running-in-this-namespace: app-id4

The list goes on and on.
I am trying to get the uptime for each app by checking the pod status. In Prometheus, I am doing it like this:

kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}

This returns a table with all existing apps (each record is one app's status) and their uptime value (1 if the pod is up, 0 if the pod is down).

Now I want to create another metric that returns the average for all apps combined. That is, I want only 1 record returned, with the average across all apps. So, let's say I have 100 apps: if 1 went down for 5 min, I want that 5 min window to show 99 (or 0.99, actually) as the result, instead of 99 apps showing 1 and 1 app showing 0.
I hope that makes sense.
This is what I am trying, but it's not working, as it returns a table with 1 record per app:

avg_over_time(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}[5m])

坠似风落 2025-01-19 06:58:26

My understanding is that you want the percentage of all app instances that are up, and that kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"} returns 1 when a pod is up and 0 when it is down? If so, maybe something like this would work:

sum(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}) / count(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"})
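
For what it's worth, dividing sum by count is the same as applying the avg aggregation operator. The query in the question returns one record per app because avg_over_time only averages each series over time; it never combines series. A rough sketch of two variants follows, assuming the selector from the question is unchanged. Each expression is a separate query, and the 5m window is simply the one the question mentions:

```promql
# Instant fraction of app pods that are ready right now (same as sum / count).
avg(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"})

# Fraction over a 5-minute window: average each pod's series over time first,
# then average across all pods, which collapses the result to a single series.
avg(avg_over_time(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}[5m]))
```

With 100 apps and one of them not ready for the entire window, the second query should come out at 0.99. One caveat: a pod that has been deleted outright usually stops exporting this series instead of reporting 0, so it would drop out of the average; I have not verified how that plays out in this particular setup.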