How to aggregate all results of a Prometheus PromQL query into a single average value
I have a Kubernetes cluster running many apps. One pod in one namespace is considered an app, and only one pod (the app pod) runs per namespace. An example looks like this:
(Note that id<x> stands for a completely random string, so id1 is not what a real ID looks like.)
namespace: app-id1, only-pod-running-in-this-namespace: app-id1
namespace: app-id2, only-pod-running-in-this-namespace: app-id2
namespace: app-id3, only-pod-running-in-this-namespace: app-id3
namespace: app-id4, only-pod-running-in-this-namespace: app-id4
The list goes on and on.
I am trying to get the uptime for each app by checking the pod status. In Prometheus, I am doing it like this:
kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}
This returns a table with all existing apps (each record is one app's status) and their uptime value (1 if the pod is up, 0 if the pod is down).
Now I want to create another metric that returns the average for all apps combined. That is, I want only one record returned, holding the average across all apps. So, let's say I have 100 apps: if one went down for 5 minutes, I want that 5-minute window to show 99 (or 0.99, actually) as the result, instead of 99 apps showing 1 and one app showing 0.
I hope that makes sense.
This is what I am trying, but it's not working, because it returns a table with one record per app:
avg_over_time(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}[5m])
Comments (1)
My understanding is that you want the percentage of your app instances that are up. Does
kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}
return 1 when the pod is up and 0 when it is down? If so, maybe
sum(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}) / count(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"})
would do?
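A sum divided by a count over the same selector is the mean, so the ratio above should be expressible with the avg() aggregation operator. It gives the fraction of ready pods at a single instant. To cover the 5-minute window the question asks about, one possible sketch (an assumption about the intent, not something confirmed in this thread) wraps the asker's avg_over_time in avg(), which collapses the per-app series into one:

```promql
# Instantaneous fraction of app pods that are ready (same value as sum(...) / count(...))
avg(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"})

# Per-app mean readiness over the last 5 minutes, then averaged across all apps
avg(avg_over_time(kube_pod_status_ready{condition="true", namespace=~"app-.*", pod=~"app-.*"}[5m]))
```

With 100 apps where one reports 0 for the whole window and the rest report 1, the second query should come out at 0.99. Note that an app whose pod series disappears entirely (for example, because the pod was deleted) is not counted as 0; it simply drops out of the average.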