Using Prometheus and Grafana to alert when an ArgoCD application has been unhealthy for X minutes
I am trying to set up a Grafana panel that shows how long an ArgoCD app has been unhealthy and alerts if it stays unhealthy for 15 minutes. My PromQL query so far is:
sum(count_over_time(argocd_app_info{health_status!="Healthy"}[20m])) by (name)
This gets me pretty close. The line graph increments every minute that the app is unhealthy, up to a maximum of 20 minutes. I can set a limit at 15 minutes to alert on.
The problem is that it decrements every minute the app is healthy. This means an app can be in a progressing state for 15 of the past 20 minutes and trigger the alert, even if it finished progressing and went back to healthy several times in that period.
Instead of decrementing every minute the app is healthy, I want the line to drop to zero as soon as the app becomes healthy. How do I change the PromQL query to do that?
Comments (1)
I figured it out. Seems like you need to multiply by a vector that has a value of 0 whenever the app is in sync. Here's the query:
The query is a bit long and confusing, but it works.
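For illustration, here is one way the idea of zeroing the count whenever the app is currently fine might be written. This is a sketch under assumptions, not necessarily the query referred to above: it keys on `health_status` (as in the question) rather than on sync state, it assumes `name` uniquely identifies an app, and it assumes a one-minute scrape interval so that the sample count reads as minutes.

```promql
# Sketch only: label names (name, health_status) are taken from the question's query.
(
  # Samples in the last 20m during which the app was not Healthy...
  sum by (name) (count_over_time(argocd_app_info{health_status!="Healthy"}[20m]))
  # ...kept only for apps that are not Healthy right now.
  and on (name)
  argocd_app_info{health_status!="Healthy"}
)
# Apps that are Healthy right now get an explicit 0 instead of no data.
or on (name)
(sum by (name) (argocd_app_info) * 0)
```

With this shape the line drops to zero as soon as the app reports Healthy. One caveat: if the app turns unhealthy again within the 20-minute window, the count resumes from the windowed total rather than restarting at zero, so it still may not measure a strictly consecutive unhealthy run.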