Creating alerts and best practices
A bit of background:
We have set up a new Grafana install with AWS CloudWatch and Prometheus data sources added. I have imported a few dashboards and customised them to suit our needs.
One of them is this one:
https://grafana.com/grafana/dashboards/7587
On the dashboard, the monitor I have set up is as follows:
probe_success{instance=~"$target", job="$App"}
Our environment consists of various production, staging and test servers (their hostnames indicate which environment they belong to).
For example:
srv01-staging
srv01-production
I’m trying to create an alert to monitor the HTTP response for ONLY the production servers.
My alert query is as follows:
probe_success{job="nameofjob"}
My issue is that this alerts on ALL failures, including those on our staging/test environments, which I do not want.
I don’t believe we can use variables in alerts, or if we can, I haven’t been able to get it working.
TL;DR:
What is the best way to segment alerts so that I am not notified of issues with our staging/test environments?
Many Thanks!
Answer:
You can either alert only on production metrics by explicitly setting a label matcher for production:
probe_success{job="nameofjob", namespace="production"}
or
probe_success{job="nameofjob", server=~"prod-.*"}
(server and namespace are arbitrary labels; namespace is common in Kubernetes environments.) Or exclude the environments you don't want, for example:
probe_success{job="nameofjob", namespace!~"staging|test"}
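The =~ and !~ matchers used above are regular-expression matchers, and Prometheus anchors them at both ends, so a pattern has to match the whole label value. The Python sketch below emulates that with re.fullmatch (an assumption: Python's re and Prometheus's RE2 engine behave the same for simple patterns like these) and uses the hostnames from the question as hypothetical label values. It shows that server=~"prod-.*" would not match a host named srv01-production, that !~"staging|test" only excludes values that are exactly "staging" or "test", and that a series which lacks the label entirely is never excluded by !~ (Prometheus compares a missing label as an empty value).

```python
import re
from typing import Callable, List, Optional

# Hostnames from the question. Treating them as the raw label value is an
# assumption; blackbox "instance" labels are often full URLs instead.
HOSTS: List[Optional[str]] = ["srv01-staging", "srv01-production"]


def regex_match(pattern: str, value: Optional[str]) -> bool:
    """Emulate PromQL =~ : the regex must match the WHOLE label value.

    A series that lacks the label (None here) is compared as if it were "".
    """
    return re.fullmatch(pattern, value or "") is not None


def regex_not_match(pattern: str, value: Optional[str]) -> bool:
    """Emulate PromQL !~ : keep the series unless the whole value matches."""
    return not regex_match(pattern, value)


def keep(
    matcher: Callable[[str, Optional[str]], bool],
    pattern: str,
    values: List[Optional[str]],
) -> List[Optional[str]]:
    """Return the label values that a matcher with this pattern would keep."""
    return [v for v in values if matcher(pattern, v)]


if __name__ == "__main__":
    # Anchored: "prod-" must be at the very start, so neither host matches.
    print(keep(regex_match, "prod-.*", HOSTS))  # → []
    # Leading/trailing .* let the environment name appear anywhere.
    print(keep(regex_match, ".*-production.*", HOSTS))  # → ['srv01-production']
    # "staging|test" must equal the WHOLE value, so both hosts survive.
    print(keep(regex_not_match, "staging|test", HOSTS))
    # → ['srv01-staging', 'srv01-production']
    # A series with no such label (None) is never excluded by !~.
    print(keep(regex_not_match, ".*-(staging|test).*", HOSTS + [None]))
    # → ['srv01-production', None]
```

In practice this means that with hostname-style values, a pattern such as ".*-production.*" (or ".*-(staging|test).*" for exclusion) is needed rather than a bare substring.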
Use whatever label distinguishes production from the other environments. If you don't have such a label, you should add one.
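One possible way to add such a label, assuming the environment is encoded in the hostname as in the question and that the hostname ends up in the instance label, is a Prometheus relabeling rule in the scrape config: action "replace", source_labels [instance], regex ".*-(production|staging|test).*", replacement "$1", target_label "env", placed after whatever rule sets instance (relabel rules are applied in order). The Python sketch below only emulates what that rule would do to a label set; the env label name and the regex are assumptions, not something from the original answer. With the label in place, the alert query could become probe_success{job="nameofjob", env="production"}.

```python
import re
from typing import Dict

# Hypothetical rule: capture the environment suffix of the hostname.
# Prometheus relabel regexes are anchored too, hence fullmatch below.
ENV_REGEX = re.compile(r".*-(production|staging|test).*")


def add_env_label(labels: Dict[str, str], source: str = "instance") -> Dict[str, str]:
    """Mimic a relabel rule with action "replace", regex ENV_REGEX,
    replacement "$1" and target_label "env".

    A missing source label is treated as "", and if the regex does not
    match, the label set is returned unchanged, which is also what
    Prometheus does for a non-matching "replace" rule.
    """
    match = ENV_REGEX.fullmatch(labels.get(source, ""))
    if match is None:
        return dict(labels)
    # Copy the labels and add env from the first capture group ("$1").
    return {**labels, "env": match.group(1)}


if __name__ == "__main__":
    series = [
        {"job": "nameofjob", "instance": "srv01-staging"},
        {"job": "nameofjob", "instance": "srv01-production"},
    ]
    for labels in series:
        # Only the second series ends up with env="production".
        print(add_env_label(labels))
```

Because hosts that do not follow the naming convention simply get no env label, an env="production" matcher would silently skip them; that is a trade-off of deriving the label from hostnames rather than setting it explicitly per target.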
Some resources that could help you: