GCP alerting policy based on percentage

Posted on 2025-02-03 06:43:37

I am trying to create some alerting policies in GCP for my application, which is hosted in a Kubernetes cluster.
A Cloud Load Balancer serves the traffic, and I can see HTTP status codes such as 2XX, 5XX, etc.

I need to create alerting policies based on the error percentage, i.e. ((NumberOfFailures / Total) * 100), rather than on an absolute value, so that an alert fires when, say, the error percentage goes above 50%.
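
As a minimal sketch of the percentage arithmetic described above, assuming request counts are available per status class under hypothetical keys such as "2xx" and "5xx", the formula and the 50% check could look like this in Python. The function names are illustrative only and are not part of any GCP API.

```python
from typing import Mapping, Optional


def error_percentage(counts_by_class: Mapping[str, int],
                     failure_class: str = "5xx") -> Optional[float]:
    """Return (failures / total) * 100, or None when there was no traffic."""
    total = sum(counts_by_class.values())
    if total == 0:
        # No requests in the window: the percentage is undefined, not 0%.
        return None
    failures = counts_by_class.get(failure_class, 0)
    return 100.0 * failures / total


def should_alert(percentage: Optional[float], threshold: float = 50.0) -> bool:
    """Fire only when a percentage exists and strictly exceeds the threshold."""
    return percentage is not None and percentage > threshold


# Hypothetical counts for one evaluation window.
counts = {"2xx": 30, "4xx": 10, "5xx": 60}
pct = error_percentage(counts)
print(pct, should_alert(pct))  # 60.0 True
```

Returning None for an empty window keeps "no traffic" distinct from "0% errors". An alerting policy has to make the same distinction somehow.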

I couldn't find anything about this in the Google documentation; it only tells you to use a counter, which is effectively an absolute value. I am looking for something like: if the failure rate goes above 50% in a rolling 15-minute window, trigger the alert.
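
To make "failure rate over a rolling 15-minute window" concrete, here is a rough Python sketch that keeps individual requests in a deque and evicts those older than the window before computing failures / total. The class name, the per-request input, and the assumption that timestamps arrive in non-decreasing order are all hypothetical. Cloud Monitoring works on aggregated time series, not on individual requests like this.

```python
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_SECONDS = 15 * 60  # the 15-minute rolling window from the question


class RollingErrorRate:
    """Failure rate over requests seen in the last window_seconds (hypothetical helper)."""

    def __init__(self, window_seconds: int = WINDOW_SECONDS) -> None:
        self.window_seconds = window_seconds
        # (timestamp, is_failure) pairs; assumes timestamps arrive in non-decreasing order.
        self.events: Deque[Tuple[float, bool]] = deque()
        self.failures = 0

    def record(self, timestamp: float, status_code: int) -> None:
        """Record one request; any 5xx status code counts as a failure."""
        is_failure = 500 <= status_code <= 599
        self.events.append((timestamp, is_failure))
        if is_failure:
            self.failures += 1

    def rate(self, now: float) -> Optional[float]:
        """Drop requests older than the window, then return failures / total (None if empty)."""
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, was_failure = self.events.popleft()
            if was_failure:
                self.failures -= 1
        if not self.events:
            return None
        return self.failures / len(self.events)


tracker = RollingErrorRate()
tracker.record(0, 200)
tracker.record(60, 503)
tracker.record(120, 500)
rate_early = tracker.rate(now=300)  # all three requests are inside the window: 2/3
rate_late = tracker.rate(now=900)   # the request at t=0 has aged out: 2/2
print(rate_early > 0.5, rate_late > 0.5)  # True True
```

The eviction test uses "<= cutoff", so the window covers (now - 15 min, now]. An empty window returns None rather than a misleading 0.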

Is it possible to do this natively in GCP?

青芜 2025-02-10 06:43:37

Yes, I believe this is possible with MQL (Monitoring Query Language). I recently built something similar to your use case.

fetch api
    | metric 'serviceruntime.googleapis.com/api/request_count'
    | filter (resource.service == 'my-service.com')
    | group_by 10m, [value_request_count_aggregate: aggregate(value.request_count)]
    | every 10m
    | {
        group_by [metric.response_code_class],
          [response_code_count_aggregate: aggregate(value_request_count_aggregate)]
        | filter (metric.response_code_class = '5xx')
      ;
        group_by [],
          [value_request_count_aggregate_aggregate: aggregate(value_request_count_aggregate)]
      }
    | join
    | value [response_code_ratio: val(0) / val(1)]
    | condition gt(val(), 0.1)

In this example, I use the request count for the service my-service.com. In the first branch, I aggregate the request count over 10-minute windows and keep only responses whose response code class is 5xx. In the second branch, I aggregate the request count over the same windows across all response codes. The join and value steps then compute the ratio of 5xx responses to all responses. Finally, the condition step produces a boolean that is true when the ratio is above 0.1 (10%), which can be used to trigger an alert.
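
The Python sketch below mirrors the query's steps on made-up sample data, to show the arithmetic the pipeline performs. It builds a 5xx-only branch and an all-codes branch, joins them on each 10-minute window, takes the ratio, and applies the 0.1 threshold. It approximates the "group_by 10m" / "every 10m" alignment with simple fixed buckets. It also assumes "join" keeps only windows present in both branches (an inner join). The input shape and names are hypothetical, and this is not how Cloud Monitoring evaluates MQL internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BUCKET_SECONDS = 10 * 60  # approximates "group_by 10m" + "every 10m"
THRESHOLD = 0.1           # same threshold as "condition gt(val(), 0.1)"

# (timestamp_seconds, response_code_class, request_count); hypothetical input shape
Sample = Tuple[int, str, int]


def evaluate(samples: Iterable[Sample]) -> Dict[int, Tuple[float, bool]]:
    """Return {bucket_start: (5xx ratio, alert?)} for each 10-minute bucket."""
    five_xx: Dict[int, int] = defaultdict(int)  # first branch: filtered to 5xx
    total: Dict[int, int] = defaultdict(int)    # second branch: group_by [] over all codes
    for ts, code_class, count in samples:
        bucket = ts - ts % BUCKET_SECONDS
        total[bucket] += count
        if code_class == "5xx":
            five_xx[bucket] += count

    result: Dict[int, Tuple[float, bool]] = {}
    # Inner join: only buckets present in both branches yield a ratio.
    for bucket in sorted(five_xx.keys() & total.keys()):
        if total[bucket] == 0:
            continue  # avoid dividing by zero when there was no traffic
        ratio = five_xx[bucket] / total[bucket]
        result[bucket] = (ratio, ratio > THRESHOLD)
    return result


samples = [
    (0, "2xx", 90), (30, "5xx", 10),      # bucket 0:    10 / 100 = 0.1, not above 0.1
    (600, "2xx", 80), (650, "5xx", 20),   # bucket 600:  20 / 100 = 0.2, alert
    (1200, "2xx", 50),                    # bucket 1200: no 5xx rows, dropped by the join
]
print(evaluate(samples))  # {0: (0.1, False), 600: (0.2, True)}
```

The sketch drops the bucket at t=1200 because it has no 5xx rows. If MQL's join behaves the same way, an error-free window would produce no data point rather than a 0% ratio, so MQL's outer_join with a default value may be worth checking.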

I hope this gives you a rough idea of how you can create your own alerting policy based on percentages.
