GCP alerting policy based on a percentage

Posted on 2025-02-03 06:43:37

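I'm trying to create some alerting policies in GCP for applications hosted in a Kubernetes cluster. We have a Cloud Load Balancer serving the traffic, and I can see the HTTP status codes (2xx, 5xx, etc.).

Rather than alerting on absolute values, I'd like to alert on a percentage such as ((numberOfFailures / total) * 100), and trigger an alert if my error percentage goes above 50%.

I couldn't find anything about this in the Google docs. They just tell you to use a counter, the same way you would for absolute values. I'm looking for something like: if the failure rate exceeds 50% over a 15-minute rolling window, trigger an alert.

Is this even possible to do natively in GCP?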


青芜 2025-02-10 06:43:37

Yes, I think this is possible with MQL (Monitoring Query Language). I recently built something similar to your use case.

fetch api
    | metric 'serviceruntime.googleapis.com/api/request_count'
    | filter (resource.service == 'my-service.com')
    | group_by 10m, [value_request_count_aggregate: aggregate(value.request_count)]
    | every 10m
    | { group_by [metric.response_code_class],
              [response_code_count_aggregate: aggregate(value_request_count_aggregate)]
          | filter (metric.response_code_class = '5xx')
      ; group_by [],
              [value_request_count_aggregate_aggregate:
                  aggregate(value_request_count_aggregate)] }
    | join
    | value [response_code_ratio: val(0) / val(1)]
    | condition gt(val(), 0.1)

In this example, I am using the request count for the service my-service.com, aggregated over 10-minute periods. The query then splits into two branches: the first groups the counts by response code class and keeps only the 5xx responses; the second sums the counts over all response codes for the same period. The join and value lines divide the 5xx count by the total, giving the fraction of requests that returned a 5xx. Finally, the condition line produces a boolean that is true when this fraction is above 0.1 (10%), which the alerting policy uses to trigger an alert.
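
The query above reads the serviceruntime API metric. For the setup in the question (a Cloud HTTP(S) load balancer, a 50% threshold and a 15-minute rolling window), a similar condition might look like the untested sketch below. It assumes the load balancer reports loadbalancing.googleapis.com/https/request_count on the https_lb_rule resource, that the metric's response_code_class label holds integers such as 500, and that you want one ratio per URL map (resource.url_map_name). Check these names against the metrics list for your load balancer type before relying on it.

```mql
fetch https_lb_rule::loadbalancing.googleapis.com/https/request_count
    | align delta(15m)
    | every 1m
    | { filter metric.response_code_class = 500 ; ident }
    | group_by [resource.url_map_name], sum(val())
    | ratio
    | condition gt(val(), 0.5)
```

Here align delta(15m) followed by every 1m counts requests over the trailing 15 minutes and re-evaluates that count every minute, which gives the rolling window. The braces split the stream into a 5xx-only branch and an all-requests branch (ident passes the table through unchanged). group_by sums each branch per URL map, and ratio joins the two branches and divides the first by the second. The threshold stays a fraction (0.5 = 50%), as in the answer's gt(val(), 0.1). Using group_by [] instead would produce a single ratio across all load balancers in the project.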

I hope this gives you a rough idea of how you can create your own alerting policy based on percentages.
