Pulumi and GCP uptime check deployments fail from time to time

Posted on 2025-01-27 06:38:10

We recently added GCP uptime checks to our Pulumi stack. We create the uptime check like this:

ucc, err := monitoring.NewUptimeCheckConfig(ctx, name, &monitoring.UptimeCheckConfigArgs{
        DisplayName: pulumi.String("uptime check example"),
        HttpCheck: &monitoring.UptimeCheckConfigHttpCheckArgs{
            Path:          pulumi.String(fmt.Sprintf("/%s/status", "github")),
            Port:          pulumi.Int(443),
            RequestMethod: pulumi.String("GET"),
            UseSsl:        pulumi.Bool(true),
            ValidateSsl:   pulumi.Bool(true),
        },
        MonitoredResource: &monitoring.UptimeCheckConfigMonitoredResourceArgs{
            Labels: pulumi.StringMap{
                "host": pulumi.String(targetUrl),
            },
            Type: pulumi.String("uptime_url"),
        },
        Period:  pulumi.String("60s"),
        Timeout: pulumi.String("10s"),
    })

Then I decided to add an alert policy for this uptime check.

Note: the alert policy filter references the uptime check created above (named uptimeCheck here).

args := monitoring.AlertPolicyArgs{
        DisplayName: pulumi.String(name),
        Combiner:    pulumi.String("AND"),
        Conditions: monitoring.AlertPolicyConditionArray{
            monitoring.AlertPolicyConditionArgs{
                DisplayName: pulumi.Sprintf("Health check alerts for github %s", service.ShortName),
                ConditionThreshold: monitoring.AlertPolicyConditionConditionThresholdArgs{
                    Filter:   pulumi.Sprintf("metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"%s\" AND resource.type=\"uptime_url\"", uptimeCheck.UptimeCheckId),
                    Duration: pulumi.String("60s"),
                    Trigger: monitoring.AlertPolicyConditionConditionThresholdTriggerArgs{
                        Count: pulumi.IntPtr(1),
                    },
                    ThresholdValue: pulumi.Float64Ptr(1),
                    Comparison:     pulumi.String("COMPARISON_LT"),
                    Aggregations: monitoring.AlertPolicyConditionConditionThresholdAggregationArray{
                        monitoring.AlertPolicyConditionConditionThresholdAggregationArgs{
                            AlignmentPeriod:  pulumi.String("60s"),
                            PerSeriesAligner: pulumi.String("ALIGN_COUNT_TRUE"),
                        },
                    },
                },
            },
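        },
        // "alerts" is the placeholder from the original post; this field takes a list of notification channel names.
        NotificationChannels: pulumi.StringArray{pulumi.String("alerts")},
    }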

This worked fine in the first deployment, but the subsequent ones started to fail.

error: deleting urn:pulumi:env::company::gcp:monitoring/uptimeCheckConfig:UptimeCheckConfig::uptime-check-github: 1 error occurred:
Error when reading or editing UptimeCheckConfig: googleapi: Error 400: Request contains an invalid argument.

Observed behavior

What I noticed is that the new uptime checks were created in our account, but GCP entered a weird state in which it could not delete the previous uptime check.
The only way I managed to fix the stack was to delete the old uptime checks manually.

Has anyone experienced this?


Comments (2)

诠释孤独 2025-02-03 06:38:11

You haven't specified a project_id in monitoredResource.labels.
An uptime check's monitored resource requires all the labels that its type expects.
You have used uptime_url, so see:
https://cloud.google.com/monitoring/api/resources#tag_uptime_url

瘫痪情歌 2025-02-03 06:38:10

I just experienced the same thing. However, I did not need to delete the checks; slightly modifying and saving them was enough. Afterwards it was fine.
