Pulumi and GCP uptime check deployment fails from time to time
We recently added GCP uptime checks to our Pulumi stack. We create the uptime check like this:
ucc, err := monitoring.NewUptimeCheckConfig(ctx, name, &monitoring.UptimeCheckConfigArgs{
	DisplayName: pulumi.String("uptime check example"),
	HttpCheck: &monitoring.UptimeCheckConfigHttpCheckArgs{
		Path:          pulumi.String(fmt.Sprintf("/%s/status", "github")),
		Port:          pulumi.Int(443),
		RequestMethod: pulumi.String("GET"),
		UseSsl:        pulumi.Bool(true),
		ValidateSsl:   pulumi.Bool(true),
	},
	MonitoredResource: &monitoring.UptimeCheckConfigMonitoredResourceArgs{
		Labels: pulumi.StringMap{
			"host": pulumi.String(targetUrl),
		},
		Type: pulumi.String("uptime_url"),
	},
	Period:  pulumi.String("60s"),
	Timeout: pulumi.String("10s"),
})
Then I decided to add an alert policy for this uptime check.
Note: here we pass in the uptime check created previously (as uptimeCheck).
args := monitoring.AlertPolicyArgs{
	DisplayName: pulumi.String(name),
	Combiner:    pulumi.String("AND"),
	Conditions: monitoring.AlertPolicyConditionArray{
		monitoring.AlertPolicyConditionArgs{
			// pulumi.String takes a single argument, so format with pulumi.Sprintf.
			DisplayName: pulumi.Sprintf("Health check alerts for github %s", service.ShortName),
			ConditionThreshold: monitoring.AlertPolicyConditionConditionThresholdArgs{
				// The filter ties the alert to the uptime check through its check ID.
				Filter:   pulumi.Sprintf("metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND metric.label.check_id=\"%s\" AND resource.type=\"uptime_url\"", uptimeCheck.UptimeCheckId),
				Duration: pulumi.String("60s"),
				Trigger: monitoring.AlertPolicyConditionConditionThresholdTriggerArgs{
					Count: pulumi.IntPtr(1),
				},
				ThresholdValue: pulumi.Float64Ptr(1),
				Comparison:     pulumi.String("COMPARISON_LT"),
				Aggregations: monitoring.AlertPolicyConditionConditionThresholdAggregationArray{
					monitoring.AlertPolicyConditionConditionThresholdAggregationArgs{
						AlignmentPeriod:  pulumi.String("60s"),
						PerSeriesAligner: pulumi.String("ALIGN_COUNT_TRUE"),
					},
				},
			},
		},
	},
	// "alerts" is the placeholder used in the post for the notification channel.
	NotificationChannels: pulumi.StringArray{pulumi.String("alerts")},
}
This worked fine in the first deployment, but the subsequent ones started to fail.
error: deleting urn:pulumi:env::company::gcp:monitoring/uptimeCheckConfig:UptimeCheckConfig::uptime-check-github: 1 error occurred:
Error when reading or editing UptimeCheckConfig: googleapi: Error 400: Request contains an invalid argument.
Observed behavior
What I noticed is that the new uptime checks were created in our account, but GCP entered a weird state in which it could not delete the previous uptime check.
The only way I managed to fix the stack was by deleting the old uptime checks manually.
Has anyone experienced this?
Comments (2)
You haven't specified a project_id label in monitoredResource.labels.
An uptime check's monitored resource requires all the labels that its type expects.
You have used uptime_url, so see:
https://cloud.google.com/monitoring/api/resources#tag_uptime_url
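To make the suggestion concrete, here is a minimal sketch of a helper that builds the full label set for an uptime_url resource before it is handed to Pulumi. The label names project_id and host come from the resource list linked above. The helper name uptimeURLLabels, the project ID "my-project" and the example URL are made up for illustration. Reducing the target to a bare hostname is an assumption on my side, based on the question passing a variable called targetUrl as the host label.

```go
package main

import (
	"fmt"
	"net/url"
)

// uptimeURLLabels returns the labels for an "uptime_url" monitored resource.
// It is a hypothetical helper, not part of the Pulumi or GCP SDKs.
func uptimeURLLabels(projectID, target string) (map[string]string, error) {
	if projectID == "" {
		return nil, fmt.Errorf("project_id label must not be empty")
	}
	u, err := url.Parse(target)
	if err != nil {
		return nil, fmt.Errorf("parsing target %q: %w", target, err)
	}
	host := u.Hostname()
	if host == "" {
		// A bare host such as "example.com" parses as a path, so use it as-is.
		host = target
	}
	return map[string]string{
		"project_id": projectID,
		"host":       host,
	}, nil
}

func main() {
	labels, err := uptimeURLLabels("my-project", "https://example.com/github/status")
	if err != nil {
		panic(err)
	}
	fmt.Println(labels["project_id"], labels["host"])
}
```

In the Pulumi program from the question, each value would then be wrapped with pulumi.String inside the pulumi.StringMap passed as Labels.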
I just experienced the same thing. However, I did not need to delete the checks: slightly modifying and saving them also worked. Afterwards everything was fine.