How do I integrate Argo CD with DataDog to query the state of deployment resources for automated blue/green (B/G) promotion?

Posted on 2025-01-10 17:22:33


I'm trying to integrate Argo with DataDog so that it queries metrics and uses the metric values to evaluate a deployment and promote it automatically in a blue/green (B/G) rollout.
In my case, the issue is that Argo fails to evaluate the DataDog query passed in via the AnalysisTemplate.

Kubernetes version: v1.20 (EKS), Argo CD version: v2.2.2, Argo Rollouts version: v1.1.1

The AnalysisTemplate I'm using:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: gateway-uat-pat
spec:
  args:
  - name: service-name
  metrics:
  - name: gateway-uat-pat
    interval: 5m
    successCondition: default(result, 0) <= 10
    failureLimit: 3
    provider:
      datadog:
        interval: 5m
        query: |
          sum:trace.http.request.errors{service:{{args.service-name}}}

The secret object I'm creating:

apiVersion: v1
kind: Secret
metadata:
  name: datadog
type: Opaque
stringData:
  address: https://api.datadoghq.com
  api-key: '***'
  app-key: '***'

Both the AnalysisTemplate and the Secret are created outside of Argo. I then tried deploying the original application using Argo Rollouts, with the following strategy included in my Rollout spec:

  strategy:
    blueGreen:
      activeService:  gateway
      previewService:  gateway-preview
      postPromotionAnalysis:
        templates:
        - templateName: gateway-uat-pat
        args:
        - name: service-name
          value: gateway-qa

The error I keep getting:

InvalidSpec: The Rollout "gateway-rollouts" is invalid: spec.strategy.blueGreen.postPromotionAnalysis.templates: Invalid value: "gateway-uat-pat": AnalysisTemplate gateway-uat-pat has metric gateway-uat-pat which runs indefinitely. Invalid value for count:

I dug into the Argo CD Analysis docs but couldn't find any information on how to successfully evaluate DataDog queries with Argo. Have I misconfigured the args in the AnalysisTemplate, or is something else wrong? Any pointers would be appreciated. Thanks.


陌伤浅笑 2025-01-17 17:22:33

I found the solution, @naveen. A "count" attribute should be added to the metric in the AnalysisTemplate. Without it, the analysis loops forever and times out.

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: loq-error-rate
spec:
  metrics:
  - name: error-rate
    interval: 30s
    count: 2
    successCondition: result < 1
    failureLimit: 3
    provider:
      datadog:
        interval: 5m
        query: |
          sum:system.cpu.user
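
Applied to the template from the question, the same fix might look like the sketch below. Everything except the added "count" line is copied from the question. The value 5 is an assumption chosen only for illustration, so pick whatever number of measurements suits your rollout.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: gateway-uat-pat
spec:
  args:
  - name: service-name
  metrics:
  - name: gateway-uat-pat
    interval: 5m
    # Added line: bounds the number of measurements so the metric
    # no longer runs indefinitely. The value 5 is illustrative only.
    count: 5
    successCondition: default(result, 0) <= 10
    failureLimit: 3
    provider:
      datadog:
        interval: 5m
        query: |
          sum:trace.http.request.errors{service:{{args.service-name}}}
```

With an interval of 5m and a count of 5, the analysis should finish after five measurements instead of running without an end. As far as I understand failureLimit, the analysis only fails once the number of failed measurements exceeds it, so a count that is not larger than failureLimit may never produce a failure.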
