Unable to fetch metrics from custom metrics API: the server is currently unable to handle the request

Posted 2025-01-13 00:47:23

I'm using an HPA based on a custom metric on GKE.

The HPA is not working and it's showing me this error log:

unable to fetch metrics from custom metrics API: the server is currently unable to handle the request

When I run kubectl get apiservices | grep custom I get

v1beta1.custom.metrics.k8s.io services/prometheus-adapter False (FailedDiscoveryCheck) 135d
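
`FailedDiscoveryCheck` means the aggregation layer cannot reach the prometheus-adapter Service behind the APIService. A few diagnostic commands can narrow this down (the `monitoring` namespace is an assumption; substitute the namespace your adapter actually runs in):

```shell
# Show the APIService status conditions and the exact failure message
kubectl describe apiservice v1beta1.custom.metrics.k8s.io

# The Service must have at least one ready endpoint; an empty list here
# means no healthy adapter pod is backing it
kubectl get endpoints prometheus-adapter -n monitoring

# Check the adapter's own logs for startup or Prometheus-connection errors
kubectl logs -n monitoring deploy/prometheus-adapter --tail=50
```

If the endpoints list is empty, the adapter pod is crashing or its Service selector doesn't match the pod labels; the adapter logs usually say which.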

This is the HPA spec config:

spec:
  scaleTargetRef:
    kind: Deployment
    name: api-name
    apiVersion: apps/v1
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Object
      object:
        target:
          kind: Service
          name: api-name
          apiVersion: v1
        metricName: messages_ready_per_consumer
        targetValue: '1'
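
The `metricName`/`targetValue` fields belong to the older `autoscaling/v2beta1` API, which has been removed from recent Kubernetes versions. As a hedged sketch, the same HPA in `autoscaling/v2` form would look roughly like this (field names follow the v2 API; verify against the API versions your cluster actually serves with `kubectl api-versions`):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-name
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Object
      object:
        describedObject:
          apiVersion: v1
          kind: Service
          name: api-name
        metric:
          name: messages_ready_per_consumer
        target:
          type: Value
          value: "1"
```

If the cluster only serves `v2beta1`, the original spec is fine; the API version mismatch alone would produce a different error than the one shown, so this is a cleanup rather than the root cause.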

And this is the service's spec config:

spec:
  ports:
    - name: worker-metrics
      protocol: TCP
      port: 8080
      targetPort: worker-metrics
  selector:
    app.kubernetes.io/instance: api
    app.kubernetes.io/name: api-name
  clusterIP: 10.8.7.9
  clusterIPs:
    - 10.8.7.9
  type: ClusterIP
  sessionAffinity: None
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
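
Since `targetPort` is the named port `worker-metrics`, the pod's container spec must declare a port with that exact name, or the Service will have no usable endpoints. A quick hedged check (the `/metrics` path is an assumption about the exporter):

```shell
# Endpoints should list pod IP:port pairs; "<none>" means the selector
# or the named targetPort doesn't match anything
kubectl get endpoints api-name

# Probe the metrics endpoint through the Service directly
kubectl port-forward svc/api-name 8080:8080 &
sleep 2
curl -s http://localhost:8080/metrics | head
kill %1
```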

What should I do to make it work?

3 Comments

断肠人 2025-01-20 00:47:23

First of all, confirm that the Metrics Server pod is running in your kube-system namespace. You can also use the following manifest:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    k8s-app: metrics-server
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      volumes:
      # mount in tmp so we can safely use from-scratch images and/or read-only containers
      - name: tmp-dir
        emptyDir: {}
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
        imagePullPolicy: Always
        volumeMounts:
        - name: tmp-dir
          mountPath: /tmp

If so, check the logs for any stackdriver adapter lines. This issue is commonly caused by a problem with the custom-metrics-stackdriver-adapter, which usually crashes in the metrics-server namespace. To solve that, use the resource from this URL, and for the deployment, use this image:

gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1
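
One hedged way to switch the running adapter to that image without reapplying the whole manifest (the `custom-metrics` namespace and the deployment name are assumptions based on Google's standard adapter manifest; check yours first):

```shell
# Find where the adapter actually runs
kubectl get deploy --all-namespaces | grep stackdriver-adapter

# Update every container in the deployment to the suggested image
kubectl -n custom-metrics set image deployment/custom-metrics-stackdriver-adapter \
  "*=gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.1"
```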

Another common root cause of this is an OOM issue. In this case, adding more memory solves the problem. To assign more memory, you can specify the new memory amount in the configuration file, as the following example shows:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]

In the above example, the container has a memory request of 100 MiB and a memory limit of 200 MiB. In the manifest, the "--vm-bytes", "150M" argument tells the container to attempt to allocate 150 MiB of memory. See the Kubernetes official documentation for more on memory settings.
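
Applied to this case, the pod to give more memory is the adapter itself. A hedged sketch using a strategic merge patch (namespace, deployment, and container names are assumptions taken from Google's standard adapter manifest; adjust to your cluster):

```shell
# Raise the adapter's memory limit in place; the deployment controller
# rolls out a new pod with the updated resources
kubectl -n custom-metrics patch deployment custom-metrics-stackdriver-adapter \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"pod-custom-metrics-stackdriver-adapter","resources":{"limits":{"memory":"400Mi"}}}]}}}}'
```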

For more reference, you can use the following threads: GKE - HPA using custom metrics - unable to fetch metrics; Stackdriver-metadata-agent-cluster-level gets OOMKilled; and Custom-metrics-stackdriver-adapter pod keeps crashing.

何其悲哀 2025-01-20 00:47:23

Adding this block to my EKS node security group rules solved the issue for me:

node_security_group_additional_rules = {
  ...
  ingress_cluster_metricserver = {
    description                   = "Cluster to node 4443 (Metrics Server)"
    protocol                      = "tcp"
    from_port                     = 4443
    to_port                       = 4443
    type                          = "ingress"
    source_cluster_security_group = true 
  }
  ...
}
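
After applying the rule, you can confirm the control plane can reach Metrics Server on port 4443 again:

```shell
# The APIService should flip back to Available=True once discovery succeeds
kubectl get apiservice v1beta1.metrics.k8s.io

# Works only once the metrics pipeline is healthy end to end
kubectl top nodes
```
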

乞讨 2025-01-20 00:47:23

What do you get for kubectl get pod -l "app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name"?
There should be a pod to which the service refers.
If there is a pod, check its logs with kubectl logs <pod-name>. You can add -f to the kubectl logs command to follow the logs.
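
The two steps can be combined into one command (the selector labels are taken from the question's Service spec; this sketch assumes at least one matching pod exists):

```shell
# Follow the logs of the first pod matched by the Service's selector
kubectl logs -f "$(kubectl get pod \
  -l app.kubernetes.io/instance=api,app.kubernetes.io/name=api-name \
  -o jsonpath='{.items[0].metadata.name}')"
```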
