Kubelet 证书已过期,但工作节点工作正常,当我们看到该问题时
在我的 v1.23.1
测试集群中,我看到工作节点证书不久前已过期。但工作节点仍在承担工作负载并处于“就绪”状态。
当我们看到证书过期的问题时,如何使用该证书?
# curl -v https://localhost:10250 -k 2>&1 |grep 'expire date'
* expire date: Oct 4 18:02:14 2021 GMT
# openssl x509 -text -noout -in /var/lib/kubelet/pki/kubelet.crt |grep -A2 'Validity'
Validity
Not Before: Oct 4 18:02:14 2020 GMT
Not After : Oct 4 18:02:14 2021 GMT
更新1: 集群在 CentOS Stream 8 操作系统上永久运行,使用 kubeadm 工具构建。我能够在所有工作节点上安排工作负载。创建了 nginx 部署并将其扩展为 50 个 Pod,我可以在所有工作节点上看到 nginx POD。
我也可以毫无问题地重新启动工作节点。
更新 2:
kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0303 11:17:18.261639 698383 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Jan 16, 2023 16:15 UTC 318d ca no
apiserver Jan 16, 2023 16:15 UTC 318d ca no
apiserver-kubelet-client Jan 16, 2023 16:15 UTC 318d ca no
controller-manager.conf Jan 16, 2023 16:15 UTC 318d ca no
front-proxy-client Jan 16, 2023 16:15 UTC 318d front-proxy-ca no
scheduler.conf Jan 16, 2023 16:15 UTC 318d ca no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Oct 02, 2030 18:44 UTC 8y no
front-proxy-ca Oct 02, 2030 18:44 UTC 8y no
谢谢
更新 3
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server10 Ready control-plane,master 519d v1.23.1
server11 Ready control-plane,master 519d v1.23.1
server12 Ready control-plane,master 519d v1.23.1
server13 Ready <none> 519d v1.23.1
server14 Ready <none> 519d v1.23.1
server15 Ready <none> 516d v1.23.1
server16 Ready <none> 516d v1.23.1
server17 Ready <none> 516d v1.23.1
server18 Ready <none> 516d v1.23.1
# kubectl get pods -o wide
nginx-dev-8677c757d4-4k9xp 1/1 Running 0 4d12h 10.203.53.19 server17 <none> <none>
nginx-dev-8677c757d4-6lbc6 1/1 Running 0 4d12h 10.203.89.120 server14 <none> <none>
nginx-dev-8677c757d4-ksckf 1/1 Running 0 4d12h 10.203.124.4 server16 <none> <none>
nginx-dev-8677c757d4-lrz9h 1/1 Running 0 4d12h 10.203.124.41 server16 <none> <none>
nginx-dev-8677c757d4-tllx9 1/1 Running 0 4d12h 10.203.151.70 server11 <none> <none>
# grep client /etc/kubernetes/kubelet.conf
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
# ls -ltr /var/lib/kubelet/pki
total 16
-rw------- 1 root root 1679 Oct 4 2020 kubelet.key
-rw-r--r-- 1 root root 2258 Oct 4 2020 kubelet.crt
-rw------- 1 root root 1114 Oct 4 2020 kubelet-client-2020-10-04-14-50-21.pem
lrwxrwxrwx 1 root root 59 Jul 6 2021 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2021-07-06-01-44-10.pem
-rw------- 1 root root 1114 Jul 6 2021 kubelet-client-2021-07-06-01-44-10.pem
In my v1.23.1
test cluster I see worker node certificate expired some time ago. but worker node still taking the workload and in Ready
status.
How this certificate is getting used, when we will see the issue with expired certificate?
# curl -v https://localhost:10250 -k 2>&1 |grep 'expire date'
* expire date: Oct 4 18:02:14 2021 GMT
# openssl x509 -text -noout -in /var/lib/kubelet/pki/kubelet.crt |grep -A2 'Validity'
Validity
Not Before: Oct 4 18:02:14 2020 GMT
Not After : Oct 4 18:02:14 2021 GMT
Update 1:
Cluster is running on-perm with CentOS Stream 8
OS , build with kubeadm
tool. I was able to schedule the workload on all the worker nodes. created nginx deployment and scaled it 50 pods, I can see nginx PODs on all the worker nodes.
Also I can reboot the work nodes with-out any issue.
Update 2:
kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0303 11:17:18.261639 698383 utils.go:69] The recommended value for "resolvConf" in "KubeletConfiguration" is: /run/systemd/resolve/resolv.conf; the provided value is: /run/systemd/resolve/resolv.conf
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Jan 16, 2023 16:15 UTC 318d ca no
apiserver Jan 16, 2023 16:15 UTC 318d ca no
apiserver-kubelet-client Jan 16, 2023 16:15 UTC 318d ca no
controller-manager.conf Jan 16, 2023 16:15 UTC 318d ca no
front-proxy-client Jan 16, 2023 16:15 UTC 318d front-proxy-ca no
scheduler.conf Jan 16, 2023 16:15 UTC 318d ca no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Oct 02, 2030 18:44 UTC 8y no
front-proxy-ca Oct 02, 2030 18:44 UTC 8y no
Thanks
Update 3
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
server10 Ready control-plane,master 519d v1.23.1
server11 Ready control-plane,master 519d v1.23.1
server12 Ready control-plane,master 519d v1.23.1
server13 Ready <none> 519d v1.23.1
server14 Ready <none> 519d v1.23.1
server15 Ready <none> 516d v1.23.1
server16 Ready <none> 516d v1.23.1
server17 Ready <none> 516d v1.23.1
server18 Ready <none> 516d v1.23.1
# kubectl get pods -o wide
nginx-dev-8677c757d4-4k9xp 1/1 Running 0 4d12h 10.203.53.19 server17 <none> <none>
nginx-dev-8677c757d4-6lbc6 1/1 Running 0 4d12h 10.203.89.120 server14 <none> <none>
nginx-dev-8677c757d4-ksckf 1/1 Running 0 4d12h 10.203.124.4 server16 <none> <none>
nginx-dev-8677c757d4-lrz9h 1/1 Running 0 4d12h 10.203.124.41 server16 <none> <none>
nginx-dev-8677c757d4-tllx9 1/1 Running 0 4d12h 10.203.151.70 server11 <none> <none>
# grep client /etc/kubernetes/kubelet.conf
client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
# ls -ltr /var/lib/kubelet/pki
total 16
-rw------- 1 root root 1679 Oct 4 2020 kubelet.key
-rw-r--r-- 1 root root 2258 Oct 4 2020 kubelet.crt
-rw------- 1 root root 1114 Oct 4 2020 kubelet-client-2020-10-04-14-50-21.pem
lrwxrwxrwx 1 root root 59 Jul 6 2021 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2021-07-06-01-44-10.pem
-rw------- 1 root root 1114 Jul 6 2021 kubelet-client-2021-07-06-01-44-10.pem
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这些 kubelet 证书称为 kubelet-serving 证书。当 Kubelet 充当“服务器”而不是“客户端”时使用它们。
例如,kubelet 向指标服务器提供指标。因此,当指标服务器启用使用 secure-tls 时,如果这些证书过期,指标服务器将无法与 Kubelet 建立正确的连接来获取指标。如果您使用 K8s Dashboard,Dashboard 将无法在页面中显示 CPU 和内存消耗情况。这时您就会看到这些过期证书的问题。
参考: https:// /kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#client-and-serving-certificates
这些证书不会到期时自动轮换。它们也无法通过“kubeadm 证书更新”进行轮换。要续订这些证书,您需要在集群配置中添加“serverTLSBootstrap:true”。这样,当服务证书过期时,kubelet 将向 K8s 集群发送 CSR 请求,从集群中,您可以使用“kubectl 证书批准”来更新它们。
Those kubelet certificates is called kubelet-serving certificates. They are used when Kubelet acts as a "server" instead of a "client".
For example, kubelet provides metrics to metrics server. So when metrics-server was enabled to use secure-tls, and in case that those certificate are expired, the metrics-server would not have a proper connection to Kubelet to get the metrics. In case you are using K8s Dashboard, the Dashboard will not able to show CPU and memory consumption in the page. That's the time when you see the issue from those expired certificates.
Reference: https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#client-and-serving-certificates
Those certificate will not auto-rotate when expiring. They are not also able to be rotated with "kubeadm certificate renew". To renew those certificate, you will need to add "serverTLSBootstrap: true" in your cluster config. With this, when the serving certificate expired, kubelet will send a CSR request to K8s cluster, from the cluster, you can use "kubectl certificate approve" to renew them.