How do I back up, restore, and perform disaster recovery for a cluster with Rancher UI?

Posted on 2022-09-11 21:58:04

Rancher UI version: v2.2.5
Rancher UI can back up and restore a cluster through its interface, and that part works without problems.

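How can I restore one cluster's backup file into a brand-new cluster? That is effectively disaster recovery.
Can anyone explain how to achieve this? If Rancher UI cannot do it, are there other disaster recovery options,
for example RKE or kubectl?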

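The official RKE site currently documents backup and restore. However, my cluster was created through Rancher UI, so I have no cluster.yml file to supply and cannot use the rke command to restore as documented.
I tried creating a cluster.yml file that contains only the nodes configuration. Restoring with the rke command then fails with: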

CA Certificate or Key is empty

These are the steps I followed:

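Use Rancher UI to add a dev cluster, and add one node with the etcd, Control Plane, and Worker roles.
Use Rancher UI to deploy an nginx service.
Use Rancher UI to back up the current cluster, and retrieve the zip backup file from the /opt/rke/etcd-snapshots directory.
Delete the dev cluster and clean up its node.
Create the dev cluster again with Rancher UI.
Add the node back to the dev cluster.
Copy the backup zip file into /opt/rke/etcd-snapshots and unzip it, then copy the file from the extracted backup directory into /opt/rke/etcd-snapshots and rename it to backup.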
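The last step (unzip, copy, rename) is easy to get wrong by hand, so here is a minimal sketch of it in Python. The function name `stage_snapshot` is made up for illustration, and it assumes the archive contains exactly one regular file, which is the etcd snapshot; check your own zip before relying on that.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def stage_snapshot(zip_path, snapshot_dir, name="backup"):
    """Extract the snapshot archive and place its file in snapshot_dir as `name`."""
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Assumption: the archive holds exactly one regular file (the snapshot).
        files = [p for p in Path(tmp).rglob("*") if p.is_file()]
        if len(files) != 1:
            raise ValueError(f"expected one file in {zip_path}, found {len(files)}")
        target = snapshot_dir / name
        # Copy under the name that will later be passed to `--name`.
        shutil.copyfile(files[0], target)
    return target
```

For the setup in this post it would be called as `stage_snapshot("<the zip file>", "/opt/rke/etcd-snapshots")`, with the zip file name filled in from the backup directory.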

Create a cluster.yml file for the rke command, with the following content:

nodes:
- address: 192.168.1.12
  user: root
  role: [controlplane,worker,etcd]
  port: 722
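Since YAML is indentation-sensitive, one way to avoid hand-editing mistakes is to generate the file. The sketch below renders only the four fields used above; `render_cluster_yml` is a hypothetical helper, not part of RKE, and it does nothing about certificates or any other cluster settings.

```python
def render_cluster_yml(nodes):
    """Render a minimal cluster.yml that contains only a `nodes` list."""
    lines = ["nodes:"]
    for node in nodes:
        roles = ",".join(node["role"])
        lines.append(f"- address: {node['address']}")
        # Keys after the first must align with `address` (two spaces in).
        lines.append(f"  user: {node['user']}")
        lines.append(f"  role: [{roles}]")
        lines.append(f"  port: {node['port']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cluster_yml([
        {"address": "192.168.1.12", "user": "root",
         "role": ["controlplane", "worker", "etcd"], "port": 722},
    ]), end="")
```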

Run the restore command:
rke etcd snapshot-restore --name backup
It fails with:
CA Certificate or Key is empty
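To make the inputs of that command explicit, here is a rough Python wrapper around it. It passes the config file with `--config` and warns if no state file sits next to it. The helper names are made up, and the state file location is an assumption based on the `./cluster.rkestate` path that appears in the log. Nothing in this post establishes whether a missing state file is what causes the CA error; the check only reports what is on disk before the run.

```python
import subprocess
from pathlib import Path


def state_file_for(config_path):
    # Assumption: the state file sits next to the config with the extension
    # swapped for .rkestate (the log reports ./cluster.rkestate).
    return Path(config_path).with_suffix(".rkestate")


def build_restore_command(snapshot_name, config_path="cluster.yml"):
    """Build the argument list for the restore command used in this post."""
    return ["rke", "etcd", "snapshot-restore",
            "--name", snapshot_name, "--config", str(config_path)]


def restore(snapshot_name, config_path="cluster.yml"):
    state = state_file_for(config_path)
    if not state.exists():
        print(f"warning: {state} not found before the run")
    # check=True raises CalledProcessError if rke exits with a non-zero status.
    return subprocess.run(build_restore_command(snapshot_name, config_path),
                          check=True)
```

Calling `restore("backup")` from the directory that holds cluster.yml runs the same command as above.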

Execution log:

INFO[0000] Restoring etcd snapshot backup               
INFO[0000] Successfully Deployed state file at [./cluster.rkestate] 
INFO[0000] [dialer] Setup tunnel for host [172.31.177.174] 
INFO[0007] [etcd] starting backup server on host [172.31.177.174] 
INFO[0007] [etcd] Successfully started [etcd-Serve-backup] container on host [172.31.177.174] 
INFO[0013] [remove/etcd-Serve-backup] Successfully removed container on host [172.31.177.174] 
INFO[0013] [etcd] Checking if all snapshots are identical 
INFO[0014] [etcd] Successfully started [etcd-checksum-checker] container on host [172.31.177.174] 
INFO[0014] Waiting for [etcd-checksum-checker] container to exit on host [172.31.177.174] 
INFO[0014] Container [etcd-checksum-checker] is still running on host [172.31.177.174] 
INFO[0015] Waiting for [etcd-checksum-checker] container to exit on host [172.31.177.174] 
INFO[0015] [etcd] Checksum of etcd snapshot on host [172.31.177.174] is [f57212dc433cda1ba45f897cf322b144] 
INFO[0015] Cleaning old kubernetes cluster              
INFO[0015] [worker] Tearing down Worker Plane..         
INFO[0015] [remove/kubelet] Successfully removed container on host [172.31.177.174] 
INFO[0015] [remove/kube-proxy] Successfully removed container on host [172.31.177.174] 
INFO[0015] [remove/service-sidekick] Successfully removed container on host [172.31.177.174] 
INFO[0015] [worker] Successfully tore down Worker Plane.. 
INFO[0015] [controlplane] Tearing down the Controller Plane.. 
INFO[0016] [remove/kube-apiserver] Successfully removed container on host [172.31.177.174] 
INFO[0016] [remove/kube-controller-manager] Successfully removed container on host [172.31.177.174] 
INFO[0016] [remove/kube-scheduler] Successfully removed container on host [172.31.177.174] 
INFO[0016] [controlplane] Successfully tore down Controller Plane.. 
INFO[0016] [etcd] Tearing down etcd plane..             
INFO[0016] [remove/etcd] Successfully removed container on host [172.31.177.174] 
INFO[0016] [etcd] Successfully tore down etcd plane..   
INFO[0016] [hosts] Cleaning up host [172.31.177.174]    
INFO[0016] [hosts] Cleaning up host [172.31.177.174]    
INFO[0016] [hosts] Running cleaner container on host [172.31.177.174] 
INFO[0017] [kube-cleaner] Successfully started [kube-cleaner] container on host [172.31.177.174] 
INFO[0017] Waiting for [kube-cleaner] container to exit on host [172.31.177.174] 
INFO[0017] Container [kube-cleaner] is still running on host [172.31.177.174] 
INFO[0018] Waiting for [kube-cleaner] container to exit on host [172.31.177.174] 
INFO[0018] [hosts] Removing cleaner container on host [172.31.177.174] 
INFO[0018] [hosts] Removing dead container logs on host [172.31.177.174] 
INFO[0019] [cleanup] Successfully started [rke-log-cleaner] container on host [172.31.177.174] 
INFO[0019] [remove/rke-log-cleaner] Successfully removed container on host [172.31.177.174] 
INFO[0019] [hosts] Successfully cleaned up host [172.31.177.174] 
INFO[0019] [etcd] Restoring [backup] snapshot on etcd host [172.31.177.174] 
INFO[0020] [etcd] Successfully started [etcd-restore] container on host [172.31.177.174] 
INFO[0020] Waiting for [etcd-restore] container to exit on host [172.31.177.174] 
INFO[0020] Container [etcd-restore] is still running on host [172.31.177.174] 
INFO[0021] Waiting for [etcd-restore] container to exit on host [172.31.177.174] 
INFO[0021] Initiating Kubernetes cluster                
INFO[0021] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates 
INFO[0021] [certificates] Generating admin certificates and kubeconfig 
FATA[0021] CA Certificate or Key is empty
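For anyone comparing their own output against this log, a small sketch that pulls out the fatal message together with the step logged just before it. It assumes every line has the `LEVEL[seconds] message` shape seen above; `last_step_before_fatal` is a name made up for this example.

```python
import re

# Matches lines such as "INFO[0021] Initiating Kubernetes cluster".
LOG_LINE = re.compile(r"^(?P<level>[A-Z]{4})\[(?P<secs>\d+)\]\s+(?P<msg>.*\S)\s*$")


def last_step_before_fatal(log_text):
    """Return (fatal_message, previous_message), or None if no FATA line exists."""
    previous = None
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw)
        if not match:
            continue  # skip blank or unrecognised lines
        if match["level"] == "FATA":
            return match["msg"], previous
        previous = match["msg"]
    return None
```

Applied to the log above, it pairs the fatal `CA Certificate or Key is empty` message with the `[certificates] Generating admin certificates and kubeconfig` step, which shows the failure happens during certificate generation, after the etcd restore container has already run.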
