背景:
一台master磁盘爆了导致k8s服务故障,重启之后死活kubelet起不来,于是老哥就想把它给reset掉重新join,接着出现如下报错提示是说etcd集群健康检查未通过:
error execution phase check-etcd: error syncing endpoints with etc: dial tcp 172.31.182.152:2379: connect: connection refused
解决方法:
1.在kubeadm-config删除的状态不存在的etcd节点:
代码语言:javascript复制kubectl edit configmaps -n kube-system kubeadm-config
cn-hongkong.i-j6caps6av1mtyxyofmrw:
advertiseAddress: 172.31.182.152
bindPort: 6443
把上边的删掉:
2.因为我是用kubeadm搭建的集群,所有etcd在每个master节点都会以pod的形式存在一个,etcd是在每个控制平面都启动一个实例的,当删除k8s-001节点时,etcd集群未自动删除此节点上的etcd成员,因此需要手动删除。
注意这里首先要进入etcd的pod。
kubectl exec -it etcd-cn-hongkong.i-j6caps6av1mtyxyofmrx sh -n kube-system
代码语言:javascript复制export ETCDCTL_API=3
alias etcdctl='etcdctl --endpoints=https://172.31.182.153:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d4322ce19cc3f8da, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
#删除不存在的节点
/ # etcdctl member remove d4322ce19cc3f8da
Member d4322ce19cc3f8da removed from cluster ed812b9f85d5bcd7
/ # etcdctl member list
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
/ # etcdctl member list
cd4e1e075b1904b2, started, cn-hongkong.i-j6caps6av1mtyxyofmrw, https://172.31.182.152:2380, https://172.31.182.152:2379
ceb6b1f4369e9ecc, started, cn-hongkong.i-j6caps6av1mtyxyofmrx, https://172.31.182.154:2380, https://172.31.182.154:2379
d598f7eabefcc101, started, cn-hongkong.i-j6caps6av1mtyxyofmry, https://172.31.182.153:2380, https://172.31.182.153:2379
/ # exit