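1. Symptoms
A test-environment cluster deployed with kubeadm had been running for exactly one year when, today, the Kubernetes API stopped responding. Every kubectl command that fetches resources returned the following error: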
[root@master35 ~]# kubectl get nodes
Unable to connect to the server: x509: certificate has expired or is not yet valid
The error strongly suggests an expired certificate, and checking the certificate's validity dates confirms it:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
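The same check can be run over every certificate in the kubeadm PKI directory rather than only apiserver.crt. The following is a minimal sketch, assuming openssl is installed and the certificates live in /etc/kubernetes/pki; `report_cert_expiry` and the `PKI_DIR` variable are hypothetical names introduced for illustration, not part of kubeadm.

```shell
# Print status, expiry date and path for every *.crt file in a directory.
# report_cert_expiry and PKI_DIR are illustrative names (assumptions).
report_cert_expiry() {
    dir="$1"
    for crt in "$dir"/*.crt; do
        [ -e "$crt" ] || continue   # the glob matched nothing
        # -enddate prints "notAfter=<date>"; keep only the date part
        end=$(openssl x509 -in "$crt" -noout -enddate | cut -d= -f2)
        # -checkend 0 exits 0 only if the certificate is still valid right now
        if openssl x509 -in "$crt" -noout -checkend 0 >/dev/null; then
            state="valid"
        else
            state="EXPIRED"
        fi
        printf '%s\t%s\t%s\n' "$state" "$end" "$crt"
    done
}

report_cert_expiry "${PKI_DIR:-/etc/kubernetes/pki}"
```

Only top-level *.crt files are scanned; certificates in subdirectories would need a second call with that directory as the argument.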
2. Replacing the apiserver certificates
Log in to the master node:
cd /etc/kubernetes
# Back up the certificates and config files
mkdir ./pki_bak
mkdir ./conf_bak
mv pki/apiserver* ./pki_bak/
mv pki/front-proxy-client.* ./pki_bak/
mv ./admin.conf ./conf_bak/
mv ./kubelet.conf ./conf_bak/
mv ./controller-manager.conf ./conf_bak/
mv ./scheduler.conf ./conf_bak/
# Generate new certificates
kubeadm alpha phase certs apiserver --apiserver-advertise-address ${MASTER_API_SERVER_IP}
kubeadm alpha phase certs apiserver-kubelet-client
kubeadm alpha phase certs front-proxy-client
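# Google is blocked on this network, so the three commands above fail with an error
# (presumably while kubeadm tries to reach Google-hosted servers). Run them with a config file instead: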
kubeadm alpha phase certs apiserver --config /root/yaml/kubeadm-config.yaml
kubeadm alpha phase certs apiserver-kubelet-client --config /root/yaml/kubeadm-config.yaml
kubeadm alpha phase certs front-proxy-client --config /root/yaml/kubeadm-config.yaml
# Generate new kubeconfig files
kubeadm alpha phase kubeconfig all --config /root/yaml/kubeadm-config.yaml
# Replace the old admin kubeconfig with the newly generated one
mv $HOME/.kube/config $HOME/.kube/config.old
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
chmod 600 $HOME/.kube/config   # the file embeds the admin client key, so keep it private (was 777)
After completing the steps above, use docker restart to restart the three containers kube-apiserver, kube-controller-manager, and kube-scheduler.
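The restart can be scripted so that container IDs do not have to be looked up by hand. This is a sketch under one assumption: the kubelet names static-pod containers with the prefix `k8s_<container-name>_`, so a docker name filter can find them. `restart_control_plane` is a hypothetical helper name.

```shell
# Restart the three control-plane containers on this master.
# Assumption: container names start with "k8s_<component>_".
restart_control_plane() {
    for comp in kube-apiserver kube-controller-manager kube-scheduler; do
        ids=$(docker ps -q --filter "name=k8s_${comp}")
        if [ -z "$ids" ]; then
            echo "no running container found for ${comp}" >&2
            continue
        fi
        # $ids is left unquoted on purpose so several IDs become separate arguments
        docker restart $ids
    done
}
```

Calling `restart_control_plane` on each master restarts whatever matching containers are running there and prints a warning for any component it cannot find.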
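If there are several master nodes, first back up the certificates and config files on each of them as shown above, then scp the certificates and config files from this finished master to the others.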
3. Verification
kubectl still cannot list resources. Inspect the apiserver container log with docker logs:
customresource_discovery_controller.go:156] Shutting down DiscoveryController
available_controller.go:266] Shutting down AvailableConditionController
crdregistration_controller.go:115] Shutting down crd-autoregister controller
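The shutdown messages do not name a cause by themselves. One way to test whether etcd is the problem is to query it directly with the same certificates the cluster uses. The sketch below assumes etcdctl with the v3 API is installed, that etcd listens on https://127.0.0.1:2379, and that the CA, certificate, and key sit in /etc/etcd/ssl; `etcd_health`, `ETCD_ENDPOINT`, and `ETCD_SSL_DIR` are hypothetical names.

```shell
# Query etcd health over TLS. Endpoint and directory are assumptions;
# override them through the environment if your layout differs.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_SSL_DIR="${ETCD_SSL_DIR:-/etc/etcd/ssl}"

etcd_health() {
    ETCDCTL_API=3 etcdctl \
        --endpoints="$ETCD_ENDPOINT" \
        --cacert="$ETCD_SSL_DIR/ca.pem" \
        --cert="$ETCD_SSL_DIR/etcd.pem" \
        --key="$ETCD_SSL_DIR/etcd-key.pem" \
        endpoint health
}
```

If `etcd_health` fails with an x509 error, the etcd certificates are a likely culprit.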
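4. Handling the expired etcd certificates
The etcd certificate signing config sets the expiry to 8760h, that is, one year, so the etcd certificates have expired as well: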
cat config.json
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "8760h"
      }
    }
  }
}
First, back up the etcd data:
cd /var/lib
tar -zvcf etcd.tar.gz etcd/
Edit the CA config file and change the default certificate expiry to 10 years:
[root@master35 etcd]# cat ca-config.json
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}
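The expiry values are plain hour counts, so the conversion to years is simple integer arithmetic. A small sketch, treating a year as 365 days (leap days ignored); `hours_to_years` is a hypothetical helper name:

```shell
# Convert an expiry given in hours to whole years (365-day years).
hours_to_years() {
    echo $(( $1 / 24 / 365 ))
}

hours_to_years 8760    # prints 1
hours_to_years 87600   # prints 10
```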
Generate the new certificates:
# Delete the expired certificates
rm -f /etc/etcd/ssl/*
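# Generate new certificates. Make sure -config points at the signing config edited above
# (the listing above shows it as ca-config.json), otherwise the 87600h expiry does not apply.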
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
cp etcd.pem etcd-key.pem ca.pem /etc/etcd/ssl/
# Copy to the other etcd nodes
scp -r /etc/etcd/ssl root@${other_node}:/etc/etcd/
# Restart etcd (remember: all 3 nodes must be restarted together, otherwise the restart hangs)
systemctl restart etcd
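Restarting all members at roughly the same time is easier from a single shell than from three terminals. The following sketch assumes root SSH access to each etcd node, as the scp command above already does; `ETCD_NODES` is a hypothetical space-separated host list and `restart_etcd_cluster` a hypothetical helper name.

```shell
# Restart etcd on every member in parallel.
# ETCD_NODES is an assumed, space-separated list of etcd hostnames.
restart_etcd_cluster() {
    for node in $ETCD_NODES; do
        # -n keeps ssh from reading stdin; & runs each restart in the background
        ssh -n "root@${node}" systemctl restart etcd &
    done
    wait   # block until every background ssh has returned
}
```

For example, `ETCD_NODES="node1 node2 node3" restart_etcd_cluster` issues the three restarts concurrently and returns once all of them have finished; the host names here are placeholders.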
Once the etcd certificates have been replaced, restart the kube-apiserver, kube-controller-manager, and kube-scheduler containers again.
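A final check can confirm that the new etcd certificate chains to the CA, show its validity window, and repeat the kubectl command that failed at the start. This is a sketch only; `verify_renewal` is a hypothetical helper name, and the file paths passed to it are whatever your layout uses.

```shell
# Verify a renewed certificate against its CA, print its dates,
# then confirm the API answers again.
verify_renewal() {
    ca="$1"
    cert="$2"
    openssl verify -CAfile "$ca" "$cert" || return 1
    openssl x509 -in "$cert" -noout -dates || return 1
    kubectl get nodes
}
```

With the paths used in this article, the call would be `verify_renewal /etc/etcd/ssl/ca.pem /etc/etcd/ssl/etcd.pem`. The function stops at the first failing step, so a broken chain is reported before kubectl is tried.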