Resolving a K8s Cluster Failure (... did you specify the right host or port?)

2024-02-26 16:12:24

Preface


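  • While tidying up my cluster notes at home over the Spring Festival, I found that the cluster no longer worked.
  • This is a short record of the fix. The root cause was expired certificates, which triggered a chain of problems, but the error message was different from what I had seen before.
  • If my understanding falls short anywhere, corrections are welcome.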

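Don't dwell too much on the present, and don't worry too much about the future. Once you have been through certain things, the scenery in front of you is no longer what it used to be. (Haruki Murakami)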


What was the problem?

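I run a highly available k8s cluster on local virtual machines. It had not been used for a long time, and after booting it kubectl commands failed: the connection to the kube-apiserver port on the VIP address was refused.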

┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl  get nodes
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

How was it diagnosed?

A direct port test confirms that the port cannot be reached, so the failure is already at the transport layer.

┌──[root@vms100.liruilongs.github.io]-[~]
└─$</dev/tcp/192.168.26.99/30033
-bash: connect: Connection refused
-bash: /dev/tcp/192.168.26.99/30033: Connection refused
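The bare /dev/tcp redirection above can block for a long time if packets are silently dropped instead of refused. The following is a minimal sketch of a reusable probe with a timeout. It assumes bash (for the /dev/tcp pseudo-device) and the coreutils timeout command are available; the function name port_open is made up for illustration.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [SECONDS]
# Hypothetical helper: returns 0 if a TCP connection to HOST:PORT can be
# opened within SECONDS (default 2), non-zero otherwise.
port_open() {
  local host="$1" port="$2" wait="${3:-2}"
  # A child bash opens the socket on fd 3; `timeout` kills it if the
  # connect attempt hangs. $0 and $1 inside the child are HOST and PORT.
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For the address in this post, the call would look like: port_open 192.168.26.99 30033 && echo reachable || echo unreachable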

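Use ip a to check whether the configured VIP is active on this node. It is not bound here, which suggests that keepalived on this node is not working properly, or that another node currently holds the VIP.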

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:0e:5d:5f brd ff:ff:ff:ff:ff:ff
    inet 192.168.26.100/24 brd 192.168.26.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe0e:5d5f/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:68:f8:90:26 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
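Scanning the full ip a output by eye gets tedious across several nodes. As a rough sketch, the check can be reduced to a yes/no answer by filtering the one-line output of ip -o -4 addr show. The helper name holds_vip is hypothetical, and the pattern assumes the usual "inet ADDR/PREFIX" layout that iproute2 prints.

```shell
#!/usr/bin/env bash
# holds_vip VIP
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and
# returns 0 if VIP is bound to any interface.
holds_vip() {
  local vip="$1"
  # -F matches the dots literally; the trailing "/" stops 192.168.26.9
  # from matching 192.168.26.99.
  grep -qF " inet ${vip}/"
}

# Usage on a node:
#   ip -o -4 addr show | holds_vip 192.168.26.99 && echo "VIP is bound here"
```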

Test the network layer next. Pinging the VIP succeeds, which looks odd at first, but it means some other node is holding the VIP.

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ping 192.168.26.99
PING 192.168.26.99 (192.168.26.99) 56(84) bytes of data.
64 bytes from 192.168.26.99: icmp_seq=1 ttl=64 time=0.784 ms
64 bytes from 192.168.26.99: icmp_seq=2 ttl=64 time=0.411 ms
^C
--- 192.168.26.99 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1012ms
rtt min/avg/max/mdev = 0.411/0.597/0.784/0.188 ms

SSH to the VIP to take a look

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ssh root@192.168.26.99
The authenticity of host '192.168.26.99 (192.168.26.99)' can't be established.
ECDSA key fingerprint is SHA256:BmaDR4pX6G1WgStkR7Lcl7Yg4fhP2d8idUBxW3HEzsA.
ECDSA key fingerprint is MD5:2e:49:16:97:30:90:e3:28:b2:43:2d:64:9d:f2:d4:6d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.26.99' (ECDSA) to the list of known hosts.
Last login: Wed Nov 15 11:12:11 2023 from 192.168.26.100

This lands on another k8s master node (vms102). Checking its addresses shows that the VIP is bound here as expected.

┌──[root@vms102.liruilongs.github.io]-[~]
└─$ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:eb:fa:00 brd ff:ff:ff:ff:ff:ff
    inet 192.168.26.102/24 brd 192.168.26.255 scope global ens32
       valid_lft forever preferred_lft forever
    inet 192.168.26.99/32 scope global ens32
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feeb:fa00/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:ed:cf:c0:d1 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

Test the port from this node: the connection succeeds

┌──[root@vms102.liruilongs.github.io]-[~]
└─$</dev/tcp/192.168.26.99/30033

Run a kubectl client command to confirm

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes
Unable to connect to the server: EOF

The connection still fails, so print the details of the API call

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes -vv
error: invalid argument "v" for "-v, --v" flag: strconv.ParseInt: parsing "v": invalid syntax
See 'kubectl get --help' for usage.

The -vv form is rejected: the -v flag takes a numeric verbosity level, so it has to be passed as -v=N.

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes -v=1
I0209 13:52:56.780335   72398 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubectl  get nodes -v=2
I0209 13:53:16.963102   72533 shortcut.go:100] Error loading discovery information: Get "https://192.168.26.99:30033/api?timeout=32s": dial tcp 192.168.26.99:30033: connect: connection refused
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?

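This is the same error as before, so the problem affects every node rather than one particular node. Use the container tooling to check whether the high-availability component is running.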

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep keep
f2a9b9f187a6   0cde578847cc                                        "/container/tool/run"    12 hours ago     Up 12 hours               k8s_keepalived_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55
822eec55d6af   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago     Up 12 hours               k8s_POD_keepalived-vms102.liruilongs.github.io_kube-system_f0ae51f10833bbd4d70ccb8690f2429c_55

keepalived is up, so check the apiserver next. Every kubectl command ultimately calls kube-apiserver.

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps |grep api
56807ccad104   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago         Up 12 hours                   k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41

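Sure enough, it is down: only the pause container is running. List the exited containers as well, then read the last logs of the kube-apiserver container.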

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps -a | grep api
c9bd413b176f   b09a3dc327be                                        "kube-apiserver --ad…"   2 minutes ago        Exited (1) 2 minutes ago             k8s_kube-apiserver_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_225
56807ccad104   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago         Up 12 hours                          k8s_POD_kube-apiserver-vms102.liruilongs.github.io_kube-system_88f80934116e8f989883c8eba6636201_41

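The log shows that the process fails with "context deadline exceeded" right after loading the admission controllers, with no other hint.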

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs --tail -5 c9bd413b176f
I0209 05:51:43.041043       1 server.go:563] external host was not specified, using 192.168.26.102
I0209 05:51:43.042642       1 server.go:161] Version: v1.25.1
I0209 05:51:43.042693       1 server.go:163] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0209 05:51:43.362808       1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
I0209 05:51:43.363544       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.363560       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0209 05:51:43.364480       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0209 05:51:43.364499       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
E0209 05:52:03.366417       1 run.go:74] "command failed" err="context deadline exceeded"

kube-apiserver constantly talks to etcd to read and update cluster state, so etcd is the next thing to check.

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker ps | grep etcd
43dccee957e0   a8a176a5d5d6                                        "etcd --advertise-cl…"   About a minute ago   Up About a minute             k8s_etcd_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_156
523a83b11288   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 12 hours ago         Up 12 hours                   k8s_POD_etcd-vms102.liruilongs.github.io_kube-system_bb9615ff1be73c1b0c1f420f3da9806a_41

The etcd log is full of certificate-related warnings, so the most likely cause is an expired certificate.

┌──[root@vms102.liruilongs.github.io]-[~]
└─$docker logs 43dccee957e0 | tail -5
...................
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51158","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.452Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51148","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51166","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.553Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.26.101:51164","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:173","msg":"serving /health false; no leader"}
{"level":"warn","ts":"2024-02-09T05:59:17.588Z","caller":"etcdhttp/metrics.go:86","msg":"/health error","output":"{\"health\":\"false\",\"reason\":\"RAFT NO LEADER\"}","status-code":503}
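When the log is this noisy, it helps to summarize which peers are being rejected. Below is a small sketch that counts "tls: bad certificate" rejections per remote address from etcd's JSON log lines. The function name tls_rejects is made up, and the pattern assumes the "remote-addr":"IP:PORT" field layout seen in the lines above. The usage line redirects stderr on the assumption that etcd writes its log there, which would also explain why the piped tail above did not trim the output; the redirect is harmless if that assumption is wrong.

```shell
#!/usr/bin/env bash
# tls_rejects
# Hypothetical helper: reads etcd JSON log lines on stdin and prints
# "COUNT IP" for every peer rejected with a TLS certificate error.
tls_rejects() {
  grep -F 'tls: bad certificate' |
    sed -n 's/.*"remote-addr":"\([^":]*\):[0-9]*".*/\1/p' |
    sort | uniq -c | awk '{print $1, $2}'
}

# Usage with the container ID from this post:
#   docker logs --tail 200 43dccee957e0 2>&1 | tls_rejects
```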

Inspecting the certificate confirms it: it expired on January 26, and today is February 9.

┌──[root@vms102.liruilongs.github.io]-[~]
└─$openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep Not
            Not Before: Jan 26 11:27:49 2023 GMT
            Not After : Jan 26 11:30:26 2024 GMT
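The Not After date can also be turned into a number of remaining days, which is easier to alert on. This is a minimal sketch assuming GNU date (for the -d option) and the openssl CLI; the function names days_left and cert_days_left are hypothetical.

```shell
#!/usr/bin/env bash
# days_left "NOT_AFTER" [NOW_EPOCH]
# Hypothetical helper: prints whole days between now and the expiry date.
# A negative number means the certificate has already expired.
days_left() {
  local not_after="$1" now="${2:-$(date -u +%s)}" expiry
  expiry=$(date -u -d "$not_after" +%s) || return 2
  echo $(( (expiry - now) / 86400 ))
}

# cert_days_left FILE
# Reads the notAfter field of a PEM certificate with openssl and passes it
# to days_left.
cert_days_left() {
  local end
  end=$(openssl x509 -enddate -noout -in "$1") || return 2
  days_left "${end#notAfter=}"
}

# Usage: cert_days_left /etc/kubernetes/pki/apiserver.crt
```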

Double-check with the kubeadm tool

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver                  Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
apiserver-etcd-client      Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
controller-manager.conf    Jan 26, 2024 11:30 UTC   <invalid>       ca                      no
etcd-healthcheck-client    Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
etcd-server                Jan 26, 2024 11:30 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Jan 26, 2024 11:30 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Jan 26, 2024 11:30 UTC   <invalid>       ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
┌──[root@vms102.liruilongs.github.io]-[~]
└─$

How was it fixed?

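With the cause identified, the fix is straightforward: renew the certificates. Note that this is a highly available cluster with 3 master nodes, so the certificates must be renewed on every master node.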

Renew on one node first

Make a backup

┌──[root@vms102.liruilongs.github.io]-[~]
└─$cp -r /etc/kubernetes /etc/kubernetes.20240209.bak

The kubeadm certs renew all command renews all certificates in one pass

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.

Check that the renewal succeeded

┌──[root@vms102.liruilongs.github.io]-[~]
└─$kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver                  Feb 08, 2025 06:18 UTC   364d            ca                      no
apiserver-etcd-client      Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Feb 08, 2025 06:18 UTC   364d            ca                      no
controller-manager.conf    Feb 08, 2025 06:18 UTC   364d            ca                      no
etcd-healthcheck-client    Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-peer                  Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
etcd-server                Feb 08, 2025 06:18 UTC   364d            etcd-ca                 no
front-proxy-client         Feb 08, 2025 06:18 UTC   364d            front-proxy-ca          no
scheduler.conf             Feb 08, 2025 06:18 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 23, 2033 11:27 UTC   8y              no
etcd-ca                 Jan 23, 2033 11:27 UTC   8y              no
front-proxy-ca          Jan 23, 2033 11:27 UTC   8y              no
┌──[root@vms102.liruilongs.github.io]-[~]
└─$

Once that looks good, handle the remaining nodes in batch with ansible

The inventory file is shown below

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$cat host.yaml
ansible:
  children:
    ansible_master:
      hosts:
        192.168.26.100:
    ansible_node:
      hosts:
        192.168.26.[101:103]:
        192.168.26.[105:106]:
k8s:
  children:
    k8s_master:
      hosts:
        192.168.26.[100:102]:
    k8s_node:
      hosts:
        192.168.26.103:
        192.168.26.[105:106]:

Renew on all master nodes in batch, excluding the node that was already renewed

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit !192.168.26.102
ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit kubectl get all -A  -o wide | grep tidb-cluster  | awk '{print $2}' | awk -F'/' '{ print "kubectl delete "$1" "$2 " -n tidb-cluster --force" }' | xargs  -n1 -I{} bash -c "{}".168.26.102
usage: ansible [-h] [--version] [-v] [-b] [--become-method BECOME_METHOD]
               [--become-user BECOME_USER] [-K] [-i INVENTORY] [--list-hosts]
               [-l SUBSET] [-P POLL_INTERVAL] [-B SECONDS] [-o] [-t TREE] [-k]
               [--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
               [-c CONNECTION] [-T TIMEOUT]
               [--ssh-common-args SSH_COMMON_ARGS]
               [--sftp-extra-args SFTP_EXTRA_ARGS]
               [--scp-extra-args SCP_EXTRA_ARGS]
               [--ssh-extra-args SSH_EXTRA_ARGS] [-C] [--syntax-check] [-D]
               [-e EXTRA_VARS] [--vault-id VAULT_IDS]
               [--ask-vault-pass | --vault-password-file VAULT_PASSWORD_FILES]
               [-f FORKS] [-M MODULE_PATH] [--playbook-dir BASEDIR]
               [-a MODULE_ARGS] [-m MODULE_NAME]
               pattern
ansible: error: unrecognized arguments: get all -A wide
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit "!192.168.26.102"
ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit "kubectl get all -A  -o wide | grep tidb-cluster  | awk '{print $2}' | awk -F'/' '{ print "kubectl delete "$1" "$2 " -n tidb-cluster --force" }' | xargs  -n1 -I{} bash -c "{}".168.26.102"
usage: ansible [-h] [--version] [-v] [-b] [--become-method BECOME_METHOD]
               [--become-user BECOME_USER] [-K] [-i INVENTORY] [--list-hosts]
               [-l SUBSET] [-P POLL_INTERVAL] [-B SECONDS] [-o] [-t TREE] [-k]
               [--private-key PRIVATE_KEY_FILE] [-u REMOTE_USER]
               [-c CONNECTION] [-T TIMEOUT]
               [--ssh-common-args SSH_COMMON_ARGS]
               [--sftp-extra-args SFTP_EXTRA_ARGS]
               [--scp-extra-args SCP_EXTRA_ARGS]
               [--ssh-extra-args SSH_EXTRA_ARGS] [-C] [--syntax-check] [-D]
               [-e EXTRA_VARS] [--vault-id VAULT_IDS]
               [--ask-vault-pass | --vault-password-file VAULT_PASSWORD_FILES]
               [-f FORKS] [-M MODULE_PATH] [--playbook-dir BASEDIR]
               [-a MODULE_ARGS] [-m MODULE_NAME]
               pattern
ansible: error: unrecognized arguments: delete    -n tidb-cluster --force }' | xargs  -n1 -I{} bash -c {}.168.26.102

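Both attempts failed. In an interactive bash session, !192 is a history expansion: bash replaced it with history entry 192 (an old kubectl get all ... command), which is why ansible received garbage arguments. Double quotes do not suppress history expansion, but single quotes do. With single quotes the command runs normally and the remaining nodes are renewed.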

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubeadm certs renew all" -i host.yaml  --limit '!192.168.26.102'
192.168.26.101 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
192.168.26.100 | CHANGED | rc=0 >>
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$
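The quoting problem with the exclamation mark can also be sidestepped entirely by never typing a bare ! on the command line. A small sketch follows; the helper name exclude_limit is made up for illustration, while the group and inventory names are the ones used in this post.

```shell
#!/usr/bin/env bash
# exclude_limit HOST
# Hypothetical helper: prints an ansible --limit pattern that excludes HOST.
# The "!" lives inside a single-quoted printf format, so an interactive
# bash never sees an unquoted "!" to expand.
exclude_limit() {
  printf '!%s\n' "$1"
}

# Usage:
#   ansible k8s_master -i host.yaml -m shell -a "kubeadm certs renew all" \
#       --limit "$(exclude_limit 192.168.26.102)"
#
# Alternative: disable history expansion for the current session.
#   set +H
```

The result of a command substitution is not subject to history expansion, so the double quotes around it are safe here.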

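Here I simply restart docker. Normally it is enough to restart the static pods. On a production cluster, consider restarting the container runtime at night, or moving the static Pod yaml files out of the manifests directory and back again: by default kubelet rescans that directory at a regular interval.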
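The manifest-moving alternative can be sketched as follows. This is an illustration under assumptions, not the procedure used in this post: the function name bounce_static_pods is hypothetical, /etc/kubernetes/manifests is the kubeadm default static Pod directory, and the 20 second default wait assumes kubelet's default file check frequency. Run it on one master at a time and use an empty stash directory. On this cluster keepalived is also a static Pod, so the VIP may move while the manifests are away.

```shell
#!/usr/bin/env bash
# bounce_static_pods MANIFEST_DIR STASH_DIR [WAIT_SECONDS]
# Hypothetical helper: moves every *.yaml manifest out of MANIFEST_DIR,
# waits so kubelet can notice and stop the static pods, then moves the
# files back so kubelet recreates them with the renewed certificates.
# STASH_DIR should be empty, since every *.yaml in it is moved back.
bounce_static_pods() {
  local manifests="$1" stash="$2" wait="${3:-20}" f
  mkdir -p "$stash" || return 1
  for f in "$manifests"/*.yaml; do
    [ -e "$f" ] || continue          # directory had no manifests
    mv "$f" "$stash"/ || return 1
  done
  sleep "$wait"
  for f in "$stash"/*.yaml; do
    [ -e "$f" ] || continue
    mv "$f" "$manifests"/ || return 1
  done
}

# Usage on a master node:
#   bounce_static_pods /etc/kubernetes/manifests /tmp/manifests-stash
```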

Restart docker. Note the --forks 1 option: it makes ansible run serially, one node at a time.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "systemctl restart docker" -i host.yaml  --forks 1
192.168.26.100 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

Run a kubectl command to test

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubectl get nodes --kubeconfig /etc/kubernetes/admin.conf" -i host.yaml
192.168.26.100 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.102 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.101 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$

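Once everything checks out, copy the renewed admin.conf (the kubeconfig that embeds the new client certificate) to the default location kubectl loads from, or point an environment variable at it.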

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m copy -a "src=/etc/kubernetes/admin.conf dest=/root/.kube/config" -i host.yaml
192.168.26.101 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
    "dest": "/root/.kube/config",
    "gid": 0,
    "group": "root",
    "md5sum": "470ad5691e98e2dd5682186c64cc5d33",
    "mode": "0600",
    "owner": "root",
    "size": 5674,
    "src": "/root/.ansible/tmp/ansible-tmp-1707464341.43-44557-35016830998762/source",
    "state": "file",
    "uid": 0
}
192.168.26.100 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
    "dest": "/root/.kube/config",
    "gid": 0,
    "group": "root",
    "md5sum": "470ad5691e98e2dd5682186c64cc5d33",
    "mode": "0600",
    "owner": "root",
    "size": 5674,
    "src": "/root/.ansible/tmp/ansible-tmp-1707464341.41-44555-140261297562614/source",
    "state": "file",
    "uid": 0
}
192.168.26.102 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "checksum": "c58460352ef70350a39a4fc6b01645ed68cf56dc",
    "dest": "/root/.kube/config",
    "gid": 0,
    "group": "root",
    "md5sum": "470ad5691e98e2dd5682186c64cc5d33",
    "mode": "0600",
    "owner": "root",
    "size": 5674,
    "src": "/root/.ansible/tmp/ansible-tmp-1707464341.39-44559-184122506106441/source",
    "state": "file",
    "uid": 0
}
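The environment-variable alternative mentioned above avoids copying the file at all. A minimal sketch follows, relying on the KUBECONFIG variable that kubectl reads; the wrapper name use_kubeconfig is made up for illustration.

```shell
#!/usr/bin/env bash
# use_kubeconfig FILE
# Hypothetical helper: exports KUBECONFIG only if FILE is readable, so a
# typo in the path fails loudly instead of producing confusing kubectl errors.
use_kubeconfig() {
  if [ -r "$1" ]; then
    export KUBECONFIG="$1"
  else
    echo "kubeconfig not readable: $1" >&2
    return 1
  fi
}

# Usage:
#   use_kubeconfig /etc/kubernetes/admin.conf && kubectl get nodes
```

The export only lasts for the current shell; adding it to the shell profile would make it permanent.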

Test again: the cluster is back to normal

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m shell -a "kubectl get nodes " -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.100 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
192.168.26.102 | CHANGED | rc=0 >>
NAME                          STATUS   ROLES           AGE    VERSION
vms100.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms101.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms102.liruilongs.github.io   Ready    control-plane   378d   v1.25.1
vms103.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms105.liruilongs.github.io   Ready    <none>          378d   v1.25.1
vms106.liruilongs.github.io   Ready    <none>          378d   v1.25.1
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$

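References for parts of this post

© Copyright of the linked reference content belongs to the original authors. Please let me know of any infringement :)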


https://blog.csdn.net/sanhewuyang/article/details/128436670


© 2018-2024 liruilonger@gmail.com, All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)
