可以看到执行了 kubeadm init
之后,貌似一直卡住 kubelet 这个进程的健康检查上,日志如下。
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I1031 14:44:25.770815 10034 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I1031 14:44:25.770828 10034 waitcontrolplane.go:89] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
进一步,按照提示,尝试看看 kubelet 启动的时候是不是遇到什么问题。
代码语言:javascript复制 Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
详细看看 kubelet 启动的信息。
代码语言:javascript复制[root@VM-23-145-centos ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Sun 2021-10-31 18:53:46 CST; 6s ago
Docs: https://kubernetes.io/docs/
Process: 15696 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 15696 (code=exited, status=1/FAILURE)
Oct 31 18:53:46 VM-23-145-centos systemd[1]: Unit kubelet.service entered failed state.
Oct 31 18:53:46 VM-23-145-centos systemd[1]: kubelet.service failed.
最后发现是下面这个原因,导致启动失败。
代码语言:javascript复制Oct 31 18:54:17 VM-23-145-centos kubelet[16036]: I1031 18:54:17.310044 16036 docker_service.go:264] "Docker Info" dockerInfo=&{ID:AU55:ZTZU:4WX2:CKY5:KCPE:OPOQ:EUEZ:AXZY:GP7R:7CZV:6LBL:V6WA Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:7 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:25 OomKillDisable:true NGoroutines:34 SystemTime:2021-10-31T18:54:17.30101368 08:00 LoggingDriver:json-file CgroupDriver:cgroupfs CgroupVersion:1 NEventsListener:0 KernelVersion:3.10.0-1160.31.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSVersion:7 OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000b7c310 NCPU:8 MemTotal:16655941632 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:VM-23-145-centos Labels:[] ExperimentalBuild:false ServerVersion:20.10.10 ClusterStore: ClusterAdvertise: Runtimes:map[io.containerd.runc.v2:{Path:runc Args:[] Shim:<nil>} io.containerd.runtime.v1.linux:{Path:runc Args:[] Shim:<nil>} runc:{Path:runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:5b46e404f6b9f661a205e28d59c982d3634148f8 Expected:5b46e404f6b9f661a205e28d59c982d3634148f8} RuncCommit:{ID:v1.0.2-0-g52b36a2 Expected:v1.0.2-0-g52b36a2} InitCommit:{ID:de40ad0 Expected:de40ad0} SecurityOptions:[name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[] Warnings:[]}
Oct 31 18:54:17 VM-23-145-centos kubelet[16036]: E1031 18:54:17.310092 16036 server.go:294] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs""
Oct 31 18:54:17 VM-23-145-centos systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Oct 31 18:54:17 VM-23-145-centos systemd[1]: Unit kubelet.service entered failed state.
Oct 31 18:54:17 VM-23-145-centos systemd[1]: kubelet.service failed.
这个解决起来比较容易,修改下面的配置,然后重启 docker。
代码语言:javascript复制[root@VM-23-145-centos ~]# cat /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
然后重新 kubeadm init
一下,就可以看到 kubelet 正常启动了。
[root@VM-23-145-centos ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sun 2021-10-31 18:58:15 CST; 30s ago
Docs: https://kubernetes.io/docs/
Main PID: 19917 (kubelet)
Tasks: 18
Memory: 37.0M
CGroup: /system.slice/kubelet.service
└─19917 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config...
Oct 31 18:58:26 VM-23-145-centos kubelet[19917]: I1031 18:58:26.059691 19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:26 VM-23-145-centos kubelet[19917]: E1031 18:58:26.209071 19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:31 VM-23-145-centos kubelet[19917]: I1031 18:58:31.060399 19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:31 VM-23-145-centos kubelet[19917]: E1031 18:58:31.217841 19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:36 VM-23-145-centos kubelet[19917]: I1031 18:58:36.060914 19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:36 VM-23-145-centos kubelet[19917]: E1031 18:58:36.225707 19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:41 VM-23-145-centos kubelet[19917]: I1031 18:58:41.062031 19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:41 VM-23-145-centos kubelet[19917]: E1031 18:58:41.232789 19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Oct 31 18:58:46 VM-23-145-centos kubelet[19917]: I1031 18:58:46.063112 19917 cni.go:239] "Unable to update cni config" err="no networks fo...i/net.d"
Oct 31 18:58:46 VM-23-145-centos kubelet[19917]: E1031 18:58:46.240827 19917 kubelet.go:2337] "Container runtime network not ready" networ...ialized"
Hint: Some lines were ellipsized, use -l to show in full.
最后,如果你是 root 用户,执行下面的命令,就可以正常工作了。
代码语言:javascript复制[root@VM-23-145-centos ~]# export KUBECONFIG=/etc/kubernetes/admin.conf
[root@VM-23-145-centos ~]# kubectl get pods
No resources found in default namespace.
[root@VM-23-145-centos ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcd69978-4rs67 0/1 Pending 0 15m
coredns-78fcd69978-tjq2q 0/1 Pending 0 15m
etcd-vm-23-145-centos 1/1 Running 0 15m
kube-apiserver-vm-23-145-centos 1/1 Running 0 15m
kube-controller-manager-vm-23-145-centos 1/1 Running 0 15m
kube-proxy-j9pdn 1/1 Running 0 15m
kube-scheduler-vm-23-145-centos 1/1 Running 0 15m