Creating the virtual machines
On my Mac, I used Parallels Desktop to create three virtual machines with the following details:
    CentOS7-Node1: 10.211.55.7 parallels/centos-test
    CentOS7-Node2: 10.211.55.8 parallels/centos-test
    CentOS7-Node3: 10.211.55.9 parallels/centos-test
Installing the Master
The CentOS7-Node1 machine serves as the Master node.
Configuring yum
Add the Kubernetes yum repository: create kubernetes.repo under /etc/yum.repos.d and give it the content shown after the two commands:
    [parallels@CentOS7-Node1 yum.repos.d]$ cd /etc/yum.repos.d
    [parallels@CentOS7-Node1 yum.repos.d]$ sudo touch kubernetes.repo

    [kubernetes]
    name=Kubernetes
    baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
    enabled=1
    gpgcheck=0
    repo_gpgcheck=0
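The original shows `touch` followed by the file content, but not how the content gets into the file. One possible way is a heredoc, sketched here. The function name `write_kube_repo` is hypothetical and not part of the original post.

```shell
# Hypothetical helper: write the Kubernetes yum repo definition to a path.
# The content is exactly the repo definition shown above.
write_kube_repo() {
  target="$1"
  cat > "$target" <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
EOF
}

# Usage (the copy needs root, hence the separate sudo step):
#   write_kube_repo ./kubernetes.repo
#   sudo cp ./kubernetes.repo /etc/yum.repos.d/kubernetes.repo
```

Writing to a local file first and copying it with sudo avoids the common pitfall where `sudo echo ... > file` fails because the redirection is performed by the unprivileged shell.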
Installing the Kubernetes environment
After evaluating the options, kubeadm turned out to be the approach most people recommend, and my company's cluster is built with it too, so I chose kubeadm without hesitation. The first attempt below fails because it was run without sudo.
    [parallels@CentOS7-Node1 yum.repos.d]$ yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
    Loaded plugins: fastestmirror, langpacks
    You need to be root to perform this command.
    [parallels@CentOS7-Node1 yum.repos.d]$ sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
    Loaded plugins: fastestmirror, langpacks
    Loading mirror speeds from cached hostfile
     * base: mirrors.aliyun.com
     * extras: mirrors.aliyun.com
     * updates: mirrors.aliyun.com
    kubernetes | 1.4 kB 00:00:00
    kubernetes/primary | 58 kB 00:00:00
    kubernetes 421/421
    Resolving Dependencies
    --> Running transaction check
    ...... # (uninteresting log output omitted)
    Dependency Installed:
      conntrack-tools.x86_64 0:1.4.4-5.el7_7.2
      cri-tools.x86_64 0:1.13.0-0
      kubernetes-cni.x86_64 0:0.7.5-0
      libnetfilter_cthelper.x86_64 0:1.0.0-10.el7_7.1
      libnetfilter_cttimeout.x86_64 0:1.0.0-6.el7_7.1
      libnetfilter_queue.x86_64 0:1.0.2-2.el7_2
      socat.x86_64 0:1.7.3.2-2.el7
    Complete!
Configuring and updating yum
    yum install -y yum-utils device-mapper-persistent-data lvm2
    yum update
Starting Docker
Start Docker and enable it at boot:
    [parallels@CentOS7-Node1 ~]$ sudo systemctl enable docker && systemctl start docker
    [sudo] password for parallels: 
    Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
Starting kubelet
Start kubelet and enable it at boot. Note that sudo applies only to the command before &&, so the second systemctl call runs as the regular user and triggers the authentication prompt seen in the output:
    [parallels@CentOS7-Node1 ~]$ sudo systemctl enable kubelet && systemctl start kubelet
    Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
    ==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
    Authentication is required to manage system services or units.
    Authenticating as: Parallels (parallels)
    Password: 
    ==== AUTHENTICATION COMPLETE ===
kubeadm config
Print the default init configuration:
    [parallels@CentOS7-Node1 Workspace]$ kubeadm config print init-defaults
    apiVersion: kubeadm.k8s.io/v1beta2
    bootstrapTokens:
    - groups:
      - system:bootstrappers:kubeadm:default-node-token
      token: abcdef.0123456789abcdef
      ttl: 24h0m0s
      usages:
      - signing
      - authentication
    kind: InitConfiguration
    localAPIEndpoint:
      advertiseAddress: 1.2.3.4
      bindPort: 6443
    nodeRegistration:
      criSocket: /var/run/dockershim.sock
      name: centos7-node1
      taints:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
    ---
    apiServer:
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.16.0
    networking:
      dnsDomain: cluster.local
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
Save the defaults to a file so they can be edited and passed to kubeadm later:
    kubeadm config print init-defaults > /home/parallels/Workspace/init.default.yaml
Configuring Docker
Docker has to be installed first; see my earlier post: http://www.cyblogs.com/centos7shang-an-zhuang-docker/
Some related Docker commands
    yum install docker-ce-18.09.9-3.el7 # pin the version to 18.09.9-3.el7
    systemctl status docker
    systemctl restart docker
    systemctl daemon-reload
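Downloading the Kubernetes images
Configure a Docker registry mirror. This turned out to be of little use here, because registry-mirrors only applies to Docker Hub pulls and not to k8s.gcr.io; the images still have to come from a mirror in China later:
    echo '{"registry-mirrors":["https://docker.mirrors.ustc.edu.cn"]}' > /etc/docker/daemon.json
    # If you get a permission error, add the content manually with vim, then restart the docker service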
List the names and versions of the images that Kubernetes depends on:
    [parallels@CentOS7-Node1 Workspace]$ kubeadm config images list
    W1022 13:51:12.550171 19704 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    W1022 13:51:12.550458 19704 version.go:102] falling back to the local client version: v1.16.2
    k8s.gcr.io/kube-apiserver:v1.16.2
    k8s.gcr.io/kube-controller-manager:v1.16.2
    k8s.gcr.io/kube-scheduler:v1.16.2
    k8s.gcr.io/kube-proxy:v1.16.2
    k8s.gcr.io/pause:3.1
    k8s.gcr.io/etcd:3.3.15-0
    k8s.gcr.io/coredns:1.6.2
With unrestricted network access, running the following command would be enough, but in practice it fails:
    [parallels@CentOS7-Node1 Workspace]$ sudo kubeadm config images pull --config=/home/parallels/Workspace/init.default.yaml
    # Pulling from k8s.gcr.io is basically impossible on this network, so the images have to be
    # fetched from Aliyun first and then retagged. The error looks like this:
    [parallels@CentOS7-Node1 Workspace]$ sudo kubeadm config images pull --config=/home/parallels/Workspace/init.default.yaml
    [sudo] password for parallels: 
    failed to pull image "k8s.gcr.io/kube-apiserver:v1.16.0": output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    , error: exit status 1
    To see the stack trace of this error execute with --v=5 or higher
Fetching the images
Fetch the images another way: create kubeadm.sh, a script that pulls each image from the Aliyun mirror, retags it under k8s.gcr.io, and removes the mirror tag.
    touch kubeadm.sh

    #!/bin/bash
    # Pull the Kubernetes images from the Aliyun mirror and retag them as k8s.gcr.io images.
    KUBE_VERSION=v1.16.0
    KUBE_PAUSE_VERSION=3.1
    ETCD_VERSION=3.3.15-0
    CORE_DNS_VERSION=1.6.2
    GCR_URL=k8s.gcr.io
    ALIYUN_URL=registry.cn-hangzhou.aliyuncs.com/google_containers
    images=(
      kube-apiserver:${KUBE_VERSION}
      kube-controller-manager:${KUBE_VERSION}
      kube-scheduler:${KUBE_VERSION}
      kube-proxy:${KUBE_VERSION}
      pause:${KUBE_PAUSE_VERSION}
      etcd:${ETCD_VERSION}
      coredns:${CORE_DNS_VERSION}
    )
    for imageName in "${images[@]}"; do
      docker pull "$ALIYUN_URL/$imageName"
      docker tag "$ALIYUN_URL/$imageName" "$GCR_URL/$imageName"
      docker rmi "$ALIYUN_URL/$imageName"
    done
Pulling the images
    chmod u+x kubeadm.sh # make the script executable
    sudo ./kubeadm.sh
After that, all that remains is to wait patiently.
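The script above hardcodes v1.16.0, while `kubeadm config images list` reported v1.16.2 on this machine. A mismatch like that is easy to miss. As a sketch only, the image list could instead be taken from kubeadm's own output and rewritten to the mirror. The helper names `to_mirror_name` and `pull_via_mirror` are hypothetical, and the sketch assumes the mirror carries every image under the same short name and tag.

```shell
# Mirror prefix taken from the script above.
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers

# Hypothetical helper: map "k8s.gcr.io/<name>:<tag>" to "<mirror>/<name>:<tag>".
to_mirror_name() {
  # ${1##*/} drops everything up to and including the last "/".
  echo "$MIRROR/${1##*/}"
}

# Hypothetical helper: read image references from stdin (one per line),
# pull each from the mirror, retag it to its original name, drop the mirror tag.
pull_via_mirror() {
  while read -r image; do
    case "$image" in
      k8s.gcr.io/*) ;;   # only handle k8s.gcr.io references
      *) continue ;;     # skip warnings or anything else
    esac
    mirror_image=$(to_mirror_name "$image")
    docker pull "$mirror_image" &&
      docker tag "$mirror_image" "$image" &&
      docker rmi "$mirror_image"
  done
}

# Usage (left as a comment so that sourcing this file pulls nothing):
#   kubeadm config images list | pull_via_mirror
```

Driving the loop from kubeadm's output keeps the pulled tags aligned with whatever version kubeadm will actually ask for during init.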
Check the images that ended up locally:
    [root@CentOS7-Node1 Workspace]# docker images
    REPOSITORY                           TAG        IMAGE ID       CREATED         SIZE
    k8s.gcr.io/kube-apiserver            v1.16.0    b305571ca60a   4 weeks ago     217MB
    k8s.gcr.io/kube-proxy                v1.16.0    c21b0c7400f9   4 weeks ago     86.1MB
    k8s.gcr.io/kube-controller-manager   v1.16.0    06a629a7e51c   4 weeks ago     163MB
    k8s.gcr.io/kube-scheduler            v1.16.0    301ddc62b80b   4 weeks ago     87.3MB
    k8s.gcr.io/etcd                      3.3.15-0   b2756210eeab   6 weeks ago     247MB
    k8s.gcr.io/coredns                   1.6.2      bf261d157914   2 months ago    44.1MB
    k8s.gcr.io/pause                     3.1        da86e6ba6ca1   22 months ago   742kB
The first kubeadm init attempt then fails the preflight checks:
    [parallels@CentOS7-Node1 Workspace]$ sudo kubeadm init --config=init.default.yaml
    [init] Using Kubernetes version: v1.16.0
    [preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 19.03.4. Latest validated version: 18.09
    error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Swap]: running with swap on is not supported. Please disable swap
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    To see the stack trace of this error execute with --v=5 or higher
Disabling the firewall
To resolve the firewall warning, see: http://www.cyblogs.com/centos7cha-kan-he-guan-bi-fang-huo-qiang/
The cgroupfs warning
    detected "cgroupfs" as the Docker cgroup driver
Add the following to /etc/docker/daemon.json. The exec-opts entry is the relevant change; it switches the cgroup driver to systemd:
    {
      "registry-mirrors": [
        "https://registry.docker-cn.com"
      ],
      "live-restore": true,
      "exec-opts": [
        "native.cgroupdriver=systemd"
      ]
    }
Then restart Docker:
    systemctl restart docker
    systemctl status docker
Disabling swap
It turns out swap still has to be disabled; the kubelet log says:
    Oct 22 16:35:36 CentOS7-Node1 kubelet[1395]: F1022 16:35:36.065168 1395 server.go:271] failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename Type Size Used Priority /dev/dm-1 partition 2097148 29952 -1]
Turn swap off:
    swapoff -a
    # To disable the swap partition permanently, open the following file and comment out the swap line
    sudo vi /etc/fstab
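Commenting out the swap line by hand can also be scripted. The sketch below assumes a conventional fstab where the swap entry has `swap` as a whitespace-delimited field; the function name `comment_out_swap` is hypothetical. It prints the modified file to stdout rather than editing in place, so the result can be reviewed first.

```shell
# Hypothetical helper: print the given fstab with every active swap entry
# commented out. Lines that already start with "#" are left untouched.
comment_out_swap() {
  sed -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Usage (keeps a backup, then writes the filtered copy back; requires root):
#   sudo cp /etc/fstab /etc/fstab.bak
#   comment_out_swap /etc/fstab.bak | sudo tee /etc/fstab > /dev/null
```

`swapoff -a` only lasts until the next reboot, which is why the fstab entry matters: without this change the kubelet fails again after a restart.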
Running kubeadm init again
    kubeadm init --config=init.default.yaml
    [init] Using Kubernetes version: v1.16.2
    ...
    [preflight] Pulling images required for setting up a Kubernetes cluster
    ...
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    ...
    [certs] Using certificateDir folder "/etc/kubernetes/pki"
    ...
    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    ...
    [kubelet-check] Initial timeout of 40s passed.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
    ... # (the same two kubelet-check lines repeat three more times)
That failed. After changing the Docker version and running it again, it still reports errors:
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
Resetting kubeadm
At this point kubeadm has to be reset, as follows:
    kubeadm reset
    echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
    echo '1' > /proc/sys/net/ipv4/ip_forward
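Values written to /proc/sys this way are lost on reboot. A sketch of persisting them through a sysctl drop-in file follows. The file name k8s.conf and the function name `write_k8s_sysctl` are assumptions, not from the original post, and the bridge setting only exists once the br_netfilter kernel module is loaded.

```shell
# Hypothetical helper: write the two kernel settings from the step above
# to a sysctl configuration file at the given path.
write_k8s_sysctl() {
  target="$1"
  cat > "$target" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
}

# Usage (the install and reload steps require root):
#   write_k8s_sysctl ./k8s.conf
#   sudo cp ./k8s.conf /etc/sysctl.d/k8s.conf
#   sudo sysctl --system
```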
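Checking the logs with journalctl
    journalctl -xefu kubelet
This still reports an error because of the earlier configuration. The ClusterConfiguration used from here on is the following (compared with the defaults printed earlier, serviceSubnet is 10.96.0.0/16 rather than 10.96.0.0/12):
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    imageRepository: k8s.gcr.io
    kubernetesVersion: v1.16.0
    networking:
      dnsDomain: cluster.local
      serviceSubnet: "10.96.0.0/16"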
Continue with the init step by running kubeadm init --config=/home/parallels/Workspace/init.default.yaml again. This time it succeeds:
    [addons] Applied essential addon: CoreDNS
    [addons] Applied essential addon: kube-proxy

    Your Kubernetes control-plane has initialized successfully!

    To start using your cluster, you need to run the following as a regular user:

      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config

    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/

    Then you can join any number of worker nodes by running the following on each as root:

    kubeadm join 10.211.55.7:6443 --token imwj34.ksfiwzj5ga80du0r --discovery-token-ca-cert-hash sha256:7ffef85880ed43dd539afa045715f9ad5bef15e904cede96213d6cfd4adb0795
This really was not easy; I ran this step over and over. The main trouble spots are image version mismatches and the init process itself.
Verifying the ConfigMaps
    [root@CentOS7-Node1 ~]# kubectl get -n kube-system configmap
    NAME                                 DATA   AGE
    coredns                              1      5m49s
    extension-apiserver-authentication   6      5m53s
    kube-proxy                           2      5m49s
    kubeadm-config                       2      5m50s
    kubelet-config-1.16                  1      5m50s
Installing a Node and joining the cluster
Install the same base environment as on the Master, including docker, kubelet, and kubeadm, by repeating the steps above.
    scp root@10.211.55.7:/home/parallels/Workspace/init.default.yaml .
    scp root@10.211.55.7:/home/parallels/Workspace/kubeadm.sh .
    yum install docker-ce-18.06.3.ce-3.el7
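Create a configuration file for the kubeadm command, join-config.yaml, with the following content:
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: JoinConfiguration
    discovery:
      bootstrapToken:
        apiServerEndpoint: 10.211.55.7:6443
        token: imwj34.ksfiwzj5ga80du0r
        unsafeSkipCAVerification: true
      tlsBootstrapToken: imwj34.ksfiwzj5ga80du0r
Here, apiServerEndpoint is the address of the Master server, 10.211.55.7 in this setup, together with the API server port 6443. The values of token and tlsBootstrapToken both come from the last line of output printed by kubeadm init when installing the Master. Pay close attention to the YAML formatting, indentation in particular, otherwise the command fails.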
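Since indentation mistakes in this file are easy to make, one option is to generate it from the endpoint and token instead of typing it. This is a sketch under that assumption; the function name `make_join_config` is hypothetical, and the layout mirrors the fields used in the file above.

```shell
# Hypothetical helper: print a JoinConfiguration for the given API server
# endpoint ($1, host:port) and bootstrap token ($2).
make_join_config() {
  endpoint="$1"
  token="$2"
  cat <<EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: $endpoint
    token: $token
    unsafeSkipCAVerification: true
  tlsBootstrapToken: $token
EOF
}

# Usage with the values printed by kubeadm init on the Master:
#   make_join_config 10.211.55.7:6443 imwj34.ksfiwzj5ga80du0r > join-config.yaml
```

If the token has expired (the defaults printed earlier show a ttl of 24h), running `kubeadm token create --print-join-command` on the Master prints a fresh join command with a new token.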
    [root@CentOS7-Node2 Workspace]# kubeadm join --config=join-config.yaml
    [preflight] Running pre-flight checks
    error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Swap]: running with swap on is not supported. Please disable swap
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    To see the stack trace of this error execute with --v=5 or higher
    [root@CentOS7-Node2 Workspace]# swapoff -a
    [root@CentOS7-Node2 Workspace]# kubeadm join --config=join-config.yaml
    [preflight] Running pre-flight checks
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Activating the kubelet service
    [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.

    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Installing the network plugin
On the Master machine, run:
    [root@CentOS7-Node1 Workspace]# kubectl get nodes
    NAME            STATUS     ROLES    AGE     VERSION
    centos7-node1   NotReady   master   154m    v1.16.2
    centos7-node2   NotReady   <none>   2m49s   v1.16.2
The nodes show NotReady because no CNI network plugin has been installed yet. We install the weave plugin:
    [root@CentOS7-Node1 Workspace]# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
    serviceaccount/weave-net created
    clusterrole.rbac.authorization.k8s.io/weave-net created
    clusterrolebinding.rbac.authorization.k8s.io/weave-net created
    role.rbac.authorization.k8s.io/weave-net created
    rolebinding.rbac.authorization.k8s.io/weave-net created
    daemonset.apps/weave-net created
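After the plugin is applied, the nodes should eventually leave the NotReady state. A small filter over the `kubectl get nodes` output can list the ones that have not. This is a sketch; the function name `not_ready_nodes` is hypothetical, and it treats any STATUS other than exactly `Ready` as not ready.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  # NR > 1 skips the header row; $1 is NAME, $2 is STATUS.
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage:
#   kubectl get nodes | not_ready_nodes
```

An empty result means every node reports Ready.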
Verifying that the cluster is installed
    [root@CentOS7-Node1 Workspace]# kubectl get pods -n kube-system
    NAME                                    READY   STATUS              RESTARTS   AGE
    coredns-5644d7b6d9-9fr9p                0/1     ContainerCreating   0          172m
    coredns-5644d7b6d9-pmpkq                0/1     ContainerCreating   0          172m
    etcd-centos7-node1                      1/1     Running             0          171m
    kube-apiserver-centos7-node1            1/1     Running             0          171m
    kube-controller-manager-centos7-node1   1/1     Running             0          171m
    kube-proxy-ccnht                        1/1     Running             0          21m
    kube-proxy-rdq9l                        1/1     Running             0          172m
    kube-scheduler-centos7-node1            1/1     Running             0          171m
    weave-net-6hw26                         2/2     Running             0          8m7s
    weave-net-qv8vz                         2/2     Running             0          8m7s
The coredns pods stay in the ContainerCreating state. Look at the error details:
    [root@CentOS7-Node1 Workspace]# kubectl describe pod coredns-5644d7b6d9-9fr9p -n kube-system
    Name:                 coredns-5644d7b6d9-9fr9p
    Namespace:            kube-system
    Priority:             2000000000
    Priority Class Name:  system-cluster-critical
    Node:                 centos7-node2/10.211.55.8
    Start Time:           Tue, 22 Oct 2019 20:49:47 +0800
    Labels:               k8s-app=kube-dns
                          pod-template-hash=5644d7b6d9
    .... # (some output omitted)
    Events:
      Type     Reason                  Age        From                    Message
      ----     ------                  ----       ----                    -------
      Warning  FailedScheduling        <unknown>  default-scheduler       0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
      Warning  FailedScheduling        <unknown>  default-scheduler       0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
      Normal   Scheduled               <unknown>  default-scheduler       Successfully assigned kube-system/coredns-5644d7b6d9-9fr9p to centos7-node2
      Warning  FailedCreatePodSandBox  2m         kubelet, centos7-node2  Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
      Normal   SandboxChanged          119s       kubelet, centos7-node2  Pod sandbox changed, it will be killed and re-created.
The kubelet log shows some errors:
    Oct 22 10:50:15 CentOS7-Node1 kubelet[7649]: F1022 10:50:15.170550 7649 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml",
    Oct 22 10:50:15 CentOS7-Node1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Deleting a pod causes it to be recreated, which effectively restarts it:
    [root@CentOS7-Node1 ~]# kubectl delete pod coredns-5644d7b6d9-9fr9p -n kube-system
    pod "coredns-5644d7b6d9-9fr9p" deleted
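I read a great many articles and blog posts, and few of them are complete: they describe only the successful path, while in practice all sorts of odd problems crop up along the way. To be honest, k8s is convenient, but the barrier to entry is high and it depends on a great many things. Problems caused by version mismatches are especially hard to resolve.
Finally, here is a screenshot of the successful result.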
Summary of common commands
    systemctl daemon-reload
    systemctl restart kubelet
    kubectl get pods -n kube-system
    kubectl describe pod coredns-5644d7b6d9-lqtks -n kube-system
    kubectl delete pod coredns-5644d7b6d9-qh4bc -n kube-system
    # allow pods to be scheduled on the master node
    kubectl taint nodes --all node-role.kubernetes.io/master-
    # prevent pods from being scheduled on the master node
    kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule
    kubeadm reset
    systemctl enable docker && systemctl start docker
    systemctl enable kubelet && systemctl start kubelet
    journalctl -xefu kubelet