Outsourcing Mastery -- Hands-On Rancher, from Getting Started to Working Notes

2023-07-31 19:28:07

Rancher 2.2.4 HA Kubernetes Deployment

This article documents the steps for building a Rancher HA cluster, covering system configuration, Docker installation, Kubernetes installation, and the Rancher HA install itself.

Server environment:

Node name   IP             OS                                     Role
node9       172.16.17.72   CentOS Linux release 7.4.1708 (Core)   Nginx
node10      172.16.17.10   CentOS Linux release 7.4.1708 (Core)   etcd, docker, k8s
node11      172.16.17.11   CentOS Linux release 7.4.1708 (Core)   etcd, docker, k8s
node12      172.16.17.12   CentOS Linux release 7.4.1708 (Core)   etcd, docker, k8s

I. Environment setup (perform on every host: node10, node11, and node12)

OS open-file limits

vi /etc/security/limits.conf

Append the following at the end of the file:

# k8s
root soft nofile 655350
root hard nofile 655350
* soft nofile 655350
* hard nofile 655350
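
The new limits only apply to sessions started after the change. A quick way to confirm them after logging in again (a minimal check):

ulimit -Sn   # soft open-file limit, should print 655350
ulimit -Hn   # hard open-file limit, should print 655350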
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux

Set SELINUX to disabled:

vim /etc/selinux/config
SELINUX=disabled
Disable swap

Comment out or remove the swap partition entry: vi /etc/fstab

#
# /etc/fstab
# Created by anaconda on Fri Jun  2 14:11:50 2017
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=f5b4435a-77bc-48f4-8d22-6fa55e9e04a2 /boot                   xfs     defaults        0 0
/dev/mapper/centos-grid0 /grid0                  xfs     defaults        0 0
#/dev/mapper/centos-swap swap                    swap    defaults        0 0
Kernel tuning

Add the following to /etc/sysctl.conf (vi /etc/sysctl.conf):

# k8s
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
vm.swappiness=0
vm.max_map_count=655360

sysctl -p
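
Note that the bridge-nf-call-* sysctls only exist once the br_netfilter kernel module is loaded. A minimal sketch for loading it now and on every boot (module and path names assumed for CentOS 7):

modprobe br_netfilter
echo "br_netfilter" > /etc/modules-load.d/br_netfilter.conf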

Docker installation

Install the specified Docker version

yum install yum-utils device-mapper-persistent-data lvm2 curl -y
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum clean all && yum repolist && yum install docker-ce-18.09.7 -y
systemctl start docker && systemctl enable docker
docker info

Add a domestic registry mirror, set the storage driver, set the default bridge subnet, and cap the Docker log size so logs cannot fill the disk. Note that /etc/docker/daemon.json must be a single JSON object, so all of these options belong in one file:

[root@node10 ~]# cat /etc/docker/daemon.json 
{
  "bip": "192.168.100.1/24",
  "registry-mirrors": ["https://39r65dar.mirror.aliyuncs.com"],
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}

Restart Docker; if it fails to start, check the /etc/docker/daemon.json configuration file.

systemctl restart docker
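
To confirm the new settings were picked up, docker info reports the storage and logging drivers, and the bridge network should reflect the bip subnet (a quick, non-exhaustive check):

docker info | grep -E "Storage Driver|Logging Driver"
docker network inspect bridge | grep Subnet   # should match the bip setting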
Create a user

Create the rancher user and add it to the docker group (Docker must be installed first):

useradd rancher -G docker
passwd rancher
(set a password, e.g. QEw2pldqPnPOkUsj)
Passwordless SSH login (node10, node11, node12)

Run the following on every host, including from each host to itself:

su - rancher
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub rancher@172.16.17.10
ssh-copy-id -i .ssh/id_rsa.pub rancher@172.16.17.11
ssh-copy-id -i .ssh/id_rsa.pub rancher@172.16.17.12

The root user also needs passwordless SSH:

su - root
ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@172.16.17.10
ssh-copy-id -i .ssh/id_rsa.pub root@172.16.17.11
ssh-copy-id -i .ssh/id_rsa.pub root@172.16.17.12
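
A quick sanity check that passwordless SSH really works from this host to all three nodes (a minimal sketch; adjust the user as needed):

for h in 172.16.17.10 172.16.17.11 172.16.17.12; do
  ssh -o BatchMode=yes rancher@$h hostname
done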

II. Nginx installation (deploy the nginx load balancer on node9, 172.16.17.72)

[root@node9 ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
[root@node9 ~]# wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
[root@node9 ~]# yum clean all
[root@node9 ~]# yum install nginx -y
[root@node9 ~]# nginx -v
nginx version: nginx/1.16.0
Version 1.16 is recommended; the default 1.12.2 package does not work with the stream configuration below.

If the reported version is nginx/1.12.2, add the official nginx yum repo (e.g. as /etc/yum.repos.d/nginx.repo):
[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/7/$basearch/
gpgcheck=0
enabled=1

Then upgrade:
yum clean all && yum update nginx -y

Edit the configuration file: vi /etc/nginx/nginx.conf

user nginx;
worker_processes 4;
worker_rlimit_nofile 40000;

events {
    worker_connections 8192;
}

http {
    # Gzip Settings
    gzip on;
    gzip_disable "msie6";
    gzip_disable "MSIE [1-6].(?!.*SV1)";
    gzip_vary on;
    gzip_static on;
    gzip_proxied any;
    gzip_min_length 0;
    gzip_comp_level 8;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/xml application/xml application/atom+xml application/rss+xml application/xhtml+xml image/svg+xml application/font-woff text/javascript application/javascript application/x-javascript text/x-json application/json application/x-web-app-manifest+json text/css text/plain text/x-component font/opentype application/x-font-ttf application/vnd.ms-fontobject font/woff2 image/x-icon image/png image/jpeg;

    server {
        listen         80;
        return 301 https://$host$request_uri;
    }
}

stream {
    upstream rancher_servers {
        least_conn;
        server 172.16.17.10:443 max_fails=3 fail_timeout=5s;
        server 172.16.17.11:443 max_fails=3 fail_timeout=5s;
        server 172.16.17.12:443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen     443;
        proxy_pass rancher_servers;
    }
}

Start nginx:

sudo systemctl restart nginx.service
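
Before relying on the load balancer, validate the configuration and confirm both listeners are up (a quick check):

nginx -t                          # syntax check of /etc/nginx/nginx.conf
ss -lntp | grep -E ":80 |:443 "   # nginx should be listening on 80 and 443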

III. Rancher cluster deployment (perform everything below on node10 until the cluster is up)

Install the required tools

Install rke:

Download rke_linux-amd64 and upload it to the server
[root@node10 ~]# chmod +x rke_linux-amd64
[root@node10 ~]# mv rke_linux-amd64 /usr/bin/rke
Verify:
[root@node10 ~]# rke -v
rke version v0.2.4
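
If the server has internet access, the binary can also be fetched directly instead of being uploaded; the URL below assumes the standard asset name on the RKE GitHub releases page:

wget https://github.com/rancher/rke/releases/download/v0.2.4/rke_linux-amd64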

Install kubectl:

Download kubectl_amd64-linux and upload it to the server
[root@node10 ~]# chmod +x kubectl_amd64-linux
[root@node10 ~]# mv kubectl_amd64-linux /usr/bin/kubectl
Verify:
[root@node10 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?
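
The "connection refused" message is expected here, since no cluster exists yet. As with rke, kubectl can be downloaded directly if the host has internet access; the URL below assumes the upstream Kubernetes release bucket layout:

curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.15.0/bin/linux/amd64/kubectl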

Install helm:

Download helm-v2.14.1-linux-amd64.tar.gz and upload it to the server
[root@node10 ~]# tar -zxvf helm-v2.14.1-linux-amd64.tar.gz 
[root@node10 ~]# mv linux-amd64/helm /usr/bin/helm
[root@node10 ~]# mv linux-amd64/tiller /usr/bin/tiller
[root@node10 ~]# rm -rf helm-v2.14.1-linux-amd64.tar.gz linux-amd64/
Verify:
[root@node10 ~]# helm version
Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Error: Get http://localhost:8080/api/v1/namespaces/kube-system/pods?labelSelector=app=helm,name=tiller: dial tcp [::1]:8080: connect: connection refused
[root@node10 ~]# tiller version
[main] 2019/06/28 19:34:21 Starting Tiller v2.14.1 (tls=false)
[main] 2019/06/28 19:34:21 GRPC listening on :44134
[main] 2019/06/28 19:34:21 Probes listening on :44135
[main] 2019/06/28 19:34:21 Storage driver is ConfigMap
[main] 2019/06/28 19:34:21 Max history per release is 0

IV. Install Kubernetes

1. Switch to the rancher user

su - rancher

2. Create the rancher cluster configuration file:

vi rancher-cluster.yml

nodes:
  - address: 172.16.17.10
    user: rancher
    role: [controlplane,worker,etcd]
  - address: 172.16.17.11
    user: rancher
    role: [controlplane,worker,etcd]
  - address: 172.16.17.12
    user: rancher
    role: [controlplane,worker,etcd]

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h
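
rke connects to every node over SSH as the rancher user and must be able to run Docker there. A minimal pre-flight check (my own sketch, not part of the original steps) before running rke up:

for h in 172.16.17.10 172.16.17.11 172.16.17.12; do
  ssh rancher@$h docker ps >/dev/null && echo "$h ok"
done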

3. Bring up the cluster

[rancher@node10 ~]$ rke up --config ./rancher-cluster.yml
INFO[0000] Initiating Kubernetes cluster                
INFO[0000] [dialer] Setup tunnel for host [172.16.17.11] 
INFO[0000] [dialer] Setup tunnel for host [172.16.17.12] 
INFO[0000] [dialer] Setup tunnel for host [172.16.17.10] 
INFO[0000] [state] Successfully started [cluster-state-deployer] container on host [172.16.17.10] 
INFO[0001] [state] Successfully started [cluster-state-deployer] container on host [172.16.17.11] 
INFO[0001] [state] Successfully started [cluster-state-deployer] container on host [172.16.17.12] 
INFO[0001] [certificates] Generating CA kubernetes certificates 
INFO[0002] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates 
INFO[0002] [certificates] Generating Node certificate   
INFO[0002] [certificates] Generating admin certificates and kubeconfig 
INFO[0002] [certificates] Generating Kubernetes API server proxy client certificates 
INFO[0002] [certificates] Generating etcd-172.16.17.10 certificate and key 
INFO[0002] [certificates] Generating etcd-172.16.17.11 certificate and key 
INFO[0003] [certificates] Generating etcd-172.16.17.12 certificate and key 
INFO[0003] [certificates] Generating Kube Controller certificates 
INFO[0003] [certificates] Generating Kube Scheduler certificates 
INFO[0004] [certificates] Generating Kube Proxy certificates 
INFO[0004] [certificates] Generating Kubernetes API server certificates 
INFO[0004] Successfully Deployed state file at [./rancher-cluster.rkestate] 
INFO[0004] Building Kubernetes cluster                  
INFO[0004] [dialer] Setup tunnel for host [172.16.17.12] 
INFO[0004] [dialer] Setup tunnel for host [172.16.17.10] 
INFO[0004] [dialer] Setup tunnel for host [172.16.17.11] 
INFO[0004] [network] Deploying port listener containers 
INFO[0005] [network] Successfully started [rke-etcd-port-listener] container on host [172.16.17.10] 
INFO[0005] [network] Successfully started [rke-etcd-port-listener] container on host [172.16.17.11] 
INFO[0005] [network] Successfully started [rke-etcd-port-listener] container on host [172.16.17.12] 
INFO[0005] [network] Successfully started [rke-cp-port-listener] container on host [172.16.17.10] 
INFO[0005] [network] Successfully started [rke-cp-port-listener] container on host [172.16.17.12] 
INFO[0005] [network] Successfully started [rke-cp-port-listener] container on host [172.16.17.11] 
INFO[0006] [network] Successfully started [rke-worker-port-listener] container on host [172.16.17.11] 
INFO[0006] [network] Successfully started [rke-worker-port-listener] container on host [172.16.17.10] 
INFO[0006] [network] Successfully started [rke-worker-port-listener] container on host [172.16.17.12] 
INFO[0006] [network] Port listener containers deployed successfully 
INFO[0006] [network] Running etcd <-> etcd port checks  
INFO[0006] [network] Successfully started [rke-port-checker] container on host [172.16.17.10] 
INFO[0006] [network] Successfully started [rke-port-checker] container on host [172.16.17.11] 
INFO[0006] [network] Successfully started [rke-port-checker] container on host [172.16.17.12] 
INFO[0006] [network] Running control plane -> etcd port checks 
INFO[0007] [network] Successfully started [rke-port-checker] container on host [172.16.17.11] 
INFO[0007] [network] Successfully started [rke-port-checker] container on host [172.16.17.10] 
INFO[0007] [network] Successfully started [rke-port-checker] container on host [172.16.17.12] 
INFO[0007] [network] Running control plane -> worker port checks 
INFO[0007] [network] Successfully started [rke-port-checker] container on host [172.16.17.11] 
INFO[0007] [network] Successfully started [rke-port-checker] container on host [172.16.17.10] 
INFO[0007] [network] Successfully started [rke-port-checker] container on host [172.16.17.12] 
INFO[0007] [network] Running workers -> control plane port checks 
INFO[0008] [network] Successfully started [rke-port-checker] container on host [172.16.17.11] 
INFO[0008] [network] Successfully started [rke-port-checker] container on host [172.16.17.10] 
INFO[0008] [network] Successfully started [rke-port-checker] container on host [172.16.17.12] 
INFO[0008] [network] Checking KubeAPI port Control Plane hosts 
INFO[0008] [network] Removing port listener containers  
INFO[0008] [remove/rke-etcd-port-listener] Successfully removed container on host [172.16.17.11] 
INFO[0008] [remove/rke-etcd-port-listener] Successfully removed container on host [172.16.17.10] 
INFO[0008] [remove/rke-etcd-port-listener] Successfully removed container on host [172.16.17.12] 
INFO[0008] [remove/rke-cp-port-listener] Successfully removed container on host [172.16.17.10] 
INFO[0008] [remove/rke-cp-port-listener] Successfully removed container on host [172.16.17.11] 
INFO[0008] [remove/rke-cp-port-listener] Successfully removed container on host [172.16.17.12] 
INFO[0009] [remove/rke-worker-port-listener] Successfully removed container on host [172.16.17.10] 
INFO[0009] [remove/rke-worker-port-listener] Successfully removed container on host [172.16.17.11] 
INFO[0009] [remove/rke-worker-port-listener] Successfully removed container on host [172.16.17.12] 
INFO[0009] [network] Port listener containers removed successfully 
INFO[0009] [certificates] Deploying kubernetes certificates to Cluster nodes 
INFO[0014] [reconcile] Rebuilding and updating local kube config 
INFO[0014] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml] 
INFO[0014] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml] 
INFO[0014] Successfully Deployed local admin kubeconfig at [./kube_config_rancher-cluster.yml] 
INFO[0014] [certificates] Successfully deployed kubernetes certificates to Cluster nodes 
INFO[0014] [reconcile] Reconciling cluster state        
INFO[0014] [reconcile] This is newly generated cluster  
INFO[0014] Pre-pulling kubernetes images                
INFO[0014] [pre-deploy] Pulling image [rancher/hyperkube:v1.13.5-rancher1] on host [172.16.17.10] 
INFO[0014] [pre-deploy] Pulling image [rancher/hyperkube:v1.13.5-rancher1] on host [172.16.17.11] 
INFO[0014] [pre-deploy] Pulling image [rancher/hyperkube:v1.13.5-rancher1] on host [172.16.17.12] 
INFO[0120] [pre-deploy] Successfully pulled image [rancher/hyperkube:v1.13.5-rancher1] on host [172.16.17.10] 
INFO[0159] [pre-deploy] Successfully pulled image [rancher/hyperkube:v1.13.5-rancher1] on host [172.16.17.11] 
INFO[0177] [pre-deploy] Successfully pulled image [rancher/hyperkube:v1.13.5-rancher1] on host [172.16.17.12] 
INFO[0177] Kubernetes images pulled successfully        
INFO[0177] [etcd] Building up etcd plane..              
INFO[0177] [etcd] Pulling image [rancher/coreos-etcd:v3.2.24-rancher1] on host [172.16.17.10] 
INFO[0201] [etcd] Successfully pulled image [rancher/coreos-etcd:v3.2.24-rancher1] on host [172.16.17.10] 
INFO[0201] [etcd] Successfully started [etcd] container on host [172.16.17.10] 
INFO[0201] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [172.16.17.10] 
INFO[0202] [etcd] Successfully started [etcd-rolling-snapshots] container on host [172.16.17.10] 
INFO[0207] [certificates] Successfully started [rke-bundle-cert] container on host [172.16.17.10] 
INFO[0207] Waiting for [rke-bundle-cert] container to exit on host [172.16.17.10] 
INFO[0207] Container [rke-bundle-cert] is still running on host [172.16.17.10] 
INFO[0208] Waiting for [rke-bundle-cert] container to exit on host [172.16.17.10] 
INFO[0208] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [172.16.17.10] 
INFO[0209] [etcd] Successfully started [rke-log-linker] container on host [172.16.17.10] 
INFO[0209] [remove/rke-log-linker] Successfully removed container on host [172.16.17.10] 
INFO[0209] [etcd] Pulling image [rancher/coreos-etcd:v3.2.24-rancher1] on host [172.16.17.11] 
INFO[0226] [etcd] Successfully pulled image [rancher/coreos-etcd:v3.2.24-rancher1] on host [172.16.17.11] 
INFO[0226] [etcd] Successfully started [etcd] container on host [172.16.17.11] 
INFO[0226] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [172.16.17.11] 
INFO[0227] [etcd] Successfully started [etcd-rolling-snapshots] container on host [172.16.17.11] 
INFO[0232] [certificates] Successfully started [rke-bundle-cert] container on host [172.16.17.11] 
INFO[0232] Waiting for [rke-bundle-cert] container to exit on host [172.16.17.11] 
INFO[0232] Container [rke-bundle-cert] is still running on host [172.16.17.11] 
INFO[0233] Waiting for [rke-bundle-cert] container to exit on host [172.16.17.11] 
INFO[0233] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [172.16.17.11] 
INFO[0234] [etcd] Successfully started [rke-log-linker] container on host [172.16.17.11] 
INFO[0234] [remove/rke-log-linker] Successfully removed container on host [172.16.17.11] 
INFO[0234] [etcd] Pulling image [rancher/coreos-etcd:v3.2.24-rancher1] on host [172.16.17.12] 
INFO[0253] [etcd] Successfully pulled image [rancher/coreos-etcd:v3.2.24-rancher1] on host [172.16.17.12] 
INFO[0253] [etcd] Successfully started [etcd] container on host [172.16.17.12] 
INFO[0253] [etcd] Saving snapshot [etcd-rolling-snapshots] on host [172.16.17.12] 
INFO[0254] [etcd] Successfully started [etcd-rolling-snapshots] container on host [172.16.17.12] 
INFO[0259] [certificates] Successfully started [rke-bundle-cert] container on host [172.16.17.12] 
INFO[0259] Waiting for [rke-bundle-cert] container to exit on host [172.16.17.12] 
INFO[0259] Container [rke-bundle-cert] is still running on host [172.16.17.12] 
INFO[0260] Waiting for [rke-bundle-cert] container to exit on host [172.16.17.12] 
INFO[0260] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [172.16.17.12] 
INFO[0261] [etcd] Successfully started [rke-log-linker] container on host [172.16.17.12] 
INFO[0261] [remove/rke-log-linker] Successfully removed container on host [172.16.17.12] 
INFO[0261] [etcd] Successfully started etcd plane.. Checking etcd cluster health 
INFO[0261] [controlplane] Building up Controller Plane.. 
INFO[0261] [controlplane] Successfully started [kube-apiserver] container on host [172.16.17.12] 
INFO[0261] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [172.16.17.12] 
INFO[0261] [controlplane] Successfully started [kube-apiserver] container on host [172.16.17.11] 
INFO[0261] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [172.16.17.11] 
INFO[0261] [controlplane] Successfully started [kube-apiserver] container on host [172.16.17.10] 
INFO[0261] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [172.16.17.10] 
INFO[0273] [healthcheck] service [kube-apiserver] on host [172.16.17.11] is healthy 
INFO[0273] [healthcheck] service [kube-apiserver] on host [172.16.17.10] is healthy 
INFO[0274] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.10] 
INFO[0274] [healthcheck] service [kube-apiserver] on host [172.16.17.12] is healthy 
INFO[0274] [remove/rke-log-linker] Successfully removed container on host [172.16.17.10] 
INFO[0274] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.11] 
INFO[0274] [controlplane] Successfully started [kube-controller-manager] container on host [172.16.17.10] 
INFO[0274] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [172.16.17.10] 
INFO[0274] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.12] 
INFO[0274] [remove/rke-log-linker] Successfully removed container on host [172.16.17.11] 
INFO[0274] [remove/rke-log-linker] Successfully removed container on host [172.16.17.12] 
INFO[0274] [controlplane] Successfully started [kube-controller-manager] container on host [172.16.17.12] 
INFO[0274] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [172.16.17.12] 
INFO[0274] [controlplane] Successfully started [kube-controller-manager] container on host [172.16.17.11] 
INFO[0274] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [172.16.17.11] 
INFO[0279] [healthcheck] service [kube-controller-manager] on host [172.16.17.10] is healthy 
INFO[0280] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.10] 
INFO[0280] [healthcheck] service [kube-controller-manager] on host [172.16.17.12] is healthy 
INFO[0280] [remove/rke-log-linker] Successfully removed container on host [172.16.17.10] 
INFO[0280] [healthcheck] service [kube-controller-manager] on host [172.16.17.11] is healthy 
INFO[0280] [controlplane] Successfully started [kube-scheduler] container on host [172.16.17.10] 
INFO[0280] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [172.16.17.10] 
INFO[0280] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.12] 
INFO[0280] [remove/rke-log-linker] Successfully removed container on host [172.16.17.12] 
INFO[0280] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.11] 
INFO[0280] [remove/rke-log-linker] Successfully removed container on host [172.16.17.11] 
INFO[0280] [controlplane] Successfully started [kube-scheduler] container on host [172.16.17.12] 
INFO[0280] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [172.16.17.12] 
INFO[0281] [controlplane] Successfully started [kube-scheduler] container on host [172.16.17.11] 
INFO[0281] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [172.16.17.11] 
INFO[0285] [healthcheck] service [kube-scheduler] on host [172.16.17.10] is healthy 
INFO[0285] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.10] 
INFO[0286] [remove/rke-log-linker] Successfully removed container on host [172.16.17.10] 
INFO[0286] [healthcheck] service [kube-scheduler] on host [172.16.17.12] is healthy 
INFO[0286] [healthcheck] service [kube-scheduler] on host [172.16.17.11] is healthy 
INFO[0286] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.12] 
INFO[0286] [remove/rke-log-linker] Successfully removed container on host [172.16.17.12] 
INFO[0286] [controlplane] Successfully started [rke-log-linker] container on host [172.16.17.11] 
INFO[0287] [remove/rke-log-linker] Successfully removed container on host [172.16.17.11] 
INFO[0287] [controlplane] Successfully started Controller Plane.. 
INFO[0287] [authz] Creating rke-job-deployer ServiceAccount 
INFO[0287] [authz] rke-job-deployer ServiceAccount created successfully 
INFO[0287] [authz] Creating system:node ClusterRoleBinding 
INFO[0287] [authz] system:node ClusterRoleBinding created successfully 
INFO[0287] Successfully Deployed state file at [./rancher-cluster.rkestate] 
INFO[0287] [state] Saving full cluster state to Kubernetes 
INFO[0287] [state] Successfully Saved full cluster state to Kubernetes ConfigMap: cluster-state 
INFO[0287] [worker] Building up Worker Plane..          
INFO[0287] [sidekick] Sidekick container already created on host [172.16.17.10] 
INFO[0287] [sidekick] Sidekick container already created on host [172.16.17.12] 
INFO[0287] [sidekick] Sidekick container already created on host [172.16.17.11] 
INFO[0287] [worker] Successfully started [kubelet] container on host [172.16.17.10] 
INFO[0287] [healthcheck] Start Healthcheck on service [kubelet] on host [172.16.17.10] 
INFO[0287] [worker] Successfully started [kubelet] container on host [172.16.17.11] 
INFO[0287] [healthcheck] Start Healthcheck on service [kubelet] on host [172.16.17.11] 
INFO[0287] [worker] Successfully started [kubelet] container on host [172.16.17.12] 
INFO[0287] [healthcheck] Start Healthcheck on service [kubelet] on host [172.16.17.12] 
INFO[0292] [healthcheck] service [kubelet] on host [172.16.17.10] is healthy 
INFO[0292] [healthcheck] service [kubelet] on host [172.16.17.12] is healthy 
INFO[0292] [healthcheck] service [kubelet] on host [172.16.17.11] is healthy 
INFO[0293] [worker] Successfully started [rke-log-linker] container on host [172.16.17.10] 
INFO[0293] [worker] Successfully started [rke-log-linker] container on host [172.16.17.12] 
INFO[0293] [remove/rke-log-linker] Successfully removed container on host [172.16.17.10] 
INFO[0293] [worker] Successfully started [rke-log-linker] container on host [172.16.17.11] 
INFO[0293] [remove/rke-log-linker] Successfully removed container on host [172.16.17.12] 
INFO[0293] [worker] Successfully started [kube-proxy] container on host [172.16.17.10] 
INFO[0293] [healthcheck] Start Healthcheck on service [kube-proxy] on host [172.16.17.10] 
INFO[0293] [remove/rke-log-linker] Successfully removed container on host [172.16.17.11] 
INFO[0293] [worker] Successfully started [kube-proxy] container on host [172.16.17.12] 
INFO[0293] [healthcheck] Start Healthcheck on service [kube-proxy] on host [172.16.17.12] 
INFO[0293] [worker] Successfully started [kube-proxy] container on host [172.16.17.11] 
INFO[0293] [healthcheck] Start Healthcheck on service [kube-proxy] on host [172.16.17.11] 
INFO[0298] [healthcheck] service [kube-proxy] on host [172.16.17.10] is healthy 
INFO[0298] [healthcheck] service [kube-proxy] on host [172.16.17.12] is healthy 
INFO[0299] [healthcheck] service [kube-proxy] on host [172.16.17.11] is healthy 
INFO[0299] [worker] Successfully started [rke-log-linker] container on host [172.16.17.10] 
INFO[0299] [worker] Successfully started [rke-log-linker] container on host [172.16.17.12] 
INFO[0299] [remove/rke-log-linker] Successfully removed container on host [172.16.17.10] 
INFO[0299] [remove/rke-log-linker] Successfully removed container on host [172.16.17.12] 
INFO[0299] [worker] Successfully started [rke-log-linker] container on host [172.16.17.11] 
INFO[0299] [remove/rke-log-linker] Successfully removed container on host [172.16.17.11] 
INFO[0299] [worker] Successfully started Worker Plane.. 
INFO[0300] [cleanup] Successfully started [rke-log-cleaner] container on host [172.16.17.10] 
INFO[0300] [cleanup] Successfully started [rke-log-cleaner] container on host [172.16.17.11] 
INFO[0300] [remove/rke-log-cleaner] Successfully removed container on host [172.16.17.10] 
INFO[0300] [remove/rke-log-cleaner] Successfully removed container on host [172.16.17.11] 
INFO[0300] [cleanup] Successfully started [rke-log-cleaner] container on host [172.16.17.12] 
INFO[0300] [remove/rke-log-cleaner] Successfully removed container on host [172.16.17.12] 
INFO[0300] [sync] Syncing nodes Labels and Taints       
INFO[0300] [sync] Successfully synced nodes Labels and Taints 
INFO[0300] [network] Setting up network plugin: canal   
INFO[0300] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes 
INFO[0300] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes 
INFO[0300] [addons] Executing deploy job rke-network-plugin 
INFO[0316] [addons] Setting up kube-dns                 
INFO[0316] [addons] Saving ConfigMap for addon rke-kube-dns-addon to Kubernetes 
INFO[0316] [addons] Successfully saved ConfigMap for addon rke-kube-dns-addon to Kubernetes 
INFO[0316] [addons] Executing deploy job rke-kube-dns-addon 
INFO[0321] [addons] kube-dns deployed successfully      
INFO[0321] [dns] DNS provider kube-dns deployed successfully 
INFO[0321] [addons] Setting up Metrics Server           
INFO[0321] [addons] Saving ConfigMap for addon rke-metrics-addon to Kubernetes 
INFO[0321] [addons] Successfully saved ConfigMap for addon rke-metrics-addon to Kubernetes 
INFO[0321] [addons] Executing deploy job rke-metrics-addon 
INFO[0326] [addons] Metrics Server deployed successfully 
INFO[0326] [ingress] Setting up nginx ingress controller 
INFO[0326] [addons] Saving ConfigMap for addon rke-ingress-controller to Kubernetes 
INFO[0326] [addons] Successfully saved ConfigMap for addon rke-ingress-controller to Kubernetes 
INFO[0326] [addons] Executing deploy job rke-ingress-controller 
INFO[0331] [ingress] ingress controller nginx deployed successfully 
INFO[0331] [addons] Setting up user addons              
INFO[0331] [addons] no user addons defined              
INFO[0331] Finished building Kubernetes cluster successfully 

4. If the run fails, clean up the data before reinstalling:

On every node:
su - root
rm -rf /var/lib/rancher/etcd/*
rm -rf /etc/kubernetes/*
docker rm -f `docker ps -aq`
On node10:
su - rancher
rke remove --config ./rancher-cluster.yml

Wait roughly 30 minutes; when the rke up run completes it should print: Finished building Kubernetes cluster successfully.

5. Configure the environment variable:

Switch to the root user: su - root

vi /etc/profile

export KUBECONFIG=/home/rancher/kube_config_rancher-cluster.yml

Save, then run:

source /etc/profile
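
Alternatively, the kubeconfig can be copied to the default location kubectl checks, which avoids depending on the profile variable (a minimal sketch):

mkdir -p ~/.kube
cp /home/rancher/kube_config_rancher-cluster.yml ~/.kube/config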

6. Test the connection with kubectl and check that all nodes are in the Ready state

[rancher@node10 ~]# kubectl get nodes
NAME           STATUS   ROLES                      AGE    VERSION
172.16.17.10   Ready    controlplane,etcd,worker   4m7s   v1.13.5
172.16.17.11   Ready    controlplane,etcd,worker   4m7s   v1.13.5
172.16.17.12   Ready    controlplane,etcd,worker   4m7s   v1.13.5

7. Check the health of the cluster Pods

[rancher@node10 ~]# kubectl get pods --all-namespaces
NAMESPACE       NAME                                      READY   STATUS      RESTARTS   AGE
ingress-nginx   default-http-backend-78fccfc5d9-fb2hn     1/1     Running     0          7m6s
ingress-nginx   nginx-ingress-controller-bzfl2            1/1     Running     0          6m35s
ingress-nginx   nginx-ingress-controller-wsgz2            1/1     Running     0          6m45s
ingress-nginx   nginx-ingress-controller-z7vfr            1/1     Running     0          6m24s
kube-system     canal-hb5wm                               2/2     Running     0          7m19s
kube-system     canal-j79kp                               2/2     Running     0          7m19s
kube-system     canal-nk8hm                               2/2     Running     0          7m19s
kube-system     kube-dns-58bd5b8dd7-j4t7h                 3/3     Running     0          7m16s
kube-system     kube-dns-autoscaler-77bc5fd84-lwnpr       1/1     Running     0          7m15s
kube-system     metrics-server-58bd5dd8d7-7r97s           1/1     Running     0          7m11s
kube-system     rke-ingress-controller-deploy-job-s2nhp   0/1     Completed   0          7m7s
kube-system     rke-kube-dns-addon-deploy-job-dslj9       0/1     Completed   0          7m18s
kube-system     rke-metrics-addon-deploy-job-ghgh7        0/1     Completed   0          7m13s
kube-system     rke-network-plugin-deploy-job-rxwl4       0/1     Completed   0          7m33s

Save copies of kube_config_rancher-cluster.yml and rancher-cluster.yml; you will need them to maintain and upgrade this Rancher installation.

Helm

Use Helm to install the Tiller service on the cluster to manage charts. Because RKE enables RBAC by default, we need kubectl to create a ServiceAccount and a ClusterRoleBinding so that Tiller has permission to deploy to the cluster.

1. Create the ServiceAccount in the kube-system namespace:

[rancher@node10 ~]# kubectl -n kube-system create serviceaccount tiller
serviceaccount/tiller created

2. Create a ClusterRoleBinding to grant the tiller account access to the cluster:

[rancher@node10 ~]# kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
clusterrolebinding.rbac.authorization.k8s.io/tiller created

3. Install the Helm server (Tiller)

[rancher@node10 ~]# helm version  # check the client version first: v2.14.1
Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Error: could not find tiller

[rancher@node10 ~]# helm init --service-account tiller   --tiller-image registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.1 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts # install the matching server version v2.14.1

Verify:
[rancher@node10 ~]# helm version
Client: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

Deploying a highly available Rancher cluster with Helm

I. Background

Rancher HA can be deployed in several ways:

  1. Helm HA install: deploy Rancher into an existing Kubernetes cluster; Rancher stores its data in the cluster's etcd and relies on Kubernetes scheduling for high availability.
  2. RKE HA install: use the RKE tool to build a standalone Kubernetes cluster dedicated to running Rancher HA. RKE HA installs are only supported up to Rancher v2.0.8; later versions install Rancher with Helm.

This guide installs Rancher HA with Helm on the existing Kubernetes cluster, fronted by a layer-4 (TCP) load balancer.

II. Add the chart repository

Add the Rancher chart repository with the helm repo add command.

For choosing a Rancher tag and chart version, see: https://www.cnrancher.com/docs/rancher/v2.x/cn/installation/server-tags/

[rancher@node10 ~]# helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
"rancher-stable" has been added to your repositories

III. Install Rancher server with a self-signed certificate

Rancher server requires SSL/TLS by default. The SSL certificate is passed to the Rancher server or the Ingress Controller as a Kubernetes Secret volume, so first create the certificate secrets for Rancher and the Ingress Controller to use.

1. Generate a self-signed certificate

# The script:

#!/bin/bash -e

help ()
{
    echo  ' ================================================================ '
    echo  ' --ssl-domain: main domain for the ssl certificate; defaults to localhost if not given; can be omitted when the service is accessed by IP;'
    echo  ' --ssl-trusted-ip: ssl certificates normally only trust domain-based requests; to access the server by IP, add extension IPs to the certificate, separated by commas;'
    echo  ' --ssl-trusted-domain: to allow access via additional domains, add extension domains (SSL_TRUSTED_DOMAIN), separated by commas;'
    echo  ' --ssl-size: ssl key size in bits, default 2048;'
    echo  ' --ssl-date: ssl certificate validity, default 10 years;'
    echo  ' --ca-date: ca validity, default 10 years;'
    echo  ' --ssl-cn: country code (2-letter), default CN;'
    echo  ' usage example:'
    echo  ' ./create_self-signed-cert.sh --ssl-domain=www.test.com --ssl-trusted-domain=www.test2.com  '
    echo  ' --ssl-trusted-ip=1.1.1.1,2.2.2.2,3.3.3.3 --ssl-size=2048 --ssl-date=3650'
    echo  ' ================================================================'
}

case "$1" in
    -h|--help) help; exit;;
esac

if [[ $1 == '' ]];then
    help;
    exit;
fi

CMDOPTS="$*"
for OPTS in $CMDOPTS;
do
    key=$(echo ${OPTS} | awk -F"=" '{print $1}' )
    value=$(echo ${OPTS} | awk -F"=" '{print $2}' )
    case "$key" in
        --ssl-domain) SSL_DOMAIN=$value ;;
        --ssl-trusted-ip) SSL_TRUSTED_IP=$value ;;
        --ssl-trusted-domain) SSL_TRUSTED_DOMAIN=$value ;;
        --ssl-size) SSL_SIZE=$value ;;
        --ssl-date) SSL_DATE=$value ;;
        --ca-date) CA_DATE=$value ;;
        --ssl-cn) CN=$value ;;
    esac
done

# CA settings
CA_DATE=${CA_DATE:-3650}
CA_KEY=${CA_KEY:-cakey.pem}
CA_CERT=${CA_CERT:-cacerts.pem}
CA_DOMAIN=localhost

# SSL settings
SSL_CONFIG=${SSL_CONFIG:-$PWD/openssl.cnf}
SSL_DOMAIN=${SSL_DOMAIN:-localhost}
SSL_DATE=${SSL_DATE:-3650}
SSL_SIZE=${SSL_SIZE:-2048}

## Country code (2-letter), default CN;
CN=${CN:-CN}

SSL_KEY=$SSL_DOMAIN.key
SSL_CSR=$SSL_DOMAIN.csr
SSL_CERT=$SSL_DOMAIN.crt

echo -e "33[32m ---------------------------- 33[0m"
echo -e "33[32m       | 生成 SSL Cert |       33[0m"
echo -e "33[32m ---------------------------- 33[0m"

if [[ -e ./${CA_KEY} ]]; then
    echo -e "33[32m ====> 1. 发现已存在CA私钥,备份"${CA_KEY}"为"${CA_KEY}"-bak,然后重新创建 33[0m"
    mv ${CA_KEY} "${CA_KEY}"-bak
    openssl genrsa -out ${CA_KEY} ${SSL_SIZE}
else
    echo -e "33[32m ====> 1. 生成新的CA私钥 ${CA_KEY} 33[0m"
    openssl genrsa -out ${CA_KEY} ${SSL_SIZE}
fi

if [[ -e ./${CA_CERT} ]]; then
    echo -e "33[32m ====> 2. 发现已存在CA证书,先备份"${CA_CERT}"为"${CA_CERT}"-bak,然后重新创建 33[0m"
    mv ${CA_CERT} "${CA_CERT}"-bak
    openssl req -x509 -sha256 -new -nodes -key ${CA_KEY} -days ${CA_DATE} -out ${CA_CERT} -subj "/C=${CN}/CN=${CA_DOMAIN}"
else
    echo -e "33[32m ====> 2. 生成新的CA证书 ${CA_CERT} 33[0m"
    openssl req -x509 -sha256 -new -nodes -key ${CA_KEY} -days ${CA_DATE} -out ${CA_CERT} -subj "/C=${CN}/CN=${CA_DOMAIN}"
fi

echo -e "33[32m ====> 3. 生成Openssl配置文件 ${SSL_CONFIG} 33[0m"
cat > ${SSL_CONFIG} <<EOM
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth, serverAuth
EOM

if [[ -n ${SSL_TRUSTED_IP} || -n ${SSL_TRUSTED_DOMAIN} ]]; then
    cat >> ${SSL_CONFIG} <<EOM
subjectAltName = @alt_names
[alt_names]
EOM
    IFS=","
    dns=(${SSL_TRUSTED_DOMAIN})
    dns+=(${SSL_DOMAIN})
    for i in "${!dns[@]}"; do
      echo DNS.$((i+1)) = ${dns[$i]} >> ${SSL_CONFIG}
    done

    if [[ -n ${SSL_TRUSTED_IP} ]]; then
        ip=(${SSL_TRUSTED_IP})
        for i in "${!ip[@]}"; do
          echo IP.$((i+1)) = ${ip[$i]} >> ${SSL_CONFIG}
        done
    fi
fi

echo -e "33[32m ====> 4. 生成服务SSL KEY ${SSL_KEY} 33[0m"
openssl genrsa -out ${SSL_KEY} ${SSL_SIZE}

echo -e "33[32m ====> 5. 生成服务SSL CSR ${SSL_CSR} 33[0m"
openssl req -sha256 -new -key ${SSL_KEY} -out ${SSL_CSR} -subj "/C=${CN}/CN=${SSL_DOMAIN}" -config ${SSL_CONFIG}

echo -e "33[32m ====> 6. 生成服务SSL CERT ${SSL_CERT} 33[0m"
openssl x509 -sha256 -req -in ${SSL_CSR} -CA ${CA_CERT} 
    -CAkey ${CA_KEY} -CAcreateserial -out ${SSL_CERT} 
    -days ${SSL_DATE} -extensions v3_req 
    -extfile ${SSL_CONFIG}

echo -e "33[32m ====> 7. 证书制作完成 33[0m"
echo
echo -e "33[32m ====> 8. 以YAML格式输出结果 33[0m"
echo "----------------------------------------------------------"
echo "ca_key: |"
cat $CA_KEY | sed 's/^/  /'
echo
echo "ca_cert: |"
cat $CA_CERT | sed 's/^/  /'
echo
echo "ssl_key: |"
cat $SSL_KEY | sed 's/^/  /'
echo
echo "ssl_csr: |"
cat $SSL_CSR | sed 's/^/  /'
echo
echo "ssl_cert: |"
cat $SSL_CERT | sed 's/^/  /'
echo

echo -e "33[32m ====> 9. 附加CA证书到Cert文件 33[0m"
cat ${CA_CERT} >> ${SSL_CERT}
echo "ssl_cert: |"
cat $SSL_CERT | sed 's/^/  /'
echo

echo -e "33[32m ====> 10. 重命名服务证书 33[0m"
echo "cp ${SSL_DOMAIN}.key tls.key"
cp ${SSL_DOMAIN}.key tls.key
echo "cp ${SSL_DOMAIN}.crt tls.crt"
cp ${SSL_DOMAIN}.crt tls.crt

Corresponding page in the official docs:

https://www.cnrancher.com/docs/rancher/v2.x/cn/install-prepare/self-signed-ssl/#四-生成自签名证书

# Run the script to generate the certificates

[rancher@node10 cert]# sh create_self-signed-cert.sh --ssl-domain=saas.rancher.com --ssl-trusted-ip=172.16.17.10,172.16.17.11,172.16.17.12 --ssl-size=2048 --ssl-date=3650
# the generated certificates
[rancher@node10 cert]# ll
total 44
-rw-r--r-- 1 root root 1131 Jun 28 20:12 cacerts.pem
-rw-r--r-- 1 root root   17 Jun 28 20:12 cacerts.srl
-rw-r--r-- 1 root root 1675 Jun 28 20:12 cakey.pem
-rw-r--r-- 1 root root 5309 Jun 28 20:11 create_self-signed-cert.sh
-rw-r--r-- 1 root root  366 Jun 28 20:12 openssl.cnf
-rw-r--r-- 1 root root 2303 Jun 28 20:12 saas.rancher.com.crt
-rw-r--r-- 1 root root 1078 Jun 28 20:12 saas.rancher.com.csr
-rw-r--r-- 1 root root 1679 Jun 28 20:12 saas.rancher.com.key
-rw-r--r-- 1 root root 2303 Jun 28 20:12 tls.crt
-rw-r--r-- 1 root root 1679 Jun 28 20:12 tls.key
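
Before creating the secrets it is worth confirming that the certificate actually carries the domain and the trusted IPs as Subject Alternative Names (a quick check):

openssl x509 -in tls.crt -noout -text | grep -A1 "Subject Alternative Name"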

2. Create the TLS secrets with kubectl

# Create the namespace

[rancher@node10 cert]# kubectl create namespace cattle-system
namespace/cattle-system created

# Secret for the server certificate and private key

[rancher@node10 cert]# kubectl -n cattle-system create secret tls tls-rancher-ingress --cert=./tls.crt --key=./tls.key
secret/tls-rancher-ingress created

# Secret for the CA certificate

[rancher@node10 cert]# kubectl -n cattle-system create secret generic tls-ca --from-file=cacerts.pem
secret/tls-ca created
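
A quick listing confirms both secrets now exist in the cattle-system namespace:

kubectl -n cattle-system get secret tls-rancher-ingress tls-ca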

3. Install Rancher server

# Install Rancher HA with helm

[rancher@node10 cert]# helm install rancher-stable/rancher --name rancher2 --namespace cattle-system --set hostname=saas.rancher.com --set ingress.tls.source=secret --set privateCA=true
NAME:   rancher2
LAST DEPLOYED: Fri Jun 28 20:14:53 2019
NAMESPACE: cattle-system
STATUS: DEPLOYED

RESOURCES:
==> v1/ClusterRoleBinding
NAME      AGE
rancher2  0s

==> v1/Deployment
NAME      READY  UP-TO-DATE  AVAILABLE  AGE
rancher2  0/3    3           0          0s

==> v1/Pod(related)
NAME                       READY  STATUS             RESTARTS  AGE
rancher2-57c84cfc98-db5r9  0/1    ContainerCreating  0         0s
rancher2-57c84cfc98-f257c  0/1    ContainerCreating  0         0s
rancher2-57c84cfc98-hxcwn  0/1    ContainerCreating  0         0s

==> v1/Service
NAME      TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)  AGE
rancher2  ClusterIP  10.43.12.8  <none>       80/TCP   0s

==> v1/ServiceAccount
NAME      SECRETS  AGE
rancher2  1        0s

==> v1beta1/Ingress
NAME      HOSTS             ADDRESS  PORTS  AGE
rancher2  saas.rancher.com  80, 443  0s


NOTES:
Rancher Server has been installed.

NOTE: Rancher may take several minutes to fully initialize. Please standby while Certificates are being issued and Ingress comes up.

Check out our docs at https://rancher.com/docs/rancher/v2.x/en/

Browse to https://saas.rancher.com

Happy Containering!
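
Because saas.rancher.com is not a real DNS record in this environment, the workstation used to open the Rancher UI has to resolve it to the nginx load balancer on node9 (172.16.17.72); a minimal sketch for a Linux client:

echo "172.16.17.72 saas.rancher.com" >> /etc/hosts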

# Check what was created

[root@k8s-master03 ~]# kubectl get ns
NAME                         STATUS   AGE
cattle-global-data           Active   2d5h
cattle-system                Active   2d5h

[root@k8s-master03 ~]# kubectl get ingress -n cattle-system           
NAME       HOSTS                 ADDRESS   PORTS     AGE
rancher2   rancher.sumapay.com             80, 443   57m

[root@k8s-master03 ~]# kubectl get service -n cattle-system        
NAME       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
rancher2   ClusterIP   10.111.16.80   <none>        80/TCP    54m

[root@k8s-master03 ~]# kubectl get serviceaccount -n cattle-system rancher2
NAME       SECRETS   AGE
rancher2   1         51m

[root@k8s-master03 ~]# kubectl get ClusterRoleBinding -n cattle-system -l app=rancher2 -o wide
NAME       AGE   ROLE                        USERS   GROUPS   SERVICEACCOUNTS
rancher2   58m   ClusterRole/cluster-admin                    cattle-system/rancher2

[root@k8s-master03 ~]# kubectl get pods -n cattle-system 
NAME                                    READY   STATUS    RESTARTS   AGE
cattle-cluster-agent-594b8f79bb-pgmdt   1/1     Running   5          2d2h
cattle-node-agent-lg44f                 1/1     Running   0          2d2h
cattle-node-agent-zgdms                 1/1     Running   5          2d2h
rancher2-9774897c-622sc                 1/1     Running   0          50m
rancher2-9774897c-czxxx                 1/1     Running   0          50m
rancher2-9774897c-sm2n5                 1/1     Running   0          50m

[root@k8s-master03 ~]# kubectl get deployment -n cattle-system 
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
cattle-cluster-agent   1/1     1            1           2d4h
rancher2               3/3     3            3           55m
!!!! Pitfall ahead

Problem: after logging in to the Rancher console, cattle-cluster-agent failed to start; its logs show that it cannot resolve zhengtyun.rancher.com.

INFO: Environment: CATTLE_ADDRESS=10.42.1.6 CATTLE_CA_CHECKSUM=9ef62078651045e219468d407377131eb6fedd516950458af80ba3c0f53e2678 CATTLE_CLUSTER=true CATTLE_INTERNAL_ADDRESS= CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-5b5d6c584c-dfw65 CATTLE_SERVER=https://zhengtyun.rancher.com
2019/7/1 3:20:18 PM INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5
2019/7/1 3:20:18 PM ERROR: https://zhengtyun.rancher.com/ping is not accessible (Could not resolve host: zhengtyun.rancher.com)

Workaround: switch the cattle-cluster-agent pod to host network mode; once it is running, apply the hosts binding below.
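
One way to make that temporary switch with kubectl patch (my own illustrative sketch, not from the original text; dnsPolicy is set to Default so the pod uses the node's resolver, and the change should be reverted afterwards as noted at the end of this section):

kubectl -n cattle-system patch deployment cattle-cluster-agent --patch \
  '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"Default"}}}}'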

4. Add host aliases (/etc/hosts entries) to the agent Pods

If you have no internal DNS server and instead point the Rancher server domain at its IP with /etc/hosts entries, then no matter how the K8S cluster was created (custom, imported, host driver, etc.), once it is running the cattle-cluster-agent and cattle-node-agent Pods cannot find the Rancher server through DNS, and communication fails.

Solution

Add host aliases (/etc/hosts entries) to the cattle-cluster-agent and cattle-node-agent Pods so they can communicate normally (provided the IP addresses are reachable).

#cattle-cluster-agent pod
kubectl -n cattle-system \
patch deployments cattle-cluster-agent --patch '{
    "spec": {
        "template": {
            "spec": {
                "hostAliases": [
                    {
                        "hostnames":
                        [
                            "saas.rancher.com"
                        ],
                            "ip": "172.16.17.72"
                    }
                ]
            }
        }
    }
}'

#cattle-node-agent pod
kubectl -n cattle-system \
patch daemonsets cattle-node-agent --patch '{
    "spec": {
        "template": {
            "spec": {
                "hostAliases": [
                    {
                        "hostnames":
                        [
                            "saas.rancher.com"
                        ],
                            "ip": "172.16.17.72"
                    }
                ]
            }
        }
    }
}'
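
To confirm the aliases landed in both workloads, inspect the patched field (a quick check):

kubectl -n cattle-system get deployment cattle-cluster-agent -o jsonpath='{.spec.template.spec.hostAliases}{"\n"}'
kubectl -n cattle-system get daemonset cattle-node-agent -o jsonpath='{.spec.template.spec.hostAliases}{"\n"}'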
!!!! Finally, remember to switch cattle-cluster-agent back to its original network mode.

At this point Rancher HA is deployed. Since the service is not exposed as a NodePort, we cannot access Rancher directly until an ingress-controller is deployed.

For ingress-controller deployment, see the Traefik deployment and usage guide.
