Canary Releases for Kubernetes Resources with Kruise Rollouts
Introduction
Kruise Rollouts is a bypass component that provides advanced deployment capabilities, such as canary release, traffic routing, and progressive delivery, for a range of Kubernetes workloads (e.g. Deployment and CloneSet).
Kruise Rollout integrates with ingress controllers and service meshes, using their traffic-shaping capabilities to gradually shift traffic to the new version during an update. In addition, business Pod metrics can be analyzed during a release to decide whether the release should continue or pause.
Features
Functionality:
- Supports multi-batch delivery for Deployment/CloneSet
- Supports Nginx/ALB/Istio traffic routing control during a rollout
Flexibility:
- Supports scaling workloads up or down during a rollout
- Can be applied to newly created as well as existing workload objects
- Can be removed at any time when no longer needed, without causing workload unavailability or traffic problems
- Works together with other native/third-party Kubernetes controllers/operators, such as HPA and WorkloadSpread
Non-intrusive:
- Does not intrude into native workload controllers
- Does not replace user-defined workload and traffic configurations
Extensibility:
- Easily extended to other traffic routing or workload types via plugin code
Easy integration:
- Easily integrates with classic or GitOps-style Kubernetes-based PaaS platforms
Installation
Prerequisites:
- A Kubernetes cluster, version >= 1.19.
- (Optional, if using CloneSet) OpenKruise installed with Helm, since v1.1.0; see https://openkruise.io/docs/installation
Install with helm:
$ helm repo add openkruise https://openkruise.github.io/charts/
"openkruise" has been added to your repositories
$ helm repo update
$ helm install kruise-rollout openkruise/kruise-rollout --version 0.1.0 -n rollout
NAME: kruise-rollout
LAST DEPLOYED: Thu Dec 29 06:15:44 2022
NAMESPACE: rollout
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get po -n kruise-rollout
NAME READY STATUS RESTARTS AGE
kruise-rollout-controller-manager-745b499f68-s2k2r 1/1 Running 0 2m49s
Basic Usage
This example demonstrates the concepts and features of Kruise Rollout through a canary release using Deployment and Nginx Ingress.
Requirements:
- Kruise Rollout installed with Helm, see Install Kruise Rollout above.
- Helm installation of Nginx Ingress Controller, (e.g. helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace ingress-nginx)
Environment used in this example:
$ helm version
version.BuildInfo{Version:"v3.9.2", GitCommit:"1addefbfe665c350f4daf868a9adc5600cc064fd", GitTreeState:"clean", GoVersion:"go1.17.12"}
$ kubectl get no
NAME STATUS ROLES AGE VERSION
ubuntu Ready control-plane 154d v1.24.3
Deploy Business Application (Contains Deployment, Service and Ingress)
Below is an example echoserver application, containing Ingress, Service, and Deployment resources, as shown in echoserver.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
labels:
app: echoserver
spec:
replicas: 5
selector:
matchLabels:
app: echoserver
template:
metadata:
labels:
app: echoserver
spec:
containers:
- name: echoserver
# on Apple M1, choose an image that supports arm64, e.g. e2eteam/echoserver:2.2-linux-arm64
image: cilium/echoserver:1.10.2
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
env:
- name: PORT
value: '8080'
---
apiVersion: v1
kind: Service
metadata:
name: echoserver
labels:
app: echoserver
spec:
ports:
- port: 80
targetPort: 8080
protocol: TCP
name: http
selector:
app: echoserver
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: echoserver
annotations:
kubernetes.io/ingress.class: nginx
spec:
rules:
- host: echoserver.example.com
http:
paths:
- backend:
service:
name: echoserver
port:
number: 80
path: /apis/echo
pathType: Exact
$ kubectl apply -f echoserver.yaml
deployment.apps/echoserver created
service/echoserver created
ingress.networking.k8s.io/echoserver created
After being deployed to the k8s cluster, the application can be accessed via nginx ingress, as follows:
$ kubectl get po,svc,ingress
NAME READY STATUS RESTARTS AGE
pod/echoserver-6d6dfb59f6-6gq2l 1/1 Running 0 59s
pod/echoserver-6d6dfb59f6-fl6n2 1/1 Running 0 59s
pod/echoserver-6d6dfb59f6-qtljl 1/1 Running 0 59s
pod/echoserver-6d6dfb59f6-t8wl4 1/1 Running 0 59s
pod/echoserver-6d6dfb59f6-znbl6 1/1 Running 0 59s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/echoserver ClusterIP 10.108.15.140 <none> 80/TCP 59s
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/echoserver <none> echoserver.example.com 80 59s
$ curl http://echoserver.example.com/apis/echo
Hostname: echoserver-6d6dfb59f6-znbl6
Pod Information:
-no pod information available-
Server values:
server_version=nginx: 1.13.3 - lua: 10008
Request Information:
client_address=::ffff:10.0.0.93
method=GET
real path=/apis/echo
query=
request_version=1.1
request_scheme=http
request_uri=http://echoserver.example.com:8080/apis/echo
Request Headers:
accept=*/*
host=echoserver.example.com
user-agent=curl/7.68.0
x-forwarded-for=192.168.86.129
x-forwarded-host=echoserver.example.com
x-forwarded-port=80
x-forwarded-proto=http
x-forwarded-scheme=http
x-real-ip=192.168.86.129
x-request-id=04da9c6438639791d70098c139ed2c3e
x-scheme=http
Request Body:
-no body in request-
Deploy Kruise Rollout CRD
The Kruise Rollout CRD defines the deployment rollout process. Below is an example of a canary release: the first step releases 20% of the pods and routes 5% of the traffic to the new version.
Create rollout.yaml:
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
# The rollout resource needs to be in the same namespace as the corresponding workload(deployment, cloneSet)
namespace: kruise-rollout
spec:
objectRef:
# rollout of published workloads, currently only supports Deployment, CloneSet
workloadRef:
apiVersion: apps/v1
kind: Deployment
name: echoserver
strategy:
canary:
# canary published, e.g. 20%, 40%, 60% ...
steps:
# routing 5% traffics to the new version
- weight: 5
# Manual confirmation of the release of the remaining pods
pause: {}
# optional, The first step of released replicas. If not set, the default is to use 'weight', as shown above is 5%.
replicas: 20%
trafficRoutings:
# echoserver service name
- service: echoserver
# nginx ingress
type: nginx
# echoserver ingress name, current only nginx ingress
ingress:
name: echoserver
$ kubectl apply -f rollout.yaml
rollout.rollouts.kruise.io/rollouts-demo created
Note: the field type: nginx is missing from the example in the GitHub project and needs to be added manually, otherwise creating the resource fails.
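The example above defines a single step. The comment "20%, 40%, 60%" hints that multiple batches can be chained; a multi-step strategy might look like the following sketch (the step values and the 60-second pause are illustrative assumptions, not taken verbatim from the project docs):

```yaml
strategy:
  canary:
    steps:
    # step 1: 5% traffic to 20% of the pods, then wait for manual approval
    - weight: 5
      replicas: 20%
      pause: {}
    # step 2: 40% traffic to 40% of the pods, auto-continue after 60s
    - weight: 40
      replicas: 40%
      pause:
        duration: 60
    # step 3: full traffic to all pods
    - weight: 100
      replicas: 100%
```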
Upgrade echoserver (Version 1.10.2 -> 1.10.3)
Change the image version in the deployment from 1.10.2 to 1.10.3, then kubectl apply -f deployment.yaml to the k8s cluster, as shown below.
The Kruise Rollout controller watches for this change, sets paused=true on the Deployment via its webhook, and then generates the corresponding canary resources from the user-defined Deployment, Service, and Ingress configuration.
As shown below, replicas(5) * replicas(20%) = 1 Pod of the new version is released, and 5% of the traffic is routed to the new version.
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
labels:
app: echoserver
spec:
replicas: 5
selector:
matchLabels:
app: echoserver
template:
metadata:
labels:
app: echoserver
spec:
containers:
- name: echoserver
# on Apple M1, choose an image that supports arm64, e.g. e2eteam/echoserver:2.2-linux-arm64
image: cilium/echoserver:1.10.3
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
env:
- name: PORT
value: '8080'
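The single canary pod in the output below follows from replicas(5) * replicas(20%) = 1, rounded up. A quick sketch of that arithmetic (illustrative only; the exact rounding rule Kruise applies may differ):

```python
import math

def canary_pod_count(total_replicas: int, step_percent: int) -> int:
    """Pods released in one canary batch: total * percentage, rounded up."""
    return math.ceil(total_replicas * step_percent / 100)

print(canary_pod_count(5, 20))   # 5 pods * 20% -> 1 canary pod
print(canary_pod_count(5, 40))   # 5 pods * 40% -> 2 canary pods
```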
Inspect the resources:
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
echoserver 5/5 0 5 60m
echoserver-dqz 1/1 1 1 26s
$ kubectl describe deploy echoserver-dqz
Name: echoserver-dqz
Namespace: monitor
CreationTimestamp: Thu, 29 Dec 2022 07:26:42 +0000
Labels: app=echoserver
rollouts.kruise.io/canary-deployment=echoserver
Annotations: batchrelease.rollouts.kruise.io/control-info:
{"apiVersion":"rollouts.kruise.io/v1alpha1","kind":"BatchRelease","name":"rollouts-demo","uid":"40e91b79-1e56-4230-b402-f2ae1ceaac25","con...
deployment.kubernetes.io/revision: 1
rollouts.kruise.io/in-progressing: {"RolloutName":"rollouts-demo"}
Selector: app=echoserver
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=echoserver
Containers:
echoserver:
Image: cilium/echoserver:1.10.3
...
$ kubectl get po
NAME READY STATUS RESTARTS AGE
echoserver-6d6dfb59f6-6gq2l 1/1 Running 0 67m
echoserver-6d6dfb59f6-fl6n2 1/1 Running 0 67m
echoserver-6d6dfb59f6-qtljl 1/1 Running 0 67m
echoserver-6d6dfb59f6-t8wl4 1/1 Running 0 67m
echoserver-6d6dfb59f6-znbl6 1/1 Running 0 67m
echoserver-dqz-7dbf8796d5-mmnmk 1/1 Running 0 7m12s
$ kubectl get ing
NAME CLASS HOSTS ADDRESS PORTS AGE
echoserver <none> echoserver.example.com 80 65m
echoserver-canary <none> echoserver.example.com 80 4m33s
$ kubectl describe ingress echoserver-canary
Name: echoserver-canary
Labels: <none>
Namespace: monitor
Address:
Ingress Class: <none>
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
echoserver.example.com
/apis/echo echoserver-canary:80 (10.0.0.95:8080)
Annotations: kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/canary: true
nginx.ingress.kubernetes.io/canary-weight: 5
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Sync 10m nginx-ingress-controller Scheduled for sync
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
echoserver ClusterIP 10.108.15.140 <none> 80/TCP 65m
echoserver-canary ClusterIP 10.103.69.160 <none> 80/TCP 5m18s
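The canary-weight: 5 annotation on the generated ingress means nginx routes roughly 5% of requests to the canary backend. A rough simulation of that percent-based split (illustrative only, not the actual nginx implementation):

```python
import random

def route(canary_weight: int, rng: random.Random) -> str:
    """Pick a backend the way a percent-based canary weight does."""
    return "canary" if rng.randrange(100) < canary_weight else "stable"

rng = random.Random(42)
requests = 10_000
canary_hits = sum(route(5, rng) == "canary" for _ in range(requests))
print(f"canary share: {canary_hits / requests:.1%}")  # close to 5%
```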
Approve the Rollout (Release Succeeded)
The Rollout status shows StepPaused, which means the first 20% of the Pods were released successfully and 5% of the traffic is routed to the new version.
After that, developers can use other means, such as Prometheus business metrics, to confirm that the release behaves as expected, then continue the remaining steps with kubectl-kruise (https://github.com/openkruise/kruise-tools): kubectl-kruise rollout approve rollout/rollouts-demo -n default, and wait for the deployment release to complete, as shown below:
$ kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
echoserver 5/5 5 5 77m
$ kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
echoserver <none> echoserver.example.com 80 78m
Release Failure
Release a broken version:
During a release, failures often happen, for example an image pull failure as shown below:
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
...
spec:
...
containers:
- name: echoserver
# image not found
image: cilium/echoserver:failed
imagePullPolicy: IfNotPresent
At this point, the rollout status is still StepUpgrade; by checking the deployment and pod status you can see that this is caused by the image pull failure.
a. Roll back to V1
The most common approach is to roll back. Nothing needs to be done to the Rollout CR here; just roll the deployment configuration back to the previous version, e.g. set the image version back to 1.10.2 and kubectl apply -f it to the k8s cluster:
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
...
spec:
...
containers:
- name: echoserver
# on M1, roll back to e2eteam/echoserver:2.2-linux-arm64
image: cilium/echoserver:1.10.2
imagePullPolicy: IfNotPresent
b. Continue Releasing V3
For scenarios where rollback is not an option, you can continue by releasing version V3. As below, change the deployment image to 1.10.3, then kubectl apply -f it to the k8s cluster. After the release completes, just perform the Approve Rollout step:
apiVersion: apps/v1
kind: Deployment
metadata:
name: echoserver
...
spec:
...
containers:
- name: echoserver
# on M1, choose an image such as e2eteam/echoserver:2.2-linux-arm64
image: cilium/echoserver:1.10.3
imagePullPolicy: IfNotPresent