Kubernetes for Developers: Readiness Probes and Services

2023-08-15 08:12:16

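In "Kubernetes for Developers: Startup, Liveness, and Readiness Probes", we covered the special relationship between readiness probes and Services. A failed readiness probe does not mean the whole program is "not alive". The program may only be temporarily unable to serve requests, for example when the CPU is briefly saturated and the readiness probe times out. In that case Kubernetes does not try to restart the container, as it would after a failed liveness probe. It simply removes the Pod from the Services associated with it.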

Nginx with a Readiness Probe

apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-nginx-deployment
spec:
  selector:
    matchLabels:
      app: readiness-nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: readiness-nginx
    spec:
      containers:
      - name: readiness-nginx-container
        image: nginx
        ports:
        - containerPort: 80
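        # Demo only: this overrides the image's default command, so the nginx server itself is not started.
        # The marker file checked by the readiness probe is created about 3 seconds after the container starts.
        command: ["/bin/sh", "-c", "sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done"]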
        volumeMounts:
        - name: probe-volume
          mountPath: /tempdir
        readinessProbe:
          exec:
            command:
            - cat
            - /tempdir/readiness-nginx
          initialDelaySeconds: 2
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir: 
          medium: Memory
          sizeLimit: 1Gi
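
The probe settings above (`failureThreshold: 6`, `successThreshold: 1`, `periodSeconds: 1`) control how many consecutive results are needed before readiness flips. The following is a minimal Python sketch of that counting rule, not the kubelet's actual code. The class name `ReadinessTracker` and its method `observe` are hypothetical names introduced only for illustration.

```python
class ReadinessTracker:
    """Simplified, hypothetical model of readiness threshold counting."""

    def __init__(self, failure_threshold=6, success_threshold=1):
        self.failure_threshold = failure_threshold  # failureThreshold in the manifest
        self.success_threshold = success_threshold  # successThreshold in the manifest
        self.ready = False   # with a readiness probe, a container starts out not ready
        self._last = None    # result of the previous probe run
        self._streak = 0     # number of consecutive runs with the same result

    def observe(self, ok):
        """Record one probe result and return the resulting readiness."""
        if ok == self._last:
            self._streak += 1
        else:
            self._last, self._streak = ok, 1
        if ok and self._streak >= self.success_threshold:
            self.ready = True
        elif not ok and self._streak >= self.failure_threshold:
            self.ready = False
        return self.ready


tracker = ReadinessTracker(failure_threshold=6, success_threshold=1)
tracker.observe(True)  # marker file present: ready after one success
after_five = [tracker.observe(False) for _ in range(5)][-1]
after_six = tracker.observe(False)
recovered = tracker.observe(True)
print(after_five, after_six, recovered)  # True False True
```

Under this model, with one probe per second, a Pod whose marker file disappears stays ready for roughly six seconds, and a single successful probe is enough to make it ready again.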

The Service for Nginx

kind: Service
apiVersion: v1
metadata:
  name: readiness-nginx-service
spec:
  selector:
    app: readiness-nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

Experiment

After creating the resources above, the following Pods are running:

kubectl get pod -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP             NODE      NOMINATED NODE   READINESS GATES
readiness-nginx-deployment-57b7fd5644-7x7wc   1/1     Running   0          25s   10.1.43.223    ubuntuc   <none>           <none>
readiness-nginx-deployment-57b7fd5644-lhszp   1/1     Running   0          25s   10.1.209.155   ubuntub   <none>           <none>

The Service's Endpoints also list these Pod IPs.

kubectl describe endpoints readiness-nginx-service
Name:         readiness-nginx-service
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-08-14T14:35:33Z
Subsets:
  Addresses:          10.1.209.155,10.1.43.223
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>

Now pick one Pod (readiness-nginx-deployment-57b7fd5644-7x7wc, 10.1.43.223) and look at its events:

kubectl describe pod readiness-nginx-deployment-57b7fd5644-7x7wc
Name:             readiness-nginx-deployment-57b7fd5644-7x7wc
Namespace:        default
Priority:         0
Service Account:  default
Node:             ubuntuc/172.22.247.176
Start Time:       Mon, 14 Aug 2023 14:35:27 +0000
Labels:           app=readiness-nginx
                  pod-template-hash=57b7fd5644
Annotations:      cni.projectcalico.org/containerID: c475d3e82ff0d5adbd35252ab990608ad75955f8d0862bb8b0c54ee60a0878eb
                  cni.projectcalico.org/podIP: 10.1.43.223/32
                  cni.projectcalico.org/podIPs: 10.1.43.223/32
Status:           Running
IP:               10.1.43.223
IPs:
  IP:           10.1.43.223
Controlled By:  ReplicaSet/readiness-nginx-deployment-57b7fd5644
Containers:
  readiness-nginx-container:
    Container ID:  containerd://5d82d8467bc6e0c8151e40ee3258d54bffec8659bcdad4a441848ea8f77a3223
    Image:         nginx
    Image ID:      docker.io/library/nginx@sha256:67f9a4f10d147a6e04629340e6493c9703300ca23a2f7f3aa56fe615d75d31ca
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done
    State:          Running
      Started:      Mon, 14 Aug 2023 14:35:30 +0000
    Ready:          True
    Restart Count:  0
    Readiness:      exec [cat /tempdir/readiness-nginx] delay=2s timeout=1s period=1s #success=1 #failure=6
    Environment:    <none>
    Mounts:
      /tempdir from probe-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4tcl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  probe-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-c4tcl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m53s                  default-scheduler  Successfully assigned default/readiness-nginx-deployment-57b7fd5644-7x7wc to ubuntuc
  Normal   Pulling    3m53s                  kubelet            Pulling image "nginx"
  Normal   Pulled     3m50s                  kubelet            Successfully pulled image "nginx" in 2.489885583s (2.489893984s including waiting)
  Normal   Created    3m50s                  kubelet            Created container readiness-nginx-container
  Normal   Started    3m50s                  kubelet            Started container readiness-nginx-container
  Warning  Unhealthy  3m48s (x2 over 3m48s)  kubelet            Readiness probe failed: cat: /tempdir/readiness-nginx: No such file or directory

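The events record two failed readiness probes. By the third check the marker file existed, so the probe succeeded and the Pod's Ready and ContainersReady conditions are both True.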
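
The two failures in the events can be reproduced with an idealized timeline. The sketch below assumes that probes run exactly at `initialDelaySeconds` and then every `periodSeconds`, and that a probe succeeds only if it runs strictly after the moment the file is created (the `touch` finishes slightly later than 3 seconds). Real kubelet timing is less regular, so this is only an approximation. The function name `probe_timeline` is hypothetical.

```python
def probe_timeline(file_created_at, initial_delay, period, runs):
    """Return (time, succeeded) for each idealized probe run.

    A run succeeds only if it happens strictly after the marker file
    was created. This is an assumption of the model, not kubelet behavior.
    """
    results = []
    for i in range(runs):
        t = initial_delay + i * period
        results.append((t, t > file_created_at))
    return results


# Values taken from the Deployment: sleep 3, initialDelaySeconds 2, periodSeconds 1
timeline = probe_timeline(file_created_at=3.0, initial_delay=2.0, period=1.0, runs=3)
for t, ok in timeline:
    print(f"t={t:.0f}s -> {'success' if ok else 'failed'}")
```

In this model the first two runs fail and the third succeeds, which is consistent with the `x2` count on the Unhealthy event above.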

Ready -> Not Ready

Now delete the readiness marker file:

kubectl exec pods/readiness-nginx-deployment-57b7fd5644-7x7wc --container readiness-nginx-container -- rm /tempdir/readiness-nginx

Checking the Pod's state again shows:

Name:             readiness-nginx-deployment-57b7fd5644-7x7wc
Namespace:        default
Priority:         0
Service Account:  default
Node:             ubuntuc/172.22.247.176
Start Time:       Mon, 14 Aug 2023 14:35:27 +0000
Labels:           app=readiness-nginx
                  pod-template-hash=57b7fd5644
Annotations:      cni.projectcalico.org/containerID: c475d3e82ff0d5adbd35252ab990608ad75955f8d0862bb8b0c54ee60a0878eb
                  cni.projectcalico.org/podIP: 10.1.43.223/32
                  cni.projectcalico.org/podIPs: 10.1.43.223/32
Status:           Running
IP:               10.1.43.223
IPs:
  IP:           10.1.43.223
Controlled By:  ReplicaSet/readiness-nginx-deployment-57b7fd5644
Containers:
  readiness-nginx-container:
    Container ID:  containerd://5d82d8467bc6e0c8151e40ee3258d54bffec8659bcdad4a441848ea8f77a3223
    Image:         nginx
    Image ID:      docker.io/library/nginx@sha256:67f9a4f10d147a6e04629340e6493c9703300ca23a2f7f3aa56fe615d75d31ca
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done
    State:          Running
      Started:      Mon, 14 Aug 2023 14:35:30 +0000
    Ready:          False
    Restart Count:  0
    Readiness:      exec [cat /tempdir/readiness-nginx] delay=2s timeout=1s period=1s #success=1 #failure=6
    Environment:    <none>
    Mounts:
      /tempdir from probe-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4tcl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  probe-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-c4tcl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From     Message
  ----     ------     ----                ----     -------
  Warning  Unhealthy  7s (x22 over 6m6s)  kubelet  Readiness probe failed: cat: /tempdir/readiness-nginx: No such file or directory

Ready and ContainersReady have both changed to False. Next, look at the Service again:

kubectl describe endpoints readiness-nginx-service
Name:         readiness-nginx-service
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-08-14T14:41:18Z
Subsets:
  Addresses:          10.1.209.155
  NotReadyAddresses:  10.1.43.223
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>

The Pod whose readiness marker file was deleted has been removed from the Service: its IP moved from Addresses to NotReadyAddresses.
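
The split seen in the Endpoints output can be pictured as a simple partition of Pod IPs by their Ready condition. The following is a minimal sketch of that idea, not the real Endpoints controller, which handles more cases. The function name `build_subset` and the dictionary layout are assumptions made for illustration.

```python
def build_subset(pods):
    """Split Pod IPs into ready and not-ready address lists.

    `pods` maps a Pod IP to its Ready condition (True or False).
    """
    addresses = sorted(ip for ip, ready in pods.items() if ready)
    not_ready = sorted(ip for ip, ready in pods.items() if not ready)
    return {"Addresses": addresses, "NotReadyAddresses": not_ready}


# State after the marker file was deleted in Pod 10.1.43.223
pods = {"10.1.43.223": False, "10.1.209.155": True}
subset = build_subset(pods)
print(subset)
# {'Addresses': ['10.1.209.155'], 'NotReadyAddresses': ['10.1.43.223']}
```

Only the IPs under Addresses receive traffic through the Service. The not-ready Pod keeps running and stays listed, so it can be moved back as soon as its probe succeeds.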

Not Ready -> Ready

Now restore the marker file:

kubectl exec pods/readiness-nginx-deployment-57b7fd5644-7x7wc --container readiness-nginx-container -- touch /tempdir/readiness-nginx

Checking the Pod again shows that Ready and ContainersReady are back to True.

Name:             readiness-nginx-deployment-57b7fd5644-7x7wc
Namespace:        default
Priority:         0
Service Account:  default
Node:             ubuntuc/172.22.247.176
Start Time:       Mon, 14 Aug 2023 14:35:27 +0000
Labels:           app=readiness-nginx
                  pod-template-hash=57b7fd5644
Annotations:      cni.projectcalico.org/containerID: c475d3e82ff0d5adbd35252ab990608ad75955f8d0862bb8b0c54ee60a0878eb
                  cni.projectcalico.org/podIP: 10.1.43.223/32
                  cni.projectcalico.org/podIPs: 10.1.43.223/32
Status:           Running
IP:               10.1.43.223
IPs:
  IP:           10.1.43.223
Controlled By:  ReplicaSet/readiness-nginx-deployment-57b7fd5644
Containers:
  readiness-nginx-container:
    Container ID:  containerd://5d82d8467bc6e0c8151e40ee3258d54bffec8659bcdad4a441848ea8f77a3223
    Image:         nginx
    Image ID:      docker.io/library/nginx@sha256:67f9a4f10d147a6e04629340e6493c9703300ca23a2f7f3aa56fe615d75d31ca
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      -c
      sleep 3; touch /tempdir/readiness-nginx; while true; do sleep 5; done
    State:          Running
      Started:      Mon, 14 Aug 2023 14:35:30 +0000
    Ready:          True
    Restart Count:  0
    Readiness:      exec [cat /tempdir/readiness-nginx] delay=2s timeout=1s period=1s #success=1 #failure=6
    Environment:    <none>
    Mounts:
      /tempdir from probe-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4tcl (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  probe-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  1Gi
  kube-api-access-c4tcl:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  3m5s (x262 over 13m)  kubelet  Readiness probe failed: cat: /tempdir/readiness-nginx: No such file or directory

The Service has also added the Pod back.

Name:         readiness-nginx-service
Namespace:    default
Labels:       <none>
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2023-08-14T14:48:23Z
Subsets:
  Addresses:          10.1.209.155,10.1.43.223
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    <unset>  80    TCP

Events:  <none>
