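Problem description
When a service was released by updating its image, the Pod stayed in Pending state and the update kept failing.
Troubleshooting approach
- First, check what went wrong while the Pod was starting: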
kubectl describe po {pod-name} -n {namespace}
The events show that mounting the PV timed out:
`Unable to mount volumes for pod "xxx-test-xx-0_ns-prj57r7d-1091927-test(52bcf47a-2354-11eb-a92c-525400b26555)": timeout expired waiting for volumes to attach or mount for pod "ns-prj57r7d-1091927-test"/"xxx-test-xx-0". list of unmounted volumes=pretty. list of unattached volumes=pretty cgroup shm xx filebeatdata applogdata filebeatconfig default-token-lgrlv`
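The two lists at the end of this event help narrow the problem down. The unmounted list names the volumes the Pod is still waiting for, and in this case that is only `pretty`. The Python sketch below pulls both lists out of the message. The function name `parse_mount_timeout` is made up for illustration, and the regex assumes the message layout shown above, with or without square brackets around each list.

```python
import re
from typing import List, Tuple


def parse_mount_timeout(event: str) -> Tuple[List[str], List[str]]:
    """Split a kubelet "timeout expired waiting for volumes to attach or mount"
    event into (unmounted volumes, unattached volumes).

    Hypothetical helper; accepts the lists with or without [ ] around them.
    """

    def volumes(label: str) -> List[str]:
        # Take the text after "list of <label> volumes=" up to the next
        # ". list of" or the end of the message, dropping optional brackets.
        m = re.search(
            rf"list of {label} volumes=\[?(.*?)\]?(?:\.\s*list of|\.?\s*$)", event
        )
        return m.group(1).split() if m else []

    return volumes("unmounted"), volumes("unattached")
```

To map a stuck volume back to storage, find the entry with that name in the Pod's `spec.volumes`. Its `persistentVolumeClaim.claimName` gives the PVC, and the PVC is bound to the PV examined next.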
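- During Pod startup, the PVs are mounted before the containers start. For a block-storage (CBS) PV this takes two steps, attach and mount. In the attach step, CBS is called to attach the disk to the node. First, look at the PV to find which CBS disk backs it: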
kubectl get pv pvc-845bdb98-7a9f-4156-8aa2-8c7c08b65e90 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    Provisioner_Id: ""
    kubernetes.io/createdby: qcloud-cbs-dynamic-provisioner
    kubernetes.io/disk-delete-with-cluster-deletion: "true"
    pv.kubernetes.io/provisioned-by: cloud.tencent.com/qcloud-cbs
  creationTimestamp: "2020-07-30T07:14:41Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: bj
    failure-domain.beta.kubernetes.io/zone: "800001"
  name: pvc-845bdb98-7a9f-4156-8aa2-8c7c08b65e90
  resourceVersion: "11784521"
  selfLink: /api/v1/persistentvolumes/pvc-845bdb98-7a9f-4156-8aa2-8c7c08b65e90
  uid: 1d9380a9-8218-4d28-a79c-e77921a929bb
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: cbs-test1-0
    namespace: default
    resourceVersion: "11784500"
    uid: 845bdb98-7a9f-4156-8aa2-8c7c08b65e90
  persistentVolumeReclaimPolicy: Delete
  qcloudCbs:
    cbsDiskId: disk-xx
  storageClassName: cbs
  volumeMode: Filesystem
status:
  phase: Bound
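The key field in this PV is `spec.qcloudCbs.cbsDiskId`, which identifies the CBS disk that must be attached to the node. `status.phase: Bound` only says that the PV is bound to its PVC. It does not mean the disk is attached. The sketch below pulls out the relevant fields from `kubectl get pv <name> -o json` output. `summarize_pv` is a hypothetical helper, not part of any tool.

```python
import json
from typing import Any, Dict


def summarize_pv(pv_json: str) -> Dict[str, Any]:
    """Summarize the fields of a CBS-backed PV that matter for an attach problem.

    pv_json is the text printed by `kubectl get pv <name> -o json`.
    """
    pv = json.loads(pv_json)
    if pv.get("kind") != "PersistentVolume":
        raise ValueError(f"expected a PersistentVolume, got {pv.get('kind')!r}")
    spec = pv.get("spec", {})
    claim_ref = spec.get("claimRef")
    return {
        # The CBS disk that has to be attached to the node before kubelet can mount it.
        "disk_id": spec.get("qcloudCbs", {}).get("cbsDiskId"),
        # The PVC (namespace/name) this PV is bound to, if any.
        "claim": f"{claim_ref.get('namespace')}/{claim_ref.get('name')}" if claim_ref else None,
        # "Bound" only means the PV and PVC are paired; it says nothing about attachment.
        "phase": pv.get("status", {}).get("phase"),
    }
```

For a quick one-off check, kubectl's JSONPath output prints the disk ID directly: `kubectl get pv <name> -o jsonpath='{.spec.qcloudCbs.cbsDiskId}'`.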
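- In the normal flow, the second step (mount) runs next. Check the mount information:
mount | grep disk-oeuba9ig
If there is no matching entry, check the disk devices: cd /dev/disk/by-id
and see whether the disk was attached and shows up on the node.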
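The two checks in this step can be scripted on the node. The sketch below makes two assumptions. First, it treats `mount | grep <disk-id>` as equivalent to scanning `/proc/mounts` for the disk ID. Second, it assumes the CBS volume plugin looks for the disk at `/dev/disk/by-id/virtio-<disk-id>`. That path is inferred from the link created in the solution, not taken from the plugin's source. Both function names are hypothetical.

```python
import os
from typing import List, Optional


def mounts_for_disk(disk_id: str, mounts_path: str = "/proc/mounts") -> List[str]:
    """Return the mount points of entries that mention disk_id.

    Roughly what `mount | grep <disk_id>` shows, read from /proc/mounts.
    """
    hits = []
    with open(mounts_path) as f:
        for line in f:
            fields = line.split()
            # /proc/mounts fields: source, mount point, fs type, options, dump, pass
            if len(fields) >= 2 and disk_id in line:
                hits.append(fields[1])
    return hits


def by_id_target(disk_id: str, by_id_dir: str = "/dev/disk/by-id") -> Optional[str]:
    """Return what by_id_dir/virtio-<disk_id> resolves to, or None if the link is missing.

    The "virtio-" prefix is an assumption taken from the workaround's link name.
    """
    link = os.path.join(by_id_dir, "virtio-" + disk_id)
    if not os.path.lexists(link):  # lexists is also true for a dangling link
        return None
    return os.path.realpath(link)
```

If `mounts_for_disk` returns an empty list and `by_id_target` returns `None`, the mount step has no device path to work with. The workaround below addresses exactly that situation.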
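Solution:
As a temporary workaround, create the missing symlink by hand. Here /dev/vde is the device the CBS disk appeared as on this node: ln -s /dev/vde /dev/disk/by-id/virtio-disk-xx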
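`/dev/vde` was the correct device on this particular node, but the device name can differ between nodes. On common Linux systems, udev builds the `/dev/disk/by-id/virtio-*` name from the virtio disk's serial number, which is exposed in `/sys/block/<dev>/serial`. The sketch below assumes that a CBS disk's serial equals its disk ID. The link name suggests this, but it is not confirmed here. Under that assumption, the sketch looks up the device instead of hard-coding it, and it never overwrites an existing link. The function names are hypothetical.

```python
import os
from typing import Optional


def find_device_by_serial(disk_id: str, sys_block: str = "/sys/block") -> Optional[str]:
    """Return /dev/<name> for the virtio disk whose serial equals disk_id, or None.

    Assumes (unverified) that the CBS disk ID is used as the virtio serial.
    """
    for name in sorted(os.listdir(sys_block)):
        serial_file = os.path.join(sys_block, name, "serial")
        if name.startswith("vd") and os.path.isfile(serial_file):
            with open(serial_file) as f:
                if f.read().strip() == disk_id:
                    return "/dev/" + name
    return None


def ensure_by_id_link(disk_id: str, device: str, by_id_dir: str = "/dev/disk/by-id") -> bool:
    """Create by_id_dir/virtio-<disk_id> -> device if it is missing.

    Returns True if a link was created, False if one already existed.
    """
    link = os.path.join(by_id_dir, "virtio-" + disk_id)
    if os.path.lexists(link):
        return False  # leave an existing link (even a dangling one) untouched
    os.symlink(device, link)  # same effect as: ln -s <device> <link>
    return True
```

Run as root on the affected node, for example with `dev = find_device_by_serial("disk-xx")`, followed by `ensure_by_id_link("disk-xx", dev)` when `dev` is not `None`. On typical systems `/dev` is a devtmpfs that is rebuilt at boot, so a link created by hand does not survive a reboot. This is one reason the fix is only temporary.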