Troubleshooting a docker hang

Reposted from a teammate on our team.

1. Background

We recently rolled out a new kubelet build that fixes a problem where slow Pod deletion in kubelet caused cluster deletion on the platform to time out. While canarying it on the redis isolation cluster, we found that after upgrading kubelet and restarting the service, a small number of nodes went NotReady, and rolling kubelet back to the previous version did not bring them back. The node status showed 'container runtime is down', which in our experience usually means the container runtime is in trouble. Our Elastic Cloud platform uses docker as the container runtime, so we checked docker's state, with the following results:

  • docker ps, which lists all containers: worked fine
  • docker inspect, which shows the detailed state of a single container: blocked

This is the classic docker hang. We happen to be in the middle of a docker upgrade: existing hosts run docker 1.13.1 and are gradually being moved to 18.06.3, while new hosts all run 18.06.3. On 1.13.1 the hang is more thorough: docker ps itself hangs, so once any single container gets into trouble the whole daemon stops responding. docker 18.06.3 made a small improvement by dropping the per-container lock from docker ps, but docker inspect still takes the container lock, so a problematic container no longer makes the whole daemon unresponsive; only that container is affected, and no operation on it can make progress.

The reason we use docker ps and docker inspect as health indicators is that these are exactly the two docker APIs kubelet relies on to fetch container state.

So we now have two questions:

  1. What is the root cause of the docker hang?
  2. While docker is hung, why does restarting kubelet flip the node to NotReady?

2. Restarting kubelet changes the node status

Compared with the docker hang itself, the question of why a kubelet restart flips the node from Ready to NotReady is less involved, so we tackled it first.

kubelet maintains several Conditions on a node that describe its current state, such as whether memory is under pressure, whether the PID count is under pressure, and whether the node is ready. The ReadyCondition indicates readiness; the STATUS column shown by kubectl when listing nodes is derived from the ReadyCondition. The common values and their meanings are:

  • Ready: the node is healthy and can handle Pod events normally
  • NotReady: kubelet is still running but can no longer process Pod events; in the vast majority of cases this means the container runtime is in trouble
  • Unknown: kubelet on the node has stopped running

kubelet assembles the ReadyCondition as follows:

// defaultNodeStatusFuncs is a factory that generates the default set of
// setNodeStatus funcs
func (kl *Kubelet) defaultNodeStatusFuncs() []func(*v1.Node) error {
	......
	setters = append(setters,
		nodestatus.OutOfDiskCondition(kl.clock.Now, kl.recordNodeStatusEvent),
		nodestatus.MemoryPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderMemoryPressure, kl.recordNodeStatusEvent),
		nodestatus.DiskPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderDiskPressure, kl.recordNodeStatusEvent),
		nodestatus.PIDPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderPIDPressure, kl.recordNodeStatusEvent),
		nodestatus.ReadyCondition(kl.clock.Now, kl.runtimeState.runtimeErrors, kl.runtimeState.networkErrors, validateHostFunc, kl.containerManager.Status, kl.recordNodeStatusEvent),
		nodestatus.VolumesInUse(kl.volumeManager.ReconcilerStatesHasBeenSynced, kl.volumeManager.GetVolumesInUse),
		// TODO(mtaufen): I decided not to move this setter for now, since all it does is send an event
		// and record state back to the Kubelet runtime object. In the future, I'd like to isolate
		// these side-effects by decoupling the decisions to send events and partial status recording
		// from the Node setters.
		kl.recordNodeSchedulableEvent,
	)
	return setters
}

Digging into the implementation of nodestatus.ReadyCondition shows that readiness depends on many checks: the runtime check, the network check, basic resource checks, and so on. Here we only need to care about the runtime check:

func (s *runtimeState) runtimeErrors() []string {
	s.RLock()
	defer s.RUnlock()
	var ret []string
	if !s.lastBaseRuntimeSync.Add(s.baseRuntimeSyncThreshold).After(time.Now()) { // 1
		ret = append(ret, "container runtime is down")
	}
	if s.internalError != nil {
		ret = append(ret, s.internalError.Error())
	}
	for _, hc := range s.healthChecks { // 2
		if ok, err := hc.fn(); !ok {
			ret = append(ret, fmt.Sprintf("%s is not healthy: %v", hc.name, err))
		}
	}
	return ret
}

The runtime check is considered failed when either of the following holds:

  1. The time since the last successful runtime sync exceeds a threshold (30s by default)
  2. A runtime health check fails

So which of the two caused the NotReady at the time? The kubelet log tells us: every 5 seconds it printed a line like this:

......
I0715 10:43:28.049240 16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
I0715 10:43:33.049359 16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
I0715 10:43:38.049492 16315 kubelet.go:1835] skipping pod synchronization - [container runtime is down]
......

So condition 1 is the culprit behind the NotReady status.

Next we looked at why kubelet stopped updating lastBaseRuntimeSync as expected. On startup, kubelet spawns a goroutine that keeps setting lastBaseRuntimeSync in a loop:

func (kl *Kubelet) Run(updates <-chan kubetypes.PodUpdate) {
	......
	go wait.Until(kl.updateRuntimeUp, 5*time.Second, wait.NeverStop)
	......
}

func (kl *Kubelet) updateRuntimeUp() {
	kl.updateRuntimeMux.Lock()
	defer kl.updateRuntimeMux.Unlock()
	......
	kl.oneTimeInitializer.Do(kl.initializeRuntimeDependentModules)
	kl.runtimeState.setRuntimeSync(kl.clock.Now())
}

Under normal conditions kubelet sets lastBaseRuntimeSync to the current time every 5 seconds, but on the broken node the timestamp was never refreshed. In other words, updateRuntimeUp was stuck at some step before setting lastBaseRuntimeSync. Walking through the calls inside updateRuntimeUp one by one (details omitted), the final call chain looks like this:

initializeRuntimeDependentModules -> kl.cadvisor.Start -> cc.Manager.Start -> self.createContainer -> m.createContainerLocked -> container.NewContainerHandler -> factory.CanHandleAndAccept -> self.client.ContainerInspect

Because one container was in an abnormal state, the docker inspect issued along this path hung as well.

So the root cause of the node flipping from Ready to NotReady on kubelet restart is that one container was in a bad state and docker inspect against it hung. Had the docker inspect hang only appeared after kubelet was already running, the node's Ready status would not have been affected at all, because oneTimeInitializer is a sync.Once and therefore runs only once, at kubelet startup. In that case kubelet would merely be unable to process events for that particular Pod (deletion, updates, and so on) while still handling every other Pod normally.

One might ask why kubelet does not put a timeout on the docker inspect call issued during startup. Indeed, with a timeout in place, a kubelet restart would not change the node status. We will dig into that separately; for now, let's move on to the docker hang itself.
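To illustrate the idea, here is a minimal sketch, assuming the standard Docker Go client (github.com/docker/docker/client) and using a shortened form of the abnormal container's ID as a placeholder, of how an inspect call can be bounded by a context timeout so that one stuck container cannot block its caller forever; this is not the actual kubelet/cadvisor code path:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}

	// Bound the call: if dockerd never answers because the container lock is
	// held by a stuck exec, give up after 10 seconds instead of hanging.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	info, err := cli.ContainerInspect(ctx, "bbd5e4b5f9c1") // placeholder container ID
	if err != nil {
		fmt.Println("inspect failed or timed out:", err)
		return
	}
	fmt.Println("container state:", info.State.Status)
}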

3. The docker hang

Docker hangs are nothing new to us; we have hit quite a few, with all kinds of symptoms. Previous investigations on docker 1.13.1 turned up clues but never a root cause, and in most cases we ended up restarting docker. This time, on docker 18.06.3, after nearly a week of diagnosis by our four-person task force, we finally pinned down the cause. Note that docker can hang for more than one reason, so this prescription is by no means a cure-all.

At this point, all we knew was that docker was unhealthy and would not respond to docker inspect for one particular container; we knew nothing else.

3.1 Tracing the call chain

First we wanted a rough picture of what docker was doing overall. Anyone familiar with Go development will immediately think of pprof, and with its help we drew a blueprint of docker's state at the time:

goroutine profile: total 722373 717594 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bd56f7fe 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711 # 0x7fe8bc2132ee sync.runtime_SemacquireMutex 0x3e /usr/local/go/src/runtime/sema.go:71 # 0x7fe8bc238c19 sync.(*Mutex).Lock 0x109 /usr/local/go/src/sync/mutex.go:134 # 0x7fe8bd56f7fd github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent 0x8d /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:40 # 0x7fe8bd56f6bc github.com/docker/docker/daemon.(*Daemon).ContainerInspect 0x11c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29 # 0x7fe8bcea8718 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName 0x118 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15 # 0x7fe8bcea938a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm 0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39 # 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1 0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 # 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1 0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 # 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1 0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 # 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1 0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 # 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP 0x45 /usr/local/go/src/net/http/server.go:1947 # 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP 0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 # 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP 0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 # 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP 0xbd /usr/local/go/src/net/http/server.go:2694 # 0x7fe8bc4b88a2 net/http.(*conn).serve 0x652 /usr/local/go/src/net/http/server.go:1830 4175 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2132ef 0x7fe8bc238c1a 0x7fe8bcc2eccf 0x7fe8bd597af4 0x7fe8bcea2456 0x7fe8bcea956b 0x7fe8bcb73dff 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711 # 0x7fe8bc2132ee sync.runtime_SemacquireMutex 0x3e /usr/local/go/src/runtime/sema.go:71 # 0x7fe8bc238c19 sync.(*Mutex).Lock 0x109 /usr/local/go/src/sync/mutex.go:134 # 0x7fe8bcc2ecce github.com/docker/docker/container.(*State).IsRunning 0x2e /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.go:240 # 0x7fe8bd597af3 
github.com/docker/docker/daemon.(*Daemon).ContainerStats 0xb3 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stats.go:30 # 0x7fe8bcea2455 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersStats 0x1e5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.go:115 # 0x7fe8bcea956a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersStats)-fm 0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:42 # 0x7fe8bcb73dfe github.com/docker/docker/api/server/router.cancellableHandler.func1 0xce /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/local.go:92 # 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1 0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 # 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1 0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 # 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1 0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 # 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1 0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 # 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP 0x45 /usr/local/go/src/net/http/server.go:1947 # 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP 0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 # 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP 0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 # 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP 0xbd /usr/local/go/src/net/http/server.go:2694 # 0x7fe8bc4b88a2 net/http.(*conn).serve 0x652 /usr/local/go/src/net/http/server.go:1830 1 @ 0x7fe8bc202980 0x7fe8bc202a40 0x7fe8bc2135d8 0x7fe8bc2131fb 0x7fe8bc239a3b 0x7fe8bcbb679d 0x7fe8bcc26774 0x7fe8bd570b20 0x7fe8bd56f81c 0x7fe8bd56f6bd 0x7fe8bcea8719 0x7fe8bcea938b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711 # 0x7fe8bc2131fa sync.runtime_Semacquire 0x3a /usr/local/go/src/runtime/sema.go:56 # 0x7fe8bc239a3a sync.(*RWMutex).RLock 0x4a /usr/local/go/src/sync/rwmutex.go:50 # 0x7fe8bcbb679c github.com/docker/docker/daemon/exec.(*Store).List 0x4c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec/exec.go:140 # 0x7fe8bcc26773 github.com/docker/docker/container.(*Container).GetExecIDs 0x33 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/container.go:423 # 0x7fe8bd570b1f github.com/docker/docker/daemon.(*Daemon).getInspectData 0x5cf /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:178 # 0x7fe8bd56f81b github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent 0xab /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:42 # 
0x7fe8bd56f6bc github.com/docker/docker/daemon.(*Daemon).ContainerInspect 0x11c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.go:29 # 0x7fe8bcea8718 github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName 0x118 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.go:15 # 0x7fe8bcea938a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm 0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:39 # 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1 0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 # 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1 0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 # 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1 0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 # 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1 0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 # 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP 0x45 /usr/local/go/src/net/http/server.go:1947 # 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP 0x227 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 # 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP 0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 # 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP 0xbd /usr/local/go/src/net/http/server.go:2694 # 0x7fe8bc4b88a2 net/http.(*conn).serve 0x652 /usr/local/go/src/net/http/server.go:1830 1 @ 0x7fe8bc202980 0x7fe8bc212946 0x7fe8bc8b6881 0x7fe8bc8b699d 0x7fe8bc8e259b 0x7fe8bc8e1695 0x7fe8bc8c47d5 0x7fe8bd2e0c06 0x7fe8bd2eda96 0x7fe8bc8c42fb 0x7fe8bc8c4613 0x7fe8bd2a6474 0x7fe8bd2e6976 0x7fe8bd3661c5 0x7fe8bd56842f 0x7fe8bcea7bdb 0x7fe8bcea9f6b 0x7fe8bcb726ca 0x7fe8bcb72b01 0x7fe8bc71c26b 0x7fe8bcb85f4a 0x7fe8bc4b9896 0x7fe8bc72a438 0x7fe8bcb849e2 0x7fe8bc4bc67e 0x7fe8bc4b88a3 0x7fe8bc230711 # 0x7fe8bc8b6880 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader 0x100 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 # 0x7fe8bc8b699c github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress 0x2c /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 # 0x7fe8bc8e259a github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg 0x63a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 # 0x7fe8bc8e1694 github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg 0x44 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 # 0x7fe8bc8c47d4 
github.com/docker/docker/vendor/google.golang.org/grpc.invoke 0x184 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 # 0x7fe8bd2e0c05 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary 0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 # 0x7fe8bd2eda95 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm 0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 # 0x7fe8bc8c42fa github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke 0x10a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 # 0x7fe8bc8c4612 github.com/docker/docker/vendor/google.golang.org/grpc.Invoke 0xc2 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 # 0x7fe8bd2a6473 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Start 0xd3 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:421 # 0x7fe8bd2e6975 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Start 0xf5 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:109 # 0x7fe8bd3661c4 github.com/docker/docker/libcontainerd.(*client).Exec 0x4b4 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:381 # 0x7fe8bd56842e github.com/docker/docker/daemon.(*Daemon).ContainerExecStart 0xb4e /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251 # 0x7fe8bcea7bda github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart 0x34a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125 # 0x7fe8bcea9f6a github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm 0x6a /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59 # 0x7fe8bcb726c9 github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1 0xd9 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 # 0x7fe8bcb72b00 github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1 0x400 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 # 0x7fe8bc71c26a github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1 0x7aa /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 # 0x7fe8bcb85f49 github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1 0x199 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 # 0x7fe8bc4b9895 net/http.HandlerFunc.ServeHTTP 0x45 /usr/local/go/src/net/http/server.go:1947 # 0x7fe8bc72a437 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP 0x227 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 # 0x7fe8bcb849e1 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP 0x71 /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 # 0x7fe8bc4bc67d net/http.serverHandler.ServeHTTP 0xbd /usr/local/go/src/net/http/server.go:2694 # 0x7fe8bc4b88a2 net/http.(*conn).serve 0x652 /usr/local/go/src/net/http/server.go:1830

Note that this is a trimmed-down dump of docker's goroutine stacks. From this blueprint we can draw the following conclusions:

  1. 717594 goroutines are blocked in docker inspect
  2. 4175 goroutines are blocked in docker stats
  3. 1 goroutine is blocked fetching the docker exec task IDs
  4. 1 goroutine is blocked inside the execution of a docker exec

These findings largely explain why the container appears hung: a docker exec in this container never returned (4), which in turn blocked the goroutine fetching the docker exec task IDs (3); since (3) had already taken the container lock, docker inspect (1) and docker stats (2) piled up behind it. So the disease is not docker inspect, it is docker exec.
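As an aside, a profile like the one above can be pulled straight off the daemon's API socket when dockerd is running with debug enabled, because the engine API then also serves the standard net/http/pprof handlers. A rough sketch (the socket path and the availability of the debug endpoints depend on your deployment):

package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
)

func main() {
	// Talk HTTP over the docker unix socket; the host part of the URL is ignored.
	tr := &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
		},
	}
	cli := &http.Client{Transport: tr}

	// debug=1 aggregates identical stacks (the format shown above);
	// debug=2 prints every goroutine in full.
	resp, err := cli.Get("http://docker/debug/pprof/goroutine?debug=1")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body)
}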

To dig further we need some background. The full call path when kubelet starts a container or executes a command inside one is:

 ------------
 | kubelet  |
 -----|------
      |
      v
 ------------        ---------------
 | dockerd  |------->| containerd  |
 ------------        -------|-------
                            |
                            v
                   -------------------        ----------
                   | containerd-shim |------->|  runc  |
                   -------------------        ----------

dockerd and containerd can be thought of as two layers of nginx-like proxies, containerd-shim is the container's guardian, and runc is the tool that actually starts containers and runs commands in them. What runc does is simple: create namespaces according to the user-provided config, or enter existing namespaces, and then execute the user's command. In other words, creating a container boils down to creating new namespaces and running the user-specified command inside them.

With that background, we moved one hop down the chain to containerd. Fortunately, pprof let us draw containerd's blueprint at that moment as well:

goroutine profile: total 430
1 @ 0x7f6e55f82740 0x7f6e55f92616 0x7f6e56a8412c 0x7f6e56a83d6d 0x7f6e56a911bf 0x7f6e56ac6e3b 0x7f6e565093de 0x7f6e5650dd3b 0x7f6e5650392b 0x7f6e56b51216 0x7f6e564e5909 0x7f6e563ec76a 0x7f6e563f000a 0x7f6e563f6791 0x7f6e55fb0151
# 0x7f6e56a8412b github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch 0x24b /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102
# 0x7f6e56a83d6c github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call 0x15c /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73
# 0x7f6e56a911be github.com/containerd/containerd/linux/shim/v1.(*shimClient).Start 0xbe /go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1745
# 0x7f6e56ac6e3a github.com/containerd/containerd/linux.(*Process).Start 0x8a /go/src/github.com/containerd/containerd/linux/process.go:125
# 0x7f6e565093dd github.com/containerd/containerd/services/tasks.(*local).Start 0x14d /go/src/github.com/containerd/containerd/services/tasks/local.go:187
# 0x7f6e5650dd3a github.com/containerd/containerd/services/tasks.(*service).Start 0x6a /go/src/github.com/containerd/containerd/services/tasks/service.go:72
# 0x7f6e5650392a github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler.func1 0x8a /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:624
# 0x7f6e56b51215 github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor 0xa5 /go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29
# 0x7f6e564e5908 github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler 0x168 /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:626
# 0x7f6e563ec769 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC 0x849 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920
# 0x7f6e563f0009 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream 0x1319 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142
# 0x7f6e563f6790 github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1 0xa0 /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637

Again, only the key goroutine is kept. The stack shows that containerd is blocked waiting to receive the result of the exec; the relevant code backs this up:

func (c *Client) dispatch(ctx context.Context, req *Request, resp *Response) error {
	errs := make(chan error, 1)
	call := &callRequest{
		req:  req,
		resp: resp,
		errs: errs,
	}

	select {
	case c.calls <- call:
	case <-c.done:
		return c.err
	}

	select {
	// This select is where the goroutine stack above is parked:
	// /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102
	case err := <-errs:
		return filterCloseErr(err)
	case <-c.done:
		return c.err
	}
}

After handing the request over to containerd-shim, containerd just sits there waiting for the shim's reply.

Normally, if we could walk the call chain and inspect each component's goroutine stacks in turn, we would find the culprit quickly. Unfortunately, docker was not running in debug mode on these hosts, so we could not collect pprof data from containerd-shim, and runc does not expose pprof either. The road of pinpointing the problem purely through goroutine stacks was therefore closed.

Still, we had gathered some key facts and narrowed the problem down to somewhere between containerd-shim and runc. Time to continue from a different angle.

3.2 Examining the processes

When we can no longer peek inside the components, we can change perspective and look at the container instead, that is, at the current state of the abnormal container's processes.

Since docker ps works while docker inspect hangs, the first step is to locate the abnormal container: the container whose docker inspect call hangs (its ID is printed but its PID never follows) is the one we want:

docker ps | grep -v NAME | awk '{print $1}' | while read cid; do echo $cid; docker inspect -f {{.State.Pid}} $cid; done

With the abnormal container's ID in hand, we can list every process related to that container:

root 11646  6655 0 Jun17 ? 00:01:04 docker-containerd-shim -namespace moby -workdir /home/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5 -address /var/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /var/run/docker/runtime-runc
root 11680 11646 0 Jun17 ? 00:00:00 /dockerinit
root 15581 11646 0 Jun17 ? 00:00:00 docker-runc --root /var/run/docker/runtime-runc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json --log-format json exec --process /tmp/runc-process616674997 --detach --pid-file /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/0594c5897a41d401e4d1d7ddd44dacdd316c7e7d53bfdae7f16b0f6b26fcbcda.pid bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5
root 15638 15581 0 Jun17 ? 00:00:00 docker-runc init

The key processes are shown above; a quick legend:

  • 6655: the containerd process
  • 11646: the containerd-shim of the abnormal container
  • 11680: the abnormal container's init (entrypoint) process; viewed from inside the container, PID namespace isolation makes it PID 1
  • 15581: the process executing a user command inside the abnormal container (runc exec)
  • 15638: the process that enters the container's namespaces for that command (runc init)

One more piece of background: when a container is started, a runc init process is created first to create and enter the new namespaces; when a command is executed inside a running container, a runc init process is likewise created first to enter the container's existing namespaces. Only after entering the container's isolated namespaces is the user-specified command executed.

The process list alone does not tell us which process is at fault, so we also need to know what each of them is currently doing. strace shows their activity one by one:

// 11646 (containerd-shim)
Process 11646 attached with 10 threads
[pid 37342] epoll_pwait(5, <unfinished ...>
[pid 11656] futex(0x818cc0, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11655] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 11654] futex(0x818bd8, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11653] futex(0x7fc730, FUTEX_WAKE, 1 <unfinished ...>
[pid 11651] futex(0xc4200b4148, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11650] futex(0xc420082948, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11649] futex(0xc420082548, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11647] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 11646] futex(0x7fd008, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 11653] <... futex resumed> ) = 0
[pid 11647] <... restart_syscall resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 11653] epoll_wait(4, <unfinished ...>
[pid 11647] pselect6(0, NULL, NULL, NULL, {0, 20000}, 0) = 0 (Timeout)
[pid 11647] futex(0x7fc730, FUTEX_WAIT, 0, {60, 0}

// 15581 (runc exec)
Process 15581 attached with 7 threads
[pid 15619] read(6, <unfinished ...>
[pid 15592] futex(0xc4200be148, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15591] futex(0x7fd6d25f6238, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15590] futex(0xc420084d48, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15586] futex(0x7fd6d25f6320, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15584] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 15581] futex(0x7fd6d25d9b28, FUTEX_WAIT, 0, NULL

// 15638 (runc init)
Process 15638 attached with 7 threads
[pid 15648] futex(0x7f512cea5320, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15647] futex(0x7f512cea5238, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15645] futex(0xc4200bc148, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15643] futex(0xc420082d48, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15642] futex(0xc420082948, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 15639] restart_syscall(<... resuming interrupted call ...> <unfinished ...>
[pid 15638] write(2, "/usr/local/go/src/runtime/proc.g"..., 33

From the activity of these related processes we can conclude:

  • runc exec is waiting to read from its FD 6
  • runc init is blocked writing to its FD 2

What files do these FDs actually refer to? lsof can tell us:

// 11638 (runc init) COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME runc:[2:I 15638 root cwd DIR 0,41 192 1066743071 / runc:[2:I 15638 root rtd DIR 0,41 192 1066743071 / runc:[2:I 15638 root txt REG 0,4 7644224 1070360467 /memfd:runc_cloned:/proc/self/exe (deleted) runc:[2:I 15638 root mem REG 8,3 2107816 1053962 /usr/lib64/libc-2.17.so runc:[2:I 15638 root mem REG 8,3 19512 1054285 /usr/lib64/libdl-2.17.so runc:[2:I 15638 root mem REG 8,3 266688 1050626 /usr/lib64/libseccomp.so.2.3.1 runc:[2:I 15638 root mem REG 8,3 142296 1055698 /usr/lib64/libpthread-2.17.so runc:[2:I 15638 root mem REG 8,3 27168 3024893 /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so runc:[2:I 15638 root mem REG 8,3 164432 1054515 /usr/lib64/ld-2.17.so runc:[2:I 15638 root 0r FIFO 0,8 0t0 1070361745 pipe runc:[2:I 15638 root 1w FIFO 0,8 0t0 1070361746 pipe runc:[2:I 15638 root 2w FIFO 0,8 0t0 1070361747 pipe runc:[2:I 15638 root 3u unix 0xffff881ff8273000 0t0 1070361341 socket runc:[2:I 15638 root 5u a_inode 0,9 0 7180 [eventpoll] // 11581 (runc exec) COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME docker-ru 15581 root cwd DIR 0,18 120 1066743076 /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5 docker-ru 15581 root rtd DIR 8,3 4096 2 / docker-ru 15581 root txt REG 8,3 7644224 919775 /usr/bin/docker-runc docker-ru 15581 root mem REG 8,3 2107816 1053962 /usr/lib64/libc-2.17.so docker-ru 15581 root mem REG 8,3 19512 1054285 /usr/lib64/libdl-2.17.so docker-ru 15581 root mem REG 8,3 266688 1050626 /usr/lib64/libseccomp.so.2.3.1 docker-ru 15581 root mem REG 8,3 142296 1055698 /usr/lib64/libpthread-2.17.so docker-ru 15581 root mem REG 8,3 27168 3024893 /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so docker-ru 15581 root mem REG 8,3 164432 1054515 /usr/lib64/ld-2.17.so docker-ru 15581 root 0r FIFO 0,8 0t0 1070361745 pipe docker-ru 15581 root 1w FIFO 0,8 0t0 1070361746 pipe docker-ru 15581 root 2w FIFO 0,8 0t0 1070361747 pipe docker-ru 15581 root 3w REG 0,18 5456 1066709902 /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json docker-ru 15581 root 4u a_inode 0,9 0 7180 [eventpoll] docker-ru 15581 root 6u unix 0xffff881ff8275400 0t0 1070361342 socket // 11646 (container-shim) COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME docker-co 11646 root cwd DIR 0,18 120 1066743076 /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5 docker-co 11646 root rtd DIR 8,3 4096 2 / docker-co 11646 root txt REG 8,3 4173632 919772 /usr/bin/docker-containerd-shim docker-co 11646 root 0r CHR 1,3 0t0 2052 /dev/null docker-co 11646 root 1w CHR 1,3 0t0 2052 /dev/null docker-co 11646 root 2w CHR 1,3 0t0 2052 /dev/null docker-co 11646 root 3r FIFO 0,8 0t0 1070361745 pipe docker-co 11646 root 4u a_inode 0,9 0 7180 [eventpoll] docker-co 11646 root 5u a_inode 0,9 0 7180 [eventpoll] docker-co 11646 root 6u unix 0xffff881e8cac2800 0t0 1066743079 @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock docker-co 11646 root 7u unix 0xffff881e8cac3400 0t0 1066743968 @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock docker-co 11646 root 8r FIFO 0,8 0t0 1066743970 pipe docker-co 11646 root 9w FIFO 0,8 0t0 1070361745 pipe docker-co 11646 root 10r FIFO 0,8 0t0 1066743971 pipe docker-co 11646 root 
11u FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout docker-co 11646 root 12r FIFO 0,8 0t0 1066743972 pipe docker-co 11646 root 13w FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout docker-co 11646 root 14u FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout docker-co 11646 root 15r FIFO 0,18 0t0 1066700778 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout docker-co 11646 root 16u FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr docker-co 11646 root 17w FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr docker-co 11646 root 18u FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr docker-co 11646 root 19r FIFO 0,18 0t0 1066700779 /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr docker-co 11646 root 20r FIFO 0,8 0t0 1070361746 pipe docker-co 11646 root 26r FIFO 0,8 0t0 1070361747 pipe

Putting the strace and lsof results together, attentive readers can already draw the conclusion:

runc init is blocked writing to its FD 2, which is a pipe. A Linux pipe has a fixed capacity (64 KiB by default); once the data written exceeds that capacity and no reader drains the other end, the writer blocks. To summarize: containerd-shim launches runc exec to run a user command inside the container, runc exec launches runc init to enter the container's namespaces, and runc init gets stuck because it writes more data to FD 2 than the pipe can hold. With the bottom-most runc init blocked, every process up the call chain is blocked as well:

runc init → runc exec → containerd-shim exec → containerd exec → dockerd exec
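To make the pipe-capacity point concrete, here is a tiny self-contained Go demo, unrelated to runc itself: a write larger than the pipe buffer blocks for as long as nobody reads the other end.

package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer r.Close()

	done := make(chan struct{})
	go func() {
		// The default pipe capacity on Linux is 64 KiB; writing more than that
		// without anyone draining the read end blocks the writer.
		buf := make([]byte, 128*1024)
		n, err := w.Write(buf)
		fmt.Printf("write returned: n=%d err=%v\n", n, err)
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("write completed (unexpected without a reader)")
	case <-time.After(2 * time.Second):
		fmt.Println("write still blocked after 2s: the pipe is full and nobody is reading")
	}
}

Draining r would let the blocked write complete, which is exactly the lever used in section 4.3 below.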

At this point we understand why docker hangs, but two questions are still open:

  • Why would runc init write more data to FD 2 (os.Stderr in Go terms) than a Linux pipe can hold?
  • Why does this only happen to particular containers?

If runc init normally wrote large amounts of data to os.Stdout or os.Stderr, creating any container would fail. So something unknown must have happened to this particular container that made runc init unexpectedly write a large amount to os.Stderr, and that data very likely reveals what the unexpected failure was.

So we need to capture what runc init is currently trying to write. Its FD 2 is an anonymous pipe, so we cannot read it like a regular file, but there is a trick for reading the contents of an anonymous pipe:

# cat /proc/15638/fd/2 runtime/cgo: pthread_create failed: Resource temporarily unavailable SIGABRT: abort PC=0x7f512b7365f7 m=0 sigcode=18446744073709551610 goroutine 0 [idle]: runtime: unknown pc 0x7f512b7365f7 stack: frame={sp:0x7ffe1121a658, fp:0x0} stack=[0x7ffe0ae1bb28,0x7ffe1121ab50) 00007ffe1121a558: 00007ffe1121a6d8 00007ffe1121a6b0 00007ffe1121a568: 0000000000000001 00007f512c527660 00007ffe1121a578: 00007f512c54d560 00007f512c54d208 00007ffe1121a588: 00007f512c333e6f 0000000000000000 00007ffe1121a598: 00007f512c527660 0000000000000005 00007ffe1121a5a8: 0000000000000000 0000000000000001 00007ffe1121a5b8: 00007f512c54d208 00007f512c528000 00007ffe1121a5c8: 00007ffe1121a600 00007f512b704b0c 00007ffe1121a5d8: 00007f512b7110c0 0000000000000000 00007ffe1121a5e8: 00007f512c54d560 00007ffe1121a620 00007ffe1121a5f8: 00007ffe1121a610 000000000f11ed7d 00007ffe1121a608: 00007f512c550153 00000000ffffffff 00007ffe1121a618: 00007f512c550a9b 00007f512b707d00 00007ffe1121a628: 00007f512babc868 00007f512c9e9e5e 00007ffe1121a638: 00007f512d3bb080 00000000000000f1 00007ffe1121a648: 0000000000000011 0000000000000000 00007ffe1121a658: <00007f512b737ce8 0000000000000020 00007ffe1121a668: 0000000000000000 0000000000000000 00007ffe1121a678: 0000000000000000 0000000000000000 00007ffe1121a688: 0000000000000000 0000000000000000 00007ffe1121a698: 0000000000000000 0000000000000000 00007ffe1121a6a8: 0000000000000000 0000000000000000 00007ffe1121a6b8: 0000000000000000 0000000000000000 00007ffe1121a6c8: 0000000000000000 0000000000000000 00007ffe1121a6d8: 0000000000000000 00007f512babc868 00007ffe1121a6e8: 00007f512c9e9e5e 00007f512d3bb080 00007ffe1121a6f8: 00007f512c33f260 00007f512babc1c0 00007ffe1121a708: 00007f512babc1c0 0000000000000001 00007ffe1121a718: 00007f512babc243 00000000000000f1 00007ffe1121a728: 00007f512b7787ec 0000000000000001 00007ffe1121a738: 00007f512babc1c0 000000000000000a 00007ffe1121a748: 00007f512b7e8a4d 000000000000000a runtime: unknown pc 0x7f512b7365f7 stack: frame={sp:0x7ffe1121a658, fp:0x0} stack=[0x7ffe0ae1bb28,0x7ffe1121ab50) 00007ffe1121a558: 00007ffe1121a6d8 00007ffe1121a6b0 00007ffe1121a568: 0000000000000001 00007f512c527660 00007ffe1121a578: 00007f512c54d560 00007f512c54d208 00007ffe1121a588: 00007f512c333e6f 0000000000000000 00007ffe1121a598: 00007f512c527660 0000000000000005 00007ffe1121a5a8: 0000000000000000 0000000000000001 00007ffe1121a5b8: 00007f512c54d208 00007f512c528000 00007ffe1121a5c8: 00007ffe1121a600 00007f512b704b0c 00007ffe1121a5d8: 00007f512b7110c0 0000000000000000 00007ffe1121a5e8: 00007f512c54d560 00007ffe1121a620 00007ffe1121a5f8: 00007ffe1121a610 000000000f11ed7d 00007ffe1121a608: 00007f512c550153 00000000ffffffff 00007ffe1121a618: 00007f512c550a9b 00007f512b707d00 00007ffe1121a628: 00007f512babc868 00007f512c9e9e5e 00007ffe1121a638: 00007f512d3bb080 00000000000000f1 00007ffe1121a648: 0000000000000011 0000000000000000 00007ffe1121a658: <00007f512b737ce8 0000000000000020 00007ffe1121a668: 0000000000000000 0000000000000000 00007ffe1121a678: 0000000000000000 0000000000000000 00007ffe1121a688: 0000000000000000 0000000000000000 00007ffe1121a698: 0000000000000000 0000000000000000 00007ffe1121a6a8: 0000000000000000 0000000000000000 00007ffe1121a6b8: 0000000000000000 0000000000000000 00007ffe1121a6c8: 0000000000000000 0000000000000000 00007ffe1121a6d8: 0000000000000000 00007f512babc868 00007ffe1121a6e8: 00007f512c9e9e5e 00007f512d3bb080 00007ffe1121a6f8: 00007f512c33f260 00007f512babc1c0 00007ffe1121a708: 00007f512babc1c0 0000000000000001 00007ffe1121a718: 
00007f512babc243 00000000000000f1 00007ffe1121a728: 00007f512b7787ec 0000000000000001 00007ffe1121a738: 00007f512babc1c0 000000000000000a 00007ffe1121a748: 00007f512b7e8a4d 000000000000000a goroutine 1 [running, locked to thread]: runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:363 fp=0xc4200a3ed0 sp=0xc4200a3ec8 pc=0x7f512c7281d0 runtime.startTheWorld() /usr/local/go/src/runtime/proc.go:978 0x2f fp=0xc4200a3ee8 sp=0xc4200a3ed0 pc=0x7f512c70221f runtime.GOMAXPROCS(0x1, 0xc42013d9a0) /usr/local/go/src/runtime/debug.go:30 0xa0 fp=0xc4200a3f10 sp=0xc4200a3ee8 pc=0x7f512c6d9810 main.init.0() /go/src/github.com/opencontainers/runc/init.go:14 0x61 fp=0xc4200a3f30 sp=0xc4200a3f10 pc=0x7f512c992801 main.init() <autogenerated>:1 0x624 fp=0xc4200a3f88 sp=0xc4200a3f30 pc=0x7f512c9a1014 runtime.main() /usr/local/go/src/runtime/proc.go:186 0x1d2 fp=0xc4200a3fe0 sp=0xc4200a3f88 pc=0x7f512c6ff962 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2361 0x1 fp=0xc4200a3fe8 sp=0xc4200a3fe0 pc=0x7f512c72ad71 goroutine 6 [syscall]: os/signal.signal_recv(0x0) /usr/local/go/src/runtime/sigqueue.go:139 0xa8 os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:22 0x24 created by os/signal.init.0 /usr/local/go/src/os/signal/signal_unix.go:28 0x43 rax 0x0 rbx 0x7f512babc868 rcx 0xffffffffffffffff rdx 0x6 rdi 0x271 rsi 0x271 rbp 0x7f512c9e9e5e rsp 0x7ffe1121a658 r8 0xa r9 0x7f512c524740 r10 0x8 r11 0x206 r12 0x7f512d3bb080 r13 0xf1 r14 0x11 r15 0x0 rip 0x7f512b7365f7 rflags 0x206 cs 0x33 fs 0x0 gs 0x0 exec failed: container_linux.go:348: starting container process caused "read init-p: connection reset by peer"

So runc init failed to create a thread for lack of resources?! This is clearly not runc's own output but an unexpected dump from the Go runtime. Which resource was exhausted, exactly? /var/log/messages fills in the rest:

Jun 17 03:18:17 host-xx kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=997 Jun 17 03:18:17 host-xx kernel: CPU: 14 PID: 12788 Comm: runc:[2:INIT] Tainted: G W OE ------------ T 3.10.0-514.16.1.el7.stable.v1.4.x86_64 #1 Jun 17 03:18:17 host-xx kernel: Hardware name: Inspur SA5212M4/YZMB-00370-107, BIOS 4.1.10 11/14/2016 Jun 17 03:18:17 host-xx kernel: ffff88103841dee0 00000000c4394691 ffff880263e4bcb8 ffffffff8168863d Jun 17 03:18:17 host-xx kernel: ffff880263e4bd50 ffffffff81683585 ffff88203cc5e300 ffff880ee02b2380 Jun 17 03:18:17 host-xx kernel: 0000000000000001 0000000000000000 0000000000000000 0000000000000046 Jun 17 03:18:17 host-xx kernel: Call Trace: Jun 17 03:18:17 host-xx kernel: [<ffffffff8168863d>] dump_stack 0x19/0x1b Jun 17 03:18:17 host-xx kernel: [<ffffffff81683585>] dump_header 0x85/0x27f Jun 17 03:18:17 host-xx kernel: [<ffffffff81185b06>] ? find_lock_task_mm 0x56/0xc0 Jun 17 03:18:17 host-xx kernel: [<ffffffff81185fbe>] oom_kill_process 0x24e/0x3c0 Jun 17 03:18:17 host-xx kernel: [<ffffffff81093c2e>] ? has_capability_noaudit 0x1e/0x30 Jun 17 03:18:17 host-xx kernel: [<ffffffff811f4d91>] mem_cgroup_oom_synchronize 0x551/0x580 Jun 17 03:18:17 host-xx kernel: [<ffffffff811f41b0>] ? mem_cgroup_charge_common 0xc0/0xc0 Jun 17 03:18:17 host-xx kernel: [<ffffffff81186844>] pagefault_out_of_memory 0x14/0x90 Jun 17 03:18:17 host-xx kernel: [<ffffffff816813fa>] mm_fault_error 0x68/0x12b Jun 17 03:18:17 host-xx kernel: [<ffffffff81694405>] __do_page_fault 0x395/0x450 Jun 17 03:18:17 host-xx kernel: [<ffffffff816944f5>] do_page_fault 0x35/0x90 Jun 17 03:18:17 host-xx kernel: [<ffffffff81690708>] page_fault 0x28/0x30 Jun 17 03:18:17 host-xx kernel: memory: usage 3145728kB, limit 3145728kB, failcnt 14406932 Jun 17 03:18:17 host-xx kernel: memory swap: usage 3145728kB, limit 9007199254740988kB, failcnt 0 Jun 17 03:18:17 host-xx kernel: kmem: usage 3143468kB, limit 9007199254740988kB, failcnt 0 Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/b761e05249245695278b3f409d2d6e5c6a5bff6995ff0cf44d03af4aa9764a30: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/1d1750ecc627cc5d60d80c071b2eb4d515ee8880c5b5136883164f08319869b0: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5: cache:0KB rss:2220KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:2140KB inactive_file:0KB active_file:0KB unevictable:0KB Jun 17 03:18:17 host-xx kernel: Memory cgroup stats for /kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/super-agent: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB Jun 17 03:18:17 host-xx 
kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Jun 17 03:18:17 host-xx kernel: [30598] 0 30598 255 1 4 0 -998 pause Jun 17 03:18:17 host-xx kernel: [11680] 0 11680 164833 1118 20 0 997 dockerinit Jun 17 03:18:17 host-xx kernel: [12788] 0 12788 150184 1146 23 0 997 runc:[2:INIT] Jun 17 03:18:17 host-xx kernel: oom-kill:,cpuset=bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,mems_allowed=0-1,oom_memcg=/kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda,task_memcg=/kubepods/burstable/pod6c4333b3-a663-11ea-b39f-6c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,task=runc:[2:INIT],pid=12800,uid=0 Jun 17 03:18:17 host-xx kernel: Memory cgroup out of memory: Kill process 12800 (runc:[2:INIT]) score 997 or sacrifice child Jun 17 03:18:17 host-xx kernel: Killed process 12788 (runc:[2:INIT]) total-vm:600736kB, anon-rss:3296kB, file-rss:276kB, shmem-rss:1012kB

/var/log/messages contains a large number of OOM records for this container from about a month earlier, and the timestamps roughly match.

To sum up: at a critical moment runc init could not create a thread because the container's memory cgroup was out of memory; the Go runtime then produced an unexpected crash dump, and runc init ended up blocked writing that dump to the pipe.

The full picture is now basically clear, but one puzzle remains: if runc init is writing into the pipe, is no other process supposed to be reading it?

Remember the lsof output above? Careful readers will have spotted who holds the read end of that pipe: containerd-shim, on the pipe with inode 1070361747. So why isn't containerd-shim reading it? The code explains:

func (e *execProcess) start(ctx context.Context) (err error) {
	......
	if err := e.parent.runtime.Exec(ctx, e.parent.id, e.spec, opts); err != nil { // runs runc exec, which in turn runs runc init
		close(e.waitBlock)
		return e.parent.runtimeError(err, "OCI runtime exec failed")
	}
	......
	else if !e.stdio.IsNull() {
		fifoCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
		defer cancel()
		if err := copyPipes(fifoCtx, e.io, e.stdio.Stdin, e.stdio.Stdout, e.stdio.Stderr, &e.wg, &copyWaitGroup); err != nil { // reads the pipes
			return errors.Wrap(err, "failed to start io pipe copy")
		}
	}
	......
}

func (r *Runc) Exec(context context.Context, id string, spec specs.Process, opts *ExecOpts) error {
	......
	cmd := r.command(context, append(args, id)...)
	if opts != nil && opts.IO != nil {
		opts.Set(cmd)
	}
	......
	ec, err := Monitor.Start(cmd)
	......
	status, err := Monitor.Wait(cmd, ec)
	......
}

So by design containerd-shim starts reading the pipes only after runc exec has completed; but this time runc init, having unexpectedly produced a large amount of output, is itself blocked writing to the pipe... a perfect deadlock.

With that, the core of this docker hang is fully understood. Next, let's talk about how to fix it.

4. Solutions

Knowing how the hang comes about, we can propose several targeted fixes.

4.1 The most obvious fix

Since docker exec can trigger the hang, we could simply eliminate all docker exec traffic on the system. The biggest source is kubelet probes: today we add a ReadinessProbe to every Pod by default, and it execs into the container to run a command. Switching the probes to tcp or http form would remove that traffic.

The change itself is small, but the cost of adapting business containers is high, and migrating existing clusters would be a big undertaking.

4.2 The most fundamental fix

Since the deadlock arises because containerd-shim only reads the pipes after runc exec has finished, moving the pipe-reading ahead of the runc exec invocation would, in theory, also prevent the deadlock (see the sketch below).
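As a generic illustration of the principle (this is not the actual containerd-shim patch), the usual os/exec pattern is to start draining a child's output pipes before waiting for it to exit; otherwise a child that writes more than a pipe's capacity and a parent that waits first deadlock exactly as described above:

package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

func main() {
	// A child that writes ~200 KB to stderr, well past the 64 KiB pipe buffer.
	cmd := exec.Command("sh", "-c", "head -c 200000 /dev/zero 1>&2")
	stderr, err := cmd.StderrPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}

	// Drain concurrently with the child's execution; calling cmd.Wait() before
	// this copy finishes is the ordering that produces the deadlock.
	done := make(chan struct{})
	go func() {
		n, _ := io.Copy(io.Discard, stderr)
		fmt.Fprintf(os.Stderr, "drained %d bytes of child stderr\n", n)
		close(done)
	}()

	<-done
	if err := cmd.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "child exited with error:", err)
	}
}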

The problem, again, is rollout cost: upgrading containerd-shim means restarting every existing container, so this option is essentially a non-starter.

4.3 The simplest fix

Since runc init is blocked on a pipe write, proactively reading the pipe's contents lets runc init complete the write and exit normally.

When automating this workaround, reliably telling that a given docker hang is indeed caused by a blocked pipe write is a small challenge in itself. Still, compared with the two options above, it is worth a try, since the blast radius is negligible; a rough sketch of the idea follows.
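This is a minimal sketch of the mitigation, assuming the blocked runc init PID has already been identified by other means (the PID is an input here, not something this snippet discovers). Opening /proc/<pid>/fd/2 yields another handle on the same anonymous pipe, and reading from it both frees the buffer, so the blocked write can complete, and captures the evidence of what went wrong:

package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: drainpipe <pid-of-blocked-runc-init>")
		os.Exit(1)
	}

	// FD 2 is the stderr pipe the process is blocked writing to.
	path := fmt.Sprintf("/proc/%s/fd/2", os.Args[1])
	f, err := os.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "open:", err)
		os.Exit(1)
	}
	defer f.Close()

	// Dump whatever the process was trying to write; io.Copy keeps reading
	// until all writers have exited and closed their ends of the pipe.
	n, err := io.Copy(os.Stdout, f)
	fmt.Fprintf(os.Stderr, "\ndrained %d bytes (err=%v)\n", n, err)
}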

5. Follow-up

There is far more than one way for docker to hang, and the findings here will not cover every case, so please investigate based on the evidence on your own hosts. The Linux documentation also notes that pipe capacity is configurable, and on newer kernels (since 4.5) /proc/sys/fs/pipe-user-pages-soft offers another knob, although, as the docs point out, it had bugs before 4.9. Which remedy to adopt depends on your own situation.
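For completeness, pipe capacity can be queried and resized per pipe with fcntl (F_GETPIPE_SZ / F_SETPIPE_SZ, available since Linux 2.6.35); a small sketch using golang.org/x/sys/unix, where the achievable size is capped by /proc/sys/fs/pipe-max-size for unprivileged processes:

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer r.Close()
	defer w.Close()

	// Query the current capacity (64 KiB by default on Linux), then ask the
	// kernel to enlarge it; the request is rounded up to a power of two.
	oldSize, err := unix.FcntlInt(w.Fd(), unix.F_GETPIPE_SZ, 0)
	if err != nil {
		panic(err)
	}
	newSize, err := unix.FcntlInt(w.Fd(), unix.F_SETPIPE_SZ, 1<<20) // request 1 MiB
	if err != nil {
		panic(err)
	}
	fmt.Printf("pipe capacity: %d -> %d bytes\n", oldSize, newSize)
}

Whether enlarging the exec pipes would be an appropriate mitigation for the shim is a separate question; a bigger buffer only raises the threshold at which the same deadlock occurs.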
