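Background: I recently built a vanilla (self-managed) Kubernetes cluster on Azure. It was unstable and occasionally collapsed in a cascade of failures. The logs kept reporting the error below. Many sources online attribute it to slow disk I/O on the volume that holds etcd's data, which makes etcd queries slow. My data disk was indeed an HDD, so I decided to move etcd's data to an SSD and see whether the problem persists.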
etcdserver: read-only range request took too long with etcd 3.2.24 #70082
Issue: https://github.com/kubernetes/kubernetes/issues/70082
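To check whether the move actually helps, it is useful to count these warnings before and after migrating. Below is a minimal sketch that summarizes them from a saved log file. The helper names `count_slow_requests` and `slow_request_durations` are hypothetical. The sketch assumes each warning contains `took too long (<duration>)`, as etcd 3.3-style messages do; adjust the pattern if your version words it differently.

```shell
#!/usr/bin/env bash
# Sketch: summarize etcd "took too long" warnings from a saved log file.
# Assumption: each warning line contains "took too long (<duration>)";
# adjust the patterns below if your etcd version logs a different format.

# Print how many slow-request warnings the log contains.
count_slow_requests() {
  local log_file=$1
  # grep -c prints 0 (and exits 1) when nothing matches; normalize the status.
  grep -c 'took too long' "$log_file" || true
}

# Print the reported durations, one per line, e.g. "1.5s" or "812.3ms".
slow_request_durations() {
  local log_file=$1
  sed -n 's/.*took too long (\([^)]*\)).*/\1/p' "$log_file"
}
```

For example, save the unit's recent journal with `journalctl -u etcd --since "1 hour ago" > /tmp/etcd.log` and run `count_slow_requests /tmp/etcd.log`. Then repeat the same time window after the migration and compare.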
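Migration steps:
1. Stop the service and copy the data files to the new directory (`cp -a` preserves ownership, permissions and timestamps)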
[root@node1 ~]# systemctl stop etcd
[root@node1 ~]# cp -ar /data/etcd/ /var/lib/
[root@node1 ~]# ll /var/lib/etcd/
total 0
drwx------. 4 root root 29 Apr 28 05:54 member
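Before starting etcd from the copy, it is worth confirming that the copy is complete and kept the restrictive permissions shown above (`drwx------` on `member`). This minimal sketch assumes GNU `diff` and `stat` and that etcd is still stopped, so the source is not changing. `same_tree` and `same_mode_owner` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the copied etcd data directory before restarting etcd.
# Assumptions: GNU coreutils/diffutils (`stat -c`, `diff -r`) and etcd stopped.

# Succeeds only if the two directory trees have identical contents.
same_tree() {
  local src=$1 dst=$2
  diff -r -q "$src" "$dst" >/dev/null
}

# Print "<octal mode> <owner>" of a path, e.g. "700 root".
dir_mode_owner() {
  stat -c '%a %U' "$1"
}

# Succeeds if both paths have the same mode and owner.
same_mode_owner() {
  [ "$(dir_mode_owner "$1")" = "$(dir_mode_owner "$2")" ]
}
```

For example: `same_tree /data/etcd /var/lib/etcd && same_mode_owner /data/etcd/member /var/lib/etcd/member && echo "copy OK"`. Leaving `/data/etcd` in place until the member is healthy again keeps a simple rollback path.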
2. Set the new data directory (ETCD_DATA_DIR) in etcd.env
[root@node1 ~]# vim /etc/etcd.env
# Environment file for etcd v3.3.12
ETCD_DATA_DIR=/var/lib/etcd
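If you prefer not to edit the file by hand in vim, the same change can be scripted. This sketch assumes GNU `sed`, where `-i.bak` keeps a backup of the original file. `set_etcd_data_dir` and `get_etcd_data_dir` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive alternative to editing /etc/etcd.env in vim.
# Assumption: GNU sed; `-i.bak` writes "<file>.bak" before editing in place.

# Replace the ETCD_DATA_DIR line with the new directory.
set_etcd_data_dir() {
  local env_file=$1 new_dir=$2
  # Refuse to edit a file that has no ETCD_DATA_DIR line to replace.
  grep -q '^ETCD_DATA_DIR=' "$env_file" || return 1
  sed -i.bak "s|^ETCD_DATA_DIR=.*|ETCD_DATA_DIR=${new_dir}|" "$env_file"
}

# Print the current ETCD_DATA_DIR value from an env file.
get_etcd_data_dir() {
  sed -n 's/^ETCD_DATA_DIR=//p' "$1"
}
```

For example, after `set_etcd_data_dir /etc/etcd.env /var/lib/etcd`, running `get_etcd_data_dir /etc/etcd.env` should print `/var/lib/etcd`.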
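3. Update the startup parameters in the docker wrapper script so the data volume (`-v`) mounts the new directory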
[root@node1 ~]# vim /usr/local/bin/etcd
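#!/bin/bash
# Each option line ends with a backslash so docker receives a single command line.
# The /var/lib/etcd bind mount must expose the new data directory at the path set by ETCD_DATA_DIR in /etc/etcd.env.
/usr/bin/docker run \
  --restart=on-failure:5 \
  --env-file=/etc/etcd.env \
  --net=host \
  -v /etc/ssl/certs:/etc/ssl/certs:ro \
  -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro \
  -v /var/lib/etcd:/var/lib/etcd:rw \
  --memory=0 \
  --blkio-weight=1000 \
  --name=etcd1 \
  quay.io/coreos/etcd:v3.3.12 \
  /usr/local/bin/etcd \
  "$@"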
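The env file and the wrapper have to agree. etcd inside the container reads `ETCD_DATA_DIR`, and the `-v` option decides which host directory appears at that path. Here is a minimal consistency check. It assumes the wrapper mounts the same path on both sides with `:rw`, as in the script above. `wrapper_mounts_data_dir` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: confirm the docker wrapper bind-mounts the directory named by
# ETCD_DATA_DIR, so etcd in the container opens the copied data.
# Assumption: the mount uses an identical host:container path with ":rw".

wrapper_mounts_data_dir() {
  local wrapper=$1 env_file=$2 data_dir
  data_dir=$(sed -n 's/^ETCD_DATA_DIR=//p' "$env_file")
  # No ETCD_DATA_DIR line means there is nothing to compare against.
  [ -n "$data_dir" ] || return 1
  # -F matches the mount spec literally; "--" stops "-v" being read as an option.
  grep -qF -- "-v ${data_dir}:${data_dir}:rw" "$wrapper"
}
```

For example: `wrapper_mounts_data_dir /usr/local/bin/etcd /etc/etcd.env && echo "mount matches"`. If the check fails, etcd in the container would not see the copied data at its data path.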
4. Start the service and check its status
[root@node1 ~]# systemctl start etcd
[root@node1 ~]# systemctl status etcd
● etcd.service - etcd docker wrapper
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-04-29 12:40:24 UTC; 5s ago
  Process: 4218 ExecStop=/usr/bin/docker stop etcd1 (code=exited, status=0/SUCCESS)
  Process: 6071 ExecStartPre=/usr/bin/docker rm -f etcd1 (code=exited, status=0/SUCCESS)
 Main PID: 6086 (etcd)
    Tasks: 16
   Memory: 34.4M
   CGroup: /system.slice/etcd.service
           ├─6086 /bin/bash /usr/local/bin/etcd
           └─6088 /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var...
Apr 29 12:40:24 node1 etcd[6086]: 2020-04-29 12:40:24.938238 I | rafthttp: established a TCP streaming connection with peer 1c74700fc9501a08 (stream Message reader)
Apr 29 12:40:24 node1 etcd[6086]: 2020-04-29 12:40:24.938385 I | rafthttp: established a TCP streaming connection with peer bfb5d71282c2db49 (stream MsgApp v2 reader)
Apr 29 12:40:24 node1 etcd[6086]: 2020-04-29 12:40:24.979248 I | etcdserver: 465aba9a8e04dd3f initialzed peer connection; fast-forwarding 3 ticks (election ticks 5) with...ctive peer(s)
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.006641 I | mvcc: store.index: compact 1078020
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.010060 I | mvcc: finished scheduled compaction at 1078020 (took 2.330859ms)
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.016608 I | etcdserver: published {Name:etcd1 ClientURLs:[https://10.10.10.11:2379]} to cluster 4059f5ad1e3ba1cc
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.016643 I | embed: ready to serve client requests
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.016964 I | embed: ready to serve client requests
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.019571 I | embed: serving client requests on 10.10.10.11:2379
Apr 29 12:40:25 node1 etcd[6086]: 2020-04-29 12:40:25.020805 I | embed: serving client requests on 127.0.0.1:2379
Hint: Some lines were ellipsized, use -l to show in full.
[root@node1 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
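All three etcd members and the control-plane components report `Healthy`. You may want to repeat this check from a script, for example after migrating each member in turn. In that case the table can be turned into a pass/fail result. This sketch assumes the default output, with a header row and `STATUS` as the second column. `all_components_healthy` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: pass/fail wrapper around `kubectl get cs` table output.
# Assumption: default output with a header row; STATUS is column 2.

# Reads the table on stdin, skips the header, prints the names of
# components whose status is not "Healthy", and exits non-zero if any exist.
all_components_healthy() {
  awk 'NR > 1 && $2 != "Healthy" { print $1; bad = 1 } END { exit bad }'
}
```

For example: `kubectl get cs | all_components_healthy && echo "control plane healthy"`. The output is expected to include the header row, so do not pass `--no-headers`.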