Background
By default, a bucket only needs to be resharded when it holds so many objects that its omap grows too large, so in theory reshard operations are rare. With Multisite enabled, however, resharding a bucket breaks the original metadata mapping, and the affected bucket can no longer sync its data. An official procedure for repairing a reshard under Multisite only became available starting with Luminous 12.2.8. Note in particular: never enable auto (dynamic) resharding in a Multisite environment.
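Dynamic resharding has been enabled by default since Luminous, so it is worth switching it off explicitly on every RGW node in a Multisite deployment. A minimal ceph.conf sketch (placed under [global] here for simplicity; it could equally go under your specific RGW client sections, and the RGW service is assumed to be restarted for it to take effect):

# ceph.conf on every RGW node: disable automatic (dynamic) bucket index resharding
[global]
rgw_dynamic_resharding = false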
Caveats
Bucket reshard is a very time- and resource-consuming operation and should be avoided in production whenever possible. Once you decide to reshard under Multisite and restore data sync for the affected bucket, it means shutting down the RGW service on both clusters, so make sure you understand the risk before starting this repair, to avoid serious damage.
Procedure
Check the sync status on both the master and the slave cluster and make sure the bucket has finished syncing; it is recommended to stop client writes to the bucket and wait for sync to complete.
[root@master supdev]# radosgw-admin bucket sync status --bucket=demo1
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]
source zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
full sync: 0/16 shards
incremental sync: 16/16 shards
bucket is behind on 6 shards
behind shards: [3,5,8,9,11,15]
[root@master supdev]# radosgw-admin sync status
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
metadata sync no sync (zone is master)
data sync source: d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
Check on the slave cluster:
[root@slave supdev]# radosgw-admin bucket sync status --bucket=demo1
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]
source zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
full sync: 0/16 shards
incremental sync: 5/16 shards
bucket is behind on 5 shards
behind shards: [3,5,9,11,15]
[root@slave supdev]# radosgw-admin sync status
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 1 shards
behind shards: [26]
oldest incremental change not applied: 2019-01-30 15:26:00.0.117999s
[root@slave supdev]# radosgw-admin sync status
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
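Before moving on to the destructive steps below, it is worth gating on a clean sync state. A small pre-flight sketch (it relies only on the "behind on ... shards" wording shown in the output above; run it on a node of each cluster):

# Abort if the bucket still reports shards behind on this side
if radosgw-admin bucket sync status --bucket=demo1 | grep -q 'behind on'; then
    echo "demo1 is still behind, wait for sync to finish before proceeding"
    exit 1
fi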
Disable sync for the bucket on the master cluster.
[root@master supdev]# radosgw-admin bucket sync disable --bucket=demo1
[root@master supdev]# radosgw-admin bucket sync status --bucket=demo1
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]
Sync is disabled for bucket demo1
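If several buckets are going to be resharded in the same maintenance window, the disable step can simply be looped. A sketch, with the bucket names purely illustrative:

# Hypothetical bucket list: disable sync for every bucket you plan to reshard
for b in demo1 demo2 demo3; do
    radosgw-admin bucket sync disable --bucket="$b"
done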
Stop the RGW service on all nodes of both the master and the slave cluster.
[root@master supdev]# systemctl stop ceph-radosgw@`hostname -s`
[root@slave supdev]# systemctl stop ceph-radosgw@`hostname -s`
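With more than one RGW node per cluster, the stop has to be repeated on every node. A sketch assuming password-less ssh and purely illustrative host names (the systemd instance name varies by deployment, so ceph-radosgw.target, which covers all radosgw instances on a host, is used instead):

# Hypothetical host names: stop every radosgw instance on all RGW nodes of both clusters
for host in master1 master2 slave1 slave2; do
    ssh "$host" 'systemctl stop ceph-radosgw.target'
done

The same loop with start instead of stop can be reused when the services are brought back later.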
Then, on any node of the master cluster, run the following to manually reshard the bucket, and record the old bucket instance id.
[root@master supdev]# radosgw-admin bucket reshard --bucket=demo1 --num-shards=32
*** NOTICE: operation will not remove old bucket index objects ***
*** these will need to be removed manually ***
tenant:
bucket name: demo1
old bucket instance id: e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1
new bucket instance id: e8921092-c7e8-42d8-80d1-5e83c25d5661.72437.1
total entries: 6
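To confirm the reshard took effect, the bucket should now resolve to the new instance id. A quick check, as a sketch (the id field reported by bucket stats is expected to match the new bucket instance id printed above):

# The "id" field should now read e8921092-c7e8-42d8-80d1-5e83c25d5661.72437.1
radosgw-admin bucket stats --bucket=demo1 | grep '"id"'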
Next, on the slave cluster, delete all of the bucket's data and remove the bucket itself (this can take a very long time for large buckets).
[root@slave supdev]# radosgw-admin bucket rm --purge-objects --bucket=demo1
[root@slave supdev]# radosgw-admin bucket list
[
"demo8",
"demo4",
"demo5",
"demo2",
"demo6",
"demo7",
"demo3",
"demo9"
]
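Because the purge above can run for hours on a large bucket, it may be worth running it detached so that a dropped ssh session does not interrupt it. A sketch:

# Optional: run the purge in the background and follow its log
nohup radosgw-admin bucket rm --purge-objects --bucket=demo1 > /tmp/demo1_purge.log 2>&1 &
tail -f /tmp/demo1_purge.log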
Back on the master cluster, purge the bucket index (bi) of the old bucket instance.
[root@master supdev]# radosgw-admin bi purge --bucket-id="e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1" --bucket=demo1
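To double-check that the old index objects are really gone, the index pool can be listed and filtered by the old instance id. A sketch, assuming the zone's index pool follows the default <zone>.rgw.buckets.index naming (replace with the actual index pool of cn-bj-test2):

# No output means the .dir.<old bucket instance id>.* index objects have been removed
rados -p cn-bj-test2.rgw.buckets.index ls | grep e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1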
Start the RGW service on all master and slave nodes.
[root@master supdev]# systemctl start ceph-radosgw@`hostname -s`
[root@slave supdev]# systemctl start ceph-radosgw@`hostname -s`
Finally, re-enable sync for the bucket and wait for the data on the master cluster to gradually sync back to the slave cluster.
[root@master supdev]# radosgw-admin bucket sync enable --bucket=demo1
[root@master supdev]# radosgw-admin bucket sync status --bucket=demo1
realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]
source zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
full sync: 0/32 shards
incremental sync: 32/32 shards
bucket is behind on 26 shards
behind shards: [2,3,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,25,26,27,29,30]
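If you want to wait for the resync unattended, a simple polling sketch (it assumes only the "behind on ... shards" wording shown in the status output above; run it wherever you are watching the sync):

# Poll every 60 seconds until the bucket no longer reports shards behind
while radosgw-admin bucket sync status --bucket=demo1 | grep -q 'behind on'; do
    sleep 60
done
echo "demo1 is caught up"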