Fixing reshard sync under Multisite

2019-05-09 14:15:42

Background

By default, a bucket only needs to be resharded when it holds so many objects that its omap index grows too large, so reshard operations should in theory be rare. With Multisite enabled, however, resharding a bucket breaks the existing metadata mapping, and the affected bucket can no longer sync its data between zones. An official repair procedure for the Multisite reshard scenario only appeared in Luminous 12.2.8. In particular, never enable auto reshard in a Multisite environment.
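
On Luminous and later, dynamic (auto) resharding is enabled by default, and it can be turned off explicitly in ceph.conf. A minimal sketch, assuming the option sits in the [global] section (it can also live under the RGW client section); restart the RGW daemons afterwards for it to take effect:

[global]
# In a Multisite deployment, never let RGW split bucket indexes on its own:
# an automatic reshard breaks the metadata mapping and stops bucket sync.
rgw_dynamic_resharding = false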

Caveats

A bucket reshard is a very time- and resource-consuming operation and should be avoided in production whenever possible. Once you decide to reshard under Multisite and then restore data sync for the affected bucket, it means taking down the RGW service across both clusters. Make sure you understand this risk before starting the repair procedure, so you do not cause a major outage.
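
To gauge how long the reshard and subsequent re-sync might take, you can check the bucket's object count first. A hedged example (output abridged; demo1 here holds only the 6 objects seen in the reshard output later):

[root@master supdev]# radosgw-admin bucket stats --bucket=demo1 | grep num_objects
            "num_objects": 6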

Procedure

Check the sync status on the master and secondary clusters and make sure the bucket in question has finished syncing. It is best to stop writes to the bucket and wait for sync to fully catch up.

[root@master supdev]# radosgw-admin bucket sync status --bucket=demo1
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
         bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]

    source zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
                full sync: 0/16 shards
                incremental sync: 16/16 shards
                bucket is behind on 6 shards
                behind shards: [3,5,8,9,11,15]
[root@master supdev]# radosgw-admin  sync status
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
  metadata sync no sync (zone is master)
      data sync source: d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Check the secondary cluster. Note that sync status is run twice below: the first run shows one data shard still behind, and a later run shows it fully caught up.

[root@slave supdev]# radosgw-admin bucket sync status --bucket=demo1
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
         bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]

    source zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
                full sync: 0/16 shards
                incremental sync: 5/16 shards
                bucket is behind on 5 shards
                behind shards: [3,5,9,11,15]
[root@slave supdev]# radosgw-admin  sync status
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [26]
                        oldest incremental change not applied: 2019-01-30 15:26:00.0.117999s
[root@slave supdev]# radosgw-admin  sync status
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Disable sync for the bucket on the master cluster:

[root@master supdev]# radosgw-admin bucket sync disable --bucket=demo1
[root@master supdev]# radosgw-admin bucket sync status --bucket=demo1
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
         bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]

Sync is disabled for bucket demo1

Stop the RGW service on every machine in both the master and secondary clusters:

[root@master supdev]# systemctl stop ceph-radosgw@`hostname -s`

[root@slave supdev]# systemctl stop ceph-radosgw@`hostname -s`
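
The unit name above assumes one RGW instance per host, named after the short hostname. With many gateways, a small loop saves time; a sketch assuming passwordless SSH and a hypothetical rgw_hosts.txt listing every RGW node:

# rgw_hosts.txt is a hypothetical file, one RGW hostname per line
[root@master supdev]# for h in $(cat rgw_hosts.txt); do ssh "$h" 'systemctl stop ceph-radosgw@$(hostname -s)'; done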

Then, on any node of the master cluster, manually reshard the bucket and record the old bucket instance id from the output:

[root@master supdev]# radosgw-admin bucket reshard --bucket=demo1 --num-shards=32
*** NOTICE: operation will not remove old bucket index objects ***
***         these will need to be removed manually             ***
tenant:
bucket name: demo1
old bucket instance id: e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1
new bucket instance id: e8921092-c7e8-42d8-80d1-5e83c25d5661.72437.1
total entries: 6
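
radosgw-admin talks to RADOS directly, so even with the RGW daemons down you can verify that the bucket now references the new instance id; a hedged check (output abridged):

[root@master supdev]# radosgw-admin bucket stats --bucket=demo1 | grep '"id"'
    "id": "e8921092-c7e8-42d8-80d1-5e83c25d5661.72437.1",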

Next, on the secondary cluster, delete all of the bucket's objects and remove the bucket itself (this can take a long time for a large bucket):

[root@slave supdev]# radosgw-admin bucket rm --purge-objects --bucket=demo1
[root@slave supdev]# radosgw-admin bucket list
[
    "demo8",
    "demo4",
    "demo5",
    "demo2",
    "demo6",
    "demo7",
    "demo3",
    "demo9"
]

Back on the master cluster, purge the old bucket instance's index (bi) objects:

[root@master supdev]# radosgw-admin bi purge --bucket-id="e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1" --bucket=demo1
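
As the reshard notice said, the old index objects are not removed automatically; bi purge is what deletes them. To double-check, list the index pool and grep for the old instance id; a sketch assuming the default <zone>.rgw.buckets.index pool naming (the pool name is an assumption):

# pool name assumes the default "<zone>.rgw.buckets.index" layout
[root@master supdev]# rados -p cn-bj-test2.rgw.buckets.index ls | grep e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1

No output means the old .dir.<old-instance-id>.<shard> index objects are gone.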

Start the RGW service on all master and secondary nodes:

[root@master supdev]# systemctl start ceph-radosgw@`hostname -s`

[root@slave supdev]# systemctl start ceph-radosgw@`hostname -s`

Finally, re-enable sync for the bucket and wait for the data in the master cluster to sync back to the secondary cluster.

[root@master supdev]# radosgw-admin bucket sync enable --bucket=demo1
[root@master supdev]# radosgw-admin bucket sync status --bucket=demo1
          realm f6ab846d-fb50-4f02-b129-98c13dce3376 (cn)
      zonegroup e56bf383-f61f-4dc7-9f59-9a8aaa801e3a (cn-bj)
           zone e8921092-c7e8-42d8-80d1-5e83c25d5661 (cn-bj-test2)
         bucket demo1[e8921092-c7e8-42d8-80d1-5e83c25d5661.72254.1]

    source zone d83f9891-c31b-4a80-ae58-ebfbfc74e49a (cn-bj-test1)
                full sync: 0/32 shards
                incremental sync: 32/32 shards
                bucket is behind on 26 shards
                behind shards: [2,3,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,25,26,27,29,30]
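
A full re-sync of a large bucket can take a long time; you can poll the status periodically until all shards catch up. A minimal sketch:

[root@master supdev]# watch -n 60 'radosgw-admin bucket sync status --bucket=demo1'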
