Troubleshooting SSD Failures under BlueStore

2019-05-09 14:15:05

An OSD went down on a production Luminous (L) cluster, and it was not immediately clear whether the disk had failed. Troubleshooting this under FileStore was familiar territory, but some of the details change after switching to BlueStore, and since the drive involved is an SSD, this troubleshooting write-up came about.

Troubleshooting process

Locating the failed node

[root@demo-host ceph]# ceph osd tree|grep down
  20        1.00000                 osd.20              down        0 1.00000
[root@demo-host ceph]# ceph osd find 20
{
    "osd": 20,
    "ip": "192.168.8.124:6800/1298894",
    "osd_fsid": "a99bc25c-4cf4-5429-9171-4084555af14b",
    "crush_location": {
        "host": "demo-host-ssd",
        "media": "site1-rack1-ssd",
        "mediagroup": "site1-ssd",
        "root": "default"
    }
}
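
Before logging in to the node, the cluster itself can also tell you which device backs the OSD. A minimal sketch, assuming the monitors still hold the metadata osd.20 reported at its last start ("ceph osd metadata" is available on Luminous; field names can vary between releases):

# Ask the monitors for the metadata osd.20 registered when it last started;
# on BlueStore this includes the backing block device and the hostname.
ceph osd metadata 20 | grep -E 'hostname|bluestore_bdev|devices'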

Log in to 192.168.8.124 identified above and run "dmesg -T"; the output shows I/O errors on device dm-0.

[Wed Feb 27 16:24:02 2019] hpsa 0000:03:00.0: Acknowledging event: 0x40000032 (HP SSD Smart Path state change)
[Wed Feb 27 16:24:02 2019] hpsa 0000:03:00.0: hpsa_update_device_info: LV failed, device will be skipped.
[Wed Feb 27 16:24:02 2019] hpsa 0000:03:00.0: scsi 0:1:0:0: updated Direct-Access     HP       LOGICAL VOLUME   RAID-1( 0) SSDSmartPathCap  En  Exp=1
[Wed Feb 27 16:24:02 2019] hpsa 0000:03:00.0: scsi 0:1:0:2: updated Direct-Access     HP       LOGICAL VOLUME   RAID-0 SSDSmartPathCap  En  Exp=1
[Wed Feb 27 16:24:21 2019] buffer_io_error: 1 callbacks suppressed
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 0, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 468834288, async page read
[Wed Feb 27 16:24:21 2019] Buffer I/O error on dev dm-0, logical block 468834288, async page read
[Wed Feb 27 16:24:22 2019] Buffer I/O error on dev dm-0, logical block 468834288, async page read
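
As a quick cross-check (not part of the original capture), dm-0 can be mapped straight back to an LVM logical volume with standard device-mapper tooling, before walking the OSD directory as done below:

# Show which VG/LV name the dm-0 node belongs to.
dmsetup info -c /dev/dm-0
# Or list all mapper names together with their dm-* nodes.
ls -l /dev/mapper/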

The OSD log shows "ERROR: osd init failed: (5) Input/output error".

[root@demo-host ceph]# tail -100 /var/log/ceph/ceph-osd.20.log
2019-02-27 16:31:34.492858 7fc0f33aed80  1 bdev(0x55d1a3c16000 /var/lib/ceph/osd/ceph-20/block) open size 1920345309184 (0x1bf1d800000, 1.75TiB) block_size 4096 (4KiB) non-rotational
2019-02-27 16:31:34.492906 7fc0f33aed80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:34.492917 7fc0f33aed80  1 bdev(0x55d1a3c16000 /var/lib/ceph/osd/ceph-20/block) close
2019-02-27 16:31:34.751175 7fc0f33aed80  1 bluestore(/var/lib/ceph/osd/ceph-20) _mount path /var/lib/ceph/osd/ceph-20
2019-02-27 16:31:34.751738 7fc0f33aed80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:34.751776 7fc0f33aed80  1 bdev create path /var/lib/ceph/osd/ceph-20/block type kernel
2019-02-27 16:31:34.751779 7fc0f33aed80  1 bdev(0x55d1a3c16200 /var/lib/ceph/osd/ceph-20/block) open path /var/lib/ceph/osd/ceph-20/block
2019-02-27 16:31:34.751978 7fc0f33aed80  1 bdev(0x55d1a3c16200 /var/lib/ceph/osd/ceph-20/block) open size 1920345309184 (0x1bf1d800000, 1.75TiB) block_size 4096 (4KiB) non-rotational
2019-02-27 16:31:34.752485 7fc0f33aed80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:34.752495 7fc0f33aed80  1 bdev(0x55d1a3c16200 /var/lib/ceph/osd/ceph-20/block) close
2019-02-27 16:31:35.009776 7fc0f33aed80 -1 osd.20 0 OSD:init: unable to mount object store
2019-02-27 16:31:35.009796 7fc0f33aed80 -1  ** ERROR: osd init failed: (5) Input/output error
2019-02-27 16:31:55.220715 7ff4da40cd80  0 set uid:gid to 167:167 (ceph:ceph)
2019-02-27 16:31:55.220746 7ff4da40cd80  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process ceph-osd, pid 1564222
2019-02-27 16:31:55.221547 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.221977 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.222331 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.222747 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.226811 7ff4da40cd80  0 pidfile_write: ignore empty --pid-file
2019-02-27 16:31:55.235463 7ff4da40cd80  0 load: jerasure load: lrc load: isa
2019-02-27 16:31:55.235531 7ff4da40cd80  1 bdev create path /var/lib/ceph/osd/ceph-20/block type kernel
2019-02-27 16:31:55.235538 7ff4da40cd80  1 bdev(0x5608d71b6000 /var/lib/ceph/osd/ceph-20/block) open path /var/lib/ceph/osd/ceph-20/block
2019-02-27 16:31:55.236101 7ff4da40cd80  1 bdev(0x5608d71b6000 /var/lib/ceph/osd/ceph-20/block) open size 1920345309184 (0x1bf1d800000, 1.75TiB) block_size 4096 (4KiB) non-rotational
2019-02-27 16:31:55.236467 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.236478 7ff4da40cd80  1 bdev(0x5608d71b6000 /var/lib/ceph/osd/ceph-20/block) close
2019-02-27 16:31:55.494201 7ff4da40cd80  1 bluestore(/var/lib/ceph/osd/ceph-20) _mount path /var/lib/ceph/osd/ceph-20
2019-02-27 16:31:55.494686 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.494724 7ff4da40cd80  1 bdev create path /var/lib/ceph/osd/ceph-20/block type kernel
2019-02-27 16:31:55.494727 7ff4da40cd80  1 bdev(0x5608d71b6200 /var/lib/ceph/osd/ceph-20/block) open path /var/lib/ceph/osd/ceph-20/block
2019-02-27 16:31:55.494921 7ff4da40cd80  1 bdev(0x5608d71b6200 /var/lib/ceph/osd/ceph-20/block) open size 1920345309184 (0x1bf1d800000, 1.75TiB) block_size 4096 (4KiB) non-rotational
2019-02-27 16:31:55.495323 7ff4da40cd80 -1 bluestore(/var/lib/ceph/osd/ceph-20/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-20/block: (5) Input/output error
2019-02-27 16:31:55.495335 7ff4da40cd80  1 bdev(0x5608d71b6200 /var/lib/ceph/osd/ceph-20/block) close
2019-02-27 16:31:55.758790 7ff4da40cd80 -1 osd.20 0 OSD:init: unable to mount object store
2019-02-27 16:31:55.758804 7ff4da40cd80 -1  ** ERROR: osd init failed: (5) Input/output error
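
The repeated "_read_bdev_label failed" messages mean BlueStore cannot even read the device label stored at the very start of the block device. This can be confirmed independently of Ceph with a plain, read-only test; a small sketch:

# Try to read the first 4 KiB of the OSD's block device, where the
# BlueStore label lives; on a dead disk this fails with an I/O error.
dd if=/var/lib/ceph/osd/ceph-20/block of=/dev/null bs=4096 count=1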

Next, confirm whether dm-0 is actually tied to osd.20. The familiar follow-the-chain steps are below; note the warning about a missing PV.

[root@demo-host ceph]# ls -l  /var/lib/ceph/osd/ceph-20/
total 48
-rw-r--r-- 1 ceph ceph 456 Feb 25 19:56 activate.monmap
lrwxrwxrwx 1 ceph ceph  93 Feb 25 19:56 block -> /dev/ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37/osd-block-a99bc25c-4cf4-5429-9171-4084555af14b # note the corresponding LV and VG
-rw-r--r-- 1 ceph ceph   2 Feb 25 19:56 bluefs
-rw-r--r-- 1 ceph ceph  37 Feb 25 19:56 ceph_fsid
-rw-r--r-- 1 ceph ceph  37 Feb 25 19:56 fsid
-rw------- 1 ceph ceph  56 Feb 25 19:56 keyring
-rw-r--r-- 1 ceph ceph   8 Feb 25 19:56 kv_backend
-rw-r--r-- 1 ceph ceph  21 Feb 25 19:56 magic
-rw-r--r-- 1 ceph ceph   4 Feb 25 19:56 mkfs_done
-rw-r--r-- 1 ceph ceph  41 Feb 25 19:56 osd_key
-rw-r--r-- 1 ceph ceph   6 Feb 25 19:56 ready
-rw-r--r-- 1 ceph ceph  10 Feb 25 19:56 type
-rw-r--r-- 1 ceph ceph   3 Feb 25 19:56 whoami

[root@demo-host ceph]# vgs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  VG                                        #PV #LV #SN Attr   VSize  VFree
  ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd   1   1   0 wz--n- <5.46t    0
  ceph-2d626a29-6409-4edd-b3e0-df6dc0259629   1   1   0 wz--n- <5.46t    0
  ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37   1   1   0 wz-pn- <1.75t    0 # note: partial (p) flag, the PV is missing
  ceph-782b8301-ed74-4809-b39c-755bebd86a81   1   1   0 wz--n- <1.75t    0
  ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311   1   1   0 wz--n- <5.46t    0
  ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6   1   1   0 wz--n- <5.46t    0
  ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd   1   1   0 wz--n- <5.46t    0
  ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2   1   1   0 wz--n- <5.46t    0
  ceph-d3c92af2-9aee-4141-a693-9d21c329bec6   1   1   0 wz--n- <5.46t    0
  ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628   1   1   0 wz--n- <5.46t    0
[root@demo-host ceph]# lvs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  LV                                             VG                                        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-737138bb-53f8-5f20-b131-d776fec5e62e ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd -wi-ao---- <5.46t
  osd-block-31724c12-5cab-54ba-a0ea-f7bd0c5bdb39 ceph-2d626a29-6409-4edd-b3e0-df6dc0259629 -wi-ao---- <5.46t
  osd-block-a99bc25c-4cf4-5429-9171-4084555af14b ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37 -wi-a---p- <1.75t # note: the affected LV
  osd-block-8505d8f5-4ea3-59d0-870e-59d360f5015c ceph-782b8301-ed74-4809-b39c-755bebd86a81 -wi-ao---- <1.75t
  osd-block-e9a70833-590b-5993-9638-179baaa782a5 ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311 -wi-ao---- <5.46t
  osd-block-31541688-fb32-5337-af90-09d185613075 ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6 -wi-ao---- <5.46t
  osd-block-df6cd15a-1b5c-5443-a062-50fa64fa9d07 ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd -wi-ao---- <5.46t
  osd-block-b28a126d-0a7b-503d-80c5-7cbaa04d0a9b ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2 -wi-ao---- <5.46t
  osd-block-377ff375-d2bf-5ad9-94b4-2127b6dcf9e7 ceph-d3c92af2-9aee-4141-a693-9d21c329bec6 -wi-ao---- <5.46t
  osd-block-4f147edf-9cb7-5263-bec0-3fa34dc0373f ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628 -wi-ao---- <5.46t

[root@demo-host ceph]# ls -l  /dev/ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37/osd-block-a99bc25c-4cf4-5429-9171-4084555af14b
lrwxrwxrwx 1 ceph ceph 7 Feb 27 16:32 /dev/ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37/osd-block-a99bc25c-4cf4-5429-9171-4084555af14b -> ../dm-0 # confirmed: this LV is dm-0
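
Instead of walking the symlinks by hand, "ceph-volume lvm list" prints the OSD-to-LV mapping that ceph-volume recorded at deployment time, which works as long as the LV tags are still readable; a sketch:

# List the LVs ceph-volume knows about, grouped by OSD id; the entry
# for osd.20 should reference the same VG/LV as the symlink above.
ceph-volume lvm list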

Next, check the RAID controller to determine the state of the disk.

[root@demo-host ceph]# hpssacli ctrl slot=0  show  config detail


  Array: B
      Interface Type: Solid State SATA
      Unused Space: 0  MB (0.0%)
      Used Space: 1.7 TB (100.0%)
      Status: Failed Physical Drive # the drive is gone; the warning below says the same
      MultiDomain Status: OK
      Array Type: Data       HPE SSD Smart Path: enable

      Warning: One of the drives on this array have failed or has been removed.




      Logical Drive: 2
         Size: 1.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: Failed # the logical drive has failed
         MultiDomain Status: OK
         Caching:  Disabled
         Unique Identifier: 600508B1001C3FBF225890CDE3612E98
         Logical Drive Label: 0606978APVYKH0BRH9507N6082
         Drive Type: Data
         LD Acceleration Method: HPE SSD Smart Path

      physicaldrive 2I:4:1 # record the slot address for later use
         Port: 2I
         Box: 4
         Bay: 1
         Status: Failed
         Last Failure Reason: Hot removed
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 1920.3 GB
         Drive exposed to OS: False
         Native Block Size: 4096
         Firmware Revision: 4IYVHPG1
         Serial Number: BTYS802201ZJ1P9DGN
         Model: ATA     VK001920GWJPH
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Maximum Temperature (C): 41
         Usage remaining: 99.80%
         Power On Hours: 4868 # a short-lived SSD indeed
         Estimated Life Remaining based on workload to date: 101213 days
         SSD Smart Trip Wearout: False
         PHY Count: 1
         PHY Transfer Rate: Unknown
         Drive Authentication Status: Not Applicable
         Sanitize Erase Supported: False

At this point it is fairly certain the SSD itself is broken. Next, clean up the LVM remnants left on the system, starting with unmounting the corresponding OSD directory.

[root@demo-host ceph]# mount -l|grep ceph
tmpfs on /var/lib/ceph/osd/ceph-20 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-21 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-22 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-23 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-24 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-25 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-26 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-27 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-28 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-29 type tmpfs (rw,relatime)
[root@demo-host ceph]# umount  /var/lib/ceph/osd/ceph-20
[root@demo-host ceph]# mount -l|grep ceph
tmpfs on /var/lib/ceph/osd/ceph-21 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-22 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-23 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-24 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-25 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-26 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-27 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-28 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-29 type tmpfs (rw,relatime)
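
It is also worth making sure the crashed OSD will not keep trying to restart while the disk is out; on a systemd-managed deployment that would look roughly like the sketch below (the unit name follows the standard ceph-osd@ template and may differ in other setups):

# Stop the failed OSD daemon and keep systemd from restarting it
# until the disk has been replaced and the OSD redeployed.
systemctl stop ceph-osd@20
systemctl disable ceph-osd@20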

Check the VG and LV information again; note that the PV warning still reports a missing disk.

[root@demo-host ceph]# vgs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  VG                                        #PV #LV #SN Attr   VSize  VFree
  ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd   1   1   0 wz--n- <5.46t    0
  ceph-2d626a29-6409-4edd-b3e0-df6dc0259629   1   1   0 wz--n- <5.46t    0
  ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37   1   1   0 wz-pn- <1.75t    0
  ceph-782b8301-ed74-4809-b39c-755bebd86a81   1   1   0 wz--n- <1.75t    0
  ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311   1   1   0 wz--n- <5.46t    0
  ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6   1   1   0 wz--n- <5.46t    0
  ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd   1   1   0 wz--n- <5.46t    0
  ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2   1   1   0 wz--n- <5.46t    0
  ceph-d3c92af2-9aee-4141-a693-9d21c329bec6   1   1   0 wz--n- <5.46t    0
  ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628   1   1   0 wz--n- <5.46t    0
[root@demo-host ceph]# lvs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  LV                                             VG                                        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-737138bb-53f8-5f20-b131-d776fec5e62e ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd -wi-ao---- <5.46t
  osd-block-31724c12-5cab-54ba-a0ea-f7bd0c5bdb39 ceph-2d626a29-6409-4edd-b3e0-df6dc0259629 -wi-ao---- <5.46t
  osd-block-a99bc25c-4cf4-5429-9171-4084555af14b ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37 -wi-a---p- <1.75t
  osd-block-8505d8f5-4ea3-59d0-870e-59d360f5015c ceph-782b8301-ed74-4809-b39c-755bebd86a81 -wi-ao---- <1.75t
  osd-block-e9a70833-590b-5993-9638-179baaa782a5 ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311 -wi-ao---- <5.46t
  osd-block-31541688-fb32-5337-af90-09d185613075 ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6 -wi-ao---- <5.46t
  osd-block-df6cd15a-1b5c-5443-a062-50fa64fa9d07 ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd -wi-ao---- <5.46t
  osd-block-b28a126d-0a7b-503d-80c5-7cbaa04d0a9b ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2 -wi-ao---- <5.46t
  osd-block-377ff375-d2bf-5ad9-94b4-2127b6dcf9e7 ceph-d3c92af2-9aee-4141-a693-9d21c329bec6 -wi-ao---- <5.46t
  osd-block-4f147edf-9cb7-5263-bec0-3fa34dc0373f ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628 -wi-ao---- <5.46t

Following LVM's three-level structure, LV -> VG -> PV, first remove the LV and VG; note that stale entries are left behind.

[root@demo-host ceph]# vgremove ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  WARNING: 1 physical volumes are currently missing from the system.
Do you really want to remove volume group "ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37" containing 1 logical volumes? [y/n]: y
Do you really want to remove active logical volume ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37/osd-block-a99bc25c-4cf4-5429-9171-4084555af14b? [y/n]: y
  Aborting vg_write: No metadata areas to write to!
[root@demo-host ceph]# lvs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  LV                                             VG                                        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-737138bb-53f8-5f20-b131-d776fec5e62e ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd -wi-ao---- <5.46t
  osd-block-31724c12-5cab-54ba-a0ea-f7bd0c5bdb39 ceph-2d626a29-6409-4edd-b3e0-df6dc0259629 -wi-ao---- <5.46t
  osd-block-a99bc25c-4cf4-5429-9171-4084555af14b ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37 -wi-----p- <1.75t
  osd-block-8505d8f5-4ea3-59d0-870e-59d360f5015c ceph-782b8301-ed74-4809-b39c-755bebd86a81 -wi-ao---- <1.75t
  osd-block-e9a70833-590b-5993-9638-179baaa782a5 ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311 -wi-ao---- <5.46t
  osd-block-31541688-fb32-5337-af90-09d185613075 ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6 -wi-ao---- <5.46t
  osd-block-df6cd15a-1b5c-5443-a062-50fa64fa9d07 ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd -wi-ao---- <5.46t
  osd-block-b28a126d-0a7b-503d-80c5-7cbaa04d0a9b ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2 -wi-ao---- <5.46t
  osd-block-377ff375-d2bf-5ad9-94b4-2127b6dcf9e7 ceph-d3c92af2-9aee-4141-a693-9d21c329bec6 -wi-ao---- <5.46t
  osd-block-4f147edf-9cb7-5263-bec0-3fa34dc0373f ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628 -wi-ao---- <5.46t
[root@demo-host ceph]# vgs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  VG                                        #PV #LV #SN Attr   VSize  VFree
  ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd   1   1   0 wz--n- <5.46t    0
  ceph-2d626a29-6409-4edd-b3e0-df6dc0259629   1   1   0 wz--n- <5.46t    0
  ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37   1   1   0 wz-pn- <1.75t    0
  ceph-782b8301-ed74-4809-b39c-755bebd86a81   1   1   0 wz--n- <1.75t    0
  ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311   1   1   0 wz--n- <5.46t    0
  ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6   1   1   0 wz--n- <5.46t    0
  ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd   1   1   0 wz--n- <5.46t    0
  ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2   1   1   0 wz--n- <5.46t    0
  ceph-d3c92af2-9aee-4141-a693-9d21c329bec6   1   1   0 wz--n- <5.46t    0
  ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628   1   1   0 wz--n- <5.46t    0

Check the PV status: an [unknown] device shows up, and the warning says PV "BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu" is missing.

[root@demo-host ceph]# pvs
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  WARNING: Device for PV BLh8zq-EFYG-Yoy9-75uU-b7VM-p66e-o0C6Fu not found or rejected by a filter.
  PV         VG                                        Fmt  Attr PSize  PFree
  /dev/sdc   ceph-782b8301-ed74-4809-b39c-755bebd86a81 lvm2 a--  <1.75t    0
  /dev/sdd   ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd lvm2 a--  <5.46t    0
  /dev/sde   ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd lvm2 a--  <5.46t    0
  /dev/sdf   ceph-d3c92af2-9aee-4141-a693-9d21c329bec6 lvm2 a--  <5.46t    0
  /dev/sdg   ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2 lvm2 a--  <5.46t    0
  /dev/sdh   ceph-2d626a29-6409-4edd-b3e0-df6dc0259629 lvm2 a--  <5.46t    0
  /dev/sdi   ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6 lvm2 a--  <5.46t    0
  /dev/sdj   ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628 lvm2 a--  <5.46t    0
  /dev/sdk   ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311 lvm2 a--  <5.46t    0
  [unknown]  ceph-72de7913-115e-4df5-868d-7f4cf7ea2b37 lvm2 a-m  <1.75t    0

Removing the PV by hand will not work here. Instead, run "pvscan --cache", which rescans the devices and rebuilds the LVM cache, so the entry for the vanished PV is dropped; afterwards the stale PV, VG, and LV records are all cleared.

[root@demo-host ceph]# pvscan --cache
[root@demo-host ceph]# pvs
  PV         VG                                        Fmt  Attr PSize  PFree
  /dev/sdc   ceph-782b8301-ed74-4809-b39c-755bebd86a81 lvm2 a--  <1.75t    0
  /dev/sdd   ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd lvm2 a--  <5.46t    0
  /dev/sde   ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd lvm2 a--  <5.46t    0
  /dev/sdf   ceph-d3c92af2-9aee-4141-a693-9d21c329bec6 lvm2 a--  <5.46t    0
  /dev/sdg   ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2 lvm2 a--  <5.46t    0
  /dev/sdh   ceph-2d626a29-6409-4edd-b3e0-df6dc0259629 lvm2 a--  <5.46t    0
  /dev/sdi   ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6 lvm2 a--  <5.46t    0
  /dev/sdj   ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628 lvm2 a--  <5.46t    0
  /dev/sdk   ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311 lvm2 a--  <5.46t    0
[root@demo-host ceph]# vgs
  VG                                        #PV #LV #SN Attr   VSize  VFree
  ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd   1   1   0 wz--n- <5.46t    0
  ceph-2d626a29-6409-4edd-b3e0-df6dc0259629   1   1   0 wz--n- <5.46t    0
  ceph-782b8301-ed74-4809-b39c-755bebd86a81   1   1   0 wz--n- <1.75t    0
  ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311   1   1   0 wz--n- <5.46t    0
  ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6   1   1   0 wz--n- <5.46t    0
  ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd   1   1   0 wz--n- <5.46t    0
  ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2   1   1   0 wz--n- <5.46t    0
  ceph-d3c92af2-9aee-4141-a693-9d21c329bec6   1   1   0 wz--n- <5.46t    0
  ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628   1   1   0 wz--n- <5.46t    0
[root@demo-host ceph]# lvs
  LV                                             VG                                        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-737138bb-53f8-5f20-b131-d776fec5e62e ceph-04507dae-fa43-4a5f-910e-9a7f8fd64fbd -wi-ao---- <5.46t
  osd-block-31724c12-5cab-54ba-a0ea-f7bd0c5bdb39 ceph-2d626a29-6409-4edd-b3e0-df6dc0259629 -wi-ao---- <5.46t
  osd-block-8505d8f5-4ea3-59d0-870e-59d360f5015c ceph-782b8301-ed74-4809-b39c-755bebd86a81 -wi-ao---- <1.75t
  osd-block-e9a70833-590b-5993-9638-179baaa782a5 ceph-94096bc2-4a6a-4ac5-922a-9129e3b96311 -wi-ao---- <5.46t
  osd-block-31541688-fb32-5337-af90-09d185613075 ceph-957c14e6-c45e-4794-a6c4-92e55b267fd6 -wi-ao---- <5.46t
  osd-block-df6cd15a-1b5c-5443-a062-50fa64fa9d07 ceph-9a7d0101-f451-4e86-b1c0-3a4e4fced4bd -wi-ao---- <5.46t
  osd-block-b28a126d-0a7b-503d-80c5-7cbaa04d0a9b ceph-a150e02b-ec98-48ac-92b9-4f754aea19c2 -wi-ao---- <5.46t
  osd-block-377ff375-d2bf-5ad9-94b4-2127b6dcf9e7 ceph-d3c92af2-9aee-4141-a693-9d21c329bec6 -wi-ao---- <5.46t
  osd-block-4f147edf-9cb7-5263-bec0-3fa34dc0373f ceph-dfe4f8f2-880f-414d-af58-5b3c77ed2628 -wi-ao---- <5.46t

Finally, to be on the safe side, manually light up the drive's fault/locator LED and ask the data center staff to replace the disk.

[root@demo-host ceph]# hpssacli ctrl slot=0 pd 2I:4:1 modify led=on
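
After the replacement drive is in place, the same command with led=off should turn the locator LED back off (assuming the new drive sits in the same slot):

# Turn the locator LED off again once the drive has been swapped.
hpssacli ctrl slot=0 pd 2I:4:1 modify led=off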

Summary

BlueStore sits on top of LVM, so before a physical disk is pulled, always follow the LVM procedure and remove the leftover metadata; otherwise, once the replacement disk is installed, the stale entries will make device management confusing.
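
For reference, on a disk that is still readable the cleanup would follow the usual LV -> VG -> PV order, or ceph-volume can wipe everything in one step; this is only a sketch with placeholder names, not what was run above (the failed disk made this impossible here, hence the pvscan --cache workaround):

# Tear down the LVM stack for a BlueStore OSD whose disk is still present.
lvremove -y <vg-name>/<lv-name>      # placeholder names: remove the LV first
vgremove -y <vg-name>                # then the VG
pvremove -y /dev/sdX                 # finally wipe the PV label on the disk
# Or let ceph-volume do it in one step (supported on recent Luminous releases):
ceph-volume lvm zap --destroy /dev/sdX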
