Analyzing and fixing a Ceph PG fault

Goal

Resolve the fault below:

# ceph -s
  cluster:
    id:     7e720238-7ada-4922-ba2e-xxxxxx4e4
    health: HEALTH_WARN
            Degraded data redundancy: 85 pgs unclean, 85 pgs degraded, 85 pgs undersized

  services:
    mon: 3 daemons, quorum ns-storage-020100,ns-storage-020101,ns-storage-020102
    mgr: ns-storage-020100(active), standbys: ns-storage-020101, ns-storage-020102
    osd: 18 osds: 18 up, 18 in; 43 remapped pgs

  data:
    pools:   3 pools, 1152 pgs
    objects: 250 objects, 631 MB
    usage:   40579 MB used, 66966 GB / 67006 GB avail
    pgs:     1024 active+clean
             85   active+undersized+degraded
             43   active+clean+remapped

Attempting to repair the PGs with ceph pg repair failed.
Checking the PG state with ceph health detail shows:

# ceph health detail
.......
    pg 3.c is stuck undersized for 6137651.255431, current state active+undersized+degraded, last acting [14,13]
    pg 3.d is stuck undersized for 6137651.146218, current state active+undersized+degraded, last acting [15,13]
.......
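The stuck PGs and their acting sets can be pulled out of the ceph health detail output with a short awk one-liner. This is only a sketch; the sample text below is the two-line excerpt above, embedded so the pipeline can run without a live cluster:

```shell
# Two sample lines from the `ceph health detail` output above.
detail='pg 3.c is stuck undersized for 6137651.255431, current state active+undersized+degraded, last acting [14,13]
pg 3.d is stuck undersized for 6137651.146218, current state active+undersized+degraded, last acting [15,13]'

# Print each stuck PG id with its acting OSD set.
printf '%s\n' "$detail" | awk '/stuck undersized/ {print $2, "->", $NF}'
# 3.c -> [14,13]
# 3.d -> [15,13]
```

Note that both acting sets hold only two OSDs, which matches the undersized state.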

Status analysis

PG state reference:
unclean    - the PG has a fault and has not reached the specified number of replicas
degraded   - some objects in the PG have not yet been replicated the required number of times
undersized - the PG has fewer replicas than the pool's configured replica count
Root cause

In the osd tree below, host ns-storage-020102.vclound.com under the noah root is empty (weight 0): osd.16 and osd.17 were placed under the default root instead. PGs in pools mapped to the noah root can therefore only find OSDs on two hosts, leaving them undersized.

# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                              STATUS REWEIGHT PRI-AFF
-12       16.00000 root noah
 -9        8.00000     host ns-storage-020100.vclound.com
 12   hdd  4.00000         osd.12                             up  1.00000 1.00000
 13   hdd  4.00000         osd.13                             up  1.00000 1.00000
-10        8.00000     host ns-storage-020101.vclound.com
 14   hdd  4.00000         osd.14                             up  1.00000 1.00000
 15   hdd  4.00000         osd.15                             up  1.00000 1.00000
-11              0     host ns-storage-020102.vclound.com
 -1       55.63620 root default
 -2       15.63620     host ns-storage-020100
  0   hdd  3.63620         osd.0                              up  1.00000 1.00000
  1   hdd  4.00000         osd.1                              up  1.00000 1.00000
  2   hdd  4.00000         osd.2                              up  1.00000 1.00000
  3   hdd  4.00000         osd.3                              up  1.00000 1.00000
 -3       16.00000     host ns-storage-020101
  4   hdd  4.00000         osd.4                              up  1.00000 1.00000
  5   hdd  4.00000         osd.5                              up  1.00000 1.00000
  6   hdd  4.00000         osd.6                              up  1.00000 1.00000
  7   hdd  4.00000         osd.7                              up  1.00000 1.00000
 -4       24.00000     host ns-storage-020102
  8   hdd  4.00000         osd.8                              up  1.00000 1.00000
  9   hdd  4.00000         osd.9                              up  1.00000 1.00000
 10   hdd  4.00000         osd.10                             up  1.00000 1.00000
 11   hdd  4.00000         osd.11                             up  1.00000 1.00000
 16   hdd  4.00000         osd.16                             up  1.00000 1.00000     <--- should be under the noah root
 17   hdd  4.00000         osd.17                             up  1.00000 1.00000     <--- should be under the noah root
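The misplacement can also be spotted mechanically: track the most recent root while scanning the ceph osd tree output, and flag any of osd.12 through osd.17 that fall outside the noah root. A sketch over a trimmed copy of the tree above (only the root lines and the relevant osd lines are kept):

```shell
# Trimmed sample of the `ceph osd tree` output above.
tree='-12 root noah
12 osd.12
13 osd.13
14 osd.14
15 osd.15
-1 root default
16 osd.16
17 osd.17'

# Remember the current root; report osd.12-17 seen under any other root.
printf '%s\n' "$tree" | awk '
  $2 == "root" { root = $3; next }
  $2 ~ /^osd\.1[2-7]$/ && root != "noah" { print $2, "is under root", root }'
# osd.16 is under root default
# osd.17 is under root default
```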

Migrate the OSDs to the noah root

# ceph osd crush rm osd.16
removed item id 16 name 'osd.16' from crush map
# ceph osd crush rm osd.17
removed item id 17 name 'osd.17' from crush map
# ceph osd crush add osd.16 4.0 host=ns-storage-020102.vclound.com
add item id 16 name 'osd.16' weight 4 at location {host=ns-storage-020102.vclound.com} to crush map
# ceph osd crush add osd.17 4.0 host=ns-storage-020102.vclound.com
add item id 17 name 'osd.17' weight 4 at location {host=ns-storage-020102.vclound.com} to crush map
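As an aside, ceph osd crush set creates or moves an item in one step, which avoids the brief window above between rm and add where the OSD is absent from the CRUSH map entirely. The commands are only echoed here, since running them requires a live cluster:

```shell
# Print the equivalent one-step relocation commands (not executed here;
# they require a live Ceph cluster).
for id in 16 17; do
  echo "ceph osd crush set osd.$id 4.0 host=ns-storage-020102.vclound.com"
done
# ceph osd crush set osd.16 4.0 host=ns-storage-020102.vclound.com
# ceph osd crush set osd.17 4.0 host=ns-storage-020102.vclound.com
```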

Observe the osd tree:

# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                              STATUS REWEIGHT PRI-AFF
-12       24.00000 root noah
 -9        8.00000     host ns-storage-020100.vclound.com
 12   hdd  4.00000         osd.12                             up  1.00000 1.00000
 13   hdd  4.00000         osd.13                             up  1.00000 1.00000
-10        8.00000     host ns-storage-020101.vclound.com
 14   hdd  4.00000         osd.14                             up  1.00000 1.00000
 15   hdd  4.00000         osd.15                             up  1.00000 1.00000
-11        8.00000     host ns-storage-020102.vclound.com
 16        4.00000         osd.16                             up  1.00000 1.00000
 17        4.00000         osd.17                             up  1.00000 1.00000
 -1       47.63620 root default
 -2       15.63620     host ns-storage-020100
  0   hdd  3.63620         osd.0                              up  1.00000 1.00000
  1   hdd  4.00000         osd.1                              up  1.00000 1.00000
  2   hdd  4.00000         osd.2                              up  1.00000 1.00000
  3   hdd  4.00000         osd.3                              up  1.00000 1.00000
 -3       16.00000     host ns-storage-020101
  4   hdd  4.00000         osd.4                              up  1.00000 1.00000
  5   hdd  4.00000         osd.5                              up  1.00000 1.00000
  6   hdd  4.00000         osd.6                              up  1.00000 1.00000
  7   hdd  4.00000         osd.7                              up  1.00000 1.00000
 -4       16.00000     host ns-storage-020102
  8   hdd  4.00000         osd.8                              up  1.00000 1.00000
  9   hdd  4.00000         osd.9                              up  1.00000 1.00000
 10   hdd  4.00000         osd.10                             up  1.00000 1.00000
 11   hdd  4.00000         osd.11                             up  1.00000 1.00000

Ceph then completes the recovery automatically:

# ceph -s
  cluster:
    id:     7e720238-7ada-4922-ba2e-d9d9a49ac4e4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ns-storage-020100,ns-storage-020101,ns-storage-020102
    mgr: ns-storage-020100(active), standbys: ns-storage-020101, ns-storage-020102
    osd: 18 osds: 18 up, 18 in

  data:
    pools:   3 pools, 1152 pgs
    objects: 250 objects, 631 MB
    usage:   40584 MB used, 66966 GB / 67006 GB avail
    pgs:     1152 active+clean

  io:
    recovery: 341 kB/s, 0 objects/s