[ceph] Abnormal PG state: Degraded data redundancy: 460/77222938 objects degraded (0.001%), 11 pgs degraded

License: CC 4.0 BY-SA


The PG state is abnormal:

    Degraded data redundancy: 460/77222938 objects degraded (0.001%), 11 pgs degraded, 20 pgs undersized
        pg 1.9 is stuck undersized for 67505.094338, current state active+undersized+degraded, last acting [95,60]
        pg 1.a is stuck undersized for 67505.092252, current state active+undersized, last acting [125,26]
        pg 1.b is stuck undersized for 67505.096730, current state active+undersized, last acting [25,125]
        pg 1.c is stuck undersized for 67505.098017, current state active+undersized, last acting [79,166]
        pg 1.e is stuck undersized for 67505.096981, current state active+undersized+degraded, last acting [97,63]
        pg 2.1 is stuck undersized for 66585.680482, current state active+undersized+degraded, last acting [167,74]
        pg 2.7 is stuck undersized for 64373.451329, current state active+undersized+degraded, last acting [119,13]
        pg 2.8 is stuck undersized for 66579.748265, current state active+undersized+degraded, last acting [141,7]
        pg 2.a is stuck undersized for 63851.308379, current state active+undersized+degraded, last acting [85,0]
        pg 2.d is stuck undersized for 63752.075326, current state active+undersized+degraded, last acting [51,115]
        pg 3.a is stuck undersized for 63856.354608, current state active+undersized+degraded, last acting [119,27]
        pg 4.1 is stuck undersized for 67505.096880, current state active+undersized, last acting [30,87]
        pg 4.2 is stuck undersized for 67505.061049, current state active+undersized, last acting [158,74]
        pg 4.6 is stuck undersized for 67505.081249, current state active+undersized, last acting [111,22]
        pg 4.7 is stuck undersized for 67505.093693, current state active+undersized, last acting [95,17]
        pg 4.8 is stuck undersized for 67505.092161, current state active+undersized, last acting [70,150]
        pg 4.9 is stuck undersized for 67505.095216, current state active+undersized+degraded, last acting [107,17]
        pg 4.c is stuck undersized for 67505.097948, current state active+undersized, last acting [127,30]
        pg 4.d is stuck undersized for 67505.086830, current state active+undersized+degraded, last acting [123,2]
        pg 4.f is stuck undersized for 67505.097398, current state active+undersized+degraded, last acting [110,66]

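According to this output, the Ceph cluster's health status is HEALTH_WARN, with a degraded data redundancy warning: 460/77222938 objects are degraded (0.001%), 11 PGs (placement groups) are degraded, and 20 PGs are undersized.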
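The 0.001% figure is simply degraded objects divided by total objects, times 100, rounded for display. The minimal Python sketch below recomputes it from the summary line; the regular expression is an assumption written against the line format shown above, not a documented parsing contract, and degraded_ratio is a hypothetical helper name.

```python
import re

# Summary counter as printed by `ceph health detail`, e.g.
# "Degraded data redundancy: 460/77222938 objects degraded (0.001%), ..."
# (pattern written against the sample above; other releases may differ)
SUMMARY_RE = re.compile(r"(\d+)/(\d+) objects degraded \(([\d.]+)%\)")


def degraded_ratio(summary_line):
    """Return (degraded, total, computed_percent, reported_percent)."""
    m = SUMMARY_RE.search(summary_line)
    if m is None:
        raise ValueError("no 'objects degraded' counter in line")
    degraded, total = int(m.group(1)), int(m.group(2))
    computed = 100.0 * degraded / total
    reported = float(m.group(3))
    return degraded, total, computed, reported


line = ("Degraded data redundancy: 460/77222938 objects degraded (0.001%), "
        "11 pgs degraded, 20 pgs undersized")
degraded, total, computed, reported = degraded_ratio(line)
# Ceph rounds the ratio for display; the unrounded value is smaller.
print(f"{degraded}/{total} = {computed:.6f}% (shown as {reported}%)")
```

The unrounded value is about 0.0006%, so only a tiny fraction of the data is affected, but each of those 460 objects currently has fewer copies than the pool is configured to keep.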

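The warning lists PGs flagged as "stuck undersized", meaning they have stayed undersized for a long time: the number after "for" is the duration in seconds (about 63,752 to 67,505 s, i.e. roughly 17.7 to 18.8 hours). Each line also gives the PG's current state and its last acting set. "active" means the PG is still serving reads and writes; "undersized" means it has fewer copies than the pool's configured replica count (size); "degraded" means some of its objects have not yet been replicated the required number of times. "last acting" lists the OSDs (Object Storage Daemons) currently serving the PG, normally with the primary first. Every acting set here contains only two OSDs, which suggests (the output does not confirm it) that these pools use size 3 and each PG is missing one replica.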
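To make these fields concrete, here is a minimal Python sketch that splits each "stuck" line into its PG ID, pool ID, state flags, stuck duration and acting set. The regular expression is an assumption based only on the lines shown above and may need adjusting for other Ceph releases; StuckPG and parse_stuck_pgs are hypothetical names.

```python
import re
from dataclasses import dataclass

# Pattern for one "stuck" line of `ceph health detail`, written against the
# sample output above (an assumption, not a documented format guarantee).
STUCK_RE = re.compile(
    r"pg (?P<pgid>\d+\.[0-9a-f]+) is stuck (?P<reason>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]*)\]"
)


@dataclass
class StuckPG:
    pgid: str
    pool_id: int        # the part of the PG ID before the dot
    stuck_hours: float  # Ceph prints the stuck duration in seconds
    states: set[str]    # e.g. {"active", "undersized", "degraded"}
    acting: list[int]   # OSD IDs in the acting set, normally primary first


def parse_stuck_pgs(text):
    pgs = []
    for m in STUCK_RE.finditer(text):
        pgs.append(StuckPG(
            pgid=m["pgid"],
            pool_id=int(m["pgid"].split(".")[0]),
            stuck_hours=float(m["secs"]) / 3600,
            states=set(m["state"].split("+")),
            acting=[int(x) for x in m["acting"].split(",") if x],
        ))
    return pgs


sample = """
    pg 1.9 is stuck undersized for 67505.094338, current state active+undersized+degraded, last acting [95,60]
    pg 1.a is stuck undersized for 67505.092252, current state active+undersized, last acting [125,26]
    pg 2.d is stuck undersized for 63752.075326, current state active+undersized+degraded, last acting [51,115]
"""
pgs = parse_stuck_pgs(sample)
for pg in pgs:
    print(pg.pgid, sorted(pg.states), pg.acting, f"{pg.stuck_hours:.1f} h")
print("degraded:", sum("degraded" in pg.states for pg in pgs))
```

Run over the full list above, the same parser should count 20 undersized PGs, 11 of them degraded, matching the summary line.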

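These warnings mean that some PGs in your Ceph cluster are running with fewer replicas than configured, so their data redundancy is reduced. Until the missing copies are restored, those PGs can tolerate fewer additional OSD failures than intended, and the recovery work may also affect performance. To resolve the problem, you can take the following steps: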

Check OSD health: run ceph osd tree and look for OSDs that are down, or that have been marked out (shown with a REWEIGHT of 0).
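On a large cluster (this one has OSD IDs above 160) it can help to filter the tree programmatically. The Python sketch below assumes the JSON shape of ceph osd tree -f json: a "nodes" list whose OSD entries carry "type": "osd", a "status" of "up" or "down", and a "reweight" that is 0 for an out OSD. The sample data and the helper name problem_osds are hypothetical.

```python
import json


def problem_osds(tree_json):
    """Return (name, status, reweight) for OSDs that are down or marked out.

    Assumes the JSON layout of `ceph osd tree -f json`: a "nodes" list whose
    OSD entries have "type": "osd", "status" ("up"/"down") and "reweight"
    (0 when the OSD has been marked out).
    """
    tree = json.loads(tree_json)
    problems = []
    for node in tree.get("nodes", []):
        if node.get("type") != "osd":
            continue  # skip root/host/rack buckets
        status = node.get("status", "unknown")
        reweight = float(node.get("reweight", 0.0))
        if status != "up" or reweight == 0.0:
            problems.append((node["name"], status, reweight))
    return problems


# Hypothetical, trimmed output. On a real cluster, pass the stdout of
# subprocess.run(["ceph", "osd", "tree", "-f", "json"], capture_output=True, text=True).
sample = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2]},
        {"id": -2, "name": "node1", "type": "host", "children": [95, 42]},
        {"id": 95, "name": "osd.95", "type": "osd", "status": "up", "reweight": 1.0},
        {"id": 42, "name": "osd.42", "type": "osd", "status": "down", "reweight": 0.0},
    ],
    "stray": [],
})
for name, status, reweight in problem_osds(sample):
    print(name, status, reweight)
```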

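Check pool status: run ceph osd pool stats to see each pool's client I/O and recovery activity. It does not report capacity; to check whether a pool or an OSD is close to full, use ceph df and ceph osd df.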
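The number before the dot in each PG ID is the pool ID, so the health output already tells you which pools to inspect. This small Python sketch tallies the 20 stuck PGs listed above per pool; per_pool_counts is a hypothetical helper. You can then map pool IDs to names with ceph osd pool ls detail and pass a name to ceph osd pool stats.

```python
from collections import Counter

# The 20 stuck PGs from the health output above:
# PG ID -> True if the current state includes "degraded".
STUCK_PGS = {
    "1.9": True, "1.a": False, "1.b": False, "1.c": False, "1.e": True,
    "2.1": True, "2.7": True, "2.8": True, "2.a": True, "2.d": True,
    "3.a": True,
    "4.1": False, "4.2": False, "4.6": False, "4.7": False, "4.8": False,
    "4.9": True, "4.c": False, "4.d": True, "4.f": True,
}


def per_pool_counts(stuck):
    """Count undersized and degraded PGs per pool ID (the part before the dot)."""
    undersized, degraded = Counter(), Counter()
    for pgid, is_degraded in stuck.items():
        pool_id = int(pgid.split(".")[0])
        undersized[pool_id] += 1
        if is_degraded:
            degraded[pool_id] += 1
    return undersized, degraded


undersized, degraded = per_pool_counts(STUCK_PGS)
for pool_id in sorted(undersized):
    print(f"pool {pool_id}: {undersized[pool_id]} undersized, {degraded[pool_id]} degraded")
```

All four pools (IDs 1 to 4) are affected, and the totals (20 undersized, 11 degraded) match the summary line, which points to a cluster-wide cause such as a missing OSD or host rather than a problem with a single pool.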

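Check the replica count and failure domain: confirm each affected pool's settings with ceph osd pool get <pool> size and ceph osd pool get <pool> min_size, and make sure the CRUSH rule's failure domain (for example host) still has at least size buckets containing OSDs that are up and in. If it does not, CRUSH cannot place every replica and the PGs remain undersized however long you wait.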
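The reasoning in this step can be sketched in Python. The sketch assumes a replicated pool line from ceph osd pool ls detail of the form "pool <id> '<name>' replicated size N min_size M ..." and a simple CRUSH rule whose failure domain is host; the pool name, host names and OSD layout are hypothetical, and the status strings are a simplification (real PG states also depend on peering).

```python
import re

# Assumed shape of a replicated pool line from `ceph osd pool ls detail`
# (pool name and values below are hypothetical):
#   pool 1 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins ...
POOL_RE = re.compile(r"pool (\d+) '([^']+)' \w+ size (\d+) min_size (\d+)")


def parse_pool(line):
    m = POOL_RE.search(line)
    if m is None:
        raise ValueError("unrecognised pool line")
    return {"id": int(m[1]), "name": m[2], "size": int(m[3]), "min_size": int(m[4])}


def replica_status(size, min_size, acting_len):
    """Simplified view of what the replica count alone implies for a PG."""
    if acting_len < min_size:
        return "below min_size: I/O to this PG blocks"
    if acting_len < size:
        return "undersized (still >= min_size, so I/O continues)"
    return "fully replicated"


def hosts_available(osd_to_host, up_osds):
    """Distinct hosts that still have an up OSD (failure domain = host)."""
    return len({osd_to_host[o] for o in up_osds if o in osd_to_host})


pool = parse_pool("pool 1 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins")
print(replica_status(pool["size"], pool["min_size"], acting_len=2))

# Hypothetical layout: three hosts, but the only OSD on host-c is down.
osd_to_host = {0: "host-a", 1: "host-a", 2: "host-b", 3: "host-b", 4: "host-c"}
up = [0, 1, 2, 3]
if hosts_available(osd_to_host, up) < pool["size"]:
    print("CRUSH cannot place", pool["size"], "replicas on distinct hosts")
```

With only two usable hosts and size 3, CRUSH can map just two OSDs per PG, which would produce exactly the two-OSD acting sets seen in the log.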

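Let Ceph restore the missing replicas: ceph pg repair is not a rebalancing command. It repairs PGs that scrubbing has flagged as inconsistent and does not fix undersized PGs. Once the failed OSDs are back up and in (or have been replaced) and CRUSH can map a complete set of OSDs, Ceph recovers and backfills the missing copies automatically. To see why a particular PG is still undersized, run ceph pg <pgid> query (for example ceph pg 1.9 query) and compare its up and acting sets and its recovery_state section.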
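A rough way to read the query result is to compare the length of the "up" set (what CRUSH currently maps) and the "acting" set (what is serving the PG) with the pool size. The Python sketch below assumes those two top-level keys in the JSON from ceph pg <pgid> query and a pool size of 3; query_pg and diagnose are hypothetical helpers, the sample query results are invented, and the hints are heuristics rather than Ceph's own diagnosis.

```python
import json
import subprocess


def query_pg(pgid):
    """Run `ceph pg <pgid> query` (needs an admin keyring) and parse its JSON."""
    result = subprocess.run(["ceph", "pg", pgid, "query", "-f", "json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def diagnose(query, pool_size):
    """Heuristic hint from the "up" and "acting" OSD lists of a PG query."""
    up, acting = query.get("up", []), query.get("acting", [])
    if len(up) < pool_size:
        return "CRUSH maps too few OSDs: check down/out OSDs and the failure domain"
    if len(acting) < pool_size:
        return "CRUSH found enough OSDs: waiting for recovery/backfill to complete"
    return "acting set is complete"


# Hypothetical, trimmed query results; on a cluster use diagnose(query_pg("1.9"), 3).
print(diagnose({"state": "active+undersized+degraded",
                "up": [95, 60], "acting": [95, 60]}, 3))
print(diagnose({"state": "active+undersized+degraded+remapped+backfill_wait",
                "up": [95, 60, 12], "acting": [95, 60]}, 3))
```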

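Monitor and tune the cluster: make sure the cluster has enough compute, storage, and network capacity for its workload, follow recovery progress with ceph -s (or ceph -w), and adjust and optimize as needed.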
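If you prefer a scripted check over watching ceph -s, here is a minimal Python polling sketch. It assumes the "N/M objects degraded" text seen in the summary line above, treats a missing counter as zero, and injects the fetch function so it can be tested without a cluster; watch_recovery and its parameters are hypothetical.

```python
import re
import subprocess
import time

# Assumes the "N/M objects degraded" wording from the health summary above.
DEGRADED_RE = re.compile(r"(\d+)/(\d+) objects degraded")


def degraded_objects(health_text):
    """Degraded object count from `ceph health detail` text, 0 if absent."""
    m = DEGRADED_RE.search(health_text)
    return int(m.group(1)) if m else 0


def ceph_health_detail():
    return subprocess.run(["ceph", "health", "detail"],
                          capture_output=True, text=True, check=True).stdout


def watch_recovery(fetch=ceph_health_detail, interval=60, max_checks=10):
    """Poll until no objects are degraded; return the observed counts."""
    history = []
    for _ in range(max_checks):
        count = degraded_objects(fetch())
        history.append(count)
        if count == 0:
            break  # recovery finished, stop polling
        if len(history) > 1 and count >= history[-2]:
            print(f"no progress: still {count} degraded objects")
        time.sleep(interval)
    return history
```

A count that stays flat across several checks suggests recovery is blocked (for example by the CRUSH or OSD problems above) rather than merely slow.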