Oracle Exadata Disk Replacement: Replacing a Hard Disk Proactively

Latest recommended article published on 2024-03-20 15:56:33

文档搬运工

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/xxzhaobb/article/details/126389951

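This document records in detail how to proactively replace a hard disk in an Exadata storage server in a production environment: identifying the physical disk, LUN, cell disk, and grid disks; the CellCLI and ASM commands used; and how to handle surprises such as pulling the wrong disk or having a disk rejected. The grid disks are dropped safely at the ASM level and a rebalance is run, so that data redundancy is not affected.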

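This procedure has been carried out in a production environment (on a cell node). What follows is a rough record of the process: some commands are taken from the Oracle documentation, and others are the commands that were actually run.

References: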

Maintaining Oracle Exadata Storage Servers

3.3.6 Replacing a Hard Disk Proactively

How to Replace a Hard Drive in an Exadata Storage Cell Server (Hard Failure) (Doc ID 1386147.1)
How to Replace a Hard Drive in an Exadata Storage Cell Server (Predictive Failure) (Doc ID 1390836.1)
Determining When a Hard Disk on an Exadata Server Should Be Replaced (Doc ID 2661785.1; title translated from the Chinese edition of the note)

Exadata ALTER PHYSICALDISK N:N DROP FOR REPLACEMENT is hung (Doc ID 2574663.1)

Exadata Storage software has a complete set of automated operations for hard disk maintenance, when a hard disk has failed or has been flagged as a problematic disk. But there are situations where a hard disk has to be removed proactively from the configuration.

In the CellCLI ALTER PHYSICALDISK command, the drop for replacement option checks if a normal functioning hard disk can be removed safely without the risk of data loss. However, after the execution of the command, the grid disks on the hard disk are inactivated on the storage cell and set to offline in the Oracle ASM disk groups.

The redundancy of the disk group is compromised until the hard disk has been replaced or re-enabled, and the subsequent rebalance completes. This is especially important for disk groups using normal redundancy.

To reduce the risk of having a disk group without full redundancy and proactively replace a hard disk, follow this procedure:

Identify the physical disk and its associated LUN, cell disk, and grid disks

# cellcli -e "list diskmap" | grep 'X:Y'

The output looks similar to the following:

 20:5            KEBTDJ          5                       normal  559G           
    CD_05_exaceladm01    /dev/sdf                
    "DATAC1_CD_05_exaceladm01, DBFS_DG_CD_05_exaceladm01, 
     RECOC1_CD_05_exaceladm01"
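
The wrapped row above is easier to work with once it is split into fields. The following is a minimal sketch in Python, not part of the original procedure; the field names (`physicaldisk`, `serial`, `slot`, and so on) are assumptions inferred from the sample values rather than names reported by CellCLI, and the layout is assumed to match the sample exactly.

```python
import re

# Sample row copied from the output above (wrapped across several lines).
SAMPLE = '''20:5            KEBTDJ          5                       normal  559G
    CD_05_exaceladm01    /dev/sdf
    "DATAC1_CD_05_exaceladm01, DBFS_DG_CD_05_exaceladm01,
     RECOC1_CD_05_exaceladm01"'''

def parse_diskmap_row(text):
    """Split one (possibly wrapped) diskmap row into named fields.

    The field names are assumptions inferred from the sample values.
    """
    flat = " ".join(text.split())  # collapse wrapping and repeated spaces
    m = re.match(r'(\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) "(.*)"$', flat)
    if m is None:
        raise ValueError("unexpected diskmap row layout")
    name, serial, slot, status, size, celldisk, device, grid = m.groups()
    return {
        "physicaldisk": name,
        "serial": serial,
        "slot": int(slot),
        "status": status,
        "size": size,
        "celldisk": celldisk,
        "device": device,
        "griddisks": [g.strip() for g in grid.split(",")],
    }

row = parse_diskmap_row(SAMPLE)
print(row["celldisk"], row["device"], row["griddisks"])
```

The device path and grid disk names extracted this way are the values used in the LUN lookup and the ASM drop that follow.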

Check the LUN information

CellCLI> list lun where deviceName='/dev/sdf'
         0_5     0_5     normal

Drop the grid disks at the ASM level

SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;

Wait for the rebalance to complete

SQL> select * from v$asm_operation;
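
Because v$asm_operation only lists operations that are still active, the wait amounts to repeating the query until it returns no rows. A minimal Python sketch of that loop follows. `fetch_operations` is a hypothetical callable, not part of any Oracle client library: the caller is assumed to supply something that runs the query above and returns the rows as a list. The fake data below stands in for a real database session.

```python
import time

def wait_for_rebalance(fetch_operations, poll_seconds=60, sleep=time.sleep):
    """Poll until the operation view is empty; return the number of polls.

    fetch_operations is a hypothetical caller-supplied function that runs
    "select * from v$asm_operation" and returns the rows as a list.
    """
    polls = 0
    while True:
        rows = fetch_operations()
        polls += 1
        if not rows:            # no rows: no operation is still running
            return polls
        sleep(poll_seconds)

# Stand-in for a real session: two polls show a running rebalance,
# the third returns an empty result.
fake_results = iter([[("REBAL", "RUN")], [("REBAL", "RUN")], []])
polls = wait_for_rebalance(lambda: next(fake_results), sleep=lambda s: None)
print("rebalance finished after", polls, "polls")
```

This sketch only checks for an empty result; a row that stays in an error state would keep it waiting, so the rows are still worth inspecting by hand.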

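Drop the disk for replacement

CellCLI> alter physicaldisk 20:4 serviceled on  -- the older method (turning on the service LED); it has been retired and can no longer be used
CellCLI> ALTER PHYSICALDISK 20:4 DROP FOR REPLACEMENT;   -- this is the command to use, but it hangs; for the fix, see Doc ID 2574663.1 in the references above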

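After the drop for replacement above completes, the disk's LED on the storage cell turns blue. (The cell has an HDD map that shows which slot each disk occupies, but to be certain it is still best to rely on the lit LED.)

Replace the hard disk: pull the old disk out. The official documentation recommends waiting three minutes before inserting the new disk (in practice, we did not wait the full three minutes).

Check the LUN, cell disk, and grid disk information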

CellCLI> list lun lun_name
CellCLI> list celldisk where lun=lun_name
CellCLI> list griddisk where celldisk=celldisk_name

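Confirm that the disks have been added to ASM: the following query should return no rows. If a disk has not been added, add it manually. Normally the LUN, cell disk, and grid disks are created automatically (this can be seen in the cell's alert log).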

SQL> SELECT path,header_status FROM v$asm_disk WHERE group_number=0;

Add the disks to ASM manually

alter diskgroup DATA_ABC add disk 'o/192.168.0.1/DATA_ABC_CD_04_abccel02' rebalance power 4;     
alter diskgroup RECO_ABC add disk 'o/192.168.0.1/RECO_ABC_CD_04_abccel02' rebalance power 4;     
alter diskgroup DBFS_DG  add disk 'o/192.168.0.1/DBFS_DG_CD_04_abccel02'  rebalance power 4;
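
The three statements above follow one pattern, so they can be generated from the grid disk names. The Python sketch below is only an illustration: it assumes grid disk names of the form `<diskgroup>_CD_<nn>_<cell>`, as in the statements above, and takes the disk group to be the text before `_CD_`. The helper name `add_disk_statements` is hypothetical, and the generated SQL should be reviewed before it is run.

```python
def add_disk_statements(cell_ip, griddisks, power=4):
    """Build one ALTER DISKGROUP ... ADD DISK statement per grid disk.

    Assumes names look like <diskgroup>_CD_<nn>_<cell>, so the disk group
    is whatever precedes the first "_CD_".
    """
    statements = []
    for gd in griddisks:
        diskgroup, sep, _ = gd.partition("_CD_")
        if not sep:
            raise ValueError(f"cannot derive disk group from {gd!r}")
        statements.append(
            f"alter diskgroup {diskgroup} add disk "
            f"'o/{cell_ip}/{gd}' rebalance power {power};"
        )
    return statements

statements = add_disk_statements(
    "192.168.0.1",
    ["DATA_ABC_CD_04_abccel02", "RECO_ABC_CD_04_abccel02", "DBFS_DG_CD_04_abccel02"],
)
for s in statements:
    print(s)
```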

Check the rebalance. Done.

Addendum: what if you pull the wrong disk? Put it back in. The official documentation covers this case:

3.3.9 Removing and Replacing the Same Hard Disk

What happens if you accidentally remove the wrong hard disk?

If you inadvertently remove the wrong hard disk, then put the disk back. It will automatically be added back in the Oracle ASM disk group, and its data is resynchronized.

If a disk was inserted into the wrong slot and was rejected, the official documentation says to re-enable it:

3.3.10 Re-Enabling a Hard Disk That Was Rejected

If a physical disk was rejected because it was inserted into the wrong slot, you can re-enable the disk.

Run the following command:

Caution:

The following command removes all data on the physical disk.

CellCLI> ALTER PHYSICALDISK hard_disk_name reenable force

The following is an example of the output from the command:

Physical disk 20:0 was reenabled.

END