
OceanStor

V500R007

Troubleshooting

Issue 17
Date 2021-09-15

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: https://e.huawei.com

Issue 17 (2021-09-15) Copyright © Huawei Technologies Co., Ltd. i


OceanStor
Troubleshooting About This Document

About This Document

Purpose
This document describes the complete process of troubleshooting from the
following aspects: safety precautions, troubleshooting preparations, fault diagnosis
principles and methods, troubleshooting procedure, and methods of
troubleshooting common faults.
The following table lists the product models.

Product Series              Product Model                   Product Version

OceanStor 2000 V5 series    OceanStor 2200 V5               V500R007C73 Kunpeng

                            OceanStor 2600 V5               V500R007C71 Kunpeng

OceanStor 5000 V5 series    OceanStor 5300 V5, 5500 V5,     V500R007C00
                            5600 V5, and 5800 V5            V500R007C10
                                                            V500R007C20
                                                            V500R007C30
                                                            V500R007C50
                                                            V500R007C60
                                                            V500R007C61

                            OceanStor 5300 V5, 5500 V5,     V500R007C60 Kunpeng
                            5600 V5, and 5800 V5            V500R007C70 Kunpeng
                                                            V500R007C71 Kunpeng

OceanStor 6000 V5 series    OceanStor 6800 V5               V500R007C00
                                                            V500R007C10
                                                            V500R007C20
                                                            V500R007C30
                                                            V500R007C50
                                                            V500R007C60
                                                            V500R007C61

                            OceanStor 6800 V5               V500R007C60 Kunpeng
                                                            V500R007C70 Kunpeng
                                                            V500R007C71 Kunpeng

OceanStor 18000 V5 series   OceanStor 18500 V5 and          V500R007C00
                            18800 V5                        V500R007C10
                                                            V500R007C20
                                                            V500R007C30
                                                            V500R007C50
                                                            V500R007C60
                                                            V500R007C61

                            OceanStor 18500 V5 and          V500R007C60 Kunpeng
                            18800 V5                        V500R007C70 Kunpeng
                                                            V500R007C71 Kunpeng

OceanStor 5000F V5 series   OceanStor 5300F V5, 5500F V5,   V500R007C00
                            5600F V5, and 5800F V5          V500R007C10
                                                            V500R007C20
                                                            V500R007C30
                                                            V500R007C50
                                                            V500R007C60
                                                            V500R007C61

                            OceanStor 5300F V5, 5500F V5,   V500R007C60 Kunpeng
                            5600F V5, and 5800F V5          V500R007C70 Kunpeng
                                                            V500R007C71 Kunpeng

OceanStor 6000F V5 series   OceanStor 6800F V5              V500R007C00
                                                            V500R007C10
                                                            V500R007C20
                                                            V500R007C30
                                                            V500R007C50
                                                            V500R007C60
                                                            V500R007C61

                            OceanStor 6800F V5              V500R007C60 Kunpeng
                                                            V500R007C70 Kunpeng
                                                            V500R007C71 Kunpeng

OceanStor 18000F V5 series  OceanStor 18500F V5 and         V500R007C00
                            18800F V5                       V500R007C10
                                                            V500R007C20
                                                            V500R007C30
                                                            V500R007C50
                                                            V500R007C60
                                                            V500R007C61

                            OceanStor 18500F V5 and         V500R007C60 Kunpeng
                            18800F V5                       V500R007C70 Kunpeng
                                                            V500R007C71 Kunpeng

OceanStor 5x10 V5 series    OceanStor 5110 V5               V500R007C30
                                                            V500R007C60
                                                            V500R007C61

OceanStor 5x10F V5 series   OceanStor 5110F V5              V500R007C30
                                                            V500R007C60
                                                            V500R007C61

Intended Audience
This document is intended for:

● Technical support engineers


● Maintenance engineers

Symbol Conventions
The symbols that may be found in this document are defined as follows.

Symbol     Description

DANGER     Indicates a hazard with a high level of risk which, if not
           avoided, will result in death or serious injury.

WARNING    Indicates a hazard with a medium level of risk which, if not
           avoided, could result in death or serious injury.

CAUTION    Indicates a hazard with a low level of risk which, if not
           avoided, could result in minor or moderate injury.

NOTICE     Indicates a potentially hazardous situation which, if not
           avoided, could result in equipment damage, data loss,
           performance deterioration, or unanticipated results.
           NOTICE is used to address practices not related to personal
           injury.

NOTE       Supplements the important information in the main text.
           NOTE is used to address information not related to personal
           injury, equipment damage, and environment deterioration.

Change History
Updates between document issues are cumulative. Therefore, the latest document
issue contains all updates made in previous issues.

Issue 17 (2021-09-15)
This issue is the seventeenth official release.

Optimized some descriptions.

Issue 16 (2021-06-30)
This issue is the sixteenth official release.

Optimized some descriptions.

Issue 15 (2021-01-30)
This issue is the fifteenth official release.

Modified some descriptions.

Issue 14 (2020-11-30)
This issue is the fourteenth official release, which incorporates the following
changes:

Added product model for 2600 V5.

Issue 13 (2020-04-10)
This issue is the thirteenth official release.

Modified some descriptions.

Issue 12 (2020-07-15)
This issue is the twelfth official release.

Modified some descriptions.

Issue 11 (2020-04-10)
This issue is the eleventh official release.
Modified some descriptions.

Issue 10 (2019-12-30)
This issue is the tenth official release.
Modified some descriptions.

Issue 09 (2019-10-30)
This issue is the ninth official release.
Added product models for 6800 V5, 6800F V5, 18500 V5, 18800 V5, 18500F V5,
and 18800F V5 V500R007C60 Kunpeng.

Issue 08 (2019-08-30)
This issue is the eighth official release.
Added product models for OceanStor 5300 V5, 5500 V5, 5600 V5, and 5800 V5
V500R007C60 Kunpeng.
Added product models for OceanStor 5300F V5, 5500F V5, 5600F V5, and 5800F
V5 V500R007C60 Kunpeng.

Issue 07 (2019-06-30)
This issue is the seventh official release.
Modified some descriptions.

Issue 06 (2019-05-15)
This issue is the sixth official release.
Added OceanStor 5110F V5 to product models.

Issue 05 (2019-03-30)
This issue is the fifth official release.
Added OceanStor 5110 V5 to product models.

Issue 04 (2018-12-06)
This issue is the fourth official release.
Modified some descriptions.

Issue 03 (2018-07-30)
This issue is the third official release.
Modified some descriptions.

Issue 02 (2018-01-30)
This issue is the second official release.
Modified some descriptions.

Issue 01 (2017-11-30)
This issue is the first official release.


Contents

About This Document

1 Safety Operation Guide
1.1 Alarm and Safety Symbols
1.2 Safety Precautions for ESD Protection
1.3 Safety Precautions for Laser Protection
1.4 Safety Precautions for Using Fibers
1.5 Safety Precautions for Using Power Cables (Applicable to Japan)
1.6 Safety Precautions for Short Circuit Protection
1.7 Safety Precautions for Operating Equipment
1.8 Safety Precautions for Condensation Prevention

2 Overview
2.1 Requirements for Maintenance Engineers
2.2 Tools and Meters
2.3 Spare Parts
2.4 Fault Levels
2.5 Fault Categories
2.6 Troubleshooting Categories
2.6.1 Common Troubleshooting
2.6.2 Emergency Handling
2.7 Basic Principles
2.8 Troubleshooting Methods
2.8.1 Alarm Analysis Method
2.8.2 Replacement Method

3 Collecting Information and Reporting a Fault
3.1 Collecting Live Network Information
3.2 Collecting Fault Information
3.2.1 (Optional) Checking the Host Information
3.2.2 (Optional) Collecting File System Fault Information
3.2.3 (Optional) Collecting Volume Management Fault Information
3.2.4 (Optional) Collecting Database Fault Information
3.2.5 (Optional) Collecting the HBA Information
3.2.6 Collecting Switch Information
3.2.6.1 Collecting the Ethernet Switch Information
3.2.6.2 Collecting Fibre Channel Switch Information
3.2.6.3 Checking the SFP Information
3.2.7 Collecting Storage System Fault Information
3.2.7.1 Exporting System Data
3.2.7.2 Managing All Events
3.2.7.3 Managing Current Alarms
3.2.7.4 Collecting Fault Information About a Storage System in Abnormal Mode

4 Common Troubleshooting
4.1 Troubleshooting Management Software Faults
4.1.1 Failure to Synchronize the Client Time Zone on the DeviceManager Due to the Browser Obtaining System Time Zone Mechanism
4.1.2 Failure to Log In to the DeviceManager by Entering an IPv6 Address in the Address Box of a Browser Earlier Than Firefox 24.0
4.1.3 Browser SSL Information Is Damaged
4.1.4 Alarm Sound Cannot Be Played on DeviceManager
4.1.5 DeviceManager Has an Interface Input Exception
4.1.6 After the DeviceManager Is Upgraded, a Picture Layout or Display Fault Occurs
4.1.7 Current Alarms or All Events Exported from Internet Explorer 9 Are Deleted
4.1.8 An Exception Occurs When a User Logs In to DeviceManager in Internet Explorer 10
4.1.9 Failure to Log In to the DeviceManager Using a Firefox Web Browser
4.1.10 Slow Loading of SystemReporter on the Chrome Web Browser
4.1.11 The DeviceManager Page Fails to Be Loaded or Is Displayed Incorrectly
4.1.12 Timeout Occurs During Storage System Configuration
4.1.13 Failed to Import the Configuration File
4.1.14 Failed to Access DeviceManager and SystemReporter Using Internet Explorer
4.1.15 The System Responds Slowly When Excessive Maintenance Terminals Are Used to Connect to a Storage System Simultaneously
4.1.16 OceanStor DeviceManager Cannot Be Accessed Correctly Due To Boot Disk or Coffer Disk Faults
4.1.17 Antivirus Scanning Failure
4.1.18 Failed to Install Adobe Flash Player on Windows Server 2012
4.1.19 Failing to Log In to DeviceManager Using a Browser Again After a Timeout
4.1.20 What Can I Do If the Alarm Sound and Quick Start of DeviceManager Do Not Function Properly on Chrome Later Than 55?
4.1.21 Periodically Updated Data Is Abnormal on DeviceManager
4.1.22 Failed to Deselect Items When Display Items Are Customized in SystemReporter
4.2 Troubleshooting Basic Storage Service Faults
4.2.1 Login Failure Through a Serial Port
4.2.2 Failure to Add an iSCSI Link for a Remote Device
4.2.3 Failure to Discover LUNs by an Application Server
4.2.4 LUN Deletion Timeout
4.2.5 Failure to Connect the Storage System to an AIX-Based Application Server for the First Time
4.2.6 The Storage System Does Not Detect the Initiators Provided by an HP-UX Server

5 Emergency Handling
5.1 Emergency Handling Of Hardware Module Faults (Applicable to V500R007)
5.1.1 Controller Failure
5.1.2 Disk Failure
5.1.3 Interface Module Failure
5.1.4 Expansion Module Failure
5.1.5 Fan Module Failure
5.1.6 BBU Failure
5.1.7 Power Module Failure
5.2 Emergency Handling Of Multipathing Software Faults
5.2.1 Failure to Load Multipathing Software on an Application Server
5.2.2 Blue Screen of Death When Multipathing Software Is Being Installed on a Windows-Based Application Server
5.2.3 Controller Failure in a Non-UltraPath Environment
5.2.4 UltraPath Software Unavailable Because Being Isolated by Antivirus Software
5.2.5 Failure to Detect Virtual Disks on a Windows-Based Application Server After the Multipathing Function of Microsoft iSCSI Initiator Is Enabled
5.3 Emergency Handling Of Basic Storage Service Faults
5.3.1 Fibre Channel Link Failure
5.3.2 iSCSI Link Failure
5.3.3 Failure to Log In to a Storage System After CHAP Authentication Is Disabled
5.3.4 Operations in an NFS Share Are Suspended
5.4 Emergency Handling Of Value-added Service
5.4.1 Failure to Delete a Tenant Administrator on the DeviceManager
5.4.2 After the Local Huawei Storage System Is Powered Off Unexpectedly, the File System Created Based on the eDevLUN Is Lost
5.4.3 Status of a Remote Replication Consistency Group Is Invalid
5.4.4 Interrupted Secondary LUN in a Clone
5.4.5 A Mirrored LUN Malfunctions
5.4.6 The Storage System Is Powered Off During NDMP-based Backup or Restore
5.4.7 NetBackup Authentication Fails After a Storage System Restarts
5.4.8 A Message Indicating Expired Password Is Displayed When a Client Is Using a CIFS Share
5.5 Emergency Handling Of Other Faults
5.5.1 A Storage Pool Loses Efficacy
5.5.2 File System Corrupted Due to I/O Processing Timeout
5.5.3 Server syslog-ng Did Not Receive Some Alarm Notifications

A How to Obtain Help
A.1 Preparations for Contacting Huawei
A.1.1 Collecting Troubleshooting Information
A.1.2 Making Debugging Preparations
A.2 How to Use the Document
A.3 How to Obtain Help from Website
A.4 Ways to Contact Huawei

B Glossary
C Acronyms and Abbreviations


1 Safety Operation Guide

This chapter provides safety guidelines for activities such as installation,
maintenance, and troubleshooting. The guidelines cover the safety of both
personnel and equipment. You must follow them to avoid personal injury and
equipment damage.
1.1 Alarm and Safety Symbols
1.2 Safety Precautions for ESD Protection
1.3 Safety Precautions for Laser Protection
1.4 Safety Precautions for Using Fibers
1.5 Safety Precautions for Using Power Cables (Applicable to Japan)
1.6 Safety Precautions for Short Circuit Protection
1.7 Safety Precautions for Operating Equipment
1.8 Safety Precautions for Condensation Prevention

1.1 Alarm and Safety Symbols


When installing or maintaining equipment, follow the precautions indicated by
alarm and safety symbols to prevent personal injury and equipment damage.
Table 1-1 lists the alarm and safety symbols labeled on equipment.

Table 1-1 Alarm and safety symbols labeled on equipment


Symbol                                  Description

ESD Protection Symbol                   Indicates that you must wear an electrostatic
                                        discharge (ESD) wrist strap or glove to avoid
                                        personal injury and equipment damage caused by
                                        electrostatic discharge.

Cabinet Grounding Symbol                Indicates the position of a grounding point.

Disk Swap and Install Warning Symbol    Indicates that you should be cautious when you
                                        swap, install, or pull out a disk.

1.2 Safety Precautions for ESD Protection


When installing or maintaining the equipment, follow the ESD safety precautions
to prevent personal injury and equipment damage.

The ESD protection symbol indicates an electrostatic sensitive area. To prevent
personal injury and equipment damage when operating equipment in this area, wear
an ESD wrist strap, ESD clothing, or ESD gloves. Note the following:

● To prevent electric shock, do not wear an ESD wrist strap when powering on
the equipment.
● To prevent damage to the electrostatic sensitive devices (ESSDs) on the
circuit boards, do not touch devices with bare hands.
● Electronic circuits are prone to electrostatic damage. Wear an ESD wrist strap,
ESD gloves, and ESD clothing when handling disks, especially bare disks. Hold
a disk by its edge.
● An ESD wrist strap only prevents static electricity from the body, so ESD
clothing is also required to prevent static electricity from clothes.
● Before installing or replacing devices, put on an ESD wrist strap, ESD gloves,
and ESD clothing to protect yourself and the equipment from static electricity.
● Use special ESD bags to carry or transport device components.

1.3 Safety Precautions for Laser Protection


When installing or maintaining equipment, follow the laser safety precautions to
ensure the safety of personnel and equipment.

Laser safety risks include:

● Personal injury
● Equipment damage


Personal Injury

DANGER

The laser emitted by an optical module is an invisible infrared ray, which may
cause permanent eye injury. Do not look into the optical module during device
maintenance.

Equipment Damage
To prevent equipment damage when you handle the equipment, follow these
precautions:

● When not in use, the optical interfaces on the equipment and fiber connectors
on fiber jumpers must be covered with dust-proof caps.
● After removing a fiber jumper that connects to an optical interface on the
equipment, cover the optical interface and the fiber jumper connector with
dust-proof caps.
● When performing a hardware loopback test by connecting a fiber jumper to
an optical interface, add an attenuator to prevent excessively strong optical
power from damaging the optical module.
● When using an optical time-domain reflectometer (OTDR), disconnect the
fiber jumper between the peer equipment and the local equipment to prevent
excessively strong optical power from damaging the optical module.
● Do not remove or insert modules connected to fibers unless necessary.

1.4 Safety Precautions for Using Fibers


Use fibers in a safe and correct manner to ensure proper operation of the
equipment and prevent personal injury and equipment damage.

DANGER

The laser beam on an optical interface board or from a fiber may cause eye injury.
Do not look into optical interfaces or fiber connectors during installation and
maintenance.

Cleaning Fiber Connectors and Optical Interfaces


Use special cleaning tools and materials to clean fiber connectors and optical
interfaces. Common tools and materials are as follows:

● Special cleaning solvent (isoamylol is preferred and propyl alcohol is the
next option; ethanol and formalin are forbidden)
● Non-woven lens tissue
● Dedicated compressed gas

● Cotton stick (medical or long fiber cotton)
● Special cleaning roll (used with the preceding special cleaning solvent)
● Magnifier for optical connectors

Replacing Fibers
Use dust-proof caps to cap the connectors of the fibers that are not in use.

1.5 Safety Precautions for Using Power Cables


(Applicable to Japan)
Using power cables safely and correctly can prevent personal injury and ensure
proper device running.

1.6 Safety Precautions for Short Circuit Protection


When installing or maintaining equipment, follow the regulations on operating
tools to avoid short circuits.

NOTICE

● Do not place tools on air intake boards of cabinets. Otherwise, a short circuit
may occur.
● Do not drop screws into a cabinet or the equipment. Otherwise, a short circuit
may occur.

1.7 Safety Precautions for Operating Equipment


When installing and maintaining devices, follow the electrical safety precautions
to prevent personal injury and equipment damage.

Power-on and Power-off

DANGER

● Before checking device installation and cable connections, ensure that the
system power supply is switched off. Otherwise, incorrect or loose cable
connections may result in personal injury or equipment damage.
● To prevent an electric shock, do not wear an ESD wrist strap when powering
on the equipment.

● Do not remove or insert cables and field replaceable units (FRUs) during a
system startup. Otherwise, data loss may occur.
● After you switch off the power supply, wait at least one minute before
switching it back on.
● To avoid disk damage and data loss, do not switch the power supply off while
any disk running indicators are still blinking.

Troubleshooting

DANGER

● Do not touch the connectors of power cables or communication cables.
Otherwise, an electric shock may occur.
● Do not touch devices with bare hands in electrostatic sensitive areas. Wear an
ESD wrist strap, ESD gloves, or ESD clothing to prevent personal injury and
equipment damage.

When you perform troubleshooting, follow these precautions:


● Do not perform troubleshooting during a thunderstorm.
● Ensure that power cables are intact and the grounding measures are safe and
effective.
● Keep the troubleshooting area clean and dry.

1.8 Safety Precautions for Condensation Prevention


Before installing the equipment, ensure that no condensation is on the equipment.
Otherwise, the equipment may fail to be powered on.
If the indoor and outdoor temperature difference is 15°C or more, wait eight hours
after moving the devices into the equipment room before installing them.

NOTE

If the temperature difference cannot be determined, wait one night after moving devices to
the equipment room and then install them.

2 Overview

This chapter describes the necessary qualifications of maintenance engineers, the
tools and spare parts that should be on hand, the troubleshooting process, and the
principles and methods for fault locating.
2.1 Requirements for Maintenance Engineers
2.2 Tools and Meters
2.3 Spare Parts
2.4 Fault Levels
2.5 Fault Categories
2.6 Troubleshooting Categories
2.7 Basic Principles
2.8 Troubleshooting Methods

2.1 Requirements for Maintenance Engineers


This section describes the requirements for maintenance engineers in terms of
professional knowledge and skills.

Professional Knowledge
Maintenance engineers should be familiar with:

● Storage technologies, such as direct-attached storage (DAS),
network-attached storage (NAS), and storage area network (SAN)
● Ethernet technologies
● TCP/IP protocol

Basic Skills
Maintenance engineers should be able to operate:

● Storage devices

● Application servers
● Data transmission devices, such as Ethernet switches, Fibre Channel switches,
and routers

Test Meters
Maintenance engineers should be able to use:

● Optical power meters


● Line testers
● Multimeters

Storage Networking
Maintenance engineers should have knowledge of:

● Basic storage networking modes


● Onsite networking modes
● Running status of onsite devices

Onsite Data Collection


Onsite data is collected periodically and when a fault occurs. Before
troubleshooting a fault, maintenance engineers are required to collect and save
onsite data.

2.2 Tools and Meters


This section lists the tools and meters that maintenance engineers need.

Table 2-1 lists the tools and meters required for troubleshooting.

Table 2-1 Tools and meters required for troubleshooting

Tool or Meter         Used to
Optical power meter   Measure optical power.
Fiber patch cord      Replace a faulty optical fiber.
Binding strap         Bind faulty optical fibers.
Network cable         Replace a faulty network cable.
Line tester           Measure the connectivity of Ethernet cables.
Multimeter            Test electrical parameters.
ESD wrist strap       Protect electrostatic sensitive components against damage
                      caused by electrostatic discharges from human bodies.
ESD bag               Protect components from electrostatic discharges.
ESD clothes           Protect electrostatic sensitive components against damage
                      caused by electrostatic discharges from human body and
                      clothes.
Phillips screwdriver  Loosen or fasten screws.
Label                 Mark devices or cables.

2.3 Spare Parts


This section lists the spare parts that should be prepared by engineers.

After learning the general conditions of the site, engineers may need to bring the
following spare parts to the site:

● Controller
● Backup Battery Unit (BBU)
● Fan module
● Power module
● Expansion module
● Interface module
● Disk
● Optical transceiver
● Fiber patch cord
● Shielded twisted-pair cable
● Serial cable

2.4 Fault Levels


Storage system faults are classified into minor faults, major faults, and critical
faults in terms of fault impact.

Table 2-2 describes storage system fault levels and characteristics.

Table 2-2 Storage system fault levels

Fault Level  Characteristic                                            Handling Measure
Minor        The faults have no adverse impact on storage system       Common troubleshooting
             performance and host services, such as failure to log
             in to OceanStor DeviceManager, failure to log in to or
             activate a serial port.
Major        Host I/Os are delayed or host services are interrupted    Emergency handling
             because storage system performance deteriorates,
             such as interface module faults, failure of an
             application server to load the multipathing software
             after the server is restarted.
Critical     Some production systems are unavailable and the risk      Emergency handling
             of some data loss is high or some data has been lost,
             such as controller faults.
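
The mapping in Table 2-2 can be captured as a small lookup. The following is a minimal Python sketch, not part of the product; the names FaultLevel, HANDLING_MEASURE, and handling_measure are hypothetical and chosen only for illustration.

```python
from enum import Enum


class FaultLevel(Enum):
    """Fault levels from Table 2-2 (the enum itself is illustrative)."""
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


# Handling measure for each fault level, as listed in Table 2-2.
HANDLING_MEASURE = {
    FaultLevel.MINOR: "Common troubleshooting",
    FaultLevel.MAJOR: "Emergency handling",
    FaultLevel.CRITICAL: "Emergency handling",
}


def handling_measure(level: str) -> str:
    """Return the handling measure for a level name such as 'Major'."""
    return HANDLING_MEASURE[FaultLevel(level.strip().lower())]
```

For example, handling_measure("Minor") returns "Common troubleshooting", while both major and critical faults map to emergency handling.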

2.5 Fault Categories


Faults can be divided into storage faults and environment faults in terms of fault
occurrence locations.
● Storage fault: a storage system fault caused by hardware or software. The fault
information can be obtained using the alarm platform of the storage system.
● Environment fault: a software or hardware fault that occurs when data is
transferred from the host to the storage system over a network. Such faults are
caused by network links. The fault information can be obtained from operating
system logs, application program logs, and switch logs.
Table 2-3 describes common faults and symptoms.

Table 2-3 Common faults and symptoms


Category           Fault                           Symptom
Storage fault      Disk fault                      One or more disks are faulty.
                   Power module fault              A power module is faulty.
                   BBU fault                       One or more BBUs are faulty.
                   Controller fault                A controller is faulty.
                   Interface module fault          An interface module is faulty.
                   Fan fault                       A fan module is faulty.
Environment fault  HBA fault                       HBA hardware is faulty.
                   Optical fiber fault             An HBA link is faulty.
                   Switch SFP fault                A front-end link is down.
                   Switch module fault             The interface module of a switch is faulty.
                   Switch fault                    A switch has broken down.
                   Intermittent link disconnection An intermittent link disconnection is caused
                                                   by an HBA, SFP, front-end device, or optical
                                                   fiber.

2.6 Troubleshooting Categories


Troubleshooting refers to measures taken when a fault occurs, including common
troubleshooting and emergency handling.

2.6.1 Common Troubleshooting


This section describes the definition and process of common troubleshooting.

Definition
Common troubleshooting refers to the troubleshooting of faults that have no
adverse impact on storage system performance and host services. The fault level is
minor.

Process
Figure 2-1 shows the troubleshooting flowchart.

Figure 2-1 Troubleshooting flowchart

1. Observe the symptoms and collect the information about the fault.
2. Can the event be queried on the DeviceManager?
   – Yes: Handle the event by referring to the suggestion provided by the
     DeviceManager or Event Reference.
   – No: Search for the troubleshooting case based on the symptoms in
     Troubleshooting. If the case is included in Troubleshooting, troubleshoot
     the fault following the procedure. If it is not included, keep the fault
     environment intact and contact technical support engineers.
3. Is the fault rectified?
   – Yes: End.
   – No: Keep the fault environment intact and contact technical support
     engineers.

Table 2-4 describes operations involved in common troubleshooting.

Table 2-4 Operations involved in troubleshooting

Operation                     Description
Collect fault information.    If a fault occurs, collect information required for
                              troubleshooting so that the fault can be quickly
                              located and rectified. For details about
                              information to be collected, see 3.2 Collecting
                              Fault Information.
Log in to OceanStor           Log in to OceanStor DeviceManager to query
DeviceManager.                the operating status of a storage system and
                              whether an alarm has been generated.
Handle the event by taking    If the event information is queried in OceanStor
the recommended action.       DeviceManager, handle the event. For details,
                              see the recommended action in OceanStor
                              DeviceManager or Event Reference of the
                              corresponding product model.
Locate the fault cause.       Find out the exact cause of the fault from
                              multiple possible causes, using analyzing,
                              comparing, and other possible methods. For
                              details on common fault locating methods, see
                              2.8 Troubleshooting Methods.
Contact Huawei technical      If you cannot rectify the fault, collect fault
support.                      information and contact Huawei technical
                              support.
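
The decisions in Figure 2-1 can be expressed as two small functions. This is a hedged Python sketch of one reading of the flowchart; the function names and the returned wording are hypothetical, and the branch taken when no troubleshooting case is found is an assumption based on the figure.

```python
CONTACT_SUPPORT = "Keep the fault environment intact and contact technical support engineers."


def choose_procedure(event_in_devicemanager: bool, case_in_guide: bool) -> str:
    """Pick the handling route after symptoms and fault information are collected."""
    if event_in_devicemanager:
        # The event can be queried on DeviceManager.
        return "Handle the event as suggested by DeviceManager or the Event Reference."
    if case_in_guide:
        # No event, but a matching case exists in Troubleshooting.
        return "Troubleshoot the fault following the procedure of the matching case."
    # Assumed branch: neither an event nor a documented case is available.
    return CONTACT_SUPPORT


def after_handling(fault_rectified: bool) -> str:
    """Decide what to do once a procedure has been carried out."""
    return "End." if fault_rectified else CONTACT_SUPPORT
```

Keeping the two decisions separate mirrors the flowchart, where the "Is the fault rectified?" check follows whichever procedure was chosen.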

Figure 2-2 shows the troubleshooting flowchart for environment faults.

Figure 2-2 Troubleshooting flowchart for environment faults

1. Application layer: collect host logs (application logs and operating system
   logs).
2. Network layer: check the link status, and then collect switch logs.
3. Storage layer: collect storage system fault information.

Table 2-5 describes operations involved in environment fault troubleshooting.

Table 2-5 Operations involved in environment fault troubleshooting


Operation                 Description
Collecting host logs      Collecting host logs includes collecting application
                          logs and operating system logs.
                          ● Application log: If an application works incorrectly,
                            you can locate and rectify the fault based on
                            alarm information.
                          ● Operating system log: An alarm will be reported if
                            the operating system or host hardware is faulty.
                            You can rectify the fault based on operating
                            system logs.
Checking link status      System faults will occur if network links are down. If
                          a system fault occurs, you need to check whether
                          cables are correctly connected and whether indicators
                          on ports to which cables are connected are normal.
Collecting switch logs    You can check switch status and packet loss on ports
                          based on collected switch information. Then rectify
                          faults accordingly.
Collecting storage system Alarms will be generated if software or hardware of a
fault information         storage system is working incorrectly. You can rectify
                          faults by taking recommended actions in the alarms.

2.6.2 Emergency Handling


This section describes the definition and process of emergency handling.

Definition
Emergency handling refers to the troubleshooting of system or device faults that
occur suddenly to resume operations and reduce loss without delay. The fault level
is major or critical.

Process
As a troubleshooting method, emergency handling must comply with the
following process:

Figure 2-3 Emergency handling process

1. Initiate an emergency plan.
2. Collect fault information.
3. Inform Huawei of the fault.
4. Determine the fault impact.
5. Rectify the fault.
6. Is the fault rectified?
   – No: Contact Huawei technical support.
   – Yes: Check the fault rectification result.
7. Record emergency handling information.

Table 2-6 describes operations involved in emergency handling.

Table 2-6 Operations involved in emergency handling


Operation                 Description
Initiate the emergency    An emergency plan is a preset action plan for
plan.                     accidents. It is initiated upon occurrence of major or
                          critical accidents.
Collect fault             If a fault occurs, collect information required for
information.              troubleshooting so that the fault can be quickly
                          located and rectified. For details about information to
                          be collected, see 3.2 Collecting Fault Information.
Inform Huawei of the      If a critical fault occurs, inform Huawei of the fault
fault.                    immediately and send the collected fault information to
                          Huawei.
Determine the fault       If a fault occurs, determine the fault impact based on
impact.                   the fault symptom to reduce loss caused by the fault.
Rectify the fault.        Rectify the fault and recover host services.
Contact Huawei            If you cannot rectify the fault, collect fault information
technical support.        and contact Huawei technical support.
Record emergency          Record fault information and operations taken during
handling information.     emergency handling.

2.7 Basic Principles


Basic fault locating principles help you exclude useless information and locate
faults.

During troubleshooting, observe the following principles:

● Analyze external factors first, and then internal factors. When locating faults,
consider the external factors first.
– External factor failures include failures in optical fibers, optical cables,
power supplies, and customers' devices.
– Internal factors include disks, controllers, and interface modules.
● Analyze the alarms of higher severities and then those of lower severities. The
alarm severity sequence from high to low is critical alarms, major alarms, and
warnings.
● Analyze common alarms and then uncommon alarms. When analyzing an
event, confirm whether it is an uncommon or common fault and then
determine its impact. Determine whether the fault occurred on only one
component or on multiple components.
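
The severity principle above can be illustrated with a short sort. This is a minimal Python sketch under stated assumptions: the alarm records are plain dictionaries with hypothetical "name" and "severity" keys, and the severity names follow the sequence given in the text (critical, major, warning).

```python
from typing import Dict, List

# Severity sequence from high to low, as stated in the principles above.
SEVERITY_ORDER = {"critical": 0, "major": 1, "warning": 2}


def order_for_analysis(alarms: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return alarms sorted so that higher severities are analyzed first.

    sorted() is stable, so alarms of equal severity keep their input order.
    """
    return sorted(alarms, key=lambda alarm: SEVERITY_ORDER[alarm["severity"].lower()])
```

An alarm list exported in time order would therefore be reviewed with critical alarms first, while the original order is preserved within each severity.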

To improve the emergency handling efficiency and reduce losses caused by
emergency faults, emergency handling must comply with the following principles:

● If a fault that may cause data loss occurs, stop host services or switch services
to the standby host, and back up the service data in time.
● During emergency handling, completely record all operations performed.
● Emergency handling personnel must participate in dedicated training courses
and understand related technologies.
● Recover core services before recovering other services.

2.8 Troubleshooting Methods


The common troubleshooting methods are alarm analysis and replacement.

2.8.1 Alarm Analysis Method


When the system becomes faulty, a lot of alarms are generated. By viewing the
alarms and analyzing performance data, you can determine the type and location
of the faults.

Application Scenario
The alarm analysis method is applicable to locating any faults if alarm
information can be collected.

Application Example
A video service was running properly but suddenly the quality deteriorated. At that
time, an alarm was reported on the management software. The alarm information
specified that a disk had failed. The disk was then replaced and the fault was
rectified.

Summary
The alarm analysis method can help maintenance engineers locate faults and can
be used with other fault locating methods.

2.8.2 Replacement Method


Replacing the component that is suspected to be faulty helps identify which
component is faulty. The component can be a network cable, controller, or
expansion module.

Application Scenario
The replacement method is applicable to locating hardware faults. This method
has no special requirement for maintenance engineers, but requires spare parts to
be prepared in advance.

Summary
The advantages of the replacement method are quick fault location and minimal
requirements for maintenance engineers.

3 Collecting Information and Reporting a Fault

After a fault occurs, collect the basic information, fault information, and storage
device information, and send it to maintenance engineers. This can help
maintenance engineers quickly locate and rectify the fault. Note that the
information collection operations described in this chapter must be authorized by
customers in advance.
3.1 Collecting Live Network Information
3.2 Collecting Fault Information

3.1 Collecting Live Network Information


The information to be collected includes the basic information, fault information,
storage device information, network information, and application server
information.

Collect the types of information specified in Table 3-1 and send the collected
information to maintenance engineers.

Table 3-1 Types of information to be collected

Information Type      Item                      Action
Basic information     Device serial number      Provide the serial number and version of the
                      and version               storage device.
                                                NOTE
                                                You can log in to the DeviceManager and query the
                                                serial number and version of the storage device in
                                                the General area.
                      Customer information      Provide the customer's contact person and
                                                contact means.
Fault information     Occurrence time           Record the time when a fault occurs.
                      Symptom                   Record the symptom when a fault occurs, for
                                                example, an error dialog box or an event
                                                notification.
                      Operations performed      Record the operations performed before a fault
                      before a fault occurs     occurs.
                      Operations performed      Record operations that are performed before
                      after a fault occurs      reporting the fault to maintenance personnel.
Storage device        Hardware module           Record the configuration of the hardware
information           configuration             modules in the storage device.
                      Indicator status          Record status of the storage device indicators,
                                                especially the indicators in orange or red.
                                                For details about indicator status, see the
                                                Product Description of the corresponding product
                                                model.
                      System data               Manually export the operating data, and system
                                                logs of the storage device.
                      Alarms and events         Manually export the alarms and events of the
                                                storage device.
Network information   Connection mode           Describe the networking mode between
                                                application servers and the storage device, for
                                                example, Fibre Channel or iSCSI networking.
                      Switch model              Record the switch models if any switches exist on
                                                the network.
                      Switch diagnosis          Manually export switch diagnosis information,
                      information               including startup configurations, current
                                                configurations, interface information, time, and
                                                system versions.
                      Network topology          Describe the network topology or provide the
                                                network diagram.
                      IP address information    Describe the IP address allocation principle or
                                                provide an IP address allocation list if the iSCSI
                                                networking mode is used.
Application server    Operating system          Record the types and versions of the operating
information           version                   systems installed on the application servers.
                      Port rate                 Record the port rates of the application servers
                                                connected to the storage device. For details
                                                about how to view a port rate, see Help.
                      Operating system logs     View and export the operating system logs.
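
Table 3-1 works well as a checklist. The following Python sketch shows one way to track which items have not yet been collected; the names REQUIRED_ITEMS and missing_items are hypothetical, and the sketch treats every item as required even though some (for example, switch or IP address information) apply only to certain networks.

```python
from typing import Dict, List, Set, Tuple

# Information types and items transcribed from Table 3-1.
REQUIRED_ITEMS: Dict[str, List[str]] = {
    "Basic information": [
        "Device serial number and version",
        "Customer information",
    ],
    "Fault information": [
        "Occurrence time",
        "Symptom",
        "Operations performed before a fault occurs",
        "Operations performed after a fault occurs",
    ],
    "Storage device information": [
        "Hardware module configuration",
        "Indicator status",
        "System data",
        "Alarms and events",
    ],
    "Network information": [
        "Connection mode",
        "Switch model",
        "Switch diagnosis information",
        "Network topology",
        "IP address information",
    ],
    "Application server information": [
        "Operating system version",
        "Port rate",
        "Operating system logs",
    ],
}


def missing_items(collected: Set[str]) -> List[Tuple[str, str]]:
    """Return (information type, item) pairs that are not in the collected set."""
    return [
        (info_type, item)
        for info_type, items in REQUIRED_ITEMS.items()
        for item in items
        if item not in collected
    ]
```

Running missing_items before sending the package to maintenance engineers gives a quick view of what is still outstanding.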

3.2 Collecting Fault Information


If a fault occurs, collect information required for troubleshooting so that the fault
can be quickly located and rectified. Fault information includes information about
file system faults, volume management faults, database faults, and storage system
faults.

NOTE

Before collecting fault information, assess the impact of the fault on services.
Back up data and obtain the required permissions when necessary.

3.2.1 (Optional) Checking the Host Information


Checking the host information includes checking the information about the
configuration, multipathing software, and aggregated disks of a host. This section
describes how to check the host information in different operating systems such
as Windows, Linux, AIX, and HP-UX.

In Windows
You can check whether UltraPath is working properly by viewing information
about physical paths, logical paths, virtual disk properties, performance statistics,
and alarms on the CLI or GUI of UltraPath.

Step 1 Check the physical path status.


In the CMD window, run the upadm show path command to check the status of
a specified or all physical paths, including the working status as well as the
storage system, controller, and HBA to which the physical paths belong.
Step 2 Check the virtual disk information.
In the CMD window, run the upadm show vlun command to check the
information about all LUNs or a specified virtual disk, including VLUN IDs, host
LUN IDs, disk names, VLUN names, VLUN WWNs, VLUN status, capacity, working/
owning controllers, storage device names, storage SNs, logical path IDs, and
working status.
Step 3 Check the performance statistics.
In the CMD window, run the upadm show iostat command to check the
performance statistics information about a storage system or VLUN, including the
IOPS and bandwidth.

Step 4 Check the UltraPath configuration.

In the CMD window, run the upadm show upconfig command to check the
UltraPath configuration.

----End

In Linux
Step 1 Check whether the multipathing software is installed.

Run the rpm -qa | grep UltraPath command to check whether the UltraPath is
installed properly. If the UltraPath information is displayed, UltraPath is installed.

Step 2 Check the physical path status.

Run the upadmin show path command to check the information about a
specified or all physical paths, including physical path IDs, initiator WWNs, storage
system name, owning controllers, target WWNs, physical path status, path
detection type, path detection status, and port type.

Step 3 Check the virtual disk information.

Run the upadmin show vlun command to check the information about all virtual
disks or a specified virtual disk, including VLUN IDs, disk names, VLUN names,
VLUN WWNs, VLUN status, capacity, owning/working controllers, storage device
names, storage SNs, logical path IDs, and working status.

Step 4 Check the logical path status.

Run the upadmin show vlun id=? command to check the information about the
logical path of a VLUN whose ID is specified, including logical path ID, SCSI
address, and path status.

Step 5 Check the UltraPath configuration.

Run the upadmin show upconfig command to check the UltraPath configuration.

----End
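
The Linux checks above can be gathered in one pass. This is a minimal Python sketch, assuming it runs on the Linux host with UltraPath installed; the command strings are the ones given in the steps, while LINUX_CHECKS, run_shell, and collect_host_info are hypothetical names. The logical path check (upadmin show vlun id=?) is left out because it needs a specific VLUN ID.

```python
import subprocess
from typing import Callable, Dict

# Commands taken from the Linux steps in this section.
LINUX_CHECKS: Dict[str, str] = {
    "ultrapath_installed": "rpm -qa | grep UltraPath",
    "physical_paths": "upadmin show path",
    "virtual_disks": "upadmin show vlun",
    "upconfig": "upadmin show upconfig",
}


def run_shell(command: str) -> str:
    """Run a command through the shell and return stdout followed by stderr."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr


def collect_host_info(runner: Callable[[str], str] = run_shell) -> Dict[str, str]:
    """Run every check and return its output keyed by check name."""
    return {name: runner(command) for name, command in LINUX_CHECKS.items()}
```

The runner parameter allows the collection logic to be exercised without UltraPath, for example by passing a function that only echoes the command.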

In AIX
Step 1 Check the physical path status.
1. Run the upadm show phypath command to check the information about a
specified or all physical paths, including physical path IDs, initiator WWNs,
storage system name, owning controllers, target WWNs, physical path status,
path detection status, and port type.
2. Run the upadm start phypathcheck id=? command to check the working
status of a specified physical path.

Step 2 Check the logical path status.

Run the upadm show path command to check the information about all logical
paths or a specified VLUN's logical paths. The information includes VLUN IDs,
logical path IDs, physical path IDs, initiator WWNs, storage system name, owning
controller, target WWNs, logical path status, and port type.

Step 3 Check the virtual disk information.


Run the upadm show vlun command to view the information about a specified or all
VLUNs mapped from the storage system to the application server. The information
includes VLUN IDs, host LUN IDs, disk names, VLUN names, LUN WWNs, VLUN
status, VLUN capacity, owning/active controller, storage system name, and storage
SN.
Step 4 Check the performance statistics.
Run the upadm show iostat command to view the IOPS and bandwidth
information about the storage system or a VLUN.
Step 5 Check the UltraPath configuration.
Run the upadm show upconfig command to check the UltraPath configuration.

----End

In HP-UX
HP-UX 11.31 is delivered with the NMP multipathing software, which is installed
together with the operating system.

Step 1 Check the NMP status and ensure that it is enabled.


Run the scsimgr get_attr -a leg_mpath_enable command to check the NMP
status.
Step 2 Check the disks that the system discovers and check the NMP status of LUNs that
are mapped.
1. Run the ioscan -funNC disk command to check the LUNs that are mapped.
2. Run the scsimgr get_attr -D diskname -a leg_mpath_enable command. In
the command, diskname indicates the name of the device to which the
system allocates the LUNs.
Step 3 Check the disk path information after the NMP takes over the disks.
Run the scsimgr lun_map -D ? and scsimgr get_info -D ? commands to check
the disk path information after the NMP takes over the disks.
bash-4.0# scsimgr lun_map -D /dev/rdisk/disk60
LUN PATH INFORMATION FOR LUN : /dev/rdisk/disk60
Total number of LUN paths = 2
World Wide Identifier(WWID) = 0x633383110030333807b9074d0000010a
LUN path : lunpath39
Class = lunpath
Instance = 39
Hardware path = 0/0/0/7/0/0/1.0x2019333831303338.0x4002000000000000
SCSI transport protocol = fibre_channel
State = ACTIVE
Last Open or Close state = ACTIVE
LUN path : lunpath40
Class = lunpath
Instance = 40
Hardware path = 0/0/0/7/0/0/0.0x2009333831303338.0x4002000000000000
SCSI transport protocol = fibre_channel
State = ACTIVE
Last Open or Close state = ACTIVE
bash-4.0#
bash-4.0# scsimgr get_info -D /dev/rdisk/disk60
STATUS INFORMATION FOR LUN : /dev/rdisk/disk60

Generic Status Information


SCSI services internal state = ONLINE
Device type = Direct_Access
EVPD page 0x83 description code = 1
EVPD page 0x83 description association = 0
EVPD page 0x83 description type = 3
World Wide Identifier (WWID) = 0x633383110030333807b9074d0000010a
Serial number = "s3900develop004723900266"
Vendor id = "HUAWEI "
Product id = "S3900-M200 "
Product revision = "2102"
Other properties = ""
SPC protocol revision = 4
Open count (includes chr/blk/pass-thru/class) = 1
Raw open count (includes class/pass-thru) = 0
Pass-thru opens = 0
LUN path count = 2
Active LUN paths = 2
Standby LUN paths = 0
Failed LUN paths = 0
Maximum I/O size allowed = 2097152
Preferred I/O size = 2097152
Outstanding I/Os = 0
I/O load balance policy = round_robin
Path fail threshold time period = 0
Transient time period = 120
Tracing buffer size = 1024
LUN Path used when policy is path_lockdown = NA
LUN access type = NA
Asymmetric logical unit access supported = No
Asymmetric states supported = NA
Preferred paths reported by device = No
Preferred LUN paths = 0
Driver esdisk Status Information :
Capacity in number of blocks = 4194304
Block size in bytes = 512
Number of active IOs = 0
Special properties =
Maximum number of IO retries = 45
IO transfer timeout in secs = 30
FORMAT command timeout in secs = 86400
START UNIT command timeout in secs = 60
Timeout in secs before starting failing IO = 120
IO infinite retries = false

In the preceding command output, focus on the following content:


● State: indicates the state of a path. Ensure that the states of two paths are
ACTIVE before using them.
● I/O load balance policy: indicates the load balancing mode. By default, if
ALUA is disabled on the storage system, the load balancing mode of the NMP
is round_robin.
● Asymmetric logical unit access supported: indicates that LUNs support the
ALUA mode. If ALUA is enabled, the supported ALUA type will be displayed
(implicit or explicit ALUA).
Step 4 After the NMP takes over disks, find the disk device name and perform read and
write operations.
1. Run the newfs -F vxfs -o largefiles ? command to create a file system.
2. Run the mount diskname mountpoint command to mount the file system to
a specified directory.
3. Run the bdf command to check the read and write information.
bash-4.0# newfs -F vxfs -o largefiles /dev/rdisk/disk60
version 7 layout
2097152 sectors, 2097152 blocks of size 1024, log size 16384 blocks
largefiles supported
bash-4.0#
bash-4.0# mount /dev/disk/disk60 /test/mnt3/
bash-4.0# bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 1048576 314600 728296 30% /
/dev/vg00/lvol1 1835008 364368 1459224 20% /stand
/dev/vg00/lvol8 8912896 1421696 7434296 16% /var
/dev/vg00/lvol7 6553600 3037552 3488696 47% /usr
/dev/vg00/lvol4 524288 20952 499536 4% /tmp
/dev/vg00/lvol6 7864320 3071152 4760808 39% /opt
/dev/vg00/lvol5 114688 37872 76352 33% /home
/dev/vg_try/lv_try00
1228800 2447 1149713 0% /test/mnt1
/dev/vg_try/lv_try01
797845 9 718051 0% /test/mnt2
/dev/disk/disk60 2097152 18006 1949207 1% /test/mnt3

----End

3.2.2 (Optional) Collecting File System Fault Information


If a file system is faulty, collect its fault information for troubleshooting.

Windows File System Fault Information


Step 1 Collect the operating system version of the host.
1. Right-click Computer and choose Properties.
2. On the System page that is displayed, collect the Windows system version.

Step 2 Collect Windows system logs.


1. Right-click Computer and choose Manage.
2. In Event Viewer, select Windows Logs.
3. Select System and collect Windows system fault log information on the left.

----End

AIX File System Fault Information


Step 1 Collect the /var/adm/ras/errlog file.

errlog records software and hardware errors of the system.

----End

ExtX File System Fault Information


Step 1 Run the fdisk -l command to view all disk and partition information in the system.

Step 2 Run the mount command to view the mounting information of the current
operating system.

Step 3 Mount the device to a mount point to collect errors.

Step 4 Run the dd command to dump the first 10 MB of the partition and collect file
system structure information.

Step 5 Collect the /var/log/messages log file of the host.

----End
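Step 4 above does not list the dd arguments. The sketch below shows one possible way to assemble the command for a 10 MB dump. The bs=1M and count values use GNU dd syntax and are an assumption, as are the function name and the example partition and output file.

```python
# Sketch: build a dd command line that copies the first N MB of a partition.
# The argument choice (bs=1M, count=N) is an assumption; the source only
# says that the first 10 MB of the partition is collected.
def build_dd_command(partition, out_file, megabytes=10):
    """Return the dd argument list for dumping the start of a partition."""
    if megabytes <= 0:
        raise ValueError("megabytes must be positive")
    return [
        "dd",
        f"if={partition}",
        f"of={out_file}",
        "bs=1M",
        f"count={megabytes}",
    ]


# Hypothetical partition and output file names, for illustration only.
EXAMPLE = build_dd_command("/dev/sdb1", "/home/sdb1_head.img")
```

The list form can be passed to subprocess.run as is, which avoids shell quoting problems with unusual device names.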

3.2.3 (Optional) Collecting Volume Management Fault Information

When operating systems employ volume management, collect volume
management fault information.

LVM Fault Information of SUSE Operating Systems


Step 1 Log in to the server as user root.
Step 2 Collect files in the /etc/lvm/backup directory to collect Logical Volume Manager
(LVM) configurations.
Step 3 Collect physical volume (PV) status information.
1. Run the pvscan > /home/pvscan command to rescan PVs and redirect the
scanning result to the pvscan file. Then collect the file.
2. Run the pvdisplay > /home/pvdisplay command and redirect the result to
the pvdisplay file. Then collect the file.
Step 4 Collect volume group (VG) status information.
1. Run the vgscan command to view information about all VGs.
2. Run the vgscan > /home/vgscan command to rescan VGs and redirect the
scanning result to the vgscan file. Then collect the file.
3. Run the vgdisplay -v vgname > /home/vgdisplay_vgname command for the
faulty VG and redirect the result to the vgdisplay_vgname file. Then collect
the file.
vgname indicates the name of the faulty VG.
Step 5 Collect logical volume (LV) status information.
Run the lvdisplay > /home/lvdisplay command and redirect the result to the
lvdisplay file. Then collect the file.

----End
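The SUSE LVM steps above run a fixed set of commands and redirect each result to a file under /home. As a rough sketch, the list can be generated for a given VG so that no command is missed. The function names and the VG name vg01 used in the note below are hypothetical.

```python
# Sketch: list the LVM collection commands from Steps 3 to 5 together with
# the files that receive their output.
def lvm_collection_plan(vg_name, out_dir="/home"):
    """Return (command, output file) pairs for the faulty VG."""
    return [
        ("pvscan", f"{out_dir}/pvscan"),
        ("pvdisplay", f"{out_dir}/pvdisplay"),
        ("vgscan", f"{out_dir}/vgscan"),
        (f"vgdisplay -v {vg_name}", f"{out_dir}/vgdisplay_{vg_name}"),
        ("lvdisplay", f"{out_dir}/lvdisplay"),
    ]


def as_shell_lines(plan):
    """Render the plan as shell lines with output redirection."""
    return [f"{command} > {path}" for command, path in plan]
```

For example, as_shell_lines(lvm_collection_plan("vg01")) yields the five redirect commands to run as user root.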

LVM Fault Information of HP-UX Operating Systems


Step 1 Log in to the server as user root.
Step 2 Collect all the vgname.conf files in the /etc/lvmconf directory.
vgname.conf indicates the VG configuration file.
Step 3 Collect disk information.
1. Run the ioscan -funC disk > /home/pvlinks_disk command to list disk
information in physical volume links (PVLinks) and redirect the result to the
pvlinks_disk file. Then collect the file.
2. Run the ioscan -fuNnC disk > /home/NMP_disk command to list disk
information in NMP and redirect the result to the NMP_disk file. Then collect
the file.

Step 4 Collect VG status information.


1. Run the vgscan command to view information about all VGs.
2. Run the vgscan > /home/vgscan command to rescan VGs and redirect the
scanning result to the vgscan file. Then collect the file.
3. Run the vgdisplay -v vgname > /home/vgdisplay_vgname command for the
faulty VG and redirect the result to the vgdisplay_vgname file. Then collect
the file.
vgname indicates the name of the faulty VG.

----End

3.2.4 (Optional) Collecting Database Fault Information


This section describes how to collect fault information of various databases,
including Oracle, SQL Server, and DB2.

Oracle Fault Information


Step 1 Collect the alert log file of an Oracle database.

Ensure that the Oracle database is in the nomount or mount state and run the
show parameter background_dump_dest command. Then obtain the alert log
file in the queried path.

● Oracle 10g: Obtain the alert log file in path $ORACLE_BASE/admin/dbname/bdump.
● Oracle 11g: Obtain the alert log file in path
$ORACLE_BASE/diag/rdbms/dbname/inst_name/trace.

Step 2 Optional: Collect the alert log file of Oracle Automatic Storage Management
(Oracle ASM).
● Oracle 10g: Obtain the alert log file in path $ORACLE_BASE/admin/+asm/bdump
using user oracle.
● Oracle 11gR1: Obtain the alert log file in path $ORACLE_BASE/diag/asm/+asm/trace
using user oracle.
● Oracle 11gR2: Obtain the alert log file in path $ORACLE_BASE/diag/asm/+asm/trace
using user grid.

Step 3 Optional: On the ASM or database instance, run the select group_number,type
from v$asm_diskgroup command to collect ASM disk group information.

----End
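The database alert log locations listed in Step 1 follow a fixed pattern per release. The sketch below builds the expected directory from that pattern. It is only a convenience: the path returned by show parameter background_dump_dest remains the authoritative one. The function name and the example values in the note below are assumptions.

```python
# Sketch: expected alert log directory of an Oracle database, following the
# 10g and 11g patterns given in Step 1.
import posixpath


def alert_log_dir(release, oracle_base, dbname, inst_name=None):
    """Return the alert log directory for release '10g' or '11g'."""
    if release == "10g":
        return posixpath.join(oracle_base, "admin", dbname, "bdump")
    if release == "11g":
        if inst_name is None:
            raise ValueError("inst_name is required for 11g")
        return posixpath.join(
            oracle_base, "diag", "rdbms", dbname, inst_name, "trace"
        )
    raise ValueError(f"release not covered by this section: {release}")
```

With a hypothetical $ORACLE_BASE of /u01/app/oracle, database orcl, and instance orcl1, the 11g call returns /u01/app/oracle/diag/rdbms/orcl/orcl1/trace.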

SQL Server Fault Information


Step 1 Collect the log file of an SQL Server database.
● SQL Server 2005: Choose Management > SQL Server Logs.
● SQL Server 2008: Choose Management > SQL Server Logs. Right-click in the
path and choose View > SQL Server and Windows Log.

Step 2 Run the use master select name,physical_name,size from sys.master_files
command to view the location and size of the database file.

----End

DB2 Fault Information


Step 1 Run the db2support outpath -d db_name -c command. The db2support.zip
package is generated.

outpath indicates the path for saving the result, and db_name indicates the
database name.

----End

Oracle Clusterware Fault Information


Step 1 Collect the log file of Oracle Clusterware.

Run the $CRS_HOME/bin/diagcollection.sh command as user root and collect the
four generated *.gz files.

Step 2 Collect the process and resource status of Clusterware.


● Oracle 11gR2:
– Run the $CRS_HOME/bin/crsctl stat res -t -init >> /home/cluster_info.txt
command as user root to collect the startup information of the
Clusterware process.
– Run the $CRS_HOME/bin/crsctl stat res -t >> /home/cluster_info.txt
command as user root to collect the startup information of Clusterware
resources.
● Oracle 10g or 11gR1:
– Run the ps -ef|grep d.bin >> /home/cluster_info.txt command as user
root to collect information about the Clusterware process.
– Run the $CRS_HOME/bin/crs_stat -t -v >> /home/cluster_info.txt
command as user root to collect information about Clusterware
resources.

----End

3.2.5 (Optional) Collecting the HBA Information


This section describes how to collect the HBA information in different operating
systems such as Windows, Linux, AIX, and HP-UX.

In Windows
Step 1 Right-click Computer and choose Manage. On the Server Manager page that is
displayed, choose Diagnostics > Device Manager to open the device manager.

Step 2 Check the HBA information.


1. Select SCSI and RAID controllers. Right-click an HBA and choose Properties
to go to the HBA properties page.

2. On the page that is displayed, click Driver and select Driver Details to view
the HBA details.

Step 3 Check the connection between the storage system and HBA.
1. Right-click Device Manager and choose View > Devices by connection.
2. Expand the PCI Bus option to check the connection.

----End

In Linux
Step 1 Run the cat /sys/class/scsi_host/hostX/symbolic_name command to check the
vendors and models of HBAs.

Step 2 Collect the version information about HBA drivers.


● Run the modinfo qla2xxx command to check the version of the QLogic HBA
driver.
● Run the modinfo lpfc command to check the version of the Emulex HBA
driver.

----End
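On a host with several HBAs, Step 1 has to be repeated for every hostX entry. The following sketch reads all of them in one pass. The default directory follows the path given in Step 1. Pass a different directory if the attribute is exposed elsewhere on your host. The function name is an assumption.

```python
# Sketch: read the symbolic_name attribute of every hostX entry.
import glob
import os


def read_hba_symbolic_names(base="/sys/class/scsi_host"):
    """Return {hostX: symbolic name} for every readable entry under base."""
    names = {}
    pattern = os.path.join(base, "host*", "symbolic_name")
    for path in sorted(glob.glob(pattern)):
        host = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as handle:
                names[host] = handle.read().strip()
        except OSError:
            # Skip entries that cannot be read.
            continue
    return names
```

An empty result means that no symbolic_name attribute was found under the given directory.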

In AIX
Step 1 Run the lslpp -l | grep fcp command to verify that the HBA drivers are installed
successfully.

Step 2 Run the lsdev -Cc driver command to list HBAs.

Step 3 Run the lscfg | grep fc command to list device names.

Step 4 Run the lscfg -vl fcsX command to view device WWNs. In the command, fcsX
indicates an HBA device ID.

Step 5 Run the lsmcode -r -d fcsX command to view the HBA microcode information.

----End

In HP-UX
Step 1 Run the ioscan -fnC fc command to check the path and name of the device where
the HBAs reside.
Step 2 Run the fcmsutil command to check the HBA details.
fcmsutil /dev/td0
Vendor ID is = 0x00103c
Device ID is = 0x001028
TL Chip Revision No is = 2.3
PCI Sub-system Vendor ID is = 0x00103c
PCI Sub-system ID is = 0x000006
Topology = PRIVATE_LOOP
Local N_Port_id is = 0x000001
Local Loop_id is = 125
N_Port Node World Wide Name = 0x50060b0000010449
N_Port Port World Wide Name = 0x50060b0000010448
Driver state = ONLINE
Hardware Path is = 0/3/0/0
Number of Assisted IOs = 47983
Number of Active Login Sessions = 0
Dino Present on Card = NO
Maximum Frame Size = 960
Driver Version = @(#) libtd.a HP Fibre Channel
Tachyon TL/TS/XL2 Driver B.11.11.09 (AR1201) /ux/kern/ki

----End
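The fcmsutil output above is a list of "name = value" lines, so the fields that matter for fault reporting (WWNs, driver state) can be extracted by script. The following is a minimal sketch based on the sample output shown. The function name and the shortened FCMSUTIL_SAMPLE are illustrative assumptions.

```python
# Sketch: turn fcmsutil "name = value" lines into a dictionary.
FCMSUTIL_SAMPLE = """Vendor ID is = 0x00103c
Topology = PRIVATE_LOOP
N_Port Node World Wide Name = 0x50060b0000010449
N_Port Port World Wide Name = 0x50060b0000010448
Driver state = ONLINE
"""


def parse_fcmsutil(output):
    """Return {field name: value}; a trailing ' is' in the name is dropped."""
    info = {}
    for line in output.splitlines():
        key, separator, value = line.partition("=")
        if not separator:
            continue  # continuation lines carry no '=' and are skipped
        key = key.strip()
        if key.endswith(" is"):
            key = key[:-3]
        info[key] = value.strip()
    return info
```

When reporting a fault, the Driver state and N_Port Port World Wide Name entries are usually the first values to check.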

3.2.6 Collecting Switch Information


This section describes how to collect log information about Ethernet and Fibre
Channel switches.

3.2.6.1 Collecting the Ethernet Switch Information


This section describes how to collect log information about a Cisco switch.

Collecting the Information About a Cisco Switch


Step 1 In the address box of your browser, enter the IP address of the Cisco switch. On
the login page of the Cisco Device Manager software, enter the user name and
password to log in to the switch and go to the management software home page.

Step 2 Click Logs. In the drop-down list, select message....

The Message Log dialog box is displayed.

Step 3 Click Export....

The page for selecting a storage path is displayed.

Step 4 Select a path, enter the file name, and click Save to save the switch log
information.

----End

3.2.6.2 Collecting Fibre Channel Switch Information


This section describes how to collect log information about QLogic and Brocade
switches.

Collecting Information About a QLogic Switch


Step 1 In the address box of your browser, enter the IP address of a QLogic switch and
log in to the QuickTools switch management software.

Step 2 Enter the user name and password and click Add Fabric.

The QuickTools home page is displayed.

Step 3 On the menu bar, click Switch and select Download Support File....

The Download Support File dialog box is displayed.

Step 4 Select a path for saving the log information and type the file name. You are
advised to name the file in the format of switch name_dump_date.tgz. Click
OPEN.

Step 5 In the Download Support File dialog box that is displayed, click Start.

The system starts to export the support information.

Step 6 Wait until the switch information is collected.

----End

Collecting Information About a Brocade Switch


To collect the information about a Brocade switch, you must use an FTP server.
Before performing operations, determine the IP address, user name, and password
of the FTP server.

Step 1 Check the IP address of the Brocade switch and choose a host located on the same
network segment. Enable the FTP function and prepare the directory to save the
switch logs on the FTP server.
Step 2 Use SSH to log in to the switch.
Step 3 Run the supportsave command.
swd77:admin>supportsave
This command collects RASLOG,TRACE,supportShow,core file,FFDC data
and other support information from both active and standby CPs and then transfer
them to FTP/SCP server
or a USB device.This operation can take several minutes.
NOTE:supportSave will transfer existing trace dump file first,the
automatically generate and transfer latest

Step 4 Enter Y. The system prompts you to enter the following information:
1. Host IP or Host Name: IP address of the FTP server
2. User Name: user name of the FTP server
3. Password: password of the FTP server
4. Protocol (ftp or scp): transfer protocol to use
5. Remote Directory: directory that is prepared on the FTP server
Step 5 The system starts to collect the switch information. The information is as follows:
Saving support information for chassis:swd77, module:RAS…
……………………
Saving support information for chassis:swd77, module:CTRACE_OLD…
Saving support information for chassis:swd77, module:CTRACE_NEW…
Saving support information for chassis:swd77, module:FABRIC…
…………
…………

Step 6 Wait until the switch information is collected.

----End

3.2.6.3 Checking the SFP Information


Check whether the working status of the SFP is normal. TX Power and RX Power are
reported in dBm. If the absolute value of TX Power is larger than 6 and less than
or equal to 9, the light-emitting power of the SFP is low and the SFP has a
problem. If the absolute value of RX Power is larger than 9, the link attenuation
exceeds 3 dB and the link quality has a problem.
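The two threshold rules above can be written as a short check. This is a sketch of the rules as stated, not a vendor tool. The function names are assumptions, and the return values "ok" and "not covered" are labels introduced here for ranges the text does not describe explicitly.

```python
# Sketch: apply the TX and RX power rules from this section to dBm readings.
def classify_tx_power(tx_dbm):
    """Classify TX power by its absolute value in dBm."""
    magnitude = abs(tx_dbm)
    if magnitude <= 6:
        return "ok"  # assumption: the rule flags nothing at or below 6
    if magnitude <= 9:
        return "low"  # larger than 6 and at most 9: SFP output is low
    return "not covered"  # the section gives no rule above 9


def rx_link_quality_suspect(rx_dbm):
    """Return True if the absolute RX power in dBm is larger than 9."""
    return abs(rx_dbm) > 9
```

With the Brocade sample readings in this section (TX Power -5.0 dBm, RX Power -6.3 dBm), neither rule flags a problem.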

Brocade Switch
● Run the sfpshow portnum command to check the SFP information about a
port.
YQA-48K-F5C1:sfmon> sfpshow 10/29
Identifier: 3 SFP
Connector: 7 LC
Transceiver: 150c402000000000 100,200,400_MB/s M5,M6 sw Inter_dist
Encoding: 1 8B10B
Baud Rate: 43 (units 100 megabaud)
Length 9u: 0 (units km)
Length 9u: 0 (units 100 meters)
Length 50u: 15 (units 10 meters)
Length 62.5u:7 (units 10 meters)
Length Cu: 0 (units 1 meter)
Vendor Name: AGILENT
Vendor OUI: 00:30:d3
Vendor PN: AFBR-57R5AP
Vendor Rev:
Wavelength: 850 (units nm)
Options: 001a Loss_of_Sig,Tx_Fault,Tx_Disable
BR Max: 0
BR Min: 0
Serial No: A2061204S7
Date Code: 060323
DD Type: 0x68
Enh Options: 0xf0
Status/Ctrl: 0x0
Alarm flags[0,1] = 0x0, 0x0
Warn Flags[0,1] = 0x0, 0x0
Alarm Warn
low high low high
Temperature: 41 Centigrade -10240 25600 -2560 21760
Current: 5.984 mAmps 1.000 14.000 2.000 12.000
Voltage: 3229.3 mVolts 100.0 500.0 100.0 500.0
RX Power: -6.3 dBm (235.5 uW) 0.0 uW 0.0 uW 0.0 uW 0.0 uW
TX Power: -5.0 dBm (316.6 uW) 0.0 uW 6550.0 uW 49.0 uW 1100.0 uW

Cisco Switch
● Run the show interface xxx transceiver details command to check the SFP
information about a port.
YQB-9513-F7HE1# show interface fc1/1 transceiver details
fc1/1 sfp is present
Name is CISCO-AVAGO
Manufacturer's part number is SFBR-5780APZ-CS2
Revision is G2.3
Serial number is AGA14378D9F
FC Transmitter type is short wave laser w/o OFC (SN)
FC Transmitter supports short distance link length
Transmission medium is multimode laser with 62.5 um aperture (M6)
Supported speeds are - Min speed: 2000 Mb/s, Max speed: 8000 Mb/s
Nominal bit rate is 8500 Mb/s
Link length supported for 50/125mm fiber is 50 m
Link length supported for 62.5/125mm fiber is 20 m
Cisco extended id is unknown (0x0)
No tx fault, no rx loss, in sync state, diagnostic monitoring type is 0x68
SFP Diagnostics Information:
----------------------------------------------------------------------------
Alarms Warnings
High Low High Low
----------------------------------------------------------------------------
Temperature 31.27 C 75.00 C -5.00 C 70.00 C 0.00 C
Voltage 3.31 V 3.63 V 2.97 V 3.46 V 3.13 V
Current 6.89 mA 8.50 mA 2.00 mA 8.50 mA 2.00 mA
Tx Power -2.53 dBm 1.70 dBm -14.00 dBm -1.30 dBm -10.00 dBm
Rx Power -3.44 dBm 3.00 dBm -17.30 dBm 0.00 dBm -13.30 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
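The Brocade and Cisco outputs above report TX and RX power in slightly different layouts. The following sketch extracts the current readings from either layout so that they can be compared against the thresholds in this section. The regular expression is derived only from the two sample outputs shown and may need adjusting for other firmware versions. All names are assumptions.

```python
# Sketch: extract current TX/RX power (dBm) from sfpshow or
# "show interface ... transceiver details" output.
import re

POWER_LINE = re.compile(
    r"^\s*(tx|rx) power:?\s+(-?\d+(?:\.\d+)?)\s*dBm",
    re.IGNORECASE | re.MULTILINE,
)

BROCADE_SAMPLE = """RX Power: -6.3 dBm (235.5 uW) 0.0 uW 0.0 uW 0.0 uW 0.0 uW
TX Power: -5.0 dBm (316.6 uW) 0.0 uW 6550.0 uW 49.0 uW 1100.0 uW
"""

CISCO_SAMPLE = """Tx Power -2.53 dBm 1.70 dBm -14.00 dBm -1.30 dBm -10.00 dBm
Rx Power -3.44 dBm 3.00 dBm -17.30 dBm 0.00 dBm -13.30 dBm
"""


def read_power_dbm(output):
    """Return {'tx': value, 'rx': value} using the first dBm figure per line."""
    return {
        kind.lower(): float(value)
        for kind, value in POWER_LINE.findall(output)
    }
```

Only the first dBm figure on each line is taken, because the remaining columns are alarm and warning thresholds rather than readings.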

3.2.7 Collecting Storage System Fault Information


When a storage system is faulty, collect alarm and event information about the
storage system.

3.2.7.1 Exporting System Data


Periodically export the system data of a storage system and save it in a safe place.
This helps you know the operating status of the storage system and prevent the
damage to the storage system caused by system faults and unexpected disasters.
When a system failure occurs, the exported system data can be used to locate and
analyze the failure. The system data to be exported includes running data, system
logs and disk logs.

Context
● Running data indicates the real-time running status of a storage system, such
as LUN configuration information. The running data file is in *.txt format.
● System logs record the information about the running data, events, and
debugging operations on a storage system and can be used for analyzing the
running status of the storage system. The system log file is in *.tgz format.
● The DHA runtime log is the daily runtime log of disks. It mainly includes daily
disk health status and I/O information. The DHA runtime log file is in *.tgz format.
– DHA logs collect the SMART/LogPage (collected at 2 o'clock in the
morning) and I/O statistics (collected every two hours) and generate a
package (1 KB) each day. A disk on a single controller can generate a
maximum of 74 packages within a year (some old log packages will be
deleted during the collection). Packages of a disk on a single controller
and an information file will be exported each time.
– Recommended times of export during routine maintenance are listed in
the following table. The analysis of DHA logs is only performed on
samples instead of all logs. To prevent the analysis of DHA logs from
affecting the entire routine maintenance, take the recommended values
only for reference.
Disk Quantity in an Array    Maximum Times of Export During an Inspection
0 to 200                     ≤3
200 to 500                   ≤4
500 to 1000                  ≤5
1000 to 2000                 ≤6
>2000                        ≤6

● The HSSD log is the working log of HSSDs and includes information such as disk
S.M.A.R.T data. The HSSD log file is in *.tgz format.
Before the download of system logs, DHA runtime logs, or HSSD logs, the system
collects those logs of controllers and shows the collection progress. After all logs
are collected, you can download your desired logs.
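The recommended DHA export limits in the table above can be expressed as a simple lookup. This is a sketch for planning inspections only. The ranges in the table share their boundary values (for example, 200 appears in two rows), so treating each upper bound as inclusive is an assumption, as are the names used.

```python
# Sketch: recommended maximum number of DHA log exports per inspection,
# taken from the table in this section.
EXPORT_LIMITS = [(200, 3), (500, 4), (1000, 5), (2000, 6)]


def max_dha_exports(disk_count):
    """Return the recommended export limit for an array with disk_count disks."""
    if disk_count < 0:
        raise ValueError("disk_count must not be negative")
    for upper_bound, limit in EXPORT_LIMITS:
        if disk_count <= upper_bound:  # assumption: upper bound is inclusive
            return limit
    return 6  # more than 2000 disks
```

As the section notes, these values are for reference only.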

NOTICE

After the system starts collecting system logs, DHA runtime logs, or HSSD logs, you
need to wait for five minutes or download all the collected logs before you collect
and download other logs.

Procedure
Step 1 Log in to DeviceManager.

Step 2 Choose Settings > Export Data.


Step 3 Export data.
● In the Running Data area, click Download. Confirm the information in the
security alert dialog box, select I have read and understood the
consequences associated with performing this operation., and click OK.
The system running data is exported.
● In the System Log area, select Recent logs or All logs, and click Log List.
Confirm the information in the security alert dialog box, select I have read
and understood the consequences associated with performing this
operation., and click OK.
The system starts collecting logs and expands the log list.
● In the Disk Log area, click DHA Runtime Log List. Confirm the information in the
security alert dialog box, select I have read and understood the
consequences associated with performing this operation., and click OK.
The system starts collecting logs and expands the log list.
● In the Disk Log area, click HSSD Log List. Confirm the information in the security
alert dialog box, select I have read and understood the consequences
associated with performing this operation., and click OK.
The system starts collecting logs and expands the log list.

NOTE

● In the System Log area, if Recent logs is selected, the system exports the recent
logs generated up to the current point in time. These logs include the latest
power-on and power-off log and a maximum of six messages logs. If All logs is
selected, the system exports all logs on the current node. Note that historical
messages logs are saved to the /OSM/coffer_log/log/his_debug directory.
● If you export the data using the Internet Explorer browser with the default settings, the
data will be saved in the download path which the user has selected. For example, you
can choose Save > Save as in the displayed file download dialog box and select the
download path in Internet Explorer 9 browser.
● If you export the data using the Firefox browser with the default settings, the data will
be saved in the default download path of the browser. You can choose Tools > Options
and click the General > Browser in the Options dialog box to view the default
download path.
● If you export the data using the Google Chrome browser with the default settings, the
data will be saved in the default download path of the browser. You can choose
Customize and Control Google Chrome > Settings and view the default download
path in the Download Content area of the Settings page.
● When using Chrome to export for the first time, click Allow if the This site is
attempting to download multiple files. Do you want to allow this message?
message is displayed. Otherwise, at the upper right corner of the browser, choose
Customize and control Google Chrome > Settings > Privacy > Content Settings... >
Automatic downloads > Manage exceptions, select Allow in Behaviour, and click
Finished. Then, reopen the web page and you can successfully download multiple files.
Alternatively, delete Block from Behaviour and click Finished. Then, reopen the web
page again and you can download multiple files. In such a case, a message asking
whether to allow multiple files to be downloaded will be displayed.
● If the exported logs cannot be viewed, export the logs again. If the new logs still cannot
be viewed, contact Huawei technical support.

Step 4 Click Close.

----End

3.2.7.2 Managing All Events


The event list on the DeviceManager contains Info alarms and other alarms that
have been cleared from the storage device. You can learn about the operating and
historical status of the storage device by referring to the list.

Precaution
Exported alarms and events are saved as a *.tgz (Save All) or *.xls (Save Selected)
file. Do not change the content of the file.

Procedure
Step 1 Log in to DeviceManager.

Step 2 Choose Monitor > Alarms and Events > All Events.

Step 3 Optional: Set Occurred At to All or Custom based on site requirements.

Step 4 Optional: Set search criteria and click Search to search for desired events.

Step 5 Select an event and handle it by taking actions described in Suggestion.

Step 6 Optional: Export events.

Click Save As > Save All or select the events that you want to export and click
Save As > Save Selected. In the dialog box that is displayed, perform operations
as prompted.

----End

3.2.7.3 Managing Current Alarms


The alarm list on the DeviceManager contains information about existing alarms
on the storage device. You can easily clear a listed alarm by referring to the
detailed description and troubleshooting suggestions on the alarm.

Precaution
Exported alarms and events are saved as a *.tgz (Save All) or *.xls (Save Selected)
file. Do not change the content of the file.

Procedure
Step 1 Log in to DeviceManager.

Step 2 Choose Monitor > Alarms and Events > Current Alarms.

Step 3 Optional: Set Occurred At to All or Custom based on site requirements.

Step 4 Optional: Set search criteria and click Search to search for desired alarms.

Step 5 Select an alarm and handle it by taking actions described in Suggestion.

Step 6 Optional: Clear alarms.


1. In the alarm list, select the alarms that you want to clear and click Clear.
2. In the security alert dialog box that is displayed, click OK.
3. In the Execution Result dialog box that is displayed, click Close.

Step 7 Optional: Export alarms.

Click Save As > Save All or select the alarms that you want to export and click
Save As > Save Selected. In the dialog box that is displayed, perform operations
as prompted.

Step 8 Optional: Click Send Simulated Alarm to simulate the reporting of a fault alarm.

Send this simulated alarm to test the alarm function of the device. If this
simulated alarm already exists, this alarm will be considered invalid after being
resent. Before the test, confirm that this simulated alarm has been manually
cleared. After the test, manually clear the alarm.

----End

Follow-up Procedure
If an alarm appears on the Current Alarms tab page, select the alarm and
diagnose the problem according to its details and repair suggestions.

3.2.7.4 Collecting Fault Information About a Storage System in Abnormal Mode

If a storage system is in abnormal mode, you cannot use the normal OceanStor
DeviceManager page or the CLI to collect fault information about the storage
system. Collect the information from the fault page instead.

Procedure
Step 1 Open Internet Explorer and enter https://ipaddress:8088 in the address box.
ipaddress indicates the IP address of the management network port.
Step 2 Enter your user name and password.
The fault page is displayed.
Step 3 Click Download Log.
The system automatically downloads logs.

Figure 3-1 Download log

----End
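The fault page address used in Step 1 of this procedure is the management network port IP address followed by port 8088. The sketch below assembles it with the standard ipaddress module, which also rejects malformed addresses. The square brackets for IPv6 follow general URL syntax. Whether the fault page is reachable over IPv6 is not stated in this section. The function name and the example addresses in the note below are assumptions.

```python
# Sketch: build the fault page URL from the management port IP address.
import ipaddress


def fault_page_url(ip, port=8088):
    """Return https://<ip>:<port>, bracketing IPv6 addresses."""
    address = ipaddress.ip_address(ip)  # raises ValueError if malformed
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"https://{host}:{port}"
```

For a hypothetical management address 192.168.128.101, the function returns https://192.168.128.101:8088.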

4 Common Troubleshooting

This chapter explains how to troubleshoot common faults.


4.1 Troubleshooting Management Software Faults
4.2 Troubleshooting Basic Storage Service Faults

4.1 Troubleshooting Management Software Faults


This section explains how to troubleshoot management software faults.

4.1.1 Failure to Synchronize the Client Time Zone on the DeviceManager Due to the
Browser Obtaining System Time Zone Mechanism

Log in to the DeviceManager and synchronize the time zone of the client. The
system displays a message indicating that the operation is successful. However,
the time zone information of the client is not synchronized to the device.

Symptom
The system displays a message indicating that the time zone is modified
successfully. However, the time zone information of the client is not synchronized
to the device.

Alarm Information
None

Possible Causes
The time zone of the client is modified after you log in to the DeviceManager.
Therefore, the time zone information of the client is not synchronized to the
device.

Fault Diagnosis

Figure 4-1 Fault diagnosis of the failure of synchronizing the client time zone on
the DeviceManager

[Flowchart] Failed to synchronize the client time zone on the OceanStor
DeviceManager. Is the client time zone modified on the OceanStor DeviceManager?
● If yes, close the browser, re-log in to the OceanStor DeviceManager, and
synchronize the client time zone.
● If no, unknown faults exist. Contact technical support engineers.

Procedure
Step 1 Close the browser window.

Step 2 Reopen the browser, log in to the DeviceManager, and synchronize the time zone
of the client.

Check whether the time zone of the client is synchronized.

● If yes, no further action is required.


● If no, keep the fault environment intact and contact technical support
engineers.

----End

4.1.2 Failure to Log In to the DeviceManager by Entering an IPv6 Address in the
Address Box of a Browser Earlier Than Firefox 24.0

When an IPv6 address is entered in the address box of a browser earlier than
Firefox 24.0 to log in to the DeviceManager, the login fails.

Symptom
When an IPv6 address is entered in the address box of a browser earlier than
Firefox 24.0 to log in to the DeviceManager, the IP address cannot be added to the
exception or trusted site list. As a result, the login fails.

Alarm Information
None

Possible Causes
Firefox versions earlier than 24.0 are incompatible with the DeviceManager when an
IPv6 address is used.

Fault Diagnosis

Figure 4-2 Troubleshooting flowchart for a failure to log in to the DeviceManager
by entering an IPv6 address in the address box of a browser earlier than Firefox
24.0

[Flowchart] Failure to log in to the DeviceManager using an IPv6 address. Is the
browser version earlier than Firefox 24.0?
● If yes, log in to the DeviceManager by entering an IPv6 address in the address
box of Firefox 24.0 or a later version, or Chrome. Or log in to the
DeviceManager using an IPv4 address.
● If no, unknown faults exist. Contact technical support engineers.

Procedure
Step 1 When using an IPv6 address to log in to DeviceManager, you are advised to use
Chrome, or Firefox 24.0 or later, and not a Firefox version earlier than 24.0.
Alternatively, you can use an IPv4 address to log in to DeviceManager.

Step 2 Check whether the login is successful.


● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.

----End
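When many client PCs are involved, the affected browsers can be identified from their user agent strings. The following sketch checks whether a user agent reports a Firefox version earlier than 24.0, which is the condition described in this section. The function names are assumptions, and the user agent strings mentioned in the note below are only examples of the common "Firefox/x.y" token.

```python
# Sketch: detect Firefox versions earlier than 24.0 from a user agent string.
import re


def firefox_version(user_agent):
    """Return (major, minor) if the user agent reports Firefox, else None."""
    match = re.search(r"Firefox/(\d+)(?:\.(\d+))?", user_agent)
    if not match:
        return None
    return (int(match.group(1)), int(match.group(2) or 0))


def affected_by_ipv6_login_issue(user_agent):
    """Return True only for Firefox versions earlier than 24.0."""
    version = firefox_version(user_agent)
    return version is not None and version < (24, 0)
```

A user agent containing Firefox/23.0 is reported as affected. One containing Firefox/24.0, or no Firefox token at all, is not.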

4.1.3 Browser SSL Information Is Damaged


DeviceManager is accessed over HTTPS (HTTP + SSL). The browser and the
DeviceManager server negotiate a symmetric key to encrypt and decrypt data.
Therefore, if the key information of the browser is damaged, the server cannot
decrypt the transmitted data, and the browser fails to obtain the correct data.

Symptom
The possible symptoms after logging in to the DeviceManager:
● Device status remains offline.
● Device time has stopped.
● After you click the content area left of the navigation bar, it loads very
slowly.
● Press F5 to refresh the page. The system prompts that the web page cannot
be displayed.

Alarm Information
None

Possible Causes
An agent or another program has damaged the negotiated symmetric key information
of the current browser tab page.

Fault Diagnosis

Figure 4-3 Flowchart for locating the cause for browser SSL information damage

[Flowchart] Browser SSL information is damaged. Is the problem rectified after
the page is reloaded?
● If yes, the procedure ends.
● If no, clear the browser cache, open the browser again, and log in to the
DeviceManager. Is the fault rectified?
– If yes, the procedure ends.
– If no, the fault is unknown. Contact technical support engineers.

Procedure
Step 1 Copy the address path of the damaged tab page.

Step 2 Close this tab page.

Step 3 Open a tab page again and enter the copied tab page path. Press Enter to visit
DeviceManager again. Check whether the operation is successful.
● If yes, the problem is resolved.
● If no, go to Step 4.

Step 4 If the tab page problem remains, clear the browser cache and close the browser.

Step 5 Open the browser again and log in to the DeviceManager. Check whether the
operation is successful.
● If yes, the problem is resolved.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

4.1.4 Alarm Sound Cannot Be Played on DeviceManager


A user logs in to the DeviceManager on Chrome, and an alarm is generated.
However, no alarm sound is played.

Symptom
● Symptom 1: A user logs in to the DeviceManager on Chrome (earlier than
Chrome 57) running on a Windows XP system, and the alarm sound is
enabled, as shown in Figure 4-4. However, no alarm sound is played when an
alarm is generated.

Figure 4-4 Alarm sound is enabled

● Symptom 2: A user cannot enable the alarm sound on Chrome 57 to 71. The
browser reports that the Adobe Flash Player was blocked, as shown in Figure
4-5.

Figure 4-5 Plug-in being blocked

Alarm Information
None

Possible Causes
● The Flash Player of Chrome earlier than 57 has a compatibility issue with
Windows XP.
● The Adobe Flash Player of Chrome 57 to 71 is outdated and blocked by
Chrome.

Procedure
● The Flash Player of Chrome earlier than 57 has a compatibility issue with
Windows XP.
a. Download the Flash Player from Adobe's official website and install it.
b. Restart Chrome, type chrome://plugins in the address box, and press
Enter. The following page is displayed, as shown in Figure 4-6. If the
Flash Player is installed correctly, two Flash Player plug-ins are displayed.
The one that ends with pepflashplayer.dll in Location is Chrome's built-
in Flash Player, while the other is the newly installed one.

Figure 4-6 Flash Player plug-ins

c. Disable Chrome's built-in Flash Player by clicking Disable, and enable the
newly installed one, as shown in Figure 4-7.

Figure 4-7 Enabling Flash Player

d. Press F5 to refresh the DeviceManager. On the home page, click the bell
button to enable the alarm sound function.
e. Check whether the system plays the alarm sound correctly.

▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact Huawei technical support.
● The Adobe Flash Player of Chrome 57 to 71 is outdated and blocked by
Chrome.
a. Click Update plugin or Run this time.
b. Check whether the system plays the alarm sound correctly.

▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact Huawei technical support.

4.1.5 DeviceManager Has an Interface Input Exception


After a user logs in to DeviceManager, characters cannot be input on the interface
or other interface input exceptions occur.

Symptom
After a user logs in to DeviceManager by using a browser, characters cannot be
input on the interface or other interface input exceptions occur. For example, after
the Create Storage Pool dialog box is displayed, a user inputs a capacity value by
using the input method. The value cannot be input or an input exception occurs.

Alarm Information
None

Possible Causes
Possible causes are as follows:
● The browser input status is abnormal. For example, the shortcut key,
intelligent word selection function, or intelligent statistics function of the
input method triggers the setting page of the input method.
● The input method is set to the full-width state.
● The input method is incompatible with the browser. An unknown bug is
reported during the input process.

Fault Diagnosis

Figure 4-8 DeviceManager flowchart for troubleshooting an interface input exception
Flowchart summary:
● Start: DeviceManager has an interface input exception.
● Is the setting page of the input method opened? If yes, close the setting page.
If no, continue.
● Is the input method set to the full-width state? If yes, switch the input method
to the half-width state. If no, continue.
● Is the exception solved after the input method is switched to another? If yes,
end. If no, the fault is unknown. Contact technical support engineers.

Procedure
Step 1 Check whether the setting page or a sub-page of the input method is opened. If
the setting page or a sub-page of the input method is opened, close the setting
page or sub-page.
After the operation is complete, check whether the exception is solved.
● If yes, no further action is required.
● If no, go to Step 2.
Step 2 Check whether the input method is set to the full-width state. If the input method
is set to the full-width state, switch the input method to the half-width state.
After the operation is complete, check whether the exception is solved.
● If yes, no further action is required.

● If no, go to Step 3.
Step 3 Switch the input method to another input method.
After the operation is complete, check whether the exception is solved.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

4.1.6 After the DeviceManager Is Upgraded, a Picture Layout or Display Fault Occurs

After logging in to DeviceManager through a browser, a user cannot obtain the
latest static resources such as pictures, sample files, and JavaScript (JS) files. The
user needs to manually clear the browser's cache.

Symptom
After a user logs in to DeviceManager through a browser, a layout fault or picture
display fault occurs.

Alarm Information
None

Possible Causes
Because the browser's cache is not cleared, the user cannot obtain the latest static
resources.

Fault Diagnosis

Figure 4-9 Flowchart for locating a layout fault or picture display fault
Flowchart summary:
● Start: A layout fault or picture display fault occurs.
● Is it the initial login after the DeviceManager is upgraded? If yes, clear the
browser's cache. If no, unknown faults exist. Contact technical support engineers.

Procedure
Step 1 Check whether it is the initial login after the DeviceManager is upgraded.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Clear the browser's cache. For Internet Explorer, Firefox, or Chrome, press
Ctrl+Shift+Del while the browser window is active.
The page for clearing historical records is displayed. Select items that you want to
delete, and delete them. (For Internet Explorer, select the item of Internet
temporary files. For Chrome and Firefox, select the cache item.)
After the operation is complete, check whether the fault is solved.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

4.1.7 Current Alarms or All Events Exported from Internet Explorer 9 Are Deleted

A user employs Internet Explorer 9 (9.0.8112.16421) to log in to the
DeviceManager and export Current Alarms or All Events to the user's local
computer. When the user clicks Open or Save, Current Alarms or All Events are
occasionally deleted.

Symptom
A user employs Internet Explorer 9 (9.0.8112.16421) to log in to the
DeviceManager and export Current Alarms or All Events to the user's local
computer. When the user clicks Open or Save, Internet Explorer 9 displays a
message indicating that Current Alarms or All Events are removed or deleted.

Alarm Information
None

Possible Causes
Current Alarms or All Events exported from Internet Explorer 9 (9.0.8112.16421)
are occasionally deleted because the security configuration of Internet Explorer 9
is incorrect.

Fault Diagnosis

Figure 4-10 Flowchart for locating a deletion fault of Current Alarms or All
Events exported from Internet Explorer 9
Flowchart summary:
● Start: Current Alarms or All Events exported from IE9 are deleted.
● Is it IE9 (9.0.8112.16421)? If yes, replace Internet Explorer 9 with another
browser or use Internet Explorer of another version. If no, the fault is unknown.
Contact technical support engineers.

Procedure
Step 1 Replace Internet Explorer 9 with another browser or use Internet Explorer of
another version.

Then, export Current Alarms or All Events, and check whether the fault is
rectified.

● If yes, no further action is required.

● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

4.1.8 An Exception Occurs When a User Logs In to DeviceManager in Internet Explorer 10

Symptom
When a user logs in to DeviceManager in Internet Explorer 10, the following
exception occurs:
● A blank web page is displayed or the user cannot click on the web page.
● After the user presses F12 to open Developer Tools of Internet Explorer 10, the
following message is displayed:
SCRIPT7002: XMLHttpRequest: Network Error 0x2f7d.

Alarm Information
None

Possible Causes
Web browsers access DeviceManager through HTTPS. Errors occur in the
certificate chain of Internet Explorer 10. This is an inherent problem of Internet
Explorer 10.

Procedure
Press F5 to refresh the web page.
Then check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

Suggestion and Summary


None

4.1.9 Failure to Log In to the DeviceManager Using a Firefox Web Browser

After a user has imported a digital certificate to the DeviceManager or rolled back
the digital certificate to the factory defaults, the user fails to log in to the
DeviceManager using the Firefox web browser on which the cache has not been
refreshed.

Symptom
After a user has imported a digital certificate or rolled back the digital certificate
to the factory defaults, the user fails to log in to the DeviceManager using the
Firefox web browser. The This Connection Is Untrusted page is displayed, and the
Add Exception button is unavailable, as shown in the following figure.

Figure 4-11 This Connection Is Untrusted page

Alarm Information
None

Possible Causes
The cache of the Firefox web browser is not refreshed.

Procedure
In the Firefox web browser, press Ctrl+Shift+Del and clear the browsing history as
prompted.

After the operation is complete, restart the Firefox web browser and check
whether the fault is rectified.

● If yes, no further action is required.


● If no, keep the fault environment intact and contact technical support
engineers.

Suggestion and Summary


None

4.1.10 Slow Loading of SystemReporter on the Chrome Web Browser

If the Chrome web browser has been upgraded, web pages previously cached by
the earlier version of Chrome remain in the browser cache. This causes slow
loading or no loading of SystemReporter.

Symptom
A user logs in to SystemReporter using Chrome or DeviceManager. However, the
loading is slow and SystemReporter cannot be used.

Possible Causes
Web pages previously cached by the earlier version of Chrome are still in the
browser cache.

Procedure
Step 1 Open the Chrome web browser and press Ctrl+Shift+Delete.

Step 2 Clear the browsing history as prompted.

Step 3 Log in to SystemReporter and check whether the fault is rectified.


● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

4.1.11 The DeviceManager Page Fails to Be Loaded or Is Displayed Incorrectly

The DeviceManager page fails to be loaded or is displayed incorrectly, and the
storage system cannot be managed.

Symptom
● The DeviceManager page is being loaded or is displayed incorrectly.
● Log in to the DeviceManager and go to the device view. When the device view
is being loaded, click other navigation paths in succession to switch to other
pages for several times. The tab page of the web browser breaks down
occasionally.

Alarm Information
None

Possible Causes
● The network is faulty, so the page fails to be loaded occasionally.
● The web browser is incompatible with the storage system, so the page fails to
be loaded occasionally.
● The cache data of the web browser is abnormal.

Procedure
Step 1 Press F5 to reload the page or press the Reload button on the current tab page.

Check whether the page is successfully loaded.

● If yes, no further action is required.


● If no, go to Step 2.

Step 2 Open the browser and press Ctrl+Shift+Delete. Clear the browsing history as
prompted.

Log in to DeviceManager again and check whether the page is successfully loaded.

● If yes, no further action is required.


● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

4.1.12 Timeout Occurs During Storage System Configuration


When a storage system is being configured, timeout occurs.

Symptom
When a storage system is being configured, a message is displayed stating The
communication is abnormal or the system is busy. Please try again later.

Possible Causes
● The communication link is down, so the configuration fails or the returned
result is lost.
● The storage system is processing a system fault or abnormality, so it cannot
run a command.

Procedure
Step 1 Check whether the configuration has taken effect.
● If yes, go to Step 2.
● If no, go to Step 3.

Step 2 Perform another configuration operation.

Step 3 Check the alarm and log information and remove the system fault or abnormality.

Step 4 Run the command after the system is recovered and check whether the command
is successfully executed.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End
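
If the configuration is driven by a script rather than entered by hand, the important point of Step 1 is to verify the result before repeating the operation, because the configuration may have taken effect even though the reply was lost. The following is a minimal Python sketch of that idea, not a Huawei interface: apply_config and config_in_effect are hypothetical callables supplied by the caller, and the fixed delay only stands in for the manual fault handling described in Step 3.

```python
import time
from typing import Callable


def apply_with_verification(
    apply_config: Callable[[], None],
    config_in_effect: Callable[[], bool],
    retries: int = 3,
    delay_s: float = 5.0,
) -> bool:
    """Apply a configuration, verifying the result before every retry.

    Both callables are hypothetical placeholders: apply_config issues the
    configuration request and may raise TimeoutError; config_in_effect
    queries the system and returns True if the change is visible.
    """
    for attempt in range(retries):
        try:
            apply_config()
        except TimeoutError:
            # The reply may have been lost although the change was applied.
            pass
        if config_in_effect():
            return True
        if attempt < retries - 1:
            time.sleep(delay_s)
    return False
```

Checking the result after a timeout avoids issuing the same configuration twice when only the returned result was lost.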

4.1.13 Failed to Import the Configuration File


When the primary controller malfunctions or is being restarted, importing the
configuration file may fail. You can retry later as prompted.

Symptom
The configuration file fails to be imported. A message is displayed stating Receive
message failed.

Possible Causes
● The primary controller malfunctions and cannot process any services.
● The primary controller is being restarted and cannot process any services.

Procedure
Step 1 Check whether the primary controller status is normal.
● If yes, keep the fault environment intact and contact technical support
engineers.
● If no, go to Step 2.

Step 2 Wait until the primary/secondary switchover is complete (for example, wait
10 seconds) and then import the configuration file again.

Check whether the fault is rectified.

● If yes, no further action is required.


● If no, keep the fault environment intact and contact technical support
engineers.

----End
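
When the import is automated, the wait in Step 2 can be expressed as polling with a deadline instead of a fixed pause. The sketch below is illustrative only: the condition callable is a hypothetical check (for example, one that reports whether the primary controller status is normal), and the 10-second interval simply mirrors the example value given in Step 2.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 10.0,
) -> bool:
    """Poll condition until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A caller would import the configuration file only after wait_until returns True, and escalate to technical support if it returns False.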

4.1.14 Failed to Access DeviceManager and SystemReporter Using Internet Explorer

In Windows, DeviceManager and SystemReporter cannot be accessed using
Internet Explorer.

Symptom
● The login pages of DeviceManager and SystemReporter are blank.
● The login page of SystemReporter is displayed, but nothing is displayed after
Login is clicked.

Possible Causes
The Internet Explorer browser on the operating system of the server is configured
for enhanced security.

DeviceManager and SystemReporter do not support Secure Sockets Layer 3.0 (SSLv3).

Fault Diagnosis

Figure 4-12 Troubleshooting flowchart
Flowchart summary:
● Start: Failed to access DeviceManager and SystemReporter using Internet Explorer.
● Can they be correctly accessed using Chrome or Firefox? If yes, modify the
security enhancement settings of the Internet Explorer browser. If no, contact
technical support engineers.

Procedure
Step 1 Open an Internet Explorer browser.

Step 2 Choose Tools > Internet Options.

The Internet Options page is displayed.

Step 3 Click Advanced.


1. In the Settings area, deselect Use SSL 2.0 and Use SSL 3.0 and select Use
TLS 1.1 and Use TLS 1.2.
2. Click Apply.

Step 4 Click Security.


1. Select Trusted sites and click Sites.
The Trusted sites dialog box is displayed.

2. Add the websites of DeviceManager and SystemReporter, for example,
https://IP:Port, to the trusted sites.
3. Click Close.
The Internet Options page is displayed.
Step 5 Click Apply.
Step 6 Restart the Internet Explorer browser and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End
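
To confirm from the maintenance terminal, independently of Internet Explorer, that the management service negotiates TLS rather than SSLv3, a probe along the following lines could be used. This is a minimal sketch based on Python's standard ssl module, not a Huawei tool. The host and port arguments are placeholders for the https://IP:Port address, certificate verification is disabled on the assumption that the device certificate may be self-signed, and the sketch requires TLS 1.2, the newer of the two versions enabled in Step 3.

```python
import socket
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Client context that refuses SSLv2/SSLv3 and requires TLS 1.2 or later."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Assumption: the management certificate may be self-signed, so
    # verification is turned off for this diagnostic probe only.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def negotiated_tls_version(host: str, port: int, timeout_s: float = 5.0) -> str:
    """Return the protocol version agreed with host:port."""
    ctx = build_tls_context()
    with socket.create_connection((host, port), timeout=timeout_s) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version() or "unknown"
```

If the handshake succeeds, negotiated_tls_version returns a string such as "TLSv1.2". If it raises ssl.SSLError, the two sides could not agree on a protocol version, which points to the protocol settings rather than to the trusted sites list.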

4.1.15 The System Responds Slowly When Excessive Maintenance Terminals Are Used to Connect to a Storage System Simultaneously

Symptom
On the maintenance terminal, enter the correct user name and password to log in
to the DeviceManager. The system responds slowly.

Alarm Information
None

Possible Causes
More than 3 maintenance terminals are used simultaneously to connect to the
storage system.
Conclusion: Excessive maintenance terminals are used, causing the system
response to become slow.

Procedure
Reduce the number of maintenance terminals that are simultaneously used to
connect to the storage system.

Suggestion and Summary


None

4.1.16 OceanStor DeviceManager Cannot Be Accessed Correctly Due To Boot Disk or Coffer Disk Faults

Symptom
The DeviceManager of a storage system cannot be accessed correctly, but the storage
system can be accessed using the command-line interface (CLI) or the DeviceManager
of other controllers. Meanwhile, controller or coffer disk fault alarms are
reported in the storage system.

Possible Causes
● In multi-controller scenarios, the boot disk of the current controller cannot be
detected.
● In single-controller scenarios, the coffer disk of the current controller cannot
be detected.

Procedure
Contact technical support engineers.

4.1.17 Antivirus Scanning Failure

Symptom

After logging in to OceanStor DeviceManager and choosing Settings > Anti-Virus,
a user finds a large number of antivirus scanning failures.

Possible Causes
● The antivirus server has not been added into an antivirus domain, causing
access and scanning failures.
● Antivirus Agent Watchdog of the antivirus server has not been started using
an antivirus domain user, causing access and scanning failures.
● The antivirus software on the antivirus server has not been started using an
antivirus domain user, causing access and scanning failures.
● Some files to be scanned (for example, EXCEL) are opened exclusively by
some software, causing scanning failures. This problem is normal and requires
no handling.
● The files to be scanned are deleted. Temporary files are generated when a
user uses some editors, such as vi, to edit files. These files are deleted when
the user exits the vi editor, causing scanning failures. This problem is normal
and requires no handling.
● Antivirus software has vulnerabilities, which cause scanning failures.
● The antivirus domain user has not been added to the antivirus group, causing
access and scanning failures.

Procedure
● Cause 1: The antivirus server has not been added into an antivirus domain,
causing access and scanning failures.
a. On the antivirus server, right-click Computer and choose Properties.
b. In Computer name, domain, and workgroup settings, click Change
settings.
c. On the Computer Name tab page, click Change.

d. In Member of, select Domain and enter the full domain name.
e. Click OK. Check whether the scanning is successful.

▪ If yes, no further action is required.

▪ If no, go to Cause 2.


● Cause 2: Antivirus Agent Watchdog of the antivirus server has not been
started using an antivirus domain user, causing access and scanning failures.
a. On the antivirus server, choose Start > Run and enter services.msc to
open the Services (Local) console.
b. Right-click Antivirus Agent Watchdog, and choose Properties.
c. Click the Log On tab. In the Log on as: area, enter the domain user name
and password of the antivirus group and click OK. Check whether the
scanning is successful.

▪ If yes, no further action is required.

▪ If no, go to Cause 3.


● Cause 3: The antivirus software on the antivirus server has not been started
using an antivirus domain user, causing access and scanning failures.
a. On the antivirus server, choose Start > Run and enter services.msc to
open the Services (Local) console.
b. Right-click a background antivirus software service and choose
Properties.
c. Click the Log On tab. In the Log on as: area, enter the domain user name
and password of the antivirus group and click OK. Check whether the
scanning is successful.

▪ If yes, no further action is required.

▪ If no, go to Cause 4.


● Cause 4: Some files to be scanned (for example, EXCEL) are opened
exclusively by some software, causing scanning failures.
a. Check whether files (for example, EXCEL) are opened exclusively for
editing by using editors. Close the editors and check whether the
scanning is successful.

▪ If yes, no further action is required.

▪ If no, go to Cause 5.


● Cause 5: Antivirus software has vulnerabilities, which cause scanning failures.
a. Restart the antivirus software and check whether the scanning is
successful.

▪ If yes, no further action is required.

▪ If no, go to Cause 6.


● Cause 6: The antivirus domain user has not been added to the antivirus group,
causing access and scanning failures.

a. Log in to OceanStor DeviceManager.


b. Choose Provisioning > User Authentication > Local Authentication
User Group > AntivirusGroup.
c. On the Domain user tab page, click Add.
d. Enter the antivirus domain user and click Add. Then click OK. Check
whether the scanning is successful.

▪ If yes, no further action is required.

▪ If no, keep the fault environment intact and contact Huawei technical support.

4.1.18 Failed to Install Adobe Flash Player on Windows Server 2012

Symptom
During the installation of Adobe Flash Player on Windows Server 2012, an error
message is displayed indicating that Microsoft Internet Explorer contains Adobe
Flash Player of the latest version. The installation fails.

Possible Causes
A dedicated Flash Player is built into the Windows Server 2012 system, but the
plug-in has not been enabled.

Procedure
Step 1 Open the server manager and click Add roles and features.

Step 2 Click Next until Select Features is displayed.

Step 3 Expand User Interfaces and Infrastructure and select Desktop Experience.

In the Add Roles and Features Wizard dialog box that is displayed, click Add
Features.

Step 4 Click Next.

Step 5 On the Confirm installation selections page that is displayed, click Install.

Step 6 After the installation is complete, restart the PC. Open the control panel and you
can see that Adobe Flash Player has been successfully installed.

----End

4.1.19 Failing to Log In to DeviceManager Using a Browser Again After a Timeout

Because of a browser mechanism, logging in to DeviceManager again fails after a
timeout. Refresh the page and retry.

Symptom
When you use a browser to log in to DeviceManager, the wait times out, and you
log out automatically. If you try to log in again, Communicating with the device
failed. Please check that the network connection or the system is normal is
displayed. The retry fails.

Possible Causes
The security certificate used by DeviceManager is not trusted by the browser. The
login request is intercepted by the browser.

Procedure
Step 1 Press F5 to refresh the browser page.
NOTE

The browser may prompt that the security certificate is questionable. Ignore this prompt
and continue visiting the storage system.

Step 2 Enter the user name and password to check whether you can log in to
DeviceManager.
● If you can log in, no further action is required.
● If you fail to log in, keep the environment intact and contact technical
support.

----End

4.1.20 What Can I Do If the Alarm Sound and Quick Start of DeviceManager Do Not Function Properly on Chrome Later Than 55?

Chrome later than 55 will block the Flash Player plug-in. As a result, the alarm
sound and Quick Start of DeviceManager cannot be played.

Symptom
The alarm sound is disabled by default. After you use Chrome later than 55 to log
in to DeviceManager and enable the alarm sound, sound for new alarms is still
not played. Message This plugin is not supported is displayed for Quick Start, as
shown in Figure 4-13.

NOTE

For details about the browser versions supported by DeviceManager, see the Huawei
Storage Interoperability Navigator.

Figure 4-13 Flash Player not supported by the browser

Possible Causes
If the customer's maintenance terminal cannot access Adobe's official website or
Chrome 55's built-in Flash Player is not updated, Chrome 55 checks whether the
current Flash Player version has security vulnerabilities. If security vulnerabilities
exist, Chrome blocks the plug-in.

Procedure
Step 1 Download the Flash Player from Adobe's official website and install it.
Step 2 If the Chrome version is earlier than 57, perform the following operations:
1. Restart Chrome, type chrome://plugins in the address box, and press Enter.
The following page is displayed, as shown in Figure 4-14. If the Flash Player is
installed correctly, two Flash Player plug-ins are displayed. The one that ends
with pepflashplayer.dll in Location is Chrome's built-in Flash Player, while
the other is the newly installed one.

Figure 4-14 Flash Player plug-in

2. Disable Chrome's built-in Flash Player by clicking Disable, and enable the newly
installed one, as shown in Figure 4-15.

Figure 4-15 Enabling Flash Player

3. Press F5 to refresh DeviceManager. Go to its home page.


4. Check whether the system plays the alarm sound correctly.
– If yes, go to Step 2.5.

– If no, keep the fault environment intact and contact Huawei technical
support.
5. Check whether the system plays Quick Start correctly.
– If yes, no further action is required.
– If no, keep the fault environment intact and contact Huawei technical
support.

Step 3 If the Chrome version ranges from 57 to 71, perform the following operations:
1. Restart Chrome, type chrome://settings/content/flash in the address box,
and press Enter. Enable Ask first (recommended) and click ADD. Add the IP
address of the storage system's management port to the Allow list. Then save
the settings and restart Chrome, as shown in Figure 4-16.

Figure 4-16 Enabling Flash Player and configuring the Allow list

2. In Chrome's address box, enter the IP address of the storage system's
management port and click Update plugin or Run this time.
3. Check whether the system plays the alarm sound correctly.
– If yes, go to Step 3.4.
– If no, keep the fault environment intact and contact Huawei technical
support.
4. Check whether the system plays Quick Start correctly.
– If yes, no further action is required.
– If no, keep the fault environment intact and contact Huawei technical
support.

Step 4 If the Chrome version is later than 71, visit the Chrome official website to obtain
the operation method.

----End
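
Which of Steps 2 to 4 applies depends only on the Chrome major version. The mapping can be sketched in Python as follows. The function names are hypothetical and serve only to restate the version ranges given in the procedure; the version string is assumed to have the dotted form shown on Chrome's About page.

```python
def chrome_major_version(version: str) -> int:
    """Extract the major version from a dotted Chrome version string."""
    return int(version.split(".")[0])


def flash_procedure_step(chrome_major: int) -> str:
    """Map a Chrome major version to the applicable step of this procedure."""
    if chrome_major < 57:
        return "Step 2"  # chrome://plugins: switch to the newly installed Flash Player
    if chrome_major <= 71:
        return "Step 3"  # chrome://settings/content/flash: configure the Allow list
    return "Step 4"      # later than 71: refer to the Chrome official website
```

For example, a terminal that reports a version beginning with 71 falls under Step 3, while one beginning with 72 falls under Step 4.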

4.1.21 Periodically Updated Data Is Abnormal on DeviceManager

After you log in to DeviceManager and do not perform any operation for a certain
period of time, the data that should be updated periodically is abnormal.

Symptom
After you log in to DeviceManager and do not perform any operation for a certain
period of time, the data that should be updated periodically is abnormal. For
example, the performance curve is displayed as a straight line in the performance
statistics area, or the alarm status in the current alarm area remains unchanged.

Possible Causes
The browser fails to communicate with the storage device for a certain period of
time due to a network fault. In addition, the session has not timed out during the
period. Querying data fails before the session is automatically logged out.

Procedure
Step 1 Check whether the communication between the maintenance terminal where
DeviceManager is located and the storage device is normal.
● If yes, go to Step 2.
● If no, go to Step 3.
Step 2 Check whether the network status was normal when the data was abnormal.
● If yes, keep the fault environment intact and contact technical support
engineers.
● If no, go to Step 3.
Step 3 Contact the network administrator to restore the network communication
between the maintenance terminal where DeviceManager is located and the
storage device, and re-log in to DeviceManager to check whether the fault is
rectified.
● If rectified, no further action is required.
● If not rectified, keep the fault environment intact and contact technical
support engineers.

----End
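
The communication check in Step 1 can be approximated from the maintenance terminal with a plain TCP connection attempt to the management address. The following Python sketch uses only the standard socket module. The host and port are placeholders for the management IP address and the port used to open DeviceManager, and a successful connection shows only that the port is reachable at that moment, not that the network was normal when the data became abnormal (Step 2).

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Running the check repeatedly over a period of time gives a rough picture of intermittent network faults before the network administrator is contacted in Step 3.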

4.1.22 Failed to Deselect Items When Display Items Are Customized in SystemReporter

Symptom
You run SystemReporter using Internet Explorer and customize items to be
displayed. After you select items and add them to the selected list, those items
cannot be deselected.

Possible Causes
Active scripting is disabled in the browser.

Procedure
Step 1 Open Internet Explorer.

Step 2 Choose Tools > Internet Options.

The Internet Options window is displayed.

Step 3 Click Security.


1. Select Internet and click Custom level.
The Security Settings – Internet Zone dialog box is displayed.
2. Set Active scripting to Enable.
3. Click OK.
The Internet Options window is displayed.

Step 4 Click Apply.

Step 5 Restart the Internet Explorer browser and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

4.2 Troubleshooting Basic Storage Service Faults


This section explains how to troubleshoot basic storage service faults.

4.2.1 Login Failure Through a Serial Port


A login attempt through a serial port fails. As a result, the storage system cannot
be accessed through the serial port.

Symptom
After a maintenance terminal is connected to the serial port on a storage device
with a serial cable, the maintenance terminal cannot receive messages from the
serial port, the serial port outputs bit errors, or the login prompt is not displayed.

Alarm Information
None

Possible Causes
Possible causes for a login failure through a serial port:

● The serial port is being used.


● The serial cable is connected improperly.
● The serial port connection parameters are incorrectly configured.
● The serial port is disabled for remote connection.

Fault Diagnosis

Figure 4-17 Troubleshooting flowchart for a login failure through a serial port
Flowchart summary:
● Start: Login failure through a serial port.
● Is the serial port being used? If yes, stop the process that is using the serial
port. If no, continue.
● Is the serial cable connected improperly? If yes, remove and reinsert, or replace
the serial cable. If no, continue.
● Are the serial port connection parameters incorrectly configured? If yes, modify
the serial port connection parameters to the correct values. If no, continue.
● Is the serial port disabled for remote desktop connection? If yes, enable the
serial port for remote desktop connection. If no, unknown faults exist. Contact
technical support engineers.

Procedure
● Cause 1: The serial port is being used.
a. After running the serial port login tool, check whether a message is
displayed indicating that the serial port cannot be opened.

▪ If yes, go to b.

▪ If no, go to Cause 2: The serial cable is connected improperly.
b. Stop the program or process that is using the serial port.
c. On the maintenance terminal, retry the login through the serial port.
Check whether the maintenance terminal receives messages from the
serial port.


▪ If yes, no further action is required.

▪ If no, go to Cause 2: The serial cable is connected improperly.


● Cause 2: The serial cable is connected improperly.
a. Check whether the serial port has output. Press Enter and check whether
the corresponding output is displayed.

▪ If yes, go to Cause 3: The serial port connection parameters are
incorrectly configured.

▪ If no, go to b.
b. Remove and reinsert, or replace the serial cable.
c. On the maintenance terminal, retry the login through the serial port.
Check whether the maintenance terminal receives messages from the
serial port.

▪ If yes, no further action is required.

▪ If no, go to Cause 3: The serial port connection parameters are
incorrectly configured.


● Cause 3: The serial port connection parameters are incorrectly configured.
a. On the maintenance terminal, quit the running serial port management
program.
b. Restart the serial port management program and check the settings of
bits per second, data bits, parity, and stop bits.
NOTE

Correct serial port management program settings are as follows:


● Bits per second: 115200
● Data bits: 8
● Parity: none
● Stop bits: 1
c. On the maintenance terminal, retry the login through the serial port.
Check whether the maintenance terminal receives messages from the
serial port.

▪ If yes, no further action is required.

▪ If no, go to d.
d. Check whether you access the maintenance terminal through a remote
desktop connection.

▪ If yes, go to Cause 4: The serial port is disabled for remote connection.

▪ If no, keep the fault environment intact and contact technical
support engineers.
● Cause 4: The serial port is disabled for remote connection.
a. On the maintenance terminal, check whether the serial port is disabled
for remote connection.
You can check whether the serial port is disabled in Computer
Management > Device Manager > Ports.


NOTE

A disabled serial port is marked with a disabled-device icon on its entry.

▪ If yes, go to b.

▪ If no, keep the fault environment intact and contact technical
support engineers.
b. Enable the serial port, and then reconnect to the remote desktop.
c. On the maintenance terminal, retry the login through the serial port.
Check whether the maintenance terminal receives messages from the
serial port.

▪ If yes, no further action is required.

▪ If no, keep the fault environment intact and contact technical
support engineers.
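
The settings listed in the NOTE under Cause 3 can be compared against a terminal profile with a short script. The following is a minimal Python sketch, not a vendor tool: the SerialSettings class, the find_mismatches function, and the 9600 bit/s sample profile are hypothetical names and values introduced here for illustration. Only the expected values (115200, 8, none, 1) come from this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SerialSettings:
    bits_per_second: int
    data_bits: int
    parity: str
    stop_bits: int


# Expected values taken from the NOTE under Cause 3.
EXPECTED = SerialSettings(bits_per_second=115200, data_bits=8,
                          parity="none", stop_bits=1)


def find_mismatches(actual: SerialSettings,
                    expected: SerialSettings = EXPECTED) -> List[str]:
    """Return one message per setting that differs from the expected value."""
    mismatches = []
    for field in ("bits_per_second", "data_bits", "parity", "stop_bits"):
        got, want = getattr(actual, field), getattr(expected, field)
        if got != want:
            mismatches.append(f"{field}: {got} (expected {want})")
    return mismatches


if __name__ == "__main__":
    # Hypothetical terminal profile that uses a different line speed.
    current = SerialSettings(9600, 8, "none", 1)
    for line in find_mismatches(current):
        print(line)
```

An empty result means that all four parameters match the values in the NOTE, so the check can move on to step c.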

Suggestion and Summary


None

4.2.2 Failure to Add an iSCSI Link for a Remote Device


After the iSCSI initiator is renamed, an iSCSI link fails to be added for the remote
device.

Symptom
● When an iSCSI link is added for the remote device immediately after the iSCSI
initiator is renamed, a message is displayed stating The communication is
abnormal or the system is busy. Please try again later.

● After you log in to DeviceManager and choose Data Protection > Remote
Device, the new iSCSI link is not displayed in iSCSI Links.

Possible Causes
An iSCSI link is added for the remote device within 30 seconds after the iSCSI
initiator is renamed.

Procedure
Step 1 Check whether the iSCSI initiator was renamed within the last 30 seconds.
● If yes, go to Step 2.
● If no, go to Step 4.
Step 2 Check whether an iSCSI link has already been added for the remote device.
● If yes, go to Step 3.
● If no, go to Step 4.
Step 3 Delete the existing iSCSI link.


Step 4 Wait until at least 30 seconds have passed since the iSCSI initiator was
renamed, and then add the iSCSI link.


Step 5 Check whether the iSCSI link is successfully added.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End
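
The 30-second rule in this procedure is simple arithmetic on two timestamps. The sketch below shows one way to compute the remaining wait time. The function name seconds_to_wait and the sample timestamps are assumptions made for illustration; the storage system does not expose this function.

```python
from datetime import datetime, timedelta

# Interval stated in Possible Causes: 30 seconds after the rename.
RENAME_SETTLE_TIME = timedelta(seconds=30)


def seconds_to_wait(renamed_at: datetime, now: datetime) -> float:
    """Return how many seconds remain before an iSCSI link should be added."""
    remaining = (renamed_at + RENAME_SETTLE_TIME) - now
    return max(0.0, remaining.total_seconds())


if __name__ == "__main__":
    renamed_at = datetime(2021, 9, 15, 10, 0, 0)   # hypothetical rename time
    now = datetime(2021, 9, 15, 10, 0, 12)         # 12 seconds later
    print(seconds_to_wait(renamed_at, now))        # 18 seconds remain
```

A result of 0.0 means that the interval has already elapsed and the link can be added as described in Step 4.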

4.2.3 Failure to Discover LUNs by an Application Server


An application server fails to discover LUNs and therefore cannot use storage
resources.

Symptom
LUNs have been mapped to the application server but cannot be discovered on
the application server.

Alarm Information
None

Possible Causes
Possible causes for a failure to discover LUNs by an application server:
● The storage pool is faulty.
● The link is abnormal.
● The node file on the application server is lost (for Linux or UNIX).
● The dynamic detection mechanism of the application server is not triggered
(for Mac OS X).
● A LUN whose host LUN ID is 0 is not mapped to the application server (for
HP-UX).
● The automatic LUN scan function is disabled on the application server (for
Solaris 9).

Impact
An application server fails to discover LUNs and therefore cannot use storage
resources.


Fault Diagnosis


Figure 4-18 Troubleshooting flowchart for a failure to discover LUNs by an
application server

Start: Failure to discover LUNs by an application server
1. Is the storage pool faulty?
   Yes: Handle the fault according to the suggestion for the alarm
   "Storage Pool Is Faulty".
   No: Go to the next check.
2. Is the link abnormal?
   Yes: Handle the fault by referring to "Fibre Channel Link Failure" or
   "iSCSI Link Failure".
   No: Go to the next check.
3. Is the node file on the application server lost (for Linux or UNIX)?
   Yes: Run the mknod command to create a node.
   No: Go to the next check.
4. Is the dynamic detection mechanism of the application server not
   triggered (for Mac OS X)?
   Yes: The operating system of the application server has a fault. Restart
   the application server.
   No: Go to the next check.
5. Is a LUN whose Host LUN ID is 0 missing (for HP-UX)?
   Yes: Map a LUN whose Host LUN ID is 0 to the application server.
   No: Go to the next check.
6. Is the automatic LUN scan function disabled on the application server
   (for Solaris 9)?
   Yes: Restart LUN scan on the port on the application server.
   No: Unknown faults exist. Contact technical support engineers.
End
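
The decision order in Figure 4-18 can be sketched as an ordered list of checks in which the first positive check selects the action. This Python sketch is illustrative only: the observation keys (for example storage_pool_faulty) and the next_action function are hypothetical names, and the action strings are shortened from the flowchart.

```python
from typing import Dict

# (observation key, action) pairs in the order used by the flowchart.
CHECKS = [
    ("storage_pool_faulty", 'Handle the alarm "Storage Pool Is Faulty".'),
    ("link_abnormal", "Troubleshoot the Fibre Channel or iSCSI link failure."),
    ("node_file_lost", "Run the mknod command to create a node (Linux or UNIX)."),
    ("dynamic_detection_not_triggered", "Restart the application server (Mac OS X)."),
    ("host_lun_id_0_missing", "Map a LUN whose Host LUN ID is 0 (HP-UX)."),
    ("auto_lun_scan_disabled", "Restart LUN scan on the port (Solaris 9)."),
]
FALLBACK = "Unknown faults exist. Contact technical support engineers."


def next_action(observations: Dict[str, bool]) -> str:
    """Return the action for the first check that is answered with yes."""
    for key, action in CHECKS:
        if observations.get(key, False):
            return action
    return FALLBACK


if __name__ == "__main__":
    # Hypothetical findings: the link is down and the node file is missing.
    print(next_action({"link_abnormal": True, "node_file_lost": True}))
```

Because the checks are ordered, a link fault is handled before a missing node file, which matches the top-to-bottom order of the flowchart.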


Procedure
● Cause 1: The storage pool is faulty.
a. Check whether there is the alarm Storage Pool Is Faulty on the storage
system.

▪ If yes, go to b.

▪ If no, go to Cause 2: The link is abnormal.
b. Handle the fault according to the suggestion provided in the alarm.
c. Scan for LUNs again on the application server. Check whether the fault is
rectified.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 2: The link is abnormal.


● Cause 2: The link is abnormal.

a. On the right navigation bar, click Provisioning.


b. On the Storage Configuration and Optimization page, click Port, and
check whether Running Status of the Fibre Channel port or iSCSI front-
end port connected to the application server is Link down.

▪ If yes, go to c.

▪ If no, go to Cause 3: The node file on the application server is lost
(for Linux or UNIX).


c. Check whether the network is a Fibre Channel or iSCSI network.

▪ If it is a Fibre Channel network, go to d.

▪ If it is an iSCSI network, go to e.
d. Troubleshoot the Fibre Channel link failure by referring to 5.3.1 Fibre
Channel Link Failure. Go to f.
e. Troubleshoot the iSCSI link failure by referring to 5.3.2 iSCSI Link Failure.
Go to f.
f. Scan for LUNs again on the application server. Check whether the fault is
rectified.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 3: The node file on the application server is lost
(for Linux or UNIX).


● Cause 3: The node file on the application server is lost (for Linux or UNIX).
a. Check whether the application server runs the Linux or UNIX operating
system.

▪ If yes, go to b.

▪ If no, keep the fault environment intact and contact technical
support engineers.
b. In the /dev directory on the application server, check whether a node file
exists, for example, /dev/sdb.


▪ If yes, go to Cause 4: The dynamic detection mechanism of the
application server is not triggered (for Mac OS X).

▪ If no, go to c.
c. In Terminal on the application server, run the mknod command to create
a node.
NOTE

The format of the mknod command is mknod Name {b | c} Major Minor, where
Name indicates the name of the device file, b | c indicates whether the
device is a block device or a character device, Major indicates the major
device number, and Minor indicates the minor device number.
d. Scan for LUNs again on the application server. Check whether the fault is
rectified.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 4: The dynamic detection mechanism of the
application server is not triggered (for Mac OS X).

● Cause 4: The dynamic detection mechanism of the application server is not
triggered (for Mac OS X).
a. Check whether the application server runs the Mac OS X operating
system.

▪ If yes, go to b.

▪ If no, go to Cause 5: A LUN whose Host LUN ID is 0 does not exist (for
HP-UX).


b. Restart the application server.
NOTE

The dynamic detection mechanism is not triggered when the Mac OS X based
application server has no LUN mapping. You need to restart the application
server to trigger the mechanism.
c. Scan for LUNs again on the application server. Check whether the fault is
rectified.

▪ If yes, the procedure is complete.

▪ If no, keep the fault environment intact and contact technical
support engineers.
● Cause 5: A LUN whose Host LUN ID is 0 does not exist (for HP-UX).
a. Check whether the application server runs the HP-UX operating system.

▪ If yes, go to b.

▪ If no, go to Cause 6: The automatic LUN scan function is disabled on
the application server (for Solaris 9).


b. On the DeviceManager, view the LUNs mapped to the application server,
and check whether a LUN whose Host LUN ID is 0 exists.

▪ If yes, keep the fault environment intact and contact technical
support engineers.

▪ If no, go to c.


c. Map a LUN whose Host LUN ID is 0 to the application server.


d. Scan for LUNs again on the application server. Check whether the fault is
rectified.

▪ If yes, the procedure is complete.

▪ If no, keep the fault environment intact and contact technical
support engineers.
● Cause 6: The automatic LUN scan function is disabled on the application
server (for Solaris 9).
a. Check whether the application server runs the Solaris 9 operating system.

▪ If yes, go to b.

▪ If no, keep the fault environment intact and contact technical
support engineers.
b. Check whether SAN Foundation Software is installed on the application
server.

▪ If yes, go to c.

▪ If no, keep the fault environment intact and contact technical
support engineers.
c. Run the cfgadm -al command to query the World Wide Port Name
(WWPN) of the port.
d. Run the cfgadm -al -o show_FCP_dev c2::WWPN command to restart LUN
scan on the port. In this command, show_FCP_dev indicates LUN
information, c2 indicates the port number queried in c, and
WWPN indicates the WWPN of the port queried in c.
Check whether the fault is rectified.

▪ If yes, the procedure is complete.

▪ If no, keep the fault environment intact and contact technical
support engineers.
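
The mknod format described under Cause 3 (mknod Name {b | c} Major Minor) can be assembled and validated before it is run. The following Python sketch only builds the argument list and does not execute anything. The function name build_mknod_command is hypothetical, and the values /dev/sdb, 8, and 16 are illustrative; read the real major and minor numbers from the application server (for example, from /proc/partitions on Linux).

```python
from typing import List


def build_mknod_command(name: str, device_type: str,
                        major: int, minor: int) -> List[str]:
    """Build the argument list for: mknod Name {b | c} Major Minor."""
    if device_type not in ("b", "c"):
        raise ValueError("device_type must be 'b' (block) or 'c' (character)")
    if major < 0 or minor < 0:
        raise ValueError("major and minor numbers must be non-negative")
    return ["mknod", name, device_type, str(major), str(minor)]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from a real server.
    print(" ".join(build_mknod_command("/dev/sdb", "b", 8, 16)))
```

Returning a list rather than a single string keeps the arguments separate, so the result can be reviewed or passed to a process launcher without shell quoting issues.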

Suggestion and Summary


None

4.2.4 LUN Deletion Timeout


Symptom
After the mapping of a LUN is deleted from a LUN group, you attempt to delete
the LUN. In the dialog box that is displayed, State is Timeout.

Possible Causes
If the LUN mapping is deleted when the host is sending a write I/O request to the
disk array, the write I/O cannot reach the disk array. As a result, the disk array
resource allocated to the I/O cannot be released, causing LUN deletion timeout.


Procedure
Step 1 Wait about 10 minutes and click Refresh on the LUN page of DeviceManager to
check whether the LUN is successfully deleted.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End
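
Step 1 is a wait-and-recheck loop: check the LUN state at intervals for about 10 minutes, and escalate if the LUN still exists. The sketch below shows that pattern in Python under stated assumptions: is_deleted is a hypothetical callback supplied by the operator (this document only describes clicking Refresh in DeviceManager), and the 30-second polling interval is an arbitrary choice.

```python
import time
from typing import Callable


def wait_until_deleted(is_deleted: Callable[[], bool],
                       timeout_s: float = 600.0,
                       interval_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_deleted() until it returns True or timeout_s has elapsed."""
    waited = 0.0
    while True:
        if is_deleted():
            return True          # LUN is gone: no further action is required
        if waited >= timeout_s:
            return False         # still present: contact technical support
        sleep(interval_s)
        waited += interval_s
```

The sleep parameter is injectable so that the loop can be exercised in a test without waiting 10 minutes.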

4.2.5 Failure to Connect the Storage System to an AIX-Based Application Server for the First Time

Symptom
When the storage system is connected to an AIX-based application server for the
first time, Running Status of the Fibre Channel front-end port on DeviceManager
remains Link down even though the port rate of the Fibre Channel front-end port
has been set to be consistent with that of the Fibre Channel HBA. The initiator
for the application server cannot be detected on DeviceManager.

Alarm Information
None

Possible Causes
1. The health status of the storage system and the application server is normal.
2. The link between the storage system and the application server is normal.
Conclusion: The special mechanism of AIX causes the failure to connect the
storage system to the AIX-based application server for the first time.

Procedure
Step 1 Run the lsdev -Cc adapter|grep fcs command on the application server to view
the HBA information.
-bash-3.00# lsdev -Cc adapter | grep fcs
fcs0 Available 05-00 4Gb FC PCI Express Adapter (df1000fe)
fcs1 Available 05-00 4Gb FC PCI Express Adapter (df1000fe)
fcs2 Available 06-00 4Gb FC PCI Express Adapter (df1000fe)
fcs3 Available 06-00 4Gb FC PCI Express Adapter (df1000fe)
-bash-3.00#

Step 2 Run the lscfg -vpl fcsx command (where x indicates the HBA ID) to view the
World Wide Port Name (WWPN) of the HBA. The WWPN is the value of the Network
Address field.
-bash-3.00# lscfg -vpl fcs1
fcs1 P2-I2/Q1 FC Adapter

Part Number...............LP9802-F2
Serial Number.............BG50199256
Network Address...........10000000C9447DA7
ROS Level and ID..........02E01991
Device Specific.(Z0)......2003806D
Device Specific.(Z1)......00000000
Device Specific.(Z2)......00000000


Device Specific.(Z3)......03000909
Device Specific.(Z4)......FF601416
Device Specific.(Z5)......02E01991
Device Specific.(Z6)......06631991
Device Specific.(Z7)......07631991
Device Specific.(Z8)......20000000C9447DA7
Device Specific.(Z9)......HS1.92A1
Device Specific.(ZA)......H1D1.92A1
Device Specific.(ZB)......H2D1.92A1
Device Specific.(YL)......P2-I2/Q1

PLATFORM SPECIFIC

Name: fibre-channel
Node: fibre-channel@c
Physical Location: P2-I2/Q1
-bash-3.00#

Step 3 Check whether the HBA is connected to the storage system based on the WWPN.

If the WWPN is consistent with that on the HBA label, the HBA is connected to the
storage system.

Step 4 Run the rmdev -dl fcsx -R command (where x indicates 0 or 1) to delete the HBA
connected to the storage system.

Run the lsdev -Cc adapter|grep fcs command to view the HBA information and
confirm that the HBA is successfully deleted.

Step 5 Run the cfgmgr command to scan for an HBA again.

Step 6 Run the lsdev -Cc adapter|grep fcs command to confirm that the HBA
connected to the storage system is displayed again.

After the HBA is detected, you can view and add the initiator for the application
server on the DeviceManager.

----End
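
When several HBAs are installed, the adapter names and WWPNs from Steps 1 and 2 can be extracted from saved command output instead of being read by eye. The following Python sketch assumes the output layout shown in this section. The function names list_fc_adapters and extract_wwpn are hypothetical, and the sample strings are shortened copies of the output above.

```python
import re
from typing import List, Optional


def list_fc_adapters(lsdev_output: str) -> List[str]:
    """Return adapter names (fcs0, fcs1, ...) from lsdev -Cc adapter output."""
    return re.findall(r"^(fcs\d+)\s", lsdev_output, flags=re.MULTILINE)


def extract_wwpn(lscfg_output: str) -> Optional[str]:
    """Return the value of the Network Address field, or None if absent."""
    match = re.search(r"Network Address\.+([0-9A-Fa-f]{16})", lscfg_output)
    return match.group(1).upper() if match else None


# Shortened sample text adapted from the output shown in Steps 1 and 2.
LSDEV_SAMPLE = """\
fcs0 Available 05-00 4Gb FC PCI Express Adapter (df1000fe)
fcs1 Available 05-00 4Gb FC PCI Express Adapter (df1000fe)
"""
LSCFG_SAMPLE = """\
fcs1 P2-I2/Q1 FC Adapter
  Part Number...............LP9802-F2
  Network Address...........10000000C9447DA7
"""

if __name__ == "__main__":
    print(list_fc_adapters(LSDEV_SAMPLE))
    print(extract_wwpn(LSCFG_SAMPLE))
```

The extracted WWPN can then be compared with the HBA label as described in Step 3.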

Suggestion and Summary


None

Related Information
None

4.2.6 The Storage System Does Not Detect the Initiators Provided by an HP-UX Server

Symptom
After an HP-UX server is connected to the storage system using Fibre Channel
links, its initiators cannot be detected by the storage system.


Possible Causes
If no LUNs are mapped to an HP-UX server, the HP-UX server does not connect to
the storage system.

Procedure
Step 1 Log in to the storage system and create initiators for the HP-UX server.
Step 2 Remap LUNs to the HP-UX server.
Step 3 Remove and reinsert the links and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.

----End


5 Emergency Handling

When a sudden storage event that severely affects services occurs, start the
emergency plan, including the emergency process and emergency measures, to
minimize the impact.
5.1 Emergency Handling Of Hardware Module Faults (Applicable to V500R007)
5.2 Emergency Handling Of Multipathing Software Faults
5.3 Emergency Handling Of Basic Storage Service Faults
5.4 Emergency Handling Of Value-added Service
5.5 Emergency Handling Of Other Faults

5.1 Emergency Handling Of Hardware Module Faults (Applicable to V500R007)

Typically, the indicators on hardware modules become abnormal when faults
occur in the hardware modules. This section explains how to troubleshoot
common hardware module faults, including the faults in the controller, interface
module, disk, fan module, BBU, power module, and expansion module.

5.1.1 Controller Failure

Symptom

Log in to the DeviceManager and click System.

● For a 2 U controller enclosure, switch to the rear view of the storage
device on the main page. On the rear view of the storage device, click
the controller in the red square. Health Status of the controller is Faulty.
● For a 3 U/6 U controller enclosure, on the front view of the storage device,
click the controller in the red square. Health Status of the controller is
Faulty.


The controller Alarm indicator on the storage device is steady red. Figure 5-1 and
Figure 5-2 show the location of the controller Alarm indicator.

Figure 5-1 Location of the controller Alarm indicator in a 2 U controller enclosure

Figure 5-2 Location of the controller Alarm indicator in a 3 U or 6 U controller
enclosure

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the controller are displayed.

Possible Causes
The controller is faulty.

Impact
Controller faults may degrade system performance and reliability.


Fault Diagnosis

Figure 5-3 Troubleshooting flowchart for a controller failure

Start
1. Is the controller faulty?
   Yes: Rectify the fault based on the suggestion in the Suggestion area.
   No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the controller Alarm indicator on the storage device is steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.

Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.

Step 3 Check whether the controller Alarm indicator is steady green and Health Status
of the controller on the DeviceManager is Normal.
● If yes, the procedure is complete.


● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


Failure of one controller causes the storage system performance to deteriorate
and increases the risk of data loss. In this case, stop the operations of reading or
writing a large amount of data and recover the storage system as soon as
possible.

5.1.2 Disk Failure


Symptom

Log in to the DeviceManager and click System. On the front view of a 2 U
controller or disk enclosure, click the disk in the red square. Health Status of the
disk is Faulty.
The disk Alarm/Location indicator on the storage device is steady red. Figure 5-4
and Figure 5-5 show the location of the disk Alarm/Location indicator.

Figure 5-4 Location of the disk Alarm/Location indicator (2.5 inches)


Figure 5-5 Location of the disk Alarm/Location indicator (3.5 inches)

Alarm Information
On the Alarms and Events page of the DeviceManager, click the Current Alarms
tab. Alarms related to the disk are displayed.

Possible Causes
The disk is faulty.

Impact
A disk failure causes the disk domain to which the disk belongs to be degraded or
fail. If the disk domain is degraded, the system read/write performance
deteriorates and data loss may occur. If the disk domain fails, data loss occurs and
services are interrupted.


Fault Diagnosis

Figure 5-6 Troubleshooting flowchart for a disk failure

Start
1. Is the disk faulty?
   Yes: Rectify the fault based on the suggestion in the Suggestion area.
   No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the disk Alarm/Location indicator on the storage device is steady
red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.

Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.

Step 3 Check whether the disk Alarm/Location indicator is steady green and Health
Status of the disk on the DeviceManager is Normal.
● If yes, the procedure is complete.


● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

5.1.3 Interface Module Failure


Symptom

Log in to the DeviceManager and click System. On the system page, switch
to the rear view. Click the interface module in the red square.
Health Status of the interface module is Faulty.
The interface module Power indicator on the storage device is steady red. Figure
5-7 shows the location of the interface module Power indicator.

Figure 5-7 Location of the interface module Power indicator

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the interface module are displayed.

Possible Causes
The interface module is faulty.

Impact
If an interface module malfunctions, it cannot process services and services will
work in single-link mode, resulting in service interruption risks.


Fault Diagnosis

Figure 5-8 Troubleshooting flowchart for an interface module failure

Start
1. Is the interface module faulty?
   Yes: Rectify the fault based on the suggestion in the Suggestion area.
   No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the power indicator of the interface module is steady yellow on
the storage device.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.

Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.

Step 3 Check whether the interface module Power indicator is steady green and Health
Status of the interface module on the DeviceManager is Normal.
● If yes, the procedure is complete.


● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

5.1.4 Expansion Module Failure

Symptom

Log in to the DeviceManager and click System. Select a disk enclosure. On the
system page, switch to the rear view. On the rear view of the storage
device, click the expansion module in the red square. Health Status of the
expansion module is Faulty.

The expansion module Alarm indicator on the storage device is steady red. Figure
5-9 and Figure 5-10 show the location of the expansion module Alarm indicator.

Figure 5-9 Location of the expansion module Alarm indicator in a 2 U or 4 U disk
enclosure

Figure 5-10 Location of the expansion module Alarm indicator in a high-density
disk enclosure

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the expansion module are displayed.

Possible Causes
The expansion module is faulty.


Fault Diagnosis

Figure 5-11 Troubleshooting flowchart for an expansion module failure

Start
1. Is the expansion module faulty?
   Yes: Rectify the fault based on the suggestion in the Suggestion area.
   No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the expansion module Alarm indicator on the storage device is
steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.

Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.

Step 3 Check whether the expansion module Alarm indicator is steady green and Health
Status of the expansion module on the DeviceManager is Normal.
● If yes, the procedure is complete.


● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

5.1.5 Fan Module Failure


Symptom

Log in to the DeviceManager and click System.


● Select a 3 U/6 U controller enclosure. On the front view of the controller
enclosure, click the fan module in the red square. Health Status of the fan
module is Faulty.

● Select a 4 U disk enclosure. On the main page, switch to the rear view of
the 4 U disk enclosure. On the rear view, click the fan module in
the red square. Health Status of the fan module is Faulty.
The fan module Running/Alarm indicator on the storage device is steady red.
Figure 5-12 and Figure 5-13 show the location of the fan module Running/Alarm
indicator.

Figure 5-12 Location of the fan module Running/Alarm indicator of a 3 U/6 U
controller enclosure


Figure 5-13 Location of the fan module Running/Alarm indicator of a 4 U disk
enclosure

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the fan module are displayed.

Possible Causes
The fan module is faulty.

Impact
If a fan module is faulty, the temperature of the controller enclosure or disk
enclosure may increase. If the storage system works at a high temperature for a
long time, the service life of the storage system may be shortened.


Fault Diagnosis

Figure 5-14 Troubleshooting flowchart for a fan module failure

Start
1. Is the fan module faulty?
   Yes: Replace the faulty fan module.
   No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the fan module Running/Alarm indicator on the storage device is
steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.

Step 2 Check whether any objects affect the rotation of the fans.
● If yes, go to Step 3.
● If no, go to Step 4.

Step 3 Remove the objects to ensure smooth fan rotation.

After the objects are removed, check whether the fan module Running/Alarm
indicator is steady green and Health Status of the fan module on the
DeviceManager is Normal.

● If yes, the procedure is complete.


● If no, go to Step 4.


Step 4 Replace the faulty fan module.


For details about how to replace a faulty fan module, see the Parts Replacement
of the corresponding product model.
Step 5 Check whether the fan module Running/Alarm indicator is steady green and
Health Status of the fan module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.

----End
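
The branching in Steps 1 to 5 can be summarized as an ordered action plan that depends on two observations. The Python sketch below is only an illustration of that logic: the function fan_module_plan and its parameter names are hypothetical, and the action strings paraphrase the steps above.

```python
from typing import List

CONTACT_SUPPORT = ("Keep the fault environment intact and contact "
                   "technical support engineers.")
VERIFY = ("Check that the Running/Alarm indicator is steady green and "
          "Health Status is Normal.")


def fan_module_plan(indicator_steady_red: bool,
                    rotation_obstructed: bool) -> List[str]:
    """Return the ordered actions for a suspected fan module fault."""
    if not indicator_steady_red:
        return [CONTACT_SUPPORT]                      # Step 1, answer no
    plan = []
    if rotation_obstructed:                           # Step 2, answer yes
        plan.append("Remove the objects that affect fan rotation.")  # Step 3
        plan.append(VERIFY)
        plan.append("If the fault persists, replace the faulty fan module.")
    else:
        plan.append("Replace the faulty fan module.")  # Step 4
    plan.append(VERIFY)                                # Step 5
    plan.append("If the fault still persists: " + CONTACT_SUPPORT)
    return plan


if __name__ == "__main__":
    for action in fan_module_plan(indicator_steady_red=True,
                                  rotation_obstructed=True):
        print(action)
```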

Suggestion and Summary


None

5.1.6 BBU Failure


Symptom

Log in to the DeviceManager and click System. On the front view of the
storage device, click the BBU in the red square. Health Status of the BBU is
Faulty.
The BBU Running/Alarm indicator on the storage device is steady red. Figure 5-15
shows the location of the BBU Running/Alarm indicator.

Figure 5-15 Location of the BBU Running/Alarm indicator in a 3 U/6 U controller
enclosure

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the BBU are displayed.


Possible Causes
The BBU is faulty.

Impact
A BBU fault may reduce the reliability of the storage system.

Fault Diagnosis

Figure 5-16 Troubleshooting flowchart for a BBU failure

Start
1. Is the BBU faulty?
   Yes: Rectify the fault based on the suggestion in the Suggestion area.
   No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the BBU Running/Alarm indicator on the storage device is steady
red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.

Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.


2. Choose Insight > Alarms and Events.


3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the BBU Running/Alarm indicator is steady green and Health
Status of the BBU on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

5.1.7 Power Module Failure


Symptom

Log in to the DeviceManager and click System. On the system page, switch
to the rear view. On the rear view of the storage device, click the
power module in the red square. Health Status of the power module is Faulty.
The power Running/Alarm indicator is steady red. Figure 5-17, Figure 5-18 and
Figure 5-19 show the location of the power Running/Alarm indicator.

Figure 5-17 Location of the power Running/Alarm indicator in a 2 U controller enclosure

Figure 5-18 Location of the power Running/Alarm indicator in a 3 U/6 U controller enclosure

Figure 5-19 Location of the power Running/Alarm indicator in a disk enclosure

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the power module are displayed.

Possible Causes
The power module is faulty.

Impact
● For a 2 U controller enclosure, if a power module is faulty and no redundant
power module is available for the controller, the system reliability decreases.
● For a 4 U controller enclosure, a power module fault does not affect system
reliability.

Fault Diagnosis

Figure 5-20 Troubleshooting flowchart for a power module failure

Start
Is the power module faulty?
● Yes: Rectify the fault based on the suggestion in the Suggestion area.
● No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Check whether the power Running/Alarm indicator on the storage device is steady
red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.

1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the power Running/Alarm indicator is steady green and Health
Status of the power module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

5.2 Emergency Handling Of Multipathing Software Faults
This chapter explains how to troubleshoot common multipathing software faults,
for example, multipathing software installation or loading faults.

5.2.1 Failure to Load Multipathing Software on an Application Server
After a restart, an application server fails to load multipathing software, causing
the storage performance to deteriorate.

Symptom
After the multipathing software is installed on an application server that runs the
Linux or UNIX operating system and the application server is restarted, the
application server fails to load the multipathing software.

Alarm Information
None

Possible Causes
Multiple operating systems are installed on the application server and the
menu.lst file of the last installed operating system does not have the
multipathing startup option.

Impact
If an application server fails to load multipathing software, it accesses the
storage system over a single link. As a result, system performance deteriorates
and service risks increase.

Fault Diagnosis

Figure 5-21 Troubleshooting flowchart for a failure to load multipathing software on an application server

Start: Failure to load multipathing software on an application server
Is the UltraPath startup option missing from the menu.lst file?
● Yes: Modify the menu.lst file.
● No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 In the CLI, run the vi /boot/grub/menu.lst command to open the configuration file
of the operating system.

The configuration file information is as follows:


title Linux with UltraPath
root (hd0,0)
kernel /vmlinuz-2.6.16.60-0.21-smp root=/dev/system/root vga=0x314 resume=/dev/system/swap splash=silent showopts
initrd /mpp-2.6.16.60-0.21-smp.img

Step 2 Run the fdisk -l command.

The application server partition information is as follows:


linux:~ # fdisk -l
Disk /dev/sda: 73.4 GB, 73407868928 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        8924    71577607+  8e  Linux LVM

NOTE

* in the Boot column marks the bootable partition, that is, the partition from which
the running operating system is started.

Step 3 Run the mount /dev/partition /filepath command to mount each partition.

The command is used to mount /dev/partition onto the /filepath directory.
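
Steps 2 and 3 can be scripted when a server has many partitions. The following is a minimal sketch, assuming the fdisk -l layout shown in Step 2. The function name list_partitions and the mount-point pattern /mnt/check_<name> are illustrative and are not part of the product.

```python
def list_partitions(fdisk_output):
    """Return (device, is_boot) tuples parsed from `fdisk -l` text."""
    partitions = []
    for line in fdisk_output.splitlines():
        fields = line.split()
        # Partition rows start with a device node such as /dev/sda1.
        if not fields or not fields[0].startswith("/dev/"):
            continue
        # A "*" in the second column is the Boot flag.
        is_boot = len(fields) > 1 and fields[1] == "*"
        partitions.append((fields[0], is_boot))
    return partitions


# Sample text copied from Step 2.
SAMPLE = """\
Disk /dev/sda: 73.4 GB, 73407868928 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 8924 71577607+ 8e Linux LVM
"""

for device, is_boot in list_partitions(SAMPLE):
    # Hypothetical mount point; replace it with the /filepath used on site.
    mount_point = "/mnt/check_" + device.rsplit("/", 1)[-1]
    flag = "  # boot flag set" if is_boot else ""
    print("mount %s %s%s" % (device, mount_point, flag))
```

The script only prints candidate mount commands and does not run them. A partition of type Linux LVM (Id 8e) holds logical volumes rather than a file system, so in that case the logical volumes are what get mounted.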

Step 4 Run the ls -l /boot/grub/ command to check whether each mounted partition has
the configuration file menu.lst.

If a mounted partition does not have the configuration file, check the next
partition. If a mounted partition has the configuration file, run the vi menu.lst
command to open it. The configuration file content is as follows:
# Modified by YaST2. Last modification on Sat Dec 19 17:53:26 UTC 2009
default 0
timeout 8
gfxmenu (hd0,2)/boot/message
##YaST - activate

root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd /boot/initrd-2.6.16.46-0.12-default

###Don't change this comment - YaST2 identifier: Original name: SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)###
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
kernel (hd0,1)/boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda2 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd (hd0,1)/boot/initrd-2.6.16.46-0.12-default

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
rootnoverify (hd0,0)
chainloader (fd0)+1

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- SUSE Linux Enterprise Server 10 SP1
root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=normal showopts ide=nodma apm=off acpi=off noresume nosmp noapic maxcpus=0 edd=off 3
initrd /boot/initrd-2.6.16.46-0.12-default

Step 5 Press i to enter the editing mode. Copy the UltraPath startup entry shown in
Step 1 into the configuration file of the currently mounted partition.
The modified configuration file content is as follows:
# Modified by YaST2. Last modification on Sat Dec 19 17:53:26 UTC 2009
default 0
timeout 8
gfxmenu (hd0,2)/boot/message
title Linux with UltraPath
root (hd0,0)
kernel /vmlinuz-2.6.16.60-0.21-smp root=/dev/system/root vga=0x314 resume=/dev/system/swap splash=silent showopts
initrd /mpp-2.6.16.60-0.21-smp.img
##YaST - activate

root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd /boot/initrd-2.6.16.46-0.12-default

###Don't change this comment - YaST2 identifier: Original name: SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)###
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
kernel (hd0,1)/boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda2 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd (hd0,1)/boot/initrd-2.6.16.46-0.12-default

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
rootnoverify (hd0,0)
chainloader (fd0)+1

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- SUSE Linux Enterprise Server 10 SP1
root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=normal showopts ide=nodma apm=off acpi=off noresume nosmp noapic maxcpus=0 edd=off 3
initrd /boot/initrd-2.6.16.46-0.12-default

Step 6 Press Esc to exit the editing mode.

Step 7 Type :wq and press Enter to save the configuration file menu.lst and exit.

Step 8 Repeat Step 4 to Step 7 to modify the menu.lst files of all the operating
systems on the application server. Then go to Step 9.

Step 9 Restart the application server and check whether the fault is rectified.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


When the operating system is started, the application server invokes the menu.lst
file of the last installed operating system. If the running operating system is not
the one last installed on the application server, the application server cannot load
the multipathing software even if the multipathing startup item is added in the
menu.lst file of the running operating system. Therefore, you must ensure that
the multipathing startup item is added in the menu.lst file of the last installed
operating system.
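
Before restarting the server, each menu.lst file can be checked for the multipathing startup item. The following is a minimal sketch, assuming the item is identified by the word UltraPath in its title line, as in the example in Step 1. The function names boot_entries and has_multipath_entry are illustrative.

```python
def boot_entries(menu_lst_text):
    """Return the title of every boot entry found in menu.lst text."""
    titles = []
    for line in menu_lst_text.splitlines():
        parts = line.strip().split(None, 1)
        # A boot entry starts with a line of the form "title <name>".
        if len(parts) == 2 and parts[0] == "title":
            titles.append(parts[1])
    return titles


def has_multipath_entry(menu_lst_text, marker="UltraPath"):
    """True if any boot entry title contains the marker (assumed convention)."""
    return any(marker.lower() in title.lower()
               for title in boot_entries(menu_lst_text))


# Shortened excerpts of the files shown in Step 4 and Step 5.
BEFORE = """\
default 0
timeout 8
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
title Floppy
"""
AFTER = """\
default 0
timeout 8
title Linux with UltraPath
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
title Floppy
"""

print(has_multipath_entry(BEFORE))  # False
print(has_multipath_entry(AFTER))   # True
```

Run the check against the menu.lst file of every operating system on the server. A file that returns False still needs the change described in Step 5.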

NOTE

The displayed content varies with the operating system and may differ from the
examples shown here.

5.2.2 Blue Screen of Death When Multipathing Software Is Being Installed on a Windows-Based Application Server
A blue screen of death (BSOD) error occurs when multipathing software is being
installed on a Windows-based application server, causing an installation failure of
the multipathing software.

Symptom
When multipathing software is being installed on a Windows-based application
server (Windows Server 2003 or Windows Server 2008), an unexpected error
occurs, resulting in a blue screen.

Alarm Information
None

Possible Causes
The latest service pack (SP) is not installed in the Windows operating system.

Impact
A BSOD error occurs when multipathing software is being installed in Windows, so
the multipathing software fails to be installed. Without multipathing software, the
application server accesses the storage system over a single link. As a result,
system performance deteriorates and service risks increase.

Fault Diagnosis

Figure 5-22 Troubleshooting flowchart for a BSOD when multipathing software is being installed on a Windows-based application server

Start: BSOD during multipathing software installation on a Windows-based application server
Is the latest SP missing from the Windows operating system?
● Yes: Download the latest Windows SP from the Microsoft official website and
install the SP.
● No: Unknown faults exist. Contact technical support engineers.
End

Procedure
Step 1 Restart the operating system. Go to Windows Advanced Options Menu and
select Last Known Good Configuration.

Step 2 Log in to the operating system. Download the latest Windows SP from the
Microsoft official website.

Step 3 Install the downloaded SP.

Step 4 Restart the application server. Check whether the fault is rectified.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

Suggestion and Summary


None

5.2.3 Controller Failure in a Non-UltraPath Environment


A fault on controller B interrupts services when UltraPath is not installed. In this
case, manually switch the application server connections from controller B to
controller A to ensure service continuity.

Symptom
● Indicators on controller A show that controller A is working properly but all
indicators on controller B are off.
● Read and write requests from the application servers connected to controller
B cannot be sent to the storage system, causing service interruption. On the
Performance monitoring page of the DeviceManager, the write or read I/O
traffic on the front-end port of controller B becomes 0.

Possible Causes
The controller is faulty.

Impact
If a controller is faulty and host services are interrupted when UltraPath is not
installed, you can manually switch the host services to another functional
controller.

Fault Diagnosis

Figure 5-23 Flowchart for handling service interruption caused by a fault in controller B in a non-UltraPath environment

Start: Troubleshooting
1. Collect storage system logs.
2. Switch services from controller B to controller A.
3. Power off controller B and restart it.
4. Is the indicator of controller B steady green?
● Yes: Switch services from controller A to controller B.
● No: Keep the fault environment intact and contact technical support engineers.
5. Are service reads and writes normal?
● Yes: End.
● No: Keep the fault environment intact and contact technical support engineers.

Procedure
Step 1 Switch services from controller B to controller A.
1. Remove the cable between controller B and the application server.
2. Connect the application server to controller A.
3. Reconfigure the initiator of the host where services are interrupted.

Step 2 Power off controller B and restart it.

Check whether the indicator of controller B is steady green.


● If yes, go to Step 3.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 3 Switch services from controller A to controller B.
Step 4 Observe the service processing.
NOTE

Enable the performance monitoring function before you check the service processing on the
DeviceManager.

1. Log in to the DeviceManager. On the navigation bar, click Monitor.


2. Click Performance monitoring.
The Performance monitoring page is displayed.
3. In the Statistical Object area, select LUN from the Object Type list. Then
select the LUN that uses controller B to provide services for the server.
4. From the Statistical items list, select Total IOPS (IO/s) in the lower area.
5. Click OK.
Observe the I/O performance monitoring data and check whether the
performance line is smooth.
– If yes, the service processing is normal and the procedure is complete.
– If no, keep the fault environment intact and contact technical support
engineers.
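
Whether the performance line is smooth is a visual judgment in DeviceManager. If the IOPS samples are also available as numbers, a rough check can support that judgment. The following is a sketch under stated assumptions only: the function name iops_looks_steady, the 20% spread threshold, and the sample values are all hypothetical and are not defined by DeviceManager.

```python
from statistics import mean, pstdev


def iops_looks_steady(samples, max_rel_spread=0.2):
    """Rough smoothness check for a series of Total IOPS samples.

    Returns False if any sample is 0 (no I/O reached the LUN) or if the
    standard deviation exceeds max_rel_spread times the mean.
    The threshold is an illustrative assumption, not a product value.
    """
    if not samples or min(samples) <= 0:
        return False
    return pstdev(samples) / mean(samples) <= max_rel_spread


# Hypothetical sample series, in IO/s.
print(iops_looks_steady([5000, 5100, 4950, 5050]))  # True
print(iops_looks_steady([5000, 0, 5100]))           # False
print(iops_looks_steady([5000, 1000, 9000]))        # False
```

A result of False is only a prompt to look at the performance line again. The decision in step 5 still rests on the DeviceManager view.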

----End

Suggestion and Summary


To rectify the fault completely:
1. Install UltraPath on the application server.
2. Replace the faulty controller.
3. Upgrade the storage system.
4. Send the collected logs to technical support engineers for subsequent
handling.

5.2.4 UltraPath Software Unavailable Because It Is Isolated by Antivirus Software
Symptom
The UltraPath installed on the application server cannot be used because it is
isolated by the antivirus software.

Alarm Information
None

Possible Causes
The antivirus software detects UltraPath as a virus and isolates it by mistake.

Procedure
Step 1 On the management page of the antivirus software, add UltraPath to the trust list.
Step 2 Restart the antivirus software.

----End

Suggestion and Summary


Disable the antivirus software before you install UltraPath on an application
server. After UltraPath has been installed, enable the antivirus software and add
UltraPath to the trust list.

Related Information
None

5.2.5 Failure to Detect Virtual Disks on a Windows-Based Application Server After the Multipathing Function of Microsoft iSCSI Initiator Is Enabled
Symptom
A Windows-based application server (Windows Server 2003) fails to detect virtual
disks after it is restarted.

Alarm Information
None

Possible Causes
In the CLI of the application server, run the iscsicli ListPersistentTargets
command to view the information about the initiator.
Target Name : iqn.2006-08.com.:21000022a1002828:notconfig:192.168.252.1
Address and Socket : 192.168.252.1 3260
Session Type : Data
Initiator Name : Root\SCSIADAPTER\0000_0
Port Number : <Any Port>
Security Flags : 0x0
Version : 0
Information Specified: 0x20
Login Flags : 0xa
Multipath Enabled

NOTE

Login Flags is 0xa, which corresponds to 00001010b in binary. The 1 in the second bit
from the right (bit value 0x2) indicates that the multipathing function of Microsoft
iSCSI Initiator is enabled at login.
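
The Login Flags value can be decoded with a bit mask instead of being read by eye. The following is a minimal sketch, assuming that the multipathing flag is the bit with value 0x2, which is the bit set alongside 0x8 in the 0xa example above. The names MULTIPATH_ENABLED and multipath_enabled are illustrative.

```python
# Bit with value 0x2 (binary 10): multipathing enabled at login.
MULTIPATH_ENABLED = 0x2


def multipath_enabled(login_flags):
    """True if the multipathing bit is set in a Login Flags value."""
    return bool(login_flags & MULTIPATH_ENABLED)


flags = int("0xa", 16)           # value taken from the iscsicli output above
print(format(flags, "08b"))      # 00001010
print(multipath_enabled(flags))  # True
print(multipath_enabled(0x8))    # False
```

After the persistent targets are removed and Enable multi-path is cleared, the same check on the new Login Flags value should return False.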

Conclusion: No virtual disks can be detected on a Windows-based application
server when the multipathing function of Microsoft iSCSI Initiator is enabled.

Procedure
● If the multipathing software provided by Microsoft iSCSI Initiator has not been
installed:
When installing Microsoft iSCSI Initiator, deselect Microsoft MPIO Multipathing
Support for iSCSI so that the multipathing software is not installed, as shown in
Figure 5-24.

Figure 5-24 Clearing the Microsoft MPIO Multipathing Support for iSCSI

● If the multipathing software provided by Microsoft iSCSI Initiator has been
installed:
a. Open the iSCSI Initiator Properties dialog box and click the Persistent
Targets tab. Click Remove to delete all targets in the Select a target
area, as shown in Figure 5-25.

Figure 5-25 Removing targets

b. Log in to Microsoft iSCSI Initiator and clear Enable multi-path, as
shown in Figure 5-26.

Figure 5-26 Logging in to Microsoft iSCSI Initiator

Suggestion and Summary


● When installing Microsoft iSCSI Initiator, deselect Microsoft MPIO Multipathing
Support for iSCSI so that the multipathing software is not installed.

● If the multipathing software has been installed and used, remove all targets
in the persistent target list. Then log in to Microsoft iSCSI Initiator and clear
Enable multi-path.

Related Information
None

5.3 Emergency Handling Of Basic Storage Service Faults
This section explains how to troubleshoot basic storage service faults.

5.3.1 Fibre Channel Link Failure


A Fibre Channel link failure may cause service interruption and data loss between
the application server and the storage system.

Symptom

Log in to the DeviceManager and click System. On the system page, click the rear-view
icon to display the rear view. On the rear view of the storage device, click the
interface module in the red square. View the information about the Fibre Channel
front-end ports. Health Status and Running Status of a Fibre Channel front-end
port are -- and Link down, respectively.
The Link indicator of the Fibre Channel front-end port is steady red or off.

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm Link to
the Host Port Is Down may be displayed on the tab page.

Possible Causes
Possible causes for a Fibre Channel link failure:
● The optical transceiver is faulty.
● The optical transceiver is incompatible with the front-end port.
● The rate of the optical transceiver is different from that of the front-end port.
● The optical fiber is improperly connected or faulty.
● The port rate of the storage device is different from that of its peer end.
– In a direct-connection network, the rate of the Fibre Channel front-end
port on the storage device is different from that of the host bus adapter
(HBA) on the application server.
– In a switch-based network, the rate of the switch is different from that of
the Fibre Channel front-end port on the storage device or that of the
HBA on the application server.

Impact
An unavailable Fibre Channel link causes a link down failure, service interruption,
and data loss between the application server and the storage system.

Fault Diagnosis

Figure 5-27 Troubleshooting flowchart for a Fibre Channel link failure

Start: Fibre Channel link failure
1. Is the optical transceiver faulty? If yes, handle the fault according to the
suggestion for the alarm "Port Optical Module Is Faulty". If no, go to 2.
2. Is the optical transceiver incompatible with the front-end port? If yes, handle
the fault according to the suggestion for the alarm "Optical Module Incompatible
With The Port". If no, go to 3.
3. Is the rate of the optical transceiver different from that of the front-end
port? If yes, handle the fault according to the suggestion for the alarm "Optical
Module Rate Is Inconsistent With That Of Its Port". If no, go to 4.
4. Is the optical fiber improperly connected or faulty? If yes, remove and
reinsert, or replace the optical fiber. If no, go to 5.
5. Is the port rate of the storage device different from that of its peer end? If
yes, adjust the rates of the Fibre Channel front-end port, switch port, and Fibre
Channel HBA to be consistent. If no, unknown faults exist. Contact technical
support engineers.
End

Procedure
● Cause 1: The optical transceiver is faulty.

a. On the DeviceManager, check whether the alarm Port Optical Module Is
Faulty exists.

▪ If yes, go to b.

▪ If no, go to Cause 2.


b. Handle the fault according to the suggestion.
c. Check whether the Link indicator of the Fibre Channel front-end port is
steady green or blue and its Running Status on the DeviceManager is
Link up.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 4.


● Cause 2: The optical transceiver is incompatible with the front-end port.
a. On the DeviceManager, check whether the alarm Optical Module
Incompatible With The Port exists.

▪ If yes, go to b.

▪ If no, go to Cause 3.


b. Handle the fault according to the suggestion.
c. Check whether the Link indicator of the Fibre Channel front-end port is
steady green or blue and its Running Status on the DeviceManager is
Link up.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 4.


● Cause 3: The rate of the optical transceiver is different from that of the front-
end port.
a. On the DeviceManager, check whether the alarm Optical Module Rate
Is Inconsistent With That Of Its Port exists.

▪ If yes, go to b.

▪ If no, go to Cause 4.


b. Handle the fault according to the suggestion.
c. Check whether the Link indicator of the Fibre Channel front-end port is
steady green or blue and its Running Status on the DeviceManager is
Link up.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 4.


● Cause 4: The optical fiber is improperly connected or faulty.
a. Remove and reinsert, or replace the optical fiber.
b. Check whether the Link indicator of the Fibre Channel front-end port is
steady green or blue and its Running Status on the DeviceManager is
Link up.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 5.


● Cause 5: The port rate of the storage device is different from that of its peer
end.
a. Check whether the network is a direct-connection network or a switch-
based network.

▪ If it is a switch-based network, go to b.

▪ If it is a direct-connection network, go to h.
b. Check whether the rate of the Fibre Channel front-end port on the
storage device is the same as that of the switch port connecting to the
storage device.
NOTE

For details about how to check the rate of a switch port or a Fibre Channel HBA,
consult the switch or HBA manufacturer or refer to the product manuals.

▪ If yes, go to e.

▪ If no, go to c.
c. Adjust Working Rate of the Fibre Channel front-end port to the rate of
the switch port.
d. After the rates are consistent, check whether the Link indicator
of the Fibre Channel front-end port is steady green or blue and its
Running Status on the DeviceManager is Link up.

▪ If yes, the procedure is complete.

▪ If no, go to e.
e. Check whether the rate of the Fibre Channel HBA on the application
server is the same as that of the switch port connecting to the
application server.

▪ If yes, keep the fault environment intact and contact technical
support engineers.

▪ If no, go to f.
f. Adjust the rate of the switch port to the rate of the Fibre Channel HBA.
g. After the rates are consistent, check whether the Link indicator
of the Fibre Channel front-end port is steady green or blue and its
Running Status on the DeviceManager is Link up.

▪ If yes, the procedure is complete.

▪ If no, keep the fault environment intact and contact technical
support engineers.
h. Check whether the rate of the Fibre Channel front-end port on the
storage device is the same as that of the Fibre Channel HBA on the
application server.

▪ If yes, keep the fault environment intact and contact technical
support engineers.

▪ If no, go to i.
i. Adjust Working Rate of the Fibre Channel front-end port to the rate of
the Fibre Channel HBA.
j. After the rates are consistent, check whether the Link indicator
of the Fibre Channel front-end port is steady green or blue and its
Running Status on the DeviceManager is Link up.

▪ If yes, the procedure is complete.

▪ If no, keep the fault environment intact and contact technical
support engineers.
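
The rate comparisons in Cause 5 can be summarized as a small decision function. The following is a minimal sketch, assuming all rates are known link speeds in Gbit/s. The function name rate_adjustments, its parameter names, and the example rates are hypothetical. Unlike the procedure, which checks the link after each adjustment, the sketch reports every mismatch in one pass.

```python
def rate_adjustments(topology, front_end, hba,
                     switch_storage_side=None, switch_host_side=None):
    """Return the rate adjustments suggested by Cause 5 as a list of strings.

    topology: "direct" or "switch".
    front_end: rate of the Fibre Channel front-end port on the storage device.
    hba: rate of the Fibre Channel HBA on the application server.
    switch_storage_side / switch_host_side: rates of the switch ports that
    connect to the storage device and to the application server.
    """
    if topology == "direct":
        # Steps h and i: compare the front-end port with the HBA.
        if front_end != hba:
            return ["Set Working Rate of the front-end port to %d (HBA rate)" % hba]
        return []
    if topology == "switch":
        actions = []
        # Steps b and c: front-end port versus storage-side switch port.
        if front_end != switch_storage_side:
            actions.append("Set Working Rate of the front-end port to %d "
                           "(switch port rate)" % switch_storage_side)
        # Steps e and f: HBA versus host-side switch port.
        if hba != switch_host_side:
            actions.append("Set the host-side switch port to %d (HBA rate)" % hba)
        return actions
    raise ValueError("topology must be 'direct' or 'switch'")


# Hypothetical example: an 8 Gbit/s front-end port behind a 16 Gbit/s switch.
print(rate_adjustments("switch", 8, 16, 16, 16))
```

An empty list means that the rates already match. In that case the procedure calls for keeping the fault environment intact and contacting technical support engineers.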

Suggestion and Summary


None

5.3.2 iSCSI Link Failure


An iSCSI link failure may cause service interruption and data loss between the
application server and the storage system.

Symptom

Log in to the DeviceManager and click System. On the system page, click the rear-view
icon to display the rear view. On the rear view of the storage device, click the
interface module in the red square. View the information about the iSCSI front-end
ports. Running Status of an iSCSI port is Link down.
The Link indicator of the iSCSI front-end port is steady red or off.

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm Link to
the Host Port Is Down may be displayed on the tab page.

Possible Causes
Possible causes for an iSCSI link failure:
● The IP address is incorrectly configured for the iSCSI front-end port on the
storage device or the service network port on the application server.
● The network cable between the application server and storage device is
improperly connected or faulty.

Impact
An unavailable iSCSI link causes a link down failure, service interruption, and data
loss between the application server and the storage system.

Fault Diagnosis

Figure 5-28 Troubleshooting flowchart for an iSCSI link failure

Start: iSCSI link failure
1. Is the IP address incorrectly configured for the iSCSI front-end port on the
storage device or the service network port on the application server? If yes,
modify the IP address of the iSCSI front-end port to be on the same network
segment as the IP address of the service network port. If no, go to 2.
2. Is the network cable between the application server and storage device
improperly connected or faulty? If yes, remove and reinsert, or replace the
network cable. If the fault persists, contact the network administrator to
handle network problems. If no, contact technical support engineers.
End

Procedure
● Cause 1: The IP address is incorrectly configured for the iSCSI front-end port
on the storage device or the service network port on the application server.
a. Ping the iSCSI front-end port from the application server. Check whether
the iSCSI front-end port is reachable.

▪ If yes, keep the fault environment intact and contact technical
support engineers.

▪ If no, go to b.
b. Check whether the network is a direct-connection network or a switch-
based network.

▪ If it is a direct-connection network, go to c.

▪ If it is a switch-based network, go to d.
c. Modify the IP address of the iSCSI front-end port to be on the same
network segment as the IP address of the service network port. Go to e.

NOTE

You can also modify the IP address of the service network port to be on the same
network segment as the IP address of the iSCSI front-end port.
d. Add a route between the iSCSI front-end port and the service network
port to enable the communication between them. Go to e.
e. Ping the iSCSI front-end port from the application server again. Check
whether the iSCSI front-end port is reachable.

▪ If yes, the procedure is complete.

▪ If no, go to Cause 2.


● Cause 2: The network cable between the application server and storage
device is improperly connected or faulty.
a. Remove and reinsert, or replace the network cable.
b. Ping the iSCSI front-end port from the application server again. Check
whether the iSCSI front-end port is reachable.

▪ If yes, go to c.

▪ If no, keep the fault environment intact and contact technical
support engineers.
c. Check whether the Link indicator of the iSCSI front-end port is steady
green or blue and its Running Status on the DeviceManager is Link up.

▪ If yes, the procedure is complete.

▪ If no, keep the fault environment intact and contact technical
support engineers.
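
Step c of Cause 1 depends on both addresses being on the same network segment. The following is a minimal sketch of that check using the Python standard library. The function name same_segment, the /24 prefix length, and the host address 192.168.252.20 are assumptions for illustration. 192.168.252.1 is the port address that appears in the iscsicli example in 5.2.5.

```python
import ipaddress


def same_segment(port_ip, host_ip, prefix_len):
    """True if both IPv4 addresses fall in the same network segment."""
    port_net = ipaddress.ip_interface("%s/%d" % (port_ip, prefix_len)).network
    host_net = ipaddress.ip_interface("%s/%d" % (host_ip, prefix_len)).network
    return port_net == host_net


# iSCSI front-end port versus service network port, assuming a /24 mask.
print(same_segment("192.168.252.1", "192.168.252.20", 24))  # True
print(same_segment("192.168.252.1", "192.168.253.20", 24))  # False
```

If the result is False in a direct-connection network, change one of the two addresses as described in step c. In a switch-based network, a route between the two segments (step d) is the alternative.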

Suggestion and Summary


None

5.3.3 Failure to Log In to a Storage System After CHAP Authentication Is Disabled

Symptom
CHAP authentication is enabled for initiators on the DeviceManager and the
automatic target reconnection is configured on Windows Server 2003. However,
after CHAP authentication is disabled on the DeviceManager and the application
servers are restarted, the application servers cannot be reconnected to the targets
automatically.

Alarm Information
None

Possible Causes
After CHAP authentication is disabled on the DeviceManager, the CHAP settings
are not updated on the application servers. As a result, the application servers
cannot access the storage system.

Procedure
Step 1 Open Microsoft iSCSI Initiator on the application server.
Step 2 Click the Persistent Targets tab. In the Select a Target list, delete the IP
addresses of the iSCSI front-end ports on the storage system.
Step 3 Click the Targets tab. In the Targets list, select the IP addresses of the iSCSI front-
end ports on the storage device.
Step 4 Click Log On.
The Log On to Target dialog box is displayed.
Step 5 Select Automatically restore this connection when the system boots and click
OK to save the settings. Restart the application server.
If the application server automatically reconnects to the targets after it is started
up, the fault is rectified. Otherwise, keep the fault environment intact and contact
technical support.

----End

Suggestion and Summary


To disable CHAP authentication for a storage system, disable CHAP authentication
on the application servers connected to the storage system first.

Related Information
None

5.3.4 Operations in an NFS Share Are Suspended


Symptom
A Linux client uses the NFSv4 protocol to mount an NFS share. When an operation
is performed in the NFS share, the client may stop responding and the operation is
suspended but the storage system is still working properly.

Possible Causes
While the client is running a process, it may receive a message indicating that
the TCP connection must be closed. However, the Linux client does not close the
TCP connection properly, so the TCP connection resources are not completely
released. When the client tries to establish the TCP connection again, an
exception occurs and the attempt times out. The client retries repeatedly, but
the storage system does not detect these attempts.
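
A quick probe from the client can show whether the storage system still accepts new TCP connections on the NFS port, which helps separate a network problem from the client-side condition described here. The following is a minimal sketch. The function name tcp_connect_ok and the 3-second timeout are illustrative, and 2049 is the standard NFS port. The probe opens its own connection and does not inspect the connection held by the kernel NFS client.

```python
import socket


def tcp_connect_ok(host, port=2049, timeout=3.0):
    """Return True if a new TCP connection to host:port completes in time."""
    try:
        # create_connection raises OSError on refusal, unreachability or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call with a hypothetical storage service address:
#   tcp_connect_ok("192.0.2.10")
```

If the probe succeeds while operations in the NFS share remain suspended, the result is consistent with the client-side cause above, and restarting the Linux client is the documented remedy.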

Fault Diagnosis

Figure 5-29 Operations in an NFS share are suspended

[Flowchart. The following checks are made in sequence:
1. Are I/O suspension statistics and abnormal logs generated at the point in
time when the fault occurs?
2. Are NFSv4 packets generated on the client and storage system?
3. Are logs and port status on the client normal?
If the answer to every check is yes, restart the Linux client. If an answer is
no, an unknown error exists. Contact Huawei technical support to locate the
error.]

Procedure
Step 1 Restart the Linux operating system.

Step 2 Mount the NFS share again and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

5.4 Emergency Handling Of Value-added Service


This section describes how to troubleshoot faults related to value-added services.


5.4.1 Failure to Delete a Tenant Administrator on the DeviceManager

Symptom
On the DeviceManager, a user fails to delete the tenant administrator when
deleting a vStore from the vStore management view.

Alarm Information
None

Possible Causes
System internal error.

Procedure
Step 1 Run the delete user command on the CLI to delete the tenant administrator.
After the command is executed, check whether the administrator is deleted
successfully.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.

----End

Suggestion and Summary


None

5.4.2 After the Local Huawei Storage System Is Powered Off Unexpectedly, the File System Created Based on the eDevLUN Is Lost

After the local Huawei storage system is powered off unexpectedly, the file system
created based on the eDevLUN is lost and the eDevLUN becomes a raw disk.

Symptom
After the local Huawei storage system is powered off unexpectedly, the file system
created based on the eDevLUN is lost and the eDevLUN becomes a raw disk.

Possible Causes
After the local storage system is recovered, it detects the third-party LUN and
reports the eDevLUN to the host. I/Os from the host to the eDevLUN fail before
the eDevLUN is recovered. As a result, the file system created based on the
eDevLUN cannot be read, and the host detects the eDevLUN as a raw disk.


Fault Diagnosis

Figure 5-30 Flowchart for troubleshooting an eDevLUN detected as a raw disk

[Flowchart. Entry condition: after the local storage system is powered off
unexpectedly, the file system created based on the eDevLUN is lost.
1. Is the status of the heterogeneous storage normal? If no, contact technical
support engineers of the remote storage system.
2. Is the link between the local storage system and host down? If yes, go to
Step 2 and recover the link.
3. Is the fault rectified? If yes, the procedure ends. If no, an unknown error
occurred. Contact technical support engineers.]

Procedure
Step 1 Check whether the third-party LUN corresponding to the eDevLUN is connected.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers of the remote storage system.

Step 2 Check whether the local storage system is reset.
● If yes, perform the following operations:
a. Disconnect the cable between the host and local storage system (or
disable the host's optical port that connects to the switch).
b. Wait until no physical drive letters can be detected on the host.
c. Connect the cable between the host and local storage system (or enable
the host's optical port that connects to the switch).
● If no, keep the fault environment intact and contact technical support
engineers.
Step 3 Check whether the fault is rectified, the file system created based on the eDevLUN
is recovered, and original files remain.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

5.4.3 Status of a Remote Replication Consistency Group Is Invalid

Operations cannot be performed in a remote replication consistency group due to
the invalid status of the remote replication consistency group.

Symptom
Log in to the CLI, and run the show consistency_group general command to
check information about the remote replication consistency group. The health
status of the consistency group is fault.

Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm
indicating that Remote Replication Consistency Group Is Unavailable may exist.

Possible Causes
Possible causes are as follows:
● The consistency group is manually deleted from the local computer upon the
link interruption.
● Adding members to the consistency group fails, removing members from the
consistency group fails, or a primary/secondary switchover of the consistency
group fails.


Fault Diagnosis

Figure 5-31 Flowchart for troubleshooting the invalid status of a consistency group

[Flowchart. Entry condition: the status of a remote replication consistency
group is invalid.
1. Does an alarm indicating that Replication Link Is Down exist? If yes, handle
the invalid status problem according to the alarm Replication Link Is Down.
2. If no, did adding members to or removing members from the consistency group
fail, or did a primary/secondary switchover of the consistency group fail? If
yes, delete the consistency group, create a new consistency group, and add
members to the newly created consistency group.
3. If no, the fault is unknown. Contact technical support engineers.]

Procedure
● Cause 1: The remote replication link is interrupted.
a. Check whether an alarm indicating that Replication Link Is Down exists.
▪ If yes, go to b.
▪ If no, go to Cause 2.
b. Handle the fault according to the suggestions for the alarm Replication
Link Is Down.
c. Delete the consistency group from the primary end and secondary end,
and create a consistency group again.
d. Add members to the consistency group.
Check whether the invalid status is resolved.
▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact technical support
engineers.
● Cause 2: Adding members to the consistency group fails, removing members
from the consistency group fails, or a primary/secondary switchover of the
consistency group fails.
a. Delete the consistency group from the primary end and secondary end,
and create a consistency group again.
b. Add members to the consistency group.
Check whether the invalid status is resolved.
▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact technical support
engineers.

Suggestion and Summary


None

5.4.4 Interrupted Secondary LUN in a Clone


The storage system shows the alarm of Clone Pair Is Abnormally Interrupted.

Symptom
Log in to the CLI of the storage system and run show clone secondary_lun
clone_id=?. Running Status of a secondary LUN is Interrupted.

Alarm Information
On the Alarms and Events page of the DeviceManager, click the Current Alarms
tab. The alarm Clone Pair Is Abnormally Interrupted is displayed.

Possible Causes
● The I/O processing mechanism is malfunctioning.
● A controller is faulty.


Fault Diagnosis

Figure 5-32 Flowchart for troubleshooting an interrupted secondary LUN in a clone

[Flowchart. Entry condition: the secondary LUN in a clone is interrupted.
1. Is the storage pool capacity used up? If yes, expand the storage pool.
2. If no, is the LUN faulty? If yes, rectify the LUN fault.
3. Synchronize or reversely synchronize the clone.
4. Is the fault rectified? If yes, the procedure ends. If no, unknown faults
exist. Contact technical support engineers.]

Procedure
Step 1 Check whether the alarm Storage Pool Capacity Is About to Be Used Up exists.
● If yes, expand the storage pool. Then go to Step 3.
● If no, go to Step 2.
Step 2 Check whether the alarm LUN Is Faulty exists.
● If yes, keep the fault environment intact and contact technical support
engineers to handle the LUN fault. After the fault is rectified, go to Step 3.
● If no, go to Step 3.
Step 3 Synchronize or reversely synchronize the clone, and then check whether the
Running Status of the secondary LUN is still Interrupted.
● If yes, keep the fault environment intact and contact technical support
engineers.
● If no, no further action is required.

----End

Suggestion and Summary


None

5.4.5 A Mirrored LUN Malfunctions


Alarm "A Mirrored LUN Malfunctions" is reported when a mirror copy of the
mirrored LUN or the mirrored LUN malfunctions.

Symptom
On the DeviceManager, Health Status of a mirrored LUN is Fault.

Possible Causes
● A mirror copy of the mirrored LUN malfunctions.
● The mirrored LUN malfunctions.

Fault Diagnosis

Figure 5-33 Flowchart for troubleshooting a malfunctioning mirrored LUN

[Flowchart. Entry condition: a mirrored LUN malfunctions.
1. Does a mirror copy of the mirrored LUN malfunction? If yes, rectify the
storage pool fault.
2. If no, does the mirrored LUN malfunction? If yes, rectify the storage pool
fault.
3. If no, unknown faults exist. Contact technical support engineers.]


Procedure
● Cause 1: A mirror copy of the mirrored LUN malfunctions.
a. Log in to the DeviceManager and check whether alarm Storage Pool Is
Faulty is reported, indicating that the storage pool where the mirror copy
resides malfunctions.
▪ If yes, go to b.
▪ If no, go to Cause 2.
b. Take recommended actions to recover the storage pool.
c. Check whether the fault is rectified.
▪ If yes, no further action is required.
▪ If no, go to d.
d. The mirror copy is an eDevLUN. Check whether alarm External LUN Is
Faulty or another alarm related to heterogeneous disk arrays is reported.
▪ If yes, go to e.
▪ If no, go to Cause 2.
e. Take recommended actions to rectify the fault.
f. Check whether the fault is rectified.
▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact technical support
engineers.
● Cause 2: The mirrored LUN malfunctions.
a. Log in to the DeviceManager and check whether alarm Storage Pool Is
Faulty is reported, indicating that the storage pool where the mirrored
LUN resides malfunctions.
▪ If yes, go to b.
▪ If no, keep the fault environment intact and contact technical support
engineers.
b. Take recommended actions to recover the storage pool.
c. Check whether the fault is rectified.
▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact technical support
engineers.

5.4.6 The Storage System Is Powered Off During NDMP-based Backup or Restore

The storage system is powered off during NDMP-based backup or restore,
interrupting the backup or restore service.


Symptom
When NDMP is used for backup or restore, the storage system is powered off,
interrupting the backup or restore service.

Possible Causes
When NDMP is used for backup or restore, the storage system is powered off,
interrupting NDMP service. As a result, the backup server is disconnected from the
storage system, causing backup or restore failure.

Procedure
Step 1 Wait until the storage system is powered on, log in to DeviceManager, and choose
Settings > Storage Settings > File Storage Service > NDMP Settings to check
whether the NDMP service is enabled.
● If yes, go to Step 3.
● If no, go to Step 2.

Step 2 Select Enable to enable the NDMP service.

Step 3 Check the backup networking mode.


● If the networking mode is LAN-free, discover the backup media on the backup
server.
● If the networking mode is LAN, go to Step 4.

Step 4 Start the backup or restore service and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.

----End

5.4.7 NetBackup Authentication Fails After a Storage System Restarts

After a storage system normally restarts, NetBackup authentication may fail due
to Network Data Management Protocol (NDMP) service problems.

Symptom
After a storage system normally restarts, NDMP authentication in NetBackup may
fail.

Possible Causes
During the restart of the storage system, the NDMP service is started before
the Internet Small Computer Systems Interface (iSCSI) or Fibre Channel driver is
successfully loaded.


Procedure
Step 1 Wait until the storage system is successfully powered on, log in to OceanStor
DeviceManager, and choose Settings > Storage Settings > File Storage Service >
NDMP Settings.
Step 2 Click Restart Service.
The NDMP service is restarted.
Step 3 Perform NDMP authentication in NetBackup and check whether the fault is
rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.

----End

5.4.8 A Message Indicating Expired Password Is Displayed When a Client Is Using a CIFS Share

Symptom
A client accesses a CIFS share as a local authentication user. When the client
attempts to perform read and write operations on the CIFS share, a message
indicating that the password has expired is displayed.

Possible Causes
The validity period of the local authentication user's password is set to 180
days, and the password has expired.
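
The expiry date implied by a 180-day validity period can be worked out with
simple date arithmetic. The sketch below is illustrative only: the date of the
last password change is hypothetical, and treating the password as expired on
the 180th day itself is an assumption, because the exact boundary behavior of
the storage system is not stated here.

```python
from datetime import date, timedelta

VALIDITY_DAYS = 180  # validity period stated under Possible Causes


def password_expiry(last_changed: date, validity_days: int = VALIDITY_DAYS) -> date:
    """Return the date on which the validity period runs out."""
    return last_changed + timedelta(days=validity_days)


def is_expired(last_changed: date, today: date) -> bool:
    # Assumption: the password counts as expired from the expiry date onward.
    return today >= password_expiry(last_changed)


changed = date(2021, 1, 1)  # hypothetical date of the last password change
print(password_expiry(changed))
print(is_expired(changed, date(2021, 9, 15)))
```

Comparing the computed date with the date of the first failed access can help
confirm that password expiry, rather than another authentication problem, is
the cause.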

Procedure
Step 1 Change the password of the local authentication user.
1. Log in to DeviceManager.
2. Choose Provisioning > User Authentication.
3. Select the local authentication user whose password you want to change.
4. Click Properties.
The Local Authentication User Properties dialog box is displayed.
5. Click Change password.
6. In New Password, enter a new password.
7. In Confirm Password, enter the new password again.
8. Click OK to finish modifying the password of the local authentication user.

Step 2 Use the new password to log in to the CIFS share again to check whether the fault
is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.

----End

5.5 Emergency Handling Of Other Faults


This section explains how to troubleshoot other common faults.

5.5.1 A Storage Pool Loses Efficacy

Symptom
Services are interrupted, and the following alarm information is generated:

● On the CLI, enter the show lun general command. It is found that the health
status of some LUNs is fault.
● On the CLI, enter the show storage_pool general command. It is found that
the health status of some storage pools is fault.
● On the CLI, enter the show disk general command. It is found that the
health status of more than two disks in a storage pool is fault.


● On the CLI, enter the show alarm command. It is found that alarms are
generated indicating disk failure or removal.

Possible Causes
● Dual or multiple disks fail.
● Disks are faulty.

Impact
The storage pool is degraded or fails, and some or all storage services are
interrupted. Host services are interrupted.

Procedure
● Cause 1: Dual or multiple disks fail.
a. Check the mapping between disk slots and disk SNs.
i. Open the alarm list and extract all alarm information. For details, see
3.2.7.3 Managing Current Alarms.
ii. Open the extracted alarm list and search the entire list using Disk
Slot ID as the keyword. The disk SN corresponding to each disk slot is
displayed.
Record the mapping between the disk slots and disk SNs.
b. Determine the disk failure or removal sequence based on the time when
the disk alarms or messages are generated.
i. Reinsert the removed or faulty disks in reverse order of disk removal
or failure.
ii. Check that indicators on the front panel of the disk enclosure are
normally turned on. Check whether disks are displayed on the device
figure on OceanStor DeviceManager and in the Normal state.
○ If yes, go to c.
○ If no, keep the fault environment intact and contact Huawei
technical support.
c. Repeat b until all disks are successfully recovered.
d. Check whether the storage pool and LUNs are recovered.
▪ If yes, no further action is required.
▪ If no, keep the fault environment intact and contact Huawei technical
support.
● Cause 2: Disks are faulty.
a. Rectify the fault. For details, see 5.1.2 Disk Failure.
b. In the developer view, run the change disk sn src_sn tgt_sn command to
add a new disk to the original storage pool.
In the command, src_sn indicates the SN of the faulty disk, and tgt_sn
indicates the SN of the new disk.
c. Wait about 2 minutes and check the status of the Running and Alarm/Location
indicators on the disk module to determine whether the disk module is
successfully installed.
▪ If the Running indicator is steady on and the Alarm/Location indicator is
off, the disk module has been successfully installed.
▪ If the Running indicator is off or the Alarm/Location indicator is steady
on, the newly installed disk module is faulty, the disk module slot is
faulty, or the disk module is incorrectly installed.
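
For Cause 1, the reinsertion sequence in step b follows directly from the alarm
times: the disk that failed or was removed last is reinserted first. The
following sketch shows one way to derive that order. The slot IDs, serial
numbers, timestamps, and timestamp format are all hypothetical; real values
come from the alarm information recorded in step a.

```python
from datetime import datetime

# Hypothetical records: (disk slot ID, disk SN, alarm time).
alarms = [
    ("DAE000.3", "SN-A", "2021-09-15 10:02:11"),
    ("DAE000.7", "SN-B", "2021-09-15 09:58:40"),
    ("DAE000.1", "SN-C", "2021-09-15 10:05:27"),
]


def reinsertion_order(records):
    """Sort disks so the most recently failed or removed disk comes first."""
    return sorted(
        records,
        key=lambda r: datetime.strptime(r[2], "%Y-%m-%d %H:%M:%S"),
        reverse=True,
    )


for slot, sn, when in reinsertion_order(alarms):
    print(slot, sn, when)
```

Parsing the timestamps instead of comparing them as strings keeps the ordering
correct if the alarm list uses a time format that does not sort
lexicographically.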

5.5.2 File System Corrupted Due to I/O Processing Timeout


Symptom
● Read and write requests from the application server cannot be sent to the
storage device, causing service interruption.
● Log in to DeviceManager, and choose Monitor > Alarms and Events. Click
the Current Alarms tab. An alarm indicating that a disk domain is degraded
is displayed in the alarm list.

Possible Causes
The disk domain is degraded.

Impact
After a disk domain is degraded, I/O processing times out, causing file system
corruption. Host services are interrupted.


Fault Diagnosis

Figure 5-34 Flowchart for handling a file system corruption failure due to I/O
processing timeout

[Flowchart. Entry condition: the file system is corrupted.
1. Collect storage system logs.
2. Collect disk domain information.
3. Is the disk domain degraded? If yes, check the status of the file system
corresponding to the LUN on the application server.
4. Is the file system unavailable? If yes, repair the file system.
5. Is the fault rectified? If yes, the procedure ends.
If the answer to any check is no, keep the fault environment intact and contact
technical support engineers.]

Procedure
● Windows-based application server
a. Log in to the DeviceManager. Check the health status of the disk domain.
On the navigation bar, click Provisioning. In the Storage Configuration and
Optimization area on the Resource Allocation page, click Disk Domain. Check
whether Health Status of the disk domain is Degrade.
▪ If yes, go to b.
▪ If no, keep the fault environment intact and contact technical support
engineers.
b. Check the file system status on the application server.
i. Right-click Computer and choose Manage from the shortcut menu.
ii. In the Computer Management dialog box, select Disk
Management.
iii. Select the disk corresponding to the LUN mapped to the application
server. Right-click the disk and choose Open from the shortcut menu.
Check whether the disk is available.
○ If yes, keep the fault environment intact and contact technical
support engineers.
○ If no, go to c.
c. Repair the file system.
i. Choose Start > Run and type cmd to go to the CLI.
ii. Run the chkdsk drive_letter: /R command to locate the faulty
sectors on the disk and recover the readable information.
For example, run the following command to restore disk E.
# chkdsk E: /R
The file system type is NTFS.
As the LUN is occupied by another process, the chkdsk command cannot be executed.
If the LUN is deleted, the chkdsk command may be executed.
If you delete the LUN, all open handles on the LUN will be deleted. Do you want to delete
the LUN forcibly? <Y/N> Y
The LUN has been deleted. All open handles on the LUN are invalid.
The LUN identifier is newly-added LUN.
The chkdsk command is verifying the file system (phase 1 among 5 phases).
2416 file records have been processed.
File verification is complete.
0 large file record has been processed.
0 incorrect file record has been processed.
0 EA file record has been processed.
0 resolution file record has been processed.
The chkdsk command is verifying indexes (phase 2 among 5 phases).
7583 file records have been processed.
Index verification is complete.
Five unindexed files have been processed.
......

d. Check whether the file system is restored.
Open the Computer Management dialog box and check whether the disk
corresponding to the LUN mapped to the application server is available.
NOTE
For details, see b.
▪ If yes, the file system is restored.
▪ If no, keep the fault environment intact and contact technical support
engineers.
● Linux-based application server
a. Log in to the DeviceManager. Check the health status of the disk domain.
On the navigation bar, click Provisioning. In the Storage Configuration and
Optimization area on the Resource Allocation page, click Disk Domain. Check
whether Health Status of the disk domain is Degrade.
▪ If yes, go to b.
▪ If no, keep the fault environment intact and contact technical support
engineers.
b. Check the file system status on the application server.
Run the xfs_check command in the root directory, followed by echo $? to
display its exit status, to check whether the file system is corrupted.
▪ If the system displays 1, the file system is corrupted.
▪ If the system displays 0, the file system is not corrupted.
For example, if the file system type is XFS and the device name of the disk
is /dev/sdb1, run the following command and press Enter. The following
information is displayed:
# xfs_check /dev/sdb1;echo $?
xfs_check: cannot open /dev/sdb1: No such device or address
1

View and record the command output.
▪ If 1 is displayed, go to c.
▪ If 0 is displayed, keep the fault environment intact and contact
technical support engineers.
c. Restore the file system.
i. Suspend services on the host.
ii. Run the mount command to view the file system type and the name
of the disk on which the file system is stored.
# mount
/dev/sdb1 on /directory type xfs (rw)

According to the command output, the file system is stored on
disk /dev/sdb1 and the file system type is xfs.
iii. Run the umount command to unmount the file system.
# umount /dev/sdb1

iv. Run the xfs_repair command to restore the file system.


For example, run the following command to restore the file system
stored on disk /dev/sdb1.
# xfs_repair /dev/sdb1

v. After the file system is restored, run the mount command to mount
the disk to the original directory.
# mount /dev/sdb1 /directory

d. Check whether the file system is restored.
i. Run the xfs_check /dev/sdb1;echo $? command to check whether
the response from the application server is 0.
○ If yes, go to ii.
○ If no, keep the fault environment intact and contact technical
support engineers.
ii. Run the find /directory -name xfs command to check whether the
system responds.
○ If yes, go to iii.
○ If no, the file system is restored.
iii. Run the xfs_repair command repeatedly to check whether the file
system is restored.
○ If yes, the file system is restored.
○ If no, keep the fault environment intact and contact technical
support engineers.
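
The Linux repair sequence in step c (unmount, repair, remount) can be laid out
as an ordered command plan before anything is run. The sketch below only
builds and prints the commands and does not execute them. The function names
are illustrative, and the device and mount point are the example values used
above. It also assumes that on distributions where xfs_check is no longer
shipped, xfs_repair -n (no-modify mode) can serve as the read-only check.

```python
from typing import List


def check_command(device: str, have_xfs_check: bool = True) -> List[str]:
    """Return a read-only check command for an unmounted XFS device."""
    if have_xfs_check:
        return ["xfs_check", device]
    # Assumption: fall back to the no-modify mode of xfs_repair.
    return ["xfs_repair", "-n", device]


def xfs_repair_plan(device: str, mount_point: str) -> List[List[str]]:
    """Return the commands of step c, in order, as argument lists."""
    return [
        ["umount", device],              # the file system must be unmounted
        ["xfs_repair", device],          # repair the file system
        ["mount", device, mount_point],  # mount it to the original directory
    ]


for cmd in xfs_repair_plan("/dev/sdb1", "/directory"):
    print(" ".join(cmd))
print(" ".join(check_command("/dev/sdb1", have_xfs_check=False)))
```

Keeping the commands as argument lists makes it straightforward to review the
plan, and to pass each entry to a process runner later, without shell quoting
issues.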

5.5.3 Server syslog-ng Did Not Receive Some Alarm Notifications

Symptom
A large number of alarms were reported in a storage system. However, server
syslog-ng did not receive some alarm notifications. Run the
netstat -su | sed -n '/Udp:/,+6p' command on server syslog-ng to check whether
UDP data packet loss occurs.

If the error counter in the command output (marked with a red square in the
original screenshot) increases, UDP data packets are being lost.
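
Whether a counter is increasing can only be judged by comparing two samples
taken some time apart. The sketch below does this with the Udp lines of
/proc/net/snmp, which is assumed here to be an alternative source of the same
kernel counters that netstat -su reports. The sample texts are illustrative,
not captured from a real server, and the choice of InErrors as the counter to
watch is an assumption.

```python
def udp_counters(snmp_text: str) -> dict:
    """Parse the two 'Udp:' lines of /proc/net/snmp into a name -> value map."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) != 2:
        raise ValueError("expected a header line and a value line for Udp")
    names, values = rows
    return dict(zip(names, map(int, values)))


def new_receive_errors(before: dict, after: dict) -> int:
    """Return how many receive errors were added between two samples."""
    return after["InErrors"] - before["InErrors"]


# Illustrative samples taken a few seconds apart.
first = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1000 2 10 900 10 0
"""
second = """Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1500 2 25 950 25 0
"""
print(new_receive_errors(udp_counters(first), udp_counters(second)))
```

A positive difference between samples indicates that datagrams are still being
dropped. If RcvbufErrors grows at the same rate, that is consistent with an
undersized receive buffer.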

Possible Causes
The processing capabilities of server syslog-ng are limited and the UDP cache is
insufficient. You must modify the cache size of the server.

Procedure
Step 1 Run the sysctl -a | grep net.core and sysctl -a | grep udp commands on the
server to check the UDP cache size.


Step 2 Run the sysctl -w command on the server to set the UDP cache size to a larger
value.

Step 3 Run the sysctl -p command to enable the UDP cache configuration to take effect.


Step 4 Run the sysctl -a | grep net.core and sysctl -a | grep udp commands on the
server to check the UDP cache size after the modification.

Step 5 Run the service syslog restart command to restart the syslog service.

----End
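
The steps above can be sketched as a small helper that reads the current values
from sysctl output and builds sysctl -w commands only for keys that are below a
target. The target values here are hypothetical and must be chosen for the
actual server. The keys net.core.rmem_default and net.core.rmem_max are
standard Linux receive-buffer settings, but the procedure does not state which
keys to change, so their use is an assumption.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines as printed by sysctl -a."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def raise_commands(current: dict, wanted: dict) -> list:
    """Build 'sysctl -w' commands only for keys below the wanted value."""
    cmds = []
    for key, target in wanted.items():
        if int(current.get(key, 0)) < target:
            cmds.append(f"sysctl -w {key}={target}")
    return cmds


# Illustrative output of: sysctl -a | grep net.core
sample = """net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.somaxconn = 128
"""
# Hypothetical targets, for illustration only.
targets = {"net.core.rmem_default": 8388608, "net.core.rmem_max": 8388608}

for cmd in raise_commands(parse_sysctl(sample), targets):
    print(cmd)
```

Note that sysctl -w changes the running kernel value, whereas sysctl -p loads
settings from /etc/sysctl.conf, so a value normally also needs an entry in that
file to survive a reboot.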


A How to Obtain Help

If a tough or critical problem persists in routine maintenance or troubleshooting,
contact Huawei for technical support.

A.1 Preparations for Contacting Huawei


To better solve the problem, you need to collect troubleshooting information and
make debugging preparations before contacting Huawei.

A.1.1 Collecting Troubleshooting Information


Before contacting Huawei, collect the following troubleshooting information:
● Name and address of the customer
● Contact person and telephone number
● Time when the fault occurred
● Description of the fault phenomena
● Device type and software version
● Measures taken after the fault occurs and the related results
● Troubleshooting level and required solution deadline

A.1.2 Making Debugging Preparations


When you contact Huawei for help, the technical support engineer of Huawei
might assist you to do certain operations to collect information about the fault or
rectify the fault directly.
Before contacting Huawei for help, you need to prepare the boards, port modules,
screwdrivers, screws, cables for serial ports, network cables, and other required
materials.


A.2 How to Use the Document


Huawei provides guide documents shipped with the device. The guide documents
can be used to handle the common problems occurring in daily maintenance or
troubleshooting.
To better solve the problems, use the documents before you contact Huawei for
technical support.

A.3 How to Obtain Help from Website


Huawei provides users with timely and efficient technical support through the
regional offices, secondary technical support system, telephone technical support,
remote technical support, and onsite technical support.
Contents of the Huawei technical support system are as follows:
● Huawei headquarters technical support department
● Regional office technical support center
● Customer service center
● Technical support website: https://support.huawei.com/enterprise/
You can query how to contact the regional offices at
https://support.huawei.com/enterprise/.

A.4 Ways to Contact Huawei


Huawei Technologies Co., Ltd. provides customers with comprehensive technical
support and service. For any assistance, contact our local office or company
headquarters.
Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base Bantian, Longgang Shenzhen 518129 People's
Republic of China
Website: https://e.huawei.com/


B Glossary

For glossary information, visit https://support.huawei.com/enterprise/. In the
search field, enter a product model, and select a path from the paths that are
automatically displayed to go to the document page of the product. Browse or
download the OceanStor V500R007 Glossary.


C Acronyms and Abbreviations

B
BBU Backup Battery Unit

D
DAS Direct-attached Storage

E
ESD Electrostatic Discharge

F
FC Fibre Channel
FRU Field Replaceable Unit

H
HBA Host Bus Adapter

I
IE Internet Explorer
IP Internet Protocol
iSCSI Internet Small Computer Systems Interface

L
LUN Logical Unit Number

N
NAS Network-attached Storage


O
OTDR Optical Time Domain Reflectometer

R
RAID Redundant Array of Independent Disks

S
SAN Storage Area Network
SAS Serial Attached SCSI
SCSI Small Computer System Interface

W
WWPN World Wide Port Name
