OceanStor V500R007 Troubleshooting
Issue 17
Date 2021-09-15
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: https://e.huawei.com
Purpose
This document describes the complete process of troubleshooting from the
following aspects: safety precautions, troubleshooting preparations, fault diagnosis
principles and methods, troubleshooting procedure, and methods of
troubleshooting common faults.
The following table lists the product models.
Intended Audience
This document is intended for:
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Change History
Updates between document issues are cumulative. Therefore, the latest document
issue contains all updates made in previous issues.
Issue 17 (2021-09-15)
This issue is the seventeenth official release.
Issue 16 (2021-06-30)
This issue is the sixteenth official release.
Issue 15 (2021-01-30)
This issue is the fifteenth official release.
Issue 14 (2020-11-30)
This issue is the fourteenth official release, which incorporates the following
changes:
Issue 13 (2020-04-10)
This issue is the thirteenth official release.
Issue 12 (2020-07-15)
This issue is the twelfth official release.
Issue 11 (2020-04-10)
This issue is the eleventh official release.
Modified some description.
Issue 10 (2019-12-30)
This issue is the tenth official release.
Modified some description.
Issue 09 (2019-10-30)
This issue is the ninth official release.
Added product models for 6800 V5, 6800F V5, 18500 V5, 18800 V5, 18500F V5,
and 18800F V5 V500R007C60 Kunpeng.
Issue 08 (2019-08-30)
This issue is the eighth official release.
Added product models for OceanStor 5300 V5, 5500 V5, 5600 V5, and 5800 V5
V500R007C60 Kunpeng.
Added product models for OceanStor 5300F V5, 5500F V5, 5600F V5, and 5800F
V5 V500R007C60 Kunpeng.
Issue 07 (2019-06-30)
This issue is the seventh official release.
Modified some description.
Issue 06 (2019-05-15)
This issue is the sixth official release.
Added OceanStor 5110F V5 to product models.
Issue 05 (2019-03-30)
This issue is the fifth official release.
Added OceanStor 5110 V5 to product models.
Issue 04 (2018-12-06)
This issue is the fourth official release.
Modified some description.
Issue 03 (2018-07-30)
This issue is the third official release.
Issue 02 (2018-01-30)
This issue is the second official release.
Modified some description.
Issue 01 (2017-11-30)
This issue is the first official release.
Contents
2 Overview
2.1 Requirements for Maintenance Engineers
2.2 Tools and Meters
2.3 Spare Parts
2.4 Fault Levels
2.5 Fault Categories
2.6 Troubleshooting Categories
2.6.1 Common Troubleshooting
2.6.2 Emergency Handling
2.7 Basic Principles
2.8 Troubleshooting Methods
2.8.1 Alarm Analysis Method
2.8.2 Replacement Method
4 Common Troubleshooting
4.1 Troubleshooting Management Software Faults
4.1.1 Failure to Synchronize the Client Time Zone on the DeviceManager Due to the Browser Obtaining System Time Zone Mechanism
4.1.2 Failure to Log In to the DeviceManager by Entering an IPv6 Address in the Address Box of a Browser Earlier Than Firefox 24.0
4.1.3 Browser SSL Information Is Damaged
4.1.4 Alarm Sound Cannot Be Played on DeviceManager
4.1.5 DeviceManager Has an Interface Input Exception
4.1.6 After the DeviceManager Is Upgraded, a Picture Layout or Display Fault Occurs
4.1.7 Current Alarms or All Events Exported from Internet Explorer 9 Are Deleted
4.1.8 An Exception Occurs When a User Logs In to DeviceManager in Internet Explorer 10
4.1.9 Failure to Log In to the DeviceManager Using a Firefox Web Browser
4.1.10 Slow Loading of SystemReporter on the Chrome Web Browser
4.1.11 The DeviceManager Page Fails to Be Loaded or Is Displayed Incorrectly
4.1.12 Timeout Occurs During Storage System Configuration
4.1.13 Failed to Import the Configuration File
4.1.14 Failed to Access DeviceManager and SystemReporter Using Internet Explorer
4.1.15 The System Responds Slowly When Excessive Maintenance Terminals Are Used to Connect to a Storage System Simultaneously
4.1.16 OceanStor DeviceManager Cannot Be Accessed Correctly Due To Boot Disk or Coffer Disk Faults
4.1.17 Antivirus Scanning Failure
4.1.18 Failed to Install Adobe Flash Player on Windows Server 2012
4.1.19 Failing to Log In to DeviceManager Using a Browser Again After a Timeout
4.1.20 What Can I Do If the Alarm Sound and Quick Start of DeviceManager Do Not Function Properly on Chrome Later Than 55?
4.1.21 Periodically Updated Data Is Abnormal on DeviceManager
4.1.22 Failed to Deselect Items When Display Items Are Customized in SystemReporter
4.2 Troubleshooting Basic Storage Service Faults
4.2.1 Login Failure Through a Serial Port
4.2.2 Failure to Add an iSCSI Link for a Remote Device
4.2.3 Failure to Discover LUNs by an Application Server
4.2.4 LUN Deletion Timeout
4.2.5 Failure to Connect the Storage System to an AIX-Based Application Server for the First Time
4.2.6 The Storage System Does Not Detect the Initiators Provided by an HP-UX Server
5 Emergency Handling
5.1 Emergency Handling Of Hardware Module Faults (Applicable to V500R007)
5.1.1 Controller Failure
5.1.2 Disk Failure
5.1.3 Interface Module Failure
5.1.4 Expansion Module Failure
5.1.5 Fan Module Failure
5.1.6 BBU Failure
5.1.7 Power Module Failure
5.2 Emergency Handling Of Multipathing Software Faults
5.2.1 Failure to Load Multipathing Software on an Application Server
5.2.2 Blue Screen of Death When Multipathing Software Is Being Installed on a Windows-Based Application Server
5.2.3 Controller Failure in a Non-UltraPath Environment
5.2.4 UltraPath Software Unavailable Because Being Isolated by Antivirus Software
5.2.5 Failure to Detect Virtual Disks on a Windows-Based Application Server After the Multipathing Function of Microsoft iSCSI Initiator Is Enabled
5.3 Emergency Handling Of Basic Storage Service Faults
5.3.1 Fibre Channel Link Failure
5.3.2 iSCSI Link Failure
5.3.3 Failure to Log In to a Storage System After CHAP Authentication Is Disabled
5.3.4 Operations in an NFS Share Are Suspended
5.4 Emergency Handling Of Value-added Service
5.4.1 Failure to Delete a Tenant Administrator on the DeviceManager
5.4.2 After the Local Huawei Storage System Is Powered Off Unexpectedly, the File System Created Based on the eDevLUN Is Lost
5.4.3 Status of a Remote Replication Consistency Group Is Invalid
5.4.4 Interrupted Secondary LUN in a Clone
5.4.5 A Mirrored LUN Malfunctions
5.4.6 The Storage System Is Powered Off During NDMP-based Backup or Restore
5.4.7 NetBackup Authentication Fails After a Storage System Restarts
5.4.8 A Message Indicating Expired Password Is Displayed When a Client Is Using a CIFS Share
5.5 Emergency Handling Of Other Faults
5.5.1 A Storage Pool Loses Efficacy
5.5.2 File System Corrupted Due to I/O Processing Timeout
5.5.3 Server syslog-ng Did Not Receive Some Alarm Notifications
B Glossary
C Acronyms and Abbreviations
This chapter provides guidelines for safety operations during activities such as
installation, maintenance, and troubleshooting. The guidelines consist of safety
regulations for both personnel and equipment. You must follow these guidelines
to avoid personal injury and equipment damage.
1.1 Alarm and Safety Symbols
1.2 Safety Precautions for ESD Protection
1.3 Safety Precautions for Laser Protection
1.4 Safety Precautions for Using Fibers
1.5 Safety Precautions for Using Power Cables (Applicable to Japan)
1.6 Safety Precautions for Short Circuit Protection
1.7 Safety Precautions for Operating Equipment
1.8 Safety Precautions for Condensation Prevention
● To prevent electric shock, do not wear an ESD wrist strap when powering on
the equipment.
● To prevent damage to the electrostatic sensitive devices (ESSDs) on the
circuit boards, do not touch devices with bare hands.
● Electronic circuits are prone to electrostatic damage. Wear an ESD wrist strap,
ESD gloves, and ESD clothing when handling disks, especially bare disks. Hold
a disk by its edge.
● Because an ESD wrist strap only discharges static electricity from the body,
ESD clothing is also required to prevent static electricity from clothes.
● Before installing or replacing devices, wear an ESD wrist strap, ESD gloves,
and ESD clothing to protect you and the equipment from static electricity.
● Use special ESD bags to carry or transport device components.
● Personal injury
● Equipment damage
Personal Injury
DANGER
The laser emitted by an optical module is an invisible infrared ray, which may
cause permanent eye injury. Do not look into the optical module during device
maintenance.
Equipment Damage
To prevent equipment damage when you handle the equipment, follow these
precautions:
● When not in use, the optical interfaces on the equipment and fiber connectors
on fiber jumpers must be covered with dust-proof caps.
● After removing a fiber jumper that connects to an optical interface on the
equipment, cover the optical interface and the fiber jumper connector with
dust-proof caps.
● When performing a hardware loopback test by connecting a fiber jumper to
an optical interface, add an attenuator to prevent the risk of damage to the
optical module caused by excessively strong optical power.
● When using the Optical Time Domain Reflectometer (OTDR), disconnect the
fiber jumper between the peer equipment and the local equipment to avoid
damage to the optical module caused by excessively strong optical power.
● Unless necessary, do not remove or insert the modules connecting to fibers.
DANGER
The laser beam on an optical interface board or from a fiber may cause eye injury.
Do not look into optical interfaces or fiber connectors during installation and
maintenance.
Replacing Fibers
Use dust-proof caps to cap the connectors of the fibers that are not in use.
NOTICE
● Do not place tools on air intake boards of cabinets. Otherwise, a short circuit
may occur.
● Do not drop screws into a cabinet or the equipment. Otherwise, a short circuit
may occur.
DANGER
● Before checking device installation and cable connections, ensure that the
system power supply is switched off. Otherwise, incorrect or loose cable
connections may result in personal injury or equipment damage.
● To prevent electric shock, do not wear an ESD wrist strap when powering on
the equipment.
● Do not remove or insert cables and field replaceable units (FRUs) during a
system startup. Otherwise, data loss may occur.
● After you switch off the power supply, wait at least one minute before
switching it back on.
● To avoid disk damage and data loss, do not switch the power supply off while
any disk running indicators are still blinking.
DANGER
NOTE
If the temperature difference cannot be determined, wait one night after moving devices to
the equipment room and then install them.
2 Overview
Professional Knowledge
Maintenance engineers should be familiar with:
Basic Skills
Maintenance engineers should be able to operate:
● Storage devices
● Application servers
● Data transmission devices, such as Ethernet switches, Fibre Channel switches,
and routers
Test Meters
Maintenance engineers should be able to use:
Storage Networking
Maintenance engineers should have knowledge of:
Table 2-1 lists the tools and meters required for troubleshooting.
After learning the general conditions of the site, engineers may need to bring the
following spare parts to the site:
● Controller
● Backup Battery Unit (BBU)
● Fan module
● Power module
● Expansion module
● Interface module
● Disk
● Optical transceiver
● Fiber patch cord
● Shielded twisted-pair cable
● Serial cable
Major: Host I/Os are delayed or host services are interrupted because storage
system performance deteriorates, for example, due to interface module faults or
the failure of an application server to load the multipathing software after the
server is restarted. Troubleshooting category: emergency handling.
Critical: Some production systems are unavailable and the risk of some data loss
is high or some data has been lost, for example, due to controller faults.
Troubleshooting category: emergency handling.
Definition
Common troubleshooting refers to the troubleshooting of faults that have no
adverse impact on storage system performance and host services. The fault level is
minor.
Process
Figure 2-1 shows the troubleshooting flowchart.
Operation: Locate the fault cause.
Description: Identify the exact cause of the fault from the possible causes by
analysis, comparison, and other applicable methods. For details on common fault
locating methods, see 2.8 Troubleshooting Methods.
Operation: Contact Huawei technical support.
Description: If you cannot rectify the fault, collect fault information and
contact Huawei technical support.
[Figure: information collection flow. Application layer: collecting host logs
(application log, operating system log). Storage layer: collecting storage
system fault information.]
Checking link status: System faults will occur if network links are down. If a
system fault occurs, check whether cables are correctly connected and whether
the indicators on the ports to which the cables are connected are normal.
Collecting switch logs: You can check switch status and packet loss on ports
based on the collected switch information, and then rectify faults accordingly.
Definition
Emergency handling refers to the troubleshooting of system or device faults that
occur suddenly to resume operations and reduce loss without delay. The fault level
is major or critical.
Process
As a troubleshooting method, emergency handling must comply with the
following process:
[Flowchart: emergency handling process, ending with "Check the fault
rectification result."]
Operation: Inform Huawei of the fault.
Description: If a critical fault occurs, inform Huawei of the fault immediately
and send the collected fault information to Huawei.
Operation: Determine the fault impact.
Description: If a fault occurs, determine the fault impact based on the fault
symptom to reduce the loss caused by the fault.
Operation: Rectify the fault.
Description: Rectify the fault and recover host services.
Operation: Contact Huawei technical support.
Description: If you cannot rectify the fault, collect fault information and
contact Huawei technical support.
● Analyze external factors first, and then internal factors. When locating faults,
consider the external factors first.
– External factor failures include failures in optical fibers, optical cables,
power supplies, and customers' devices.
– Internal factors include disks, controllers, and interface modules.
● Analyze the alarms of higher severities and then those of lower severities. The
alarm severity sequence from high to low is critical alarms, major alarms, and
warnings.
● Analyze common alarms and then uncommon alarms. When analyzing an
event, confirm whether it is an uncommon or common fault and then
determine its impact. Determine whether the fault occurred on only one
component or on multiple components.
● If a fault that may cause data loss occurs, stop host services or switch services
to the standby host, and back up the service data in time.
● During emergency handling, completely record all operations performed.
● Emergency handling personnel must participate in dedicated training courses
and understand related technologies.
● Recover core services before recovering other services.
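The ordering rules above (external factors before internal factors, higher alarm severities before lower ones) can be sketched as a sort key. This is only an illustration of one possible reading of the principles: the alarm record layout, the field names `external` and `severity`, and the function name `triage_order` are assumptions, not part of any Huawei tool.

```python
# Severity sequence from the principles above: critical, major, warning.
SEVERITY_RANK = {"critical": 0, "major": 1, "warning": 2}


def triage_order(alarms):
    """Return alarms sorted for analysis.

    Assumed record layout (hypothetical): each alarm is a dict with an
    "external" flag (True for fibers, cables, power supplies, customer
    devices) and a "severity" string.
    External factors come first; within each group, higher severity first.
    """
    return sorted(
        alarms,
        key=lambda alarm: (0 if alarm["external"] else 1,
                           SEVERITY_RANK[alarm["severity"]]),
    )
```

Because Python's sort is stable, alarms with the same group and severity keep their original (for example, chronological) order.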
Application Scenario
The alarm analysis method is applicable to locating any faults if alarm
information can be collected.
Application Example
A video service was running properly but suddenly the quality deteriorated. At that
time, an alarm was reported on the management software. The alarm information
specified that a disk had failed. The disk was then replaced and the fault was
rectified.
Summary
The alarm analysis method can help maintenance engineers locate faults and can
be used with other fault locating methods.
Application Scenario
The replacement method is applicable to locating hardware faults. This method
has no special requirement for maintenance engineers, but requires spare parts to
be prepared in advance.
Summary
The advantages of the replacement method are quick fault location and minimal
requirements for maintenance engineers.
After a fault occurs, collect the basic information, fault information, and storage
device information, and send it to maintenance engineers. This can help
maintenance engineers quickly locate and rectify the fault. Note that the
information collection operations described in this chapter must be authorized by
customers in advance.
3.1 Collecting Live Network Information
3.2 Collecting Fault Information
Collect the types of information specified in Table 3-1 and send the collected
information to maintenance engineers.
Basic information: device serial number and version. Provide the serial number
and version of the storage device.
NOTE
You can log in to the DeviceManager and query the serial number and version of
the storage device in the General area.
NOTE
Before collecting fault information, assess the fault impact on services ahead of time. Back
up data and obtain certain permissions when necessary.
In Windows
You can check whether UltraPath is working properly by viewing information
about physical paths, logical paths, virtual disk properties, performance statistics,
and alarms on the CLI or GUI of UltraPath.
In the CMD window, run the upadm show upconfig command to check the
UltraPath configuration.
----End
In Linux
Step 1 Check whether the multipathing software is installed.
Run the rpm -qa | grep UltraPath command to check whether the UltraPath is
installed properly. If the UltraPath information is displayed, UltraPath is installed.
Run the upadmin show path command to check the information about a
specified or all physical paths, including physical path IDs, initiator WWNs, storage
system name, owning controllers, target WWNs, physical path status, path
detection type, path detection status, and port type.
Run the upadmin show vlun command to check the information about all virtual
disks or a specified virtual disk, including VLUN IDs, disk names, VLUN names,
VLUN WWNs, VLUN status, capacity, owning/working controllers, storage device
names, storage SNs, logical path IDs, and working status.
Run the upadmin show vlun id=? command to check the information about the
logical path of a VLUN whose ID is specified, including logical path ID, SCSI
address, and path status.
Run the upadmin show upconfig command to check the UltraPath configuration.
----End
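When the Linux checks above have to be repeated on several hosts, the command outputs can be gathered in one pass. The following is a minimal sketch that runs only the commands named in this procedure; the function names and the injectable `runner` parameter are assumptions added for illustration and testing.

```python
import subprocess

# Commands exactly as listed in the Linux procedure above.
ULTRAPATH_COMMANDS = [
    "rpm -qa",
    "upadmin show path",
    "upadmin show vlun",
    "upadmin show upconfig",
]


def run_command(command):
    """Run one command on the local host and return its output as text."""
    result = subprocess.run(command.split(), capture_output=True, text=True)
    return result.stdout + result.stderr


def collect_ultrapath_info(runner=run_command):
    """Return {command: output}.

    The package list is checked first; if it does not mention UltraPath,
    the upadmin commands are skipped because they would not be available.
    """
    report = {"rpm -qa": runner("rpm -qa")}
    if "UltraPath" not in report["rpm -qa"]:
        return report
    for command in ULTRAPATH_COMMANDS[1:]:
        report[command] = runner(command)
    return report
```

The returned dictionary can be written to files and sent to maintenance engineers together with the other collected information.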
In AIX
Step 1 Check the physical path status.
1. Run the upadm show phypath command to check the information about a
specified or all physical paths, including physical path IDs, initiator WWNs,
storage system name, owning controllers, target WWNs, physical path status,
path detection status, and port type.
2. Run the upadm start phypathcheck id=? command to check the working
status of a specified physical path.
Run the upadm show path command to check the information about all logical
paths or a specified VLUN's logical paths. The information includes VLUN IDs,
logical path IDs, physical path IDs, initiator WWNs, storage system name, owning
controller, target WWNs, logical path status, and port type.
----End
In HP-UX
HP-UX 11.31 is delivered with the NMP multipathing software, which is
installed during system installation.
2097152 sectors, 2097152 blocks of size 1024, log size 16384 blocks
largefiles supported
bash-4.0#
bash-4.0# mount /dev/disk/disk60 /test/mnt3/
bash-4.0# bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 1048576 314600 728296 30% /
/dev/vg00/lvol1 1835008 364368 1459224 20% /stand
/dev/vg00/lvol8 8912896 1421696 7434296 16% /var
/dev/vg00/lvol7 6553600 3037552 3488696 47% /usr
/dev/vg00/lvol4 524288 20952 499536 4% /tmp
/dev/vg00/lvol6 7864320 3071152 4760808 39% /opt
/dev/vg00/lvol5 114688 37872 76352 33% /home
/dev/vg_try/lv_try00
1228800 2447 1149713 0% /test/mnt1
/dev/vg_try/lv_try01
797845 9 718051 0% /test/mnt2
/dev/disk/disk60 2097152 18006 1949207 1% /test/mnt3
----End
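The bdf output above wraps long file system names onto a line of their own, which makes the listing awkward to process automatically. The sketch below shows one way to parse such output into records; the function name `parse_bdf` and the record field names are assumptions chosen for this example.

```python
def parse_bdf(text):
    """Parse bdf output into a list of dicts.

    A long file system name may appear alone on one line, with its numbers
    on the following line; such pairs are joined into one record.
    """
    rows = []
    pending = None  # file system name waiting for its numeric columns
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Filesystem":
            continue  # blank line or header
        if len(fields) == 1:
            pending = fields[0]
            continue
        if len(fields) == 5 and pending is not None:
            fields = [pending] + fields
        pending = None
        if len(fields) != 6 or not fields[1].isdigit():
            continue  # not a bdf data row (for example, a shell prompt line)
        name, kbytes, used, avail, percent, mount = fields
        rows.append({
            "filesystem": name,
            "kbytes": int(kbytes),
            "used": int(used),
            "avail": int(avail),
            "used_pct": int(percent.rstrip("%")),
            "mounted_on": mount,
        })
    return rows
```

For example, filtering the result with `row["used_pct"] >= 90` would list file systems that are close to full.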
Step 2 Run the mount command to view the mounting information of the current
operating system.
Step 4 Run the dd command to read the first 10 MB of the partition and collect file
system structure information.
----End
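The idea behind the dd step, saving only the first 10 MB of a partition for analysis, can be illustrated in Python. This is a stand-in sketch, not the dd invocation used by the procedure; the function name `copy_head` and its parameters are assumptions, and reading a real partition normally requires administrator rights.

```python
def copy_head(source_path, output_path, size=10 * 1024 * 1024,
              chunk=1024 * 1024):
    """Copy at most `size` bytes from the start of source_path.

    Returns the number of bytes actually copied, which is smaller than
    `size` when the source is shorter.
    """
    copied = 0
    with open(source_path, "rb") as src, open(output_path, "wb") as dst:
        while copied < size:
            data = src.read(min(chunk, size - copied))
            if not data:
                break  # end of source reached
            dst.write(data)
            copied += len(data)
    return copied
```

Reading in 1 MB chunks keeps memory use small regardless of the requested size.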
Ensure that the Oracle database is in the nomount or mount state and run the
show parameter background_dump_dest command. Then obtain the alert log
file in the queried path.
Step 2 Optional: Collect the alert log file of Oracle Automatic Storage Management
(Oracle ASM).
● Oracle 10g: Obtain the alert log file in path
$ORACLE_BASE/admin/+asm/bdump using user oracle.
● Oracle 11gR1: Obtain the alert log file in path
$ORACLE_BASE/diag/asm/+asm/trace using user oracle.
● Oracle 11gR2: Obtain the alert log file in path
$ORACLE_BASE/diag/asm/+asm/trace using user grid.
Step 3 Optional: On the ASM or database instance, run the select group_number,type
from v$asm_diskgroup command to collect ASM disk group information.
----End
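The version-dependent location of the Oracle ASM alert log described in Step 2 can be captured in a small lookup. The sketch below only restates the paths and users listed above; the function name and the optional `oracle_base` argument are assumptions for illustration.

```python
import os

# Version -> (directory under ORACLE_BASE, operating system user),
# as listed in Step 2 above.
ASM_ALERT_LOG_LOCATIONS = {
    "10g": ("admin/+asm/bdump", "oracle"),
    "11gR1": ("diag/asm/+asm/trace", "oracle"),
    "11gR2": ("diag/asm/+asm/trace", "grid"),
}


def asm_alert_log_location(version, oracle_base=None):
    """Return (directory, user) for the ASM alert log of an Oracle version.

    If oracle_base is not given, the ORACLE_BASE environment variable is used.
    """
    base = oracle_base or os.environ["ORACLE_BASE"]
    subdir, user = ASM_ALERT_LOG_LOCATIONS[version]
    return f"{base.rstrip('/')}/{subdir}", user
```

An unsupported version raises `KeyError`, which signals that the location must be looked up manually.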
outpath indicates the path for saving the result, and db_name indicates the
database name.
----End
In Windows
Step 1 Right-click Computer and choose Manage. On the Server Manager page that is
displayed, choose Diagnostics > Device Manager to open the device manager.
2. On the page that is displayed, click Driver and select Driver Details to view
the HBA details.
Step 3 Check the connection between the storage system and HBA.
1. Right-click Device Manager and choose View > Devices by connection.
2. Expand the PCI Bus option to check the connection.
----End
In Linux
Step 1 Run the cat /sys/class/scsi_host/hostX/symbolic_name command to check the
vendors and models of HBAs.
----End
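On a host with several HBAs, the cat command in Step 1 has to be repeated for each hostX. A minimal sketch that reads every symbolic_name file in one pass is shown below; the function name and the `base` parameter (defaulting to the directory used in Step 1) are assumptions for this example.

```python
import glob
import os


def read_hba_symbolic_names(base="/sys/class/scsi_host"):
    """Return {hostX: symbolic_name text} for every host under `base`.

    Hosts without a symbolic_name file are simply not included.
    """
    names = {}
    pattern = os.path.join(base, "host*", "symbolic_name")
    for path in sorted(glob.glob(pattern)):
        host = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            names[host] = handle.read().strip()
    return names
```

An empty result means that no host under the given directory exposes a symbolic_name attribute.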
In AIX
Step 1 Run the lslpp -l | grep fcp command to verify that the HBA drivers are installed
successfully.
Step 4 Run the lscfg -vl fcsX command to view device WWNs. In the command, fcsX
indicates an HBA device ID.
Step 5 Run the lsmcode -r -d fcsX command to view the HBA microcode information.
----End
In HP-UX
Step 1 Run the ioscan -fnC fc command to check the path and name of the device where
the HBAs reside.
Step 2 Run the fcmsutil command to check the HBA details.
fcmsutil /dev/td0
Vendor ID is = 0x00103c
Device ID is = 0x001028
TL Chip Revision No is = 2.3
PCI Sub-system Vendor ID is = 0x00103c
PCI Sub-system ID is = 0x000006
Topology = PRIVATE_LOOP
Local N_Port_id is = 0x000001
Local Loop_id is = 125
N_Port Node World Wide Name = 0x50060b0000010449
N_Port Port World Wide Name = 0x50060b0000010448
Driver state = ONLINE
Hardware Path is = 0/3/0/0
Number of Assisted IOs = 47983
Number of Active Login Sessions = 0
Dino Present on Card = NO
Maximum Frame Size = 960
Driver Version = @(#) libtd.a HP Fibre Channel
Tachyon TL/TS/XL2 Driver B.11.11.09 (AR1201) /ux/kern/ki
----End
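The fcmsutil output above is a list of "name = value" lines, so the WWNs and
driver state can be extracted with a few lines of Python. This is a minimal
sketch under the assumption that the output keeps the layout shown in Step 2;
parse_fcmsutil is a hypothetical helper, not an HP-UX utility.

```python
def parse_fcmsutil(output):
    """Parse 'name = value' lines of fcmsutil output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, separator, value = line.partition("=")
        if not separator:
            continue  # continuation or banner line without '='
        key = key.strip()
        # Some names end with a trailing "is" ("Vendor ID is = ..."); drop it.
        if key.endswith(" is"):
            key = key[:-3].rstrip()
        fields[key] = value.strip()
    return fields
```

With the sample output above, the port WWN is available as
fields["N_Port Port World Wide Name"] and the state as fields["Driver state"].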
Step 4 Select a path, enter the file name, and click Save to save the switch log
information.
----End
Step 2 Enter the user name and password and click Add Fabric.
Step 3 On the menu bar, click Switch and select Download Support File....
Step 4 Select a path for saving the log information and type the file name. You are
advised to name the file in the format of switch name_dump_date.tgz. Click
OPEN.
Step 5 In the Download Support File dialog box that is displayed, click Start.
----End
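The file name format recommended in Step 4 (switch name_dump_date.tgz) can be
generated consistently with a small helper. The sketch below is an assumption
in two respects that the procedure does not specify: the date is written as
YYYYMMDD, and characters other than letters, digits, dot, underscore, and
hyphen in the switch name are replaced with a hyphen.

```python
import datetime
import re


def support_file_name(switch_name, day):
    """Build '<switch name>_dump_<date>.tgz' for a downloaded support file."""
    # Assumption: keep only characters that are safe in file names.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "-", switch_name.strip())
    # Assumption: the date part uses the YYYYMMDD layout.
    return f"{safe_name}_dump_{day:%Y%m%d}.tgz"
```

For example, support_file_name("swd77", datetime.date.today()) yields a name
that starts with swd77_dump_ and ends with .tgz.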
Step 1 Check the IP address of the Brocade switch and choose a host located on the same
network segment. Enable the FTP function and prepare the directory to save the
switch logs on the FTP server.
Step 2 Use SSH to log in to the switch.
Step 3 Run the supportsave command.
swd77:admin>supportsave
This command collects RASLOG,TRACE,supportShow,core file,FFDC data
and other support information from both active and standby CPs and then transfer
them to FTP/SCP server
or a USB device.This operation can take several minutes.
NOTE:supportSave will transfer existing trace dump file first,the
automatically generate and transfer latest
Step 4 Enter Y. The system prompts you for the following information:
1. Host IP or Host Name: IP address of the FTP server
2. User Name: user name for the FTP server
3. Password: password for the FTP server
4. Protocol (ftp or scp): transfer protocol to use
5. Remote Directory: directory prepared on the FTP server
Step 5 The system starts collecting the switch information and displays output
similar to the following:
Saving support information for chassis:swd77, module:RAS…
……………………
Saving support information for chassis:swd77, module:CTRACE_OLD…
Saving support information for chassis:swd77, module:CTRACE_NEW…
Saving support information for chassis:swd77, module:FABRIC…
…………
…………
----End
Brocade Switch
● Run the sfpshow portnum command to check the SFP information about a
port.
YQA-48K-F5C1:sfmon> sfpshow 10/29
Identifier: 3 SFP
Connector: 7 LC
Transceiver: 150c402000000000 100,200,400_MB/s M5,M6 sw Inter_dist
Encoding: 1 8B10B
Cisco Switch
● Run the show interface xxx transceiver details command to check the SFP
information about a port.
YQB-9513-F7HE1# show interface fc1/1 transceiver details
fc1/1 sfp is present
Name is CISCO-AVAGO
Manufacturer's part number is SFBR-5780APZ-CS2
Revision is G2.3
Serial number is AGA14378D9F
FC Transmitter type is short wave laser w/o OFC (SN)
FC Transmitter supports short distance link length
Transmission medium is multimode laser with 62.5 um aperture (M6)
Supported speeds are - Min speed: 2000 Mb/s, Max speed: 8000 Mb/s
Nominal bit rate is 8500 Mb/s
Link length supported for 50/125mm fiber is 50 m
Link length supported for 62.5/125mm fiber is 20 m
Cisco extended id is unknown (0x0)
No tx fault, no rx loss, in sync state, diagnostic monitoring type is 0x68
SFP Diagnostics Information:
----------------------------------------------------------------------------
Alarms Warnings
High Low High Low
----------------------------------------------------------------------------
Temperature 31.27 C 75.00 C -5.00 C 70.00 C 0.00 C
Voltage 3.31 V 3.63 V 2.97 V 3.46 V 3.13 V
Current 6.89 mA 8.50 mA 2.00 mA 8.50 mA 2.00 mA
Tx Power -2.53 dBm 1.70 dBm -14.00 dBm -1.30 dBm -10.00 dBm
Rx Power -3.44 dBm 3.00 dBm -17.30 dBm 0.00 dBm -13.30 dBm
Transmit Fault Count = 0
----------------------------------------------------------------------------
Note: ++ high-alarm; + high-warning; -- low-alarm; - low-warning
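The rows of the SFP Diagnostics Information table above list the current value
followed by the high alarm, low alarm, high warning, and low warning
thresholds. The following Python sketch parses one such row and compares the
current value with its thresholds. It is a hedged illustration: the function
names are hypothetical, the row layout is assumed to match the sample output
above, and treating a value equal to a threshold as normal is an assumption.

```python
import re

# A numeric value followed by one of the units that appear in the table.
VALUE = re.compile(r"(-?\d+(?:\.\d+)?)\s+(?:C|V|mA|dBm)\b")
# Row label: letters and spaces up to the first numeric value.
LABEL = re.compile(r"\s*([A-Za-z][A-Za-z ]*?)\s+(?=-?\d)")


def parse_sfp_row(line):
    """Parse one diagnostics row; return None for lines that are not data rows."""
    label = LABEL.match(line)
    values = [float(v) for v in VALUE.findall(line)]
    if label is None or len(values) != 5:
        return None
    keys = ("current", "high_alarm", "low_alarm", "high_warning", "low_warning")
    row = dict(zip(keys, values))
    row["name"] = label.group(1)
    return row


def classify_sfp_row(row):
    """Return the threshold state of a parsed row (strict comparisons assumed)."""
    current = row["current"]
    if current > row["high_alarm"]:
        return "high-alarm"
    if current < row["low_alarm"]:
        return "low-alarm"
    if current > row["high_warning"]:
        return "high-warning"
    if current < row["low_warning"]:
        return "low-warning"
    return "normal"
```

Applied to the Rx Power row in the sample output, classify_sfp_row reports the
value as within all four thresholds.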
Context
● Running data indicates the real-time running status of a storage system, such
as LUN configuration information. The running data file is in *.txt format.
● System logs record running data, events, and debugging operations on a
storage system and can be used to analyze the running status of the storage
system. The system log file is in *.tgz format.
● The DHA runtime log is the daily runtime log of disks. It mainly includes
daily disk health status and I/O information. The DHA runtime log file is in
*.tgz format.
– DHA logs collect SMART/LogPage data (at 02:00 each day) and I/O
statistics (every two hours) and generate one package (1 KB) each day. A
disk on a single controller can generate a maximum of 74 packages within
a year (some old log packages are deleted during collection). Each export
contains the packages of a disk on a single controller and an
information file.
– The following table lists the recommended maximum number of exports
during routine maintenance. DHA log analysis is performed on samples
only, not on all logs. The recommended values are for reference only and
are intended to prevent DHA log analysis from affecting the rest of
routine maintenance.
Disk Quantity in an Array    Maximum Number of Exports During an Inspection
0 to 200                     ≤3
200 to 500                   ≤4
500 to 1000                  ≤5
1000 to 2000                 ≤6
>2000                        ≤6
● The HSSD log is the working log of HSSDs and contains information such as
disk S.M.A.R.T. data. The HSSD log file is in *.tgz format.
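The export recommendations in the table above can be expressed as a simple
lookup. The sketch below is illustrative only. Because adjacent ranges in the
table share their boundary values (for example, 200 appears in two rows), it
assumes that each upper bound belongs to the lower range; the function name
max_dha_exports is hypothetical.

```python
def max_dha_exports(disk_count):
    """Return the recommended maximum number of DHA log exports per inspection."""
    if disk_count < 0:
        raise ValueError("disk_count must not be negative")
    # (inclusive upper bound of the disk quantity, maximum number of exports)
    for upper_bound, limit in ((200, 3), (500, 4), (1000, 5), (2000, 6)):
        if disk_count <= upper_bound:
            return limit
    return 6  # more than 2000 disks
```

For an array with 750 disks, the function returns the value from the
500 to 1000 row.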
Before the download of system logs, DHA runtime logs, or HSSD logs, the system
collects those logs of controllers and shows the collection progress. After all logs
are collected, you can download your desired logs.
NOTICE
After the system starts collecting system logs, DHA runtime logs, or HSSD logs,
wait five minutes or download all the collected logs before you collect and
download other logs.
Procedure
Step 1 Log in to DeviceManager.
NOTE
● In the System Log area, if Recent logs is selected, the system exports the
logs generated most recently up to the current time: the latest power-on and
power-off log and a maximum of six messages logs. If All logs is selected,
the system exports all logs on the current node. Note that historical messages
logs are saved to the /OSM/coffer_log/log/his_debug directory.
● If you export the data using the Internet Explorer browser with the default settings, the
data will be saved in the download path which the user has selected. For example, you
can choose Save > Save as in the displayed file download dialog box and select the
download path in Internet Explorer 9 browser.
● If you export the data using the Firefox browser with the default settings, the data will
be saved in the default download path of the browser. You can choose Tools > Options
and click the General > Browser in the Options dialog box to view the default
download path.
● If you export the data using the Google Chrome browser with the default settings, the
data will be saved in the default download path of the browser. You can choose
Customize and Control Google Chrome > Settings and view the default download
path in the Download Content area of the Settings page.
● When you export data using Chrome for the first time, click Allow if the This
site is attempting to download multiple files. Do you want to allow this
message? message is displayed. Otherwise, in the upper right corner of the
browser, choose Customize and control Google Chrome > Settings > Privacy >
Content Settings... > Automatic downloads > Manage exceptions, select Allow in
Behaviour, and click Finished. Then reopen the web page to download multiple
files. Alternatively, delete Block from Behaviour, click Finished, and reopen
the web page. In this case, a message asking whether to allow multiple files
to be downloaded is displayed.
● If the exported logs cannot be viewed, export the logs again. If the new logs still cannot
be viewed, contact Huawei technical support.
----End
Precaution
Exported alarms and events are saved as a *.tgz file (Save All) or an *.xls
file (Save Selected). Do not modify the content of the file.
Procedure
Step 1 Log in to DeviceManager.
Step 2 Choose Monitor > Alarms and Events > All Events.
Step 4 Optional: Set search criteria and click Search to search for desired events.
Click Save As > Save All or select the events that you want to export and click
Save As > Save Selected. In the dialog box that is displayed, perform operations
as prompted.
----End
Precaution
Exported alarms and events are saved as a *.tgz file (Save All) or an *.xls
file (Save Selected). Do not modify the content of the file.
Procedure
Step 1 Log in to DeviceManager.
Step 2 Choose Monitor > Alarms and Events > Current Alarms.
Step 4 Optional: Set search criteria and click Search to search for desired alarms.
Click Save As > Save All or select the alarms that you want to export and click
Save As > Save Selected. In the dialog box that is displayed, perform operations
as prompted.
Step 8 Optional: Click Send Simulated Alarm to simulate the reporting of a fault
alarm. Use this simulated alarm to test the alarm function of the device. If
the simulated alarm already exists, resending it is treated as invalid.
Therefore, before the test, confirm that the simulated alarm has been manually
cleared. After the test, manually clear the alarm.
----End
Follow-up Procedure
If an alarm appears on the Current Alarms tab page, select the alarm and
diagnose the problem according to its details and repair suggestions.
Procedure
Step 1 Open Internet Explorer and enter https://ipaddress:8088 in the address
box. ipaddress indicates the IP address of the management network port.
Step 2 Enter your user name and password.
The fault page is displayed.
Step 3 Click Download Log.
The system automatically downloads logs.
----End
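The address entered in Step 1 has the form https://ipaddress:8088. The
following Python sketch builds that address from a management IP address and
validates the input first. It is an illustration only: fault_page_url is a
hypothetical helper, and the bracket handling for IPv6 follows general URL
syntax rather than anything stated in this procedure.

```python
import ipaddress


def fault_page_url(management_ip, port=8088):
    """Return the https URL for a management IP address and port."""
    address = ipaddress.ip_address(management_ip)  # raises ValueError if invalid
    # URL syntax requires IPv6 literals to be enclosed in square brackets.
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"https://{host}:{port}"
```

Passing a malformed address raises ValueError before any browser is opened,
which helps catch typing errors in the management IP address.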
4 Common Troubleshooting
Symptom
The system displays a message indicating that the time zone is modified
successfully. However, the time zone information of the client is not synchronized
to the device.
Alarm Information
None
Possible Causes
The time zone of the client is modified after you log in to the DeviceManager.
Therefore, the time zone information of the client is not synchronized to the
device.
Fault Diagnosis
Figure 4-1 Fault diagnosis of the failure of synchronizing the client time zone on
the DeviceManager
Procedure
Step 1 Close the browser window.
Step 2 Reopen the browser, log in to the DeviceManager, and synchronize the time zone
of the client.
----End
Symptom
When an IPv6 address is entered in the address box of a browser earlier than
Firefox 24.0 to log in to the DeviceManager, the IP address cannot be added to the
exception or trusted site list. As a result, the login fails.
Alarm Information
None
Possible Causes
The Firefox web browser is incompatible with the DeviceManager.
Fault Diagnosis
Procedure
Step 1 When using an IPv6 address to log in to DeviceManager, you are advised to use
the Chrome browser and not to use Firefox 24.0 or an earlier version. Alternatively,
you can use an IPv4 address to log in to DeviceManager.
----End
Symptom
After you log in to the DeviceManager, one or more of the following symptoms
may occur:
● The device status remains offline.
● The device time has stopped.
● When you click the content area to the left of the navigation bar, it loads
very slowly.
● When you press F5 to refresh the page, the system reports that the web page
cannot be displayed.
Alarm Information
None
Possible Causes
An agent or another program has damaged the negotiated symmetric key
information of the current browser tab page.
Fault Diagnosis
Figure 4-3 Flowchart for locating the cause for browser SSL information damage
Procedure
Step 1 Copy the address path of the damaged tab page.
Step 3 Open a tab page again and enter the copied tab page path. Press Enter to visit
DeviceManager again. Check whether the operation is successful.
● If yes, the problem is resolved.
● If no, go to Step 4.
Step 4 If the tab page problem remains, clear the browser cache and close the browser.
Step 5 Open the browser again and log in to the DeviceManager. Check whether the
operation is successful.
● If yes, the problem is resolved.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
● Symptom 1: A user logs in to the DeviceManager on Chrome (earlier than
Chrome 57) running on a Windows XP system, and the alarm sound is
enabled, as shown in Figure 4-4. However, no alarm sound is played when an
alarm is generated.
● Symptom 2: A user cannot enable the alarm sound on Chrome 57 to 71. The
browser reports that the Adobe Flash Player was blocked, as shown in Figure
4-5.
Alarm Information
None
Possible Causes
● The Flash Player of Chrome earlier than 57 has a compatibility issue with
Windows XP.
● The Adobe Flash Player of Chrome 57 to 71 is outdated and blocked by
Chrome.
Procedure
● The Flash Player of Chrome earlier than 57 has a compatibility issue with
Windows XP.
a. Download the Flash Player from Adobe's official website and install it.
b. Restart Chrome, type chrome://plugins in the address box, and press
Enter. The following page is displayed, as shown in Figure 4-6. If the
Flash Player is installed correctly, two Flash Player plug-ins are displayed.
The one that ends with pepflashplayer.dll in Location is Chrome's built-
in Flash Player, while the other is the newly installed one.
c. Disable Chrome's built-in Flash Player (clicking Disable) and enable the
newly installed one, as shown in Figure 4-7.
d. Press F5 to refresh the DeviceManager. On the home page, click the bell
button to enable the alarm sound function.
e. Check whether the system plays the alarm sound correctly.
Symptom
After a user logs in to DeviceManager by using a browser, characters cannot be
input on the interface or other interface input exceptions occur. For example, after
the Create Storage Pool dialog box is displayed, a user inputs a capacity value by
using the input method. The value cannot be input or an input exception occurs.
Alarm Information
None
Possible Causes
Possible causes are as follows:
● The browser input status is abnormal. For example, the shortcut key,
intelligent word selection function, or intelligent statistics function of the
input method triggers the setting page of the input method.
● The input method is set to the full-width state.
● The input method is incompatible with the browser. An unknown bug is
reported during the input process.
Fault Diagnosis
Is the input method set to the full-width state? If yes, switch the input method
to the half-width state.
Is the exception solved after the input method is switched to another?
Procedure
Step 1 Check whether the setting page or a sub-page of the input method is opened. If
the setting page or a sub-page of the input method is opened, close the setting
page or sub-page.
After the operation is complete, check whether the exception is solved.
● If yes, no further action is required.
● If no, go to Step 2.
Step 2 Check whether the input method is set to the full-width state. If the input method
is set to the full-width state, switch the input method to the half-width state.
After the operation is complete, check whether the exception is solved.
● If yes, no further action is required.
● If no, go to Step 3.
Step 3 Switch the input method to another input method.
After the operation is complete, check whether the exception is solved.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
After a user logs in to DeviceManager through a browser, a layout fault or picture
display fault occurs.
Alarm Information
None
Possible Causes
Because the browser's cache is not cleared, the user cannot obtain the latest static
resources.
Fault Diagnosis
Figure 4-9 Flowchart for locating a layout fault or picture display fault
Procedure
Step 1 Check whether it is the initial login after the DeviceManager is upgraded.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Clear the browser's cache. For Internet Explorer, Firefox, or Chrome, press Ctrl
+Shift+Del when the browser is activated.
The page for clearing historical records is displayed. Select items that you want to
delete, and delete them. (For Internet Explorer, select the item of Internet
temporary files. For Chrome and Firefox, select the cache item.)
After the operation is complete, check whether the fault is solved.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
computer. When the user clicks Open or Save, Current Alarms or All Events are
deleted with a probability.
Symptom
A user employs Internet Explorer 9 (9.0.8112.16421) to log in to the
DeviceManager and export Current Alarms or All Events to the user's local
computer. When the user clicks Open or Save, Internet Explorer 9 displays a
message indicating that Current Alarms or All Events are removed or deleted.
Alarm Information
None
Possible Causes
Current Alarms or All Events exported from Internet Explorer 9 (9.0.8112.16421)
are occasionally deleted because the security configuration of Internet
Explorer 9 is incorrect.
Fault Diagnosis
Figure 4-10 Flowchart for locating a deletion fault of Current Alarms or All
Events exported from Internet Explorer 9
Procedure
Step 1 Replace Internet Explorer 9 with another browser or use another version
of Internet Explorer.
Then, export Current Alarms or All Events, and check whether the fault is
rectified.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Alarm Information
None
Possible Causes
Web browsers access DeviceManager through HTTPS. Errors occur in the
certificate chain of Internet Explorer 10. This is an inherent problem of Internet
Explorer 10.
Procedure
Press F5 to refresh the web page.
Then check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
Symptom
After a user has imported a digital certificate or rolled back the digital certificate
to the factory defaults, the user fails to log in to the DeviceManager using the
Firefox web browser. The This Connection Is Untrusted page is displayed, and the
Add Exception button is unavailable, as shown in the following figure.
Alarm Information
None
Possible Causes
The cache of the Firefox web browser is not refreshed.
Procedure
In the Firefox web browser, press Ctrl+Shift+Del and clear the browsing history as
prompted.
After the operation is complete, restart the Firefox web browser and check
whether the fault is rectified.
Symptom
A user logs in to SystemReporter using Chrome or DeviceManager. However, the
loading is slow and SystemReporter cannot be used.
Possible Causes
Web pages previously cached by the earlier version of Chrome are still in the
browser cache.
Procedure
Step 1 Open the Chrome web browser and press Ctrl+Shift+Delete.
----End
Symptom
● The DeviceManager page keeps loading or is displayed incorrectly.
● After you log in to the DeviceManager and open the device view, repeatedly
clicking other navigation paths to switch pages while the device view is
still loading occasionally causes the browser tab page to crash.
Alarm Information
None
Possible Causes
● The network is faulty, so the page fails to be loaded occasionally.
● The web browser is incompatible with the storage system, so the page fails to
be loaded occasionally.
● The cache data of the web browser is abnormal.
Procedure
Step 1 Press F5 to reload the page or press the Reload button on the current tab page.
Step 2 Open the browser and press Ctrl+Shift+Delete. Clear the browsing history as
prompted.
Log in to the DeviceManager and check whether the page is successfully loaded.
----End
Symptom
When a storage system is being configured, a message is displayed stating The
communication is abnormal or the system is busy. Please try again later.
Possible Causes
● The communication link is down, so the configuration fails or the returned
result is lost.
● The storage system is processing a system fault or abnormality, so it cannot
run a command.
Procedure
Step 1 Check whether the configuration has taken effect.
● If yes, go to Step 2.
● If no, go to Step 3.
Step 3 Check the alarm and log information and remove the system fault or abnormality.
Step 4 Run the command after the system is recovered and check whether the command
is successfully executed.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
The configuration file fails to be imported. A message is displayed stating Receive
message failed.
Possible Causes
● The primary controller malfunctions and cannot process any services.
● The primary controller is being restarted and cannot process any services.
Procedure
Step 1 Check whether the primary controller status is normal.
● If yes, keep the fault environment intact and contact technical support
engineers.
● If no, go to Step 2.
Step 2 Wait until the primary/secondary switchover is complete (for example,
wait 10 seconds), and then import the configuration file again.
----End
Symptom
● The login pages of DeviceManager and SystemReporter are blank.
● The login page of SystemReporter is displayed, but nothing is displayed after
Login is clicked.
Possible Causes
The Internet Explorer browser on the operating system of the server is configured
for enhanced security.
Fault Diagnosis
Procedure
Step 1 Open an Internet Explorer browser.
----End
Alarm Information
None
Possible Causes
More than three maintenance terminals are connected to the storage system
simultaneously.
Conclusion: Too many maintenance terminals are in use, which slows the system
response.
Procedure
Reduce the number of maintenance terminals that are simultaneously used to
connect to the storage system.
other controllers. In the meantime, controller or coffer disk fault alarms are
reported in the storage system.
Possible Causes
● In multi-controller scenarios, the boot disk of the current controller cannot be
detected.
● In single-controller scenarios, the coffer disk of the current controller cannot
be detected.
Procedure
Contact technical support engineers.
Symptom
Possible Causes
● The antivirus server has not been added into an antivirus domain, causing
access and scanning failures.
● Antivirus Agent Watchdog of the antivirus server has not been started using
an antivirus domain user, causing access and scanning failures.
● The antivirus software on the antivirus server has not been started using an
antivirus domain user, causing access and scanning failures.
● Some files to be scanned (for example, EXCEL) are opened exclusively by
some software, causing scanning failures. This problem is normal and requires
no handling.
● The files to be scanned are deleted. Temporary files are generated when a
user uses some editors, such as vi, to edit files. These files are deleted when
the user exits the vi editor, causing scanning failures. This problem is normal
and requires no handling.
● Antivirus software has vulnerabilities, which cause scanning failures.
● The antivirus domain user has not been added to the antivirus group, causing
access and scanning failures.
Procedure
● Cause 1: The antivirus server has not been added into an antivirus domain,
causing access and scanning failures.
a. On the antivirus server, right-click Computer and choose Properties.
b. In Computer name, domain, and workgroup settings, click Change
settings.
c. On the Computer Name tab page, click Change.
d. In Member of, select Domain and enter the full domain name.
e. Click OK. Check whether the scanning is successful.
Possible Causes
A dedicated Flash Player is built into Windows Server 2012, but the plug-in
has not been enabled.
Procedure
Step 1 Open the server manager and click Add roles and features.
Step 3 Expand User Interfaces and Infrastructure and select Desktop Experience.
In the Add Roles and Features Wizard dialog box that is displayed, click Add
Features.
Step 5 On the Confirm installation selections page that is displayed, click Install.
Step 6 After the installation is complete, restart the PC. Open the control panel and you
can see that Adobe Flash Player has been successfully installed.
----End
Symptom
When you use a browser to log in to DeviceManager, the session times out and you
are logged out automatically. If you try to log in again, Communicating with the
device failed. Please check that the network connection or the system is normal
is displayed, and the retry fails.
Possible Causes
The security certificate used by DeviceManager is not trusted by the browser. The
login request is intercepted by the browser.
Procedure
Step 1 Press F5 to refresh the browser page.
NOTE
The browser may prompt that the security certificate is questionable. Ignore this prompt
and continue visiting the storage system.
Step 2 Enter the user name and password to check whether you can log in to
DeviceManager.
● If you can log in, no further action is required.
● If you fail to log in, keep the environment intact and contact technical
support.
----End
Symptom
The alarm sound is disabled by default. After you use Chrome later than 55 to log
in to DeviceManager and enable the alarm sound, sound for new alarms is still
not played. Message This plugin is not supported is displayed for Quick Start, as
shown in Figure 4-13.
NOTE
For details about the browser versions supported by DeviceManager, see the Huawei
Storage Interoperability Navigator.
Possible Causes
When the customer's maintenance terminal cannot access Adobe's official website
or Chrome 55's built-in Flash Player has not been updated, Chrome 55 checks the
current Flash Player version for security vulnerabilities and blocks the
plug-in if any exist.
Procedure
Step 1 Download the Flash Player from Adobe's official website and install it.
Step 2 If the Chrome version is earlier than 57, perform the following operations:
1. Restart Chrome, type chrome://plugins in the address box, and press Enter.
The following page is displayed, as shown in Figure 4-14. If the Flash Player is
installed correctly, two Flash Player plug-ins are displayed. The one that ends
with pepflashplayer.dll in Location is Chrome's built-in Flash Player, while
the other is the newly installed one.
2. Disable Chrome's built-in Flash Player (clicking Disable) and enable the newly
installed one, as shown in Figure 4-15.
– If no, keep the fault environment intact and contact Huawei technical
support.
5. Check whether the system plays Quick Start correctly.
– If yes, no further action is required.
– If no, keep the fault environment intact and contact Huawei technical
support.
Step 3 If the Chrome version ranges from 57 to 71, perform the following operations:
1. Restart Chrome, type chrome://settings/content/flash in the address box,
and press Enter. Enable Ask first (recommended) and click ADD. Add the IP
address of the storage system's management port to the Allow list. Then save
the settings and restart Chrome, as shown in Figure 4-16.
Figure 4-16 Enabling Flash Player and configuring the Allow list
Step 4 If the Chrome version is later than 71, visit the Chrome official website to obtain
the operation method.
----End
Symptom
After you log in to DeviceManager and do not perform any operation for a certain
period of time, the data that should be updated periodically is abnormal. For
example, the performance curve is displayed as a straight line in the performance
statistics area, or the alarm status in the current alarm area remains unchanged.
Possible Causes
The browser fails to communicate with the storage device for a certain period of
time due to a network fault. Because the session has not timed out during this
period, data queries fail before the session is automatically logged out.
Procedure
Step 1 Check whether the communication between the maintenance terminal where
DeviceManager is located and the storage device is normal.
● If yes, go to Step 2.
● If no, go to Step 3.
Step 2 Check whether the network status was normal when the data was abnormal.
● If yes, keep the fault environment intact and contact technical support
engineers.
● If no, go to Step 3.
Step 3 Contact the network administrator to restore the network communication
between the maintenance terminal where DeviceManager is located and the
storage device, and re-log in to DeviceManager to check whether the fault is
rectified.
● If rectified, no further action is required.
● If not rectified, keep the fault environment intact and contact technical
support engineers.
----End
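The communication check in Step 1 can be approximated from the maintenance
terminal with a TCP connection attempt to the management address. The Python
sketch below is a minimal illustration, not a Huawei tool: can_reach is a
hypothetical helper, and the port to test (for example, 8088, which this
document uses in the address https://ipaddress:8088) is an assumption that
must match the actual management port.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

A False result only shows that the TCP connection failed at that moment.
Intermittent faults, such as the one described in Possible Causes, may require
repeating the check over a period of time.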
Possible Causes
The active scripting is disabled in the browser.
Procedure
Step 1 Open Internet Explorer.
Step 5 Restart the Internet Explorer browser and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
After a maintenance terminal is connected to the serial port on a storage device
with a serial cable, the maintenance terminal cannot receive messages from the
serial port, the serial port outputs bit errors, or the login prompt is not displayed.
Alarm Information
None
Possible Causes
Possible causes for a login failure through a serial port:
Fault Diagnosis
Figure 4-17 Troubleshooting flowchart for a login failure through a serial port
Procedure
● Cause 1: The serial port is being used.
a. After running the serial port login tool, check whether a message is
displayed indicating that the serial port cannot be opened.
▪ If no, go to b.
b. Stop the program or process that is using the serial port.
c. On the maintenance terminal, reattempt the login using the serial port.
Check whether the maintenance terminal receives messages outputted by
the serial port.
▪ If no, go to b.
b. Remove and reinsert, or replace the serial cable.
c. On the maintenance terminal, retry the login through the serial port.
Check whether the maintenance terminal receives messages from the
serial port.
▪ If no, go to d.
d. Check by trying the login from a remote desktop.
NOTE
▪ If yes, go to b.
Symptom
● When an iSCSI link is added for the remote device immediately after the iSCSI
initiator is renamed, a message is displayed stating The communication is
abnormal or the system is busy. Please try again later.
Possible Causes
An iSCSI link is added for the remote device within 30 seconds after the iSCSI
initiator is renamed.
Procedure
Step 1 Check whether the iSCSI link was added within 30 seconds after the iSCSI
initiator was renamed.
● If yes, go to Step 2.
● If no, go to Step 4.
Step 2 Check whether an iSCSI link has already been added for the remote device.
● If yes, go to Step 3.
● If no, go to Step 4.
Step 3 Delete the existing iSCSI link.
----End
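Possible Causes names a 30-second interval after the iSCSI initiator is
renamed. A script that renames the initiator and then adds a link can compute
how long it still has to wait. The following Python sketch is illustrative
only; the function name and the use of a monotonic clock reading as the rename
timestamp are assumptions.

```python
import time

# Interval named in Possible Causes for this fault.
RENAME_SETTLE_SECONDS = 30


def seconds_until_link_can_be_added(renamed_at, now=None):
    """Return the remaining wait in seconds (0.0 if the interval has passed).

    renamed_at and now are readings of time.monotonic().
    """
    if now is None:
        now = time.monotonic()
    return max(0.0, RENAME_SETTLE_SECONDS - (now - renamed_at))
```

A caller would record time.monotonic() immediately after the rename and sleep
for the returned number of seconds before adding the iSCSI link.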
Symptom
LUNs have been mapped to the application server but cannot be discovered on
the application server.
Alarm Information
None
Possible Causes
Possible causes for a failure to discover LUNs by an application server:
● The storage pool is faulty.
● The link is abnormal.
● The node file on the application server is lost (for Linux or UNIX).
● The dynamic detection mechanism of the application server is not triggered
(for Mac OS X).
● A LUN whose host LUN ID is 0 is not mapped to the application server (for
HP-UX).
● The automatic LUN scan function is disabled on the application server (for
Solaris 9).
Impact
An application server fails to discover LUNs and therefore cannot use storage
resources.
Fault Diagnosis
[Flowchart, last decision only] Is the automatic LUN scan function disabled on
the application server (for Solaris 9)? Yes: Restart LUN scan on the port on the
application server. No: End.
Procedure
● Cause 1: The storage pool is faulty.
a. Check whether there is the alarm Storage Pool Is Faulty on the storage
system.
▪ If yes, go to b.
▪ If yes, go to c.
▪ If it is an iSCSI network, go to e.
d. Troubleshoot the Fibre Channel link failure by referring to 5.3.1 Fibre
Channel Link Failure. Go to f.
e. Troubleshoot the iSCSI link failure by referring to 5.3.2 iSCSI Link Failure.
Go to f.
f. Scan for LUNs again on the application server. Check whether the fault is
rectified.
▪ If yes, go to b.
▪ If no, go to c.
c. In Terminal on the application server, run the mknod command to create
a node.
NOTE
The format of the mknod command is mknod Name {b | c} Major Minor, where
Name indicates the name of the device, b | c indicates whether the device is a
block device or a character device, Major indicates the ID of the major device,
and Minor indicates the ID of the minor device.
d. Scan for LUNs again on the application server. Check whether the fault is
rectified.
▪ If yes, go to b.
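The mknod format described in the NOTE above can be illustrated with a small Python helper that assembles the command line and rejects an invalid device type. This is a minimal sketch: the helper name mknod_command and the sample device name and IDs are hypothetical, and the real values must be taken from the application server.

```python
import shlex


def mknod_command(name: str, dev_type: str, major: int, minor: int) -> str:
    """Build a command line of the form: mknod Name {b | c} Major Minor."""
    if dev_type not in ("b", "c"):
        raise ValueError("dev_type must be 'b' (block) or 'c' (character)")
    if major < 0 or minor < 0:
        raise ValueError("major and minor IDs must be non-negative")
    # shlex.quote protects device names that contain shell metacharacters.
    return " ".join(["mknod", shlex.quote(name), dev_type, str(major), str(minor)])


# Hypothetical values, for illustration only.
print(mknod_command("/dev/sdb", "b", 8, 16))
```

The helper only prints the command; run the resulting command on the application server with the appropriate privileges.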
The dynamic detection mechanism is not triggered when the Mac OS X based
application server has no LUN mapping. You need to restart the application
server to trigger the mechanism.
c. Scan for LUNs again on the application server. Check whether the fault is
rectified.
▪ If yes, go to b.
▪ If no, go to c.
▪ If yes, go to b.
▪ If yes, go to c.
Possible Causes
If the LUN mapping is deleted when the host is sending a write I/O request to the
disk array, the write I/O cannot reach the disk array. As a result, the disk array
resource allocated to the I/O cannot be released, causing LUN deletion timeout.
Procedure
Step 1 Wait about 10 minutes and click Refresh on the LUN page of DeviceManager to
check whether the LUN is successfully deleted.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Alarm Information
None
Possible Causes
1. The health status of the storage system and the application server is normal.
2. The link between the storage system and the application server is normal.
Conclusion: The special mechanism of AIX causes the failure to connect the
storage system to the AIX-based application server for the first time.
Procedure
Step 1 Run the lsdev -Cc adapter|grep fcs command on the application server to view
the HBA information.
-bash-3.00# lsdev -Cc adapter | grep fcs
fcs0 Available 05-00 4Gb FC PCI Express Adapter (df1000fe)
fcs1 Available 05-00 4Gb FC PCI Express Adapter (df1000fe)
fcs2 Available 06-00 4Gb FC PCI Express Adapter (df1000fe)
fcs3 Available 06-00 4Gb FC PCI Express Adapter (df1000fe)
-bash-3.00#
Step 2 Run the lscfg -vpl fcsx command (where x indicates the HBA ID) to view the
World Wide Port Names (WWPNs) of the HBAs (in bold type).
-bash-3.00# lscfg -vpl fcs1
fcs1 P2-I2/Q1 FC Adapter
Part Number...............LP9802-F2
Serial Number.............BG50199256
Network Address...........10000000C9447DA7
ROS Level and ID..........02E01991
Device Specific.(Z0)......2003806D
Device Specific.(Z1)......00000000
Device Specific.(Z2)......00000000
Device Specific.(Z3)......03000909
Device Specific.(Z4)......FF601416
Device Specific.(Z5)......02E01991
Device Specific.(Z6)......06631991
Device Specific.(Z7)......07631991
Device Specific.(Z8)......20000000C9447DA7
Device Specific.(Z9)......HS1.92A1
Device Specific.(ZA)......H1D1.92A1
Device Specific.(ZB)......H2D1.92A1
Device Specific.(YL)......P2-I2/Q1
PLATFORM SPECIFIC
Name: fibre-channel
Node: fibre-channel@c
Physical Location: P2-I2/Q1
-bash-3.00#
Step 3 Check whether the HBA is connected to the storage system based on the WWPN.
If the WWPN is consistent with that on the HBA label, the HBA is connected to the
storage system.
Step 4 Run the rmdev -dl fcsx -R command (where x indicates 0 or 1) to delete the HBA
connected to the storage system.
Run the lsdev -Cc adapter|grep fcs command to view the HBA information and
confirm that the HBA is successfully deleted.
Step 6 Run the lsdev -Cc adapter|grep fcs command to verify that the HBA connected
to the storage system is displayed again.
After the HBA is detected, you can view and add the initiator for the application
server on the DeviceManager.
----End
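The WWPN lookup in Step 2 and Step 3 can be scripted when many HBAs need to be checked. The following Python sketch extracts the Network Address field from lscfg -vpl output and formats it for comparison with the HBA label. It assumes the field holds 16 hexadecimal digits; the function names and the abbreviated sample text are illustrative only.

```python
import re


def network_address(lscfg_output: str):
    """Return the 'Network Address' value (the WWPN), or None if absent."""
    match = re.search(r"Network Address\.+([0-9A-Fa-f]{16})", lscfg_output)
    return match.group(1).upper() if match else None


def format_wwpn(wwpn: str) -> str:
    """Insert a colon between each pair of hexadecimal digits."""
    return ":".join(wwpn[i:i + 2] for i in range(0, len(wwpn), 2))


# Abbreviated sample in the style of the lscfg output shown in Step 2.
sample = """fcs1 P2-I2/Q1 FC Adapter
Part Number...............LP9802-F2
Network Address...........10000000C9447DA7
"""

wwpn = network_address(sample)
print(format_wwpn(wwpn))
```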
Related Information
None
Symptom
After an HP-UX server is connected to the storage system using Fibre Channel
links, its initiators cannot be detected by the storage system.
Possible Causes
If no LUNs are mapped to an HP-UX server, the HP-UX server does not connect to
the storage system.
Procedure
Step 1 Log in to the storage system and create initiators for the HP-UX server.
Step 2 Remap LUNs to the HP-UX server.
Step 3 Remove and reinsert the links and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.
----End
5 Emergency Handling
When a sudden storage event that severely affects services occurs, start the
emergency plan, including the emergency process and emergency measures, to
minimize the impact.
5.1 Emergency Handling Of Hardware Module Faults (Applicable to V500R007)
5.2 Emergency Handling Of Multipathing Software Faults
5.3 Emergency Handling Of Basic Storage Service Faults
5.4 Emergency Handling Of Value-added Service
5.5 Emergency Handling Of Other Faults
Symptom
The controller Alarm indicator on the storage device is steady red. Figure 5-1 and
Figure 5-2 show the location of the controller Alarm indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the controller are displayed.
Possible Causes
The controller is faulty.
Impact
Controller faults may deteriorate system performance and reliability.
Fault Diagnosis
Procedure
Step 1 Check whether the controller Alarm indicator on the storage device is steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the controller Alarm indicator is steady green and Health Status
of the controller on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Alarm Information
On the Alarms and Events page of the DeviceManager, click the Current Alarms
tab. Alarms related to the disk are displayed.
Possible Causes
The disk is faulty.
Impact
A disk failure causes the disk domain to which the disk belongs to be degraded or
fail. If the disk domain is degraded, the system read/write performance
deteriorates and data loss may occur. If the disk domain fails, data loss occurs and
services are interrupted.
Fault Diagnosis
Procedure
Step 1 Check whether the disk Alarm/Location indicator on the storage device is steady
red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the disk Alarm/Location indicator is steady green and Health
Status of the disk on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Log in to the DeviceManager and click System. On the system page, click
to display the rear view. Click the interface module in the red square.
Health Status of the interface module is Faulty.
The interface module Power indicator on the storage device is steady red. Figure
5-7 shows the location of the interface module Power indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the interface module are displayed.
Possible Causes
The interface module is faulty.
Impact
If an interface module malfunctions, it cannot process services and services will
work in single-link mode, resulting in service interruption risks.
Fault Diagnosis
Procedure
Step 1 Check whether the interface module Power indicator on the storage device is
steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the interface module Power indicator is steady green and Health
Status of the interface module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
Log in to the DeviceManager and click System. Select a disk enclosure. On the
system page, click to display the rear view. On the rear view of the storage
device, click the expansion module in the red square. Health Status of the
expansion module is Faulty.
The expansion module Alarm indicator on the storage device is steady red. Figure
5-9 and Figure 5-10 show the location of the expansion module Alarm indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the expansion module are displayed.
Possible Causes
The expansion module is faulty.
Fault Diagnosis
Procedure
Step 1 Check whether the expansion module Alarm indicator on the storage device is
steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the expansion module Alarm indicator is steady green and Health
Status of the expansion module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the fan module are displayed.
Possible Causes
The fan module is faulty.
Impact
If a fan module is faulty, the temperature of the controller enclosure or disk
enclosure may increase. If the storage system works at a high temperature for a
long time, the service life of the storage system may be impaired.
Fault Diagnosis
[Flowchart] Start. Is the fan module faulty? Yes: Replace the faulty fan module.
No: End.
Procedure
Step 1 Check whether the fan module Running/Alarm indicator on the storage device is
steady red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Check whether any objects affect the rotation of the fans.
● If yes, go to Step 3.
● If no, go to Step 4.
After the objects are removed, check whether the fan module Running/Alarm
indicator is steady green and Health Status of the fan module on the
DeviceManager is Normal.
----End
Log in to the DeviceManager and click System. On the front view of the
storage device, click the BBU in the red square. Health Status of the BBU is
Faulty.
The BBU Running/Alarm indicator on the storage device is steady red. Figure 5-15
shows the location of the BBU Running/Alarm indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the BBU are displayed.
Possible Causes
The BBU is faulty.
Impact
A BBU fault may reduce the reliability of the storage system.
Fault Diagnosis
Procedure
Step 1 Check whether the BBU Running/Alarm indicator on the storage device is steady
red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
----End
Log in to the DeviceManager and click System. On the system page, click
to display the rear view. On the rear view of the storage device, click the
power module in the red square. Health Status of the power module is Faulty.
The power Running/Alarm indicator is steady red. Figure 5-17, Figure 5-18 and
Figure 5-19 show the location of the power Running/Alarm indicator.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. Alarms related to
the power module are displayed.
Possible Causes
The power module is faulty.
Impact
● For a 2 U controller enclosure, if a power module is faulty and no redundant
power module is available for the controller, the system reliability decreases.
● For a 4 U controller enclosure, a power module fault does not affect system
reliability.
Fault Diagnosis
Procedure
Step 1 Check whether the power Running/Alarm indicator on the storage device is steady
red.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers.
Step 2 Rectify the fault by following the handling suggestions on the DeviceManager
alarm details page.
1. Log in to DeviceManager.
2. Choose Insight > Alarms and Events.
3. Click a fault to view details, and rectify the fault based on the suggestion in
the Suggestion area.
Step 3 Check whether the power Running/Alarm indicator is steady green and Health
Status of the power module on the DeviceManager is Normal.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
After the multipathing software is installed on an application server that runs the
Linux or UNIX operating system and the application server is restarted, the
application server fails to load the multipathing software.
Alarm Information
None
Possible Causes
Multiple operating systems are installed on the application server and the
menu.lst file of the last installed operating system does not have the
multipathing startup option.
Impact
If an application server fails to load multipathing software, the storage system
runs on a single link, so system performance deteriorates and the risk of service
interruption increases.
Fault Diagnosis
[Flowchart] Decision: Does the menu.lst file have the UltraPath startup option?
Action: Modify the menu.lst file. End.
Procedure
Step 1 In the CLI, run the vi /boot/grub/menu.lst command to open the configuration file
of the operating system.
NOTE
Step 3 Run the mount /dev/partition /filepath command to mount each partition.
Step 4 Run the ls -l /boot/grub/ command to check whether each mounted partition has
the configuration file menu.lst.
If a mounted partition does not have the configuration file, check the next
partition. If a mounted partition has the configuration file, run the vi menu.lst
command to open it. The configuration file content is as follows:
# Modified by YaST2. Last modification on ?.12?.19 17:53:26 UTC 2009
default 0
timeout 8
gfxmenu (hd0,2)/boot/message
##YaST - activate
root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd /boot/initrd-2.6.16.46-0.12-default
###Don't change this comment - YaST2 identifier: Original name: SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)###
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
kernel (hd0,1)/boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda2 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd (hd0,1)/boot/initrd-2.6.16.46-0.12-default
Step 5 Input i to enter the editing mode. Copy the content of the configuration file in
Step 1 into the configuration file of the current mounted partition.
The modified configuration file content is as follows:
# Modified by YaST2. Last modification on ?.12?.19 17:53:26 UTC 2009
default 0
timeout 8
gfxmenu (hd0,2)/boot/message
title Linux with UltraPath
root (hd0,0)
kernel /vmlinuz-2.6.16.60-0.21-smp root=/dev/system/root vga=0x314 resume=/dev/system/swap splash=silent showopts
initrd /mpp-2.6.16.60-0.21-smp.img
##YaST - activate
root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd /boot/initrd-2.6.16.46-0.12-default
###Don't change this comment - YaST2 identifier: Original name: SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)###
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
kernel (hd0,1)/boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda2 vga=0x317 resume=/dev/sda1 splash=silent showopts
initrd (hd0,1)/boot/initrd-2.6.16.46-0.12-default
root (hd0,2)
kernel /boot/vmlinuz-2.6.16.46-0.12-default root=/dev/sda3 vga=normal showopts ide=nodma apm=off acpi=off noresume nosmp noapic maxcpus=0 edd=off 3
initrd /boot/initrd-2.6.16.46-0.12-default
Step 7 Type :wq and press Enter to exit and save the configuration file menu.lst.
Step 8 Repeat Step 4 to Step 7 to modify the menu.lst files of all the operating
systems on the application server. Go to Step 9.
Step 9 Restart the application server and check whether the fault is rectified.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
NOTE
The displayed content varies with the operating system and may differ from the
examples shown.
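Checking each mounted partition's menu.lst for the multipathing entry (Step 4 to Step 7) can be automated. The Python sketch below lists the title entries of a menu.lst file and reports whether one of them mentions UltraPath. It assumes that the multipathing entry can be recognized by its title, as in the sample entry "title Linux with UltraPath"; the function names are illustrative and not part of any Huawei tool.

```python
def grub_titles(menu_lst_text: str) -> list:
    """Return the text of every 'title' entry, skipping comment lines."""
    titles = []
    for line in menu_lst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if len(parts) == 2 and parts[0] == "title":
            titles.append(parts[1])
    return titles


def has_ultrapath_entry(menu_lst_text: str, keyword: str = "UltraPath") -> bool:
    """True if any title contains the keyword (case-insensitive)."""
    return any(keyword.lower() in t.lower() for t in grub_titles(menu_lst_text))


# Abbreviated sample based on the listings above.
sample = """default 0
timeout 8
title Linux with UltraPath
root (hd0,0)
title SUSE Linux Enterprise Server 10 SP1 (/dev/sda2)
"""

print(grub_titles(sample))
print(has_ultrapath_entry(sample))
```

To check a real file, read it first, for example with open("/boot/grub/menu.lst").read(), and pass the text to has_ultrapath_entry.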
Symptom
When multipathing software is being installed on a Windows-based application
server (Windows Server 2003 or Windows Server 2008), an unexpected error
occurs, resulting in a blue screen.
Alarm Information
None
Possible Causes
The latest service pack (SP) is not installed in the Windows operating system.
Impact
A BSOD error occurs when multipathing software is being installed in Windows, so
the multipathing software fails to be installed. If a storage system runs on a
single link, system performance deteriorates and the risk of service interruption
increases.
Fault Diagnosis
Procedure
Step 1 Restart the operating system. Go to Windows Advanced Options Menu and
select Last Known Good Configuration.
Step 2 Log in to the operating system. Download the latest Windows SP from the
Microsoft official website.
Step 4 Restart the application server. Check whether the fault is rectified.
● If yes, the procedure is complete.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
● Indicators on controller A show that controller A is working properly but all
indicators on controller B are off.
● Read and write requests from the application servers connected to controller
B cannot be sent to the storage system, causing service interruption. On the
Performance monitoring page of the DeviceManager, the write or read I/O
traffic on the front-end port of controller B becomes 0.
Possible Causes
The controller is faulty.
Impact
If a controller is faulty and host services are interrupted when UltraPath is not
installed, you can manually switch the host services to another functional
controller.
Fault Diagnosis
[Flowchart] Troubleshooting. Decision: Is the indicator of controller B steady
green? Yes: End.
Procedure
Step 1 Switch services from controller B to controller A.
1. Remove the cable between controller B and the application server.
2. Connect the application server to controller A.
3. Reconfigure the initiator of the host where services are interrupted.
Enable the performance monitoring function before you check the service processing on the
DeviceManager.
----End
Alarm Information
None
Possible Causes
The antivirus software detects UltraPath as a virus and isolates it by mistake.
Procedure
Step 1 On the management page of the antivirus software, add UltraPath to the trust list.
Step 2 Restart the antivirus software.
----End
Related Information
None
Alarm Information
None
Possible Causes
In the CLI of the application server, run the iscsicli ListPersistentTargets
command to view the information about the initiator.
Target Name : iqn.2006-08.com.:21000022a1002828:notconfig:192.168.252.1
Address and Socket : 192.168.252.1 3260
Session Type : Data
Initiator Name : Root\SCSIADAPTER\0000_0
Port Number : <Any Port>
Security Flags : 0x0
Version :0
Information Specified: 0x20
Login Flags : 0xa
Multipath Enabled
NOTE
Login Flags is 0xa, which corresponds to 00001010b in binary. The 1 at the second bit
from right to left (value 0x2) indicates that the multipathing function of
Microsoft iSCSI Initiator is enabled at login.
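The bit test on Login Flags can be sketched in Python. The mask value 0x2 for the multipath flag is an assumption that matches the sample output above (0xa together with "Multipath Enabled") and the login flag values Microsoft documents for its iSCSI initiator; the constant and function names are illustrative.

```python
# Assumed mask for the multipath flag: the second bit counting from the
# least significant (rightmost) bit.
MULTIPATH_ENABLED = 0x2


def multipath_enabled(login_flags: int) -> bool:
    """True if the multipath bit is set in the Login Flags value."""
    return bool(login_flags & MULTIPATH_ENABLED)


flags = int("0xa", 16)            # value shown by iscsicli ListPersistentTargets
print(format(flags, "08b"))       # 00001010
print(multipath_enabled(flags))   # True
```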
Procedure
● If the multipathing software provided by Microsoft iSCSI Initiator has not been
installed: When installing Microsoft iSCSI Initiator, deselect Microsoft MPIO
Multipathing Support for iSCSI so that the multipathing software is not
installed, as shown in Figure 5-24.
Figure 5-24 Clearing the Microsoft MPIO Multipathing Support for iSCSI
● If the multipathing software has been installed and used, remove all targets
in the persistent target list. Then log in to Microsoft iSCSI Initiator and clear
Enable multi-path.
Related Information
None
Symptom
Log in to the DeviceManager and click System. On the system page, click
to display the rear view. On the rear view of the storage device, click the
interface module in the red square. View the information about the Fibre Channel
front-end ports. Health Status and Running Status of a Fibre Channel front-end
port are respectively -- and Link down.
The Link indicator of the Fibre Channel front-end port is steady red or off.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm Link to
the Host Port Is Down may be displayed on the tab page.
Possible Causes
Possible causes for a Fibre Channel link failure:
● The optical transceiver is faulty.
● The optical transceiver is incompatible with the front-end port.
● The rate of the optical transceiver is different from that of the front-end port.
● The optical fiber is improperly connected or faulty.
● The port rate of the storage device is different from that of its peer end.
– In a direct-connection network, the rate of the Fibre Channel front-end
port on the storage device is different from that of the host bus adapter
(HBA) on the application server.
– In a switch-based network, the rate of the switch is different from that of
the Fibre Channel front-end port on the storage device or that of the
HBA on the application server.
Impact
An unavailable Fibre Channel link causes a link down failure, service interruption,
and data loss between the application server and the storage system.
Fault Diagnosis
Procedure
● Cause 1: The optical transceiver is faulty.
▪ If yes, go to b.
▪ If yes, go to b.
▪ If yes, go to b.
▪ If it is a switch-based network, go to b.
▪ If it is a direct-connection network, go to 7.
b. Check whether the rate of the Fibre Channel front-end port on the
storage device is the same as that of the switch port connecting to the
storage device.
NOTE
For details about how to check the rate of a switch port or a Fibre Channel HBA,
consult the switch or HBA manufacturer or refer to the product manuals.
▪ If yes, go to e.
▪ If no, go to c.
c. Adjust Working Rate of the Fibre Channel front-end port to the rate of
the switch port.
d. After the rates match, check whether the Link indicator
of the Fibre Channel front-end port is steady green or blue and its
Running Status on the DeviceManager is Link up.
▪ If no, go to e.
e. Check whether the rate of the Fibre Channel HBA on the application
server is the same as that of the switch port connecting to the
application server.
▪ If no, go to f.
f. Adjust the rate of the switch port to the rate of the Fibre Channel HBA.
g. After the rates match, check whether the Link indicator
of the Fibre Channel front-end port is steady green or blue and its
Running Status on the DeviceManager is Link up.
▪ If no, go to i.
i. Adjust Working Rate of the Fibre Channel front-end port to the rate of
the Fibre Channel HBA.
j. After the rates match, check whether the Link indicator
of the Fibre Channel front-end port is steady green or blue and its
Running Status on the DeviceManager is Link up.
Symptom
Log in to the DeviceManager and click System. On the system page, click
to display the rear view. On the rear view of the storage device, click the
interface module in the red square. View the information about the iSCSI front-
end ports. Running Status of an iSCSI port is Link down.
The Link indicator of the iSCSI front-end port is steady red or off.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm Link to
the Host Port Is Down may be displayed on the tab page.
Possible Causes
Possible causes for an iSCSI link failure:
● The IP address is incorrectly configured for the iSCSI front-end port on the
storage device or the service network port on the application server.
● The network cable between the application server and storage device is
improperly connected or faulty.
Impact
An unavailable iSCSI link causes a link down failure, service interruption, and data
loss between the application server and the storage system.
Fault Diagnosis
Procedure
● Cause 1: The IP address is incorrectly configured for the iSCSI front-end port
on the storage device or the service network port on the application server.
a. Ping the iSCSI front-end port from the application server. Check whether
the iSCSI front-end port is reachable.
▪ If no, go to b.
b. Check whether the network is a direct-connection network or a switch-
based network.
▪ If it is a direct-connection network, go to c.
▪ If it is a switch-based network, go to d.
c. Modify the IP address of the iSCSI front-end port to be on the same
network segment as the IP address of the service network port. Go to e.
NOTE
You can also modify the IP address of the service network port to be on the same
network segment as the IP address of the iSCSI front-end port.
d. Add a route between the iSCSI front-end port and the service network
port to enable the communication between them. Go to e.
e. Ping the iSCSI front-end port from the application server again. Check
whether the iSCSI front-end port is reachable.
▪ If yes, go to c.
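Before changing an address in the direct-connection case, it can help to confirm whether the iSCSI front-end port and the service network port already share a network segment. The Python sketch below compares the two networks using the standard ipaddress module. The addresses and the /24 prefix length are hypothetical; use the values configured on the storage device and the application server.

```python
import ipaddress


def same_segment(ip_a: str, ip_b: str, prefix_len: int) -> bool:
    """True if both addresses fall in the same network for the given prefix."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


# Hypothetical addresses, for illustration only.
print(same_segment("192.168.252.1", "192.168.252.20", 24))
print(same_segment("192.168.252.1", "192.168.1.20", 24))
```

If the function returns False in a direct-connection network, one of the two addresses needs to be changed; in a switch-based network, a route between the two ports is needed instead.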
Symptom
CHAP authentication is enabled for initiators on the DeviceManager and the
automatic target reconnection is configured on Windows Server 2003. However,
after CHAP authentication is disabled on the DeviceManager and the application
servers are restarted, the application servers cannot be reconnected to the targets
automatically.
Alarm Information
None
Possible Causes
After CHAP authentication is disabled on the DeviceManager, the CHAP settings are
not updated on the application servers, so the application servers cannot access
the storage system.
Procedure
Step 1 Open Microsoft iSCSI Initiator on the application server.
Step 2 Click the Persistent Targets tab. In the Select a Target list, delete the IP
addresses of the iSCSI front-end ports on the storage system.
Step 3 Click the Targets tab. In the Targets list, select the IP addresses of the iSCSI front-
end ports on the storage device.
Step 4 Click Log On.
The Log On to Target dialog box is displayed.
Step 5 Select Automatically restore this connection when the system boots and click
OK to save the settings. Restart the application server.
If the application server automatically reconnects to the targets after it is started
up, the fault is rectified. Otherwise, keep the fault environment intact and contact
technical support.
----End
Related Information
None
Possible Causes
When the client executes a process, the client may receive a message indicating
that the TCP connection must be closed. However, the TCP connection on the Linux
client is not closed properly. As a result, the TCP connection resources are not
completely released. When the client tries to establish the TCP connection again,
an exception occurs and the attempt times out. The client keeps retrying, but the
storage system does not detect these attempts.
Fault Diagnosis
[Flowchart] Start. Decision: Are logs and port status on the client normal?
Outcome: An unknown error exists. Contact Huawei technical support to locate the
error. End.
Procedure
Step 1 Restart the Linux operating system.
Step 2 Load the NFS share again and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Alarm Information
None
Possible Causes
System internal error.
Procedure
Step 1 Run the delete user command on the CLI to delete the tenant administrator.
After the command is executed, check whether the administrator is deleted
successfully.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.
----End
Symptom
After the local Huawei storage system is powered off unexpectedly, the file system
created based on the eDevLUN is lost and the eDevLUN becomes a raw disk.
Possible Causes
After the local storage system is recovered, it detects the third-party LUN and
reports the eDevLUN to the host. I/Os from the host to the eDevLUN fail before
the eDevLUN is recovered. As a result, the file system created based on the
eDevLUN cannot be read, and the host detects the eDevLUN as a raw disk.
Fault Diagnosis
Procedure
Step 1 Check whether the third-party LUN corresponding to the eDevLUN is connected.
● If yes, go to Step 2.
● If no, keep the fault environment intact and contact technical support
engineers of the remote storage system.
c. Connect the cable between the host and local storage system (or enable
the host's optical port that connects to the switch).
● If no, keep the fault environment intact and contact technical support
engineers.
Step 3 Check whether the fault is rectified, the file system created based on the eDevLUN
is recovered, and original files remain.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
Log in to the CLI, and run the show consistency_group general command to
check information about the remote replication consistency group. The health
status of the consistency group is fault.
Alarm Information
On the Alarms and Events page, click the Current Alarms tab. The alarm
indicating that Remote Replication Consistency Group Is Unavailable may exist.
Possible Causes
Possible causes are as follows:
● The consistency group is manually deleted from the local computer upon the
link interruption.
● Adding members to the consistency group fails, removing members from the
consistency group fails, or a primary/secondary switchover of the consistency
group fails.
Fault Diagnosis
[Flowchart, last decision only] Does adding members to or removing members from
the consistency group fail, or does a primary/secondary switchover of the
consistency group fail? Yes: Delete the consistency group, create a new
consistency group, and add members to the newly created consistency group.
No: The fault is unknown. Contact technical support engineers. End.
Procedure
● Cause 1: The remote replication link is interrupted.
a. Check whether an alarm indicating that Replication Link Is Down exists.
▪ If yes, go to b.
a. Delete the consistency group from the primary end and secondary end,
and create a consistency group again.
b. Add members to the consistency group.
Check whether the fault status of the consistency group is cleared.
Symptom
Log in to the CLI of the storage system and run the show clone secondary_lun
clone_id=? command. Running Status of a secondary LUN is Interrupted.
Alarm Information
On the Alarms and Events page of the DeviceManager, click the Current Alarms
tab. The alarm Clone Pair Is Abnormally Interrupted is displayed.
Possible Causes
● The I/O processing mechanism is malfunctioning.
● A controller is faulty.
Fault Diagnosis
[Flowchart: If the LUN is faulty, rectify the LUN fault. Then synchronize or
reversely synchronize the clone and check whether the fault is rectified.]
Procedure
Step 1 Check whether the alarm Storage Pool Capacity Is About to Be Used Up exists.
● If yes, expand the storage pool. Then go to Step 3.
● If no, go to Step 2.
Step 2 Check whether the alarm LUN Is Faulty exists.
● If yes, keep the fault environment intact and contact technical support
engineers to handle the LUN fault. After the fault is rectified, go to Step 3.
● If no, go to Step 3.
Step 3 Synchronize or reversely synchronize the clone, and then check whether the
Running Status of the secondary LUN is still Interrupted.
● If yes, keep the fault environment intact and contact technical support
engineers.
● If no, no further action is required.
----End
Symptom
On the DeviceManager, Health Status of a mirrored LUN is Fault.
Possible Causes
● A mirror copy of the mirrored LUN malfunctions.
● The mirrored LUN malfunctions.
Fault Diagnosis
[Flowchart: If the mirrored LUN malfunctions, rectify the storage pool fault.]
Procedure
● Cause 1: A mirror copy of the mirrored LUN malfunctions.
a. Log in to the DeviceManager and check whether the Storage Pool Is Faulty
alarm is reported, which indicates that the storage pool where the mirror
copy resides is malfunctioning.
▪ If yes, go to b.
▪ If no, go to d.
d. The mirror copy is an eDevLUN. Check whether alarm External LUN Is
Faulty or another alarm related to heterogeneous disk arrays is reported.
▪ If yes, go to e.
▪ If yes, go to b.
Symptom
When NDMP is used for backup or restore, the storage system is powered off,
interrupting the backup or restore service.
Possible Causes
If the storage system is powered off while NDMP is being used for backup or
restore, the NDMP service is interrupted. The backup server is then disconnected
from the storage system, and the backup or restore fails.
Procedure
Step 1 Wait until the storage system is powered on, log in to DeviceManager, and choose
Settings > Storage Settings > File Storage Service > NDMP Settings to check
whether the NDMP service is enabled.
● If yes, go to Step 3.
● If no, go to Step 2.
Step 4 Start the backup or restore service and check whether the fault is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact technical support
engineers.
----End
Symptom
After a storage system restarts normally, NDMP authentication in NetBackup may
fail.
Possible Causes
During the restart of the storage system, the NDMP service is started before the
Internet Small Computer Systems Interface (iSCSI) or Fibre Channel driver is
successfully loaded.
Procedure
Step 1 Wait until the storage system is successfully powered on, log in to OceanStor
DeviceManager, and choose Settings > Storage Settings > File Storage Service >
NDMP Settings.
Step 2 Click Restart Service.
The NDMP service is restarted.
Step 3 Perform NDMP authentication in NetBackup and check whether the fault is
rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.
----End
Possible Causes
The validity period of the local authentication user's password is set to 180 days,
and the password has expired.
Procedure
Step 1 Change the password of the local authentication user.
1. Log in to DeviceManager.
Step 2 Use the new password to log in to the CIFS share again to check whether the fault
is rectified.
● If yes, no further action is required.
● If no, keep the fault environment intact and contact Huawei technical
support.
----End
Symptom
Services are interrupted, and the following symptoms are observed:
● On the CLI, the show lun general command shows that the health status of
some LUNs is fault.
● On the CLI, the show storage_pool general command shows that the health
status of some storage pools is fault.
● On the CLI, the show disk general command shows that the health status of
more than two disks in a storage pool is fault.
● On the CLI, the show alarm command shows alarms indicating disk failure
or removal.
Possible Causes
● Dual or multiple disks fail.
● Disks are faulty.
Impact
The storage pool is degraded or fails, and some or all storage services are
interrupted. Host services are interrupted.
Procedure
● Cause 1: Dual or multiple disks fail.
a. Check the mapping between disk slots and disk SNs.
i. Open the alarm list and extract all alarm information. For details, see
3.2.7.3 Managing Current Alarms.
ii. Open the extracted alarm list and search the entire list using Disk Slot ID
as the keyword. The disk SN corresponding to each disk slot is displayed.
Record the mapping between the disk slots and disk SNs.
b. Determine the disk failure or removal sequence based on the time when
the disk alarms or messages are generated.
i. Reinsert the removed or faulty disks in reverse order of disk removal
or failure.
ii. Check whether the indicators on the front panel of the disk enclosure
are on as normal, and whether the disks are displayed on the device
figure on OceanStor DeviceManager and are in the Normal state.
○ If yes, go to c.
○ If no, keep the fault environment intact and contact Huawei
technical support.
c. Repeat b until all disks are successfully recovered.
d. Check whether the storage pool and LUNs are recovered.
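The reinsertion order described in b can be worked out from the recorded alarm times. The following is a minimal shell sketch, assuming each recorded disk alarm has been reduced by hand to one line of the form "<timestamp> <disk slot> <disk SN>" with ISO 8601 timestamps. The function name reinsert_order, the input layout, and the sample slot and SN values are hypothetical and are not part of the storage system CLI.

```shell
#!/bin/sh
# Hypothetical helper: print disk slots in the order in which they should
# be reinserted, that is, the most recently failed or removed disk first.
# Input (stdin): one line per disk alarm, "<ISO 8601 time> <slot> <SN>".
reinsert_order() {
    # ISO 8601 timestamps sort chronologically as plain text, so a
    # reverse sort on the first field yields reverse failure order.
    LC_ALL=C sort -r -k1,1 | awk '{ print $2 }'
}

# Example with made-up alarm records (slot and SN values are illustrative):
printf '%s\n' \
    '2021-09-15T10:02:11 DAE000.3 SN_A' \
    '2021-09-15T10:05:40 DAE000.7 SN_B' \
    '2021-09-15T09:58:03 DAE000.1 SN_C' | reinsert_order
```

With these sample records, the slot whose alarm is latest (DAE000.7) is listed first and the slot whose alarm is earliest (DAE000.1) is listed last.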
Possible Causes
The disk domain is degraded.
Impact
After a disk domain is degraded, I/O processing times out, causing file system
corruption. Host services are interrupted.
Fault Diagnosis
Figure 5-34 Flowchart for handling a file system corruption failure due to I/O
processing timeout
[Flowchart: On the application server, check the status of the file system
corresponding to the LUN.]
Procedure
● Windows-based application server
a. Log in to the DeviceManager. Check the health status of the disk domain.
▪ If yes, go to b.
▪ If yes, go to b.
▪ If 1 is displayed, go to c.
v. After the file system is restored, run the mount command to mount
the disk to the original directory.
# mount /dev/sdb1 /directory
If the value in the red square increases, UDP data packets are being lost.
Possible Causes
The processing capability of syslog-ng on the server is limited and the UDP cache
is insufficient. Increase the UDP cache size on the server.
Procedure
Step 1 Run the sysctl -a | grep net.core and sysctl -a | grep udp commands on the
server to check the UDP cache size.
Step 2 Run the sysctl -w command on the server to set the UDP cache size to a larger
value.
Step 3 Run the sysctl -p command to make the UDP cache configuration take effect.
Step 4 Run the sysctl -a | grep net.core and sysctl -a | grep udp commands on the
server to check the UDP cache size after the modification.
Step 5 Run the service syslog restart command to restart the syslog service.
----End
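The packet loss check that precedes this procedure can also be scripted. The following is a minimal shell sketch, assuming a Linux server and assuming that the counter of interest is the kernel's UDP receive buffer error count (RcvbufErrors in /proc/net/snmp). The original figure is not reproduced here, so the choice of counter is an assumption, and the function names udp_rcvbuf_errors and udp_loss_increased are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the UDP RcvbufErrors counter from a file in
# /proc/net/snmp format (defaults to the live kernel counters).
udp_rcvbuf_errors() {
    awk '
        $1 == "Udp:" && !seen_header {
            # The first "Udp:" line names the columns; locate RcvbufErrors.
            for (i = 2; i <= NF; i++) if ($i == "RcvbufErrors") col = i
            seen_header = 1
            next
        }
        # The second "Udp:" line carries the values.
        $1 == "Udp:" && col { print $col; exit }
    ' "${1:-/proc/net/snmp}"
}

# Succeeds (exit status 0) when the counter grew between two samples,
# which suggests that datagrams were dropped for lack of buffer space.
udp_loss_increased() {
    [ "$2" -gt "$1" ]
}

# Typical use on the syslog server (commented out so that sourcing this
# file has no side effects):
#   before=$(udp_rcvbuf_errors); sleep 10; after=$(udp_rcvbuf_errors)
#   udp_loss_increased "$before" "$after" && echo "UDP packet loss detected"
```

If loss is detected, the socket receive buffer limits net.core.rmem_max and net.core.rmem_default are common candidates to raise in Step 2. This document does not state which parameters or values to use, so treat them as a starting point only.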
B Glossary
B
BBU Backup Battery Unit
D
DAS Direct-attached Storage
E
ESD Electrostatic Discharge
F
FC Fibre Channel
FRU Field Replaceable Unit
H
HBA Host Bus Adapter
I
IE Internet Explorer
IP Internet Protocol
iSCSI Internet Small Computer Systems Interface
L
LUN Logical Unit Number
N
NAS Network-attached Storage
O
OTDR Optical Time Domain Reflectometer
R
RAID Redundant Array of Independent Disks
S
SAN Storage Area Network
SAS Serial Attached SCSI
SCSI Small Computer System Interface
W
WWPN World Wide Port Name