HCIE-Storage V2.5 Learning Guide
Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of
their respective holders.
Notice
The purchased products, services and features are stipulated by the contract made
between Huawei and the customer. All or part of the products, services and features
described in this document may not be within the purchase scope or the usage scope.
Unless otherwise specified in the contract, all statements, information, and
recommendations in this document are provided "AS IS" without warranties,
guarantees or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has
been made in the preparation of this document to ensure accuracy of the contents, but
all statements, information, and recommendations in this document do not constitute
a warranty of any kind, express or implied.
Website: http://e.huawei.com
1. AK mechanism: After data encryption has been enabled, the storage system
activates the AutoLock function of SEDs and uses AKs assigned by a key
manager. Access to the SEDs is protected by AutoLock and only the storage
system itself can access its SEDs. When the storage system accesses an SED, it
acquires an AK from the key manager. If the AK is consistent with the SED's, the
SED decrypts the data encryption key (DEK) for data encryption/decryption. If
the AKs are inconsistent, read and write operations will fail.
2. DEK mechanism: After AutoLock authentication has passed, the SED uses its
hardware circuits and internal DEK to encrypt or decrypt the data that is written
or read. Data is encrypted by the DEK before it is stored on the disk media. The DEK
cannot be exported separately, which means the original information on an SED
cannot be recovered by physical means after the SED is removed from the storage
system (a combined sketch of both mechanisms follows).
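The following Python sketch illustrates the two-level key mechanism described above in a purely conceptual way. The class names, the key-manager lookup, and the toy XOR "encryption" are illustrative assumptions, not the actual SED firmware logic or any Huawei API.

# Conceptual sketch of the AK/DEK mechanism on a self-encrypting drive (SED).
# All names and the toy "cipher" below are illustrative assumptions only.

class KeyManager:
    """Assigns and stores authentication keys (AKs) per disk."""
    def __init__(self):
        self._aks = {}

    def assign_ak(self, disk_id: str, ak: bytes) -> None:
        self._aks[disk_id] = ak

    def get_ak(self, disk_id: str) -> bytes:
        return self._aks[disk_id]

class SelfEncryptingDrive:
    def __init__(self, disk_id: str, ak: bytes, dek: bytes):
        self.disk_id = disk_id
        self._ak = ak          # AutoLock credential set when encryption is enabled
        self._dek = dek        # data encryption key, never leaves the drive
        self._media = {}

    def _unlock(self, ak: bytes) -> bool:
        # AutoLock: the DEK is usable only if the presented AK matches.
        return ak == self._ak

    def write(self, ak: bytes, lba: int, data: bytes) -> bool:
        if not self._unlock(ak):
            return False                                           # inconsistent AK: I/O fails
        self._media[lba] = bytes(b ^ self._dek[0] for b in data)   # toy cipher stands in for the DEK
        return True

    def read(self, ak: bytes, lba: int):
        if not self._unlock(ak):
            return None
        return bytes(b ^ self._dek[0] for b in self._media[lba])

# Usage: the storage system fetches the AK from the key manager before each access.
km = KeyManager()
km.assign_ak("disk-0", b"\x11\x22")
sed = SelfEncryptingDrive("disk-0", ak=b"\x11\x22", dek=b"\x5a")
assert sed.write(km.get_ak("disk-0"), lba=0, data=b"hello")
assert sed.read(b"wrong-ak", lba=0) is None   # access without the correct AK fails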
Finally, the home controller of a LUN or file system can be switched over among the
four controllers within a few seconds.
End-to-end data integrity protection
The ANSI T10 Protection Information (PI) standard provides a way to check data
integrity during access to a storage system. The check is implemented based on the
PI field defined in the T10 standard. This standard adds an 8-byte PI field to the end
of each data block to implement data integrity check. In most cases, T10 PI is used to
ensure data integrity within a storage system.
Data Integrity Extensions (DIX) further extends the protection scope of T10 PI and
implements data integrity protection from applications to host HBAs. Therefore, a
combination of DIX and T10 PI can implement complete end-to-end data protection
from applications to disks.
Huawei hybrid flash storage system not only uses T10 PI to ensure the integrity of
internal data, but also supports a combination of DIX and T10 PI to protect end-to-
end data integrity from applications to disks. The storage system validates and
delivers the PI field in real time. If the host does not support PI, the storage system
adds the PI field at its host interface module and then delivers it. Within the storage
system, PI fields are forwarded, transmitted, and stored together with user data. Before
user data is returned to a host, the storage system uses the PI fields to verify the
correctness and integrity of the data.
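As a concrete illustration of the 8-byte PI field, the following Python sketch builds and verifies the protection information of a 512-byte block. It assumes the standard T10-DIF layout (2-byte guard CRC, 2-byte application tag, 4-byte reference tag) and the CRC-16/T10-DIF polynomial; it is not Huawei's internal implementation.

# Illustrative T10 PI sketch: 512-byte data block + 8-byte protection information.
# Layout assumed: 2-byte guard tag (CRC-16), 2-byte app tag, 4-byte reference tag.
import struct

def crc16_t10dif(data: bytes) -> int:
    # Bitwise CRC-16 with the T10-DIF polynomial 0x8BB7 (no reflection, init 0).
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def append_pi(block: bytes, app_tag: int, ref_tag: int) -> bytes:
    assert len(block) == 512
    pi = struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)
    return block + pi                       # 520 bytes travel through the storage system

def verify_pi(sector: bytes, expected_ref_tag: int) -> bool:
    block, pi = sector[:512], sector[512:]
    guard, _app, ref = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(block) and ref == expected_ref_tag

# Usage: verify on every hop (front end, cache, disk) before passing the data on.
sector = append_pi(b"\x00" * 512, app_tag=0, ref_tag=42)
assert verify_pi(sector, expected_ref_tag=42)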
Controller faults transparent to hosts
If any storage controller is faulty, the front-end interface modules quickly redirect the
I/Os of the faulty controller to other controllers. The host is unaware of the fault. The
Fibre Channel links remain connected, and services are running properly. No alarm or
event is reported.
1. Management isolation: Each tenant has its own administrators. The tenant
administrators can configure and manage their own storage resources only
through the GUI or RESTful APIs. Tenant administrators support role-based
permission control. When creating a tenant administrator, you must select the
role corresponding to the permission.
2. Service isolation: Each tenant has its own file systems, users/user groups, and
shares/exports. Users can access the file systems of the tenant only through the
tenant LIFs. The service data (mainly file systems, quotas, and snapshots), service
access, and service configurations (mainly NAS protocol configurations) are
isolated among multiple tenants.
3. Network isolation: Tenant networks are separated by VLANs and LIFs, preventing
unauthorized hosts from accessing the tenants' storage resources. Tenants use
logical interfaces (LIFs) to configure services. A LIF belongs only to one tenant to
achieve port isolation in a logical sense.
SmartQoS
SmartQoS is also called intelligent service quality control. It dynamically allocates
storage system resources to meet specific performance goals of certain applications.
SmartQoS enables you to set upper limits on IOPS or bandwidth for certain
applications. Based on the upper limits, SmartQoS can accurately limit performance
of these applications, thereby preventing them from contending for storage resources
with critical applications.
SmartQoS uses the I/O priority scheduling and I/O traffic control technologies based
on LUN, file system, or snapshot to ensure the service quality of data services.
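The upper-limit traffic control that SmartQoS applies to a LUN, file system, or snapshot can be pictured as a token-bucket limiter. The sketch below is a generic Python illustration of that idea; the class and parameter names are assumptions, not the SmartQoS implementation.

# Generic token-bucket sketch of an IOPS upper limit, as used conceptually by
# I/O traffic control. Names and structure are illustrative assumptions.
import time

class IopsLimiter:
    def __init__(self, max_iops: int):
        self.max_iops = max_iops        # upper limit configured for the object
        self.tokens = float(max_iops)   # one token = permission for one I/O
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at the limit.
        self.tokens = min(self.max_iops, self.tokens + (now - self.last) * self.max_iops)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # I/O is queued or delayed instead of dispatched

# Usage: a lower-priority LUN capped at 500 IOPS cannot crowd out critical LUNs.
limiter = IopsLimiter(max_iops=500)
dispatched = sum(1 for _ in range(10_000) if limiter.allow())
print(f"I/Os dispatched immediately: {dispatched}")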
SmartDedupe and SmartCompression
SmartDedupe and SmartCompression provide the data simplification service for file
systems and thin LUNs. They not only save storage space for customers but also
reduce the total cost of ownership (TCO) of the enterprise IT architecture.
SmartDedupe deletes duplicate data blocks from a storage system to save storage
space. Huawei hybrid flash storage system supports inline deduplication. That is, only
the newly written data is deduplicated.
SmartCompression reorganizes data to save storage space and improve the data
transfer, processing, and storage efficiency without any data loss. Huawei hybrid
flash storage system supports inline compression. That is, only the newly written data
is compressed.
HyperReplication
1. Synchronous remote replication: Data is synchronized in real time to ensure data
consistency and minimize data loss in the event of a disaster.
2. Asynchronous remote replication: Data is periodically synchronized to minimize
service performance deterioration caused by the latency of long-haul data
transmission.
HyperMetro
HyperMetro (also called active-active feature) is a key technology of the active-active
data center solution. It ensures high-level data reliability and service continuity for
users. HyperMetro is an array-level active-active technology. Two active-active
storage systems can be deployed in the same equipment room, the same city, or two
places that are 100 km away from each other.
HyperMirror
HyperMirror is the volume mirror software of Huawei hybrid flash storage system.
HyperMirror allows users to create two physical copies of a LUN. Each LUN copy can
reside in a local resource pool or be an external LUN. Each LUN copy has the same
virtual capacity as the mirror LUN. When a server writes data to a mirror LUN, the
storage system simultaneously writes the data to each copy of the mirror LUN. When
a server reads data from a mirror LUN, the storage system reads data from one copy
of the LUN. Even if one copy of a mirror LUN is temporarily unavailable (for
example, when the storage system that provides the storage pool is unavailable), the
server can still access the mirror LUN. The system records the LUN areas where data
has been written and synchronizes the changed data to that LUN copy after the
copy becomes available again.
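A simplified model of the mirror-LUN behaviour described above is sketched below: writes go to every available copy, reads are served from one available copy, and a change bitmap records regions written while a copy is offline so that only those regions are synchronized later. All names and the 1 MB grain size are illustrative assumptions, not the HyperMirror code.

# Simplified HyperMirror-style model: two copies, change tracking, resynchronization.
# Illustrative sketch only; names and granularity (1 MB grains) are assumptions.
GRAIN = 1 << 20  # track changes at 1 MB granularity

class MirrorLun:
    def __init__(self, copy_a: dict, copy_b: dict):
        self.copies = [copy_a, copy_b]
        self.online = [True, True]
        self.dirty = set()                      # grains changed while a copy was offline

    def write(self, offset: int, data: bytes) -> None:
        for idx, copy in enumerate(self.copies):
            if self.online[idx]:
                copy[offset] = data             # write to every available copy
        if not all(self.online):
            self.dirty.add(offset // GRAIN)     # remember what the offline copy missed

    def read(self, offset: int) -> bytes:
        idx = self.online.index(True)           # any available copy can serve the read
        return self.copies[idx][offset]

    def resync(self, idx: int) -> None:
        # When the copy comes back, push only the changed grains, then clear the bitmap.
        src = self.copies[1 - idx]
        for offset, data in src.items():
            if offset // GRAIN in self.dirty:
                self.copies[idx][offset] = data
        self.dirty.clear()
        self.online[idx] = True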
HyperLock
With the development of technologies and the explosive growth of information,
secure access to and use of data are of great importance. As required by laws
and regulations, important data such as case documents of courts, medical records,
and financial documents can only be read but cannot be written within a specific
period. Therefore, measures must be taken to prevent such data from being
tampered with. In the storage industry, write once read many (WORM) is the most
commonly used method to archive and back up data, ensure secure data access, and
prevent data tampering.
The WORM feature of Huawei hybrid flash storage systems is also called HyperLock.
After data is written to a file, the write permission of the file is removed so that the
file enters the read-only state. In the read-only state, the file can be read but cannot
be deleted, modified, or renamed. The WORM feature can prevent data from being
tampered with, meeting data security requirements of enterprises and organizations.
A file system with the WORM feature can be configured only by the administrator.
WORM modes are classified into regulatory compliance WORM (WORM-C) and
enterprise WORM (WORM-E) based on the administrator's permissions. The WORM-
C mode is mainly used in archiving scenarios where data protection mechanisms are
implemented in compliance with laws and regulations. The WORM-E mode is mainly
used for internal management of enterprises.
HyperVault
Huawei hybrid flash storage systems provide the HyperVault feature to implement
intra-system or inter-system file system data backup and restoration.
HyperVault supports local backup and remote backup.
4. NVMe multi-queue polling designed for multi-core Kunpeng 920 CPUs enables
lock-free processing of concurrent I/Os, fully utilizing the computing capacities of
processors.
5. Read requests to NVMe SSDs are prioritized, accelerating response to read
requests when data is being written into NVMe SSDs.
6. With the end-to-end NVMe design, the Huawei all-flash storage system offers a
minimum access latency of less than 100 μs.
Chipset
With long-term continuous accumulation and investment in the chipset field, Huawei
has developed some key chipsets for storage systems, such as the front-end interface
chipset (Hi1822), Kunpeng 920 chipset, Ascend AI chipset (Ascend 310), SSD
controller chipset, and baseboard management controller (BMC) chipset (Hi1710).
They are also integrated into Huawei all-flash storage systems.
1. Interface module chip: The Hi182x (IOC) chip is independently developed by
Huawei in the storage interface chip field. It integrates multiple protocol
interfaces, such as ETH interfaces of 100 Gbit/s, 40 Gbit/s, 25 Gbit/s, and 10
Gbit/s, and Fibre Channel interfaces of 32 Gbit/s, 16 Gbit/s, and 8 Gbit/s. This
chip features high interface density, rich protocol types, and flexible ports,
creating unique value for storage.
2. Kunpeng 920 chipset: The Kunpeng 920 chipset is a processor chipset
independently developed by Huawei. It features strong performance, high
throughput, and high energy efficiency to meet diversified computing
requirements of data centers. It can be widely used in scenarios such as big data
and distributed storage. The Kunpeng 920 chipset supports various protocols
such as DDR4, PCIe 4.0, SAS 3.0, and 100 Gbit/s RDMA to meet the requirements
of a wide range of scenarios.
3. Ascend AI chipset: The Ascend chipset is the first AI chipset independently
developed by Huawei. It features ultra-high computing efficiency and low power
consumption. Currently, it is the most powerful AI SoC for computing scenarios.
Ascend 310 is the first chipset of the Ascend series. It is manufactured by using
the 12 nm process and delivers a computing power of up to 16 TOPS (INT8).
4. SSD controller chip: HSSDs use Huawei-developed next-generation enterprise-
level controllers, which provide SAS 3.0 and PCIe 3.0 interfaces and feature high
performance and low power consumption. The chip uses enhanced ECC and
built-in RAID technologies to extend the SSD service life to meet enterprise-level
reliability requirements. In addition, this chip supports the latest DDR4, 12 Gbit/s
SAS, and 8 Gbit/s PCIe rates as well as Flash Translation Layer (FTL) hardware
acceleration to provide stable performance at a low latency for enterprise
applications.
5. BMC chipset: Hi1710 is a BMC chipset, including the A9 CPU, 8051 co-processor,
sensor circuit, control circuit, and interface circuit. It supports the intelligent
platform management interface (IPMI), which monitors and controls the
hardware components of the storage system. It provides various functions,
including system power control, controller monitoring, interface module
monitoring, power supply and BBU management, and fan monitoring.
Each package on an HSSD consists of multiple physical dies. RAID 4 is used for
redundancy of data written to the dies, preventing data loss in the event of a single
die failure. The uncorrectable bit error rate (UBER) of SSDs in the industry is about
10^-17. With the support of intra-disk RAID, the UBER of HSSDs is reduced to 10^-18
(a decrease of one order of magnitude). For example, if intra-disk RAID is not used,
a bad block occurs when 11 PB of data is written to an SSD. If intra-disk RAID is
used, a bad block occurs when 110 PB of data is written to an SSD.
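The 11 PB figure can be sanity-checked directly from the UBER: at 10^-17, one uncorrectable bit error is expected per 10^17 bits written, which is roughly 11 PiB. A quick, illustrative calculation:

# Rough check of the "one uncorrectable error per ~11 PB written" figure.
uber = 1e-17                          # uncorrectable bit error rate (per bit written)
bits_per_error = 1 / uber             # expected bits written per error
pib_written = bits_per_error / 8 / 2**50
print(f"~{pib_written:.1f} PiB per uncorrectable error at UBER 1e-17")
# With intra-disk RAID (UBER ~1e-18) the expectation rises tenfold, to ~110 PiB.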
Wear leveling and anti–wear leveling
The SSD controller uses software algorithms to monitor and balance the P/E cycles
on blocks in the NAND flash. This prevents over-used blocks from failing and extends
the service life of the NAND flash.
1. Intra-disk wear leveling: HSSDs support dynamic and static wear leveling.
Dynamic wear leveling enables the SSD to write data preferentially to less-worn
blocks to balance P/E cycles. Static wear leveling allows the SSD to periodically
detect blocks with fewer P/E cycles and reclaim their data, ensuring that blocks
storing cold data can participate in wear leveling. HSSDs combine the two
solutions to ensure wear leveling.
2. Global wear leveling: The biggest difference between SSDs and HDDs lies in that
the amount of data written to SSDs is no longer unlimited, and the service life of
an SSD is inversely proportional to the amount of data written to the SSD.
Therefore, an all-flash storage system requires load balancing between SSDs to
prevent overly-used SSDs from failing. FlashLink™ uses controller software and
disk drivers to regularly query the SSD wear degree from the SSD controller. In
addition, FlashLink™ evenly distributes data to SSDs based on LBAs/fingerprints
to level the SSD wear degree.
3. Anti-wear leveling: When SSDs are approaching the end of their service life as
their wear degrees have reached 80% or above, multiple SSDs may fail
simultaneously if global wear leveling is still in use, resulting in data loss. In this
case, the system enables anti-global wear leveling to avoid simultaneous
failures. The system selects the most severely worn SSD and preferentially writes
data to it as long as it has idle space. That SSD therefore reaches the end of its life
before the others and you are prompted to replace it earlier, so that multiple SSDs
do not fail at the same time (as sketched below).
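The following Python sketch contrasts the two placement policies in a very simplified form: wear leveling steers new writes to the least-worn SSD, while anti-wear leveling deliberately concentrates writes on the most-worn SSD once wear approaches the end of life. The 80% threshold and data structures are illustrative assumptions.

# Simplified global wear-leveling / anti-wear-leveling placement sketch.
# Threshold and data structures are illustrative assumptions.

def pick_ssd(wear_pct: list[float]) -> int:
    """Return the index of the SSD that should receive the next write."""
    if max(wear_pct) >= 80.0:
        # Anti-wear leveling: sacrifice the most-worn SSD so that disks do not
        # all reach end-of-life (and fail) at the same time.
        return max(range(len(wear_pct)), key=lambda i: wear_pct[i])
    # Normal global wear leveling: balance P/E cycles across SSDs.
    return min(range(len(wear_pct)), key=lambda i: wear_pct[i])

print(pick_ssd([41.0, 39.5, 40.2]))   # -> 1, the least-worn SSD gets the data
print(pick_ssd([82.0, 79.0, 80.5]))   # -> 0, the most-worn SSD is worn out first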
LDPC and FSP algorithms
HSSDs use the LDPC algorithm and the FSP technology to ensure data reliability.
Huawei all-flash storage system supports T10 PI. Upon reception of data from a host,
the storage system inserts an 8-byte PI field to every 512 bytes of data before
performing internal processing such as forwarding to other nodes or saving the data
to the cache. After the data is written to disks, the disks verify the PI fields of the
data to detect any change to the data between reception and destaging to the disks.
As shown in the figure in the training materials, the green point indicates that a PI is
inserted into the data. The blue points indicate that a PI is calculated for the 512-
byte data and compared with the saved PI to verify data correctness.
When the host reads data, the disks verify the data to prevent changes to the data. If
any error occurs, the disks notify the upper-layer controller software, which then
recovers the data by using RAID. To prevent errors on the path between the disks
and the front end of the storage system, the storage system verifies the data again
before returning it to the host. If any error occurs, the storage system reads the data
from the disks again or recovers the data using RAID to ensure end-
to-end data reliability from the front-end interface modules to the back-end disks.
Multi-layer redundancy and fault tolerance design
Huawei all-flash storage system uses the SmartMatrix multi-controller architecture
that supports linear expansion of system resources. The controller enclosure uses the
IP interconnection design and supports linear IP scale-out between controller
enclosures.
The management plane, control plane, and service plane are physically separated (in
different VLANs), and served by different components. Each plane can independently
detect, rectify, and isolate faults. Faults on the management plane and control plane
do not affect services. Service plane congestion does not affect system management
and control.
All components in Huawei all-flash storage system work in redundancy mode,
eliminating single points of failure. Huawei all-flash storage system provides multiple
redundancy protection mechanisms for the entire path from the host to the storage
system. Service continuity is guaranteed even if multiple field replaceable units
(FRUs) allowed by the redundancy scheme are faulty simultaneously or successively.
Key technologies for high reliability
1. Three copies across controller enclosures: For data with the same LBA, Huawei
all-flash storage system creates a pair between two controllers to form a dual-
copy relationship and creates a third copy in the memory of another controller. If
there is only one controller enclosure, three copies are stored on different
controllers in this controller enclosure, preventing data loss when any two
controllers become faulty at the same time. If there are two or more controller
enclosures, the third copy can be stored on a controller in another controller
enclosure. This prevents data loss when any two controllers are faulty at the
same time or a single controller enclosure (with four controllers) is faulty.
2. Continuous cache mirroring: This prevents data loss or service interruption when
a controller is faulty and a new controller fault occurs before that controller
recovers. This ensures service continuity to the maximum extent. Generally, data
blocks 1 and 2 on controller A and data blocks 1 and 2 on controller B are
mutually mirrored. If controller A is faulty, data blocks on controller
B are mirrored to controllers C and D, ensuring data redundancy. If controller D
becomes faulty before controller A recovers, data blocks on controllers B and C
are mirrored to each other to ensure data redundancy. If two controller
enclosures (housing eight controllers) are deployed, continuous mirroring is
implemented within a controller enclosure as long as two or more controllers in
the controller enclosure are working properly. If only one controller in a
controller enclosure is normal, the system mirrors data to a controller in the
other controller enclosure until only one controller is available in the storage
system. Continuous mirroring and back-end full interconnection allow up to
seven out of eight controllers to fail successively without interrupting services,
achieving high service availability (a simplified sketch follows this list).
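The re-mirroring behaviour can be modelled in a few lines: whenever a controller holding a cache copy fails, the surviving copy is mirrored again to another healthy controller. This sketch is a deliberately simplified illustration of the idea, not the SmartMatrix algorithm; the selection rule and names are assumptions.

# Simplified continuous cache mirroring: keep two cache copies alive as long as
# at least two controllers survive. Names and selection rules are assumptions.

def remirror(copy_holders: set[str], healthy: list[str]) -> set[str]:
    """Return the controllers that hold the cache copies after failures."""
    holders = {c for c in copy_holders if c in healthy}
    for ctrl in healthy:
        if len(holders) >= 2:
            break
        if ctrl not in holders:
            holders.add(ctrl)          # mirror the surviving copy to a healthy peer
    return holders

healthy = ["B", "C", "D"]                   # controller A has just failed
print(remirror({"A", "B"}, healthy))        # -> {'B', 'C'}: data re-mirrored to C
print(remirror({"B", "C"}, ["C"]))          # -> {'C'}: one controller left, a single copy remains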
Non-disruptive upgrade within seconds
Huawei all-flash storage systems of some models provide protection measures
against component failures and power failures. In addition, advanced technologies
are used to reduce risks of disk failures and data loss, ensuring high reliability of the
system. Huawei all-flash storage systems provide multiple advanced data protection
technologies to protect data against catastrophic disasters and ensure continuous
system running.
High availability architecture
Tolerating simultaneous failure of two controllers: The global cache provides three
cache copies across controller enclosures. If two controllers fail simultaneously, at
least one cache copy is available. A single controller enclosure can tolerate
simultaneous failure of two controllers with the three-copy mechanism.
Tolerating failure of a controller enclosure: The global cache provides three cache
copies across controller enclosures. A smart disk enclosure connects to 8 controllers
(in 2 controller enclosures). If a controller enclosure fails, at least one cache copy is
available.
Tolerating successive failure of 7 out of 8 controllers: The global cache provides
continuous mirroring to tolerate successive failure of 7 out of 8 controllers (on 2
controller enclosures).
Zero interruption upon controller failure
The front-end ports are the same as common Ethernet ports. Each physical port
provides one host connection and has one MAC address.
Local logical interfaces (LIFs) are created for internal links. Four internal links
connect to all controllers in an enclosure. Each controller has a local LIF.
IP addresses are configured on the LIFs of the controllers. The host establishes IP
connections with the LIFs.
If the LIF goes down upon a controller failure, the IP address automatically fails over
to the LIF of another controller.
Data is accelerated by using DRAM for the hottest data and an SCM cache for the
second-hottest data. This reduces latency by 30%.
FlashLink
1. Multi-core technology: Huawei-developed CPUs are used to provide the largest
number of CPUs and CPU cores per controller in the industry. Host I/O
requests are distributed to vNodes based on the intelligent distribution
algorithm. Services are processed in vNodes in an end-to-end manner, avoiding
cross-CPU communication overheads, cross-CPU remote memory access
overheads, and CPU conflicts. In this way, the storage performance increases
linearly as the number of CPUs grows. All CPU cores are grouped in the vNode.
Each service group corresponds to a CPU core group. The CPU cores in a service
group run only the corresponding service code. In this way, different service
groups do not interfere with each other. Different services are isolated and run
on different cores through service grouping, avoiding CPU contention and
conflicts between service groups. In a service group, each core uses an
independent data structure to process service logic. This prevents the CPU cores
in a service group from accessing the same memory structure, and implements
lock-free design between CPU cores.
2. Large-block sequential write: Flash chips in SSDs can withstand a limited number
of erase times. In traditional RAID overwrite mode, if data on an SSD becomes
hotspot data and is frequently modified, the number of erase times of the
corresponding flash chip will be quickly used up. Huawei all-flash storage
systems provide the large-block sequential write mechanism. In this mechanism,
controllers detect the data layouts on Huawei-developed SSDs and aggregate
discrete writes of multiple small blocks into sequential writes of large blocks.
Disk-controller collaboration is implemented to write data to SSDs in sequence.
This technology enables RAID 5, RAID 6, and RAID-TP to perform only one I/O
operation and avoid multiple read and write operations caused by discrete writes
of multiple small blocks. This makes the write performance of RAID 5, RAID 6,
and RAID-TP almost the same (a simplified sketch of the aggregation follows this list).
3. Hot and cold data separation: Hot data and cold data are identified in the
storage system. The cooperation between SSDs and controllers improves the
garbage collection performance, reduces the number of erase times on SSDs,
and extends the service life of SSDs. Data with different change frequencies is
written to different SSD blocks, which can reduce garbage collection. Metadata is
modified more frequently than user data, so the metadata and user data are
written into different SSD areas. The data in garbage collection is also different
from the newly written data in terms of coldness and hotness, and they are also
written into different SSD areas. In an ideal situation, garbage collection would
expect all data in a block to be invalid so that the whole block could be erased
without data movement. This would minimize write amplification.
4. I/O priority adjustment: Resource priorities are assigned to different I/O types to
ensure I/O processing based on the SLAs. A highway has normal lanes for
general traffic, but it also has emergency lanes for vehicles which need to travel
faster. Similarly, priority adjustment lowers latency by setting different priorities
for different types of I/Os by their SLAs for resources.
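The aggregation idea in the large-block sequential write mechanism (item 2 above) can be illustrated with a small buffer that collects discrete small writes until a full stripe is available and then flushes it as one sequential large-block write. This is a conceptual sketch; the stripe geometry and names are assumptions.

# Conceptual sketch of aggregating small discrete writes into full-stripe,
# large-block sequential writes. Stripe geometry and names are assumptions.

class StripeAggregator:
    def __init__(self, data_columns: int = 4, chunk_kb: int = 8):
        self.full_stripe_kb = data_columns * chunk_kb   # e.g. 4 x 8 KB = 32 KB
        self.buffered_kb = 0                            # pending small writes
        self.flushed_stripes = 0

    def write(self, size_kb: int) -> None:
        self.buffered_kb += size_kb
        while self.buffered_kb >= self.full_stripe_kb:
            # One sequential large-block write; parity is computed once for the
            # whole stripe, so no read-modify-write of old data/parity is needed.
            self.buffered_kb -= self.full_stripe_kb
            self.flushed_stripes += 1

agg = StripeAggregator()
for _ in range(16):           # sixteen discrete 8 KB host writes
    agg.write(8)
print(agg.flushed_stripes)    # -> 4 full-stripe writes instead of 16 partial writes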
1. Internal Key Manager, which is the built-in key management system of the
storage system, is designed with the best practices of NIST SP 800-57. It
generates, updates, backs up, restores, and destroys keys, and provides
hierarchical key protection. Internal Key Manager is easy to deploy, configure,
and manage. It is recommended if FIPS 140-2 certification is not required and
the key management system is only used by the storage systems in a data
center.
2. External Key Manager uses the standard KMIP and TLS protocols and complies
with the key security requirements of FIPS 140-2 (third-party key managers such as
SafeNet are supported). External Key Manager
is recommended if FIPS 140-2 certification is required or multiple systems in a
data center require centralized key management.
Role-based permission management
Preset default roles (as listed in the following table): Default roles of system
management users and tenant management users are preset in the system.
Role Group | Default Role | Permission
User-defined roles: The system allows you to define roles with permissions as required.
Create a role and select the function permissions and object permissions required by the role.
The storage system employs numerous key technologies for virtual machines (VMs) to
deploy VMs quickly, enhance VM bearing capability and operation efficiency, and
streamline storage management in virtual environments, helping you easily cope with
storage in virtual environments.
Application scenarios of multi-protocol access
The storage system allows NFS sharing and CIFS sharing to be configured for the
same file system concurrently. Huawei all-flash storage systems can support both
SMB and NFS services at the same time.
storage, backup, and archiving of financial electronic check images, audio and video
recordings, medical images, government and enterprise electronic documents, and
Internet of Vehicles (IoV).
For more information, log in to http://support.huawei.com to obtain the relevant product
documentation.
OAM: indicates the storage management plane, which provides functions such as
deployment, upgrade, capacity expansion, monitoring, and alarming.
During system initialization, Huawei distributed storage system sets partitions for
each disk based on the value of N and the number of disks. For example, the default
value of N is 3600 for two-copy backup. If the system has 36 disks, each disk has 100
partitions. The partition-disk mapping is configured during system initialization and
dynamically adjusted based on the number of disks. The partition-disk mapping table
occupies only a small space, and Huawei distributed block storage nodes store the
mapping table in the memory for rapid routing. Huawei distributed block storage
does not employ the centralized metadata management mechanism and therefore
does not have performance bottlenecks incurred by the metadata service.
Huawei distributed block storage logically divides a LUN by every 1 MB of space. For
example, a LUN of 1 GB space is divided into 1024 slices of 1 MB space.
When an application accesses block storage, the SCSI command carries the LUN ID,
LBA ID, and I/O data to be read/written. The OS forwards the message to the VBS of
the local node. The VBS generates a key based on the LUN ID and the LBA ID rounded
down to a 1 MB boundary. The result
calculated using DHT hash indicates the partition. The specific disk is located based
on the partition-disk mapping recorded in the memory. The VBS forwards the I/O to
the OSD to which the disk belongs. For example, if an application needs to access the
4 KB data identified by an address starting with LUN1+LBA1, Huawei distributed
storage first constructs "key=LUN1+LBA1/1M", calculates the hash value for this key,
performs modulo operation for the value N, gets the partition number, and then
obtains the disk of the data based on the partition-hard disk mapping.
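The routing steps described above map directly to a few lines of code: round the LBA down to 1 MB, build the key, hash it, take the result modulo N to get the partition, and look the partition up in the in-memory partition-disk table. The hash function and the table below are illustrative stand-ins for the Huawei-developed DHT algorithm, not the algorithm itself.

# Illustrative DHT routing sketch: LUN/LBA -> key -> hash -> partition -> disk.
# The hash function and the partition-disk table are stand-ins, not Huawei's algorithm.
import hashlib

N_PARTITIONS = 3600                         # default N for two-copy backup
DISKS = [f"disk-{i}" for i in range(36)]    # 36 disks -> 100 partitions per disk
PARTITION_TO_DISK = {p: DISKS[p % len(DISKS)] for p in range(N_PARTITIONS)}

def route(lun_id: int, lba: int) -> tuple[int, str]:
    key = f"LUN{lun_id}+{lba // (1 << 20)}"              # LBA rounded to 1 MB units
    digest = hashlib.md5(key.encode()).digest()          # any uniform hash works for the sketch
    partition = int.from_bytes(digest[:8], "big") % N_PARTITIONS
    return partition, PARTITION_TO_DISK[partition]       # in-memory lookup, no metadata service

partition, disk = route(lun_id=1, lba=4 * (1 << 20) + 4096)
print(f"4 KB I/O routed to partition {partition} on {disk}")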
Each OSD manages a disk. During system initialization, the OSD divides the disk into
slices of 1 MB and records the slice allocation information in the metadata
management area of the disk. After receiving an I/O from the VBS, the OSD searches
for the data fragment information on the disk based on the key, reads or writes the
data, and returns the data to the VBS. In this way, the entire data routing process is
completed.
The DHT routing technology helps Huawei distributed storage quickly locate the
specific location where data should be stored based on service I/Os, avoiding
searching and computing in massive data. This technology uses Huawei-developed
algorithms to ensure that data is balanced among disks. In addition, when hardware
is added or removed (due to faults or capacity expansion), the system automatically
and quickly adjusts the hardware to ensure the validity of data migration, automatic
and quick self-healing, and automatic resource balancing.
Dynamic intelligent partitioning and static disk selection algorithms
During data persistence, Huawei distributed storage adopts two-layer algorithms to
optimize the performance and reliability of distributed storage. One is the dynamic
intelligent partitioning algorithm, which selects partitions (PTs) for newly created Plogs.
The other is the static disk selection algorithm, which selects local OSDs for each PT.
1. The dynamic intelligent partitioning algorithm introduces an adaptive negative
feedback mechanism to achieve superb reliability and performance. Its major
improvements and objectives are as follows:
a. Write reliability is not degraded. If the partition corresponding to a Plog falls
into a faulty disk, the Plog is discarded and a new Plog is selected to write
data.
b. Loads are balanced and hotspots are eliminated. In random access
scenarios, polling or distributed hash algorithms cannot fully ensure
balanced data layouts and disk access performance. For example, in some
storage systems based on the CRUSH hash algorithm, the utilization
difference between OSDs reaches 20%, causing continuous occurrence of
hotspot disks. In addition, disk fault recovery, slow disk hotspots, and QoS
the same core to ensure operation atomicity. This avoids frequent multi-core
switchovers and improves the CPU cache hit ratio.
EC intelligent aggregation technology
Erasure coding (EC) increases the computing overhead, and a poor EC design brings
more write penalties. Therefore, the performance of products using EC may be
significantly lower than that of products using multi-copy storage.
Write penalty of EC: This section uses 4+2 redundancy as an example. If the size of
data to be written is less than 32 KB (for example, only 16 KB), 16 KB of data is
written for the first time. During the second write, the 16 KB data written earlier
must be read and combined before being written to disks. This causes the read
overhead. This problem does not occur when full stripes are delivered. The intelligent
aggregation EC based on append write ensures EC full-stripe write at any time,
reducing read/write network amplification and disk amplification by several times.
Data is aggregated in a single pass, reducing the CPU computing overhead and providing
ultimate peak performance. In the cache, data of multiple LUNs is aggregated into a
full stripe, reducing write amplification and improving performance.
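The benefit of full-stripe aggregation can be made concrete with a small back-of-the-envelope comparison for 4+2 EC: a partial 16 KB update must read the old data and parity before writing, whereas an aggregated full stripe is written in one pass. The numbers below use common read-modify-write accounting and an assumed 8 KB chunk size; they are an illustration, not measured product figures.

# Back-of-the-envelope I/O counts for a 16 KB update under 4+2 EC (8 KB chunks
# assumed), comparing partial-stripe overwrite with append-based full-stripe
# aggregation. Illustrative only.

def partial_stripe_ios(updated_chunks: int, parity: int = 2) -> dict:
    # Read-modify-write: read old data chunks and old parity, write both back.
    reads = updated_chunks + parity
    writes = updated_chunks + parity
    return {"reads": reads, "writes": writes}

def full_stripe_ios(data_chunks: int = 4, parity: int = 2) -> dict:
    # Aggregated append write: parity is computed from data already in cache,
    # so nothing has to be read back from disks.
    return {"reads": 0, "writes": data_chunks + parity}

print(partial_stripe_ios(updated_chunks=2))  # 16 KB update: 4 reads + 4 writes
print(full_stripe_ios())                     # full 32 KB stripe: 0 reads + 6 writes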
Huawei distributed storage provides intelligent I/O aggregation to use different
policies for different I/Os, ensuring the read/write performance.
1. Large I/Os are formed into EC stripes and directly written to disks without being
cached, saving cache resources. When SSDs are used as cache media, the service
life of SSDs can be extended.
2. Small I/Os are written to the cache and an acknowledgement is returned
immediately.
3. In the log cache, small I/Os of different LUNs are aggregated into large I/Os to
significantly increase the probability of aggregation and improve performance.
Adaptive global deduplication and compression
Huawei distributed storage systems support adaptive inline and post-process global
deduplication and compression, which can provide ultimate space reduction and
reduce users' TCO with proper resource consumption. Adaptive inline and post-
process deduplication indicates that inline deduplication automatically stops when
the system resource usage reaches the threshold. Data is directly written to disks for
persistent storage. When system resources are idle, post-process deduplication starts.
After the deduplication is complete, the compression process starts. The compression
is 1 KB aligned. The LZ4 algorithm is used to support HZ9 deep compression to
obtain a better compression ratio. Deduplication and compression can be enabled or
disabled as required.
Huawei distributed storage supports global deduplication and compression, as well
as adaptive inline and post-process deduplication. Deduplication reduces write
amplification of disks before data is written to disks.
Huawei distributed storage adopts the opportunity table and fingerprint table
mechanism. After data enters the cache, the data is sliced into 8 KB fragments.
The SHA-1 algorithm is used to calculate a fingerprint for each 8 KB fragment. The
opportunity table is used to reduce invalid fingerprint entries, thereby reducing memory cost.
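The opportunity-table mechanism can be sketched as a two-stage lookup: the first time a fingerprint is seen it is only recorded as an "opportunity"; only when it is seen again is it promoted into the fingerprint table and deduplicated. The 8 KB fragment size and SHA-1 follow the text above, while the rest of the structure is an illustrative assumption.

# Illustrative opportunity-table / fingerprint-table deduplication sketch.
# 8 KB fragments and SHA-1 fingerprints follow the text; the rest is assumed.
import hashlib

FRAGMENT = 8 * 1024
opportunity_table = set()     # fingerprints seen once (candidates)
fingerprint_table = {}        # fingerprint -> stored-block reference

def store(fragment: bytes) -> str:
    fp = hashlib.sha1(fragment).hexdigest()
    if fp in fingerprint_table:
        return fingerprint_table[fp]          # duplicate: reference the existing block
    if fp in opportunity_table:
        # Second occurrence: worth tracking, promote to the fingerprint table.
        fingerprint_table[fp] = f"block@{fp[:8]}"
        opportunity_table.discard(fp)
        return fingerprint_table[fp]
    opportunity_table.add(fp)                  # first occurrence: record only
    return f"block@{fp[:8]}"                   # written normally, no dedup yet

data = b"\xab" * FRAGMENT
store(data); store(data)
print(store(data))            # the third write is deduplicated against the stored block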
Multi-level cache technology
Intelligent load balancing is used for access based on domain names (in active-
standby mode). It supports partitioning. Each partition can be configured with a
unique domain name and a customized load balancing policy.
Minimum impact on performance: Data I/Os and verification metadata are written to
disks atomically.
Periodic verification: When the system service load is light, the system automatically
starts periodic background verification.
Both real-time verification and periodic background verification are supported. Real-
time verification: Write requests are verified on the access point of the system (the
VBS process). Host data is re-verified on the OSD process before being written to
disks. Data read by the host is verified on the VBS process. Periodical background
data integrity check: The system automatically enables periodical data integrity check
and self-healing when the workload is light.
Huawei distributed storage provides three verification mechanisms. It uses the CRC32
algorithm to protect user data blocks (4 KB). In addition, it supports host logical
block addressing (LBA) check and disk LBA check, perfectly addressing problems
related to silent data corruption, such as bit flips and misplaced reads and writes.
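A minimal form of this block-level verification can be shown with Python's built-in CRC-32: the checksum and the expected LBA are stored alongside each 4 KB block and re-checked on the read path. This is only a sketch of the principle; the real verification metadata layout is not described in the source.

# Minimal sketch of 4 KB block verification: CRC32 of the data plus an LBA tag,
# re-checked on read to catch silent corruption and misplaced I/O. Illustrative only.
import zlib

def protect(block: bytes, lba: int) -> dict:
    assert len(block) == 4096
    return {"data": block, "crc": zlib.crc32(block), "lba": lba}

def verify(record: dict, expected_lba: int) -> bool:
    crc_ok = zlib.crc32(record["data"]) == record["crc"]   # catches bit flips
    lba_ok = record["lba"] == expected_lba                  # catches reads/writes at the wrong address
    return crc_ok and lba_ok

rec = protect(b"\x00" * 4096, lba=1234)
print(verify(rec, expected_lba=1234))   # True
print(verify(rec, expected_lba=9999))   # False: data returned for the wrong LBA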
Huawei distributed storage provides two self-healing mechanisms: data is first
recovered by the local redundancy mechanism (copies or EC); if that fails, the
redundant data on the HyperMetro peer is used to recover the faulty data.
Sub-health management technology
Sub-health may cause slow system response or service interruption. Huawei
distributed storage supports fast sub-health check and isolation. It uses the fast-fail
function to control the impact on system performance within 5 seconds, ensuring
that latency-sensitive services are not interrupted in case a sub-health fault occurs.
1. Disk sub-health management: includes intelligent detection, diagnosis, isolation,
and warning. Huawei distributed storage can generate alarms and isolate disks
for mechanical disk faults, slow disks, SMART-reported faults, and UNC errors. It can
also cope with problems such as SSD card faults, slow cards, high temperature
faults, capacitor failures, and erase counts exceeding the sub-health
threshold. Through intelligent detection, Huawei distributed storage
collects information about SMART messages, statistical I/O latency, real-time I/O
latency, and I/O errors. Clustering and slow-disk detection algorithms are used to
diagnose abnormal disks or RAID controller cards. Through isolation and
warning, Huawei distributed storage notifies the MetaData Controller (MDC) to
isolate disks and report alarms after diagnosis. The MDC controls the distributed
cluster node status, data distribution rules, and data rebuilding rules.
2. Network sub-health management: If a NIC failure, rate decrease, port fault,
intermittent disconnection, or packet loss occurs or exceeds the corresponding
sub-health threshold, Huawei distributed storage locates the fault
and generates an alarm. It isolates network resources from the fault through
multi-level detection, intelligent diagnosis, and level-by-level isolation and
warning. Multi-level detection: The local network of a node quickly detects
exceptions such as intermittent disconnections, packet errors, and negotiated
rates. In addition, nodes are intelligently selected to send detection packets in an
adaptive manner to identify link latency exceptions and packet loss. Smart
diagnosis: Smart diagnosis is performed on network ports, NICs, and links based
on networking models and error messages. Level-by-level isolation and warning:
Network ports, links, and nodes are isolated based on the diagnosis results and
alarms are reported.
3. Sub-health management for processes/services: includes cross-process/service
detection, intelligent diagnosis, and isolation and warning. Cross-process/service
detection: If the I/O access latency exceeds the specified threshold, an exception
is reported. Smart diagnosis: Huawei distributed storage diagnoses processes or
services with abnormal latency using the majority voting or clustering algorithm
based on the reported abnormal I/O latency of each process or service. Isolation
and warning: Abnormal processes or services are reported to the MDC for
isolation and alarms are reported. If CPU resources are used up or memory faults
occur, the system can locate sub-health faults and isolate related nodes.
4. Fast-Fail: ensures that the I/O latency of a single sub-healthy node is
controllable. Average I/O latency detection is used to check whether the average
I/O latency exceeds the threshold and whether a response is returned for the
I/O. If no response is returned, path-switching retry is triggered. Path switching
retry indicates that read I/Os are served from other copies or recalculated based
on EC, and write I/Os are written to new Plogs whose space is allocated on other disks.
Fast failover
Cluster management (CM): manages clusters.
Service node management (SNM): provides functions such as process monitoring and
fault recovery.
If a node is temporarily faulty or subhealthy, a switchover is performed. The cluster
controller generates a temporary view based on the sub-healthy node, switches
services to other healthy nodes, and writes the replication I/Os to the temporary data
storage node.
The system checks whether the access latency of the sub-healthy node has recovered
every 5 minutes. If the latency has recovered, the temporary data is pushed to the
original sub-healthy node through the background process. After the data
transmission is complete, the temporary view is deleted and the services are switched
back to the original node. If not, the system removes the faulty node from the cluster
and performs global reconstruction.
Hardware faults transparent to the system
1. Memory protection is enabled upon power failures, ensuring data security.
2. Hot-swappable SAS disks in RAID 1 are used as system disks.
3. Power and fan modules are redundant.
4. Swappable mainboards and a cable-free design are adopted, significantly
increasing node reliability while reducing replacement time by 80%.
Cabinet-level reliability
In multi-copy mode, different data copies are distributed in different cabinets. For
example, if the 3-copy storage mode is configured for a storage pool containing eight
cabinets, the system can still provide services when two cabinets become faulty.
In EC mode, data and parity fragments are distributed in different cabinets. For
example, if the 4+2 EC scheme is configured for a storage pool containing eight
cabinets, the system can still provide services when two cabinets become faulty.
based on the service level of the user. For important services, high availability should be
considered when hardware damage occurs and during maintenance.
LLD planning and design: Output the LLD solution and document. Submit the LLD
document to the customer for review. Modify the document based on the customer's
comments.
Value-added feature planning and design: Plan and design value-added features based
on the customer's requirements and purchased value-added features.
For storage resource planning and design, requirement analysis refers to the process
of analyzing and sorting out the requirements or needs involved in a project to form
a complete and clear conclusion.
Requirement analysis involves functional requirements, non-functional requirements,
and standards and constraints.
In addition to core functions, availability, manageability, performance, security, and
cost must also be considered in requirement analysis.
Availability: indicates the probability and duration of normal system running during a
certain period. It is a comprehensive feature that measures the reliability,
maintainability, and maintenance support of the system.
Manageability: Storage manageability includes integrated console, remote
management, traceability, and automation.
1. Integrated console: integrates the management functions of multiple devices and
systems and provides end-to-end integrated management tools to simplify
administrators' operations.
2. Remote management: manages systems through the network on the remote
console. These devices or systems do not need to be managed by personnel at
the deployment site.
3. Traceability: ensures that the management operation history and important
events can be recorded.
4. Automation: The event-driven mode is used to implement automatic fault
diagnosis, periodic and automatic system check, and alarm reporting when the
threshold is exceeded.
Performance: Indicators of a physical system are designed based on the service level
agreement (SLA) for the overall system and different users. Performance design
covers not only performance indicators required by normal services, but also
performance requirements in abnormal cases, such as the burst peak performance,
fault recovery performance, and DR switchover performance.
Security: Security design must provide all-round security protection for the entire
system. The following aspects must be included: physical layer security, network
security, host security, application security, virtualization security, user security,
security management, and security service. Multiple security protection and
management measures are required to form a hierarchical security design.
Cost: The cost is always considered. An excellent design should always focus on the
total cost of ownership (TCO). When calculating the TCO, consider all associated
costs, including the purchase cost, installation cost, energy cost, upgrade cost,
migration cost, service cost, breakdown cost, security cost, risk cost, reclamation cost,
and handling cost. The cost and other design principles need to be coordinated based
on balance principles and best practices.
2.1.2.2.2 Hardware Planning
Device Selection
After receiving a device selection requirement, conduct industrial comparison and
communication to determine the appropriate technical standards. Then, select more
than one supplier to perform device tests. Finally, output the device selection report
to provide technical basis for device procurement and acceptance.
The key principles of device selection are high product quality and cost-effectiveness.
Storage devices are selected based on the capacity, throughput, and IOPS. Business
requirements vary with scenarios. Therefore, costs must be considered during the
evaluation.
In actual applications, device selection may involve other indicators, including but not
limited to those listed in the following table.
Indicator | Description
Energy conservation and environmental protection | The equipment should have low power consumption and be environmentally friendly.
Compatibility Check
Check the compatibility based on the host operating system version, host
multipathing information, host application system information, Huawei storage
version, storage software, and multipathing software information provided by the
customer.
Use Huawei storage interoperability navigator to query the compatibility between
storage systems and application servers, switches, and cluster software, and evaluate
whether the live network environment meets the storage compatibility requirements.
If Huawei storage devices are used, you are advised to use Huawei storage
interoperability navigator to plan compatibility.
Using Huawei all-flash storage as an example, when planning file services, you need
to consider storage pools, file systems, network planning, and NFS/CIFS shares.
In addition, you need to consider the planning of the host side (queue depth and I/O
alignment), network side (switch configuration, zone or VLAN division), and
application side (database parameters, files, and file groups) based on the
application type.
Advanced Feature Planning
Plan and design advanced storage features based on customer requirements and
purchased advanced features.
In addition, some advanced features must be planned and designed before being
used.
For example, before creating a remote replication task, you need to plan the network
and data. Remote replication involves the primary and secondary storage systems.
Therefore, before creating a remote replication task, you need to plan the
networking mode of remote replication and replication links between storage
systems. Data planning includes capacity planning and bandwidth planning. The
requirements of remote replication for the system capacity and network bandwidth
must be considered.
Solution Planning
Understand common solution types and plan and design solutions based on user
requirements.
Hot spare capacity: The storage system provides hot spare space to take over data from
failed member disks.
RAID usage: indicates the capacity used by parity data at different RAID levels.
Disk bandwidth performance: The total bandwidth provided by the back-end disks of a
storage device is the sum of the bandwidth provided by all disks. The minimum value is
recommended during device selection.
RAID level: A number of RAID levels have been developed, but just a few of them are still
in use.
I/O characteristics: Write operations consume most of disk resources. The read/write ratio
describes the ratio of read and write requests. The disk flushing ratio indicates the ratio
of disk flushing operations when the system responds to read/write requests.
2.1.3.1.2 Project Information
Collecting Application Information
During engineering survey, you are advised to collect application information,
including but not limited to data disaster recovery, virtualization platform, and
database platform. The following table is an example of data disaster recovery
information.
The collection items related to the virtualization platform and database platform are
more complex. For details, see Installation and Initialization > Site Planning
Guide > Collecting Live Network Information in the corresponding product
documentation.
Collecting Compatibility Evaluation Information
During the engineering survey, you are advised to collect compatibility evaluation
information, including but not limited to software and hardware information used for
SAN function compatibility evaluation and backup software compatibility evaluation.
Category | Item
(Example items: patch version, number of HBAs)
advised to configure the same number of loops on each RDMA interface module
based on the number of RDMA interface modules on each controller enclosure.
3. The number of disk enclosures connected to the expansion ports on the
controller enclosure and the number of disk enclosures connected to the back-
end ports cannot exceed the upper limit.
4. The expansion modules in the controller enclosure's slot H connect to each disk
enclosure's expansion module A, and those in slot L connect to each disk
enclosure's expansion module B.
5. A pair of SAS ports can cascade up to two SAS disk enclosures. One disk
enclosure is recommended.
6. A pair of RDMA ports can cascade up to two smart disk enclosures. One disk
enclosure is recommended.
7. Storage pools created in the storage system support disk-level redundancy policy
(common RAID mode) and enclosure-level redundancy policy (cross-enclosure
RAID mode).
When creating a storage pool using the enclosure-level redundancy policy,
ensure that the disks in the storage pool come from at least four disk
enclosures and each disk enclosure houses at least three disks of each
capacity type.
When a loop connects to only one SAS disk enclosure or smart disk
enclosure, the disk-level or enclosure-level redundancy policy can be
configured for disks in the disk enclosure.
When two SAS disk enclosures are cascaded in a loop: If the forward
redundancy connection is used, the disk-level or enclosure-level redundancy
policy can be configured for disks in the level-1 SAS disk enclosure, and only
the disk-level redundancy policy can be configured for disks in the level-2
SAS disk enclosure. If the forward and backward redundant connections are
used, the disk-level or enclosure-level redundancy policy can be configured
for the disks in the two SAS disk enclosures in the loop.
When two smart disk enclosures are cascaded in a loop and the forward
redundant connection or the forward and backward redundant connections
are used, the same redundancy policy can be configured. The disk-level or
enclosure-level redundancy policy can be configured for disks in the level-1
smart disk enclosure in the loop, and only the disk-level redundancy policy
can be configured for disks in the level-2 smart disk enclosure.
2.1.3.1.4 Network Planning
Front-End Network Planning
On a dual-link direct-connection network, an application server is connected to two
controllers of the storage system to form two paths for redundancy. The path
between the application server and the LUN's controller is the optimum one and the
other path is standby. In normal cases, UltraPath selects the optimum path for data
transfer. If the optimum path is down, UltraPath selects the standby path for data
transfer. After an optimum path recovers, UltraPath switches data transfer back to
the optimum path again. The dual-link direct-connection network is the simplest and
most cost-effective connection mode of the storage network.
The multi-link single-switch networking mode adds one switch on the basis of dual-
link direct connection, improving data access and forwarding capabilities. A switch
expands host ports to improve the access capability of the storage system. Moreover,
switches extend the transmission distance by connecting remote application servers
to the storage system. Since only one switch is available in this networking mode, it
is vulnerable to single points of failure. There are four paths between the application
server and storage system. The two paths between the application server and LUN's
controller are the optimum paths, and the other two paths are standby.
In multi-link single-switch networking mode, UltraPath selects two optimum paths
for data transfer in normal cases. If one optimum path is faulty, UltraPath selects the
other optimum path for data transmission. If both optimum paths are faulty,
UltraPath uses the two standby paths for data transmission. After an optimum path
recovers, UltraPath switches data transfer back to the optimum path again.
Multi-link dual-switch networking adds one switch on the basis of multi-link single-
switch networking to offer dual-switch forwarding. With two switches, the network is
protected against single points of failure, which improves the network reliability.
There are four paths between the application server and storage system. The
UltraPath software works in the same way as that in the multi-link single-switch
networking mode.
Storage Port Planning
Storage ports are planned based on device models. For details, see the product
documentation of the corresponding device.
Planning Ethernet ports: Ethernet ports are physically visible on a device. They are the
basis for creating VLANs, bound ports, and logical ports. You can bind multiple
Ethernet ports into one port to improve bandwidth and data transfer efficiency.
Planning bound ports: Bind multiple Ethernet ports and specify the bound port name
for higher bandwidth and better redundancy. Port binding provides more bandwidth
and higher redundancy for links. Although ports are bound, each host still transmits
data through a single port. Therefore, the total bandwidth can be increased only
when there are multiple hosts. Determine whether to bind ports based on site
requirements. After ports are bound, their MTU changes to the default value. In
addition, you need to configure the port mode of the switch. Take Huawei switches
as an example. You must set the ports on a Huawei switch to work in static LACP
mode. The link aggregation modes vary with switch manufacturers. If a switch from
another manufacturer is used, contact technical support of the switch manufacturer
for specific link aggregation configurations. In addition, the port binding mode of the
storage system has some restrictions. For example, read-only users are not allowed
to bind Ethernet ports, and management network ports do not support port binding.
Pay attention to these restrictions when planning port binding.
Planning VLANs: VLANs logically divide the Ethernet port resources of the storage
system into multiple broadcast domains. In a VLAN, when service data is being sent
or received, a VLAN ID is configured for the data, so that the networks and services
of VLANs are isolated, further ensuring service data security and reliability. VLANs are
created based on Ethernet ports or bound ports. One physical port can belong to
multiple VLANs. A bound port instead of one of its member ports can be used to
create a VLAN. The VLAN ID ranges from 1 to 4094. You can enter a VLAN ID or
VLAN IDs in batches.
Planning logical ports: Logical ports are created based on physical Ethernet ports,
bound ports, or VLANs and used for service operation. A physical port can be
configured with logical ports on the same network segment or different network
segments. Different physical ports on the same controller can be configured with
logical ports on the same network segment or different network segments. Logical
ports can be created based on Ethernet network ports, bound ports, or VLAN ports
only if these ports are not configured with any IP addresses. If logical ports are
created based on Ethernet network ports, bound ports, or VLAN ports, the Ethernet
network ports, bound ports, or VLAN ports can only be used for one storage service,
such as block storage service or file system storage service. When creating a logical
port, you need to specify an active port. If the active port fails, another standby port
will take over the services.
Planning IP Address Failover
You need to plan policies for IP address failover to meet service requirements. The
planning items of storage ports and IP address failover have been provided in the
training materials. This section introduces IP address failover.
IP address failover: A logical IP address fails over from a faulty port to an available
port. In this way, services are switched from the faulty port to the available port
without interruption. The faulty port takes over services back after it recovers.
During the IP address failover, services are switched from the faulty port to an
available port, ensuring service continuity and improving the reliability of paths for
accessing file systems. Users are not aware of this process. The essence of IP address
failover is a service switchover between ports. The ports can be Ethernet ports, bound
ports, or VLAN ports. IP addresses can float not only between logical ports that are
created based on Ethernet ports but also between logical ports that are created
based on other ports.
When a controller is faulty, the port selection priority for IP address failover is as
follows: ports in the quadrant of the peer controller > ports in the quadrants of other
controllers (in the sequence of quadrants A, B, C, and D). The following figure shows
the IP address failover upon a port fault.
Before planning an IP address failover policy, you need to understand the meaning of
a failover group. Failover groups are categorized into system failover groups, VLAN
failover groups, and customized failover groups.
1. System failover group (Ethernet port and bond port failover group): The storage
system automatically adds all Ethernet ports and bound ports of the cluster to
the system failover group.
2. VLAN failover group: When VLANs are created, the storage system automatically
adds VLANs with the same ID to a failover group, that is, each VLAN ID corresponds to a failover group.
3. Customized failover group: You can add ports whose IP addresses can float to a
customized failover group. The port failover policy in the customized group does
not require a symmetric network. During the failover, select the same type of
ports based on the order in which the ports are added to the customized failover
group. If such ports are unavailable, select available ports of other types based
on the preceding order.
2.1.3.1.5 Service Planning
Permission Planning
To prevent misoperations from compromising the storage system stability and service
data security, the storage system defines user levels and roles to determine user
permission and scope of permission.
1. User level: controls the operation or access permissions of users. However, not all
storage products adopt the concept of user level. For details, see the
corresponding product documentation. If the permissions of an administrator or
a read-only user on a device do not meet actual requirements, for example, if
the access permissions of a user need to be upgraded to management
permissions or the management permissions of a user need to be downgraded to
access permissions, a super administrator can adjust the permission level of
that user.
2. User role: defines the scope of operations that a user can perform or the objects
that a user can access. For details, see the corresponding product
documentation. The storage system provides two types of roles: preset roles and
custom roles.
The following table uses Huawei all-flash storage as an example to describe some
preset roles and their permissions.
Super administrator: all permissions over the system.
In the Huawei storage system, after logging in to DeviceManager, you can choose
Settings > User and Security > Users and Roles > Role Management to view the
permissions and operation scope of the current account.
Allocating and Using Space (Block Service)
Take Huawei all-flash storage as an example. When planning space allocation and
usage, you need to consider storage pools, LUNs, and mapping views.
A storage pool is a container that stores storage space resources. To better utilize the
storage space of a storage system, you need to properly plan the redundancy policy,
RAID policy, and hot spare policy of a storage pool based on actual service
requirements.
When creating a storage pool, you can set an alarm threshold for capacity usage. The
default threshold is 80%. Capacity alarm is particularly important when thin LUNs
are used. Set a proper alarm threshold based on the data growth speed to prevent
service interruption due to insufficient capacity of the storage pool.
The storage system supports two redundancy policies: disk-level redundancy and
enclosure-level redundancy. The enclosure-level redundancy policy can be set only for
new storage pools. You cannot modify the redundancy policy (disk-level or enclosure-
level redundancy) for a storage pool that has been created.
Disk redundancy: Chunks in a chunk group come from different SSDs. With this
redundancy policy used, the system can tolerate disk failures within the RAID
redundancy capacity.
Enclosure redundancy: Chunks in a chunk group come from different SSDs and
are distributed in different enclosures if possible. In addition, the number of
chunks in each enclosure does not exceed the RAID redundancy. This policy
enables the system to tolerate a single disk enclosure failure without service
interruption or data loss.
Disk redundancy delivers equal or higher RAID usage as compared with enclosure
redundancy while enclosure redundancy provides higher reliability.
RAID usage: At the same RAID level, the RAID usage of the disk-level
redundancy policy is greater than or equal to that of the enclosure-level
redundancy policy. For example, the storage system has four disk enclosures
configured, each of which has five disks configured. Two storage pools
containing all disks with the disk-level redundancy policy and the enclosure-level
redundancy policy are created, respectively. The hot spare policy is set to low
(one disk) and the RAID level is set to RAID-TP. For the storage pool with the
disk-level redundancy policy, the number of RAID columns is 19, and the RAID
usage is calculated as [(19 – 3)/19] × 100% = 84.21%. For the storage pool with
the enclosure-level redundancy policy, the number of RAID columns is 7, and the
RAID usage is calculated as [(7 – 3)/7] ×
100% = 57.14%.
Reliability: Enclosure redundancy can tolerate an enclosure failure while disk
redundancy cannot. Enclosure redundancy provides a higher reliability than disk
redundancy.
When planning RAID policies, note that different RAID levels have similar impact on
performance. As the redundancy level increases, the performance slightly
deteriorates.
The performance of different I/O models (random/sequential read/write) is different.
Random read I/O performance is equivalent to sequential read I/O performance.
Sequential write I/O performance is superior to random write I/O performance.
In the storage planning and deployment stage, the most suitable RAID level should
be selected based on service requirements, and the performance differences, space
utilization, and reliability should also be considered. Recommended RAID policy
configuration: RAID-TP is recommended for core services (such as billing systems of
carriers or financial A-class online transaction systems). For non-core services, RAID 6
or RAID 5 is recommended.
In addition, in the enclosure-level redundancy policy, the storage system provides two
protection levels: RAID 6 and RAID-TP.
For hot spare policy planning, note that Huawei all-flash storage system adopts RAID
2.0+ underlying virtualization technology. The hot spare space is distributed on each
member disk in the storage pool. For ease of understanding, the hot spare space is
represented by the number of hot spare disks on DeviceManager. If spare disks are
sufficient, the default number of hot spare disks is 1, which can meet the
requirements of most scenarios. Even if the hot spare space is used up, the storage
system can still use the free space of the storage pool for reconstruction to ensure
the reliability. If the remaining capacity of a storage pool is about to be used up and
the storage system is configured with the data protection feature, replace the faulty
disk in a timely manner. If the total number of hard disks is less than 13, the number
of hot spare disks must meet the following condition: Total number of hard disks –
Number of hot spare disks ≥ 5.
To achieve the optimal performance of the storage system, you need to select a
proper policy for LUNs based on the actual data storage situation. The key
parameters include the application type, number of LUNs, and LUN capacity.
The application type should be planned based on the actual application. Take the
SQL Server OLAP application as an example. You are advised to select
SQL_Server_OLAP. With this policy, the application request size is 32
KB and SmartCompression is enabled. The storage system compresses all written
data. SmartDedupe is disabled. The default deduplication granularity is 8 KB. Each 8
KB page in the SQL Server database contains a header with a unique field. Therefore,
enabling SmartDedupe does not reduce data storage space. You are not advised to
enable SmartDedupe.
Regarding the capacity and number of LUNs, different from traditional RAID groups
with 10 to 20 disks, a storage pool that leverages the RAID 2.0+ mechanism creates
LUNs using all disks in a disk domain, which can have dozens of or even more than
100 disks. To bring disk performance into full play, it is recommended that you
configure LUN capacity and quantity based on the following rules:
Total number of LUNs in a disk domain ≥ Number of disks x 4/32 (4 is the proper
number of concurrent access requests per disk, and 32 is the default maximum
queue depth of a LUN). You are advised to create 8 to 16 LUNs to store SQL Server
data files and 2 to 8 LUNs to store tempdb files. This configuration provides a
maximum capacity of 32 TB for a database. If you need a larger capacity, change the
number of LUNs as required.
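As an illustration of the rule above, a disk domain with 96 disks would need at least 96 x 4/32 = 12 LUNs to keep all of its disks busy.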
In addition, it is recommended that the capacity of a single LUN be less than or equal
to 2 TB and that, within this limit, each LUN be as large as possible to reduce
management overheads.
A mapping view defines logical mappings among LUNs, array ports, and host ports.
You are advised to create a mapping view based on the following rules:
A LUN group is an object designed to facilitate LUN resource management. Typically,
LUNs that serve the same service should be added to the same LUN group.
A host group is a set of hosts that share storage resources. Each host contains
multiple initiators (host ports). You are advised to create a host for each server and
add all initiators of the server to the host.
Port groups help you allocate storage ports in a more fine-grained manner. Port
groups are not mandatory. However, it is recommended that you allocate a port
group for the SQL Server application to facilitate O&M and reduce performance
impacts between applications. To prevent single points of failure, a port group must
contain at least one port on each controller.
Allocating Space (File Service)
Using Huawei all-flash storage as an example, when planning space allocation and
usage, you need to consider the storage pools, file systems, and shares.
The planning for a storage pool is similar to that for block services.
The following table lists the parameters that need to be considered during file
system planning.
NFSv4 ACL, or NFSv3 ACL.
Capacity Alarm Threshold (%): Alarm threshold of the planned file system capacity. An alarm will be generated when the threshold is reached.
Snapshot Directory Visibility: Indicates whether the directory of the file system snapshots is visible.
Compute node:
Management network:
A management network port can be a GE or 10GE network port.
The number of management network ports can be one or two. You are advised to bind two management network ports in bond1 mode.
Link-local IPv6 addresses starting with FE80 and multicast IPv6 addresses starting with FF00 cannot be used. HyperMetro and HyperReplication are not supported when IPv6 addresses are used.
Service network:
The requirements for the IP address type of each network are as follows:
When the TCP/IP protocol is used on each network, IPv4 and IPv6 addresses are
supported.
In the same cluster, the IP address type on the same network must be the same.
The IP address type of the front-end storage network must be the same as that
of the back-end storage network.
Deployment mode
Huawei intelligent distributed block storage service can be deployed in two modes:
centralized deployment of compute nodes and storage nodes and separate
deployment of compute nodes and storage nodes.
The networking schemes, node port planning, and switch port planning of the two
deployment modes are different. For details, see the network planning guide of the
corresponding product.
Slow disk: A slow disk responds to I/Os slowly. Consequently, the read and write performance of the slow disk is obviously lower than that of other disks.
Dirty data: Dirty data refers to new data that has been cached but is not yet persistently stored on disks.
It is important to check all parts of the I/O path to find out where the problem is and
eliminate it.
The common method for preliminarily demarcating a performance problem is to
compare the average latency differences between hosts and storage devices to
determine whether the problem is caused by the host, network and link, or storage
device.
If the read latency on both the host side and the storage side is large and the
difference is small, the problem may be caused by storage devices. The common
possible causes are that the disk performance, disk enclosure performance, and
mirror bandwidth reach the upper limit. If the write latency on both the host side and
the storage side is large, it cannot be determined that the problem occurs on the
storage device because the write latency includes the time for data transmission from
the host to the storage device. In this case, you need to check all possible causes.
If host latency is significantly greater than storage latency, the host may be configured
incorrectly or the network links may be faulty. For example, I/Os are stuck due to
insufficient concurrency capability of block devices or HBAs, the host CPU usage
reaches 100%, a bandwidth bottleneck occurs, the switch configurations are improper,
or the multipathing software selects incorrect paths.
After confirming that the problem is caused by the host, storage device, or network,
analyze and rectify the problem.
The training material does not describe how to view the latency. This slide describes
how to view the latency on common operating systems.
Checking the latency on a Linux host
1. Method 1: Use the iostat tool to query the I/O latency on a Linux host. As shown
in the following figure, await indicates the average time for processing each I/O
request (in ms), that is, the I/O response time.
2. Method 2: Use the Linux performance test tool Vdbench to query the latency. As
shown in the following figure, resp indicates the average time for processing
each I/O request (in ms), that is, the I/O response time.
3. Other methods: You can also use the performance measurement function of the
service software to query the host latency, for example, the AWR report of
Oracle.
Checking the latency on a Windows host
1. Method 1: Use IOmeter, a common Windows performance test tool, to query the
host latency.
2. Method 2: Use the Windows Performance Monitor to monitor disk performance
and obtain the host latency. The process is as follows:
On the Windows desktop, choose Start > Run. In the Run dialog box, enter
perfmon to open the performance monitoring tool page.
In the Performance Monitor window, choose Monitoring Tools > Performance
Monitor from the navigation tree on the left, and then click the Add button to
add performance items.
In the Add Counters dialog box, select PhysicalDisk, add the performance items
to be monitored, click Add, and click OK.
The Windows Performance Monitor starts to monitor the disk performance in
real time.
OLTP: small blocks (2 KB to 8 KB), random access, 20% to 60% writes, and high concurrency; it requires high IOPS and low latency.
Online transaction processing (OLTP) involves mostly small random I/Os and is sensitive
to response latency. Online analytical processing (OLAP) is the most important
application of the data warehouse system. Most OLAP applications involve multi-channel
large sequential I/Os and are sensitive to bandwidth. VDI involves mostly small random
I/Os with high hit ratios and is sensitive to response latency.
In addition, there is the typical I/O model used by SPC-1. SPC-1 is an authoritative
and well-recognized SAN performance testing benchmark developed by the Storage
Performance Council (SPC); its workload consists of small random I/Os and is sensitive
to IOPS and latency.
OLTP
OLTP is a type of database application that allows many users to perform
transactional operations online.
From a database perspective, the service characteristics of OLTP are as follows:
1. Reading, writing, and changing of each transaction involve a small amount of
data.
2. Database data must be up-to-date. Therefore, high database availability is
required.
3. Many users are accessing the database simultaneously.
4. The database must be highly responsive and able to complete a transaction
within seconds.
From a storage perspective, the service characteristics of OLTP are as follows:
The performance monitoring indicators and recommended thresholds are as follows:
Average CPU usage (%): CPU usage in a measurement period (5s by default). It indicates the service load of the CPU. An alarm is triggered when the CPU usage reaches 90% and cleared when it drops to 85%.
Disk Usage (%): disk usage in a measurement period (5s by default). It indicates the service load of a disk. An alarm is triggered when the disk usage reaches 90% and cleared when it drops to 70%.
Response time (μs): average time for reading data from or writing data to a LUN in a measurement period. An alarm is triggered when the latency reaches three times the specified value. For example, if a user requires the OLTP service latency to be 15 ms, an alarm is triggered when the latency reaches 45 ms.
The performance bottleneck of random services usually lies on disks. The following
table lists the empirical values of random IOPS of disks (for reference only).
SATA disk: 30 to 60
SAS disk: 100 to 200
FC disk: 100 to 200
SSD: 5000 to 10000
OLAP
The technology behind OLAP services uses multidimensional structures to provide
rapid access to data for analysis. OLAP is a type of application that allows users to
execute complex statistical queries against a database over an extended period.
From a database perspective, the service characteristics of OLAP are as follows:
1. No data or only a small amount of data is modified.
2. The data query process is complex.
3. The data is used at a gradually declining frequency.
4. The query output is usually in the form of a statistical value.
From a storage perspective, the service characteristics of OLAP are as follows:
1. Every I/O is large in size, typically ranging from 64 KB to 1 MB.
2. Data is read in sequence.
3. When read operations are being performed, write operations are performed in a
temporary tablespace.
4. Online logs are seldom written. The number of write operations increases only
when data is loaded in batches.
The following table describes the performance monitoring items.
Controller - Average CPU usage (%): CPU usage in a measurement period (5s by default). It indicates the service load of the CPU. An alarm is triggered when the CPU usage reaches 90% and cleared when it drops to 85%.
Disk Usage (%): disk usage in a measurement period (5s by default). It indicates the service load of a disk. An alarm is triggered when the disk usage reaches 90% and cleared when it drops to 70%.
Select proper indexes. For example, create bitmap indexes for columns used
for classification. Create a function-based index for a function that is often
used in a WHERE clause. Create reverse key indexes if a large number of
concurrent transactions are concentrated on a small amount of hotspot
data. Create local indexes for partitioned tables in the OLAP system to shorten
the time for inserting data in batches.
The fields for which indexes should be created include the fields that are
often used in the query expression, fields that are often used for table
joining, fields with a high degree of differentiation, and fields of foreign
keys.
The fields for which indexes should not be created include the fields used
only in functions and expressions, fields that are frequently updated, and
fields with a low degree of differentiation. In addition, do not create an index
merely to optimize a single SQL statement.
Unnecessary indexes must be deleted after being identified. Deleting useless
indexes can release more space and reduce the database load, improving
the performance of DML operations.
2. About partitions
Partitions are used to improve the performance of large tables. When the
number of records in a table exceeds 100 million, the table needs to be
partitioned.
During the query, the Oracle database automatically ignores irrelevant
partitions to reduce the query time.
Hash partitioning can be used to balance the load of tablespaces.
This feature facilitates file management, backup, and restoration.
3. About compression
When CPU resources are sufficient but I/O resources are insufficient,
compression can be used.
Compression helps the storage system reduce I/O operations and save
storage space.
Compression is especially suitable for a large amount of regular data.
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rrqm/s: number of read requests merged per second that were queued to the
device.
wrqm/s: number of write requests merged per second that were queued to the
device.
r/s: number of read requests per second.
w/s: number of write requests per second.
rkB/s: number of kilobytes read from the device per second.
wkB/s: number of kilobytes written to the device per second.
avgrq-sz: average size (in sectors) of the requests issued to the device.
avgqu-sz: average queue length of the requests issued to the device.
await: average time (in ms) for each I/O request to be served, including the time
spent waiting in the queue.
svctm: average service time (in ms) of each I/O request on the storage device.
%util: percentage of time in which the I/O queue is not empty during the
measurement period.
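These metrics can be collected with, for example, the following invocation (assuming the sysstat package that provides iostat is installed):
iostat -x -k 2 10   # extended statistics in KB, sampled every 2 seconds, 10 times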
Sequential services: The value of %util should be close to 100%. The values of rkB/s and
wkB/s should reach the theoretical bandwidth of the channel. The value of
avgrq-sz should match the multiblock I/O size configured for the Oracle database.
Random services: The values of r/s and w/s should be close to the theoretically
calculated IOPS values. The value of await should be less than 30 ms.
This command can also be used to analyze the HBA status of the host. In the output
of the iostat command, avgqu-sz indicates the average queue depth of the block
devices of a LUN. If the value is greater than 10 for a long time, the problem is
probably caused by concurrency limits. As a result, I/O requests are piled up at the
block device layer of the host instead of being delivered to the storage side. In this
case, you can modify the HBA concurrency.
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.05 2.14 0.09 1.46 1.27 14.41 20.21 0.00 0.69 0.38 0.06
sdf 0.00 0.00 0.01 0.99 0.36 11.71 24.07 0.00 0.30 0.27 0.03
sdg 0.00 0.00 0.00 0.00 0.00 0.04 62.78 0.00 1.38 0.27 0.00
sdh 0.00 0.00 0.00 0.00 0.02 0.00 8.01 0.00 0.11 0.10 0.00
top
The top command is a common performance analysis tool in the Linux operating
system. It can display the resource usage of each process in the system in real time to
help analyze whether the memory of the host is insufficient or whether the CPU
usage is high.
The load average values in the first line show the average numbers of processes
in the running queue in the past 1, 5, and 15 minutes, respectively.
The second line shows the number of processes in each state.
The third line shows the CPU usage.
us indicates the CPU usage in user mode.
sy indicates the CPU usage in kernel mode.
id indicates the CPU idle rate. If the value is less than 10%, check whether
the CPU has become a bottleneck.
si indicates the CPU usage for software interrupts, which is usually related to
network interface cards and should not be high.
The fourth line displays the usage of the physical memory. If the remaining
memory space is small, it will become a performance bottleneck.
The fifth line shows the usage of the swap partition. If the remaining space is
small, it will become a performance bottleneck.
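As a quick example (one possible way to use the tool), a single non-interactive snapshot can be captured for later analysis:
top -b -n 1 | head -20   # batch mode, one iteration; the first lines contain the load average, task states, CPU usage, and memory/swap usage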
sar
Run the following command to query the CPU performance:
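The exact command is not shown in the material; a typical invocation, assuming the sysstat package that provides sar is installed, is:
sar -u 2 5   # report overall CPU usage every 2 seconds, 5 times; watch %iowait and %idle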
cat
On a Linux host, you can run the cat /proc/cpuinfo command to view the CPU
frequency of the current host and check whether the CPU underclocking occurs.
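For example, to show only the current frequency of each core:
cat /proc/cpuinfo | grep -i "cpu mhz"   # one "cpu MHz" line per core; compare the values with the nominal frequency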
Server Manager
An outdated HBA driver will result in large I/O splitting, inadequate concurrency, and
long I/O latency. These problems often occur in low load, full hit, and bandwidth-
intensive scenarios.
Windows host: Open Server Manager in Windows, and select the corresponding HBA
in the device list. The driver version is displayed on the Driver tab page of the device
properties panel.
For the dd command, set oflag to direct when testing the write performance and set
iflag to direct when testing the read performance.
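As a minimal sketch, assuming /dev/sdx is a dedicated test LUN whose data may be overwritten, the write and read tests could look like this:
dd if=/dev/zero of=/dev/sdx bs=1M count=1024 oflag=direct   # sequential write test that bypasses the host page cache
dd if=/dev/sdx of=/dev/null bs=1M count=1024 iflag=direct   # sequential read test that bypasses the host page cache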
When testing a system's peak performance, you can raise the queue depth to
increase host write I/O pressure and the probability that I/Os are merged in the
queue. You can run the following command to temporarily change the queue depth
of a block device:
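The command itself is not shown in the material; on a typical Linux host the queue depth is the nr_requests attribute in sysfs (sdb is a placeholder device name):
cat /sys/block/sdb/queue/nr_requests    # current queue depth of the block device
echo 512 > /sys/block/sdb/queue/nr_requests   # temporarily raise the queue depth; reverts to the default after a restart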
You can temporarily change the queue depth of a block device for performance
tuning. After the application server is restarted, the queue depth of the block device
is restored to the default value.
Scheduling algorithm
Linux kernel 2.6 supports four block device scheduling algorithms: Completely Fair
Queuing (CFQ), NOOP, deadline, and anticipatory.
CFQ I/O scheduling
In the latest kernel versions and distributions, CFQ is selected as the default
I/O scheduler and is the best choice for general-purpose servers.
CFQ tries to evenly distribute the I/O bandwidth to access requests to avoid
process starvation and achieve low latency. It is a compromise between the
deadline and anticipatory (as) schedulers.
CFQ is the best choice for multimedia applications (such as video and audio)
and desktop systems.
CFQ assigns a priority to each I/O request. The I/O priority is independent of
the process priority. Read and write operations of a process with a high
priority cannot automatically inherit the high I/O priority.
The scheduling algorithm will be restored to the default value after an application
server restarts.
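For reference, on a typical Linux host the scheduler of a block device can be queried and changed through sysfs (sdb is a placeholder device name):
cat /sys/block/sdb/queue/scheduler        # the scheduler shown in brackets is currently active
echo deadline > /sys/block/sdb/queue/scheduler   # switch to the deadline scheduler; the change is not persistent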
Prefetch function
The prefetch function of Linux is similar to the prefetch algorithm of a storage
device. The function applies only to sequential reads. It identifies sequential streams and
prefetches up to read_ahead_kb kilobytes of data. For example, the default
prefetch size of the SUSE 11 operating system is 512 sectors, that is, 256 KB
(read_ahead_kb = 256). You can run the cat command to query the prefetch size of the
current block device.
You can increase the prefetch size to improve performance when many large files
need to be read. You can run the following command to change the prefetch size of
a block device:
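As a sketch (sdb is a placeholder device name), the prefetch size can be queried and temporarily changed through sysfs:
cat /sys/block/sdb/queue/read_ahead_kb    # current prefetch size in KB
echo 1024 > /sys/block/sdb/queue/read_ahead_kb   # raise the prefetch size to 1 MB; reverts to the default after a restart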
I/O alignment
If master boot record (MBR) partitions are created in Linux or Windows earlier than
Windows Server 2003, the first 63 sectors of a disk are reserved for the MBR and
partition table. The first partition starts from the sixty-fourth sector by default. As a
result, misalignment occurs between data blocks (database or file system) of hosts
and data blocks stored in disks, causing poor I/O processing efficiency. This type of
performance tuning may not be involved in new versions of operating systems. It is
not mentioned in the training materials and is only for knowledge extension.
When creating MBR partitions in Windows earlier than Windows Server 2003, you are
advised to run the diskpart command to set partition alignment.
In a Linux operating system, you can use either of the following methods to resolve
I/O misalignment:
Method 1: Change the start location of partitions. When creating MBR partitions in
Linux, you are advised to run the fdisk command in expert mode and set the start
location of the first partition as the start location of the second extent on a LUN.
(The default extent size is 4 MB.) The following quick command is used to create an
MBR partition in /dev/sdb. The partition uses all space of /dev/sdb. The start sector
is set to 8192, that is, 4 MB.
Method 2: Use GPT partitions. The following quick command is used to create a GPT
partition in /dev/sdb. The partition uses all space of /dev/sdb. The start sector is set
to 8192, that is, 4 MB.
parted -s -- /dev/sdb "mklabel gpt" "unit s" "mkpart primary 8192 -1" "print"
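To confirm that the partition is aligned, you can print the partition table in sector units and check that the start sector is 8192, for example:
parted /dev/sdb unit s print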
2.2.3.2.3 Decrease in CPU Clock Speed
During off-peak hours, hosts in some running modes may automatically decrease their
CPU clock speed to reduce power consumption. However, the decrease during off-peak
hours will affect the interaction between the host and storage systems, resulting in a
higher I/O latency on the host. Therefore, if customers are sensitive to latency during off-
peak hours, you can configure the host running mode to ensure that the CPU clock speed
does not decrease, delivering optimal performance.
As mentioned earlier, you can use the DirectX diagnostic tool to view the CPU clock speed
on a Windows host. On a Linux host, you can run the cat /proc/cpuinfo command to
view the CPU clock speed of the current host.
To ensure that the CPU clock speed does not decrease, you can change the host running
mode to high-performance mode on a Windows host. Navigation path: Start > Control
Panel > System and Security > Power Options > Choose or Customize a Power Plan >
High Performance. After selecting Category in View by, you can view the System and
Security item.
On a Linux host, you can perform the following operations to configure the host running
mode to ensure that the host CPU clock speed does not decrease:
Run the cd command to go to the /sys/devices/system/cpu/ directory.
Run the ll command to check the number of CPUx processes in the
/sys/devices/system/cpu directory. In CPUx, x indicates an integer.
Run the echo performance command for each CPUx in the
/sys/devices/system/cpu/ directory.
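As a sketch (assuming the cpufreq interface is present on the host), the three steps can be combined as follows:
cd /sys/devices/system/cpu/
ls -d cpu[0-9]*                                                   # list the CPUx entries
for g in cpu[0-9]*/cpufreq/scaling_governor; do echo performance > "$g"; done   # keep every core at full clock speed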
2.2.3.2.4 HBA
Concurrency limit on an HBA
Concurrency limit on an HBA indicates the maximum number of I/Os that can be
concurrently transmitted to each LUN. In high-concurrency service scenarios, the
performance is low due to insufficient concurrency of HBAs and block devices.
On the Windows operating system, the default number of concurrent I/Os allowed by an
HBA is 128 in most cases but may be smaller in some versions. For example, in
Windows Server 2012 R2, the default number of concurrent I/Os for Emulex HBAs is 32.
If the number of concurrent I/Os is small, the host pressure cannot be completely
transferred to the storage device. If the latency on the host side differs significantly
from that on the storage device side, you can query the number of concurrent I/Os
allowed by the current HBA by using the management software provided by the HBA
vendor, and change the number as required.
On the Linux operating system, HBA queue parameters vary depending on the HBA
type and driver. For details, see the specifications provided by HBA vendors. For
example, a QLogic dual-port 8 Gbit/s Fibre Channel HBA allows the maximum queue
depth of each LUN to be 32. If the difference between the latency on the host and
storage sides is large, run the iostat command to check whether the concurrency
bottleneck is reached.
In the command output, avgqu-sz indicates the average queue depth of the block
devices where LUNs are created. If the value is greater than 10 for a long time, the
problem is probably caused by concurrency limits. As a result, I/O requests are piled
up at the block device layer of the host instead of being delivered to the storage side.
In this case, you need to change concurrency limits of HBAs.
Driver issue
An outdated HBA driver will result in large I/O splitting, inadequate concurrency, and
long I/O latency. These problems often occur in low load, full hit, and bandwidth-
intensive scenarios. If the HBA driver version is outdated, you are advised to update
the HBA driver version to the latest version.
For a Windows host, open the Server Manager window in Windows, and select the
corresponding HBA from the device list. The driver version is displayed on the Driver
tab page of the device properties panel.
For a Linux host, you can run the lsscsi command to check the channel ID
corresponding to the current HBA. As shown in the following command output, the
channel IDs of the HBAs are 5 and 6.
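Each line of the lsscsi output begins with an address of the form [host:channel:target:lun]; the leading host number identifies the HBA through which the device is accessed. For example:
lsscsi   # list SCSI devices with their [host:channel:target:lun] addresses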
Bandwidth of 8 Gbit/s Fibre Channel links: 750 / 1400
Bandwidth of 16 Gbit/s Fibre Channel links: 1500 / 2900
Ports on host HBAs, Fibre Channel switches, and Fibre Channel optical modules on
storage devices can work at different rates. The actual transmission rate is the minimum
rate among these rates. If the working rate of a link is improper or a link is faulty, the
system performance may deteriorate. In addition, load balancing and redundancy need to
be configured for links to ensure better performance and higher reliability.
On a Windows host, open the UltraPath Console and check whether the link status is
normal and whether the owning controller is working properly.
On a Linux host, run the upadmin show path command to check whether physical
paths are normal.
In some scenarios, links may be lost, that is, some initiator ports are absent. On
UltraPath, check whether the number of links is the same as the number of physical
connections. If link loss occurs in the IOPS-intensive scenario, I/O forwarding
performance deteriorates. If link loss occurs in the bandwidth-intensive scenario, a
physical connection is reduced, and the bandwidth decreases.
Configuring the multipathing software
If the front-end intelligent balancing mode, load balancing algorithm, or other
parameters of UltraPath are set inappropriately, I/O loads on links will be
unbalanced or the sequence of I/Os will be affected.
These issues prevent optimal performance delivery and adversely affect the bandwidth
capability.
On a Windows host, use the UltraPath Console to query the multipathing parameter
settings. Navigation path: System > Global Settings
On a Linux host, run the upadmin show upconfig command to query the
multipathing parameter settings.
Intelligent front-end balancing
Huawei UltraPath 21.6.0 or later works closely with storage systems. UltraPath uses
the intelligent distribution algorithm to calculate the shard value of each I/O and
searches for the corresponding vNode in the storage system based on the shard
value. Based on the search result, UltraPath distributes the I/O to the front-end link
corresponding to the vNode. This prevents data forwarding between vNodes.
Load balancing algorithm
If the load balancing algorithm of UltraPath is inappropriate, I/O loads on each link
may be unbalanced. Consequently, the optimal performance cannot be achieved.
UltraPath 21.6.0 supports three types of load balancing policies: min-queue-depth,
round-robin, and min-task.
min-queue-depth: The policy obtains the number of I/Os at each path in real time
and delivers new I/Os to the path with the fewest I/Os. When an application server
delivers I/Os to a storage system, the queue with the minimum number of I/Os takes
precedence over other queues in sending I/Os. Min-queue-depth is the default path
selection algorithm and provides the optimal performance in most cases.
round-robin: When an application server delivers I/Os to a storage system, UltraPath
sends the first set of I/Os through path 1 and second set of I/Os through path 2, and
so on. In this way, each path can be fully utilized. This algorithm is used together
with the policy of load balancing among controllers. Generally, the round-robin
algorithm has adverse impact on bandwidth performance. I/Os are delivered to each
logical path in sequence without considering the link load. As a result, a link may
become congested with too much pressure, which affects the sequence of I/Os.
min-task: When sending I/Os to the storage system, the application server calculates
the total data volume based on the block size of each I/O request and delivers the
I/Os to the path with the minimum data volume. The algorithm is seldom used and
has the same impact on performance as the min-queue-depth policy.
Number of consecutive I/Os for load balancing
The number of consecutive I/Os for load balancing indicates the number of
consecutive I/Os delivered by hosts on a path at a time. For example, the
Loadbalance io threshold parameter in the UltraPath software of the Linux
operating system indicates the number of consecutive I/Os for load balancing. The
default value is 100, indicating that the host delivers 100 consecutive I/Os on each
selected path. A Linux block device aggregates consecutive I/Os. If multiple I/Os
delivered on a path use consecutive addresses, the block device aggregates them into
a large I/O and sends the large I/O to the storage system, thereby improving
efficiency.
In some scenarios, improper setting of this parameter will affect the performance.
On the Windows operating system, the default value of this parameter is 1. It is
advised to retain the default value because Windows does not aggregate consecutive
I/Os in the way the Linux block device layer does, so a larger value brings no benefit.
On the Linux operating system, it is advised to set this parameter to a smaller value
if large I/Os are involved. The default maximum size of I/Os that can be processed by
a Linux block device is 512 KB. That is, a normal or aggregated I/O will never be
larger than 512 KB. For example, if a 512 KB I/O is delivered, it cannot be
aggregated. In this case, the recommended value of this parameter is 1.
In scenarios where all I/Os are random, I/Os are barely aggregated. To reduce the
overhead, it is advised to set this parameter to 1.
Before locating performance problems on the storage side, you need to know the
service types and I/O characteristics on the storage side.
Service types include the Oracle database online transaction processing (OLTP)
service, online analytical processing (OLAP) service, virtual desktop infrastructure
(VDI) service, and Exchange mail service. By understanding and analyzing service
types, you can determine whether to pay attention to IOPS or bandwidth for the
current service and whether a low latency is required.
I/O characteristics include the I/O size, read/write ratio, cache hit ratio, and hotspot
distribution. You can use DeviceManager to monitor the system performance. By
understanding and analyzing I/O characteristics, you can determine whether the
current services are random I/Os or sequential I/Os, large I/Os or small I/Os, and
whether read services or write services are dominant.
Storage resource plan
Before locating performance problems on the storage side, you need to learn about
the current storage resource plan to locate the fault domain. You can use
DeviceManager or the information collection function of Toolkit to collect storage
resource planning information. A storage resource plan covers the following aspects:
1. Product model, specifications, and software version. To ensure load balancing
among multiple controllers, the number of interface modules and the number of
front-end and back-end ports on each controller must be basically the same.
Otherwise, a controller may bear heavy load while other controllers may bear
light load.
2. Number of front-end and back-end interface modules and number of connected
ports. For bandwidth-intensive services, the number of interface modules that
provide bandwidth for back-end ports must match the number of interface
modules that provide bandwidth for front-end ports, ensuring that the
bandwidth of the storage system can be fully utilized.
3. LUN properties, including whether deduplication and compression are enabled
for a LUN.
4. Value-added feature configurations, such as whether value-added functions such
as snapshot and remote replication are configured. Due to the implementation
mechanism of value-added features, extra performance overheads are
generated. A large number of metadata operations (such as LUN initialization
and space allocation) are required. In addition, many non-host I/Os (such as
remote replication records and deletion of differences) may be generated.
Therefore, you need to know the value-added functions configured in the
storage system.
A large number of metadata operations are required and many non-host I/Os may be generated,
which inevitably affects host services of the storage system.
When locating performance problems, check whether there are internal transactions or
value-added features first. Common value-added features include migration,
reconstruction, and pre-copy.
Data copies are generated for pre-copy and reconstruction tasks. When there are a large
number of data copies, the current service performance is affected. Therefore, during the
performance test, you need to check whether the current storage system runs pre-copy
or reconstruction tasks by running the following CLI command. When a pre-copy or
reconstruction task exists in a disk domain, you can view the progress of each task. If no
pre-copy, reconstruction, or balancing task exists in the disk domain, the command can
be successfully executed.
Migration
SmartMigration supports LUN migration within a storage device or between storage
devices. Data migration involves a large number of read and write copies, impacting
the system performance. If the migration speed is set to a high value, the host I/O
performance will be significantly affected. If the migration speed is set to a low
value, the host I/O performance will be slightly impacted.
Reconstruction
If a disk is faulty, the reconstruction function will recover data to newly assigned hot
spare space using RAID redundancy. This process will generate a large number of
RAID calculating overheads and data copies.
Pre-copy
Disk data pre-copy enables a storage system to routinely check its hardware status
and migrate data from any faulty disk to minimize the risks of data loss. Data copies
are generated in the pre-copy process.
Others
Other value-added features such as snapshot and remote replication increase
overheads and complicate I/O processing procedures, thereby undermining the
system performance. For details about the impact of value-added features on host
performance, see the corresponding feature guide.
average CPU usage. For details about how to create a metric graph, see "Creating a
Metric Graph" in the corresponding product documentation.
You can also run the show performance controller command to display the CPU usage
of a controller.
If the overall CPU usage is high for a long time, the controller performance has reached
the upper limit. In this case, you are advised to migrate some services to other storage
systems to reduce service load.
You can set the CPU usage threshold of a storage system. The default threshold is 90%.
When the CPU usage of a controller in a storage system exceeds the threshold, the
storage system is triggered to collect information. The collected information is stored in
the /OSM/coffer_data/omm/perf/exception_info/ directory. The total size of files in this
directory cannot exceed 14 MB. If the total size exceeds 14 MB, the earliest files will be
overwritten. The collected information is used for subsequent performance tuning or
problem locating and analysis.
Before testing the maximum performance of a storage system, ensure that the host
concurrency pressure is high enough. If the number of concurrent tasks on the host is
large enough but the performance value (IOPS/bandwidth) is not high, the host
pressure may not have been transferred to the storage front-end or the storage
performance may have reached the bottleneck.
In addition to comparing the latency, you can also use either of the following
methods to check the concurrency pressure of front-end ports for supplementary
analysis.
1. Method 1: Perform calculation by using a formula. This method applies to
scenarios where pressure is constant, for example, using test tools such as
IOmeter and Vdbench to test the performance under constant pressure. The
number of concurrent tasks is generally fixed during the test, and can be derived
if the IOPS and latency are provided, to understand the front-end concurrency
pressure. The formula is as follows: IOPS = Number of concurrent tasks x
1000/Latency (unit: millisecond). For example, if the IOPS is 3546 and the
latency is 6.77 ms, the number of concurrent tasks is 24 (which is calculated by
3546 x 6.77/1000).
2. Method 2: Run related commands on the CLI to obtain the approximate front-
end concurrency. This method applies to scenarios where pressure changes. You
can run the show controller io io_type=frontEnd controller_id=XX command to
query the number of concurrent front-end I/Os of a specified controller. Run this
command multiple times to obtain a stable value that can approximate the
number of concurrent front-end tasks delivered to the controller. XX indicates
the controller ID.
If the latency is low, the number of concurrent tasks obtained by running the show
controller io command may be inaccurate. In this case, method 1 can be used for
analysis.
If the front-end concurrency pressure is not high enough, increase the host
concurrency pressure. If the front-end concurrency pressure remains the same, locate
the fault on the host.
Checking whether front-end ports have bit errors
If the performance fluctuates frequently or declines unexpectedly, faults may have
occurred on front-end ports or links. In this case, check whether the front-end ports
experience bit errors.
You can check whether bit errors occur on front-end ports on DeviceManager, by
running the show port bit_error command on the CLI, or in the inspection report.
The bit error information of front-end ports varies with system versions. Bit errors on
the actual interface prevail.
If the number of bit errors on a front-end port increases continuously, performance
faults have occurred on the front-end port or link. In this case, you are advised to
replace the optical fiber or optical module.
Front-end port performance indicators
Key performance indicators of front-end ports include the average read I/O response
time, average write I/O response time, average I/O size, IOPS, and bandwidth.
You can use DeviceManager or the CLI command to query these indicators.
Whether cache write policy can be set depends on the product model. The cache
write policy cannot be set for some product models. In Huawei hybrid flash storage,
when running the create lun command on the CLI to create a LUN, you can set the
cache write policy.
When a BBU fails, only one controller is working, the temperature is too high, or the
number of LUN fault pages is higher than the threshold, the write mode of LUNs
may shift from write-back to write-through (the LUN health status may switch to
write protection). In this situation, data cannot be written to disks but can be read
because write protection prevents data in disks from being modified.
You can query the properties of a LUN on DeviceManager or by running the show
lun general command on the CLI to obtain the current LUN health status and cache
write policy.
In the command output, Health Status indicates the health status of a LUN, Write
Policy indicates the cache write policy, and Running Write Policy indicates the
current cache write policy.
If the LUN health status is write protection, check for the cause, for example,
whether a BBU fails, only one controller is working, the temperature is too high, or
the number of LUN fault pages is higher than the threshold.
High and low watermarks (affecting write performance)
The high or low watermark of a cache indicates the maximum or minimum amount
of dirty data that can be stored in the cache. An inappropriate high or low
watermark of a cache can cause the write performance to deteriorate.
When the amount of dirty data in the cache reaches the upper limit, the dirty data is
synchronized to disks at a high speed. When the amount of dirty data in the cache is
between the upper and lower limits, the dirty data is synchronized to disks at a
medium speed. When the amount of dirty data in the cache is below the lower limit,
the dirty data is synchronized to disks at a low speed.
Do not set an excessively large value for the high watermark. If the high watermark
is too high, little cache space is left for incoming writes. When the front-end I/O traffic
surges, the I/Os become unstable and the latency is prolonged, adversely affecting
the write performance.
Do not set an excessively small value for the low watermark. If the low watermark
value is excessively small, cached data is frequently written to disks, decreasing the
write performance.
If the difference between the high and low watermarks is too small, back-end
bandwidth cannot be fully utilized.
The recommended high and low watermarks for a cache are 80% and 20%,
respectively.
If the current high and low watermarks of the cache do not meet the requirements,
adjust them based on the preceding rules and then check whether the performance
is improved. You can run the quota show pttinfo command to display the details
about the high and low watermarks of the cache or use SystemReporter to view the
performance indicators of the cache.
Changing the high or low watermark of a cache affects the frequency and size of
writing data from the cache to disks. Do not change the high or low watermark
unless required.
Whether the high and low watermarks can be changed depends on the product
model. For Huawei hybrid flash storage systems, you can run the change system
cache command on the CLI to change the high and low watermarks.
Prefetch policy (affecting read performance)
Regarding data reading, the cache prefetches data to increase the cache hit ratio and
reduce the number of read I/Os delivered to disks, thereby minimizing the latency
and providing higher performance.
When reading data from a LUN, a storage system prefetches data from disks to the
cache based on the specified policy. A storage system supports four prefetch policies:
non-prefetch, intelligent prefetch, constant prefetch, and variable prefetch.
1. Non-prefetch: Data is not prefetched. Instead, data requested by a host is read
only from disks. This policy is suitable for scenarios where all I/Os are random.
2. Intelligent prefetch: Data prefetch is dynamically determined based on I/O
characteristics. Only in the case of sequential I/Os, data is prefetched. Intelligent
prefetch is the default policy and recommended in most scenarios.
3. Constant prefetch: The size of data prefetched each time is a predefined fixed
value. This policy is suitable for scenarios where multiple channels of sequential
fixed-size I/Os are delivered in a LUN. This policy can be used in media &
entertainment (M&E) and video surveillance scenarios.
4. Variable prefetch: The size of data prefetched each time is a multiple of the I/O
size. (The multiple is user-defined, ranging from 0 to 1024.) This policy is
suitable for scenarios where multiple channels of sequential I/Os are delivered
and the I/Os vary in size. This policy can be used in M&E and video surveillance
scenarios.
If a prefetch policy is set inappropriately, excessive or insufficient prefetch may occur.
Excessive prefetch indicates that the amount of prefetched data is much larger than
the amount of data that is actually read. For example, if the constant prefetch policy
is set in a scenario where all I/Os are random, excessive prefetch definitely occurs.
Excessive prefetch will cause poor read performance, fluctuations in the read
bandwidth, or the failure to reach the maximum read bandwidth of disks. To
determine whether excessive prefetch occurs, you can use SystemReporter to
compare the read bandwidth of a storage pool with that of the disk domain where
the storage pool resides. Excessive prefetch occurs if the read bandwidth of a storage
pool is much lower than that of the disk domain where the storage pool resides. If
excessive prefetch occurs, change the prefetch policy to non-prefetch.
Insufficient prefetch indicates that the amount of prefetched data is insufficient for
sequential I/Os. As a result, all I/Os must be delivered to disks to read data, and no
data is hit in the cache. For example, if the non-prefetch policy is set in the event of
small sequential I/Os (database logs), insufficient prefetch definitely occurs.
Insufficient prefetch leads to a relatively long I/O latency. To determine whether
insufficient prefetch occurs, you can use SystemReporter to query the read cache hit
ratio of a LUN. If the read cache hit ratio of the LUN mapped to the database log is
low, you can set the prefetch policy to intelligent prefetch or constant prefetch.
Note that the prefetch policy of some product models cannot be changed.
Random small I/Os: indicate I/Os smaller than 16 KB. Random I/Os cannot be
aggregated in the cache and their original sizes are typically retained when they are
written into disks. Therefore, if the stripe depth (128 KB by default) is several times
the I/O size, there is only a slight chance that a small I/O will be split across stripes. In
random small I/O scenarios, the stripe depth has little impact on I/O performance.
You are advised to retain the default stripe depth 128 KB.
Sequential small I/Os: Multiple sequential small I/Os in the cache are aggregated into
a large I/O, ideally equal in size to the stripe depth, and then written into a disk. A
large stripe depth helps reduce the number of I/Os written into disks, improving the
data write efficiency. A stripe depth of 128 KB or larger is recommended.
Sequential and random large I/Os: indicate I/Os that are greater than 256 KB. If the
selected stripe depth is smaller than the I/O size, I/Os are split on the RAID layer,
affecting the data write efficiency. The maximum stripe depth, 512 KB, is
recommended.
Others
Huawei all-flash storage uses the ROW mechanism based on the characteristics of
SSDs. Compared with overwrite, full-stripe write avoids the disk read overhead
caused by write penalty and prevents parity data from being overwritten frequently.
After data in the logical space is overwritten, the corresponding data on the original
disk becomes garbage data. The garbage data is reclaimed and released in the
background. Garbage collection generates extra overheads. Therefore, after garbage
collection is started, service performance is affected to some extent.
reads. The address space for random reads is discrete. Therefore, the performance of
a thick LUN is equivalent to that of a thin LUN in random reads.
Local access
LUN access by local controllers ensures storage system performance.
Local access to a LUN means that I/Os destined for a LUN are directly delivered to
the owning controller of that LUN. As shown in the figure in the training material, a
host is physically connected to controller A, the owning controller of LUN 1 is
controller A, and that of LUN 2 is controller B.
When the host accesses LUN 1, the access requests are delivered through controller
A. Such a LUN access mode is called local access.
When the host attempts to access LUN 2, the access requests are first delivered to
controller A. Then, controller A forwards them to controller B through the mirror
channel between controllers A and B. Finally, controller B delivers the access requests
to LUN 2. Such a LUN access mode is called peer access.
The peer access scenario involves the mirror channel between controllers. The
channel limitations affect LUN read/write performance. To prevent peer access,
ensure that there are physical connections between the host and controllers A and B.
If a host is physically connected to only one controller, configure the controller as the
owning controller of the LUN.
port usage, total IOPS, and block bandwidth. Performance indicators of back-end
ports vary with versions. Performance indicators on the actual interface prevail.
Disk performance analysis
Common storage media include SSDs, SAS disks, and NL-SAS disks. Each type of
storage media offers unique advantages and disadvantages in terms of performance
and cost. You must therefore select the disk type based on the service load and I/O
characteristics during storage planning.
In addition, the performance differences between tiers formed by different types of
disks are the basis of the tiered storage technology. Therefore, before configuring
storage services, you need to understand the performance characteristics and
differences of different types of disks.
1. SSD: SSDs do not have rotational latency of HDDs. For I/O models with obvious
access hotspots and sensitive response latency, especially for the random small
I/O read model commonly used in many database applications, SSDs have
obvious performance advantages over HDDs. However, for bandwidth-intensive
applications, SSDs deliver slightly higher performance than HDDs. In the tiered
storage technology, SSDs are used at the high-performance tier to bear high
IOPS pressure.
2. SAS disk: SAS disks store data on high-speed rotating platters, providing
excellent performance, high capacity, and high reliability. Two rotational speeds
are available: 10K RPM and 15K RPM. In the tiered storage
technology, SAS disks are used at the performance tier to provide excellent
performance, including stable latency, high IOPS, and large bandwidth. In
addition, the price of SAS disks is moderate.
3. NL-SAS disk: The rotation speed of NL-SAS disks is lower than that of SAS disks.
Generally, the rotation speed of NL-SAS disks is 7.2K RPM. NL-SAS disks can
provide the maximum capacity but low performance, and therefore are used at
the capacity tier in tiered storage. According to statistics, for most applications,
60% to 80% of the capacity bears light service load. Therefore, NL-SAS disks,
which can provide large capacity at a low price, are suitable for this part of
capacity. In addition, NL-SAS disks consume less power. Compared with SAS
disks, NL-SAS disks can reduce energy consumption per TB by 96%.
For a hybrid flash storage system, if a performance problem occurs and the front-end
storage is working properly, check whether the disk performance reaches the upper
limit. When the disk performance reaches the upper limit (the disk usage is close to
100%), storage performance is subject to back-end disk performance, and the IOPS
and bandwidth performance cannot be improved.
To ensure disk reliability and extend the disk service life, you are advised to set the
disk usage upper limit to 70%. If the usage of most disks is greater than 90%, you
are advised to add disks to the storage pool for capacity expansion or migrate
services to disks with better performance.
You can view the disk usage on DeviceManager or SystemReporter.
Note that the all-flash storage system uses only SSDs as storage media.
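The 70% and 90% thresholds above can be turned into a simple monitoring check. The sketch below is illustrative only; in practice the usage values are read from DeviceManager or SystemReporter rather than hard-coded, and the disk names are invented.

def assess_disk_usage(usage_percent_by_disk):
    """Classify disk load in a storage pool using the 70%/90% guidance above."""
    usages = list(usage_percent_by_disk.values())
    if sum(1 for u in usages if u > 90) > len(usages) / 2:
        # Most disks above 90%: expand the storage pool or migrate services.
        return "expand-or-migrate"
    if any(u > 70 for u in usages):
        # Above the recommended 70% ceiling: keep a close watch.
        return "monitor"
    return "ok"

print(assess_disk_usage({"DAE000.0": 65.0, "DAE000.1": 72.5, "DAE000.2": 58.0}))  # monitor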
Disk selection principles for disk domains
Not all storage systems need to select disks for disk domains.
For a storage system requiring disk domains, disk selection during disk domain
creation may affect the performance of the storage system. You can check whether
the types and capacities of disks in the same disk domain are the same on
DeviceManager.
To ensure performance, you can observe the following principles when selecting disks
for creating a disk domain:
1. Avoiding dual-port access in bandwidth-intensive scenarios: To ensure reliability,
each disk enclosure loop is connected to one SAS port on controller A and one
SAS port on controller B in the same engine. In this way, disks in the loop can be
accessed by both controllers at the same time. Dual-port access indicates that
both controllers A and B can deliver I/Os to disks in a disk domain at the same
time. Single-port access indicates that only one controller can deliver I/Os to
disks. For sequential services, dual-port access affects the sequence of I/Os
delivered to disks. In this case, dual-port access underperforms single-port access
in terms of bandwidth performance. Therefore, in scenarios with sequential I/Os
and high bandwidth requirements, such as the Media & Entertainment (M&E)
industry, you can plan single-port access. That is, set one owning controller for
LUNs in the storage pool corresponding to the disk domain. Dual-port access is
used in random I/O scenarios.
2. Avoiding intermixing of disks: Disks with different rotational speeds or capacities
have different I/O processing latency and bandwidth. Therefore, disks with poor
performance in a RAID group may become a bottleneck that restricts the
performance of the entire stripe group. In addition, problems such as fast/slow disk
mismatches and uneven disk usage may occur. If sufficient disks are available, you
are advised to select disks with the same rotational speed and capacity in the
same disk domain and avoid intermixing different types of disks.
3 Storage Solution
The backup system network can be deployed independently of the production service
network to prevent backup traffic from affecting services. Alternatively, the service
network can function as the backup system network.
3.1.3 Technologies
The client driver captures I/O changes of the production volume in real time and sends
the changed data to the log volume of the Data Protection Appliance for temporary
storage. In addition, the virtual snapshot technology is used to periodically generate
point-in-time copies. With the virtual snapshot data and I/O log data in the log volume,
data can be recovered to any point in time. According to the snapshot retention period in
the backup policy, snapshots and I/O logs in log volumes are periodically deleted to
release the backup capacity.
If the data of the production system is damaged, you select the historical point in time
to which data is to be recovered. The latest virtual snapshot before that point is then
located, the I/O log data between the snapshot time point and the recovery time point is
retrieved from the log volume, and the snapshot data and I/O log data are combined to
generate a new virtual volume. In this way, data can be recovered to any point in time at
the I/O granularity.
Continuous backup has the following characteristics:
● RPO close to zero: The volume-based I/O log technology captures I/O changes of
production volumes in real time and backs them up, so data loss during recovery is
approximately zero.
● Recovery to any point in time: The virtual snapshot and log technologies can be used
together to recover data to any point in time.
● Deduplication: Deduplication at the target end reduces backup storage capacity
requirements.
● Compression: Compression at the storage end reduces backup storage capacity usage.
● Multiple recovery modes: Data can be recovered to the original location on the
original host, to a different location on the original host, or to a different host, meeting
the requirements of different recovery scenarios.
3.2.2 Technologies
3.2.2.1 HyperReplication
3.2.2.1.1 Introduction
HyperReplication is the remote replication feature developed by Huawei. The feature
provides flexible and powerful data replication functions for remote data backup
and recovery, continuous service data protection, and disaster recovery. This feature
requires at least two OceanStor storage systems, which can be placed in the same
equipment room, in the same city, or in two cities up to 1000 km apart. The storage system that
provides data access for production services is the primary storage system, and the
storage system that stores backup data is the secondary storage system.
HyperReplication supports the following replication modes:
● HyperReplication/S for LUN: In this mode, data is synchronized between two storage
systems in real time to achieve full protection for data consistency, minimizing data loss
in the event of a disaster. However, production service performance is affected by the
data transfer latency.
● HyperReplication/A for LUN: In this mode, data is synchronized between two storage
systems periodically to minimize service performance deterioration caused by the latency
of long-distance data transmission. Production service performance is not affected by the
data transfer latency. However, some data may be lost if a disaster occurs.
● HyperReplication/A for file system: In this mode, data is synchronized between two
file systems periodically to minimize service performance deterioration caused by the
latency of long-distance data transmission. Production service performance is not
affected by the data transfer latency. However, some data may be lost if a disaster occurs.
HyperReplication provides the storage array-based consistency group function for
synchronous or asynchronous remote replication between LUNs to ensure the consistency
of cross-LUN applications in disaster recovery replication. A consistency group is a
collection of pairs that have a service relationship with each other. For example, the
primary storage system has three primary LUNs that respectively store service data, logs,
and change tracking information of a database. If data on any of the three LUNs
becomes invalid, all data on the three LUNs becomes unusable. For the pairs in which
these LUNs exist, you can create a consistency group. Upon actual configuration, you
need to create a consistency group and then manually add pairs to the consistency
group. The consistency group function protects the dependency of host write I/Os across
multiple LUNs, ensuring data consistency on secondary LUNs. In addition,
HyperReplication allows data to be replicated through both Fibre Channel and IP
networks. Data can be transferred between the primary and secondary storage systems
through Fibre Channel or IP links.
3.2.2.1.2 Application scenarios
HyperReplication is mainly used for data backup and disaster recovery. Different remote
replication modes apply to different application scenarios:
HyperReplication/S for LUN supports data disaster recovery of LUNs over short distances.
It applies to same-city disaster recovery that requires zero data loss. It concurrently writes
each host write I/O to both the primary and secondary LUNs and returns a write success
acknowledgement to the host after the data is successfully written to the primary and
secondary LUNs. Therefore, the RPO is zero.
2. Dual-write state: After initial synchronization, data on the primary LUN is the same as
that on the secondary LUN. The normal I/O processing process is as follows:
a. The production storage system receives a write request from the host.
HyperReplication logs the address information instead of data content.
b. The data of the write request is written to both the primary and secondary LUNs. If a
LUN is in the write-back state, data will be written to the cache.
c. HyperReplication waits for the data write results from the primary and secondary
LUNs. If the data has been successfully written to the primary and secondary LUNs,
HyperReplication deletes the log. Otherwise, HyperReplication retains the log and enters
the interrupted state. The data will be replicated in the next synchronization.
d. HyperReplication returns the data write result. The data write result of the primary
LUN prevails.
3. Single-write state: If a user runs a command to split the pair, if the replication link is
disconnected, or if the data of a write request fails to be written to both the primary and
secondary LUNs, the remote replication pair enters the single-write state. (Both states are
summarized in the illustrative sketch after this list.)
a. A write success acknowledgement is returned to the host immediately after the data
is written to the cache of the primary storage system.
b. After data in the primary cache is written to the primary LUN, data differences
between the primary LUN and the secondary LUN are recorded in the data change log
(DCL).
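The dual-write steps a-d and the single-write behavior described above can be summarized in a minimal, purely illustrative in-memory model. The class, attribute, and parameter names below are invented for this sketch and do not correspond to any actual storage system interface.

class ReplicationPair:
    """Minimal in-memory model of a HyperReplication/S pair (illustrative only)."""
    def __init__(self):
        self.primary = {}         # LBA -> data on the primary LUN
        self.secondary = {}       # LBA -> data on the secondary LUN
        self.dcl = set()          # data change log: LBAs that differ between the LUNs
        self.address_log = set()  # step a: address information of in-flight writes
        self.state = "dual-write"

    def write(self, lba, data, secondary_reachable=True):
        if self.state == "single-write":
            # Single-write: ack once data is in the primary cache, then record the
            # primary/secondary difference in the DCL for later synchronization.
            self.primary[lba] = data
            self.dcl.add(lba)
            return True
        # Dual-write state, steps a-d above:
        self.address_log.add(lba)            # a. log the address, not the data content
        self.primary[lba] = data             # b. write to the primary LUN
        if secondary_reachable:
            self.secondary[lba] = data       # b. write to the secondary LUN
            self.address_log.discard(lba)    # c. both writes succeeded: delete the log
        else:
            self.state = "interrupted"       # c. keep the log; replicate at next sync
        return True                          # d. the primary LUN's result prevails

pair = ReplicationPair()
pair.write(100, b"blockA")                             # replicated to both LUNs
pair.write(200, b"blockB", secondary_reachable=False)  # pair enters the interrupted state
print(pair.state, sorted(pair.address_log))            # interrupted [200]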
HyperReplication/A for LUN supports data disaster recovery of LUNs over long distances.
It applies to scenarios where a remote disaster recovery center is used and the impact on
production service performance must be reduced.
HyperReplication/A for LUN employs the multi-point-in-time caching technology to
periodically synchronize data between primary and secondary LUNs. All data changes to
the primary LUN since last synchronization will be synchronized to the secondary LUN.
Similar to HyperReplication/S, HyperReplication/A supports the consistency group
function. Users can create or delete a consistency group and add members to or delete
members from the group.
HyperReplication/A adopts the multi-point-in-time caching technology. The working
principles are as follows:
1. After an asynchronous remote replication relationship is set up between a primary LUN
at the primary site and a secondary LUN at the secondary site, an initial synchronization
is implemented to fully copy data from the primary LUN to the secondary LUN.
2. After the initial synchronization is complete, the secondary LUN data status becomes
Consistent (data on the secondary LUN is a copy of data on the primary LUN at a
specified past point in time). Then, the I/O process shown in the following figure starts.
1. When a replication period starts, snapshots are generated for primary and secondary
LUNs, and the points in time of these two LUNs are updated. The snapshot of the
primary LUN is X and that of the secondary LUN is X–1.
2. The data in a write request from the host is written to time segment X + 1 in the cache
of the primary LUN.
3. A write success response is returned to the host.
4. Differential data generated at point in time X is directly replicated to the secondary
LUN based on the DCL.
5. Both primary and secondary LUNs flush received data onto disks. The latest data on
the secondary LUN is the data at point in time X of the primary LUN.
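The replication period can likewise be sketched with a simplified in-memory model. This sketch assumes a single snapshot per period, omits the secondary LUN's point-in-time X-1 snapshot, and all class and method names are illustrative only.

import copy

class AsyncReplicationPair:
    """Minimal in-memory model of a HyperReplication/A pair (illustrative only)."""
    def __init__(self):
        self.primary = {}     # LBA -> data on the primary LUN
        self.secondary = {}   # LBA -> data on the secondary LUN (point in time X-1)
        self.dcl = set()      # LBAs changed on the primary since the last period

    def host_write(self, lba, data):
        # Steps 2-3: the write lands in the primary cache (time segment X+1)
        # and is acknowledged to the host immediately.
        self.primary[lba] = data
        self.dcl.add(lba)

    def run_replication_period(self):
        # Step 1: snapshot the primary LUN; the frozen image is point in time X.
        snapshot_x = copy.deepcopy(self.primary)
        changed, self.dcl = self.dcl, set()
        # Step 4: replicate only the differential data recorded in the DCL.
        for lba in changed:
            self.secondary[lba] = snapshot_x[lba]
        # Step 5: after flushing, the secondary LUN holds the primary's data at time X.

pair = AsyncReplicationPair()
pair.host_write(1, b"v1")
pair.run_replication_period()
pair.host_write(1, b"v2")          # this change waits for the next replication period
print(pair.secondary[1])           # b'v1': a consistent copy at point in time X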
3.2.2.2 BCManager
3.2.2.2.1 Introduction
eReplication is designed to manage DR services of data centers for enterprises. With its
excellent application-aware capabilities and Huawei storage products' value-added
features, eReplication ensures service consistency during DR, simplifies DR service
configuration, supports the monitoring of DR service status, and facilitates data recovery
and DR tests.
[Highlights]
• Simple and efficient
eReplication adopts an application-based management method and guides you through
the DR service configuration step by step. It supports one-click DR tests, planned
migration, fault recovery, and reprotection.
• Visualized
By graphically displaying physical topologies of global DR and logical topologies of
service protection, eReplication enables you to easily manage the entire DR process. The
status of protected groups and recovery plans is clear.
• Integrated
eReplication integrates storage resource management and can be used in various
application scenarios, such as active-passive DR, geo-redundant DR, HyperMetro DC, and
local protection, reducing the O&M cost and improving the O&M efficiency.
• Reliable
eReplication ensures that the applications and data at the DR site are consistent with
those at the production site. If the production site fails, the DR site immediately takes
over its services, ensuring business continuity. The multi-site deployment
improves the reliability of the DR service management system. eReplication backs up
management data automatically so that the management system can be recovered
rapidly if a disaster occurs.
3.2.3 Planning
3.2.3.1 Information Collection
When collecting information, survey the project background, extract customer
requirements, and collect information about the devices, live network, and application
environment. If Fibre Channel switches are involved, collect their manufacturers, models,
versions, rates, and remaining ports, and determine whether the switches will be reused.
3.3.1.2 Highlights
1. Dual write ensures storage redundancy.
2. In the event of a fault in a storage system or production center, services can be
quickly switched to the DR center, ensuring zero data loss and service continuity.
3. This solution meets the requirements of RTO = 0 and RPO = 0.
4. Two data centers carry services at the same time, fully utilizing DR resources.
5. Services run 24/7.
6. If one storage system in the HyperMetro deployment performs poorly, the
performance of the HyperMetro system will be negatively affected.
Note: For some models, if HyperMetro is applied to typical cluster scenarios, RTO = 0; if
HyperMetro is applied to virtualization clusters, when a host in a cluster experiences a
fault, the VM automatically restarts and recovers on another host at an RTO ≈ 0.
3.3.2 Technologies
3.3.2.1 Basic Concepts
Protected Object
For customers, the protected objects are LUNs or protection groups. That is, HyperMetro
is configured for LUNs or protection groups for data backup and disaster recovery.
1) Data protection can be implemented for each individual LUN.
2) Data protection can be implemented for a protection group, which consists of multiple
independent LUNs or a LUN group.
Protection Group (PG) and LUN Group
A LUN group can be directly mapped to a host for the host to use storage resources. You
can group LUNs for different hosts or applications.
A protection group (PG) applies to data protection with consistency groups. You can plan
data protection policies for different applications and components in the applications. In
addition, you can enable unified protection for LUNs used by multiple applications in the
same protection scenario. For example, you can group the LUNs to form a LUN group,
map the LUN group to a host or host group, and create a protection group for the LUN
group to implement unified data protection of the LUNs used by multiple applications in
the same protection scenario.
HyperMetro Domain
A HyperMetro domain allows application servers to access data across DCs. It consists of
a quorum server and the local and remote storage systems.
HyperMetro Pair
A HyperMetro pair is created between a local and a remote LUN within a HyperMetro
domain. The two LUNs in a HyperMetro pair have an active-active relationship. You can
examine the state of the HyperMetro pair to determine whether operations such as
synchronization, suspension, or priority switchover are required by its LUNs and whether
such an operation is performed successfully.
HyperMetro Consistency Group (CG)
A HyperMetro consistency group (CG) is created based on a protection group. It is a
collection of HyperMetro pairs that have a service relationship with each other. For
example, the service data, logs, and change tracking information of a medium- or large-
size database are stored on different LUNs of a storage system. Placing these LUNs in a
protection group and then creating a HyperMetro consistency group for that protection
group can preserve the integrity of their data and guarantee write-order fidelity.
Creating a HyperMetro pair for an individual LUN
Dual-Write
Dual-write enables the synchronization of application I/O requests with both local and
remote LUNs.
DCL
Data change logs (DCLs) record changes to the data in the storage systems.
Synchronization
HyperMetro synchronizes differential data between the local and remote LUNs in a
HyperMetro pair. You can also synchronize data among multiple HyperMetro pairs in a
consistency group.
Pause
Pause is a state indicating the suspension of a HyperMetro pair.
Force Start
To ensure data consistency in the event that multiple elements in the HyperMetro
deployment malfunction simultaneously, HyperMetro stops hosts from accessing both
storage systems. You can forcibly start the local or remote storage system (depending on
which one is normal) to restore services quickly.
Preferred Site Switchover
Preferred site switchover indicates that during arbitration, precedence is given to the
storage system which has been set as the preferred site (by default, this is the local
storage system). If the HyperMetro replication network is down, the storage system that
wins arbitration continues providing services to hosts.
FastWrite
FastWrite uses the First Burst Enabled function of the SCSI protocol to optimize data
transmission between storage devices, reducing the number of interactions in a data
write process by half.
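As a rough, assumption-labeled illustration of the benefit: a standard SCSI write over a long-distance link needs about two round trips (command and transfer-ready, then data and status), whereas a write with First Burst Enabled sends the data together with the command and needs roughly one. Assuming a 1 ms RTT replication link:

rtt_ms = 1.0                     # assumed round-trip time of the inter-array link
standard_write_ms = 2 * rtt_ms   # command/transfer-ready exchange plus data/status exchange
fastwrite_ms = 1 * rtt_ms        # data sent together with the command (First Burst Enabled)
print(standard_write_ms, fastwrite_ms)   # 2.0 1.0 -> link interactions halved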
3.3.3 Planning
3.3.3.1 Information Collection
Collecting information about the upper-layer application environment helps determine
whether applications run as expected on the active-active platform, and identify and
optimize application deployment as early as possible.
The future data growth trend can be predicted based on current data amount and
historical data growth records.
Logical links must exist between each host and the two storage arrays. In the active-
active data center solution, a host in data center A is physically connected only to the
switches in data center A and has no physical links to the switches in data center B.
When planning zones or VLANs for the switches, ensure that the host still has logical
connections to the storage arrays in both data center A and data center B.
The same type of networks (Fibre Channel or IP network) must be deployed between
each host and the two storage arrays.
The Fibre Channel network is recommended for networks between hosts and storage
arrays because the Fibre Channel protocol provides better performance than the iSCSI
protocol.
3.3.3.4.2 HyperMetro Replication Networks
The HyperMetro replication networks between storage arrays must use two switches,
that is, two independent networks are deployed between the two storage
arrays. In this way, if either network experiences a fault, the other network runs properly.
Each controller on each storage array must have two ports configured to connect
HyperMetro replication networks for link redundancy and load balancing. For easy
network management, controller A on storage array A is connected only to controller A
on storage array B.
For convenient network management, it is recommended that the HyperMetro
replication networks and the networks between hosts and storage arrays be the same
type of networks. For example, if Fibre Channel networks are deployed between hosts
and storage arrays, the HyperMetro replication networks should also be Fibre Channel
networks.
The Fibre Channel network is recommended for the HyperMetro replication networks as
the Fibre Channel protocol provides better performance than the iSCSI protocol.
A write success is returned to the host only when I/Os are successfully written to both
storage arrays through the HyperMetro replication links. To ensure service performance,
the solution requires a round-trip time (RTT) of less than 1 ms on the HyperMetro
replication networks.
3.3.3.4.3 Fibre Channel Switches
If a large number of devices are connected to switches or the service planning is complex,
hundreds of or even thousands of small zones may be required. In this case, you are
advised to configure zones based on the single HBA principle. That is, if a zone has only
one HBA or initiator, multiple targets are allowed in the zone.
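One way to visualize the single-HBA (single-initiator) principle is to generate one zone per initiator, each containing that initiator plus all the targets it needs. The sketch below only builds membership lists; the WWPNs and the zone-naming convention are invented for illustration, and actual zoning is configured on the Fibre Channel switches.

def single_initiator_zones(initiators, targets):
    """Build one zone per initiator (HBA port); each zone contains that
    initiator's WWPN plus all target WWPNs. Names and WWPNs are hypothetical."""
    zones = {}
    for host_port, hba_wwpn in initiators.items():
        zones["z_" + host_port] = [hba_wwpn] + list(targets.values())
    return zones

zones = single_initiator_zones(
    initiators={"dbhost01_hba0": "10:00:00:00:c9:aa:bb:01"},
    targets={"arrayA_ctrlA_p0": "20:01:00:11:22:33:44:01",
             "arrayA_ctrlB_p0": "20:01:00:11:22:33:44:02"})
print(zones)   # one zone, one initiator, two targets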
3.3.3.4.4 Quorum Network Planning
Storage array access: Each controller on each storage array provides one GE or 10GE port
dual-homed to the quorum network VLANs of two core switches. The quorum network IP
addresses of the two storage arrays in data centers A and B must be set in different
network segments. In this way, data center A and data center B can advertise different
routes in the WAN to isolate the path between the quorum server and data center A
from that between the quorum server and data center B.
Quorum server access: The quorum server uses the TCP/IP protocol to communicate with
the storage arrays in the two data centers and is dual-homed to the quorum VLANs of
the two Ethernet switches using two GE/10GE links. In this way, the quorum networks are
logically isolated from other networks.
Quorum network between data centers: The storage arrays in the two data centers
periodically access the third-place quorum site over the quorum networks, but there is no
quorum network traffic between the two data centers. Therefore, no route needs to be
advertised for the quorum network between the two data centers. Quorum networks
require high reliability, and therefore, it is recommended that different paths or private
links be used between the third-place quorum site and the two data centers. When the
path or link to one data center is disconnected, the path or link to the other data center
must be normal. Quorum networks do not require Layer 2 interconnection, but the
quorum server and storage arrays must properly communicate with each other via Layer
3 routing. It is recommended that the latency from the quorum server to the controllers
on the storage arrays in the two data centers be less than 50 ms RTT and the bandwidth
be greater than 2 Mbit/s.
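The quorum link targets above can be captured in a simple pre-deployment check. This sketch assumes the RTT and bandwidth have already been measured by other means; the thresholds are the ones quoted above, and the function name is illustrative.

def quorum_link_ok(rtt_ms, bandwidth_mbps):
    """Check a quorum link against the guidance above (< 50 ms RTT, > 2 Mbit/s)."""
    return rtt_ms < 50 and bandwidth_mbps > 2

print(quorum_link_ok(rtt_ms=12.0, bandwidth_mbps=10.0))   # True
print(quorum_link_ok(rtt_ms=80.0, bandwidth_mbps=10.0))   # False: latency too high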
3.3.3.4.5 WDM Network Planning
Huawei WDM products use advanced low-latency processing technologies to optimize
latency and are ideal for Data Center Interconnect (DCI) scenarios requiring low latency.
Therefore, it is recommended that Huawei WDM products be used to construct intra-city
transmission networks. The following describes the recommended WDM network
deployment modes:
1. In large-scale networking scenarios, separate WDM channels should carry various
networks. At least the SAN, public, and private networks must be physically
separated. The public network may share a WDM channel with other service traffic
(such as traffic between servers) depending on the bandwidth utilization.
2. WDM devices provide three redundancy modes: line-side 1 + 1 protection, intra-
board 1 + 1 protection, and client-side protection, with client-side protection
providing the highest reliability. Reliability in ascending order is as follows: line-side
1 + 1 protection < intra-board 1 + 1 protection < client-side protection.
3. The minimum adaptive bandwidth of the optical module on a Fibre Channel switch is
2 Gbit/s. Therefore, you are advised to configure optical modules with a bandwidth
greater than or equal to 2 Gbit/s on WDM devices for interconnection with Fibre
Channel switches.
3.3.3.4.6 Overall Network Planning
Global Server Load Balance (GSLB): GSLB distributes traffic amongst servers dispersed
across multiple geographies in the WAN (including the Internet) so that the best
available server serves the closest user, ensuring quick and reliable access.
Server Load Balance (SLB): SLB distributes traffic amongst servers in the same geography.
The GSLB and SLB are two functions provided by F5 BIG-IP Global Traffic Manager
(GTM) and Local Traffic Manager (LTM).
The GSLB function performs Domain Name System (DNS) resolution and directs user
requests to the SLB of the appropriate data center. The SLB in each data
center is deployed in an HA cluster in AP mode. Only one F5 LTM provisions services at a
time. The user accesses the virtual server IP address of the SLB. In the SLB, the virtual
server IP address is mapped to several pools. A pool is a logical unit and contains
multiple web servers.
The GSLB and SLB perform comprehensive health checks to identify the health state of
the target server to which data is to be sent. If the target server is faulty, data will be
sent to a server in another resource pool instead.
The SAN network planning involves planning the networks between database servers and
storage arrays, HyperMetro replication networks between storage arrays, and Fibre
Channel switches.
The IP network planning involves planning the GSLB access networks, SLB access
networks, web/app access networks, database public/private networks, storage quorum
networks, access switches, and core switches.
The intra-city transmission networks connect the SAN networks and IP networks between
two data centers.
3.3.3.4.7 Overall Port and IP Address Planning
The teaching material only illustrates the networking mode for data center A. In practice,
the networking mode of data center A also applies to data center B.
IP address planning rules
Consecutiveness: Consecutive IP addresses must be allocated to the same network
area.
Scalability: The IP addresses allocated to a network area must have certain
redundancy so that IP addresses can remain consecutive when more devices are
added.
Security: A network should be divided into different network segments (subnets) for
easy management.
Device naming rules
Storage array naming rule: DC ID_array model_SN, for example, DC1_5500_1.
Server naming rule: DC ID_service type_SN, for example, DC1_OracleDB_1.
Switch naming rule: DC ID_switch model_SN, for example, DC1_2248_1.
Precautions
Configure the IP addresses of iSCSI host ports and service network ports on
application servers in the same network segment. Do not connect host ports and
service network ports via routing gateways.
Do not plan the IP addresses of service ports (such as HyperMetro, mirroring, and
network service ports) and management ports in the same network segment.
Do not overlap IP addresses with the IP addresses of heartbeat network ports and the
network segment allocated for connecting switches to controllers. Otherwise, routing
will fail.
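The first two precautions can be checked programmatically with Python's ipaddress module. The sample addresses and prefix lengths below are invented for illustration.

import ipaddress

def same_subnet(ip_a, ip_b, prefix):
    """Return True if the two addresses fall into the same /prefix network."""
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{prefix}", strict=False)
    return net_a == net_b

# iSCSI host port and service network port: should be in the same segment.
print(same_subnet("192.168.10.11", "192.168.10.21", 24))       # True: OK
# Service port (e.g. HyperMetro) and management port: should be in different segments.
print(not same_subnet("192.168.20.11", "192.168.30.11", 24))   # True: OK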
3.3.3.4.8 Function Design
The function design involves service-related planning and design.
3.4.1.2 Highlights
1. Reliable business continuity
2. Stable data consistency
3. Adjustable performance
4. Heterogeneous compatibility
Note: Huawei provides all-around data migration solutions at the host layer, SAN
network layer, and storage layer, and provides mature tools to help customers solve data
migration challenges. These migrations are implemented using rigorous technologies,
extensive experience, and mature tools.
3.4.2 Technologies
3.4.2.1 SmartVirtualization
3.4.2.1.1 Basic Concepts
External LUN
An external LUN is a LUN in a heterogeneous storage system; it is displayed as an
external LUN on DeviceManager.
eDevLUN
In the storage pool of a local storage system, the mapped external LUNs are reorganized
as raw storage devices based on a certain data organization form. A raw device is called
an eDevLUN. The physical space occupied by an eDevLUN in the local storage system is
merely the storage space needed by the metadata. The service data is still stored on the
heterogeneous storage system. Application servers can use eDevLUNs to access data on
external LUNs in the heterogeneous storage system, and the SmartMigration feature can
be configured for the eDevLUNs.
Hosting
LUNs in a heterogeneous storage system are mapped to a local storage system for use
and management.
3.4.2.1.2 Relationship Between an eDevLUN and an External LUN
An eDevLUN consists of data and metadata. A mapping relationship is established
between data and metadata.
1) The physical space needed by data is provided by the external LUN from the
heterogeneous storage system. Data does not occupy the capacity of the local storage
system.
2) Metadata is used to manage storage locations of data on an eDevLUN. The space used
to store metadata comes from the metadata space in the storage pool created in the
local storage system. Metadata occupies merely a small amount of space. Therefore,
eDevLUNs occupy a small amount of space in the local storage system. (If no value-
added feature is configured for eDevLUNs, each eDevLUN occupies only dozens of KBs
in the storage pool created in the local storage system.)
The following figure illustrates the relationship between an eDevLUN created in the local
storage system and an external LUN created in the heterogeneous storage system. An
application server accesses an external LUN by reading data from and writing data to the
corresponding eDevLUN.
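To make the eDevLUN/external LUN relationship concrete, the following minimal sketch models an eDevLUN as a local object that holds only metadata and redirects reads and writes to the external LUN. The class names, methods, and the metadata size constant are illustrative only.

class ExternalLUN:
    """LUN on the heterogeneous storage system that holds the real service data."""
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.blocks = {}          # LBA -> data

    def read(self, lba):
        return self.blocks.get(lba)

    def write(self, lba, data):
        self.blocks[lba] = data


class EDevLUN:
    """Local object: only metadata lives in the local storage pool."""
    METADATA_KB = 64              # illustrative; the guide says "dozens of KBs"

    def __init__(self, external_lun):
        self.external_lun = external_lun   # metadata maps local I/O to the external LUN

    def read(self, lba):
        return self.external_lun.read(lba)      # data is served from the external LUN

    def write(self, lba, data):
        self.external_lun.write(lba, data)      # data is stored on the external LUN


external = ExternalLUN(capacity_gb=2048)
edev = EDevLUN(external)
edev.write(0, b"app data")
print(edev.read(0) == external.read(0))  # True: the eDevLUN is only a redirection layer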
3.4.2.2 SmartMigration
3.4.2.2.1 Basic Concepts
Data organization
The storage system uses a virtualization storage technology. Virtual data in the storage
pool consists of meta volumes and data volumes.
1) Meta volume: records the data storage locations, including LUN IDs and data volume
IDs. LUN IDs are used to identify LUNs and data volume IDs are used to identify physical
space of data volumes.
2) Data volume: stores user data.
Source LUN
LUN from which service data is migrated.
Target LUN
LUN to which service data is migrated.
LM module
Manages SmartMigration in the storage system.
Pair
In SmartMigration, a pair indicates the data migration relationship between the source
LUN and target LUN. A pair can have only one source LUN and one target LUN.
Dual-write
The process of writing data to the source and target LUNs at the same time during
service data migration.
Log
Records data changes on the source LUN to determine whether the data has also been
written to the target LUN. With the dual-write technology, both LUNs can be written
simultaneously.
Data change log (DCL)
Records differential data that fails to be written to the target LUN during the data
change synchronization.
Splitting
The process of stopping service data synchronization between the source LUN and target
LUN, exchanging LUN information, and then removing the data migration relationship
between the source LUN and target LUN.
3.4.2.2.2 Service Data Synchronization
Initial synchronization
After service data synchronization starts on the source LUN, all initial service data is
copied to the target LUN, as shown in the following figure.
3) After LUN information exchange, the host can still identify the source LUN using the
source LUN ID but read the target LUN's physical space. In this way, services are
migrated without awareness of users.
The following figure illustrates the principle of LUN information exchange.
Pair splitting
Pair splitting means that the data migration relationship between the source LUN and
target LUN is removed after LUN information is exchanged. After the pair is split, if the
host delivers an I/O request to the storage system, data is only written to the source LUN
(the physical space to which the source LUN ID points is the target data volume). The
target LUN will store all data of the source LUN at the pair splitting point in time. After
the pair is split, the source LUN and target LUN no longer have any migration relationship.
3.4.3 Planning
3.4.3.1 Huawei Migration Solutions
Currently, Huawei supports the following migration solutions:
1. Migration based on application software functions: This type of migration uses
functions of upper-layer applications to migrate data, such as migration functions of
Oracle databases and replication functions of file systems. This type of migration is
smooth and has less downtime. However, it has strict requirements on the
application scenarios and is not universally applicable.
2. Migration based on volume management software functions: This type of migration
is implemented using the mirroring or migration function of the volume
management software. For example, AIX, HP-UX LVM, and VxVM have the mirroring
function. Data can be migrated by mirroring and then splitting the mirroring. This
method is smooth, and downtime is minimal. However, customers must use the
operating system and volume management software that support this feature to
implement this function.
3. Migration based on network functions, including VIS, heterogeneous virtualization,
and MigrationDirector for SAN: VIS is a Huawei-developed product. Heterogeneous
virtualization uses the takeover function of storage devices to migrate data.
MigrationDirector for SAN is a non-disruptive migration tool developed by Huawei;
however, the supported source and target storage systems are limited.
4. Migration based on storage functions including internal LUN migration and remote
replication: This type of migration occupies few host resources. However, online
migration can be used only between homogeneous storage systems. Otherwise,
offline migration must be performed after the system is shut down.
3.4.3.1.1 Migration Based on Application Software Functions
VM software functions
Common virtualization platforms, such as VMware and Hyper-V, provide basic online storage
replacement and migration technologies. For example, the storage vMotion function
of VMware and the live migration function of Hyper-V can implement online storage
replacement, which requires simple operations and achieves non-disruptive
replacement and migration. However, this solution is not applicable to scenarios such
as RDM and VDI. Therefore, you need to check whether the requirements in the
application scenario are met before starting a VM migration. If yes, the preferred
solution is the storage replacement and migration function provided by the VM
software. The current version mainly provides the migration solution using the VM
built-in function.
Database software functions
The database migration solution is mainly used for database migration. In addition,
databases can also be migrated at the external storage or network layer. However, in
certain scenarios, data must be migrated at the database layer to achieve a better
effect because device types, data changes, and shortened downtime are involved.
Such scenarios include cross-platform database migration, cross-media migration,
and those that have high requirements regarding downtime.
service data before data migration and reserve sufficient backup windows for data
backup. It is recommended that all onsite and remote support personnel be in place
and the specific implementation time be determined before data migration. You are
advised to back up service data before data migration. After selecting a data
migration solution, refer to the data migration specification list to check the
restrictions and requirements.
41. Assessment of the overall operation risks
The fundamental goal of control theory is to understand and define the functions
and processes of systems that have objectives and that participate in causal feedback
loops, performing operations such as perception, comparison, and action.
4.1.2 Tools
Huawei provides premium storage O&M tools that are tailored to common demanding
scenarios.
Users can query, set, manage, and maintain the storage system on DeviceManager and
the command-line interface (CLI). Serviceability management tools, such as SmartKit and
eService, help improve O&M efficiency. The following is a list of popular premium
Huawei tools:
DeviceManager: single-device O&M software. DeviceManager is an integrated storage
management platform designed for all Huawei storage systems, enabling you to
configure, manage, and maintain storage devices with ease.
SmartKit: a professional-grade tool for Huawei technical support. SmartKit includes
compatibility evaluation, planning and design, one-click fault information collection,
inspection, upgrade, and field replaceable unit (FRU) replacement.
eSight: multi-device maintenance suite that provides fault monitoring and visualized
O&M
DME: unified management platform for storage resources, offering service catalog
orchestration, on-demand supply of storage services, and data application services
eService client: deployed in a customer's equipment room. It discovers exceptions of
storage devices in real time and reports them to the Huawei maintenance center.
eService cloud platform: deployed on the Huawei maintenance center. It monitors devices
in the network in real time, offering proactive maintenance.
With artificial intelligence (AI), storage management will develop towards the following
trends to improve management and O&M efficiency:
1. Single- to multi-device management
2. Single dimension to full lifecycle management
3. Manual management to automation
It is under this condition that Huawei developed the DME Storage platform, a full-
lifecycle automatic management platform.
DME Storage adopts a service-oriented architecture design built on automation, AI
analysis, and policy supervision. It integrates all phases of the storage
lifecycle, covering planning, construction, O&M, and optimization, to implement
integrated, full-lifecycle storage management and control.
The platform is designed to simplify storage management and improve data center
operation efficiency by allowing you to manage your resources over a unified
management environment. It provides open APIs, cloud-based AI enablement, and multi-
dimensional intelligent risk prediction and intelligent tuning, allowing you to integrate
multiple types of storage resources on demand without changing the existing data
channels or adding new storage functions to a single storage array. In addition, users can
connect hosts and Fibre Channel (FC) switches to the system for end-to-end O&M and
management. This solution supports full lifecycle management of storage devices,
including planning, construction, maintenance, and optimization. The continuous
application of AI technologies automates storage management.
Key characteristics of the DME:
It is a distributed microservice architecture with high reliability and 99.9% availability.
It manages large-scale storage devices. A single node can manage 16 storage devices,
each with 1500 volumes.
Northbound and southbound interfaces are fully open. Northbound interfaces provide
RESTful, Simple Network Management Protocol (SNMP), and ecosystem plug-ins (such
as Ansible) to interconnect with upper-layer systems, whereas southbound interfaces
interconnect with OpenStack storage. Standard interface protocols, such as SNMP and
RESTful, are used to interconnect with storage devices.
DME implements proactive O&M based on the AI and policy engine. In the maintenance
phase, it analyzes the following problems automatically:
Based on preset policies and AI algorithm models, the system automatically detects
potential problems throughout the lifecycle from multiple dimensions, including capacity,
performance, configuration, availability, and optimization.
It allows users to customize check policies, such as capacity thresholds.
For planning, it covers infrastructure management and service level management.
A super administrator can only log in to the storage system as a local user.
Before logging in to DeviceManager as a Lightweight Directory Access Protocol
(LDAP) domain user, configure an external LDAP domain server, then configure
parameters on the storage system to add the system to the LDAP domain, and
finally create an LDAP domain user.
By default, DeviceManager allows a maximum of 32 logged-in users.
The storage system provides two types of roles: preset roles and custom roles.
Preset roles in the system have specific permissions, including the super
administrator, administrator, and read-only user.
Permissions of custom roles can be flexibly configured by users based on site
requirements.
To support permission control in multi-tenant scenarios, the storage system divides the
preset roles into the system and tenant groups. The differences are as follows:
Tenant group: The roles are used only when a user logs in to DeviceManager with a
tenant account.
System group: The roles are used only when a user logs in to DeviceManager with a
system account.
4.1.3 Management
4.1.3.1 Routine Management of a Storage System
DeviceManager helps to monitor the performance of a storage system.
checks the firmware that needs to be upgraded on the controller and the firmware is
upgraded in sequence. After the upgrade is complete, restart the controller. After the
controller is powered on, services that belong to the controller are switched back to the
controller. Other controllers are upgraded in the same way.
This rolling upgrade method requires you to restart controllers and all services are taken
over by normal controllers. As a result, the read and write performance may decrease by
10% to 20%. Therefore, you are advised to perform the upgrade during off-peak hours.
maintenance, the time to obtain the spare parts, the response time of the
maintenance team, the time to record all tasks, and the time to put the device back
into use.
A shorter MTTR correlates to better system recoverability.
Availability (usually represented by A) refers to the capability of a repairable product
to have or maintain its functions at a certain time when the product is used under
specified conditions. The value can be calculated based on MTBF and MTTR: A =
MTBF/(MTBF + MTTR)
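For example, with an assumed MTBF of 10,000 hours and an MTTR of 2 hours, A = 10000/(10000 + 2) ≈ 99.98%. The short sketch below simply evaluates the formula; the figures are invented for illustration.

def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

print("{:.4%}".format(availability(10000, 2)))   # 99.9800%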
The following fault diagnosis principles help you quickly exclude irrelevant information and
diagnose faults.
The principles are as follows:
Analyze external factors first and then internal factors.
When locating faults, prioritize the external factors (optical fibers, optical cables,
power supplies, and customer devices) before internal factors (disks, controllers, and
interface modules).
Analyze high-severity alarms first and then low-severity alarms.
The alarm severity sequence from high to low is critical, major, and warning.
Analyze common alarms first and then uncommon alarms.
When analyzing an alarm, confirm whether it is an uncommon or common alarm
and its impact, and determine whether the fault occurred on one component or
multiple components.
To improve the emergency handling efficiency and reduce losses caused by emergency
faults, emergency handling must comply with the following principles:
If an emergency fault causes data loss, stop host services or switch services to a
standby host, and back up the data in a timely manner.
During emergency handling, record all operations performed.
Emergency handling personnel must attend dedicated training courses and
understand related technologies.
Recover core services before recovering other services.
4.2.2 Preparations
In the event of a fault, collect and report basic, fault, storage device and system,
application server, and network information to help maintenance personnel quickly
locate and rectify the fault.
The fault information to be collected includes: file system fault information, volume
management fault information, database fault information, storage system fault
information (mandatory), switch information (mandatory), host information, and HBA
information.
Before collecting the fault information, evaluate the impact of the fault on services, back
up data if necessary, and obtain related authorization information.