ANSI/BICSI 002-2014
Data Center Design and Implementation Best Practices
Risk, Reliability & Availability
Selecting a Data Center Design Class
Bob Camerino RCDD DCDC CT
Secretary BICSI Mainland Europe District
Owner and Principal Engineer Innovative Technical Solutions
Discussion
Discussion points:
1. Risk analysis
2. Availability
3. Determining Data Center Class
4. Reliability
5. Efficiency and Reliability
6. BICSI Design Classifications
7. The BICSI Difference
Risk Analysis
Seven considerations:
1. Life safety – If the system failed, would lives be at risk?
2. Threats – Natural, man-made, or technological catastrophic events
3. Economic loss from loss of data
4. Economic loss from damaged equipment
5. Regulatory or contractual impact
6. Damage to the organization’s reputation
7. Access to redundant off-site processing
Reliability & Availability
Reliability
• How many times will the equipment work as
expected?
Availability
• How often is the equipment operational?
Availability
Determine the availability class for a data center
• Operational requirements
• Availability requirements
• Impact of downtime
• Component and system reliability
• Impact of class on design
Defining Availability Class
[Diagram: Operational Requirements, Operational Availability, and Impact of Downtime together determine the Availability Class]
Operational Requirements
Identifying Operational Requirements
Operational Level | Annual Planned Maintenance Hours | Description
0 | >400 | Functions are operational less than 24 hours a day and less than 7 days a week. Scheduled maintenance is available during working hours and off hours
1 | 100 - 400 | Functions are operational less than 24 hours a day and less than 7 days a week. Scheduled maintenance is available during working hours and off hours
2 | 50 - 99 | Functions are operational 24 hours a day and up to 7 days a week for 50 weeks a year. Scheduled maintenance is available during working hours and off hours
3 | 0 - 49 | Functions are operational 24 hours a day and up to 7 days a week for 50 weeks or more. No scheduled maintenance is available during working hours
4 | 0 | Functions are operational 24 hours a day and up to 7 days a week for 52 weeks a year. No scheduled maintenance is available
Key Factor – The amount of time planned for maintenance
Operational Availability Rating
Allowable Maximum Annual Downtime in Minutes (corresponding availability in parentheses)
Operational Level | >5000 (< 99%) | 500 - 5000 (99% to 99.9%) | 50 - 500 (99.9% to 99.99%) | 5 - 50 (99.99% to 99.999%) | 0.5 - 5 (99.999% to 99.9999%)
Level 0 | 0 | 0 | 1 | 2 | 2
Level 1 | 0 | 1 | 2 | 2 | 2
Level 2 | 1 | 2 | 2 | 2 | 3
Level 3 | 2 | 2 | 2 | 3 | 4
Level 4 | 3 | 3 | 3 | 4 | 4
Operational Availability – When the IT services are expected to be available
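The rating table above can be read mechanically. The following is a minimal Python sketch of that lookup: it converts an availability percentage into annual downtime minutes and then picks the rating for a given operational level. The function names, the 525,600-minute (non-leap) year, and the treatment of values that fall exactly on a band boundary or below 0.5 minutes are assumptions made for illustration; they are not taken from the standard.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Convert an availability percentage into downtime minutes per year."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Rows: operational level 0-4.
# Columns: downtime bands >5000, 500-5000, 50-500, 5-50, 0.5-5 minutes.
RATING = [
    [0, 0, 1, 2, 2],
    [0, 1, 2, 2, 2],
    [1, 2, 2, 2, 3],
    [2, 2, 2, 3, 4],
    [3, 3, 3, 4, 4],
]


def downtime_band(minutes: float) -> int:
    """Return the column index for an allowable annual downtime.

    Assumption: a value exactly on a boundary goes to the stricter band,
    and anything under 0.5 minutes is treated as the last band.
    """
    if minutes > 5000:
        return 0
    if minutes > 500:
        return 1
    if minutes > 50:
        return 2
    if minutes > 5:
        return 3
    return 4


def availability_rating(level: int, minutes: float) -> int:
    """Look up the operational availability rating for a level and downtime."""
    if not 0 <= level <= 4:
        raise ValueError("operational level must be 0-4")
    return RATING[level][downtime_band(minutes)]
```

For example, a Level 3 operation that can tolerate at most 30 minutes of downtime a year falls in the 5 - 50 band, so `availability_rating(3, 30)` gives a rating of 3.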
Impact of Downtime
Classifying Downtime
Classification | Impact of downtime
Isolated | Local in scope, single site, minor disruption or delay to non-critical objectives
Minor | Local in scope, single site, minor disruption or delay to key objectives
Major | Regional in scope, portions of the enterprise, moderate disruption or delay of key objectives
Severe | Multiregional in scope, major portions of the enterprise, significant disruption or delay of key objectives
Catastrophic | Quality of service delivery across the enterprise, significant disruption or delay of key objectives
Determining Data Center Class
Data center class by impact of downtime (rows) and operational availability rank (columns)
Impact of downtime | Rank 0 | Rank 1 | Rank 2 | Rank 3 | Rank 4
Isolated | Class 0 | Class 0 | Class 1 | Class 3 | Class 3
Minor | Class 0 | Class 1 | Class 2 | Class 3 | Class 3
Major | Class 1 | Class 2 | Class 2 | Class 3 | Class 3
Severe | Class 1 | Class 2 | Class 3 | Class 3 | Class 4
Catastrophic | Class 1 | Class 2 | Class 3 | Class 4 | Class 4
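The class matrix above is a two-key lookup, which the following Python sketch encodes directly. The dictionary layout and the function name `data_center_class` are illustrative choices, not anything defined by the standard; the values are copied from the table.

```python
# Class by impact of downtime (key) and operational availability rank 0-4 (index).
CLASS_TABLE = {
    "Isolated":     [0, 0, 1, 3, 3],
    "Minor":        [0, 1, 2, 3, 3],
    "Major":        [1, 2, 2, 3, 3],
    "Severe":       [1, 2, 3, 3, 4],
    "Catastrophic": [1, 2, 3, 4, 4],
}


def data_center_class(rank: int, impact: str) -> int:
    """Return the data center class for a rank (0-4) and an impact label."""
    if impact not in CLASS_TABLE:
        raise ValueError(f"unknown impact classification: {impact!r}")
    if not 0 <= rank <= 4:
        raise ValueError("operational availability rank must be 0-4")
    return CLASS_TABLE[impact][rank]


if __name__ == "__main__":
    for impact in CLASS_TABLE:
        print(impact, [data_center_class(r, impact) for r in range(5)])
```

For instance, an operational availability rank of 3 combined with a Major impact of downtime gives Class 3.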
Availability Concerns
• Component Redundancy
– Redundancy of critical high-risk components
• System Redundancy
– Redundancy at the system level
• Quality
– Commercial or premium grade
• Survivability
– Protection against external events
Reliability
“Reliability is the probability that a component
or system will perform its intended function
within stated conditions, for a specified period
of time without failure”
ANSI/BICSI 002-2014 B.8.1
Reliability is calculated from published MTBF
(mean time between failures) data for components
and systems.
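One common way to turn an MTBF figure into a reliability number is the exponential model R(t) = exp(-t / MTBF), which assumes a constant failure rate. The slide does not say which model is intended, so the sketch below is an illustrative assumption only; the function name and the sample figures are hypothetical.

```python
import math


def reliability_from_mtbf(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of running mission_hours without a failure.

    Assumes a constant failure rate (exponential model). This model choice
    is an assumption for illustration, not text from the standard.
    """
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if mission_hours < 0:
        raise ValueError("mission time cannot be negative")
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical component: 100,000 h MTBF evaluated over one year (8,760 h).
one_year = reliability_from_mtbf(8760, 100_000)
print(f"Reliability over one year: {one_year:.4f}")
```

Under this model, a component run for a period equal to its MTBF has only about a 37% chance (exp(-1)) of surviving without failure, which is why MTBF should not be read as an expected lifetime.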
Reliability
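Components: A1 (R = 0.90), A2 (R = 0.90), A3 (R = 0.95), B1 (R = 0.95), B2 (R = 0.90)
[Diagram: per the formulas, A1 and A2 are in series, that pair is in parallel with A3 (block A), B1 and B2 are in parallel (block B), and block A is in series with block B]
R_A1A2 = R_A1 x R_A2 = 0.90 x 0.90 = 0.81
R_A = 1 – [(1 – R_A1A2) x (1 – R_A3)] = 1 – [(1 – 0.81) x (1 – 0.95)] = 0.9905
R_B = 1 – [(1 – R_B1) x (1 – R_B2)] = 1 – [(1 – 0.95) x (1 – 0.90)] = 0.995
R_TOTAL = R_A x R_B = 0.9905 x 0.995 = 0.9855475 (approximately 98.55%)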
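The worked example above uses two rules: reliabilities multiply for components in series, and unreliabilities multiply for components in parallel. A minimal Python sketch of those two rules, applied to the slide's figures, follows. The helper names `series` and `parallel` are illustrative, and the calculation assumes component failures are independent.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All components must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one component must work: 1 minus the product of failure odds."""
    return 1 - prod(1 - r for r in reliabilities)


# Block A: A1 and A2 in series, that pair in parallel with A3.
r_a = parallel(series(0.90, 0.90), 0.95)
# Block B: B1 and B2 in parallel.
r_b = parallel(0.95, 0.90)
# The two blocks in series.
r_total = series(r_a, r_b)

print(f"R_A = {r_a:.4f}, R_B = {r_b:.4f}, R_TOTAL = {r_total:.7f}")
```

Composing the two helpers this way makes it easy to try alternative arrangements, for example adding a third parallel component to block B, without redoing the arithmetic by hand.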
What is N
• N, or Need, is the resource required to serve
the IT equipment
• N+1 means the required components (N) have at
least one independent backup component (+1)
Utilization Efficiency versus Reliability
N = 100 kVA of UPS
N+1 redundancy can be achieved as:
1. 2 x 100 kVA modules = 200 kVA (50% efficient)
2. 3 x 50 kVA modules = 150 kVA (66% efficient)
3. 4 x 33 kVA modules = 132 kVA (75% efficient)
4. 5 x 25 kVA modules = 125 kVA (80% efficient)
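The percentages in the list above are simply the required capacity (N) divided by the installed capacity. The Python sketch below reproduces that arithmetic for the four options; the function name `utilization` and the tuple layout are illustrative assumptions. Note that the slide rounds down (100 / 150 is closer to 67%, and 100 / 132 is closer to 76%).

```python
def utilization(need_kva: float, module_count: int, module_kva: float):
    """Return (installed kVA, fraction of installed capacity that is needed)."""
    if module_count <= 0 or module_kva <= 0:
        raise ValueError("module count and size must be positive")
    installed = module_count * module_kva
    return installed, need_kva / installed


NEED_KVA = 100  # N from the slide

# (module count, module size in kVA) for each N+1 option on the slide.
OPTIONS = [(2, 100), (3, 50), (4, 33), (5, 25)]

for count, size in OPTIONS:
    installed, fraction = utilization(NEED_KVA, count, size)
    print(f"{count} x {size} kVA = {installed} kVA installed, "
          f"{fraction:.1%} utilized")
```

Smaller modules raise utilization, but each option also increases the number of modules that can fail, which is presumably the trade-off the slide title points to.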
Continuous Improvement
1. Analyze
2. Prioritize
3. Develop
4. Implement
5. Measure
[Cycle diagram: step 5 feeds back into step 1]
BICSI DC Design Classifications
• Class 0: Single path, and fails to meet one or more
criteria of Class 1
• Class 1: Single path
• Class 2: Single path with redundant components
• Class 3: Concurrently maintainable & operable
• Class 4: Fault tolerant
Availability Class Prefixes
• Class Fx: Facility (Electrical & Mechanical)
• Class Cx: Cable Plant
• Class Nx: Network Infrastructure
• Class Sx: Data Processing and Storage Systems
• Class Ax: Applications
Electrical Class F0 & F1
[Diagrams: two single-line power diagrams, one for F0 and one for F1. Labeled elements: Utility, Electrical Distribution, Mechanical Switchgear, Mechanical Loads, UPS (N), Maintenance Bypass, Static Bypass, PDU, Critical Loads, Non-Critical Loads. One element is marked Optional]
F0 – Single path, module and source
F1 – Single path, module and source
Electrical Class F2
[Diagram: single-line power diagram. Labeled elements: Utility, Alternate Power Source, Transfer Switchgear, Electrical Distribution, Mechanical Switchgear, Mechanical Loads, UPS (N), UPS (+1), Maintenance Bypass, Static Bypass, PDU, Critical Loads, Non-Critical Loads]
Single source, multiple module, single path
Electrical Class F3
[Diagram: single-line power diagram with a single utility. Labeled elements: Utility, Alternate Power Source (N), Alternate Power Source (+1), two Transfer Switchgear, two Electrical Distribution paths, UPS (N), UPS (+1), Maintenance Bypass, Static Bypass, two Mechanical Switchgear, two Mechanical Loads, two Distribution Switchgear, two Critical Mechanical Switchgear, two PDUs, Critical Fans, Pumps, Critical Loads, Non-Critical Loads]
Multiple source, N rated single or multimodule system, dual or multiple path
Electrical Class F4
[Diagram: single-line power diagram with two utilities. Labeled elements on each side: Utility, Transformer, Alternate Power Source (N), Alternate Power Source (+1), Transfer Switchgear, Electrical Distribution, Maintenance Bypass, Static Bypass, UPS (N), UPS (+1), Mechanical Switchgear, Mechanical Loads, Distribution Switchgear, Critical Mechanical Switchgear, PDU. Loads shown: Critical Fans, Pumps, Critical Loads, Non-Critical Loads]
Mechanical Class F0 & F1
[Diagrams: a chiller system (Dry-Cooler, Chiller, Pump, two “N” CRAH units) and a DX system (two Condensers, two “N” CRAC units). Legend: Pump, Indoor Heated Water, Indoor Cooled Water, Outdoor Heated Water, Outdoor Cooled Water]
Single Path
Mechanical Class F2
[Diagram: two Air-Cooled Condensers and two Chillers feeding two “N” CRAH units and one “+1” CRAH unit. Note on diagram: “N + 1” Chillers, Pumps and Condensers. Legend: Pump, Indoor Heated Water, Indoor Cooled Water, Outdoor Heated Water, Outdoor Cooled Water]
Single path with redundant components
Mechanical Class F3
[Diagram: two Air-Cooled Condensers and two Chillers feeding two “N” CRAH units and one “+1” CRAH unit. Notes on diagram: “N + 1” Chillers, Pumps and Condensers; pipe loops recommended. Legend: Pump, Indoor Heated Water, Indoor Cooled Water, Outdoor Heated Water, Outdoor Cooled Water]
Concurrently maintainable and operable
Mechanical Class F4
[Diagram: three Air-Cooled Condensers and three Chillers feeding two “N” CRAH units, one “+1” CRAH unit and one “+2” CRAH unit. Notes on diagram: “N + 1” Chillers, Pumps and Condensers; pipe loops required. Legend: Pump, Indoor Heated Water, Indoor Cooled Water, Outdoor Heated Water, Outdoor Cooled Water]
Fault tolerant
Telecommunication Class C0 & C1
SP – Service Provider
MH – Maintenance Hole
ER – Entrance Room
MDA – Main Distribution Area
HDA – Horizontal Distribution Area
EDA – Equipment Distribution Area
[Diagram: elements shown are 1 x SP, 1 x MH, 1 x ER, 1 x MDA, 2 x HDA and 4 x EDA, with the MDA, HDAs and EDAs inside the computer room, plus a TR and a work area]
Single path multiple ducts from property line
Telecommunication Class C2
[Diagram: elements shown are 2 x SP, 2 x MH, 1 x ER, 1 x MDA, 2 x HDA and 4 x EDA, with the MDA, HDAs and EDAs inside the computer room, plus a TR and a work area]
Redundant and diverse multipath from the property line
Telecommunication Class C3
[Diagram: elements shown are 3 x SP, 2 x MH, 2 x ER, 2 x MDA, 2 x HDA and 4 x EDA, with the MDAs, HDAs and EDAs inside the computer room, plus a TR and a work area]
Redundant and diverse multipath from the property line to each HDA
Telecommunication Class C4
[Diagram: elements shown are 3 x SP, 2 x MH, 2 x ER, 2 x MDA, 4 x HDA and 4 x EDA, with the MDAs, HDAs and EDAs inside the computer room, plus a TR and a work area]
Redundant and diverse multipath from the property line to each EDA
Network Class N0 & N1
• Internet – Access from a single provider via a single link
• WAN/MAN – Single link from one service provider
• LAN/SAN – Single link connections throughout the network
[Diagram: WAN and Internet service providers, Edge WAN, Edge Internet, Core, Aggregation, two Access switches, Servers]
Network Class N2
• Internet – Two service providers with a single link or one service provider with two links
• WAN/MAN – Non-redundant circuits from two service providers or redundant circuits from a single provider
• LAN/SAN – Single link connections throughout the network with redundant critical components
[Diagram: WAN and Internet service providers, Edge WAN, Edge Internet, Core, Aggregation, two Access switches, Servers]
Network Class N3
• Internet – Two service providers with a single link or one service provider with two links
• WAN/MAN – Non-redundant circuits from more than two service providers or redundant circuits from a single provider
• LAN/SAN – Redundant links and components from access switches
[Diagram: two sets of service provider connections (WAN, Internet), each with Edge WAN, Edge Internet, Core and Aggregation; two Access switches; Servers]
Network Class N4
• Internet – Two service providers with redundant links
• WAN/MAN – Multiple circuits from more than two service providers with redundant circuits
• LAN/SAN – Redundant links, components and chassis
[Diagram: two sets of service provider connections (WAN, Internet), each with Edge WAN, Edge Internet, Core and Aggregation; two Access switches; Servers]
System Class S0 & S1
• Systems are implemented on specific platforms
• Hardware dependent with no seamless failover or self healing
[Diagram: Application Server, Direct Attach Storage Device, Tape Backup]
Application specific hardware, direct attach storage
System Class S2
• Systems are implemented on specific platforms with mirrored applications
• Failure recovery through failover to redundant systems
[Diagram: Load Balancing Services For Applications, two mirrored Application Servers, Network Attach Storage, Tape Backup]
Application specific redundant hardware with mirrored application
System Class S3
• Application specific hardware dependent or virtualized with mirrored applications
• Network attached storage with mirrored data on redundant systems
[Diagram: Load Balancing Services For Applications, two mirrored Application Servers, two mirrored Disk Arrays, two mirrored Network Attach Storage units]
Hardware dependent or virtualized specific processing platforms
System Class S4
• Location transparent, virtualized systems or hardware dependent grid
• Network attached storage with mirrored data on redundant systems and automated data management
[Diagram: Load Balancing Services For Applications, four Application Servers (Location Transparent Processing), two mirrored Disk Arrays (Synchronous/Asynchronous), two Network Attach Storage units]
Location transparent, virtualized or grid platforms
THE BICSI DIFFERENCE
ANSI/BICSI 002-2014 Data Center Design and
Implementation Best Practices covers:
• Site selection
• Space planning
• Architectural
• Structural
• Electrical Systems
• Mechanical Systems
• Fire Protection
• Security
• Management and building systems (DCIM, BMS, ESS)
• Telecommunications
• Information Technology
• Commissioning
• Design Process
• Reliability and availability
• Applications and Systems
• Service Outsourcing
• Multi-data center
• Testing
• Energy Efficiency
Thank You!
Bob Camerino RCDD DCDC CT
Secretary BICSI Mainland Europe District
Owner and Principal Engineer Innovative Technical Solutions