Computer and Software Reliability

Uploaded by

chanelldraws

A Computer System

Possible Faults in a Computer System


1. Fault-tolerant computing
a. Faults in computer hardware
b. Faults in computer software

2. Fault-tolerant control
a. Faults in sensors/effectors (actuators)
b. Faults in application (system or plant)

3. Fault-tolerant communication
a. Faults in communication link (network)

4. Faults induced by operator – human errors

Overall, computer systems that can handle the above faults are referred to as Fault-Tolerant
Computer Systems, or in general, as Fault-Tolerant Systems.

Availability = A_hardware × A_software × A_humans × A_interfaces × A_process
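
As a rough illustration of the product formula above, the sketch below multiplies the availability of each contributor. The numeric values are invented purely to show the arithmetic, and the function name is hypothetical. The product form assumes that every contributor is needed for the system to be up and that they fail independently.

```python
from math import prod


def system_availability(components):
    """Multiply the availability of each contributor (each value in [0, 1])."""
    for name, a in components.items():
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability of {name} must be in [0, 1], got {a}")
    return prod(components.values())


# Hypothetical figures, not taken from the source.
example = {
    "hardware": 0.999,
    "software": 0.995,
    "humans": 0.990,
    "interfaces": 0.998,
    "process": 0.997,
}

overall = system_availability(example)
print(f"Overall availability: {overall:.4f}")
```

Because every factor is at most 1, the overall availability can never exceed that of the weakest contributor.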

Generic Sources of Faults


• Mechanical – "wears out"
  - Deterioration: wear, fatigue, corrosion
  - Shock: fractures, stiction, overload
• Electronic hardware – "bad fabrication; wears out"
  - Latent manufacturing defects
  - Operating environment: noise, heat, electro-migration
  - Design defects
• Software – "bad design"
  - Design defects
  - "Code rot": accumulated run-time faults
• People
  - Human-made errors

Failures, Faults and the Cause-Effect Sequence

• Faults
  - Connotation: defects within the system (hardware or software)
  - Cause-effect sequence: deviation of function from design value
• Errors
  - Connotation: deviation from the required operation
  - Cause-effect sequence: manifestation of a fault by an incorrect value
• Failures
  - Connotation: the system fails to perform its required operation (the inability of a component or system to perform its intended function for a specified time under specified environmental conditions)
  - Random: unpredictable; occurs in hardware but not in software
  - Systematic: results from erroneous design
  - Single-point failure: failure of a single component causes system failure
  - Multiple failures: failure of several components leads to system failure

Dealing with faults
• During development: fault avoidance and removal
• During operation: fault detection, removal and tolerance

Steps in Developing Reliable Computer Systems: Design Methodology

• Specification
  - Functional requirements
  - Reliability requirements
• System Design and Implementation
  - Architectural/module design
  - Construction, integration, and test
• Verification and Validation (V&V)
  - Verification: the process of determining that the system meets its specification
  - Validation: the process of determining that the system is appropriate for its purpose
• Certification

Methods of Improving Computer System Reliability

Primary Design Techniques
• Fault avoidance
  - Prevent faults from occurring in the first place, e.g. design review
• Fault masking
  - Localize the fault and prevent the error from propagating into the system's information structure, e.g. error-correcting codes, triple modular redundancy (TMR)
• Fault tolerance
  - Enable the system to continue performing its tasks in the presence of faults, e.g.
    - redundancy techniques (hardware, software, information, and time redundancy)
    - fault detection
    - reconfiguration
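
Triple modular redundancy, mentioned above as a masking technique, can be sketched as three replicas of the same computation feeding a majority voter. This is a minimal software illustration only; the function names and the simulated faulty replica are assumptions made for the example, and real TMR is usually applied to hardware modules.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value that at least two of the three replicas agree on.

    Raises RuntimeError when all three disagree, since no majority exists.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three replicas disagree")
    return value


def run_tmr(replicas, x):
    """Run three replicas of the same computation and vote on the results."""
    results = [replica(x) for replica in replicas]
    return tmr_vote(*results)


# Hypothetical replicas: two healthy modules and one with a stuck output.
def healthy(x):
    return x * x


def faulty(x):
    return -1  # simulated fault: always returns a wrong value


print(run_tmr([healthy, faulty, healthy], 7))  # prints 49; the faulty result is outvoted
```

The voter masks any single faulty replica. If two replicas fail in the same way, the wrong value wins the vote, so the scheme relies on faults being independent.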

MTBF: Mean Time Between Failures
MTTD: Mean Time To Detection
MTTR: Mean Time To Repair
MTTF: Mean Time To Failure
Failure Rate: The expected number of failures of a type of device or system per given time period.
Fault Coverage: A measure of a system's ability to perform fault detection, fault location, fault containment, and/or fault recovery.
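
The notes list these measures without relating them. The sketch below estimates them from a small, invented failure log under common textbook conventions that are assumptions here, not statements from the source: MTBF = MTTF + MTTR for a repairable system, steady-state availability = MTTF / (MTTF + MTTR), and failure rate = 1 / MTTF when failures occur at a constant rate. The function name and the data are hypothetical.

```python
def reliability_metrics(uptimes_h, repair_times_h):
    """Estimate MTTF, MTTR, MTBF, failure rate and availability.

    uptimes_h: hours of operation before each failure
    repair_times_h: hours spent restoring service after each failure
    """
    if not uptimes_h or not repair_times_h:
        raise ValueError("need at least one uptime and one repair time")
    mttf = sum(uptimes_h) / len(uptimes_h)
    mttr = sum(repair_times_h) / len(repair_times_h)
    return {
        "MTTF": mttf,
        "MTTR": mttr,
        "MTBF": mttf + mttr,                   # assumed convention
        "failure_rate": 1.0 / mttf,            # failures per hour, constant-rate assumption
        "availability": mttf / (mttf + mttr),  # steady-state estimate
    }


# Hypothetical log: three failures, each followed by a repair.
metrics = reliability_metrics([900.0, 1100.0, 1000.0], [2.0, 4.0, 3.0])
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```

With these figures the mean uptime is 1000 hours and the mean repair time is 3 hours, so the estimated availability is 1000 / 1003.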

Basic Steps in Fault Handling

• Fault confinement
• Fault detection
• Fault masking
• Retry
• Diagnosis
• Reconfiguration
• Recovery
• Restart
• Repair
• Reintegration
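
Three of the steps above (detection, retry and reconfiguration) can be sketched in a few lines. This is only an illustration under stated assumptions: a fault is detected as a raised exception, retry means calling the same unit again, and reconfiguration means routing the request to a spare unit. All names are hypothetical, and the remaining steps (diagnosis, repair, reintegration and so on) are not modeled.

```python
def call_with_fault_handling(primary, spare, request, max_retries=2):
    """Detect a fault, retry the primary unit, then switch to the spare.

    Returns (result, log), where log records which steps were taken.
    """
    log = []
    for attempt in range(1 + max_retries):
        try:
            result = primary(request)
            log.append(f"primary ok on attempt {attempt + 1}")
            return result, log
        except Exception as exc:  # detection: any exception counts as a fault
            log.append(f"detected fault on attempt {attempt + 1}: {exc}")
    # Reconfiguration: retries are exhausted, so use the spare unit instead.
    log.append("reconfigured to spare")
    return spare(request), log


def make_flaky(failures):
    """Hypothetical unit that fails `failures` times, then works."""
    state = {"left": failures}

    def unit(x):
        if state["left"] > 0:
            state["left"] -= 1
            raise IOError("transient fault")
        return x + 1

    return unit


def spare_unit(x):
    return x + 1


# A transient fault is cleared by a retry; a permanent one triggers the spare.
transient_result, transient_log = call_with_fault_handling(make_flaky(1), spare_unit, 41)
permanent_result, permanent_log = call_with_fault_handling(make_flaky(10), spare_unit, 41)
print(transient_result, transient_log)
print(permanent_result, permanent_log)
```

Retry is a form of time redundancy and suits transient faults; switching to the spare is what keeps the service running when the fault is permanent.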

SOFTWARE MAINTENANCE
Software maintenance is the process of modifying a software system or component after delivery to correct faults,
improve performance or other attributes, or adapt to a changed environment.
