Mass Storage Structure
Magnetic Disks
Magnetic disks provide the bulk of secondary storage for modern computer systems.
The two surfaces of a platter are covered with a magnetic material. We store information by
recording it magnetically on the platters.
The heads are attached to a disk arm that moves all the heads as a unit.
The surface of a platter is logically divided into circular tracks, which are subdivided into sectors.
The set of tracks that are at one arm position makes up a cylinder.
There may be thousands of concentric cylinders in a disk drive, and each track may contain
hundreds of sectors.
Most drives rotate 60 to 250 times per second, specified in terms of rotations per minute (RPM).
The transfer rate is the rate at which data flow between the drive and the computer.
Access time has two major components:
the time necessary to move the disk arm to the desired cylinder, called the seek time, and
the time necessary for the desired sector to rotate to the disk head, called the rotational latency.
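The latency figures above can be illustrated with a short calculation: the average rotational latency is half the time of one full rotation. The RPM values below are common spindle speeds chosen for illustration, not figures from the text:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in ms: half of one full rotation at `rpm`."""
    seconds_per_rotation = 60.0 / rpm
    return (seconds_per_rotation / 2) * 1000

# Common spindle speeds (illustrative values):
for rpm in (5400, 7200, 15000):
    print(f"{rpm} RPM -> {avg_rotational_latency_ms(rpm):.2f} ms")
```

So a 7200 RPM drive averages about 4.17 ms of rotational latency before the desired sector reaches the head.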
Although the disk platters are coated with a thin protective layer, the head will sometimes damage
the magnetic surface.
This accident is called a head crash. A head crash normally cannot be repaired; the entire disk must
be replaced.
Disk Scheduling
The operating system is responsible for using hardware efficiently — for the disk
drives, this means having a fast access time and disk bandwidth
Disk bandwidth is the total number of bytes transferred, divided by the total time
between the first request for service and the completion of the last transfer
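The bandwidth definition above translates directly into code; the 64 MiB over 0.5 s figures below are made-up example numbers:

```python
def disk_bandwidth(total_bytes, first_request_time, last_completion_time):
    """Total bytes transferred divided by the time from the first request
    for service to the completion of the last transfer."""
    return total_bytes / (last_completion_time - first_request_time)

# e.g. 64 MiB transferred over 0.5 s of queued service:
print(disk_bandwidth(64 * 2**20, 0.0, 0.5))  # bytes per second
```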
Disk Scheduling (Cont.)
There are many sources of disk I/O requests:
OS
System processes
User processes
I/O request includes input or output mode, disk address, memory address, number of sectors to
transfer
OS maintains queue of requests, per disk or device
Idle disk can immediately work on I/O request, busy disk means work must queue
Optimization algorithms only make sense when a queue exists
Several algorithms exist to schedule the servicing of disk I/O requests
The analysis is true for one or many platters
We illustrate scheduling algorithms with a request queue (cylinders 0-199):
98, 183, 37, 122, 14, 124, 65, 67
Head pointer at cylinder 53
FCFS
Serving requests in arrival order gives total head movement of 640 cylinders for the example queue
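A minimal FCFS sketch; the queue values below are the classic textbook example for a 0-199 disk with the head at cylinder 53, which yields the 640-cylinder total quoted above:

```python
def fcfs_head_movement(queue, head):
    """Total cylinder movement when requests are served in arrival order."""
    total = 0
    for target in queue:
        total += abs(target - head)
        head = target
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(queue, 53))  # 640
```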
SSTF
Shortest Seek Time First selects the request with the minimum seek time from the
current head position
SSTF scheduling is a form of SJF scheduling and may cause starvation of some requests.
Note that if new requests keep arriving near the current head position, requests at the other end of the disk wait the longest.
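SSTF amounts to a greedy nearest-request loop. A sketch using the same classic example queue (head at 53), which reduces total movement to 236 cylinders:

```python
def sstf_head_movement(queue, head):
    """Repeatedly serve the pending request closest to the current head."""
    pending = list(queue)
    total = 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

print(sstf_head_movement([98, 183, 37, 122, 14, 124, 65, 67], 53))  # 236
```

Note the greedy choice is what opens the door to starvation: a steady stream of nearby requests keeps distant ones in `pending` indefinitely.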
SCAN
The disk arm starts at one end of the disk and moves toward the other end, servicing requests as it reaches each cylinder, until it gets to the other end; there the direction of head movement is reversed and servicing continues.
Sometimes called the elevator algorithm
C-SCAN
Provides a more uniform wait time than SCAN
The head moves from one end of the disk to the other, servicing requests as it goes
When it reaches the other end, however, it immediately returns to the beginning of
the disk, without servicing any requests on the return trip
Treats the cylinders as a circular list that wraps around from the last cylinder to the first
one
More commonly, the arm goes only as far as the final request in each direction; then it reverses direction immediately, without going all the way to the end of the disk.
LOOK is a version of SCAN, and C-LOOK a version of C-SCAN; they look for a request before continuing to move in a given direction.
The arm goes only as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
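LOOK and C-LOOK can be sketched by ordering the pending requests into sweeps. The queue is again the classic example (head at 53); the initial sweep direction is assumed to be upward, which the text does not specify:

```python
def look(queue, head):
    """LOOK: sweep up to the last request, then reverse and sweep down."""
    up = sorted(c for c in queue if c >= head)
    down = sorted((c for c in queue if c < head), reverse=True)
    total, pos = 0, head
    for c in up + down:
        total += abs(c - pos)
        pos = c
    return total

def c_look(queue, head):
    """C-LOOK: sweep up to the last request, jump to the lowest pending
    request, and continue sweeping upward."""
    up = sorted(c for c in queue if c >= head)
    down = sorted(c for c in queue if c < head)
    total, pos = 0, head
    for c in up + down:
        total += abs(c - pos)
        pos = c
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(look(queue, 53))    # 299
print(c_look(queue, 53))  # 322
```

Note that `c_look` counts the long return jump as head movement; real drives make that repositioning seek quickly, which is why C-LOOK's more uniform wait times can be worth its larger raw movement total.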
Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal because it increases performance over FCFS.
SCAN and C-SCAN perform better for systems that place a heavy load on the disk, with less starvation
RAID Structure
Storing redundant information on multiple disks means that failure of one disk does not lead to loss of data. A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), are commonly used to address these performance and reliability issues.
In the past, RAIDs composed of small, cheap disks were viewed as a cost-effective alternative to
large, expensive disks.
Today, RAIDs are used for their higher reliability and higher data-transfer rate, rather than for
economic reasons.
Hence, the I in RAID, which once stood for “inexpensive,” now stands for “independent.”
RAID Levels
In RAID 0 + 1, RAID 0 provides the performance, while RAID 1 provides the reliability. Generally, this combination provides better performance than RAID 5.
Selecting a RAID Level
RAID level 0 is used in high-performance applications where data loss is not critical.
RAID level 1 is popular for applications that require high reliability with fast recovery.
RAID 0 + 1 and 1 + 0 are used where both performance and reliability are
important—for example, for small databases.
Due to RAID 1’s high space overhead, RAID 5 is often preferred for storing
large volumes of data.
RAID level 6 is not currently supported by many RAID implementations, but it should offer better reliability than level 5.
Rebuilding is easiest for RAID level 1, since data can be copied from
another disk. For the other levels, we need to access all the other disks in
the array to rebuild data in a failed disk.
For RAID 5, rebuilding a large disk set can take hours.
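The parity-based rebuild described above can be sketched with XOR over a stripe: RAID 5 stores, for each stripe, the XOR of its data blocks, so a failed disk's block is the XOR of all surviving blocks plus parity. The block contents below are made-up illustrative data:

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"disk", b"one!", b"two."]   # three data blocks in one stripe
parity = xor_blocks(data)            # parity block, stored on a fourth disk

# Disk holding "one!" fails; rebuild its block from survivors + parity:
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'one!'
```

This is also why non-mirrored rebuilds are slow: every surviving disk in the array must be read in full to recompute the lost blocks.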
Problems with RAID
Unfortunately, RAID does not always assure that data are available for the
operating system and its users.
A pointer to a file could be wrong, for example, or pointers within the file
structure could be wrong.
Incomplete writes, if not properly recovered, could result in corrupt data.
Some other process could accidentally write over a file system’s structures,
too.
RAID protects against physical media errors, but not other hardware and
software errors.
The landscape of software and hardware bugs is vast, and so are the potential perils for data on a system.