File Structures
An Object-Oriented
Approach with C++
CHAPTER 1
Introduction to the
Design and Specification
of File Structures
Disks are slow. They are also technological marvels: one can pack
thousands of megabytes on a disk that fits into a notebook computer. Only
a few years ago, disks with that kind of capacity looked like small washing
machines. However, relative to other parts of a computer, disks are slow.
How slow? The time it takes to get information back from even
relatively slow electronic random access memory (RAM) is about 120
nanoseconds, or
120 billionths of a second. Getting the same information from a typical
disk
might take 30 milliseconds, or 30 thousandths of a second. To understand
the size of this difference, we need an analogy. Assume that memory
access is like finding something in the index of this book. Let’s say that
this local, book-in-hand access takes 20 seconds. Assume that accessing
a disk is like sending to a library for the information you cannot find here
in this book. Given that our “memory access” takes 20 seconds, how long
does the “disk access” to the library take, keeping the ratio the same as that
of a real memory access and disk access? The disk access is a quarter of a
million times longer than the memory access. This means that
getting information back from the library takes 5 million seconds, or
almost 58 days. Disks are very slow compared with memory.
On the other hand, disks provide enormous capacity at much less cost
than memory. They also keep the information stored on them when they
are turned off. The tension between a disk’s relatively slow access time
and its enormous, nonvolatile capacity is the driving force behind file
structure design. Good file structure design will give us access to all the
capacity without making our applications spend a lot of time waiting for
the disk.
A file structure is a combination of representations for data in files and
of operations for accessing the data. A file structure allows applications to
read, write, and modify data. It might also support finding the data that
matches some search criteria or reading through the data in some
particular order. An improvement in file structure design may make an
application hundreds of times faster. The details of the representation of
the data and the implementation of the operations determine the efficiency
of the file structure for particular applications.
A tremendous variety in the types of data and in the needs of applications
makes file structure design very important. What is best for one situation may
be terrible for another.
Our goal is to show you how to think creatively about file structure design
problems. Part of our approach draws on history: after introducing basic
principles of design, we devote the last part of this book to studying some of
the key developments in file design over the last thirty years. The problems that
researchers struggle with reflect the same issues that you confront in addressing
any substantial file design problem. Working through the approaches to major
file design issues shows you a lot about how to approach new design problems.
The general goals of research and development in file structures can be
drawn directly from our library analogy.
■ Ideally, we would like to get the information we need with one access to
the disk. In terms of our analogy, we do not want to issue a series of
fifty-eight-day requests before we get what we want.
■ If it is impossible to get what we need in one access, we want structures
that allow us to find the target information with as few accesses as possible.
It is relatively easy to come up with file structure designs that meet these goals when
we have files that never change. Designing file structures that maintain these qualities
as files change, grow, or shrink when information is added and deleted is much more
difficult.
Early work with files presumed that files were on tape, since most files were.
Access was sequential, and the cost of access grew in direct proportion to the size of the file. As files grew
intolerably large for unaided sequential access and as storage devices such as disk drives
became available, indexes were added to files. The indexes made it possible to keep a list
of keys and pointers in a smaller file that could be searched more quickly. With the key
and pointer, the user had direct access to the large, primary file.
Unfortunately, simple indexes had some of the same sequential flavor as the data
files, and as the indexes grew, they too became difficult to manage, especially for
dynamic files in which the set of keys changes. Then, in the early 1960s, the idea of
applying tree structures emerged. Unfortunately, trees can grow very unevenly as
records are added and deleted, resulting in long searches requiring many disk accesses
to find a record.
In 1963 researchers developed an elegant, self-adjusting binary tree
structure, called an AVL tree, for data in memory. Other researchers began to look for
ways to apply AVL trees, or something like them, to files. The problem was that even
with a balanced binary tree, dozens of accesses were required to find a record in even
moderate-sized files. A method was needed to keep a tree balanced when each node of
the tree was not a single record, as in a binary tree, but a file block containing dozens,
perhaps even hundreds, of records.
It took nearly ten more years of design work before a solution emerged in the form
of the B-tree. Part of the reason finding a solution took so long was that the approach
required for file structures was very different from the approach that worked in memory.
Whereas AVL trees grow from the top down as records are added, B-trees grow from
the bottom up.
B-trees provided excellent access performance, but there was a cost: no longer
could a file be accessed sequentially with efficiency. Fortunately, this problem was
solved almost immediately by adding a linked list structure at the bottom level of the B-
tree. The combination of a B-tree and a sequential linked list is called a B+ tree.
Over the next ten years, B-trees and B+ trees became the basis for many
commercial file systems, since they provide access times that grow in
proportion to log_k N, where N is the number of entries in the file and k is the number of
entries indexed in a single block of the B-tree structure. In practical terms, this means
that B-trees can guarantee that you can find one file entry among millions of others
with only three or four trips to the disk. Further, B-trees guarantee that as you add and
delete entries, performance stays about the same.
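To see why three or four trips suffice, here is a quick check of the log_k N bound; the values k = 512 entries per block and N = 1 000 000 entries are illustrative assumptions, not figures from any particular system.

#include <cmath>
#include <iostream>

int main()
{
    double N = 1000000.0;                        // entries in the file (assumed)
    double k = 512.0;                            // entries per B-tree block (assumed)
    double levels = std::log(N) / std::log(k);   // log base k of N
    std::cout << "about " << levels << " levels" << std::endl;   // roughly 2.2
    return 0;
}

Rounding up, a search touches about three levels of the tree, which is consistent with the three or four disk accesses cited above.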
Being able to retrieve information with just three or four accesses is
pretty good. But how about our goal of being able to get what we want with a
single request? An approach called hashing is_a.good .way to do that with
files that do not change size greatiX-Qxenfime. From early on, hashed indexes
were used to provide fast access to files. However, until recently, hashing did
not work well with volatile, dynamic files. After, the development of B-trees,
researchers turned to work on systems for extendible, dynamic hashing that
could retrieve information with one or, at most, two disk accesses no matter
how big the file became.
The symbol :: is the scope resolution operator. In this case, it tells us that
Person () is a method of class Person. Notice that within the method code, the
members can be referenced without the dot (.) operator. Every call on a
member function has a pointer to an object as a hidden argument. The implicit
argument can be explicitly referred to with the keyword this. Within the
method, this->LastName is the same as LastName.
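A minimal sketch of the class being discussed may make this concrete; it assumes a single character-array member LastName, whereas the full class in the text may declare additional members.

class Person
{
  public:
    Person ();               // default constructor
    char LastName [11];
};

Person::Person ()            // Person:: marks this as a method of class Person
{
    // inside the method, members need no dot operator; this->LastName
    // and LastName name the same member of the hidden object argument
    this->LastName[0] = 0;
    LastName[0] = 0;         // equivalent to the line above
}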
Overloading of symbols in programming languages allows a particular
symbol to have more than one meaning. The meaning of each instance of the
symbol depends on the context. We are very familiar with overloading of
arithmetic operators to have different meanings depending on the operand type.
For example, the symbol + is used for both integer and floating point addition.
C++ supports the use of overloading by programmers for a wide variety of
symbols. We can create new meanings for operator symbols and for named
functions.
The following class String illustrates extensive use of overloading: there
are three constructors, and the operators = and == are overloaded with new
meanings:
class String
{
  public:
    String ();                               // default constructor
    String (const String &);                 // copy constructor
    String (const char *);                   // create from C string
    ~String ();                              // destructor
    String & operator = (const String &);    // assignment
    int operator == (const String &) const;  // equality
    operator char * ()                       // conversion to char *
      {return strdup(string);}               // inline body of method
  private:
    char * string;                           // represent value as C string
    int MaxLength;
};
The data members, string and MaxLength, of class String are in the
private section of the class. Access to these members is restricted. They can
be referenced only from inside the code of methods of the class. Hence, users
of String objects cannot directly manipulate these members. A conversion
operator (operator char *) has been provided to allow the use of the value of
a String object as a C string. The body of this operator is given inline, that is,
directly in the class definition. To protect the value of the String from direct
manipulation, a copy of the string value is returned. This operator allows a
String object to be used as a char *. For example, the following code creates
a String object s1 and copies its value to a normal C string:

String s1 ("abcdefg");   // uses String::String (const char *)
char str[10];
strcpy (str, s1);        // uses String::operator char * ()
The new definition of the assignment operator (operator =) replaces the
standard meaning, which in C and C++ is to copy the bit pattern of one object
to another. For two objects s1 and s2 of class String, s1 = s2 would copy the
value of s2.string (a pointer) to s1.string. Hence, s1.string and s2.string
point to the same character array. In essence, s1 and s2 become aliases. Once
the two fields point to the same array, a change in the string value of s1 would
also change s2. This is contrary to how we expect variables to behave. The
implementation of the assignment operator and an example of its use are:
String & String::operator = (const String & str)
{ // code for assignment operator
   strcpy (string, str.string);
   return *this;
}

String s1, s2;
s1 = s2;   // using overloaded assignment
In the assignment s1 = s2, the hidden argument (this) refers to s1, and the
explicit argument str refers to s2. The line strcpy (string, str.string); copies
the contents of the string member of s2 to the string member of s1. This
assignment operator does not create the alias problem that occurs with the
standard meaning of assignment.
To complete the class String, we add the copy constructor, which is used
whenever a copy of a string is needed, and the equality operator (operator ==),
which makes two String objects equal if the array contents are the same. The
predefined meaning for these operators performs pointer copy and pointer
comparison, respectively.
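A sketch of these two methods follows; it assumes the value is always kept as a null-terminated C string in the private member string, and it uses strdup and strcmp from the C string library. The implementations given later in the text may differ in detail.

String::String (const String & str)          // copy constructor
{
   string = strdup (str.string);             // copy the characters, not the pointer
   MaxLength = str.MaxLength;
}

int String::operator == (const String & str) const   // equality
{
   return strcmp (string, str.string) == 0;  // equal if the contents match
}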
CHAPTER 2
Fundamental File
Processing Operations
a physical file or that they come from the keyboard or some other input device.
Similarly, the bytes the program sends down the line might end up in a file, or
they could appear on the terminal screen. Although the program often doesn’t
know where bytes are coming from or where they are going, it does know
which line it is using. This line is usually referred to as the logical file to
distinguish it from the physical files on the disk or tape.
Before the program can open a file for use, the operating system must
receive instructions about making a hookup between a logical file (for
example, a phone line) and some physical file or device. When using oper-
ating systems such as IBM’s OS/MVS, these instructions are provided through
job control language (JCL). On minicomputers and microcomputers, more
modern operating systems such as Unix, MS-DOS, and VMS provide the
instructions within the program. For example, in Cobol,1 the association
between a logical file called inp_file and a physical file called myfile.dat is
made with the following statement:

select inp_file assign to "myfile.dat".

This statement asks the operating system to find the physical file named
myfile.dat and then to make the hookup by assigning a logical file (phone line) to
it. The number identifying the particular phone line that is assigned is
returned through the variable inp_file, which is the file's logical name. This
logical name is what we use to refer to the file inside the program. Again, the
telephone analogy applies: My office phone is connected to six telephone
lines. When I receive a call I get an intercom message such as, “You have a
call on line three.” The receptionist does not say, “You have a call from
918-123-4567.” I need to have the call identified logically, not physically.
1. These values are defined in an "include" file packaged with your Unix system or C compiler. The
name of the include file is often fcntl.h or file.h, but it can vary from system to system.
ready to start reading or writing. The file contents are not disturbed by the
open statement. Creating a file also opens the file in the sense that it is ready
for use after creation. Because a newly created file has no contents, writing is
initially the only use that makes sense.

As an example of opening an existing file or creating a new one in C and
C++, consider the function open, as defined in header file fcntl.h. Although
this function is based on a Unix system function, many C++ implementations
for MS-DOS and Windows, including Microsoft Visual C++, also support
open and the other parts of fcntl.h. This function takes two required
arguments and a third argument that is optional:

fd = open(filename, flags [, pmode]);

The return value fd and the arguments filename, flags, and pmode
have the following meanings:

Argument   Type     Explanation
fd         int      The file descriptor. Using our earlier analogy, this is
                    the phone line (logical file identifier) used to refer to the file
                    within the program. It is an integer. If there is an error in the
                    attempt to open the file, this value is negative.
filename   char *   A character string containing the physical file name.
                    (Later we discuss pathnames that include directory
                    information about the file's location. This argument can
                    be a pathname.)

(continued)
In terms of our telephone line analogy, closing a file is like hanging up the
phone. When you hang up the phone, the phone line is available for taking or
placing another call; when you close a file, the logical file name or file
descriptor is available for use with another file. Closing a file that has been used
for output also ensures that everything has been written to the file. As you will
learn in a later chapter, it is more efficient to move data to and from secondary
storage in blocks than it is to move data one byte at a time. Consequently, the
operating system does not immediately send off the bytes we write but saves
them up in a buffer for transfer as a block of data. Closing a file ensures that the
buffer for that file has been flushed of data and that everything we have written
has been sent to the file.
Files are usually closed automatically by the operating system when a
program terminates normally. Consequently, the execution of a close statement
within a program is needed only to protect it against data loss in the event that
the program is interrupted and to free up logical filenames for reuse.
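As a concrete sketch of the open and close calls discussed above, the program below assumes a Unix-style environment where open comes from fcntl.h and close from unistd.h; the file name and the pmode value 0644 are illustrative only.

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
   int fd = open("myfile.dat", O_RDWR | O_CREAT, 0644);  // open or create the file
   if (fd < 0) {                    // a negative descriptor signals an error
      perror("open");
      return 1;
   }
   /* ... read from or write to the file using fd ... */
   close(fd);                       // hang up the line: the descriptor becomes available for reuse
   return 0;
}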
Now that you know how to connect and disconnect programs to and from
physical files and how to open the files, you are ready to start sending and
receiving data.
Reading and writing are fundamental to file processing; they are the
actions that make file processing an input/output (I/O) operation. The
form of the read and write statements used in different languages varies.
Some languages provide very high-level access to reading and writing and
automatically take care of details for the programmer. Other languages
provide access at a much lower level. Our use of C and C++ allows us to
explore some of these differences.2
2. To accentuate the differences and view I/O operations at something close to a systems level, we use the
fread and fwrite functions in C rather than the higher-level functions such as fgetc, fgets, and so on.
A Write statement is similar; the only difference is that the data moves in
the other direction:
Write(Destination_file, Source_addr, Size)
Destination_file   The logical file name that is used for sending the data.
Read and write operations are supported by functions fread, fget, fwrite, and
fput. Functions fscanf and fprintf are used for formatted input and output.
Stream classes in C++ support open, close, read, and write operations that
are equivalent to those in stdio.h, but the syntax is considerably different.
Predefined stream objects cin and cout represent the standard input and standard
output files. The main class for access to files, fstream, as defined in header
files iostream.h and fstream.h, has two constructors and a wide variety of
methods. The following constructors and methods are included in the class:
fstream ();                                      // leave the stream unopened
fstream (char * filename, int mode);
int open (char * filename, int mode);
int read (unsigned char * dest_addr, int size);
int write (unsigned char * source_addr, int size);
The argument filename of the second constructor and the method open are just
as we've seen before. These two operations attach the fstream to a file. The
value of mode controls the way the file is opened, like the flags and type
arguments previously described. The value is set with a bit-wise or of constants
defined in class ios. Among the options are ios::in (input), ios::out (output),
ios::nocreate (fail if the file does not exist), and ios::noreplace (fail if the file
does exist). One additional, nonstandard option, ios::binary, is supported on
many systems to specify that a file is binary. On MS-DOS systems, if
ios::binary is not specified, the file is treated as a text file. This can have some
unintended consequences, as we will see later.
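A short sketch of attaching an fstream to a file with a bit-wise or of mode constants follows. It assumes the pre-standard <fstream.h> interface described above (current standard C++ spells these <fstream> and std::ios), and the file name is illustrative only.

#include <fstream.h>

int main ()
{
   fstream file;                                        // declare unattached fstream
   file . open ("myfile.dat", ios::out | ios::binary);  // bit-wise or of mode constants
   if (file . fail ()) return 1;                        // the open did not succeed
   file . close ();
   return 0;
}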
A large number of functions are provided for formatted input and output.
The overloading capabilities of C++ are used to make sure that objects are
formatted according to their types. The infix operators >> (extraction) and
<< (insertion) are overloaded for input and output, respectively. The header file
iostream.h includes the following overloaded definitions of the insertion
operator (and many others):
The insertion operators are evaluated left to right, and each one returns its left
argument as the result. Hence, the stream cout has first the string "Value of n is
" inserted, using the fourth function in the list above, then the decimal value of
n, using the eighth function in the list. The last operand is the I/O manipulator
endl, which causes an end-of-line to be inserted. The insertion function that is
used for << endl is not in the list above. The header file iostream.h includes
the definition of endl and the operator that is used for this insertion.
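The statement under discussion is not reproduced in this excerpt; a minimal reconstruction of it, assuming an int variable n and written against the standard <iostream> header, looks like this:

#include <iostream>
using std::cout; using std::endl;

int main()
{
    int n = 14;                              // illustrative value
    cout << "Value of n is " << n << endl;   // string, then int, then end-of-line
    return 0;
}

Each << returns cout itself, which is what lets the insertions chain from left to right.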
Appendix C includes definitions and examples of many of the formatted
input and output operations.
// listc.cpp
// program using C streams to read characters from a file
// and write them to the terminal screen
#include <stdio.h>

main( ) {
   char ch;
   FILE * file;               // pointer to file descriptor
   char filename[20];

   printf("Enter the name of the file: ");   // Step 1
   gets(filename);                           // Step 2
   file = fopen(filename, "r");              // Step 3
   while (fread(&ch, 1, 1, file) != 0)       // Step 4a
      fwrite(&ch, 1, 1, stdout);             // Step 4b
   fclose(file);                             // Step 5
}
Figure 2.2 The file listing program using C streams (listc.cpp).
// listcpp.cpp
// list contents of file using C++ stream classes
#include <fstream.h>

main () {
   char ch;
   fstream file;              // declare unattached fstream
   char filename[20];

   cout << "Enter the name of the file: "    // Step 1
        << flush;             // force output
   cin >> filename;                          // Step 2
   file . open(filename, ios::in);           // Step 3
   file . unsetf(ios::skipws);   // include white space in read
   while (1)
   {
      file >> ch;                            // Step 4a
      if (file.fail()) break;
      cout << ch;                            // Step 4b
   }
   file . close();                           // Step 5
}

Figure 2.3 The file listing program using C++ stream classes (listcpp.cpp).
means: "Write to standard output the contents from memory starting at the
address &ch. Write only one element of one byte." Beginning C++
programmers should pay special attention to the use of the & symbol in the
fwrite call here. This particular call, as a very low-level call, requires that the
programmer provide the starting address in memory of the bytes to be
transferred.

Stdout, which stands for "standard output," is a pointer to a struct defined
in the file stdio.h, which has been included at the top of the program. The
concept of standard output and its counterpart standard input are covered
later in Section 2.8, "Physical and Logical Files."

Again the C++ stream code operates at a higher level. The right operand
of operator << is a character value. Hence a single byte is transferred to cout:

cout << ch;

As in the call to operator >>, C++ takes care of finding the address of the
bytes; the programmer need specify only the name of the variable ch that is
associated with that address.
necessary: when the next byte is read, the system knows where to get it. The
end_of_file function queries the system to see whether the read/write
pointer has moved past the last element in the file. If it has, end_of_file
returns true; otherwise it returns false. In Ada, it is necessary to call
end_of_file before trying to read the next byte. For an empty file,
end_of_file immediately returns true, and no bytes can be read.
2.5 Seeking
Source_file The logical file name in which the seek will occur.
Offset The number of positions in the file the pointer is to be
moved from the start of the file.
Now, if we want to move directly from the origin to the 373rd position in a file
called data, we don’t have to move sequentially through the first 372
positions. Instead, we can say
Seek(data, 373)
The following program fragment shows how you could use fseek to move to a
position that is 373 bytes into a file. The function has the prototype

fseek(FILE * file, long offset, int origin);

and the fragment uses it as follows:

long pos;
FILE * file;
pos = fseek(file, 373L, 0);
3. Although the values 0,1, and 2 are almost always used here, they are not guaranteed to work for all C
implementations. Consult your documentation.
Special Characters in Files
As you create the file structures described in this text, you may
encounter some difficulty with extra, unexpected characters that
turn up in your files, with characters that disappear, and with
numeric counts that are inserted into your files. Here are some
examples of the kinds of things you might encounter:
■ On many computers you may find that a Control-Z (ASCII
value of 26) is appended at the end of your files. Some
applications use this to indicate end-of-file even if you have not
placed it there. This is most likely to happen on MS-DOS
systems.
■ Some systems adopt a convention of indicating end-of-line in a
text file4 as a pair of characters consisting of a carriage return (CR: ASCII

4. When we use the term "text file" in this text, we are referring to a file consisting
entirely of characters from a specific standard character set, such as ASCII or
EBCDIC. Unless otherwise specified, the ASCII character set will be assumed.
Appendix B contains a table that describes the ASCII character set.
value of 13) and a line feed (LF: ASCII value of 10). Sometimes I/O
procedures written for such systems automatically expand single CR
characters or LF characters into CR-LF pairs. This unrequested addition
of characters can cause a great deal of difficulty. Again, you are most
likely to encounter this phenomenon on MS-DOS systems. Using flag
"b" in a C file or mode ios::binary in a C++ stream will suppress these
changes.
■ Users of larger systems, such as VMS, may find that they have just the
opposite problem. Certain file formats under VMS remove carriage
return characters from your file without asking you, replacing them with
a count of the characters in what the system has perceived as a line of text.
These are just a few examples of the kinds of uninvited modifications that
record management systems or I/O support packages might make to your
files. You will find that they are usually associated with the concepts of a line
of text or the end of a file. In general, these modifications to your files are an
attempt to make your life easier by doing things for you automatically. This
might, in fact, work out for those who want to do nothing more than store
some text in a file. Unfortunately, however, programmers building
sophisticated file structures must sometimes spend a lot of time finding ways
to disable this automatic assistance so they can have complete control over
what they are building. Forewarned is forearmed: readers who encounter these
kinds of difficulties as they build the file structures described in this text can
take some comfort from the knowledge that the experience they gain in
disabling automatic assistance will serve them well, over and over, in the
future.
data, and directories (Fig. 2.4). Since devices such as tape drives are also
treated like files in Unix, directories can also contain references to devices, as
shown in the dev directory in Fig. 2.4. The file name stored in a Unix
directory corresponds to what we call its physical name.
Since every file in a Unix system is part of the file system that begins with
the root, any file can be uniquely identified by giving its absolute pathname.
For instance, the true, unambiguous name of the file "addr" in Fig. 2.4 is
/usr6/mydir/addr. (Note that the / is used both to indicate the root directory and to
separate directory names from the file name.)
When you issue commands to a Unix system, you do so within a direc-
tory, which is called your current directory. A pathname for a file that does
not begin with a / describes the location of a file relative to the current
directory. Hence, if your current directory in Fig. 2.4 is mydir, addr uniquely
identifies the file /usr6/mydir/addr.
The special filename . stands for the current directory, and .. stands for
the parent of the current directory. Hence, if your current directory is
/usr6/mydir/DF, ../addr refers to the file /usr6/mydir/addr.
The logical file is represented by the value returned by the fopen call. We
assign this value to the variable file in Step 3. In Step 4b, we use the value
stdout, defined in stdio.h, to identify the console as the file to be written to.

There are two other files that correspond to specific physical devices in
most implementations of C streams: the keyboard is called stdin (standard
input), and the error file is called stderr (standard error). Hence, stdin is the
keyboard on your terminal. The statement

fread(&ch, 1, 1, stdin);

reads a single character from your terminal. Stderr is an error file which, like
stdout, is usually just your console. When your compiler detects an error, it
generally writes the error message to this file, which normally means that the
error message turns up on your screen. As with stdout, the values stdin and
stderr are usually defined in stdio.h.
Steps 1 and 2 of the file listing program also involve reading and writing
from stdin or stdout. Since an enormous amount of I/O involves these
devices, most programming languages have special functions to perform
console input and output—in listc.cpp, the C functions printf and gets are
used. Ultimately, however, printf sends its output through stdout, and gets
takes its input through stdin. But these statements hide important elements of the
I/O process. For our purposes, the second set of read and write statements is
more interesting and instructive.
5. Strictly speaking, I/O redirection and pipes are part of a Unix shell, which is the
command interpreter that sits on top of the core Unix operating system, the kernel.
For the purpose of this discussion, this distinction is not important.
What if, instead of storing the output from the list program in a file, you
wanted to use it immediately in another program to sort the results? Pipes let
you do this. The notation for a pipe in Unix and in MS-DOS is |. Hence,

program1 | program2

means take any stdout output from program1 and use it in place of any stdin
input to program2. Because Unix has a special program called sort, which
takes its input from stdin, you can sort the output from the list program,
without using an intermediate file, by entering

list | sort
Since sort writes its output to stdout, the sorted listing appears on your terminal
screen unless you use additional pipes or redirection to send it elsewhere.
Unix, like all operating systems, has special names and values that
you must use when performing file operations. For example, some
C functions return a special value indicating end-of-file (EOF)
when you try to read beyond the end of a file.
Recall the flags that you use in an open call to indicate
whether you want read-only, write-only, or read/write access.
Unless we know just where to look, it is often not easy to find
where these values are defined. Unix handles the problem by
putting such definitions in special header files, which can be
found in special directories such as /usr/include.

Header files relevant to the material in this chapter are stdio.h,
iostream.h, fstream.h, fcntl.h, and file.h. The C streams are
in stdio.h; C++ streams are in iostream.h and fstream.h. Many
Unix operations are in fcntl.h and file.h. EOF, for instance, is
defined on many Unix and MS-DOS systems in stdio.h.
CHAPTER 3
Secondary Storage and
System Software
3.1 Disks
Compared with the time it takes to access an item in memory, disk accesses are
always expensive. However, not all disk accesses are equally expensive. This
has to do with the way a disk drive works. Disk drives1 belong to a class of
devices known as direct access storage devices (DASDs) because they make it
possible to access data directly. DASDs are contrasted with serial devices, the
other major class of secondary storage devices. Serial devices use media such
as magnetic tape that permit only serial access, which means that a particular
data item cannot be read or written until all of the data preceding it on the tape
have been read or written in order.
Magnetic disks come in many forms. So-called hard disks offer high
capacity and low cost per bit. Hard disks are the most common disk used in
everyday file processing. Floppy disks are inexpensive, but they are slow and
hold relatively little data. Floppies are good for backing up individual files or
other floppies and for transporting small amounts of data. Removable disks use
disk cartridges that can be mounted on the same drive at different times,
providing a convenient form of backup storage that also makes it possible to
access data directly. The Iomega Zip (100 megabytes per cartridge) and Jaz (1
gigabyte per cartridge) have become very popular among PC users.
Nonmagnetic disk media, especially optical discs, are becoming
increasingly important for secondary storage. (See Sections 3.4 and 3.5 and
Appendix A for a full treatment of optical disc storage and its applications.)
1. When we use the terms disks or disk drives, we are referring to magnetic disk media.
Disk drives typically have a number of platters. The tracks that are directly
above and below one another form a cylinder (Fig. 3.3). The significance of
the cylinder is that all of the information on a single cylinder can
be accessed without moving the arm that holds the read/write heads. Moving
this arm is called seeking. This arm movement is usually the slowest part of
reading information from a disk.
The amount of data that can be held on a track and the number of tracks
on a surface depend on how densely bits can be stored on the disk surface.
(This in turn depends on the quality of the recording medium and the size of
the read/write heads.) In 1991, an inexpensive, low-density disk held about 4
kilobytes on a track and 35 tracks on a 5-inch platter. In 1997, a Western
Digital Caviar 850-megabyte disk, one of the smallest disks being
manufactured, holds 32 kilobytes per track and 1,654 tracks on each surface
of a 3-inch platter. A Seagate Cheetah high performance 9-gigabyte disk (still
3-inch platters) can hold about 87 kilobytes on a track and 6526 tracks on a
surface. Table 3.1 shows how a variety of disk drives compares in terms of
capacity, performance, and cost.
Since a cylinder consists of a group of tracks, a track consists of a group
of sectors, and a sector consists of a group of bytes, it is easy to compute
track, cylinder, and drive capacities.
Track capacity = number of sectors per track × bytes per sector

Cylinder capacity = number of tracks per cylinder × track capacity

Drive capacity = number of cylinders × cylinder capacity
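As a small sketch of these formulas in code: the sector and track counts below echo the worked example that follows, while the other geometry values are assumptions chosen only for illustration.

#include <iostream>

int main()
{
    long bytesPerSector    = 512;     // assumed geometry, for illustration
    long sectorsPerTrack   = 63;
    long tracksPerCylinder = 16;
    long cylinders         = 4092;

    long trackCapacity      = sectorsPerTrack * bytesPerSector;
    long cylinderCapacity   = tracksPerCylinder * trackCapacity;
    long long driveCapacity = (long long) cylinders * cylinderCapacity;

    std::cout << "Track: " << trackCapacity << " bytes, "
              << "Cylinder: " << cylinderCapacity << " bytes, "
              << "Drive: " << driveCapacity << " bytes" << std::endl;
    return 0;
}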
50 000 / 2 = 25 000 sectors

One cylinder can hold

63 × 16 = 1008 sectors

so the number of cylinders required is approximately

25 000 / 1008 = 24.8 cylinders
Of course, it may be that a disk drive with 24.8 cylinders of available space
does not have 24.8 physically contiguous cylinders available. In this likely
case, the file might, in fact, have to be spread out over dozens, perhaps
even hundreds, of cylinders.
When you want to read a series of sectors that are all in the same track,
one right after the other, you often cannot read adjacent sectors. After reading
the data, it takes the disk controller a certain amount of time to process the
received information before it is ready to accept more. If logically adjacent
sectors were placed on the disk so they were also physically adjacent, we
would miss the start of the following sector while we were processing the one
we had just read in. Consequently, we would be able to read only one sector
per revolution of the disk.
I/O system designers have approached this problem by interleaving the
sectors: they leave an interval of several physical sectors between logically
adjacent sectors. Suppose our disk had an interleaving factor of 5. The
assignment of logical sector content to the thirty-two physical sectors in a
track is illustrated in Fig. 3.4(b). If you study this figure, you can see that it
takes five revolutions to read the entire thirty-two sectors of a track. That is a
big improvement over thirty-two revolutions.
In the early 1990s, controller speeds improved so that disks can now
offer 1:1 interleaving. This means that successive sectors are physically
adjacent, making it possible to read an entire track in a single revolution of the
disk.
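The following sketch shows one plausible way to place logically adjacent sectors five physical positions apart on a thirty-two-sector track; Fig. 3.4(b) may use a different but equivalent assignment. Reading the thirty-two logical sectors in order under this layout takes five passes over the track, matching the five revolutions described above.

#include <iostream>

int main()
{
    const int sectors = 32, interleave = 5;   // values from the example above
    int logicalAt[sectors];                   // logical sector stored at each physical slot

    for (int logical = 0; logical < sectors; logical++)
        logicalAt[(logical * interleave) % sectors] = logical;

    for (int p = 0; p < sectors; p++)         // print the layout, physical slot by slot
        std::cout << logicalAt[p] << ((p == sectors - 1) ? '\n' : ' ');
    return 0;
}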
Clusters
Extents
Our final view of sector organization represents a further attempt to emphasize
physical contiguity of sectors in a file and to minimize seeking even more. (If
you are getting the idea that the avoidance of seeking is an important part of
file design, you are right.) If there is a lot of free room on a disk, it may be
possible to make a file consist entirely of contiguous clusters. When this is the
case, we say that the file consists of one extent: all of its sectors, tracks, and
(if it is large enough) cylinders form one contiguous whole (Fig. 3.6a on page
54). This is a good situation, especially if the file is to be processed
sequentially, because it means that the whole file can be accessed with a
minimum amount of seeking.
2. It is not always physically contiguous; the degree of physical contiguity is determined by the
interleaving factor.
Figure 3.5 The file manager determines which cluster in the file has the
sector that is to be accessed.
Fragmentation
Generally, all sectors on a given drive must contain the same number of bytes.
If, for example, the size of a sector is 512 bytes and the size of all records in a
file is 300 bytes, there is no convenient fit between records and sectors. There
are two ways to deal with this situation: store only one record per sector, or
allow records to span sectors so the beginning of a record might be found in
one sector and the end of it in another (Fig. 3.7).
The first option has the advantage that any record can be retrieved by
retrieving just one sector, but it has the disadvantage that it might leave an
enormous amount of unused space within each sector. This loss of space
Figure 3.6 File extents (shaded area represents space on disk used by a
single file).
within a sector is called internal fragmentation. The second option has the
advantage that it loses no space from internal fragmentation, but it has the
disadvantage that some records may be retrieved only by accessing two sectors.
Another potential source of internal fragmentation results from the use
of clusters. Recall that a cluster is the smallest unit of space that can be
allocated for a file. When the number of bytes in a file is not an exact multiple
of the cluster size, there will be internal fragmentation in the last extent of the
file. For instance, if a cluster consists of three 512-byte sectors, a file containing
1 byte would use up 1536 bytes on the disk; 1535 bytes would be wasted due
to internal fragmentation.
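A short sketch of this arithmetic follows; the three-sector, 512-byte cluster and the 1-byte file are the values from the example just given.

#include <iostream>

int main()
{
    long bytesPerSector = 512, sectorsPerCluster = 3;
    long clusterSize = bytesPerSector * sectorsPerCluster;             // 1536 bytes
    long fileSize = 1;                                                 // a 1-byte file
    long clustersNeeded = (fileSize + clusterSize - 1) / clusterSize;  // round up
    long bytesUsed   = clustersNeeded * clusterSize;                   // 1536
    long bytesWasted = bytesUsed - fileSize;                           // 1535
    std::cout << bytesUsed << " bytes allocated, "
              << bytesWasted << " bytes lost to internal fragmentation" << std::endl;
    return 0;
}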
Clearly, there are important trade-offs in the use of large cluster sizes. A
disk expected to have mainly, large files that will often be processed
sequentially would usually be given a large cluster size, since internal frag-
mentation would not be a big problem and the performance gains might be
great. A disk holding smaller files or files that are usually accessed only
randomly would normally be set up with small clusters.
Figure 3.9 Block addressing requires that each physical data block be accompanied by one
or more subblocks containing information about its contents.
20 000 / 1300 = 15.38, which we truncate to 15

So fifteen blocks, or 150 records, can be stored per track. (Note that we
have to take the floor of the result because a block cannot span two
tracks.)

2. If there are sixty 100-byte records per block, each block holds 6000 bytes
of data and uses 6300 bytes of track space. The number of blocks per track
can be expressed as

20 000 / 6300 = 3.17, which we truncate to 3
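The same computation as a sketch in code; the 300 bytes of per-block subblock overhead is inferred from the 1300- and 6300-byte figures above (1000 or 6000 bytes of data plus 300 bytes of overhead), so treat it as an assumption of this sketch.

#include <iostream>

int main()
{
    long trackBytes = 20000;                         // bytes available on one track
    long recordSize = 100, subblockOverhead = 300;   // overhead inferred, not given directly

    long blockSpace10 = 10 * recordSize + subblockOverhead;   // 1300 bytes per block
    long blockSpace60 = 60 * recordSize + subblockOverhead;   // 6300 bytes per block

    // integer division takes the floor: a block cannot span two tracks
    std::cout << trackBytes / blockSpace10 << " blocks per track (10 records per block)" << std::endl;  // 15
    std::cout << trackBytes / blockSpace60 << " blocks per track (60 records per block)" << std::endl;  // 3
    return 0;
}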
Seek Time
Seek time is the time required to move the access arm to the correct cylinder.
The amount of time spent seeking during a disk access depends, of course, on
how far the arm has to move. If we are accessing a file sequentially and the
file is packed into several consecutive cylinders, seeking needs to be done
only after all the tracks on a cylinder have been processed, and then the
read/write head needs to move the width of only one track. At the other
extreme, if we are alternately accessing sectors from two files that are stored
at opposite extremes on a disk (one at the innermost cylinder, one at the
outermost cylinder), seeking is very expensive.
Seeking is likely to be more costly in a multiuser environment, where
several processes are contending for use of the disk at one time, than in a
single-user environment, where disk usage is dedicated to one process.
Since seeking can be very costly, system designers often go to great
extremes to minimize seeking. In an application that merges three files, for
example, it is not unusual to see the three input files stored on three different
drives and the output file stored on a fourth drive, so no seeking need be done
as I/O operations jump from file to file.
Since it is usually impossible to know exactly how many tracks will be
traversed in every seek, we usually try to determine the average seek time
required for a particular file operation. If the starting and ending positions for
each access are random, it turns out that the average seek traverses one-third
of the total number of cylinders that the read/write head ranges over.3
Manufacturers’ specifications for disk drives often list this figure as the
average seek time for the drives. Most hard disks available today have
average seek times of less than 10 milliseconds (msec), and high-performance
disks have average seek times as low as 7.5 msec.
Rotational Delay
Rotational delay refers to the time it takes for the disk to rotate so the sector
we want is under the read/write head. Hard disks usually rotate at about 5000
rpm, which is one revolution per 12 msec. On average, the rotational delay is
half a revolution, or about 6 msec. On floppy disks, which often rotate at only
360 rpm, average rotational delay is a sluggish
83.3 msec.
3. Derivations of this result, as well as more detailed and refined models, can be found in Wiederhold
(1983), Knuth (1998), Teory and Fry (1982), and Salzberg (1988).
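The arithmetic behind these averages is simply half of one revolution time; a minimal sketch using the two rotation rates just mentioned:

#include <iostream>

int main()
{
    double rpms[] = {5000.0, 360.0};                 // hard disk and floppy, from the text
    for (double rpm : rpms) {
        double revolutionMsec = 60000.0 / rpm;       // time for one revolution, in msec
        double averageDelay   = revolutionMsec / 2;  // on average, half a revolution
        std::cout << rpm << " rpm: average rotational delay of "
                  << averageDelay << " msec" << std::endl;   // 6 msec and 83.3 msec
    }
    return 0;
}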
As in the case of seeking, these averages apply only when the read/write
head moves from some random place on the disk surface to the target track. In
many circumstances, rotational delay can be much less than the average. For
example, suppose that you have a file that requires two or more tracks, that
there are plenty of available tracks on one cylinder, and that you write the
file to disk sequentially, with one write call. When the first track is filled, the
disk can immediately begin writing to the second track, without any rotational
delay. The "beginning" of the second track is effectively staggered by just the
amount of time it takes to switch from the read/write head on the first track to
the read/write head on the second. Rotational delay, as it were, is virtually
nonexistent. Furthermore, when you read the file back, the position of data on
the second track ensures that there is no rotational delay in switching from one
track to another. Figure 3.10 illustrates this staggered arrangement.
Transfer Time
Once the data we want is under the read/write head, it can be transferred. The
transfer time is given by the formula
Transfer time = (number of bytes transferred / number of bytes on a track) × rotation time
If a drive is sectored, the transfer time for one sector depends on the number of
sectors on a track. For example, if there are sixty-three sectors per track, the
time required to transfer one sector would be 1/63 of a revolution, or 0.19 msec.
The Seagate Cheetah rotates at 10 000 rpm. The transfer time for a single sector
(170 sectors per track) is 0.036 msec. This results in a peak transfer rate of more
than 14 megabytes per second.

Figure 3.10 When a single file can span several tracks on a cylinder, we can
stagger the beginnings of the tracks to avoid rotational delay when moving
from track to track during sequential access.
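A sketch of the per-sector transfer times quoted above; pairing the 63-sector track with a 12-msec rotation (the 5000 rpm rate mentioned earlier) is an assumption of this sketch, while the 6-msec figure follows from the Cheetah's 10 000 rpm.

#include <iostream>

int main()
{
    // one sector transfers in (1 / sectors per track) of a revolution
    double rotationMsec63  = 12.0;    // 5000 rpm drive (assumed pairing)
    double rotationMsec170 = 6.0;     // Seagate Cheetah at 10 000 rpm

    std::cout << rotationMsec63  / 63  << " msec per sector (63 sectors per track)"  << std::endl;  // about 0.19
    std::cout << rotationMsec170 / 170 << " msec per sector (170 sectors per track)" << std::endl;  // about 0.035
    return 0;
}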
4. The term cache (as opposed to disk cache) generally refers to a very high-speed block of primary memory
that performs the same types of performance-enhancing operations with respect to memory that a
disk cache does with respect to secondary memory.
for the tape, which would be mounted by an operator onto a tape drive. The
application could then directly read and write on the tape. The tremendous
reduction in the cost of disk systems has changed the way tapes are used. At
present, tapes are primarily used as archival storage. That is, data is written to
tape to provide low cost storage and then copied to disk whenever it is needed.
Tapes are very common as backup devices for PC systems. In high
performance and high volume applications, tapes are commonly stored in
racks and supported by a robot system that is capable of moving tapes between
storage racks and tape drives.
[Table fragment: a comparison of tape systems by medium, loading method,
capacity, and transfer rate. The recoverable entries mention digital linear tape;
an HP Colorado T3000 one-quarter-inch cartridge with manual loading, 1.6 GB
capacity, and 0.5 MB/sec transfer; and a StorageTek robot-silo system with
50 GB capacity, helical recording, and 10 MB/sec transfer.]
Newer tape systems are usually based on a tape cartridge medium where
the tape and its reels are contained in a box. The tape media formats that are
available include 4 mm, 8 mm, VHS, 1/2 inch, and 1/4 inch.
The parity bit is not part of the data but is used to check the validity of the
data. If odd parity is in effect, this bit is set to make the number of 1 bits in the
frame odd. Even parity works similarly but is rarely used with tapes.
Frames (bytes) are grouped into data blocks whose size can vary from a
few bytes to many kilobytes, depending on the needs of the user. Since tapes
are often read one block at a time and since tapes cannot stop or start
instantaneously, blocks are separated by interblock gaps, which contain no
information and are long enough to permit stopping and starting. When tapes
use odd parity, no valid frame can contain all 0 bits, so a large number of
consecutive 0 frames is used to fill the interrecord gap.
Tape drives come in many shapes, sizes, and speeds. Performance
differences among drives can usually be measured in terms of three quantities:
Tape density—commonly 800, 1600, or 6250 bits per inch (bpi) per track, but
recently as much as 30 000 bpi;

Tape speed—commonly 30 to 200 inches per second (ips); and

Size of interblock gap—commonly between 0.3 inch and 0.75 inch.
Note that a 6250-bpi nine-track tape contains 6250 bits per inch per track, and
6250 bytes per inch when the full nine tracks are taken together. Thus in the
computations that follow, 6250 bpi is usually taken to mean 6250 bytes of data
per inch.
and the space requirement for interblock gaps decreases from 300 000 inches
to 6000 inches. The space requirement for the data is of course the same as it was
previously. What has changed is the relative amount of space occupied by the
gaps, as compared to the data. Now a snapshot of the tape would look much
different:
Data    Gap    Data    Gap    Data    Gap    Data
We leave it to you to show that the file can fit easily on one 2400-foot tape
when a blocking factor of 50 is used.
When we compute the space requirements for our file, we produce
numbers that are quite specific to our file. A more general measure of the effect
of choosing different block sizes is effective recording density. The effective
recording density is supposed to reflect the amount of actual data
that can be stored per inch of tape. Since this depends exclusively on the
relative sizes of the interblock gap and the data block, it can be defined as

Effective recording density = number of bytes per block / number of inches required to store a block
When a blocking factor of 1 is used in our example, the number of bytes per
block is 100, and the number of inches required to store a block is 0.316.
Hence, the effective recording density is

100 bytes / 0.316 inches = 316.4 bpi
which is a far cry from the nominal recording density of 6250 bpi.
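The same computation as a small sketch, using the figures from the example: 100-byte blocks, each occupying 0.316 inch of tape including its share of interblock gap.

#include <iostream>

int main()
{
    double bytesPerBlock  = 100.0;    // data bytes in one block
    double inchesPerBlock = 0.316;    // tape length for one block plus its gap
    std::cout << bytesPerBlock / inchesPerBlock
              << " bpi effective recording density" << std::endl;   // about 316.4 bpi
    return 0;
}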
Either way you look at it, space utilization is sensitive to the relative sizes
of data blocks and interblock gaps. Let us now see how they affect the amount
of time it takes to transmit tape data.
In the past, magnetic tape and magnetic disk accounted for the lion's share of
all secondary storage applications. Disk was excellent for random access and
storage of files for which immediate access was desired; tape was ideal for
processing data sequentially and for long-term storage of files. Over time,
these roles have changed somewhat in favor of disk.
The major reason that tape was preferable to disk for sequential processing
is that tapes are dedicated to one process, while disk generally serves several
processes. This means that between accesses a disk read/write head tends to
move away from the location where the next sequential access will occur,
resulting in an expensive seek; the tape drive, being dedicated to one process,
pays no such price in seek time.
This problem of excessive seeking has gradually diminished, and disk has
taken over much of the secondary storage niche previously occupied by tape.
This change is largely because of the continued dramatic decreases in the cost
of disk and memory storage. To understand this change fully, we need to
understand the role of memory buffer space in performing I/O.5 Briefly, it is
that performance depends largely on how big a chunk of file we can transmit at
any time; as more memory space becomes available for I/O buffers, the
number of accesses decreases correspondingly, which means that the number
of seeks required goes down as well. Most systems now available, even small
systems, have enough memory to decrease the number of accesses required to
process most files to the point where disk becomes quite competitive with tape
for sequential processing. This change, along with the superior versatility and
decreasing costs of disks, has resulted in use of disk for most sequential
processing, which in the past was primarily the domain of tape.
This is not to say that tapes should not be used for sequential processing.
If a file is kept on tape and there are enough drives available to use for
sequential processing, it may be more efficient to process the file directly from
tape than to stream it to disk and process it sequentially.
Although it has lost ground to disk in sequential processing applications,
tape remains important as a medium for long-term archival storage. Tape is
still far less expensive than magnetic disk, and it is very easy and fast to stream
large files or sets of files between tape and disk. In this context, tape has
emerged as one of our most important media (along with CD-ROM) for
tertiary storage.
6. Usually we spell disk with a k, but the convention among optical disc manufacturers is to spell it with a c.
From the outset, there was an interest in using LaserVision discs to do
more than just record movies. The LaserVision format supports recording in
both a constant linear velocity (CLV) format that maximizes storage capacity
and a constant angular velocity (CAV) format that enables fast seek
performance. By using the CAV format to access individual video frames
quickly, a number of organizations, including the MIT Media Lab, produced
prototype interactive video discs that could be used to teach and entertain.
In the early 1980s, a number of firms began looking at the possibility of
storing digital, textual information on LaserVision discs. LaserVision stores
data in an analog form; it is, after all, storing an analog video signal. Different
firms came up with different ways of encoding digital information in analog
form so it could be stored on the disc. The capabilities demonstrated in the
prototypes and early, narrowly distributed products were impressive. The
videodisc has a number of performance characteristics that make it a
technically more desirable medium than the CD-ROM; in particular, one can
build drives that seek quickly and deliver information from the disc at a high
rate of speed. But, reminiscent of the earlier disputes over the physical format
of the videodisc, each of these pioneers in the use of LaserVision discs as
computer peripherals had incompatible encoding schemes and error correction
techniques. There was no standard format, and none of the firms was large
enough to impose its format over the others through sheer marketing muscle.
Potential buyers were frightened by the lack of a standard; consequently, the
market never grew.
During this same period Philips and Sony began work on a way to store
music on optical discs. Rather than storing the music in the kind of analog form
used on videodiscs, they developed a digital data format. Philips and Sony had
learned hard lessons from the expensive standards battles over videodiscs. This
time they worked with other players in the consumer products industry to
develop a licensing system that resulted in the emergence of CD audio as a
broadly accepted standard format as soon as the first discs and players were
introduced. CD audio appeared in the United States in early 1984. CD-ROM,
which is a digital data format built on top of the CD audio standard, emerged
shortly thereafter. The first commercially available CD-ROM drives appeared
in 1985.
Not surprisingly, the firms that were delivering digital data on
LaserVision discs saw CD-ROM as a threat to their existence. They also
recognized, however, that CD-ROM promised to provide what had always
eluded them in the past: a standard physical format. Anyone with a CD-ROM
drive was guaranteed that he or she could find and read a sector off of any disc
manufactured by any firm. For a storage medium to be used in publishing,
standardization at such a fundamental level is essential.
What happened next is remarkable considering the history of standards and
cooperation within an industry. The firms that had been working on products to
deliver computer data from videodiscs recognized that a standard physical
format, such as that provided by CD-ROM, was not enough. A standard
physical format meant that everyone would be able to read sectors off of any
disc. But computer applications do not work in terms of sectors; they store data
in files. Having an agreement about finding sectors, without further agreement
about how to organize the sectors into files, is like everyone agreeing on an
alphabet without having settled on how letters are to be organized into words on
a page. In late 1985 the firms emerging from the videodisc/digital data industry,
all of which were relatively small, called together many of the much larger firms
moving into the CD-ROM industry to begin work on a standard file system that
would be built on top of the CD-ROM format. In a rare display of cooperation,
the different firms, large and small, worked out the main features of a file system
standard by early summer of 1986; that work has become an official
international standard for organizing files on CD-ROM.
The CD-ROM industry is still young, though in the past years it has begun
to show signs of maturity: it is moving away from concentration on such matters
as disc formats to a concern with CD-ROM applications. Rather than focusing
on the new medium in isolation, vendors are seeing it as an enabling mechanism
for new systems. As it finds more uses in a broader array of applications,
CD-ROM looks like an optical publishing technology that will be with us over
the long term.
Recordable CD drives make it possible for users to store information on
CD. The price of the drives and the price of the blank recordable CDs make this
technology very appealing for backup. Unfortunately, while the speed of CD
readers has increased substantially, with 12X (twelve times CD audio speed) as
the current standard, CD recorders work no faster than 2X, or about 300
kilobytes per second.
The latest new technology for CDs is the DVD, which stands for Digital
Video Disc, or Digital Versatile Disk. The Sony Corporation has developed
DVD for the video market, especially for the new high definition TVs, but DVD
is also available for storing files. The density of both tracks and bits has been
increased to yield a sevenfold increase in storage
The audio error correction would result in an average of one incorrect byte for
every two discs. The additional error correction information stored within the
2352-byte sector decreases this error rate to 1 uncorrectable byte in every
twenty thousand discs.
As we say throughout this book, good file design is responsive to the nature of
the medium, making use of strengths and minimizing weaknesses. We begin,
then, by cataloging the strengths and weaknesses of CD-ROM.
transfer rate is fast enough relative to the CD-ROM’s seek performance that
we have a design incentive to organize data into blocks, reading more data
with each seek in the hope that we can avoid as much seeking as possible.
structures only once; users can enjoy the benefits of this investment again and
again.
Although the best mixture of devices for a computing system depends on the
needs of the system’s users, we can imagine any computing system as a
hierarchy of storage devices of different speed, capacity, and cost. Figure
3.17 summarizes the different types of storage found at different levels in
such hierarchies and shows approximately how they compare in terms of access
time, capacity, and cost.
What happens when a program writes a byte to a file on a disk? We know what
the program does (it makes a call to a write function), and we now know
something about how the byte is stored on a disk. But we haven’t looked at
what happens between the program and the disk. The whole story of what
happens to data between program and disk is not one we can tell here, but we
can give you an idea of the many different pieces of hardware and software
involved and the many jobs that have to be done by looking at an example of a
journey of 1 byte.
Suppose we want to append a byte representing the character P stored in
a character variable ch to a file named in the variable textfile stored
somewhere on a disk. From the program’s point of view, the entire journey that
the byte will take might be represented by the statement
write(textfile, ch, 1)
but the journey is much longer than this simple statement suggests.
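In C++ the statement might take a form like the following minimal sketch; the stream class, open mode, and file name are illustrative assumptions rather than the text's own program:

#include <fstream>

int main()
{
    char ch = 'P';
    // Open the file for appending; "myfile.txt" is only an illustration --
    // the text's variable textfile names some existing file on the disk.
    std::ofstream textfile("myfile.txt", std::ios::out | std::ios::app);
    if (!textfile) return 1;

    // This single call plays the role of write(textfile, ch, 1): it hands
    // one byte to the operating system, which takes over the rest of the
    // journey to the disk.
    textfile.write(&ch, 1);

    textfile.close();
    return 0;
}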
The write statement results in a call to the computer’s operating system,
which has the task of seeing that the rest of the journey is completed
successfully (Fig. 3.18). Often our program can provide the operating
system with information that helps it carry out this task more effectively, but
once the operating system has taken over, the job of overseeing the rest of the
journey is largely beyond our program's control.
Figure 3.18 The write statement tells the operating system to send one
character to disk and gives the operating system the location of the
character. The operating system takes over the job of writing, and then
returns control to the calling program.
Figure 3.19 The journey of a byte through the logical and physical layers of the computer system. The steps shown in the figure include:
1. The program asks the operating system to write the contents of the variable c to the next available position in TEXT.
2. The operating system passes the job on to the file manager.
4. The file manager searches a file allocation table for the physical location of the sector that is to contain the byte.
5. The file manager makes sure that the last sector in the file has been stored in a system I/O buffer in RAM, then deposits the 'P' into its proper position in the buffer.
6. The file manager gives instructions to the I/O processor about where the byte is stored in RAM and where it needs to be sent on the disk.
7. The I/O processor finds a time when the drive is available to receive the data and puts the data in proper format for the disk. It may also buffer the data to send it out in chunks of the proper size for the disk.
9. The controller instructs the drive to move the read/write head to the proper track, waits for the desired sector to come under the read/write head, then sends the byte to the drive to be deposited, bit by bit, on the surface of the disk.
The journey of the byte involves both logical and physical aspects. Each layer calls the one below it, until, at the lowest level,
the byte is written to the disk.
The file manager begins by finding out whether the logical characteristics
of the file are consistent with what we are asking it to do with the file. It may
look up the requested file in a table, where it finds out such things as whether
the file has been opened, what type of file the byte is being sent to (a binary
file, a text file, or some other organization), who the file’s owner is, and
whether write access is allowed for this particular user of the file.
The file manager must also determine where in the file textfile the P is
to be deposited. Since the P is to be appended to the file, the file manager
needs to know where the end of the file is—the physical location of the last
sector in the file. This information is obtained from the file allocation table
(FAT) described earlier. From the FAT, the file manager locates the drive,
cylinder, track, and sector where the byte is to be stored.
Figure 3.20 The file manager moves P from the program's data area to a
system output buffer where it may join other bytes headed for the same place
on the disk. If necessary, the file manager may have to load the corresponding
sector from the disk into the system output buffer.
central processing unit. The byte has traveled along data paths that are
designed to be very fast and are relatively expensive. Now it is time for the byte
to travel along a data path that is likely to be slower and narrower than the one
in primary memory. (A typical computer might have an internal data-path
width of 4 bytes, whereas the width of the path leading to the disk might be
only 2 bytes.)
Because of bottlenecks created by these differences in speed and datapath
widths, our byte and its companions might have to wait for an external data
path to become available. This also means that the CPU has extra time on its
hands as it deals out information in small enough chunks and at slow enough
speeds that the world outside can handle them. In fact, the differences between
the internal and external speeds for transmitting data are often so great that the
CPU can transmit to several external devices simultaneously.
The processes of disassembling and assembling groups of bytes for
transmission to and from external devices are so specialized that it is
unreasonable to ask an expensive, general-purpose CPU to spend its valuable time doing I/O when a simpler device could do the job and free the CPU to
do the work that it is most suited for. Such a special-purpose device is called an
I/O processor.
An I/O processor may be anything from a simple chip capable of taking a
byte and passing it along on cue, to a powerful, small computer capable of
executing very sophisticated programs and communicating with many devices
simultaneously. The I/O processor takes its instructions from the operating
system, but once it begins processing I/O, it runs independently, relieving the
operating system (and the CPU) of the task of communicating with secondary
storage devices. This allows I/O processes and internal computing to overlap. 7
In a typical computer, the file manager might now tell the I/O processor
that there is data in the buffer to be transmitted to the disk, how much data there
is, and where it is to go on the disk. This information might come in the form of
a little program that the operating system constructs and the I/O processor
executes (Fig. 3.21).
The job of controlling the operation of the disk is done by a device called
a disk controller. The I/O processor asks the disk controller if the disk drive is
available for writing. If there is much I/O processing, there is a good chance
that the drive will not be available and that our byte will have to wait in its
buffer until the drive becomes available.
What happens next often makes the time spent so far seem insignificant in
comparison: the disk drive is instructed to move its read/write head to the track
and sector on the drive where our byte and its companions are to be stored. For
the first time, a device is being asked to do something mechanical! The
read/write head must seek to the proper track (unless it is already there) and
then wait until the disk has spun around so the desired sector is under the head.
Once the track and sector are located, the I/O processor (or perhaps the
controller) can send out bytes, one at a time, to the drive. Our byte waits until
its turn comes; then it travels alone to the drive, where it probably is stored in a
little 1-byte buffer while it waits to be deposited on the disk.
Finally, as the disk spins under the read/write head, the 8 bits of our byte
are deposited, one at a time, on the surface of the disk (Fig. 3.21). There the P
remains, at the end of its journey, spinning at a leisurely 50 to 100 miles per
hour.
7. On many systems the I/O processor can take data directly from memory, without further involvement
from the CPU. This process is called direct memory access (DMA). On other systems, the CPU must
place the data in special I/O registers before the I/O processor can have access to it.
Figure 3.21 The file manager sends the I/O processor instructions in the
form of an I/O processor program. The I/O processor gets the data from
the system buffer, prepares it for storing on the disk, then sends it to the
disk controller, which deposits it on the surface of the disk.
Any user of files can benefit from some knowledge of what happens to data
traveling between a program’s data area and secondary storage. One aspect of
this process that is particularly important is the use of buffers. Buffering
involves working with large chunks of data in memory so the number of
accesses to secondary storage can be reduced. We concentrate on the operation
of system I/O buffers; but be aware that the use of buffers within programs can
also substantially affect performance.
To understand the need for several system buffers, consider what happens
if a program is performing both input and output on one character at a time and
only one I/O buffer is available. When the program asks for its first character,
the I/O buffer is loaded with the sector containing the character, and the
character is transmitted to the program. If the program then decides to output a
character, the I/O buffer is filled with the sector into which the output character
needs to go, destroying its original contents. Then when the next input character
is needed, the buffer contents have to be written to disk to make room for the
(original) sector containing the second input character, and so on.
Fortunately, there is a simple and generally effective solution to this
ridiculous state of affairs, and that is to use more than one system buffer. For
this reason, I/O systems almost always use at least two buffers—one for input
and one for output.
Even if a program transmits data in only one direction, the use of a single
system I/O buffer can slow it down considerably. We know, for instance, that
the operation of reading a sector from a disk is extremely slow compared with
the amount of time it takes to move data in memory, so we can guess that a
program that reads many sectors from a file might have to spend much of its
time waiting for the I/O system to fill its buffer every time a read operation is
performed before it can begin processing. When this happens, the program that
is running is said to be I/O bound— the CPU spends much of its time just
waiting for I/O to be performed. The solution to this problem is to use more
than one buffer and to have the I/O system filling the next sector or block of data
while the CPU is processing the current one.
Multiple Buffering
Suppose that a program is only writing to a disk and that it is I/O bound. The
CPU wants to be filling a buffer at the same time that I/O is being performed. If
two buffers are used and I/O-CPU overlapping is permitted, the CPU can be
filling one buffer while the contents of the other are being transmitted to disk.
When both tasks are finished, the roles of the buffers can be exchanged. This
method of swapping the roles of two buffers after each output (or input)
operation is called double buffering. Double buffering allows the operating
system to operate on one buffer while the other buffer is being loaded or emptied
(Fig. 3.22).
(a)
(b)
Figure 3.22 Double buffering: (a) the contents of system I/O buffer 1 are sent
to disk while I/O buffer 2 is being filled; and (b) the contents of buffer 2 are
sent to disk while I/O buffer 1 is being filled.
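To make the overlap concrete, here is a minimal sketch (not the text's code) in which a worker thread stands in for the I/O system: it writes one buffer to disk while the main loop fills the other, and the two buffers swap roles after each block.

#include <algorithm>
#include <cstddef>
#include <fstream>
#include <thread>
#include <utility>
#include <vector>

const std::size_t kBlockSize = 8192;   // assumed block size

// Worker: send one buffer's contents to the file.
void writeBlock(std::ofstream* out, const std::vector<char>* buf)
{
    out->write(buf->data(), static_cast<std::streamsize>(buf->size()));
}

int main()
{
    std::ofstream out("output.dat", std::ios::binary);
    std::vector<char> bufA(kBlockSize), bufB(kBlockSize);
    std::vector<char>* filling = &bufA;   // buffer the CPU is filling
    std::vector<char>* writing = &bufB;   // buffer being sent to disk

    std::thread writer;                   // no write in progress yet
    for (int block = 0; block < 100; ++block) {
        // Fill the current buffer (stand-in for real computation).
        std::fill(filling->begin(), filling->end(), static_cast<char>(block));

        if (writer.joinable()) writer.join();            // wait for the previous write
        std::swap(filling, writing);                     // exchange the buffers' roles
        writer = std::thread(writeBlock, &out, writing); // start writing the full buffer
    }
    if (writer.joinable()) writer.join();
    return 0;
}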
Some systems manage a pool of buffers with a least-recently-used (LRU) replacement strategy. It is based on the assumption that a block of data that has been used recently is
more likely to be needed in the near future than one that has been used less
recently. (We encounter LRU again in later chapters.)
It is difficult to predict the point at which the addition of extra buffers
ceases to contribute to improved performance. As the cost of memory
continues to decrease, so does the cost of using more and bigger buffers. On
the other hand, the more buffers there are, the more time it takes for the file
system to manage them. When in doubt, consider experimenting with different
numbers of buffers.
Scatter/Gather I/O
Suppose you are reading in a file with many blocks, and each block consists of
a header followed by data. You would like to put the headers in one buffer and
the data in a different buffer so the data can be processed as a single entity. The
obvious way to do this is to read the whole block into a single big buffer; then
move the different parts to their own buffers. Sometimes we can avoid this
two-step process using a technique called scatter input. With scatter input, a
single read call identifies not one, but a collection of buffers into which data
from a single block is to be scattered.
The converse of scatter input is gather output. With gather output,
several buffers can be gathered and written with a single write call; this avoids
the need to copy them to a single output buffer. When the cost of copying
several buffers into a single output buffer is high, this technique can produce a
significant savings.
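On Unix-like systems, one concrete interface for this technique is the POSIX readv and writev pair. The sketch below is illustrative (the block layout, sizes, and file name are assumptions), not code from the text:

#include <sys/uio.h>   // readv, writev (POSIX scatter/gather I/O)
#include <fcntl.h>
#include <unistd.h>

int main()
{
    char header[16];   // assumed 16-byte block header
    char data[496];    // rest of an assumed 512-byte block

    int fd = open("blockfile.dat", O_RDONLY);
    if (fd < 0) return 1;

    struct iovec parts[2];
    parts[0].iov_base = header;  parts[0].iov_len = sizeof(header);
    parts[1].iov_base = data;    parts[1].iov_len = sizeof(data);

    // One read call scatters a single 512-byte block into the two buffers.
    ssize_t n = readv(fd, parts, 2);

    // Gather output is symmetric: writev(fd, parts, 2) would write both
    // buffers with a single call, without copying them together first.
    close(fd);
    return n == (ssize_t)(sizeof(header) + sizeof(data)) ? 0 : 1;
}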
Fundamental File
Structure Concepts
When we list the output file on our terminal screen, here is what we see:

AmesMary123 MapleStillwaterOK74075MasonAlan90 EastgateAdaOK74820
The program writes the information out to the file precisely as specified, as a
stream of bytes containing no added information. But in meeting our
specifications, the program creates a kind of reverse Humpty-Dumpty
problem. Once we put all that information together as a single byte stream,
there is no way to get it apart again.
There are many ways of adding structure to files to maintain the identity of
fields. Four of the most common methods follow:
■ Force the fields into a predictable length.
■ Begin each field with a length indicator.
■ Place a delimiter at the end of each field to separate it from the next field.
■ Use a "keyword = value" expression to identify each field and its contents.
1. Readers should not confuse the terms field and record with the meanings given to them by some
programming languages, including Ada. In Ada, a record is an aggregate data structure that can contain members
of different types, where each member is referred to as a field. As we shall see, there is often a direct
correspondence between these definitions of the terms and the fields and records that are used in files.
However, the terms field and record as we use them have much more general meanings than they do in
Ada.
In C:

struct Person {
    char last [11];
    char first [11];
    char address [16];
    char city [16];
    char state [3];
    char zip [10];
};

In C++:

class Person {
  public:
    char last [11];
    char first [11];
    char address [16];
    char city [16];
    char state [3];
    char zip [10];
};
04Ames04Mary09123 Maple10Stillwater02OK0574075
05Mason04Alan1190 Eastgate03Ada02OK0574820

last=Ames|first=Mary|address=123 Maple|
city=Stillwater|
state=OK|zip=74075|

Figure 4.3 Four methods for organizing fields within records. (a) Each field is
of fixed length. (b) Each field begins with a length indicator. (c) Each field
ends with a delimiter |. (d) Each field is identified by a keyword.
many applications. It is easy to tell which fields are contained in a file, even
if we don’t know ahead of time which fields the file is supposed to contain. It
is also a good format for dealing with missing fields. If a field is missing, this
format makes it obvious, because the keyword is simply not there.
You may have noticed in Fig. 4.3(d) that this format is used in combi-
nation with another format, a delimiter to separate fields. While this may not
always be necessary, in this case it is helpful because it shows the division
between each value and the keyword for the following field.
Unfortunately, for the address file this format also wastes a lot of space:
50 percent or more of the file’s space could be taken up by the keywords. But
there are applications in which this format does not demand so much
overhead. We discuss some of these applications in Section 5.6: “Portability
and Standardization.”
Figure 4.4 Extraction operator for reading delimited fields into a Person
object.
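A sketch of what such an extraction operator might look like, assuming the '|' delimiter and the Person members declared earlier (the class's actual operator may differ in detail):

#include <iostream>

// Assumes the Person class declared earlier in this chapter, with its
// public character-array members.
std::istream & operator >> (std::istream & stream, Person & p)
{
    // Read each delimited field up to the '|' character.
    stream.getline(p.last,    sizeof(p.last),    '|');
    stream.getline(p.first,   sizeof(p.first),   '|');
    stream.getline(p.address, sizeof(p.address), '|');
    stream.getline(p.city,    sizeof(p.city),    '|');
    stream.getline(p.state,   sizeof(p.state),   '|');
    stream.getline(p.zip,     sizeof(p.zip),     '|');
    return stream;
}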
Clearly, we now preserve the notion of a field as we store and retrieve
this data. But something is still missing. We do not really think of this file as a
stream of fields. In fact, the fields are grouped into records. The first six fields
form a record associated with Mary Ames. The next six are a record associated
with Alan Mason.
Writing a record to a file is a way of saving the state (or value) of an object that is stored in memory. Reading a
record from a file into a memory resident object restores the state of the
object. It is our goal in designing file structures to facilitate this transfer of
information between memory and files. We will use the term object to refer to
data residing in memory and the term record to refer to data residing in a file.
In C++ we use class declarations to describe objects that reside in
memory. The members, or attributes, of an object of a particular class
correspond to the fields that need to be stored in a file record. The C++
programming examples are focused on adding methods to classes to support
using files to preserve the state of objects.
Following are some of the most often used methods for organizing the
records of a file:
■ Require that the records be a predictable number of bytes in length.
■ Require that the records be a predictable number of fields in length.
■ Begin each record with a length indicator.
■ Use a second file (an index) to keep track of the beginning byte address of each record.
■ Place a delimiter at the end of each record to separate it from the next record.
As we will see in the chapters that follow, fixed-length record structures are among the most
commonly used methods for organizing files.
The C structure Person (or the C++ class of the same name) that we define
in our discussion of fixed-length fields is actually an example of a fixed-length
record as well as an example of fixed-length fields. We have a fixed number
of fields, each with a predetermined length, that combine to make a fixed-
length record. This kind of field and record structure is illustrated in Fig. 4.5(a).
It is important to realize, however, that fixing the number of bytes in a
record does not imply that the size or number of fields in the record must be
fixed. Fixed-length records are frequently used as containers to hold variable
numbers of variable-length fields. It is also possible to mix fixed- and variable-
length fields within a record. Figure 4.5(b) illustrates how variable-
length fields might be placed in a fixed-length record.
Figure 4.5 Three ways of making the lengths of records constant and predictable.
(a) Counting bytes: fixed-length records with fixed-length fields. (b) Counting
bytes: fixed-length records with variable-length fields. (c) Counting fields: six
fields per record.
Figure 4.6 Record structures for variable-length records, (a) Beginning each record with
a length indicator, (b) Using an index file to keep track of record addresses, (c) Placing
the delimiter # at the end of each record.
Figure 4.10 The number 40, stored as ASCII characters and as a short integer.
interpretations of the bytes in the file in ASCII and hexadecimal. These repre-
sentations were requested on the command line with the -xc flag (x = hex; c =
character).
Let’s look at the first row of ASCII values. As you would expect, the
data placed in the file in ASCII form appears in this row in a readable way.
But there are hexadecimal values for which there is no printable ASCII
representation. The only such value appearing in this file is 0x00. But there
could be many others. For example, the hexadecimal value of the number
500,000,000 is 0x1DCD6500. If you write this value out to a file, an od of the
file with the option -xc looks like this:

0000000  035 315   e  \0
            1dcd    6500
The only printable byte in this file is the one with the value 0x65 (e). Od
handles all of the others by listing their equivalent octal values in the ASCII
representation.
The hex dump of this output from writrec shows how this file structure
represents an interesting mix of a number of the organizational tools we have
encountered. In a single record we have both binary and ASCII data. Each
record consists of a fixed-length field (the byte count) and several delimited,
variable-length fields. This kind of mixing of different data types and
organizational methods is common in real-world file structures.
Now that we understand how to use buffers to read and write information, we
can use C++ classes to encapsulate the pack, unpack, read, and write
operations of buffer objects. An object of one of these buffer classes can be
used for output as follows: start with an empty buffer object, pack field values
into the object one by one, then write the buffer contents to an output stream.
For input, initialize a buffer object by reading a record from an input stream,
then extract the object’s field values, one by one. Buffer objects support only
this behavior. A buffer is not intended to allow modification of packed values
nor to allow pack and unpack operations to be mixed. As the classes are
described, you will see that no direct access is allowed to the data members
that hold the contents of the buffer. A considerable amount of extra error
checking has been included in these classes.
There are three classes defined in this section: one for delimited fields,
one for length-based fields, and one for fixed-length fields. The first two field
types use variable-length records for input and output. The fixed-length
fields are stored in fixed-length records.
The Person class definition has the following method for packing into delimited text
buffers; the unpack operation is equally simple:
int Person::Pack (DelimTextBuffer & Buffer) const
{ // pack the fields into a DelimTextBuffer
    int result;
    result = Buffer.Pack(LastName);
    result = result && Buffer.Pack(FirstName);
    result = result && Buffer.Pack(Address);
    result = result && Buffer.Pack(City);
    result = result && Buffer.Pack(State);
    result = result && Buffer.Pack(ZipCode);
    return result;
}
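The corresponding unpack operation might look like the following sketch, which simply mirrors the Pack method above and assumes DelimTextBuffer provides a matching Unpack method for character fields:

int Person::Unpack (DelimTextBuffer & Buffer)
{ // unpack the fields from a DelimTextBuffer, in the order they were packed
    int result;
    result = Buffer.Unpack(LastName);
    result = result && Buffer.Unpack(FirstName);
    result = result && Buffer.Unpack(Address);
    result = result && Buffer.Unpack(City);
    result = result && Buffer.Unpack(State);
    result = result && Buffer.Unpack(ZipCode);
    return result;
}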
in Appendix E. The only changes that are apparent from this figure are the
name of the class and the elimination of the delim parameter on the
constructor. The code for the Pack and Unpack methods is substantially
different, but the Read and Write methods are exactly the same.
Class FixedTextBuffer, whose main members and methods are in Fig.
4.13 (full class in fixtext.h and fixtext.cpp), is different in two ways from the
other two classes. First, it uses a fixed collection of fixed-length fields. Every
buffer value has the same collection of fields, and the Pack method needs no
size parameter. The second difference is that it uses fixed-length records.
Hence, the Read and Write methods do not use a length indicator for buffer
size. They simply use the fixed size of the buffer to determine how many
bytes to read or write.
The method AddField is included to support the specification of the fields
and their sizes. A buffer for objects of class Person is initialized by the new
method InitBuffer of class Person:
int Person::InitBuffer (FixedTextBuffer & Buffer)
// initialize a FixedTextBuffer to be used for Person objects
{
    Buffer.Init(6, 61);     // 6 fields, 61 bytes total
    Buffer.AddField(10);    // LastName [11];
    Buffer.AddField(10);    // FirstName [11];
    Buffer.AddField(15);    // Address [16];
    Buffer.AddField(15);    // City [16];
    Buffer.AddField(2);     // State [3];
    Buffer.AddField(9);     // ZipCode [10];
    return 1;
}
class FixedTextBuffer
{ public:
    FixedTextBuffer (int maxBytes = 1000);
    int AddField (int fieldSize);
    int Read (istream & file);
    int Write (ostream & file) const;
    int Pack (const char * field);
    int Unpack (char * field);
  private:
    char * Buffer;     // character array to hold field values
    int BufferSize;    // size of packed fields
    int MaxBytes;      // maximum number of characters in the buffer
    int NextByte;      // packing/unpacking position in buffer
    int * FieldSizes;  // array of field sizes
};
A reading of the cpp files for the three classes above shows a striking simi-
larity: a large percentage of the code is duplicated. In this section, we elim-
inate almost all of the duplication through the use of the inheritance
capabilities of C++.
virtual is used to ensure that class ios is included only once in the ancestry of
any of these classes.
Objects of a class are also objects of their base classes, and generally
include members and methods of the base classes. An object of class
fstream, for example, is also an object of classes fstreambase, iostream,
istream, ostream, and ios and includes all of the members and methods of
those base classes. Hence, the read method and extraction (>>) operations
defined in istream are also available in iostream, ifstream, and fstream. The
open and close operations of class fstreambase are also members of class
fstream.
An important benefit of inheritance is that operations that work on base
class objects also work on derived class objects. We had an example of this
benefit in the function ReadVariablePerson in Section 4.1.5 that used an
istrstream object strbuff to contain a string buffer. The code of that function
passed strbuff as an argument to the person extraction function that expected
an istream argument. Since istrstream is derived from istream, strbuff is an
istream object and hence can be manipulated by this istream operation.
Protected members of a class can be used by methods of that class and by methods of
classes derived from the class. The protected members of IOBuffer can be
used by methods in all of the classes in this hierarchy. Protected members of
VariableLengthBuffer can be used in its subclasses but not in classes
IOBuffer and FixedLengthBuffer.
The constructor for class IOBuffer has a single parameter that specifies
the maximum size of the buffer. Methods are declared for reading, writing,
packing, and unpacking. Since the implementation of these methods depends
on the exact nature of the record and its fields, IOBuffer must leave its
implementation to the subclasses.
Class IOBuffer defines these methods as virtual to allow each subclass
to define its own implementation. The = 0 declares a pure virtual
method. This means that the class IOBuffer does not include an imple-
mentation of the method. A class with pure virtual methods is an abstract
class. No objects of such a class can be created, but pointers and references to
objects of this class can be declared.
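As a sketch of what such an abstract class might look like (the actual declaration in iobuffer.h may differ in its signatures and members):

class IOBuffer
{ public:
    IOBuffer (int maxBytes = 1000);           // construct with a maximum buffer size
    // Pure virtual methods: each subclass supplies its own implementation.
    virtual int Read (istream &) = 0;         // read a buffer from the stream
    virtual int Write (ostream &) const = 0;  // write the buffer to the stream
    virtual int Pack (const void * field, int size = -1) = 0;    // pack a field into the buffer
    virtual int Unpack (void * field, int maxBytes = -1) = 0;    // unpack a field from the buffer
  protected:
    char * Buffer;      // character array holding the packed fields
    int BufferSize;     // number of bytes currently packed into the buffer
    int MaxBytes;       // maximum number of bytes the buffer can hold
};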
The full implementation of read, write, pack, and unpack operations for
delimited text records is supported by two more classes. The reading and
writing of variable-length records are included in the class VariableLengthBuffer,
as given in Figure 4.16 and files varlen.h and varlen.cpp. Packing and
unpacking delimited fields is in class DelimitedFieldBuffer and in files
delim.h and delim.cpp. The code to implement these operations follows the
same structure as in Section 4.2 but incorporates additional error checking.
The Write method of VariableLengthBuffer is implemented as follows:

int VariableLengthBuffer::Write (ostream & stream) const
// write the length and the buffer contents to the stream
{
    int recaddr = stream.tellp();
    unsigned short bufferSize = BufferSize;
    stream.write((char *)&bufferSize, sizeof(bufferSize));
    if (!stream) return -1;
    stream.write(Buffer, BufferSize);
    if (!stream.good()) return -1;
    return recaddr;
}
The method is implemented to test for all possible errors and to return
information to the calling routine via the return value. We test for failure in
the write operations using the expressions !stream and !stream.good(),
which are equivalent. These are two different ways to test if the stream has
experienced an error. The Write method returns the address in the stream
where the record was written. The address is determined by calling
stream.tellp() at the beginning of the function. Tellp is a method of ostream that
returns the current location of the put pointer of the stream. If either of the
write operations fails, the value -1 is returned.
An effective strategy for making objects persistent must make it easy for
an application to move objects from memory to files and back correctly. One
of the crucial aspects is ensuring that the fields are packed and unpacked in
the same order. The class Person has been extended to include pack and
unpack operations. The main purpose of these operations is to specify an
ordering on the fields and to encapsulate error testing. The unpack operation
parallels the Pack method shown earlier and is used as follows:
Person MaryAmes;
DelimFieldBuffer Buffer;
MaryAmes . Unpack (Buffer);
The full implementation of the I/O buffer classes includes class
LengthFieldBuffer, which supports field packing with length plus value
representation. This class is like DelimFieldBuffer in that it is implemented by
specifying only the pack and unpack methods. The read and write operations are
supported by its base class, VariableLengthBuffer.
Because fixed-length fields are stored without explicit
field lengths, class FixedFieldBuffer keeps track of the field sizes. The
protected member FieldSize holds the field sizes in an integer array. The
AddField method is used to specify field sizes. In the case of using a
FixedFieldBuffer to hold objects of class Person, the InitBuffer method can
be used to fully initialize the buffer:
int Person::InitBuffer (FixedFieldBuffer & Buffer)
// initialize a FixedFieldBuffer to be used for Persons
{
    int result;
    result = Buffer.AddField(10);            // LastName [11];
    result = result && Buffer.AddField(10);  // FirstName [11];
    result = result && Buffer.AddField(15);  // Address [16];
    result = result && Buffer.AddField(15);  // City [16];
    result = result && Buffer.AddField(2);   // State [3];
    result = result && Buffer.AddField(9);   // ZipCode [10];
    return result;
}
Starting with a buffer with no fields, InitBuffer adds the fields one at a time,
each with its own size. The following code prepares a buffer for use in
reading and writing objects of class Person:
FixedFieldBuffer Buffer(6, 61);  // 6 fields, 61 bytes total
MaryAmes.InitBuffer(Buffer);
Now that we know how to transfer objects to and from files, it is appropriate
to encapsulate that knowledge in a class that supports all of our file
operations. Class BufferFile (in files buffile.h and buffile.cpp of Appendix
F) supports manipulation of files that are tied to specific buffer types. An
object of class BufferFile is created from a specific buffer object and can be
used to open and create files and to read and write records. Figure 4.18 has the
main data methods and members of BufferFile.
Once a BufferFile object has been created and attached to an operating
system file, each read or write is performed using the same buffer. Hence,
each record is guaranteed to be of the same basic type. The following code
sample shows how a file can be created and used with a DelimFieldBuffer:
DelimFieldBuffer buffer;
BufferFile file (buffer);
file.Open(myfile);
file.Read();
buffer.Unpack(myobject);
class BufferFile
{ public:
    BufferFile (IOBuffer &);                 // create with a buffer
    int Open (char * filename, int MODE);    // open an existing file
    int Create (char * filename, int MODE);  // create a new file
    int Close ();
    int Rewind ();                           // reset to the first data record
    // Input and Output operations
    int Read (int recaddr = -1);
    int Write (int recaddr = -1);
    int Append ();                           // write the current buffer at the end of file
  protected:
    IOBuffer & Buffer;  // reference to the file's buffer
    fstream File;       // the C++ stream of the file
};
Managing Files of
Records
The reason a name is a risky choice for a primary key is that it contains a
real data value. In general, primary keys should be dataless. Even when we
think we are choosing a unique key, if it contains data, there is a danger that
unforeseen identical values could occur. Sweet (1985) cites an example of a
file system that used a person’s social security number as a primary key for
personnel records. It turned out that, in the particular population that was
represented in the file, there was a large number of people who were not United
States citizens, and in a different part of the organization, all of these people
had been assigned the social security number 999-99-9999!
Another reason, other than uniqueness, that a primary key should be
dataless is that a primary key should be unchanging. If information that
corresponds to a certain record changes and that information is contained in a
primary key, what do you do about the primary key? You probably cannot
change the primary key, in most cases, because there are likely to be reports,
memos, indexes, or other sources of information that refer to the record by its
primary key. As soon as you change the key, those references become useless.
A good rule of thumb is to avoid putting data into primary keys. If we
want to access records according to data content, we should assign this content
to secondary keys. We give a more detailed look at record access by primary
and secondary keys in Chapter 6. For the rest of this chapter, we suspend our
concern about whether a key is primary or secondary and concentrate on
finding things by key.
1. If you are not familiar with this “big-oh” notation, you should look it up. Knuth (1997) is a good source.
the disk. Once data transfer begins, it is relatively fast, although still much
slower than a data transfer within memory. Consequently, the cost of seeking
and reading a record, then seeking and reading another record, is greater than
the cost of seeking just once then reading two successive records. (Once again,
we are assuming a multiuser environment in which a seek is required for each
separate Read call.) It follows that we should be able to improve the
performance of sequential searching by reading in a block of several records all
at once and then processing that block of records in memory.
We began the previous chapter with a stream of bytes. We grouped the .
bytes into fields, then grouped the fields into records. Now we are considering a
yet higher level of organization—-grouping records into blocks. This new level
of grouping, however, differs from the others. Whereas fields and records are
ways of maintaining the logical organization within' the file, blocking is done
strictly as a performance measure. As such, the block size is usually related
more to the physical properties of the disk drive than to the content of the data.
For instance, on sector-oriented disks, the block size is almost always some
multiple of the sector size.
Suppose that we have a file of four thousand records and that the average
length of a record is 512 bytes. If our operating system uses sector-sized
buffers of 512 bytes, then an unblocked sequential search requires, on the
average, 2,000 Read calls before it can retrieve a particular record. By blocking
the records in groups of sixteen per block so each Read call brings in 8
kilobytes worth of records, the number of reads required for an average search
comes down to 125. Each Read requires slightly more time, since more data is
transferred from the disk, but this is a cost that is usually well worth paying for
such a large reduction in the number of reads.
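As a concrete illustration, a blocked sequential search might be coded along the following lines; the 512-byte record size, the blocking factor of sixteen, and the key-at-the-front record layout are assumptions taken from the example above, not code from the text.

#include <cstring>
#include <fstream>

const int kRecordSize  = 512;   // assumed fixed record length
const int kBlockFactor = 16;    // sixteen records (8 kilobytes) per Read call

// Return the RRN of the first record whose leading bytes match key, or -1.
long blockedSearch(std::ifstream & file, const char * key)
{
    char block[kRecordSize * kBlockFactor];
    long rrn = 0;
    while (file.read(block, sizeof(block)) || file.gcount() > 0) {
        int recordsRead = static_cast<int>(file.gcount()) / kRecordSize;
        for (int i = 0; i < recordsRead; ++i, ++rrn) {
            const char * record = block + i * kRecordSize;
            if (std::strncmp(record, key, std::strlen(key)) == 0)
                return rrn;     // found: comparisons happen in memory
        }
    }
    return -1;                  // not found
}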
There are several things to note from this analysis and discussion of record
blocking:
■ Although blocking can result in substantial performance improvements, it
does not change the order of the sequential search operation. The cost of
searching is still O(n), increasing in direct proportion to increases in the
size of the file.
■ Blocking clearly reflects the differences between memory access speed
and the cost of accessing secondary storage.
■ Blocking does not change the number of comparisons that must be done in
memory, and it probably increases the amount of data transferred between
disk and memory. (We always read a whole block, even if the record we
are seeking is the first one in the block.)
■ Blocking saves time because it decreases the amount of seeking. We find,
again and again, that this differential between the cost of seeking and the
cost of other operations, such as data transfer or memory access, is the
force that drives file structure design.
Unix provides a rich array of tools for working with files in this form.
Since this kind of file structure is inherently sequential (records are variable in
length, so we have to pass from record to record to find any particular field or
record), many of these tools process files sequentially.
Suppose, for instance, that we choose the white-space/new-line structure
for our address file, ending every field with a tab and ending every record with
a new line. While this causes some problems in distinguishing fields (a blank is
white space, but it doesn’t separate a field) and in that sense is not an ideal
structure, it buys us something very valuable: the full use of those Unix tools
that are built around the white-space/new-line structure. For example, we can
print the file on our console using any of a number of utilities, some of which
follow.
cat
% cat myfile
Ames    Mary    123 Maple     Stillwater  OK  74075
Mason   Alan    90 Eastgate   Ada         OK  74820
Or we can use tools like wc and grep for processing the files.
wc
The command wc (word count) reads through an ASCII file sequentially and
counts the number of lines (delimited by new lines), words (delimited by white
space), and characters in a file:
% wc myfile
2 14 76
grep
It is common to want to know if a text file has a certain word or character string
in it. For ASCII files that can reasonably be searched sequentially, Unix
provides an excellent filter for doing this called grep (and its variants egrep and
fgrep). The word grep stands for generalized regular expression, which
describes the type of pattern that grep is able to recognize. In its simplest form,
grep searches sequentially through a file for a pattern. It then returns to standard
output (the console) all the lines in the file that contain the pattern.
% grep Ada myfile
Mason   Alan    90 Eastgate   Ada         OK  74820
We can also combine tools to create, on the fly, some very powerful file
processing software. For example, to find the number of lines containing the
word Ada and the number of words and bytes in those lines we use

% grep Ada myfile | wc
Suppose we do not have an index. We assume that we know the relative record number
(RRN) of the record we want. RRN is an important concept that emerges from
viewing a file as a collection of records rather than as a collection of bytes. If a
file is a sequence of records, the RRN of a record gives its position relative to
the beginning of the file. The first record in a file has RRN 0, the next has RRN
1, and so forth.2
In our name and address file, we might tie a record to its RRN by assigning
membership numbers that are related to the order in which we enter the records
in the file. The person with the first record might have a membership number of
1001, the second a number of 1002, and so on. Given a membership number,
we can subtract 1001 to get the RRN of the record.
What can we do with this RRN? Not much, given the file structures we
have been using so far, which consist of variable-length records. The RRN tells
us the relative position of the record we want in the sequence of records, but we
still have to read sequentially through the file, counting records as we go, to get
to the record we want. An exercise at the end of this chapter explores a method
of moving through the file called skip sequential processing, which can
improve performance somewhat, but looking for a particular RRN is still an
O(n) process.
To support direct access by RRN, we need to work with records of fixed,
known length. If the records are all the same length, we can use a record’s RRN
to calculate the byte offset of the start of the record relative to the start of the
file. For instance, if we are interested in the record with an RRN of 546 and our
file has a fixed-length record size of 128 bytes per record, we can calculate the
byte offset as
Byte offset = 546 × 128 = 69,888

In general, given a fixed-length record file where the record size is r, the byte
offset of a record with an RRN of n is

Byte offset = n × r
Programming languages and operating systems differ regarding where this
byte offset calculation is done and even whether byte offsets are used for
addressing within files. In C++ (and the Unix and MS-DOS operating
systems), where a file is treated as just a sequence of bytes, the application
program does the calculation and uses the seekg and seekp methods to
2. In keeping with the conventions of C and C++, we assume that the RRN is a zero-based count. In some file
systems, the count starts at 1 rather than 0.
jump to the byte that begins the record. All movement within a file is in terms of
bytes. This is a very low-level view of files; the responsibility for translating an
RRN into a byte offset belongs wholly to the application program and not at all
to the programming language or operating system.
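A sketch of what this looks like in C++ (the names and the 128-byte record size are illustrative; the offset calculation and the seekg call are the essential parts):

#include <fstream>

const int kRecordSize = 128;   // fixed record length, as in the example above

// Read the record with the given RRN into the caller's buffer.
bool readByRRN(std::fstream & file, long rrn, char * record)
{
    std::streamoff offset = static_cast<std::streamoff>(rrn) * kRecordSize;
    file.seekg(offset, std::ios::beg);   // jump to the record's first byte
    file.read(record, kRecordSize);      // read the whole record
    return static_cast<bool>(file);      // false if the seek or read failed
}

// For example, readByRRN(file, 546, buffer) seeks to byte 546 x 128 = 69,888.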
Class FixedLengthBuffer can be extended with its own methods DRead
and DWrite that interpret the recref argument as RRN instead of byte address.
The methods are defined as virtual in class IOBuffer to allow this. The code in
Appendix F does not include this extension; it is left as an exercise.
The Cobol language and the operating environments in which Cobol is
often used (OS/MVS, VMS) are examples of a much different, higher-level view
of files. The notion of a sequence of bytes is simply not present when you are
working with record-oriented files in this environment. Instead, files are viewed
as collections of records that are accessed by keys. The operating system takes
care of the translation between a key and a record’s location. In the simplest
case, the key is just the record's RRN, but the determination of location within the
file is still not the programmer’s concern.
When choosing a record length, it often pays to relate it to the block organization of the
disk. For instance, if we intend to store the records on a typical sectored disk
(see Chapter 3) with a sector size of 512 bytes or some other power of 2, we
might decide to pad the record out to 32 bytes so we can place an integral
number of records in a sector. That way, records will never span sectors.
The choice of a record length is more complicated when the lengths of the
fields can vary, as in our name and address file. If we choose a record length
that is the sum of our estimates of the largest possible values for all the fields,
we can be reasonably sure that we have enough space for everything, but we
also waste a lot of space. If, on the other hand, we are conservative in our use of
space and fix the lengths of fields at smaller values, we may have to leave
information out of a field. Fortunately, we can avoid this problem to some
degree by appropriate design of the field structure within a record.
In our earlier discussion of record structures, we saw that there are two
general approaches we can take toward organizing fields within a fixed- length
record. The first, illustrated in Fig. 5.1(a) and implemented in class
FixedFieldBuffer, uses fixed-length fields inside the fixed-length record. This
is the approach we took for the sales transaction file previously described. The
second approach, illustrated in Fig. 5.1(b), uses the fixed-length record as a
kind of standard-sized container for holding something that looks like a
variable-length record.
The first approach has the virtue of simplicity: it is very easy to “break
out” the fixed-length fields from within a fixed-length record. The second
approach lets us take advantage of an averaging-out effect that usually occurs:
the longest names are not likely to appear in the same record as the longest
address field. By letting the field boundaries vary, we can make
more efficient use of a fixed amount of space. Also, note that the two
approaches are not mutually exclusive. Given a record that contains a number
of truly fixed-length fields and some fields that have variable- length
information, we might design a record structure that combines these two
approaches.
One interesting question that must be resolved in the design of this kind
of structure is that of distinguishing the real-data portion of the record from the
unused-space portion. The range of possible solutions parallels that of the
solutions for recognizing variable-length records in any other context: we can
place a record-length count at the beginning of the record, we can use a special
delimiter at the end of the record, we can count fields, and so on. As usual,
there is no single right way to implement this file structure; instead we seek the
solution that is most appropriate for our needs and situation.
Figure 5.2 shows the hex dump output from the two styles of repre-
senting variable-length fields in a fixed-length record. Each file has a header
record that contains three 2-byte values: the size of the header, the number of
records, and the size of each record. A full discussion of headers is deferred to
the next section. For now, however, just look at the structure of the data
records. We have italicized the length fields at the start of the records in the file
dump. Although we filled out the records in Fig. 5.2b with blanks to make the
output more readable, this blank fill is unnecessary. The length field at the start
of the record guarantees that we do not read past the end of the data in the
record.
record, whereas the data records each contain 64 bytes. Furthermore, the data
records of this file contain only character data, whereas the header record
contains integer fields that record the header record size, the number of data
records, and the data record size.
Header records are a widely used, important file design tool. For example,
when we reach the point at which we are discussing the construction of
tree-structured indexes for files, we will see that header records are often
placed at the beginning of the index to keep track of such matters as the RRN
of the record that is the root of the index.
verifies that the file was created using fixed-size records that are the right size
for using the buffer object for reading and writing.
Another aspect of using headers in these classes is that the header can be
used to initialize the buffer. At the end of FixedLengthBuffer::ReadHeader
(see Appendix F), after the buffer has been found to be uninitialized, the record
size of the buffer is set to the record size that was read from the header.
You will recall that in Section 4.5, "An Object-Oriented Class for Record
Files," we introduced class BufferFile as a way to guarantee the proper
interaction between buffers and files. Now that the buffer classes support
headers, BufferFile::Create puts the correct header in every file, and
BufferFile::Open either checks for consistency or initializes the buffer, as
appropriate. BufferFile::ReadHeader is called by Open and does all of its
work in a single virtual function call. Appendix F has the details of the
implementation of these methods.
BufferFile::Rewind repositions the get and put file pointers to the
beginning of the first data record—that is, after the header record. This method
is required because the Headersize member is protected. Without this method,
it would be impossible to initiate a sequential read of the file.
do not help because Person and Recording do not have a common base type.
It is the C++ template feature that solves our problem by supporting
parameterized function and class definitions. Figure 5.3 gives the definition
of the template class RecordFile.
#include "buffile.h"
If include "iobuffer.h”
// template class to support direct read and write of records // The template
parameter RecType must.support the following //. int Pack (BufferType &); pack
record into buffer .
// int Unpack {BufferType S c ) ; unpack record from buffer
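Based on the operations described above and the BufferFile class of Chapter 4, the template might be declared along the following lines. This is a sketch rather than the figure's exact code; in particular, the error handling and the assumption that the record type's Pack and Unpack methods accept the file's buffer are illustrative.

template <class RecType>
class RecordFile : public BufferFile
{ public:
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) {}

    int Read (RecType & record, int recaddr = -1)
    {
        int result = BufferFile::Read(recaddr);  // fill the buffer from the file
        if (result == -1) return -1;
        return record.Unpack(Buffer);            // unpack the record from the buffer
    }
    int Write (const RecType & record, int recaddr = -1)
    {
        if (!record.Pack(Buffer)) return -1;     // pack the record into the buffer
        return BufferFile::Write(recaddr);       // write the buffer to the file
    }
    int Append (const RecType & record)
    {
        if (!record.Pack(Buffer)) return -1;
        return BufferFile::Append();             // write the buffer at the end of file
    }
};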
The object PersonFile is a RecordFile that operates on Person objects. All
of the operations of RecordFile<Person> are available, including those from
the parent class BufferFile. The following code includes legitimate uses of
PersonFile:

Person person;
PersonFile.Create("person.dat", ios::in);  // create a file
PersonFile.Read(person);                   // read a record into person
PersonFile.Append(person);                 // write person at end of file
PersonFile.Open("person.dat", ios::in);    // open and check header
Template definitions in C++ support the reuse of code. We can write a
single class and use it in multiple contexts. The same RecordFile class
declared here and used for files of Person objects will be used in subsequent
chapters for quite different objects. No changes need be made to RecordFile to
support these different uses.
Program testfile.cpp, in Appendix F, uses RecordFile to test all of the
buffer I/O classes. It also includes a template function, TestBuffer, which is
used for all of the buffer tests.
In the course of our discussions in this and the previous chapter, we have
looked at
■ Variable-length records,
■ Fixed-length records,
■ Sequential access, and
■ Direct access.
The first two of these relate to aspects of file organization; the last two have to
do with file access. The interaction between file organization and file access is
a useful one; we need to look at it more closely before continuing.
Most of what we have considered so far falls into the category of file
organization:
■ Can the file be divided into fields?
■ Is there a higher level of organization to the file that combines the fields into
records?
■ Do all the records have the same number of bytes or fields?
■ How do we distinguish one record from another?
■ How do we organize the internal structure of a fixed-length record so we can
distinguish between data and extra space?
We have seen that there are many possible answers to these questions and that
the choice of a particular file organization depends on many things, including
the file-handling facilities of the language you are using and the use you want
to make of the file.
Using a file implies access. We looked first at sequential access, ulti-
mately developing a sequential search. As long as we did not know where
individual records began, sequential access was the only option open to us.
When we wanted direct access, we fixed the length of our records, and this
allowed us to calculate precisely where each record began and to seek directly
to it.
In other words, our desire for direct access caused us to choose a fixed-
length record file organization. Does this mean that we can equate fixed-
length records with direct access? Definitely not. There is nothing about our
having fixed the length of the records in a file that precludes sequential access;
we certainly could write a program that reads sequentially through a fixed-
length record file.
Not only can we elect to read through the fixed-length records sequen-
tially but we can also provide direct access to variable-length records simply
by keeping a list of the byte offsets from the start of the file for the placement
of each record. We chose a fixed-length record structure for the files of Fig.
5.2 because it is simple and adequate for the data we wanted to store.
Although the lengths of our names and addresses vary, the variation is not so
great that we cannot accommodate it in a fixed-length record.
Consider, however, the effects of using a fixed-length record organization
to provide direct access to documents ranging in length from a few hundred
bytes to more than a hundred kilobytes. Using fixed-length
Organizing Files
for Performance
In this section we look at some ways to make files smaller. There are many
reasons for making files smaller. Smaller files
■ Use less storage, resulting in cost savings;
■ Can be transmitted faster, decreasing access time or, alternatively, allowing the
  same access time with a lower and cheaper bandwidth; and
■ Can be processed faster sequentially.
Data compression involves encoding the information in a file in such a
way that it takes up less space. Many different techniques are available for
compressing data. Some are very general, and some are designed for specific
kinds of data, such as speech, pictures, text, or instrument data. The variety of
data compression techniques is so large that we can only touch on the topic
here, with a few examples.
Letter:        a     b     c     d     e     f     g
Probability:  0.4   0.1   0.1   0.1   0.1   0.1   0.1
Code 1:        1    010   011  0000  0001  0010  0011

Figure 6.2 Example showing the Huffman encoding for a set of seven letters, assuming
certain probabilities (from Lynch, 1985).

Because no code is a prefix of any other code, the distinct codes can be stored together, without delimiters between them, and still be recognized.
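The following sketch (illustrative code, not from the text) decodes a bit string produced with Code 1 of Fig. 6.2; because the code has this prefix property, a single left-to-right scan recovers the letters without any delimiters.

#include <iostream>
#include <string>

int main()
{
    const char letters[] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g' };
    const std::string codes[] = { "1", "010", "011", "0000", "0001", "0010", "0011" };

    std::string bits = "01010011";      // "bag" encoded as b=010, a=1, g=0011
    std::string current, decoded;

    for (char bit : bits) {
        current += bit;                 // accumulate bits until a code matches
        for (int i = 0; i < 7; ++i) {
            if (current == codes[i]) {  // a complete code has been seen
                decoded += letters[i];
                current.clear();
                break;
            }
        }
    }
    std::cout << decoded << '\n';       // prints: bag
    return 0;
}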
effort. This is particularly easy if you keep the deleted mark in a special field
rather than destroy some of the original data, as in our example.
The reclamation of space from the deleted records happens all at once.
After deleted records have accumulated for some time, a special program is
used to reconstruct the file with all the deleted records squeezed out as shown
in Fig. 6.3(c). If there is enough space, the simplest way to do this
compaction is through a file copy program that skips over the deleted
records. It is also possible, though more complicated and time-consuming, to
do the compaction in place. Either of these approaches can be used with both
fixed- and variable-length records.
The decision about how often to run the storage compaction program can
be based on either the number of deleted records or the calendar. In
accounting programs, for example, it often makes sense to run a compaction
procedure on certain files at the end of the fiscal year or at some other point
associated with closing the books.
record is found. If the program reaches the end of the file without finding a
deleted record, the new record can be appended at the end.
Unfortunately, this approach makes adding records an intolerably slow
process, if the program is an interactive one and the user has to sit at the
terminal and wait as the record addition takes place. To make record reuse
happen more quickly, we need
■ A way to know immediately if there are empty slots in the file, and
■ A way to jump directly to one of those slots if they exist.
Linked Lists
The use of a linked list for stringing together all of the available records can
meet both of these needs. A linked list is a data structure in which each element
or node contains some kind of reference to its successor in the list. (See Fig.
6.4.)
If you have a head reference to the first node in the list, you can move
through the list by looking at each node and then at the node’s pointer field, so
you know where the next node is located. When you finally encounter a pointer
field with some special, predetermined end-of-list value, you stop the traversal
of the list. In Fig. 6.4 we use a -1 in the pointer field to mark the end of the list.
When a list is made up of deleted records that have become available
space within the file, the list is usually called an avail list. When inserting a
new record into a fixed-length record file, any one available record is just as
good as any other. There is no reason to prefer one open slot over another since
all the slots are the same size. It follows that there is no reason to order the
avail list in any particular way. (As we see later, this situation changes for
variable-length records.)
Stacks
The simplest way to handle a list is as a stack. A stack is a list in which all
insertions and removals of nodes take place at one end of the list. So, if we
have an avail list managed as a stack that contains relative record numbers
(RRN) 5 and 2, and then add RRN 3, it looks like this before and after the
addition of the new node:
When a new node is added to the top or front of a stack, we say that it is
pushed onto the stack. If the next thing that happens is a request for some
available space, the request is filled by taking RRN 3 from the avail list. This
is called popping the stack. The list returns to a state in which it contains only
records 5 and 2.
Figure 6.5 Sample file showing linked lists of deleted records, (a) After deletion of records
3 and 5, in that order, (b) After deletion of records 3,5, and 1, in that order, (c) After
insertion of three new records.
configuration shown in Fig. 6.5(a). Since there are still two record slots on the
avail list, we could add two more names to the file without increasing the size
of the file. After that, however, the avail list would be empty as shown in Fig.
6.5(c). If yet another name is added to the file, the program knows that the
avail list is empty and that the name requires the addition of a new record at
the end of the file.
Implementing mechanisms that place deleted records on a linked avail list and
that treat the avail list as a stack is relatively straightforward. We need a
suitable place to keep the RRN of the first available record on the avail list.
Since this is information that is specific to the data file, it can be carried in a
header record at the start of the file.
When we delete a record, we must be able to mark the record as deleted
and then place it on the avail list. A simple way to do this is to place an * (or
some other special mark) at the beginning of the record as a deletion mark,
followed by the RRN of the next record on the avail list.
Once we have a list of available records within a file, we can reuse the
space previously occupied by deleted records. For this we would write a single
function that returns either (1) the RRN of a reusable record slot or (2) the RRN
of the next record to be appended if no reusable slots are available.
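A minimal sketch of these two operations might look like the following. The data file is modeled as an in-memory array of fixed-length slots, and the names FixedFileModel, DeleteRecord, and GetAvailableRRN are illustrative rather than part of the text's classes.

#include <cstdio>
#include <string>
#include <vector>

const int RECLEN = 30;                 // fixed record length (illustrative)

struct FixedFileModel {
    int firstAvail = -1;               // header field: RRN of first avail slot, -1 = empty list
    std::vector<std::string> slots;    // each element models one fixed-length record slot
};

// Mark a record as deleted and push it onto the avail list (a stack).
void DeleteRecord(FixedFileModel &f, int rrn) {
    // '*' marks deletion; the rest of the slot holds the RRN of the next avail record.
    f.slots[rrn] = "*" + std::to_string(f.firstAvail);
    f.slots[rrn].resize(RECLEN, ' ');
    f.firstAvail = rrn;                // the deleted record becomes the head of the list
}

// Return the RRN to use for a new record: a reused slot if one exists,
// otherwise the RRN just past the current end of the file.
int GetAvailableRRN(FixedFileModel &f) {
    if (f.firstAvail == -1)
        return (int)f.slots.size();    // avail list empty: append at end of file
    int rrn = f.firstAvail;            // pop the stack
    f.firstAvail = std::stoi(f.slots[rrn].substr(1));  // next avail RRN follows the '*'
    return rrn;
}

int main() {
    FixedFileModel f;
    f.slots = {"Ames", "Mason", "Brown", "Kerry"};   // four records, RRNs 0-3
    for (auto &s : f.slots) s.resize(RECLEN, ' ');

    DeleteRecord(f, 2);                // avail list: 2
    DeleteRecord(f, 0);                // avail list: 0 -> 2

    std::printf("reuse RRN %d\n", GetAvailableRRN(f));   // prints 0
    std::printf("reuse RRN %d\n", GetAvailableRRN(f));   // prints 2
    std::printf("next RRN %d\n", GetAvailableRRN(f));    // prints 4 (append)
    return 0;
}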
Let’s address the questions of adding and removing records to and from the list
together, since they are clearly related. With fixed-length records we
Figure 6.6 A sample file for illustrating variable-length record deletion. (a) Original
sample file stored in variable-length format with byte count (header record not
included). (b) Sample file after deletion of the second record (periods show discarded
characters).
could access the avail list as a stack because one member of the avail list is just
as usable as any other. That is not true when the record slots on the avail list
differ in size, as they do in a variable-length record file. We now have an extra
condition that must be met before we can reuse a record: the record must be the
right size. For the moment we define right size as “big enough.” Later we find
that it is sometimes useful to be more particular about the meaning of right
size.
It is possible, even likely, that we need to search through the avail list for
a record slot that is the right size. We can’t just pop the stack and expect the
first available record to be big enough. Finding a proper slot on the avail list
now means traversing the list until a record slot that is big enough to hold the
new record is found.
For example, suppose the avail list contains the deleted record slots shown
in Fig. 6.7(a), and a record that requires 55 bytes is to be added. Since the avail
list is not empty, we traverse the records whose sizes are 47 (too small), 38
(too small), and 72 (big enough). Having found a slot big enough to hold our
record, we remove it from the avail list by creating a new link that jumps over
the record as shown in Fig. 6.7(b). If we had reached the end of the avail list
before finding a record that was large enough, we would have appended the
new record at the end of the file.
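The traversal just described might be sketched as follows. The avail list is modeled here as an in-memory singly linked list of (offset, size) nodes, whereas in the file itself each link would be stored inside the deleted record, but the first-fit logic is the same.

#include <cstdio>

// One deleted record slot on the avail list (illustrative in-memory model;
// on disk the link would be a byte offset stored inside the deleted record).
struct AvailNode {
    long offset;        // where the deleted slot starts in the file
    int  size;          // size of the slot in bytes
    AvailNode *next;
};

// First fit: return the first slot big enough for recSize and unlink it
// from the list; return nullptr if no slot is large enough (append instead).
AvailNode *FindFirstFit(AvailNode *&head, int recSize) {
    AvailNode **link = &head;          // pointer to the link we may rewrite
    for (AvailNode *p = head; p != nullptr; link = &p->next, p = p->next) {
        if (p->size >= recSize) {
            *link = p->next;           // the new link jumps over the removed slot
            return p;
        }
    }
    return nullptr;                    // reached end of list: append at end of file
}

int main() {
    // Avail list with slots of 47, 38, and 72 bytes, as in Fig. 6.7(a).
    AvailNode c{300, 72, nullptr}, b{180, 38, &c}, a{100, 47, &b};
    AvailNode *head = &a;

    AvailNode *slot = FindFirstFit(head, 55);   // a new record needs 55 bytes
    if (slot)
        std::printf("reuse %d-byte slot at offset %ld\n", slot->size, slot->offset);
    else
        std::printf("append at end of file\n");
    return 0;
}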
Because this procedure for finding a reusable record looks through the
entire avail list if necessary, we do not need a sophisticated method for putting
newly deleted records onto the list. If a record of the right size is
Figure 6.7 Removal of a record from an avail list with variable-length records,
(a) Before removal, (b) After removal.
somewhere on this list, our get-available-record procedure eventually finds it.
It follows that we can continue to push new members onto the front of the list,
just as we do with fixed-length records.
Development of algorithms for adding and removing avail list records is
left to you as part of the exercises found at the end of this chapter.
Figure 6.8 Storage requirements of sample file using 64-byte fixed-length records.
Figure 6.9 Storage requirements of sample file using variable-length records with a
count field.
Figure 6.10 Illustration of fragmentation with variable-length records. (a) After deletion of
the second record (unused characters in the deleted record are replaced by periods). (b)
After the subsequent addition of the record for Al Ham.
Figure 6.11 Combating internal fragmentation by putting the unused part of the
deleted slot back on the avail list.
As we would expect, the new record is carved out of the 35-byte record that is
on the avail list. The data portion of the new record requires 25 bytes, and we
need 2 more bytes for another size field. This leaves 8 bytes in the record still
on the avail list.
What are the chances of finding a record that can make use of these 8
bytes? Our guess would be that the probability is close to zero. These 8 bytes
are not usable, even though they are not trapped inside any other record. This
is an example of external fragmentation. The space is actually on the avail list
rather than being locked inside some other record but is too fragmented to be
reused.
There are some interesting ways to combat external fragmentation. One
way, which we discussed at the beginning of this chapter, is storage
compaction. We could simply regenerate the file when external fragmentation
becomes intolerable. Two other approaches are as follows:
■ If two record slots on the avail list are physically adjacent, combine them
to make a single, larger record slot. This is called coalescing the holes in
the storage space.
■ Try to minimize fragmentation before it happens by adopting a placement
strategy that the program can use as it selects a record slot from the avail
list.
Figure 6.12 Addition of the second record into the slot originally occupied by a single
deleted record.
Coalescing holes presents some interesting problems. The avail list is not
kept in physical record order; if there are two deleted records that are
physically adjacent, there is no reason to presume that they are adjacent
to each other on the avail list. Exercise 15 at the end of this chapter provides a
discussion of this problem along with a framework for developing a solution.
The development of better placement strategies, however, is a different
matter. It is a topic that warrants a separate discussion, since the choice among
alternative strategies is not as obvious as it might seem at first glance.
A disadvantage of the best-fit strategy is that it forces us to search through at
least a part of the list—not only when we get records from the list, but also
when we put newly deleted records on the list. In a real-time environment,
the extra processing time could be significant.
A less obvious disadvantage of the best-fit strategy is related to the idea
of finding the best possible fit and ensuring that the free area left over after
inserting a new record into a slot is as small as possible. Often this remaining
space is too small to be useful, resulting in external fragmentation.
Furthermore, the slots that are least likely to be useful are the ones that will
be placed toward the beginning of the list, making first-fit searches longer as
time goes on.
These problems suggest an alternative strategy. What if we arrange the
avail list so it is in descending order by size? Then the largest record slot on
the avail list would always be at the head of the list. Since the procedure that
retrieves records starts its search at the beginning of the avail list, it always
returns the largest available record slot if it returns any slot at all. This is
known as a worst-fit placement strategy. The amount of space in the record
slot, beyond what is actually needed, is as large as possible.
A worst-fit strategy does not, at least initially, sound very appealing. But
consider the following:
■ The procedure for removing records can be simplified so it looks only at
the first element of the avail list. If the first record slot is not large enough
to do the job, none of the others will be.
■ By extracting the space we need from the largest available slot, we are
assured that the unused portion of the slot is as large as possible,
decreasing the likelihood of external fragmentation.
What can you conclude from all of this? It should be clear that no one
placement strategy is superior under all circumstances. The best you can do
is formulate a series of general observations, and then, given a particular
design situation, try to select the strategy that seems most appropriate. Here
are some suggestions. The judgment will have to be yours.
■ Placement strategies make sense only with regard to volatile, variable-
length record files. With fixed-length records, placement is simply not
an issue.
■ If space is lost due to internal fragmentation, the choice is between first
fit and best fit. A worst-fit strategy truly makes internal fragmentation
worse.
■ If the space is lost due to external fragmentation, one should give careful
consideration to a worst-fit strategy.
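As a rough illustration of how the strategies differ, the sketch below chooses a slot for a 55-byte record from the same avail list under first-fit, best-fit, and worst-fit rules. It is an illustration, not code from the text: best fit and worst fit are written as scans here, although in practice they would rely on an avail list kept in ascending or descending order by size.

#include <cstdio>
#include <vector>

// Each function returns the index of the chosen slot, or -1 if none is big enough.
int FirstFit(const std::vector<int> &sizes, int need) {
    for (size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= need) return (int)i;        // take the first slot that fits
    return -1;
}

int BestFit(const std::vector<int> &sizes, int need) {
    int best = -1;
    for (size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= need && (best == -1 || sizes[i] < sizes[best]))
            best = (int)i;                          // smallest slot that is still big enough
    return best;
}

int WorstFit(const std::vector<int> &sizes, int need) {
    int worst = -1;
    for (size_t i = 0; i < sizes.size(); ++i)
        if (sizes[i] >= need && (worst == -1 || sizes[i] > sizes[worst]))
            worst = (int)i;                         // largest slot of all
    return worst;
}

int main() {
    std::vector<int> avail = {47, 72, 38, 60};      // slot sizes on the avail list
    int need = 55;
    std::printf("first fit -> slot of %d bytes\n", avail[FirstFit(avail, need)]);  // 72
    std::printf("best fit  -> slot of %d bytes\n", avail[BestFit(avail, need)]);   // 60
    std::printf("worst fit -> slot of %d bytes\n", avail[WorstFit(avail, need)]);  // 72
    return 0;
}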
This text begins with a discussion of the cost of accessing secondary storage.
You may remember that the magnitude of the difference between accessing
memory and seeking information on a fixed disk is such that, if we magnify
the time for a memory access to twenty seconds, a similarly magnified disk
access would take fifty-eight days.
So far we have not had to pay much attention to this cost. This section,
then, marks a kind of turning point. Once we move from fundamental
organizational issues to the matter of searching a file for a particular piece of
information, the cost of a seek becomes a major factor in determining our
approach. And what is true for searching is all the more true for sorting. If you
have studied sorting algorithms, you know that even a good sort involves
making many comparisons. If each of these comparisons involves a seek, the
sort is agonizingly slow.
Our discussion of sorting and searching, then, goes beyond simply getting
the job done. We develop approaches that minimize the number of disk
accesses and therefore minimize the amount of time expended. This concern
with minimizing the number of seeks continues to be a major focus throughout
the rest of this text. This is just the beginning of a quest for ways to order and
find things quickly.
What if there are several records with the same key and we want to find them all?
Once again, we would be doomed to looking at
every record in the file. Clearly, we need to find a better way to handle keyed
access. Fortunately, there are many better ways.
int BinarySearch
   (FixedRecordFile & file, RecType & obj, KeyType & key)
// binary search for key
// if key found, obj contains corresponding record, 1 returned
// if key not found, 0 returned
{
   int low = 0; int high = file.NumRecs() - 1;
   while (low <= high)
   {
      int guess = (high + low) / 2;
      file.ReadByRRN (obj, guess);
      if (obj.Key() == key) return 1;         // record found
      if (obj.Key() < key) low = guess + 1;   // search after guess
      else high = guess - 1;                  // search before guess
   }
   return 0;   // loop ended without finding key
}
Figure 6.14 gives the minimum definitions that must be present to allow a
successful compilation of BinarySearch. This includes a class RecType with
a Key method that returns the key value of an object and class KeyType with
equality and less-than operators. No further details of any of these classes
need be given.
Figure 6.14 Classes and methods that must be implemented to support the
binary search algorithm.
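Figure 6.14 itself is not reproduced here, so the following is a sketch of what such minimal declarations might look like; the exact member signatures are assumptions consistent with the calls made in BinarySearch.

// Minimal declarations needed to compile BinarySearch (a sketch; the
// signatures in the text's Figure 6.14 may differ in detail).
class KeyType
{public:
   int operator == (KeyType &);    // equality test on keys
   int operator <  (KeyType &);    // less-than test on keys
};

class RecType
{public:
   KeyType Key();                  // return the key value of this record
};

class FixedRecordFile
{public:
   int NumRecs();                             // number of records in the file
   int ReadByRRN (RecType & record, int rrn); // read the record with this RRN
};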
A binary search on a list of one thousand items requires, on the average, 9.5 accesses
per request. If the list is expanded to one hundred thousand items, the average
search length extends to 16.5 accesses. Although this is a tremendous
improvement over the cost of a sequential search for the key, it is also true that
16 accesses, or even 9 or 10 accesses, is not a negligible cost. The cost of this
searching is particularly noticeable and objectionable if we are doing a large
enough number of repeated accesses by key.
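These figures follow from the fact that each probe of a binary search halves the list that remains, so the number of accesses grows only logarithmically. The short calculation below (an illustration, not from the text) prints the worst-case probe counts that bracket the averages quoted above.

#include <cmath>
#include <cstdio>

int main() {
    for (long n : {1000L, 100000L}) {
        // A binary search can halve the remaining list at most
        // floor(log2 n) + 1 times, so that is the worst-case access count.
        int probes = (int)std::floor(std::log2((double)n)) + 1;
        std::printf("list of %ld items: at most %d accesses\n", n, probes);
    }
    return 0;
}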
When we access records by relative record number rather than by key, we
are able to retrieve a record with a single access. That is an order of magnitude
of improvement over the ten or more accesses that binary searching requires
with even a moderately large file. Ideally, we would like to approach RRN
retrieval performance while still maintaining the advantages of access by key.
In the following chapter, on the use of index structures, we begin to look at
ways to move toward this ideal.
As we will see in Chapter 8, such merging is a sequential process, passing only once over each
record in the file. This can be an efficient, attractive approach to maintaining
the file.
So, despite its problems, there are situations in which binary searching
appears to be a useful strategy. However, knowing the costs of binary searching
also lets us see better solutions to the problem of finding things by key. Better
solutions will have to meet at least one of the following conditions:
■ They will not involve reordering of the records in the file when a new
record is added, and
■ They will be associated with data structures that allow for substantially
more rapid, efficient reordering of the file.
In the chapters that follow we develop approaches that fall into each of
these categories. Solutions of the first type can involve the use of simple
indexes. They can also involve hashing. Solutions of the second type can
involve the use of tree structures, such as a B-tree, to keep the file in order.
6.4 Keysorting
Keysort, sometimes referred to as tag sort, is based on the idea that when we
sort a file in memory the only things that we really need to sort are the record
keys; therefore, we do not need to read the whole file into memory during the
sorting process. Instead, we read the keys from the file into memory, sort them,
and then rearrange the records in the file according to the new ordering of the
keys.
Since keysort never reads the complete set of records into memory, it can
sort larger files than a regular internal sort, given the same amount of memory.
class KeyRRN
// contains a pair (KEY, RRN)
{public:
   KeyType KEY; int RRN;
   KeyRRN ();
   KeyRRN (KeyType key, int rrn);
};
int Sort (KeyRRN [], int numKeys); // sort array by key
Figure 6.15 Minimal functionality required for classes used by the keysort algorithm.
Figure 6.16 Conceptual view of KEYNODES array to be used in memory by internal sort
routine and record array on secondary store.
The KEY field of each element of the KEYNODES[] array contains a copy of the
corresponding record's key, placed there at the time the keysort procedure begins.
The RRN field of each array element contains the RRN of the record associated
with the corresponding key.
The actual sorting process simply sorts the KEYNODES[] array according to
the KEY field. This produces an arrangement like that shown in Fig. 6.17. The
elements of KEYNODES [ ] are now sequenced in such a way that the first
element has the RRN of the record that should be moved to the first position
in the file, the second element identifies the record that should be second, and
so forth.
Once KEYNODES[] is sorted, we are ready to reorganize the file according
to this new ordering by reading the records from the input file and writing them
to a new file in the order of the KEYNODES[] array.
Figure 6.18 gives an algorithm for keysort. This algorithm works much
the same way that a normal internal sort would work, but with two important
differences:
■ Rather than read an entire record into a memory array, we simply read each
record into a temporary buffer, extract the key, then discard it; and
■ When we are writing the records out in sorted order, we have to read them in
a second time, since they are not all stored in memory.
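Figure 6.18 is not reproduced here, so the following sketch (an illustration, not the text's figure) captures the keysort idea. The data file is modeled as an in-memory vector of records to keep the example self-contained; a real implementation would read from and write to disk files.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// The "data file" is modeled as a vector of records whose first field is the key.
struct Record { std::string key; std::string rest; };

struct KeyRRN {                       // a (KEY, RRN) pair, as in Fig. 6.15
    std::string KEY;
    int RRN;
};

std::vector<Record> Keysort(const std::vector<Record> &inFile) {
    // Pass 1: read each record, keep only its key and RRN in memory.
    std::vector<KeyRRN> keynodes;
    for (int rrn = 0; rrn < (int)inFile.size(); ++rrn)
        keynodes.push_back({inFile[rrn].key, rrn});

    // Sort the KEYNODES array by key; the records themselves are not moved.
    std::sort(keynodes.begin(), keynodes.end(),
              [](const KeyRRN &a, const KeyRRN &b) { return a.KEY < b.KEY; });

    // Pass 2: read the records a second time, in key order, writing each
    // one to the output file.
    std::vector<Record> outFile;
    for (const KeyRRN &kn : keynodes)
        outFile.push_back(inFile[kn.RRN]);   // one seek per record in a real file
    return outFile;
}

int main() {
    std::vector<Record> file = {
        {"MASON", "...data..."}, {"AMES", "...data..."}, {"KERRY", "...data..."}};
    for (const Record &r : Keysort(file))
        std::printf("%s\n", r.key.c_str());  // prints AMES, KERRY, MASON
    return 0;
}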
Figure 6.19 Relationship between the index file and the data file.
Chapter 7
Indexing
7.1 What Is an Index?
The last few pages of many books contain an index. Such an index is a table
containing a list of topics (keys) and numbers of pages where the topics can be
found (reference fields).
All indexes are based on the same basic concept—keys and reference
fields. The types of indexes we examine in this chapter are called simple
indexes because they are represented using simple arrays of structures that
contain the keys and reference fields. In later chapters we look at indexing
schemes that use more complex data structures, especially trees. In this
chapter, however, we want to emphasize that indexes can be very simple and
still provide powerful tools for file processing.
The index to a book provides a way to find a topic quickly. If you have
ever had to use a book that doesn’t have a good index, you already know that
an index is a desirable alternative to scanning through the book sequentially to
find a topic. In general, indexing is another way to handle the problem we
explored in Chapter 6: an index is a way to find things.
Consider what would happen if we tried to apply the previous chapter’s
methods, sorting and binary searching, to the problem of finding things in a
book. Rearranging all the words in the book so they were in
alphabetical order certainly would make finding any particular term easier but would
obviously have disastrous effects on the meaning of the book. In a sense, the terms in the
book are pinned records. This is an absurd example, but it clearly underscores the power and
importance of the index as a conceptual tool. Since it works by indirection, an index lets
you impose order on a file without rearranging the file. This not only keeps us from
disturbing pinned records, but also makes matters such as record addition much less
expensive than they are with a sorted file.
Take, as another example, the problem of finding books in a library. We
want to be able to locate books by a specific author, title, or subject area. One
way of achieving this is to have three copies of each book and three separate
library buildings. All of the books in one building would be sorted by author’s
name, another building would contain books arranged by title, and the third
would have them ordered by subject. Again, this is an absurd example, but
one that underscores another important advantage of indexing. Instead of
using multiple arrangements, a library uses a card catalog. The card catalog is
actually a set of three indexes, each using a different key field, and all of them
using the same catalog number as a reference field. Another use of indexing,
then, is to provide multiple access paths to a file.
We also find that indexing gives us keyed access to variable-length
record files. Let’s begin our discussion of indexing by exploring this problem
of access to variable-length records and the simple solution that indexing
provides.
One final note: the example data objects used in the following sections
are musical recordings. This may cause some confusion as we use the term
record to refer to an object in a file, and recording to refer to a data object. We
will see how to get information about recordings by finding records in files.
We’ve tried hard to make a distinction between these two terms: The
distinction is between the file system view of the elements that make up files
(records), and the user’s or application’s view of the objects that are being
manipulated (recordings).
Each data record for a recording contains the following fields:
■ Identification number
■ Title
■ Composer or composers
■ Artist or artists
■ Label (publisher)
How could we organize the file to provide rapid keyed access to indi-
vidual records? Could we sort the file and then use binary searching?
Unfortunately, binary searching depends on being able to jump to the middle
record in the file. This is not possible in a variable-length record file because
direct access by relative record number is not possible; there is no way to
know where the middle record is in any group of records.
An alternative to sorting is to construct an index for the file. Figure 7.3
illustrates such an index. On the right is the data file containing information
about our collection of recordings, with one variable-length data record per
recording. Only four fields are shown (Label, ID number, Title, and
Composer), but it is easy to imagine the other information filling out each
record.
On the left is the index, each entry of which contains a key corresponding
to a certain Label ID in the data file. Each key is associated with a reference
field giving the address of the first byte of the corresponding data record.
ANG3795, for example, corresponds to the reference field containing the
number 152, meaning that the record containing full information on the
recording with Label ID ANG3795 can be found starting at byte number 152
in the record file.
With an open file and an index to the file in memory, RetrieveRecording puts
together the index search, file read, and buffer unpack operations into a single
function.
Keeping the index in memory as the program runs also lets us find
records by key more quickly with an indexed file than with a sorted one since
the binary searching can be performed entirely in memory. Once the byte
offset for the data record is found, a single seek is all that is required to
retrieve the record. The use of a sorted data file, on the other hand, requires a
seek for each step of the binary search.
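A minimal sketch of this arrangement follows. The index is an in-memory array of key/byte-offset entries searched with a binary search, and a single "seek" into the data then retrieves the record; the record layout, the sample data, and the function as written here are illustrative stand-ins for the text's classes, with the data file modeled as a byte stream so the sketch stays self-contained.

#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// One entry of the in-memory index: a key (Label ID) and the byte offset
// at which the corresponding data record begins.
struct IndexEntry { std::string key; long offset; };

// Stand-in for the data file: one newline-terminated record per recording.
const std::string dataFile =
    "ANG3795|Symphony No. 9|Beethoven|Angel\n"
    "COL38358|Nebraska|Springsteen|Columbia\n"
    "DG18807|Symphony No. 9|Beethoven|DG\n";

// Binary-search the index in memory, then make a single "seek" into the
// data stream and unpack the record found there.
bool RetrieveRecording(const std::vector<IndexEntry> &index,
                       const std::string &key, std::string &record) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry &e, const std::string &k) { return e.key < k; });
    if (it == index.end() || it->key != key) return false;    // key not in index
    long pos = it->offset;                                     // the one seek
    record = dataFile.substr(pos, dataFile.find('\n', pos) - pos);
    return true;
}

int main() {
    // Index kept sorted by key; in practice it is loaded from its own file.
    std::vector<IndexEntry> index;
    for (const char *k : {"ANG3795", "COL38358", "DG18807"})
        index.push_back({k, (long)dataFile.find(k)});
    std::sort(index.begin(), index.end(),
              [](const IndexEntry &a, const IndexEntry &b) { return a.key < b.key; });

    std::string rec;
    if (RetrieveRecording(index, "DG18807", rec))
        std::printf("%s\n", rec.c_str());
    return 0;
}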
The classes we have developed so far stop short of providing a read operation
whose arguments are a file and a data object. We want a class RecordFile that
makes the following code possible:
Person p;     RecordFile<Person> pFile;      pFile.Read(p);
Recording r;  RecordFile<Recording> rFile;   rFile.Read(r);
We also need Append methods that maintain a primary key index of the data
file and a Read method that supports access to objects by key.
So far, we have classes TextIndex, which supports maintenance and
search by primary key, and RecordFile, which supports create, open, and close
for files as well as read and write for data objects. We have already seen how
to create a primary key index for a data file as a memory object. There are still
two issues to address:
■ How to make a persistent index of a file. That is, how to store the index in
a file when it is not in memory.
■ How to guarantee that the index is an accurate reflection of the contents
of the data file.
Record Addition
Adding a new record to the data file requires that we also add an entry to the
index. Adding to the data file itself uses RecordFile<Recording>::Write.
The record key and the resulting record reference are then inserted into the
index record using TextIndex::Insert.
Since the index is kept in sorted order by key, insertion of the new index
entry probably requires some rearrangement of the index. In a way, the
situation is similar to the one we face as we add records to a sorted data file.
We have to shift or slide all the entries with keys that come in order after the
key of the record we are inserting. The shifting opens up a space for the new
entry. The big difference between the work we have to do on the index entries
and the work required for a sorted data file is that the index is contained
wholly in memory. All of the index rearrangement can be done without any
file access. The implementation of TextIndex::Insert is given in file
textind.cpp of Appendix G.
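A sketch of how an Append operation might tie these two steps together is shown below. TextIndex and the record file are reduced to simplified in-memory stand-ins here (the real implementations are in Appendix G), so the bodies shown are illustrative only.

#include <string>
#include <vector>

// Simplified stand-ins for the text's classes.
struct Recording {
    std::string labelID;                 // primary key
    std::string title, composer;
    std::string Key() const { return labelID; }
};

class TextIndex {
  public:
    // Insert keeps the entries sorted by key, shifting later entries down.
    int Insert(const std::string &key, int recAddr) {
        size_t i = 0;
        while (i < keys.size() && keys[i] < key) ++i;
        keys.insert(keys.begin() + i, key);
        recAddrs.insert(recAddrs.begin() + i, recAddr);
        return 1;
    }
  private:
    std::vector<std::string> keys;
    std::vector<int> recAddrs;
};

class RecordingFile {                    // stands in for RecordFile<Recording>
  public:
    // Append the record to the data file and add its key to the index.
    int Append(const Recording &rec) {
        int recAddr = Write(rec);        // write the record, get its address back
        index.Insert(rec.Key(), recAddr);// keep the index consistent with the file
        return recAddr;
    }
  private:
    int Write(const Recording &rec) {    // simplified: the "file" is a vector
        records.push_back(rec);
        return (int)records.size() - 1;
    }
    std::vector<Recording> records;
    TextIndex index;
};

int main() {
    RecordingFile file;
    file.Append({"ANG3795", "Symphony No. 9", "Beethoven"});
    file.Append({"COL38358", "Nebraska", "Springsteen"});
    return 0;
}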
Record Deletion
In Chapter 6 we described a number of approaches to deleting records in
variable-length record files that allow for the reuse of the space occupied by
these records. These approaches are completely viable for our data file
because, unlike a sorted data file, the records in this file need not be moved
around to maintain an ordering on the file. This is one of the great advantages
of an indexed file organization: we have rapid access to individual records by
key without disturbing pinned records. In fact, the indexing itself pins all the
records. The implementation of data record deletion is not included in this
text but has been left as exercises.
Of course, when we delete a record from the data file, we must also delete
the corresponding entry from our index, using TextIndex::Delete. Since the
index is in memory during program execution, deleting the index entry and
shifting the other entries to close up the space may not be an overly expensive
operation. Alternatively, we could simply mark the index entry as deleted,
just as we might mark the corresponding data record. Again, see textind.cpp
for the implementation of TextIndex::Delete.
Record Updating
Record updating falls into two categories:
■ The update changes the value of the key field. This kind of update can
bring about a reordering of the index file as well as the data file.
Conceptually, the easiest way to think of this kind of change is as a
deletion followed by an insertion. This delete/insert approach can be
implemented while still providing the program user with the view that he
or she is merely changing a record.
■ The update does not affect the key field. This second kind of update does
not require rearrangement of the index file but may well involve
reordering of the data file. If the record size is unchanged or decreased by
the update, the record can be written directly into its old space. But if the
record size is increased by the update, a new slot for the record will have
to be found. In the latter case the starting address of the rewritten record
must replace the old address in the corresponding RecAddrs element.
Again, the delete/insert approach to maintaining the index can be used. It
is also possible to implement an operation simply to change the
RecAddrs member.
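As a rough sketch of the delete/insert approach, the fragment below models the primary key index as a map from key to record address. RewriteRecord, the 64-byte threshold, and the COL38358 address are invented for illustration and are not part of the text's classes.

#include <cstdio>
#include <map>
#include <string>

// A minimal model of the primary key index: key -> record address.
typedef std::map<std::string, long> PrimaryIndex;

// Hypothetical helper: rewrite the record, reusing its old slot when the new
// contents still fit and placing it at a new address otherwise.
long RewriteRecord(long oldAddr, long newSlotAddr, const std::string &newContents)
{
    return (newContents.size() <= 64) ? oldAddr : newSlotAddr;
}

// Sketch of the two kinds of update described above.
void UpdateRecord(PrimaryIndex &index, long newSlotAddr,
                  const std::string &oldKey, const std::string &newKey,
                  const std::string &newContents)
{
    long oldAddr = index.at(oldKey);
    long newAddr = RewriteRecord(oldAddr, newSlotAddr, newContents);

    if (newKey != oldKey) {
        index.erase(oldKey);          // key changed: a deletion followed by an insertion
        index[newKey] = newAddr;
    } else if (newAddr != oldAddr) {
        index[oldKey] = newAddr;      // key unchanged but the record moved:
    }                                 // only the address field is replaced
}

int main()
{
    PrimaryIndex index = {{"ANG3795", 152}, {"COL38358", 338}};
    // A key-changing update, treated as a deletion plus an insertion.
    UpdateRecord(index, 1000, "ANG3795", "ANG36193", "new record contents");
    for (const auto &e : index)
        std::printf("%s -> %ld\n", e.first.c_str(), e.second);
    return 0;
}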
Figure 7.10 Secondary key index organized by recording title.
Record Deletion
Deleting a record usually implies removing all references to that record in the
file system. So removing a record from the data file would mean removing not
only the corresponding entry in the primary index but also all of the entries in
the secondary indexes that refer to this primary index entry. The problem with
this is that secondary indexes, like the primary index, are maintained in sorted
order by key. Consequently, deleting an entry would involve rearranging the
remaining entries to close up the space left open by deletion.
This delete-all-references approach would indeed be advisable if the
secondary index referenced the data file directly. If we did not delete the
secondary key references and if the secondary keys were associated with
actual byte offsets in the data file, it could be difficult to tell when these
references were no longer valid. This is another instance of the pinned-record
problem. The reference fields associated with the secondary keys would be
pointing to byte offsets that could, after deletion and subsequent space reuse in
the data file, be associated with different data records.
Record Updating
In our discussion of record deletion, we find that the primary key index serves
as a kind of protective buffer, insulating the secondary indexes from changes in
the data file. This insulation extends to record updating as well. If our
secondary indexes contain references directly to byte offsets in the data file,
then updates to the data file that result in changing a record’s physical location
in the file also require updating the secondary indexes. But, since we are
confining such detailed information to the primary index, data file updates
affect the secondary index only when they change either the primary or the
secondary key. There are three possible situations:
■ Update changes the secondary key: if the secondary key is changed, we may
have to rearrange the secondary key index so it stays in sorted order. This
can be a relatively expensive operation.
■ Update changes the primary key: this kind of change has a large impact on
the primary key index but often requires that we update only the affected
reference field (Label ID in our example) in all the secondary indexes.
This involves searching the secondary indexes (on the unchanged
secondary keys) and rewriting the affected fixed-length field. It does not
require reordering of the secondary indexes unless the corresponding
secondary key occurs more than once in the index. If a secondary key does
occur more than once, there may be some local reordering, since records
having the same secondary key are ordered by the reference field (primary
key).
■ Update confined to other fields: all updates that do not affect either the
primary or secondary key fields do not affect the secondary key index,
even if the update is substantial. Note that if there are several secondary
key indexes associated with a file, updates to records often affect only a
subset of the secondary indexes.
One of the most important applications of secondary keys involves using two
or more of them in combination to retrieve special subsets of records from the
data file. To provide an example of how this can be done, we will extract another
secondary key index from our file of recordings. This one uses the recording’s
title as the key, as illustrated in Fig. 7.10. Now we can respond to requests such
as
■ Find the recording with Label ID COL38358 (primary key access);
■ Find all the recordings of Beethoven’s work (secondary key = composer);
and
■ Find all the recordings titled “Violin Concerto” (secondary key = title).
What is more interesting, however, is that we can also respond to a request
that combines retrieval on the composer index with retrieval on the title index,
such as: Find all recordings of Beethoven’s Symphony No. 9. Without the use
of secondary indexes, this kind of request requires a sequential search through
the entire file. Given a file containing thousands,
or even hundreds, of records, this is a very expensive process. But, with the
aid of secondary indexes, responding to this request is simple and quick.
We begin by recognizing that this request can be rephrased as a Boolean
and operation, specifying the intersection of two subsets of the data file:
Find all data records with:
composer = 'BEETHOVEN' and title = 'SYMPHONY NO. 9'
We begin our response to this request by searching the composer • index for
the list of Label IDs that identify recordings with Beethoven as the composer. This
yields the following list of Label IDs:
ANG3795
DG139201
DG18807
RCA2626
Next we search the title index for the Label IDs associated with records
that have SYMPHONY NO. 9 as the title key:
ANG3795
COL31809
DG18807
Now we perform the Boolean and, which is a match operation, combining
the lists so only the members that appear in both lists are placed in the output
list.
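The match operation on the two sorted lists is a simple single-pass merge, which is essentially what std::set_intersection performs. A minimal sketch using the Label IDs above:

#include <algorithm>
#include <cstdio>
#include <iterator>
#include <string>
#include <vector>

int main() {
    // Label IDs retrieved from the composer index (Beethoven)...
    std::vector<std::string> byComposer = {"ANG3795", "DG139201", "DG18807", "RCA2626"};
    // ...and from the title index (Symphony No. 9). Both lists are sorted.
    std::vector<std::string> byTitle = {"ANG3795", "COL31809", "DG18807"};

    // The Boolean "and" is a match (intersection) of the two sorted lists,
    // done in a single pass over each list.
    std::vector<std::string> result;
    std::set_intersection(byComposer.begin(), byComposer.end(),
                          byTitle.begin(), byTitle.end(),
                          std::back_inserter(result));

    for (const std::string &id : result)
        std::printf("%s\n", id.c_str());   // prints ANG3795 and DG18807
    return 0;
}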
The secondary index structures that we have developed so far result in two
distinct difficulties:
■ We have to rearrange the index file every time a new record is added to the
file, even if the new record is for an existing secondary key. For example,
if we add another recording of Beethoven’s Symphony No. 9 to our
collection, both the composer and title indexes would have to be
rearranged, even though both indexes already contain entries for
secondary keys (but not the Label IDs) that are being added.
■ If there are duplicate secondary keys, the secondary key field is repeated for
each entry. This wastes space because it makes the files larger than
necessary. Larger index files are less likely to fit in memory.
Figure 7.11 provides a schematic example of how such an index would look if used with
our sample data file.
The major contribution of this revised index structure is its help in solving our first
difficulty: the need to rearrange the secondary index file every time a new record is added
to the data file. Looking at Fig. 7.11, we can see that the addition of another recording of
a work by Prokofiev does not require the addition of another record to the index. For
example, if we add the recording
Since we are not adding another record to the secondary index, there is no
need to rearrange any records. All that is required is a rearrangement of the
fields in the existing record for Prokofiev.
Although this new structure helps avoid the need to rearrange the
secondary index file so often, it does have some problems. For one thing, it
provides space for only four Label IDs to be associated with a given key. In
the very likely case that more than four Label IDs will go with some key, we
need a mechanism for keeping track of the extra Label IDs.
A second problem has to do with space usage. Although the structure
does help avoid the waste of space due to the repetition of identical keys, this
space savings comes at a potentially high cost. By extending the fixed length
of each of the secondary index records to hold more reference fields, we
might easily lose more space to internal fragmentation than we gained by not
repeating identical keys.
Since we don’t want to waste any more space than we have to, we need
to ask whether we can improve on this record structure. Ideally, what we
would like to do is develop a new design, a revision of our revision, that
Figure 7.11 Secondary key index containing space for multiple references for each
secondary key (the entries shown include COREA, PROKOFIEV, RIMSKY-KORSAKOV,
and SPRINGSTEEN, each followed by its associated Label IDs).
Figure 7.12 Conceptual view of the primary key reference fields as a series of lists.
could grow to be just as long as it needs to be. If we add the new Prokofiev
record, the list of Prokofiev references becomes
Similarly, adding two new Beethoven recordings adds just two additional
elements to the list of references associated with the Beethoven key. Unlike our
record structure which allocates enough space for four Label IDs for each
secondary key, the lists could contain hundreds of references, if needed, while
still requiring only one instance of a secondary key. On the other hand, if a list
requires only one element, then no space is lost to internal fragmentation. Most
important, we need to rearrange the file of secondary keys only if a new
composer is added to the file.
You can see (Fig. 7.13) that the Label ID for this new recording is the last
one in the Label ID List file, since this file is entry sequenced. Before this
record is added, there is only one Prokofiev recording. It has a Label ID of
LON2312. Since we want to keep the Label ID Lists in order by ASCII
character values, the new recording is inserted in the list for Prokofiev so it
logically precedes the LON2312 recording.
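A minimal sketch of this two-file arrangement follows. The Secondary Index entry holds the head of a linked list stored in an entry-sequenced Label ID List "file"; both files are modeled here as in-memory arrays, and the field names and the new Label ID ANG36193 are illustrative.

#include <cstdio>
#include <string>
#include <vector>

// One slot of the entry-sequenced Label ID List file: a Label ID and the
// index of the next slot in the same composer's list (-1 ends the list).
struct ListNode { std::string labelID; int next; };

// One entry of the Secondary Index file: a composer key and the index of
// the first node of that composer's list in the Label ID List file.
struct SecondaryEntry { std::string key; int head; };

// Add a new Label ID for an existing secondary key, keeping that composer's
// list in order by Label ID. Only the list file grows; the Secondary Index
// entry is rewritten only if the new node becomes the head of the list.
void AddReference(std::vector<ListNode> &listFile, SecondaryEntry &entry,
                  const std::string &labelID) {
    listFile.push_back({labelID, -1});        // entry sequenced: always append
    int newNode = (int)listFile.size() - 1;

    if (entry.head == -1 || labelID < listFile[entry.head].labelID) {
        listFile[newNode].next = entry.head;  // new node becomes the head
        entry.head = newNode;
        return;
    }
    int p = entry.head;                       // otherwise find its place in the list
    while (listFile[p].next != -1 && listFile[listFile[p].next].labelID < labelID)
        p = listFile[p].next;
    listFile[newNode].next = listFile[p].next;
    listFile[p].next = newNode;
}

int main() {
    std::vector<ListNode> listFile = {{"LON2312", -1}};      // one Prokofiev recording
    SecondaryEntry prokofiev = {"PROKOFIEV", 0};

    AddReference(listFile, prokofiev, "ANG36193");           // a new recording is added
    for (int p = prokofiev.head; p != -1; p = listFile[p].next)
        std::printf("%s\n", listFile[p].labelID.c_str());    // ANG36193, then LON2312
    return 0;
}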
Associating the Secondary Index file with a new file containing linked
lists of references provides some advantages over any of the structures
considered up to this point:
■ The only time we need to rearrange the Secondary Index file is when a
new composer’s name is added or an existing composer’s name is
changed (for example, it was misspelled on input). Deleting or adding
recordings for a composer who is already in the index involves changing
only the Label ID List file. Deleting all the recordings for a composer
could be handled by modifying the Label ID List file while leaving the
entry in the Secondary Index file in place, using a value of -1 in its
reference field to indicate that the list of entries for this composer is empty.
■ In the event that we need to rearrange the Secondary Index file, the task is
quicker now since there are fewer records and each record is smaller.
■ Because there is less need for sorting, it follows that there is less of a
penalty associated with keeping the Secondary Index files off on
secondary storage, leaving more room in memory for other data
structures.
■ The Label ID List file is entry sequenced. That means that it never needs
to be sorted.
■ Since the Label ID List file is a fixed-length record file, it would be very
easy to implement a mechanism for reusing the space from deleted
records, as described in Chapter 6.
There is also at least one potentially significant disadvantage to this kind of
file organization: the Label IDs associated with a given composer are no longer
guaranteed to be grouped together physically. The technical term for such
“togetherness” is locality. With a linked, entry-sequenced structure such as this, it is
less likely that there will be locality associated with the logical groupings of
reference fields for a given secondary key. Note, for example, that our list of Label
IDs for Prokofiev consists of the very last and the very first records in the file. This
lack of locality means that picking up the references for a composer with a long list
of references could involve a large amount of seeking back and forth on the disk.
Note that this kind of seeking would not be required for our original Secondary
Index file structure.
One obvious antidote to this seeking problem is to keep the Label ID
List file in memory. This could be expensive and impractical, given many
secondary indexes, except for the interesting possibility of using the same
Label ID List file to hold the lists for a number of Secondary Index files.
Even if the file of reference lists were too large to hold in memory, it might
be possible to obtain a performance improvement by holding only a part
of the file in memory at a time, paging sections of the file in and out of
memory as they are needed.
Several exercises at the end of the chapter explore these possibilities
more thoroughly. These are very important problems, as the notion of dividing
the index into pages is fundamental to the design of B-trees and other methods
for handling large indexes on secondary storage.
To sum up, associating the secondary keys with reference fields consisting of primary
keys allows the primary key index to act as a kind of final check of whether a
record is really in the file. The secondary indexes can afford to be wrong. This
situation is very different if the secondary index keys contain addresses. We
would then be jumping directly from the secondary key into the data file; the
address would need to be right.
This brings up a related safety aspect: it is always more desirable to make
important changes in one place rather than in many places. With a
bind-at-retrieval-time scheme such as we developed, we need to remember to
make a change in only one place, the primary key index, if we move a data
record. With a more tightly bound system, we have to make many changes
successfully to keep the system internally consistent, braving power failures,
user interruptions, and so on.
When designing a new file system, it is better to deal with this question of
binding intentionally and early in the design process rather than letting the
binding just happen. In general, tight, in-the-data binding is most attractive
when
■ The data file is static or nearly so, requiring little or no adding, deleting, or
updating of records; and
■ Rapid performance during actual retrieval is a high priority.
For example, tight binding is desirable for file organization on a mass-
produced, read-only optical disk. The addresses will never change because no
new records can ever be added; consequently, there is no reason not to obtain
the extra performance associated with tight binding.
For file applications in which record addition, deletion, and updating do
occur, however, binding at retrieval time is usually the more desirable option.
Postponing binding as long as possible usually makes these operations simpler
and safer. If the file structures are carefully designed, and, in particular, if the
indexes use more sophisticated organizations such as B-trees, retrieval
performance is usually quite acceptable, even given the additional work
required by a bind-at-retrieval system.