13.2 File Organization and Access Notes
Objective:
Show understanding of methods of file organization and select an appropriate method of
file organization and file access for a given problem, including serial, sequential (using a key
field) and random (using a record key).
Show understanding of methods of file access, including sequential access for serial and
sequential files, and direct access for sequential and random files.
Show understanding of hashing algorithms. Describe and use different hashing algorithms
to read data from and write data to a random/sequential file.
In everyday computer usage, a wide variety of file types is encountered. Examples are
graphic files, word-processing files, spreadsheet files and so on. Whatever the file type,
the content is stored using a specific binary code that allows the file to be used as intended.
There are only two defined file types: text files and binary files.
A text file contains data stored according to a defined character code. It is possible, by
using a text editor, to create a text file to be used as input to a program.
Text files are stored as a sequence of characters. Each character is represented by
one byte, which is a group of 8 bits. The most common character encoding used for text
files is ASCII, which assigns a unique number to each character. Because the content is
character data, text files can easily be read and written by humans.
Organization of Text File:
For a text file, the number of data items per line and the number of characters per
item must be known. If these are not known, item separator characters must be
used. The file has repeating lines, each of which is terminated by an end-of-line
character. Examples are .txt and .csv files.
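As a small illustration of item separators and end-of-line characters, the sketch below writes and reads comma-separated lines using Python's standard csv module. The customer data, the comma separator and the function names are assumptions made for the example only.

```python
import csv
import io

# Hypothetical data: one record per line, items separated by commas.
rows = [["1001", "Ali", "230"], ["1002", "Sara", "187"]]

def write_text(rows):
    """Return text-file content: one line per row, comma-separated items."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # "\n" is the end-of-line character
    writer.writerows(rows)
    return buffer.getvalue()

def read_text(text):
    """Split the content back into lines and items using the separators."""
    return [row for row in csv.reader(io.StringIO(text))]

content = write_text(rows)
print(content)
print(read_text(content))
```

Because the separators mark where each item ends, the items do not need a fixed number of characters.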
Binary File:
Binary files are stored as a sequence of bytes. The bytes can represent any data,
including text, numbers, images, and sounds. Binary files are not human-readable,
so they can only be opened by programs that understand the binary format. A binary
file is designed for storing data to be used by a computer program.
A binary file stores data in its internal representation; for example, an integer value
might be stored in two bytes in two's complement representation.
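The two-byte example can be checked with Python's built-in integer conversion methods. This is only a sketch: the value -2 and the big-endian byte order are assumptions chosen for illustration.

```python
value = -2  # assumed example value

# Store the integer in two bytes using two's complement (signed=True).
stored = value.to_bytes(2, byteorder="big", signed=True)
print(stored.hex())  # fffe

# Read the two bytes back into an integer.
restored = int.from_bytes(stored, byteorder="big", signed=True)
print(restored)  # -2
```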
Organization of Binary File:
The organization of a binary file is based on the concept of a
record. A file contains records and each record contains fields. Each field consists of
a value. For a binary file, the number of fields per record must be known. If any of the
fields represents a string, the length of the string must be known. For any other field, the
internal representation defines the number of bytes required to store the field
value. There is no need for field separator characters or for an end-of-record
character. Examples are .exe and .jpg files.
O-A Level Computer Science By Engr M Kashif 03345606716 paperscambridge.com
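A fixed-length record of this kind can be sketched with Python's struct module. The record layout below (a 2-byte integer customer number, a 10-byte name and a 4-byte real reading) is an assumed layout for illustration, not one given in these notes.

```python
import struct

# Assumed layout: "<" little-endian with no padding, "h" 2-byte signed integer,
# "10s" string of exactly 10 bytes, "f" 4-byte real number.
RECORD_FORMAT = "<h10sf"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 2 + 10 + 4 bytes

def pack_record(number, name, reading):
    """Convert the field values into one fixed-length record."""
    # Names shorter than 10 bytes are padded with zero bytes by struct.pack.
    return struct.pack(RECORD_FORMAT, number, name.encode("ascii"), reading)

def unpack_record(data):
    """Convert one fixed-length record back into its field values."""
    number, raw_name, reading = struct.unpack(RECORD_FORMAT, data)
    return number, raw_name.rstrip(b"\x00").decode("ascii"), reading

record = pack_record(5, "Ali", 230.5)
print(RECORD_SIZE, len(record))
print(unpack_record(record))
```

Every record has the same length, so no field separators or end-of-record characters are needed to find where one record stops and the next begins.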
Serial files
The serial file organization method physically stores records of data in a file, one after
another, in the order they were added to the file. New records are appended to the
end of the file.
Serial file organization is often used for temporary files storing transactions to be
made to more permanent files. For example, storing customer meter readings for gas
or electricity before they are used to send the bills to all customers. As each
transaction is added to the file in the order of arrival, these records will be in
chronological order, that is, arranged according to the time they occurred.
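Appending to a serial file can be sketched as follows. The file name, the "customer,reading" line format and the sample readings are assumptions for illustration; a temporary folder is used so the example leaves no files behind in the working directory.

```python
import os
import tempfile

def append_record(path, record):
    """Add a record to the end of the serial file (mode "a" means append)."""
    with open(path, "a", encoding="ascii") as f:
        f.write(record + "\n")

def read_all(path):
    """Read every record, in the order in which it was stored."""
    with open(path, "r", encoding="ascii") as f:
        return [line.rstrip("\n") for line in f]

path = os.path.join(tempfile.mkdtemp(), "readings.txt")  # hypothetical file
for reading in ["C7,230", "C2,187", "C5,301"]:  # assumed meter readings
    append_record(path, reading)

stored = read_all(path)
print(stored)
```

The records come back in arrival order, not in customer number order.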
Sequential File Organization
The sequential file organization method physically stores records of data in a file, one after
another, in a given order. The order is usually based on the key field of the records, as this
is a unique identifier.
Example: A file could be used by a supplier to store customer records for gas or
electricity in order to send regular bills to each customer. All records are stored in
ascending customer number order, where customer number is the key field that
uniquely identifies each record.
New records must be added to the file in the correct place; for example, if Customer 5
is added to the file, the structure becomes:
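As a rough sketch of that insertion, assuming a hypothetical file that already holds customers 1, 2, 3, 4 and 7, the records before the insertion point are copied, the new record is written, and the remaining records follow. The records are held in a list here to keep the example short.

```python
def insert_in_order(records, new_record):
    """Return a new list with new_record placed in ascending key order.

    Each record is a (key, data) tuple; records is already sorted by key.
    """
    result = []
    inserted = False
    for record in records:
        # Write the new record just before the first record with a larger key.
        if not inserted and new_record[0] < record[0]:
            result.append(new_record)
            inserted = True
        result.append(record)
    if not inserted:  # the new key is larger than every existing key
        result.append(new_record)
    return result

# Assumed customer numbers and names, for illustration only.
customers = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (7, "E")]
updated = insert_in_order(customers, (5, "F"))
print([key for key, _ in updated])
```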
Random File Organization
The random file organization method physically stores records of data in a file in any
available position. The location of any record in the file is found by using a hashing
algorithm on the key field of the record. Records can be added at any empty position.
File Access Methods
File access methods are used to physically find a record in the file.
Sequential access: The sequential access method searches for records one after
another from the physical start of the file until the required record is found, or
until the place where a new record can be added is reached. This method is used
for serial and sequential files.
ESQ: How is the sequential access method used in a serial file to find a required record?
Ans: For a serial file, if a particular record is being searched for, every record needs to be
checked until that record is found or the whole file has been searched and that record has not
been found. Any new records are appended to the end of the file.
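The search described in this answer can be sketched as a simple loop. The records are (key, data) tuples held in a list, and the keys used are assumed values for illustration.

```python
def serial_search(records, wanted_key):
    """Check every record in stored order.

    Returns (record, number_of_records_checked); record is None if not found.
    """
    checked = 0
    for record in records:
        checked += 1
        if record[0] == wanted_key:
            return record, checked
    return None, checked  # the whole file was searched without success

# Assumed serial file: records are in arrival order, not key order.
serial_file = [(7, "E"), (2, "B"), (5, "F")]
print(serial_search(serial_file, 5))
print(serial_search(serial_file, 6))
```

For a key that is not stored, every record in the file has to be checked before the search can stop.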
ESQ: How is the sequential access method used in a sequential file to find a required record?
Ans: For a sequential file, if a particular record is being searched for, every record needs to
be checked until the record is found or the key field of the current record being checked is
greater than the key field of the record being searched for. The rest of the file does not need
to be searched, as the records are sorted on ascending key field values. Any new records to be
stored are inserted in the correct place in the file.
For example, if the record for Customer 6 was requested, each record would be read from the
file until Customer 7 was reached. It would then be known that the record for Customer 6
was not stored in the file.
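The early stop in the Customer 6 example can be sketched as below. The list of customer numbers is an assumption for illustration; the only requirement is that the records are sorted on ascending key.

```python
def sequential_search(records, wanted_key):
    """Search a file sorted on ascending key, stopping as early as possible.

    Returns (record, number_of_records_checked); record is None if not found.
    """
    checked = 0
    for record in records:
        checked += 1
        if record[0] == wanted_key:
            return record, checked
        if record[0] > wanted_key:
            break  # every later key is larger still, so stop searching
    return None, checked

# Assumed sequential file of (customer number, name) records.
sequential_file = [(1, "A"), (2, "B"), (3, "C"), (4, "D"),
                   (5, "F"), (7, "E"), (8, "G"), (9, "H")]
print(sequential_search(sequential_file, 6))
```

The search for Customer 6 stops at Customer 7 after six records; Customers 8 and 9 are never read.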
Sequential access is efficient when every record in the file needs to be processed, for
example, a monthly billing or payroll system. These files have a high hit rate during the
processing as nearly every record is used when the program is run.
Direct access
The direct access method can physically find a record in a file without
other records being physically read. Both sequential and random files can use direct
access. Direct access allows specific records to be found more quickly than
sequential access. It is required when an individual record from a file needs
to be processed.
For example, a single customer record needs to be updated when the customer's
phone number changes. Here, the file being processed has a low hit rate, as only one
of the records in the file is used.
For a sequential file, an index of all the key fields is kept and used to look up the
address of the file location where a given record is stored. For large files, searching the
index takes less time than searching the whole file.
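One way to picture the index is as a sorted list of keys with a matching list of file addresses. The sketch below is an assumed arrangement, not a prescribed one: it uses an in-memory file (io.BytesIO), a made-up 12-byte record layout, and a binary search of the index from Python's bisect module.

```python
import bisect
import io
import struct

RECORD_FORMAT = "<h10s"  # assumed layout: 2-byte key, 10-byte name
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def build_file(customers):
    """Write records in ascending key order; return (file, keys, offsets)."""
    f = io.BytesIO()
    keys, offsets = [], []
    for key, name in sorted(customers):
        keys.append(key)
        offsets.append(f.tell())  # address where this record starts
        f.write(struct.pack(RECORD_FORMAT, key, name.encode("ascii")))
    return f, keys, offsets

def direct_read(f, keys, offsets, wanted_key):
    """Look the key up in the index, then read only that one record."""
    pos = bisect.bisect_left(keys, wanted_key)  # binary search of the index
    if pos == len(keys) or keys[pos] != wanted_key:
        return None  # key is not in the index
    f.seek(offsets[pos])  # jump straight to the record
    key, raw_name = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
    return key, raw_name.rstrip(b"\x00").decode("ascii")

# Assumed customer records, for illustration only.
f, keys, offsets = build_file([(7, "Gul"), (2, "Ali"), (5, "Sara")])
print(direct_read(f, keys, offsets, 5))
print(direct_read(f, keys, offsets, 6))
```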
For a random access file, a hashing algorithm is used on the key field to calculate
the address of the file location where a given record is stored.
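A minimal sketch of this idea is shown below, assuming fixed-length records so that the calculated address can be turned into a byte position by multiplying by the record size. The record layout, the number of record positions (11) and the sample keys are assumptions, and collisions are deliberately ignored.

```python
import io
import struct

RECORD_FORMAT = "<h10s"  # assumed layout: 2-byte key, 10-byte name
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
SLOTS = 11  # assumed number of record positions in the file

def address_of(key):
    """Hashing algorithm: remainder after dividing the key by SLOTS."""
    return key % SLOTS

def write_record(f, key, name):
    f.seek(address_of(key) * RECORD_SIZE)  # jump straight to the position
    f.write(struct.pack(RECORD_FORMAT, key, name.encode("ascii")))

def read_record(f, key):
    f.seek(address_of(key) * RECORD_SIZE)  # no other records are read
    stored_key, raw_name = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
    return stored_key, raw_name.rstrip(b"\x00").decode("ascii")

f = io.BytesIO(bytes(SLOTS * RECORD_SIZE))  # in-memory file of empty positions
write_record(f, 45, "Ali")     # 45 mod 11 = 1
write_record(f, 2005, "Sara")  # 2005 mod 11 = 3
print(read_record(f, 2005))
```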
Hashing Algorithms
To store and access data in a file, a hashing algorithm is used to
perform a calculation on the key field of a record. The result of the calculation gives the
address where the record should be found.
Hashing algorithm for numeric key field:
The algorithm chooses a suitable number and divides
the value in the key field by this number. The remainder from this division then identifies
the address in the file for storage of that record. The suitable number works best if it is a
prime number of a similar size to the expected size of the file.
For example: 4-digit values in the key field, where 1000 is used as the dividing number.
The following are three calculations:
➢ 0045/1000 gives remainder 45 for the address in the file
➢ 2005/1000 gives remainder 5 for the address in the file
➢ 3005/1000 gives remainder 5 for the address in the file
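The three calculations above can be reproduced with the remainder operator. The function name is an assumption; the divisor 1000 and the keys are taken from the example.

```python
def hash_address(key, divisor=1000):
    """Return the remainder after dividing the key by the divisor."""
    return key % divisor

for key in (45, 2005, 3005):  # 0045 is the integer 45
    print(key, "->", hash_address(key))
```

Keys 2005 and 3005 both give address 5, so they cannot both be stored at their calculated address.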
Drawbacks of the above hashing algorithm:
➢ The addresses calculated do not follow any order based on the value in the key field.
➢ Different key field values can produce the same remainder and therefore the same
address in the file.
For a non-numeric key field, the ASCII code for each character can be looked up and the
values then added. The sum is then used in the same way as described above, to calculate an
address as the remainder from an integer division.
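A short sketch of this calculation follows. The keys "CAB" and "ABC" and the divisors are assumed values chosen for illustration.

```python
def hash_text_key(key, divisor=1000):
    """Add the ASCII codes of the characters, then take the remainder."""
    total = sum(ord(ch) for ch in key)  # ord() gives the character code
    return total % divisor

print(hash_text_key("CAB"))      # 67 + 65 + 66 = 198
print(hash_text_key("ABC"))      # same characters, same sum
print(hash_text_key("CAB", 97))  # 198 mod 97
```

Keys made of the same characters in a different order always add up to the same sum, so they produce the same address.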
Address Collision:
When the same address is calculated for different key field values, it is usually
referred to as a collision. The best choice for a hashing algorithm is one that spreads the
addresses most evenly and minimises the number of collisions.
How to Deal with Collisions:
• Use a sequential search to look for a vacant address following the calculated one.
• Keep a number of overflow addresses at the end of the file.
• Have a linked list accessible from each address.
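The first of these methods, searching for the next vacant address, can be sketched as below. The file is modelled as a list of 10 positions, and wrapping round to the start of the file when the end is reached is an assumption of this sketch.

```python
def insert(table, key, data):
    """Store (key, data) at the hashed address or the next vacant one."""
    address = key % len(table)
    for step in range(len(table)):
        slot = (address + step) % len(table)  # wrap round at the end (assumed)
        if table[slot] is None:
            table[slot] = (key, data)
            return slot
    raise RuntimeError("file is full")

def find(table, key):
    """Search from the hashed address until the key or a vacant slot is met."""
    address = key % len(table)
    for step in range(len(table)):
        slot = (address + step) % len(table)
        if table[slot] is None:
            return None  # a vacant slot means the key was never stored
        if table[slot][0] == key:
            return table[slot]
    return None

table = [None] * 10  # assumed file with 10 record positions
print(insert(table, 2005, "A"))  # address 5 is free
print(insert(table, 3005, "B"))  # collision at 5, next vacant is 6
print(insert(table, 45, "C"))    # collision at 5 and 6, next vacant is 7
print(find(table, 3005))
```

This simple version assumes records are never deleted; a deleted record would leave a vacant slot that stops the search too early.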
Exam Style Questions
ESQ#1 A binary file is to be used to store data for a program.
a) State the terms used to describe the components of such a file. [2]
Ans: Record, field.
b) Explain the difference between a binary file and a text file. [3]
Ans: A text file contains character data formatted into lines; there are
end-of-line and end-of-file characters. A binary file holds data in its internal
representation and contains records with a defined format. There is no need for
field separator characters or for an end-of-record character.
ESQ#2 A binary file might be organized for serial, sequential or direct access.
a) Explain the difference between the three types of file organisation. [4]
Ans: A serial file has no defined order; searching a serial file requires reading
complete records until the data is found. A sequential file has a defined order.
A direct-access file has a position for each record which is computed using an
algorithm. A sequential or direct-access file has a key field that is used when
searching for data.
b) Give an example of file use for which a serial file organisation would be suitable.
Justify your choice. [3]
Ans: A serial file is typically used to store data temporarily as it becomes
available, with the intention of processing every single record at some future
time. Examples are: any commercial or business transaction file, or a file recording
the ongoing progress of a sporting contest to be used later to create statistics.
c) Give an example of file use when direct access would be advantageous. Justify your
choice.
Ans: A typical use is for long-term storage of data when the contents will be
continuously changing and individual data items will need to be looked up.
***********