Chapter 5
File Systems
Files
⮚ Processes (threads), address spaces, and files are the most
important concepts in an OS
⮚ Files are logical units of information created by processes
– Similar to a kind of address space
⮚ A file is a collection of correlated information which is
recorded on secondary or non-volatile storage like
magnetic disks, optical disks, and tapes.
Files contd..
⮚ A file serves as a medium through which a program receives its
input and delivers its output.
⮚ A file is a sequence of bits, bytes, or records whose meaning is
defined by the file's creator and user.
⮚ Every file has a logical location used for its storage and
retrieval.
File system
⮚ File system is the part of the operating system which is
responsible for file management.
⮚ It provides a mechanism to store data and to access file
contents, including both data and programs.
⮚ Some operating systems, for example Ubuntu (Linux), treat
almost everything as a file.
⮚ It manages files: how they are structured, named, accessed,
used, protected, implemented, etc.
File naming
• Files are an abstraction mechanism
⮚ To store information on the disk and read it back
⮚ When a process creates a file, it gives the file a name;
and the file can be accessed by the name
• Two-part file name
⮚ File extension: indicating characteristics of file
⮚ In UNIX, file extensions are just a convention; the C compiler is
an exception
⮚ In Windows, file extensions specify which program
“owns” that extension; double-clicking a file launches the program
assigned to it
File Naming
Figure 4-1. Some typical file extensions.
File Structure
Figure 4-2. Three kinds of files. (a) Byte sequence.
(b) Record sequence. (c) Tree.
File Type
File type refers to the ability of the operating system to distinguish
various types of files, such as text, binary, and source files.
Operating systems like MS-DOS and UNIX have the following types of
files:
Regular Files:
– ASCII files or binary files
– ASCII files consist of lines of text; they can be displayed and printed
Character Special File
It is a device file through which data is read or written character by
character; it is used for devices such as mice and printers.
File Type
Ordinary files
These files store user information.
They may contain text, executable programs, or databases.
The user can perform operations on them such as add, delete, and modify.
Directory Files
A directory contains files and related information about those
files. It is basically a folder that holds and organizes multiple files.
Special Files
These files are also called device files. They represent physical devices
such as printers, disks, networks, and flash drives.
File Access
• File descriptor
– A file descriptor is a small integer representing a kernel-
managed object that a process may read from or write to
– Every process has a private space of file descriptors
starting at 0
– By convention, 0 is standard input, 1 is standard output,
and 2 is standard error.
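The descriptor idea can be tried out with a minimal sketch like the one below. It assumes a UNIX-like system and Python's os module; the scratch file name demo.txt is made up for illustration.

```python
import os
import tempfile

# Create a scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# os.open returns a raw file descriptor (a small integer), not a file object.
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
os.write(fd, b"hello")          # write through the descriptor
os.lseek(fd, 0, os.SEEK_SET)    # move the file position back to the start
data = os.read(fd, 5)           # read the same bytes back
os.close(fd)

# Descriptors 0, 1, and 2 are normally already taken by stdin, stdout, and
# stderr, so a newly opened file usually gets descriptor 3 or higher.
print(fd, data)
```

The exact number printed for fd depends on which descriptors the process already has open.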
File Access Methods
The way files are accessed and read into memory is
determined by the access method.
Most systems support a single access method, while
some operating systems support several.
1. Sequential Access
•Data is accessed one record after another, in order.
•A read command moves the file pointer ahead by one record.
•A write command allocates space for the record and moves the
pointer to the new end of file.
•This method is a natural fit for tape.
File Access Methods contd..
2. Direct Access
⮚This method is useful for disks.
⮚The file is viewed as a numbered sequence of blocks or
records.
⮚There are no restrictions on which blocks are read or written; they
can be accessed in any order.
⮚The user now says "read n" rather than "read next".
⮚"n" is a number relative to the beginning of the file, not
an absolute physical disk location.
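The difference between "read next" and "read n" can be sketched as follows. This is only an illustration: the block size of 4 bytes and the in-memory file are assumptions chosen to keep the example small.

```python
import io

BLOCK_SIZE = 4  # hypothetical, tiny so the result is easy to follow

# An in-memory "file" made of four 4-byte blocks.
f = io.BytesIO(b"AAAABBBBCCCCDDDD")

def read_next(f):
    """Sequential access: read at the current pointer, which then advances."""
    return f.read(BLOCK_SIZE)

def read_block(f, n):
    """Direct access: jump to block n, counted from the start of the file."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

first = read_next(f)         # block 0
second = read_next(f)        # block 1
last = read_block(f, 3)      # jump straight to block 3
again = read_block(f, 0)     # and back to block 0, in any order
```

Note that n is multiplied by the block size to get an offset inside the file; the mapping to a physical disk location is left to the file system.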
File Access Methods contd..
3. Indexed Sequential Access
It is built on top of sequential access.
It uses an index to position the pointer when accessing the file.
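One possible reading of this method is sketched below: records stay in sorted sequential order, and a small index of selected keys tells the reader where to start scanning. The record values, the index spacing STEP, and the function name lookup are all assumptions made for illustration.

```python
import bisect

# Records sorted by key, as they would sit in a sequential file.
records = [(10, "ann"), (20, "bob"), (30, "cy"),
           (40, "dee"), (50, "eve"), (60, "fay")]

STEP = 3  # hypothetical: index every third record
index_pos = list(range(0, len(records), STEP))       # positions 0 and 3
index_keys = [records[p][0] for p in index_pos]      # keys 10 and 40

def lookup(key):
    # Find the last index entry whose key is <= the wanted key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None
    # Scan sequentially from the indexed position.
    for k, value in records[index_pos[slot]:]:
        if k == key:
            return value
        if k > key:
            break
    return None
```

The index narrows the search to a short run of records, and the rest of the lookup is ordinary sequential reading.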
File Attributes
A file has a name and data. In addition, the system stores meta
information about it, such as creation date and time, current size, and
last modified date.
All this information is called the attributes of the file.
File Attributes contd..
Here are some important file attributes used in an OS:
Name: The only information kept in human-readable form.
Identifier: A unique tag (number) that identifies the file within the file
system.
Location: Points to the file's location on the device.
Type: Required for systems that support various
types of files.
Size: The current size of the file.
Protection: Controls the access rights for
reading, writing, and executing the file.
Time, date and security: Used for protection, security, and
usage monitoring.
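Several of these attributes can be inspected with a short sketch like the one below, assuming a UNIX-like system and Python's os.stat. The file name notes.txt is made up, and the mapping from each attribute to a stat field is an illustration rather than a complete list.

```python
import os
import stat
import tempfile
import time

# Create a small scratch file to inspect.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as f:
    f.write("12345")

info = os.stat(path)
print("name      :", os.path.basename(path))
print("identifier:", info.st_ino)                     # inode number on UNIX
print("type      :", "regular file" if stat.S_ISREG(info.st_mode) else "other")
print("size      :", info.st_size, "bytes")
print("protection:", stat.filemode(info.st_mode))     # e.g. -rw-r--r--
print("modified  :", time.ctime(info.st_mtime))
```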
File Attributes
Figure 4-4a. Some possible file attributes.
File Operations
The most common system calls relating to files:
• Create • Append
• Delete • Seek
• Open • Get Attributes
• Close • Set Attributes
• Read • Rename
• Write
What is a directory?
⮚ A directory can be defined as a listing of the related files on the
disk. The directory may store some or all of the file attributes.
⮚ To use different file systems for different
operating systems, a hard disk can be divided into a number of
partitions of different sizes.
⮚ The partitions are also called volumes or minidisks.
⮚ Each partition must have at least one directory in which all the
files of the partition can be listed.
⮚ A directory entry is maintained for each file in the directory; it
stores all the information related to that file.
What is a directory?
⮚ A directory can be viewed as a file that contains the metadata
of a group of files.
Structures of Directory
A directory is a container that holds files and other directories (folders). It
organizes them in a hierarchical manner.
Single-level directory
• Simplest directory structure.
• All files are contained in the same directory, which makes it easy to
support and understand.
• Limitations arise when the number of files increases or when
the system has more than one user.
Two-level directory
⮚ A single-level directory often leads to confusion of file names
among different users.
⮚ The solution to this problem is to create a separate directory
for each user.
⮚ In the two-level directory structure, each user has their
own user file directory (UFD).
Tree Structured directory
⮚ Once we have seen a two-level directory as a tree of height 2,
the natural generalization is to extend the directory structure to
a tree of arbitrary height.
⮚ This generalization allows users to create their own
subdirectories and to organize their files accordingly.
Directory Operations
System calls for managing directories:
• Create • Readdir
• Delete • Rename
• Opendir • Link
• Closedir • Unlink
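The calls above map roughly onto Python's os module, as in the sketch below. It assumes a UNIX-like file system that supports hard links; all file and directory names are made up.

```python
import os
import tempfile

root = tempfile.mkdtemp()
docs = os.path.join(root, "docs")
os.mkdir(docs)                                  # Create

a = os.path.join(docs, "a.txt")
with open(a, "w") as f:
    f.write("x")

b = os.path.join(docs, "b.txt")
os.rename(a, b)                                 # Rename
alias = os.path.join(docs, "alias.txt")
os.link(b, alias)                               # Link: second name, same file

with os.scandir(docs) as entries:               # Opendir / Readdir / Closedir
    names = sorted(e.name for e in entries)

links_before = os.stat(b).st_nlink              # two names point at the file
os.unlink(alias)                                # Unlink removes one name
links_after = os.stat(b).st_nlink

os.unlink(b)
os.rmdir(docs)                                  # Delete (directory must be empty)
```

Unlinking a name only removes the file's data once the last link to it is gone, which is why the link count is checked before and after.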
Permissions on the file and directory
The permissions are R, W, and X, for reading, writing, and executing a
file or directory. They are assigned to three classes of users: owner, group, and
others.
User classes: File Owner, Group Owner, Everyone Else
Permission types: Read Permission, Write Permission, Execute Permission
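On UNIX-like systems these nine permissions are commonly stored as nine bits, three per user class. The sketch below decodes such a mode value; the helper name describe and the sample mode 0o754 are assumptions for illustration.

```python
import stat

def describe(mode):
    """Split the low 9 permission bits into owner, group, and others."""
    result = {}
    for i, who in enumerate(("owner", "group", "others")):
        bits = (mode >> (6 - 3 * i)) & 0b111     # 3 bits per user class
        result[who] = "".join(
            letter if bits & mask else "-"
            for letter, mask in (("r", 4), ("w", 2), ("x", 1))
        )
    return result

perms = describe(0o754)
# The standard library can render the same bits for a regular file.
text = stat.filemode(stat.S_IFREG | 0o754)
```

Here the owner may read, write, and execute, the group may read and execute, and everyone else may only read.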
File System Implementation
• Users:
– How files are named, what operations are allowed on them,
what the directory tree looks like
• Implementors:
– How files and directories are stored, how disk space is
managed, and how to make everything work efficiently and
reliably
Master Boot Record (MBR)
⮚ The master boot record is the information stored in the first sector of a
hard disk. It records how and where the
operating system is located on the disk so that it can be loaded into
RAM.
⮚ The MBR is sometimes called the master partition table because it includes a
partition table that locates every partition on the hard disk.
⮚ The MBR also includes a program that reads the
boot sector of the partition that contains the operating system.
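The partition table inside the MBR can be decoded with a sketch like the one below. The offsets follow the classic PC MBR layout (table at byte 446, four 16-byte entries, signature 0x55 0xAA at the end), which the slides do not spell out; the sector itself is built in memory with made-up values, so no real disk is read.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446          # partition table follows the boot code
ENTRY_SIZE = 16             # four entries of 16 bytes each
SIGNATURE = b"\x55\xaa"     # last two bytes of a valid MBR
ENTRY_FORMAT = "<B3xB3xII"  # status, (CHS skipped), type, (CHS skipped), first LBA, count

def parse_mbr(sector):
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        status, ptype, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if ptype != 0:      # type 0 marks an unused entry
            partitions.append({"bootable": status == 0x80, "type": ptype,
                               "first_sector": first_lba, "sectors": count})
    return partitions

# Fake sector with one bootable partition (values are made up).
fake = bytearray(SECTOR_SIZE)
fake[446:462] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 2048, 409600)
fake[510:512] = SIGNATURE
parts = parse_mbr(bytes(fake))
```

The boot program mentioned above would use the entry marked bootable to find the partition whose boot sector it loads next.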
File System Layout
⮚ Because main memory is volatile, it holds no operating system code when
the computer is turned on, so the CPU cannot start by running the OS from
main memory. Instead, the CPU first runs a special
program called the BIOS, which is stored in ROM.
⮚ The BIOS contains code that makes the CPU read the very
first sector of the hard disk, the MBR. The MBR contains a partition table for all
the partitions of the hard disk.
⮚ Since the MBR records where the operating system
is stored and also contains a program that reads the boot
sector of that partition, the CPU uses this information
to load the operating system into main memory.
File System Layout
Superblock: contains all the key parameters about a file
system; it is read into memory when the computer is booted or the FS is first used
Figure 4-9. A possible file system layout.
Directory Implementation
⮚ Directories can be implemented with a number of algorithms,
and the choice of
directory implementation algorithm may significantly affect the
performance of the system.
⮚ The directory implementation algorithms are classified according
to the data structure they use. Two
algorithms are mainly used today.
1. Linear List
⮚ In this algorithm, all the files in a directory are maintained as a
singly linked list. Each entry contains pointers to the data
blocks assigned to the file and a pointer to the next file in the directory.
⮚ When a new file is created, the entire list is checked to see whether
the new file name matches an existing file name.
⮚ If it does not, the file can be added at the beginning or
at the end of the list. Checking that a name is unique is therefore a major
cost, because traversing the whole list takes time.
1. Linear List contd..
⮚ The list must be traversed for every operation
(creation, deletion, updating, etc.) on the files, so the
system becomes inefficient.
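A minimal sketch of the linear list idea follows. The class names Entry and LinearDirectory and the file names are hypothetical; the point is that both lookup and creation walk the list.

```python
class Entry:
    def __init__(self, name, blocks, next_entry=None):
        self.name = name
        self.blocks = blocks        # pointers to the file's data blocks
        self.next = next_entry      # next file in the directory

class LinearDirectory:
    def __init__(self):
        self.head = None

    def find(self, name):
        """Walk the list from the head; cost grows with the number of files."""
        node, steps = self.head, 0
        while node is not None:
            steps += 1
            if node.name == name:
                return node, steps
            node = node.next
        return None, steps

    def create(self, name, blocks):
        existing, _ = self.find(name)   # full scan to guarantee a unique name
        if existing is not None:
            raise FileExistsError(name)
        self.head = Entry(name, blocks, self.head)  # insert at the beginning

d = LinearDirectory()
for i in range(5):
    d.create(f"file{i}", [i])
entry, steps = d.find("file0")   # file0 was created first, so it is last in the list
```

With five files, finding the oldest one takes five steps, which is the linear cost the slide warns about.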
2. Hash Table
⮚ To overcome the drawbacks of the singly linked list implementation
of directories, an alternative approach uses a hash table.
This approach uses a hash table along with the linked
lists.
⮚ A key-value pair is generated for each file in the directory and
stored in the hash table. The key is determined by applying
the hash function to the file name, while the value points to the
corresponding file stored in the directory.
2. Hash Table contd..
⮚ Searching now becomes efficient because the entire list is no longer
searched on every operation. Only the hash table entries are checked using the key,
and if an entry is found, the corresponding file is fetched using the value.
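The hash table approach might look like the sketch below. The class name HashedDirectory, the bucket count, and the toy hash function are assumptions; a real file system would use a stronger hash.

```python
class HashedDirectory:
    def __init__(self, buckets=8):
        # Each bucket holds a short chain (list) of (name, blocks) entries.
        self.buckets = [[] for _ in range(buckets)]

    def _slot(self, name):
        # Toy hash: sum of the name's byte values, so results are reproducible.
        return sum(name.encode()) % len(self.buckets)

    def create(self, name, blocks):
        chain = self.buckets[self._slot(name)]
        if any(n == name for n, _ in chain):   # only one chain is checked
            raise FileExistsError(name)
        chain.append((name, blocks))

    def find(self, name):
        for n, blocks in self.buckets[self._slot(name)]:
            if n == name:
                return blocks
        return None

hd = HashedDirectory()
hd.create("a.txt", [3, 4])
hd.create("b.txt", [9])
found = hd.find("b.txt")
missing = hd.find("c.txt")
```

Only the entries that hash to the same bucket are compared, instead of every entry in the directory.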
Contiguous Allocation
⮚ If blocks are allocated to a file in such a way that all its logical blocks
occupy contiguous physical blocks on the hard disk, the allocation
scheme is known as contiguous allocation.
⮚ In the image shown below, there are three files in the directory. The starting block
and the length of each file are given in the table. The table shows
that contiguous blocks are assigned to each file according to its needs.
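With a start block and a length per file, address translation is a single addition, as the sketch below shows. The three files and their block numbers are made up and do not come from the figure.

```python
# Directory table: file name -> (starting block, length in blocks). Made-up values.
directory = {"a.txt": (0, 3), "b.txt": (6, 2), "c.txt": (10, 4)}

def physical_block(name, logical_block):
    """Translate a logical block number into a physical disk block."""
    start, length = directory[name]
    if not 0 <= logical_block < length:
        raise IndexError("block outside the file")
    return start + logical_block

def blocks_of(name):
    """All physical blocks of the file, which form one contiguous run."""
    start, length = directory[name]
    return list(range(start, start + length))

third_of_c = physical_block("c.txt", 2)
run_of_b = blocks_of("b.txt")
```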
Linked List Allocation
⮚ Each file is treated as a linked list of disk blocks.
⮚ The disk blocks allocated to a particular file need not be contiguous
on the disk.
⮚ Each disk block allocated to a file contains a pointer which points to the next disk
block allocated to the same file.
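The following sketch models a disk as a dictionary in which each block stores its data and the number of the next block. The block numbers and contents are made up; the point is that reaching logical block n means visiting every block before it.

```python
# disk[block] = (data, next_block); None marks the last block of the file.
disk = {
    4: ("part-1", 7),
    7: ("part-2", 2),
    2: ("part-3", 10),
    10: ("part-4", None),
}

def read_block(first_block, n):
    """Return (data, blocks visited) for logical block n of the file."""
    block, visited = first_block, 1
    for _ in range(n):
        block = disk[block][1]          # follow the pointer stored in the block
        if block is None:
            raise IndexError("file is shorter than that")
        visited += 1
    return disk[block][0], visited

data, visited = read_block(4, 3)
```

Reading the fourth block of the file touches four disk blocks, even though they are scattered over the disk.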
File Allocation Table
⮚ The main disadvantage of linked list allocation is that
random access to a particular block is not provided. To
access a block, we must first access all the blocks before it.
⮚ The file allocation table (FAT) overcomes this drawback of linked list
allocation. In this scheme, a file allocation table is maintained
that gathers all the disk block links. The table has one entry for
each disk block and is indexed by block number.
⮚ The file allocation table needs to be cached in memory to reduce the
number of head seeks. The head then no longer needs to visit all
the preceding disk blocks to reach a given block.
File Allocation Table contd..
⮚ It simply accesses the file allocation table, reads the desired
block entry from there, and accesses that block.
⮚ This is how random access is accomplished using FAT.
⮚ It is used by MS-DOS and pre-NT Windows versions.
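A small model of the table is sketched below. The end-of-chain marker END, the table size, and the two sample files are assumptions for illustration; real FAT variants use reserved entry values instead.

```python
END = -1  # hypothetical end-of-chain marker

# fat[i] holds the number of the block that follows block i in its file.
fat = [None] * 16
fat[4], fat[7], fat[2], fat[10] = 7, 2, 10, END   # file A: 4 -> 7 -> 2 -> 10
fat[6], fat[3] = 3, END                           # file B: 6 -> 3

def chain(first_block):
    """List every block of the file by following table entries only."""
    blocks, block = [], first_block
    while block != END:
        blocks.append(block)
        block = fat[block]
    return blocks

def nth_block(first_block, n):
    """Find logical block n using the in-memory table; no data block is read."""
    return chain(first_block)[n]

blocks_a = chain(4)
third_of_a = nth_block(4, 2)
```

The chain is still followed link by link, but every step is a lookup in the cached table rather than a disk access.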
File Allocation Table contd..
Advantages
⮚Uses the whole disk block for data.
⮚A bad disk block does not cause all successive blocks to be lost.
⮚Random access is provided, although it is not very fast.
⮚Only the FAT needs to be traversed in each file operation.
Disadvantages
⮚Each disk block needs a FAT entry.
⮚The FAT may become very large, depending on the number of entries.
⮚The number of FAT entries can be reduced by increasing the block size, but this also
increases internal fragmentation.
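The trade-off between table size and block size can be worked out with simple arithmetic. In the sketch below, the 1 GiB disk and the 4-byte entry width are assumed values chosen only to make the numbers concrete.

```python
def fat_size_bytes(disk_bytes, block_bytes, entry_bytes=4):
    """One FAT entry per disk block; entry_bytes is an assumed entry width."""
    entries = disk_bytes // block_bytes
    return entries * entry_bytes

GIB = 1024 ** 3
small_blocks = fat_size_bytes(GIB, 1024)   # 1 KiB blocks
large_blocks = fat_size_bytes(GIB, 4096)   # 4 KiB blocks
```

Under these assumptions the table shrinks from 4 MiB to 1 MiB when the block size grows from 1 KiB to 4 KiB, at the cost of more wasted space in the last block of each file.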
Inode
⮚ In UNIX-based operating systems, each file is indexed by an inode.
⮚ Inodes are special disk blocks that are created when the
file system is created. The number of files or directories a file system can hold depends
on the number of inodes in the file system.
An inode includes the following information:
⮚Attributes (permissions, time stamps, ownership details, etc.) of the file
⮚A number of direct pointers, which point to the first 12 blocks of
the file.
⮚A single indirect pointer, which points to an index block. It is used if the file cannot be
indexed entirely by the direct pointers.
Inode contd..
⮚ A double indirect pointer, which points to a disk block holding
pointers to index blocks.
⮚ The double indirect pointer is used if the file is too big to be indexed entirely by the direct
pointers and the single indirect pointer.
⮚ A triple indirect pointer, which points to a disk block holding pointers.
Each of these points to a disk block that also holds
pointers, each of which points to an index block containing
the pointers to the file blocks.
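The pointer levels can be made concrete with a short calculation. The 12 direct pointers come from the text above; the 4096-byte block size and 4-byte pointer size are assumptions, and the function name level_of is hypothetical.

```python
BLOCK_SIZE = 4096                        # assumed block size in bytes
POINTER_SIZE = 4                         # assumed size of one block pointer
DIRECT = 12                              # direct pointers, as stated in the text
PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # pointers that fit in one index block

def level_of(logical_block):
    """Which inode pointer reaches this logical block of the file?"""
    if logical_block < DIRECT:
        return "direct"
    logical_block -= DIRECT
    if logical_block < PER_BLOCK:
        return "single indirect"
    logical_block -= PER_BLOCK
    if logical_block < PER_BLOCK ** 2:
        return "double indirect"
    logical_block -= PER_BLOCK ** 2
    if logical_block < PER_BLOCK ** 3:
        return "triple indirect"
    raise ValueError("file too large for this inode layout")

# Largest file this layout can address, in blocks and in bytes.
max_blocks = DIRECT + PER_BLOCK + PER_BLOCK ** 2 + PER_BLOCK ** 3
max_bytes = max_blocks * BLOCK_SIZE
```

Under these assumptions, small files need only the direct pointers, and each extra level of indirection multiplies the reachable size by 1024.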
I-nodes
An example i-node.
Free Space Management
A file system is responsible for allocating free blocks to files, so it
has to keep track of all the free blocks on the disk.
There are two main approaches to managing the free blocks on the
disk.
1. Bit Vector
⮚In this approach, the free space list is implemented as a bit map (bit vector). It
contains one bit for each disk block.
⮚If the block is free, the bit is 1; otherwise it is 0.
⮚Initially all the blocks are free, so every bit in the bit map
is 1.
⮚As space allocation proceeds, the file system allocates blocks to
files and sets the corresponding bits to 0.
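A bit vector of this kind can be sketched with a single Python integer, using the convention above (1 means free, 0 means allocated). The class name BitVector and its method names are hypothetical.

```python
class BitVector:
    def __init__(self, n_blocks):
        self.n = n_blocks
        self.bits = (1 << n_blocks) - 1   # all ones: every block starts free

    def is_free(self, block):
        return (self.bits >> block) & 1 == 1

    def allocate(self):
        """Find the lowest-numbered free block, mark it used, return its number."""
        if self.bits == 0:
            raise RuntimeError("disk full")
        lowest = self.bits & -self.bits       # isolates the lowest 1 bit
        block = lowest.bit_length() - 1       # its position is the block number
        self.bits &= ~(1 << block)            # set the bit to 0 (allocated)
        return block

    def free(self, block):
        self.bits |= 1 << block               # set the bit back to 1 (free)

bv = BitVector(8)
first = bv.allocate()
second = bv.allocate()
bv.free(first)
reused = bv.allocate()
```

Because block 0 was freed again, the next allocation reuses it before moving on to higher block numbers.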
Free Space Management
2. Linked List
⮚This is another approach to free space management. It
links together all the free blocks and keeps a pointer in
the cache that points to the first free block.
⮚All the free blocks on the disk are therefore linked together by
pointers.
⮚Whenever a block is allocated, the free block before it is linked
to the free block after it.
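The free list can be modeled as below. This sketch always allocates from the head of the list, which is the simplest case; the class name FreeList and the block numbers are made up.

```python
class FreeList:
    def __init__(self, free_blocks):
        # next_free[b] is the free block that follows b; None ends the list.
        self.next_free = {}
        self.head = None                      # pointer to the first free block
        for block in reversed(free_blocks):
            self.next_free[block] = self.head
            self.head = block

    def allocate(self):
        """Take the first free block and advance the head pointer."""
        if self.head is None:
            raise RuntimeError("disk full")
        block = self.head
        self.head = self.next_free.pop(block)
        return block

    def release(self, block):
        """Put a block back at the front of the free list."""
        self.next_free[block] = self.head
        self.head = block

fl = FreeList([5, 9, 2])
a = fl.allocate()
b = fl.allocate()
fl.release(a)
c = fl.allocate()
d = fl.allocate()
```

Unlike the bit vector, this structure needs no table of its own, since the links can live inside the free blocks themselves.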
Contiguous Allocation
Figure 4-10. (a) Contiguous allocation of disk space for 7 files.
(b) The state of the disk after files D and F have been removed.
Linked List Allocation
Figure 4-11. Storing a file as a linked list of disk blocks.
Linked List Allocation Using a Table in Memory
Figure 4-12. Linked list allocation using a file allocation table
in main memory.