File System
File
File:
Contiguous logical address space to store data / program.
Stored on the HDD as contiguous or non-contiguous blocks.
File Attributes: (Metadata of file)
Name
Inode Number
Type
Location
Size
Protection
Time, date, and user identification
File Operations
Open
Create
Close
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
int open (const char *filename, int flags [, mode_t mode])
int creat (const char *filename, mode_t mode)
int close (int filedes)
flags: bit vector of:
• Access modes (Rd, Wr, …)
• Open flags (Create, …)
• Operating modes (Append, …)
mode: bit vector of permission bits:
• User | Group | Other × R | W | X
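A minimal sketch of how the flags and mode bit vectors are built in practice, assuming a POSIX system. The helper names open_log and creat_equivalent are hypothetical; the O_* and S_I* constants are the standard POSIX names. The requested mode is further reduced by the process umask.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a log file for appending, creating it if needed.
   flags ORs together an access mode (O_WRONLY), an open flag (O_CREAT)
   and an operating mode (O_APPEND). */
int open_log(const char *path) {
    assert(path != NULL);
    int flags = O_WRONLY | O_CREAT | O_APPEND;
    /* mode is only used when O_CREAT actually creates the file:
       user rw, group r, other r (0644), then masked by the umask */
    mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
    return open(path, flags, mode);
}

/* POSIX specifies creat(path, mode) as equivalent to this open() call:
   create if missing, truncate to zero length if it exists. */
int creat_equivalent(const char *path, mode_t mode) {
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}
```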
File Operations
Read
Write
Seek
Delete
ssize_t read (int filedes, void *buffer, size_t maxsize)
- returns bytes read, 0 => EOF, -1 => error
ssize_t write (int filedes, const void *buffer, size_t size)
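- returns bytes written, -1 => error
off_t lseek (int filedes, off_t offset, int whence)
- returns new offset from start of file, -1 => error
- whence: SEEK_SET, SEEK_CUR, or SEEK_END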
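A sketch of the return-value conventions above, assuming a POSIX system; write_all and read_to_eof are hypothetical helpers. Because read() and write() may transfer fewer bytes than requested, callers typically loop until done (EINTR handling is omitted for brevity).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* write() may write fewer bytes than asked; loop until all are written.
   Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) return -1;          /* -1 => error, errno is set */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read from the current offset until EOF or until cap bytes are buffered.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_to_eof(int fd, char *buf, size_t cap) {
    assert(buf != NULL || cap == 0);
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0) break;             /* 0 => EOF */
        if (n < 0) return -1;          /* -1 => error */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

For example, after writing "hello, disk" and calling lseek(fd, 7, SEEK_SET), read_to_eof returns 4 (the bytes "disk"), and a further read() returns 0.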
File System
A file system is a method for storing and organizing computer files so that they are easy to find and access later.
File System Components
Disk Management:
Mapping files to blocks and vice versa
Keep track of free blocks
Naming:
Map file name to its blocks
Protection:
Layer to keep data secure
Reliability/Durability:
Keep files durable despite crashes, media failures, attacks, etc.
Translating from User to System View
User says: “Give me bytes 2-12.”
Fetch block corresponding to those bytes
Return just the correct portion of the block
What about “Write bytes 2-12”?
Fetch block
Modify portion
Write out Block
Everything inside the file system is done in whole-size blocks
For example, getc() and putc() buffer something like 4096 bytes, even if the interface is one byte at a time
From now on, a file is a collection of blocks
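The byte-range translation above can be sketched with a toy in-memory disk. Everything here is hypothetical: BLOCK_SIZE is 16 bytes only to keep the example small, and file block i is assumed to live in disk block i (the real mapping from file offsets to disk blocks is what the allocation methods provide).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16   /* illustrative; real systems use e.g. 4096 */
#define NUM_BLOCKS 8

/* Hypothetical disk: the file system can only move whole blocks. */
static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];

static void disk_read_block(size_t b, unsigned char *buf) {
    assert(b < NUM_BLOCKS);
    memcpy(buf, disk[b], BLOCK_SIZE);
}

static void disk_write_block(size_t b, const unsigned char *buf) {
    assert(b < NUM_BLOCKS);
    memcpy(disk[b], buf, BLOCK_SIZE);
}

/* "Give me bytes start..end": fetch each covering block,
   return just the needed portion. */
void fs_read(size_t start, size_t end, unsigned char *out) {
    unsigned char blk[BLOCK_SIZE];
    for (size_t pos = start; pos <= end; ) {
        size_t b = pos / BLOCK_SIZE, off = pos % BLOCK_SIZE;
        size_t n = BLOCK_SIZE - off;
        if (n > end - pos + 1) n = end - pos + 1;
        disk_read_block(b, blk);
        memcpy(out, blk + off, n);
        out += n;
        pos += n;
    }
}

/* "Write bytes start..end": read-modify-write every covering block. */
void fs_write(size_t start, size_t end, const unsigned char *in) {
    unsigned char blk[BLOCK_SIZE];
    for (size_t pos = start; pos <= end; ) {
        size_t b = pos / BLOCK_SIZE, off = pos % BLOCK_SIZE;
        size_t n = BLOCK_SIZE - off;
        if (n > end - pos + 1) n = end - pos + 1;
        disk_read_block(b, blk);      /* fetch block */
        memcpy(blk + off, in, n);     /* modify portion */
        disk_write_block(b, blk);     /* write out block */
        in += n;
        pos += n;
    }
}
```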
Disk Management Policies
What we store on the disk:
File
user-visible group of blocks arranged sequentially in logical space
Directory
user-visible index mapping names to files
Actually just a file itself
Need a way to structure files: File Header / Inode
Track which blocks belong at which offsets within the logical file structure
Optimize placement of a file's disk blocks to match access and usage patterns
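To illustrate "a directory is actually a file", here is a minimal sketch assuming a hypothetical fixed-size entry layout: the directory's data blocks hold an array of (name, inode number) records, and lookup is a linear scan. The 14-character name limit and the use of inode 0 for an empty slot are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 14   /* hypothetical fixed-size name field */

/* One record in a directory file: maps a name to an inode number. */
struct dir_entry {
    char name[NAME_MAX_LEN + 1];
    unsigned inode;        /* 0 marks an unused slot in this sketch */
};

/* Scan the directory's entries; return the inode number or 0 if absent. */
unsigned dir_lookup(const struct dir_entry *entries, size_t n, const char *name) {
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (entries[i].inode != 0 && strcmp(entries[i].name, name) == 0)
            return entries[i].inode;
    return 0;
}
```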
Disk Management Policies
Need a way to track free disk blocks
Link free blocks together
Maintain a linked list
Slow
Use a bitmap to represent free space on disk
Consumes memory (one bit per disk block)
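A sketch of the bitmap approach, assuming a hypothetical 64-block disk and first-fit allocation. The memory cost is one bit per block: for example, a 1 TiB disk with 4 KiB blocks has 2^28 blocks, so its bitmap needs 2^28 bits = 32 MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 64   /* hypothetical tiny disk */

/* Bit b set => block b is in use. */
static uint8_t block_map[NUM_BLOCKS / 8];

static int is_used(size_t b) {
    return (block_map[b / 8] >> (b % 8)) & 1;
}

static void set_used(size_t b, int used) {
    if (used) block_map[b / 8] |= (uint8_t)(1u << (b % 8));
    else      block_map[b / 8] &= (uint8_t)~(1u << (b % 8));
}

/* First fit: mark the lowest free block as used and return it, or -1 if full. */
long alloc_block(void) {
    for (size_t b = 0; b < NUM_BLOCKS; b++)
        if (!is_used(b)) { set_used(b, 1); return (long)b; }
    return -1;
}

void free_block(size_t b) {
    assert(b < NUM_BLOCKS && is_used(b));
    set_used(b, 0);
}
```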
Disk Management Policies
Access disk as linear array of sectors:
Identify sectors as vectors [cylinder, surface, sector].
Sort in cylinder-major order.
OS must deal with bad blocks
Not used much anymore.
Logical Block Addressing (LBA).
Every sector has an integer address from zero up to (number of sectors - 1).
The controller translates from LBA to physical position
OS need not worry about disk structure
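The classic conversion between [cylinder, surface, sector] and an LBA in cylinder-major order is LBA = (C × surfaces + H) × sectors_per_track + (S - 1), sketched below. This assumes a fixed geometry (every track has the same number of sectors) and the CHS convention of numbering sectors from 1; modern drives do this translation inside the controller, so the code is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed geometry; "heads" = number of surfaces. */
struct geometry {
    uint32_t heads;
    uint32_t sectors_per_track;
};

/* Cylinder-major order: all sectors of cylinder 0, then cylinder 1, ...
   CHS sectors are numbered from 1, LBAs from 0. */
uint64_t chs_to_lba(struct geometry g, uint32_t c, uint32_t h, uint32_t s) {
    assert(h < g.heads && s >= 1 && s <= g.sectors_per_track);
    return ((uint64_t)c * g.heads + h) * g.sectors_per_track + (s - 1);
}

/* Inverse mapping, as a controller would compute it. */
void lba_to_chs(struct geometry g, uint64_t lba,
                uint32_t *c, uint32_t *h, uint32_t *s) {
    *c = (uint32_t)(lba / ((uint64_t)g.heads * g.sectors_per_track));
    *h = (uint32_t)((lba / g.sectors_per_track) % g.heads);
    *s = (uint32_t)(lba % g.sectors_per_track) + 1;
}
```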
File System Access Patterns
How do users access files?
Need to know the access patterns users are likely to throw at the system.
Sequential Access:
Bytes read in ordered fashion.
“Give me X bytes,” then “give me the next Y bytes” …
Almost all file accesses are of this type.
Random Access:
Read / write from the middle of file.
“Give me bytes I to J”
Don’t have to read the complete file for this.
Less frequent, but still important.
Content-based Access:
“Find 100 bytes starting with BITS”
Structured data can provide this functionality.
Build indexes on the data
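Sequential and random access map onto POSIX calls roughly as sketched below; count_bytes_sequential and read_range are hypothetical helper names. pread() reads at an explicit offset without using or moving the file offset. Content-based access is not shown, since it depends on an index built over structured data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sequential access: read front to back in fixed-size chunks
   ("give me X bytes, then the next Y bytes ..."). Returns total bytes or -1. */
long count_bytes_sequential(int fd) {
    char buf[4096];
    long total = 0;
    ssize_t n;
    if (lseek(fd, 0, SEEK_SET) < 0) return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += (long)n;
    return n < 0 ? -1 : total;
}

/* Random access: fetch bytes i..j (inclusive) directly at offset i,
   without reading the rest of the file. */
ssize_t read_range(int fd, off_t i, off_t j, char *out) {
    assert(j >= i);
    return pread(fd, out, (size_t)(j - i + 1), i);
}
```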
File System based on Usage Patterns
Facts:
Most files are small.
Very few files are big.
Large files use up most of the disk space and bandwidth.
A few enormous files can occupy as much space as an immense number of small files.
File Size Distribution on UNIX Systems—Then and Now
Andrew S. Tanenbaum, Jorrit N. Herder, Herbert Bos
A Large-Scale Study of File-System Contents
John R. Douceur and William J. Bolosky
How to organize files on disk
Goals:
Maximize Sequential Performance
Effective Random Access Performance
Easy Management of files (grow, truncate, etc.)
Block Allocation Methods - Contiguous
Allocate a contiguous range of blocks on the HDD
User pre-declares max file size (disadvantage)
Search the bitmap for free space using best fit / first fit
What if not enough contiguous space for new file?
File Header Contains:
First sector/LBA in file
File size (# of sectors)
Pros:
Fast Sequential Access
Easy Random access
Cons:
External Fragmentation
Hard to grow files
Could compact space, but that would be really expensive
Contiguous allocation was used by the IBM 360
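A sketch of the contiguous scheme, assuming a hypothetical 4096-byte block size and an in-memory used[] array standing in for the free-space bitmap. The header holds only the first block and the length, so locating any byte is arithmetic; first_fit shows how a request can fail even when enough free blocks exist in total (external fragmentation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 4096   /* hypothetical block size */

/* File header for contiguous allocation: first LBA and size in blocks. */
struct contig_header {
    size_t first;
    size_t nblocks;
};

/* Random access is pure arithmetic: no extra disk reads to find a block. */
long contig_lookup(const struct contig_header *h, size_t offset) {
    size_t idx = offset / BLOCK_SIZE;
    if (idx >= h->nblocks) return -1;   /* past end of file */
    return (long)(h->first + idx);
}

/* First fit: start of the lowest run of n free blocks, or -1 if none. */
long first_fit(const bool *used, size_t total, size_t n) {
    assert(n > 0);
    size_t run = 0;
    for (size_t b = 0; b < total; b++) {
        run = used[b] ? 0 : run + 1;
        if (run == n) return (long)(b + 1 - n);
    }
    return -1;   /* free blocks may exist, just not contiguously */
}
```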
Linked List Allocation
Each block points to the next block of the file on disk.
File Header points to the first block; the last block points to Null
Pros:
Can grow files dynamically
Free list same as file
Cons:
Bad Sequential Access (seek between each block)
Unreliable (lose a block -> lose rest of file)
Serious Con: Bad random access!
Technique originally from Alto (First PC, built at Xerox)
No attempt to allocate contiguous blocks
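A sketch of linked allocation, assuming a hypothetical in-memory array next_ptr[] that models the pointer stored inside each disk block. Reaching logical block k means following k pointers, and on a real disk each hop is a separate block read, which is why random access is poor; growing the file only touches the last block.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BLOCKS 16
#define END_OF_FILE (-1)

/* next_ptr[b] models the "next block" pointer stored inside disk block b. */
static int next_ptr[NUM_BLOCKS];

/* Find the disk block holding logical block k of a file starting at `first`.
   *reads counts the pointer hops (block reads) needed: O(k). */
int linked_lookup(int first, size_t k, size_t *reads) {
    int b = first;
    *reads = 0;
    while (b != END_OF_FILE && k > 0) {
        b = next_ptr[b];   /* read block b to learn its successor */
        (*reads)++;
        k--;
    }
    return b;              /* END_OF_FILE if the file is shorter than k+1 blocks */
}

/* Growing a file: link a free block after the current last block. */
void linked_append(int last, int new_block) {
    assert(next_ptr[last] == END_OF_FILE);
    next_ptr[last] = new_block;
    next_ptr[new_block] = END_OF_FILE;
}
```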
Indexed Allocation
The system allocates a file header block holding an array of pointers big enough to point to all data blocks
User pre-declares max file size
Pros:
Can easily grow up to space allocated for index
Random access is fast
Cons:
Clumsy to grow file bigger than table size
Still lots of seeks: blocks may be spread over the disk
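A sketch of indexed allocation, assuming a hypothetical 4096-byte block size and an index of MAX_PTRS = 8 entries sized from the pre-declared maximum file size. Lookup is a single array access, while growing past the table fails, which is the "clumsy to grow" limitation; the pointed-to blocks can still be scattered anywhere on disk.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 4096   /* hypothetical block size */
#define MAX_PTRS 8        /* hypothetical: from pre-declared max file size */

/* File header block: an array of pointers to the file's data blocks. */
struct index_header {
    size_t nblocks;
    long ptrs[MAX_PTRS];  /* ptrs[i] = disk block holding logical block i */
};

/* Random access: one table lookup, no chain to follow. */
long indexed_lookup(const struct index_header *h, size_t offset) {
    assert(h != NULL);
    size_t i = offset / BLOCK_SIZE;
    return i < h->nblocks ? h->ptrs[i] : -1;
}

/* Grow by one block; fails once the index table is full. */
int indexed_append(struct index_header *h, long disk_block) {
    if (h->nblocks == MAX_PTRS) return -1;
    h->ptrs[h->nblocks++] = disk_block;
    return 0;
}
```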