Lecture2
Introduction to Filesystems
Optional reading:
Operating Systems: Principles and Practice (2nd Edition): Chapter 11,
Sections 12.1 and 12.2, and Section 13.3 (up through page 567)
Topic 1: Filesystems - How can we design filesystems to manage files on disk,
and what are the tradeoffs inherent in designing them? How can we interact with
the filesystem in our programs?
CS111 Topic 1: Filesystems
• Filesystems introduction and design
• Case study: Unix V6 Filesystem
• Filesystem system calls and file descriptors
• Crash recovery
Learning Goals
• Understand the key responsibilities and requirements of a filesystem
• Get practice identifying tradeoffs in different filesystem designs
• Explore the design of the Unix V6 filesystem
Plan For Today
• Filesystems Introduction
• Methods for Storing Files
  • Contiguous Allocation
  • Linked Files
  • Windows FAT
  • Multi-level indexes
• The Unix V6 Filesystem
  • Inodes
Filesystems
A filesystem is the portion of the OS that manages the disk.
• A hard drive (or, more commonly these days, flash storage) is persistent
storage – it can store data between power-offs.
Hard Drives
Hard drives have peculiar performance
characteristics that have a big impact on how
we build filesystems.
• Reading and writing require seeking (moving the arm to position the heads
over the desired track) and then waiting for the desired location to pass
underneath. We want to minimize this time.
• We can only read or write data in units of whole sectors. Presenting the
disk as a sequence of numbered sectors is an example of virtualization:
making one thing look like another.
sector 0 | sector 1 | sector 2 | sector 3 | sector 4 | sector 5 | sector 6 | …
Hard Disks are Sector-Addressable
sector 0 | sector 1 | sector 2 | sector 3 | sector 4 | sector 5 | sector 6 | …
If we are the OS, the hard disk creators might provide this API (“application
programming interface”), a set of public functions, to interface with the disk:
This is all we get! We (the OS) must build a filesystem by layering functions on
top of these to ultimately allow us to read, write, look up, and modify entire files.
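A sector-level interface of this kind might look like the following minimal sketch. The class `Disk`, the functions `read_sector` and `write_sector`, and the 512-byte sector size are assumptions made for illustration; they are not the API of any real disk, and the "disk" here is just an in-memory byte array.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


class Disk:
    """Hypothetical in-memory stand-in for a sector-addressable disk."""

    def __init__(self, num_sectors):
        self.num_sectors = num_sectors
        self._data = bytearray(num_sectors * SECTOR_SIZE)

    def _check(self, sector_num):
        if not 0 <= sector_num < self.num_sectors:
            raise IndexError("sector number out of range")

    def read_sector(self, sector_num):
        """Return the full contents of one sector."""
        self._check(sector_num)
        start = sector_num * SECTOR_SIZE
        return bytes(self._data[start:start + SECTOR_SIZE])

    def write_sector(self, sector_num, data):
        """Overwrite one whole sector; partial writes are rejected."""
        self._check(sector_num)
        if len(data) != SECTOR_SIZE:
            raise ValueError("must write exactly one sector")
        start = sector_num * SECTOR_SIZE
        self._data[start:start + SECTOR_SIZE] = data


disk = Disk(num_sectors=8)
# Even a 2-byte payload has to be padded out to a full sector.
disk.write_sector(3, b"hi".ljust(SECTOR_SIZE, b"\0"))
print(disk.read_sector(3)[:2])  # b'hi'
```

Nothing in this interface knows about files, names, or sizes; everything beyond "sector number in, bytes out" would have to be layered on top.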
Filesystem Functionality
We want to read/write files on disk and have them persist even when the device
is off. This may include operations like:
Filesystem Challenges
Problems addressed by modern filesystems:
• Disk space management:
  • Fast access to files (minimize seeks)
  • Sharing space between users
  • Efficient use of disk space
• Naming: how do users select files?
• Reliability: information must survive OS crashes and hardware failures.
• Protection: isolation between users, controlled sharing.
Flash Storage
Recently, flash storage (“SSD”) has become more popular and commonplace,
especially with the growth in mobile devices.
• Much faster than hard drives (100x faster access), but more expensive
• No moving parts, so more reliable
• Issues with wear-out once a chunk of the drive has been erased many times
(~100k)
Image source: https://www.samsung.com/us/computing/memory-storage/solid-state-drives/980-pro-pcie-4-0-nvme-ssd-1tb-mz-v8p1t0b-am/
Sectors and Blocks
A filesystem generally defines its own unit of data, a "block," that it reads
or writes as a single unit.
• "Sector" = hard disk storage unit
• "Block" = filesystem storage unit (1 or more sectors) - software abstraction
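The relationship between blocks and sectors can be sketched with a little arithmetic. The choice of 2 sectors per block and 512-byte sectors below is an assumption for illustration; a filesystem is free to pick other values.

```python
SECTOR_SIZE = 512      # assumed sector size in bytes
SECTORS_PER_BLOCK = 2  # assumed: this filesystem groups 2 sectors per block
BLOCK_SIZE = SECTORS_PER_BLOCK * SECTOR_SIZE


def sectors_for_block(block_num):
    """Return the sector numbers that together make up one block."""
    first = block_num * SECTORS_PER_BLOCK
    return list(range(first, first + SECTORS_PER_BLOCK))


print(BLOCK_SIZE)            # 1024
print(sectors_for_block(0))  # [0, 1]
print(sectors_for_block(5))  # [10, 11]
```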
Storing Files on Disk
Two types of data we will be working with:
1. file payload data - contents of files (e.g. text in documents, pixels in
images)
2. file metadata - information about files (e.g. name, size)
Key insight: both must be stored on the hard disk; otherwise, they will not
survive power-offs (e.g., without storing metadata we would lose all filenames
after shutdown). This means some blocks must store data other than payload
data.
Contiguous Allocation
First key question: should we store files contiguously on disk? What would it
look like if we did?
• Called contiguous allocation – allocate a file in one contiguous group of blocks
• For each file, keep track of the number of its first sector and its length
• Keep a free list of unused areas of the disk
• Example: IBM OS/360
• Advantages?
block 0 | block 1 | block 2 | block 3 | block 4 | block 5 | block 6 | …
Advantages:
• simple
• can read sequentially or easily jump to any location in file (“random access”)
• all data in one place (few seeks)
What about disadvantages?
Disadvantages:
• hard to grow files
• hard to lay out files on disk: a new file may not fit in any single run of
free blocks, even when enough blocks are free in total (external
fragmentation: we have space on disk, but can’t use it to store files)
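Both sides of this tradeoff can be sketched in a few lines. The block size, the function names, and the first-fit search over the free list are assumptions chosen for illustration, not a description of how IBM OS/360 worked.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


def block_for_offset(start_block, length_blocks, offset):
    """Random access: one division finds the block holding a byte offset."""
    index = offset // BLOCK_SIZE
    if not 0 <= index < length_blocks:
        raise ValueError("offset is outside the file")
    return start_block + index


def first_fit(free_runs, blocks_needed):
    """Return the start of the first free run that is big enough, else None.

    free_runs is a list of (start_block, run_length) pairs.
    """
    for start, run_length in free_runs:
        if run_length >= blocks_needed:
            return start
    return None


# A file stored in blocks 20..24 (start 20, length 5).
print(block_for_offset(20, 5, 1300))  # 22, since 1300 // 512 == 2

# Seven blocks are free in total, but no single run holds 4 blocks.
free_runs = [(3, 2), (9, 3), (30, 2)]
print(sum(length for _, length in free_runs))  # 7
print(first_fit(free_runs, 4))                 # None
print(first_fit(free_runs, 3))                 # 9
```

The failed request for 4 blocks is external fragmentation in miniature: the space exists, but not in one piece.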
Linked Files
First key question: should we store files contiguously on disk? What would it
look like if we didn’t?
• Problem: we need to know what blocks are associated with what files
One idea: linked files – like a linked list
• Each block contains file data as well as the location of the next block
• For each file, keep track of the number of its first block in a separate location
• Approximate examples: TOPS-10, Xerox Alto
• Advantages?
File    Start
File 0  10
File 1  12
File 2  13

Block     Contents  Next
…
block 10  File 0    14
block 11  File 2    END
block 12  File 1    END
block 13  File 2    15
block 14  File 0    END
block 15  File 2    11
…
Advantages:
• Easy to grow files
• Easier to fit files in available space – less fragmentation
• Still supports simple sequential access
What about disadvantages?
Disadvantages:
• Can’t easily jump to any arbitrary location in the file
• Data scattered throughout disk (more seeks)
Blocks of each file in the example, in order:
File 0: 10, 14
File 1: 12
File 2: 13, 15, 11
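The example chains can be walked with a short sketch. The dictionaries below stand in for on-disk structures and the function names are hypothetical; in a real linked-file design each "next" pointer lives inside its block, so every hop would cost a disk read.

```python
END = None  # marker for "no next block"

# Next-pointer stored alongside the data in each block (from the example).
next_block = {10: 14, 11: END, 12: END, 13: 15, 14: END, 15: 11}

# Separate table holding each file's first block.
file_start = {"File 0": 10, "File 1": 12, "File 2": 13}


def blocks_of(name):
    """Walk the chain from the file's first block until END."""
    chain = []
    block = file_start[name]
    while block is not END:
        chain.append(block)
        block = next_block[block]
    return chain


def nth_block(name, n):
    """Find the n-th block (0-based) of a file; this takes n pointer hops."""
    block = file_start[name]
    for _ in range(n):
        block = next_block[block]
        if block is END:
            raise IndexError("file has fewer blocks than requested")
    return block


print(blocks_of("File 0"))     # [10, 14]
print(blocks_of("File 2"))     # [13, 15, 11]
print(nth_block("File 2", 2))  # 11
```

Reaching the last block of File 2 requires visiting blocks 13 and 15 first, which is why jumping to an arbitrary location is slow with this layout.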
Unix V6 Filesystem
Key Idea: files don’t need to be stored contiguously on disk, but we want to
store, in order, the numbers of all the blocks that make up a file’s data.
Where could we store this information for each file for easy lookup?
Let’s reserve some space on disk to store this information for each file,
separately from its payload data. This per-file space is called an inode.
Inodes
An inode ("index node") is a grouping of data about a single file, stored on disk.
• For Unix V6, an inode contains an ordered list of the numbers of the blocks
that store the file’s payload data, and also stores other metadata like file size.
• Unix V6 stores inodes on disk together in a reserved portion of blocks starting
at block 2, called the inode table, for quick access.
• Inodes can be read into memory while in use, for quicker access.
• Some other filesystems (e.g., contiguous allocation/linked files, but not FAT)
store file metadata in inodes, too.
Unix V6 Inodes
The Unix V6 filesystem stores inodes on disk together in the inode table for
quick access.
• Inodes are 32 bytes each, and 1 block = 1 sector = 512 bytes, so there are
16 inodes per block.
• Inodes are stored in a reserved region starting at block 2 (block 0 is the
"boot block" containing hard drive info; block 1 is the "superblock" containing
filesystem info). Typically, at most 10% of the drive stores metadata.
• The filesystem goes from filename to inode number ("inumber") to file data.
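Because inodes have a fixed size and sit in one contiguous table, locating an inode from its inumber is simple arithmetic. The sketch below uses the sizes stated above; the function name is hypothetical, and numbering inumbers from 1 is an assumption that the text above does not state.

```python
INODE_SIZE = 32        # bytes per inode (from the text)
BLOCK_SIZE = 512       # bytes per block (from the text)
INODE_TABLE_START = 2  # first block of the inode table (from the text)
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE  # 16
FIRST_INUMBER = 1      # assumption: inumbers are numbered starting at 1


def locate_inode(inumber):
    """Return (block number, byte offset within that block) for an inode."""
    if inumber < FIRST_INUMBER:
        raise ValueError("invalid inumber")
    index = inumber - FIRST_INUMBER
    block = INODE_TABLE_START + index // INODES_PER_BLOCK
    offset = (index % INODES_PER_BLOCK) * INODE_SIZE
    return block, offset


print(locate_inode(1))   # (2, 0)
print(locate_inode(16))  # (2, 480)
print(locate_inode(17))  # (3, 0)
```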
Unix V6 Inodes
We need inodes to be a fixed size, and not too large. So how should we store
the block numbers? How many should there be?
1. if variable number, there's no fixed inode size
2. if fixed number, this limits maximum file size
The inode design here has space for 8 block numbers, which are stored in
order (i.e., the first block number refers to the block holding the first chunk
of the file, etc.). But we will see later how we can build on this to support
very large files.
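A rough sketch of what 8 block numbers imply, assuming each slot names one data block directly. The example block numbers and the function name are hypothetical.

```python
BLOCK_SIZE = 512       # bytes per block (from the text)
NUM_BLOCK_NUMBERS = 8  # block-number slots per inode (from the text)

# Largest file if every slot names one data block directly.
MAX_FILE_SIZE = NUM_BLOCK_NUMBERS * BLOCK_SIZE


def block_for_offset(block_numbers, offset):
    """Return the disk block holding byte `offset` of the file."""
    index = offset // BLOCK_SIZE
    if not 0 <= index < len(block_numbers):
        raise ValueError("offset is beyond the blocks listed in the inode")
    return block_numbers[index]


# Hypothetical inode contents: the file's blocks need not be contiguous.
block_numbers = [341, 33, 124]

print(MAX_FILE_SIZE)                          # 4096
print(block_for_offset(block_numbers, 0))     # 341
print(block_for_offset(block_numbers, 600))   # 33
print(block_for_offset(block_numbers, 1500))  # 124
```

Unlike linked files, finding the block for any offset is a single list lookup, and unlike contiguous allocation, the blocks can be anywhere on disk. The cost is the 4096-byte ceiling under this direct-only assumption.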
Recap
• Filesystems Introduction
• Methods for Storing Files
  • Contiguous Allocation
  • Linked Files
  • Windows FAT
  • Multi-level indexes
• The Unix V6 Filesystem
  • Inodes
Lecture 2 takeaway: Filesystems need to store both file metadata and payload
data. There are various ways to store payload data, each with different
pros/cons. The Unix V6 filesystem uses inodes to store file data, including
block numbers.
Next time: more about the Unix V6 Filesystem