Introduction To File Organization

The document provides an introduction to file organization and file systems. It discusses that files are abstractions that contain data and metadata, and are organized hierarchically through directories. The file system maps file names to disk blocks to allow access and storage of files. There is a distinction between how users and applications view files as information versus how operating systems view files as containers of data blocks. Path name translation involves searching through the directory structure to locate a file's location on disk.

Introduction to File Organization

File (an abstraction)


A (potentially) large amount of information or data that lives a (potentially) very long time
  Often much larger than the memory of the computer
  Often much longer than any computation
  Sometimes longer than life of machine itself

(Usually) organized as a linear array of bytes or blocks
  Internal structure is imposed by application
  (Occasionally) blocks may be variable length

(Often) requiring concurrent access by multiple processes
  Even by processes on different machines!

CS-4513 D-term 2008

Introduction to File
Systems

File Systems and Disks


User view
  File is a named, persistent collection of data

OS & file system view
  File is a collection of disk blocks, i.e., a container
  File System maps file names and offsets to disk blocks
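
The mapping from file names and offsets to disk blocks can be pictured with a small sketch. Everything here is hypothetical: the `ToyFileSystem` class, the 4096-byte block size, and the block numbers are assumptions chosen for illustration, not how any particular file system stores its tables.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


class ToyFileSystem:
    """Hypothetical in-memory table: file name -> ordered list of disk blocks."""

    def __init__(self):
        self.block_lists = {}

    def add_file(self, name, blocks):
        # Record which disk blocks hold the file, in file order.
        self.block_lists[name] = list(blocks)

    def locate(self, name, offset):
        """Map (file name, byte offset) to (disk block number, offset in block)."""
        blocks = self.block_lists[name]
        index, within = divmod(offset, BLOCK_SIZE)
        if offset < 0 or index >= len(blocks):
            raise ValueError("offset is outside the file's allocated blocks")
        return blocks[index], within


fs = ToyFileSystem()
fs.add_file("notes.txt", [17, 4, 93])   # blocks need not be contiguous
print(fs.locate("notes.txt", 5000))     # byte 5000 falls in the second block
```

The user only ever supplies the name and the offset; the block numbers stay inside the file system.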

Fundamental ambiguity
Is the file the container of the information or the information itself?
Almost all systems confuse the two.
Almost all people confuse the two.

Example: suppose that you e-mail me a document
  Later, how does either of us know that we are using the same version of the document?

Windows/Outlook/Exchange/MacOS:
  Time-stamp is a pretty good indication that they are the same
  Time-stamps preserved on copy, drag and drop, transmission via e-mail, etc.

Unix/Linux
  By default, time-stamps not preserved on copy, ftp, e-mail, etc.
  Time-stamp associated with container, not with information
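
One way to see the container/information split is to compare a time-stamp with a digest of the bytes. The sketch below is an illustration only: the file names and the fixed time-stamp are made up, and it relies on the standard library behaviour that `shutil.copy` does not carry over modification times while `shutil.copy2` tries to.

```python
import hashlib
import os
import shutil
import tempfile


def content_digest(path):
    """SHA-256 of the file's bytes: a property of the information."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_demo():
    old_time = 1_000_000_000  # arbitrary time-stamp far in the past
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")
        with open(original, "w") as f:
            f.write("draft 1\n")
        os.utime(original, (old_time, old_time))  # set access and modify times

        plain = shutil.copy(original, os.path.join(d, "plain_copy.txt"))
        kept = shutil.copy2(original, os.path.join(d, "kept_copy.txt"))

        return {
            "same_content": content_digest(original)
            == content_digest(plain)
            == content_digest(kept),
            "plain_copy_kept_mtime": int(os.stat(plain).st_mtime) == old_time,
            "copy2_kept_mtime": int(os.stat(kept).st_mtime) == old_time,
        }


print(copy_demo())
```

All three files hold the same information, yet only the digest says so reliably; the time-stamp follows whichever container the copy tool happened to create.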

Rule of Thumb
Almost always, people and applications think in terms of the information
Many systems think in terms of containers
Professional Guidance: Be aware of the distinction, even when the system is not

Attributes of Files
Name:
  Although the name is not always what you think it is!

Type:
  May be encoded in the name (e.g., .cpp, .txt)

Dates:
  Creation, updated, last accessed, etc.
  (Usually) associated with container
  Better if associated with content

Size:
  Length in number of bytes; occasionally rounded up

Protection:
  Owner, group, etc.
  Authority to read, update, extend, etc.

Locks:
  For managing concurrent access
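
Most of these attributes can be read back from the operating system. The following is a minimal sketch using Python's `os.stat`; the `describe` helper and the dictionary keys are names invented here, and which fields are meaningful (owner and group ids in particular) depends on the platform.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Collect a few attributes the file system keeps about a file."""
    st = os.stat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "name": os.path.basename(path),
        "size_bytes": st.st_size,
        "type": kind,
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-------"
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "modified": time.ctime(st.st_mtime),
        "accessed": time.ctime(st.st_atime),
    }


with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
    tmp.write(b"hello")
    tmp.flush()
    for key, value in describe(tmp.name).items():
        print(f"{key}: {value}")
```

Note that every value comes from the container's metadata; none of it is stored in the file's bytes.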

Definition: File Metadata

Information about a file
  Maintained by the file system
  Separate from file itself
  Usually attached or connected to the file
    E.g., in block # 1

Some information visible to user/application
  Dates, permissions, type, name, etc.

Some information primarily for OS
  Location on disk, locks, cached attributes

Observation: some attributes are not visible to user or program

E.g., location
  Location is stored in metadata
  Location can change, even if file does not
  Location is not visible to user or program

Example: Location

Example 1:
  mv ~lauer/project1.doc ~cs4513/public_html/d08

Example 2:
  System moves file from disk block 10,000 to disk block 20,000
  System restores a file from backup

May or may not be reflected in metadata
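
Example 1 can be tried out on a small scale. This is a sketch under assumptions: the directory and file names are placeholders, both paths sit on the same file system, and the platform reports a stable `st_ino` (as POSIX systems do). Under those assumptions a rename changes only directory entries, so the container's identity and modification time stay the same.

```python
import os
import tempfile


def rename_keeps_container():
    """Move a file between directories and compare the metadata before and after."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "project1.doc")
        dst_dir = os.path.join(d, "public_html")
        os.mkdir(dst_dir)
        with open(src, "w") as f:
            f.write("report")

        before = os.stat(src)
        dst = os.path.join(dst_dir, "project1.doc")
        os.rename(src, dst)          # what mv does within one file system
        after = os.stat(dst)

        same_container = before.st_ino == after.st_ino
        same_mtime = before.st_mtime == after.st_mtime
        return same_container, same_mtime


print(rename_keeps_container())
```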


Question: is location an attribute of file?

Answer: It is an attribute of the container
  Not an attribute of the information!

File Types

Operations on Files
Open, Close
  Gain or relinquish access to a file
  OS returns a file handle: an internal data structure letting it cache internal information needed for efficient file access

Read, Write, Truncate
  Read: return a sequence of n bytes from file
  Write: replace n bytes in file, and/or append to end
  Truncate: throw away all but the first n bytes of file

Seek, Tell
  Seek: reposition file pointer for subsequent reads and writes
  Tell: get current file pointer

Create, Delete:
  Conjure up a new file; or blow away an existing one
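
The operations above map closely onto the file objects of most languages. Here is a short sketch in Python; the function name and the bytes written are arbitrary choices for the illustration.

```python
import os
import tempfile


def file_operations_demo():
    """Walk through create, open, write, tell, seek, read, truncate, close, delete."""
    fd, path = tempfile.mkstemp()        # Create: a new, empty file
    os.close(fd)

    with open(path, "r+b") as f:         # Open: obtain a handle for read/write
        f.write(b"hello world")          # Write: append 11 bytes to the empty file
        position = f.tell()              # Tell: file pointer is now past the data
        f.seek(6)                        # Seek: reposition the file pointer
        tail = f.read(5)                 # Read: 5 bytes starting at offset 6
        f.seek(0)
        f.write(b"HELLO")                # Write: replace the first 5 bytes in place
        f.truncate(5)                    # Truncate: keep only the first 5 bytes
        f.seek(0)
        remaining = f.read()
    # Leaving the with-block closes the handle (Close).

    os.remove(path)                      # Delete: remove the file
    return position, tail, remaining, os.path.exists(path)


print(file_operations_demo())
```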

File: a very powerful abstraction

Documents, code
Databases
  Very large, possibly spanning multiple disks

Streams
  Input, output, keyboard, display
  Pipes, network connections, etc.

Virtual memory backing store
Temporary repositories of OS information
Any time you need to remember something beyond the life of a particular process/computation

Methods for Accessing Files


Sequential access
Random access
Keyed (or indexed) access

Sequential Access Method


Read all bytes or records in order from the beginning
  Writing implicitly truncates
  Cannot jump around
    Could possibly rewind or back up

Appropriate for certain media or systems
  Magnetic tape or punched cards
  Video tape (VHS, etc.)
  Unix-Linux-Windows pipes
  Network streams
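
A pipe is an easy place to observe the "cannot jump around" rule. The sketch below assumes a POSIX-like system, where asking to seek on a pipe is refused with an `OSError`; the function name is made up for the example.

```python
import os


def pipe_is_sequential():
    """Write to a pipe, try to seek on it, then read the bytes in order."""
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"abc")
        try:
            os.lseek(read_end, 0, os.SEEK_SET)   # attempt random access
            seekable = True
        except OSError:
            seekable = False                     # pipes only support sequential reads
        data = os.read(read_end, 3)              # bytes arrive in the order written
        return seekable, data
    finally:
        os.close(read_end)
        os.close(write_end)


print(pipe_is_sequential())
```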

Random Access Method


Bytes/records can be read in any order

Writing can
  Replace existing bytes or records
  Append to end of file
  Cannot insert data between existing bytes!

Seek operation moves current file pointer
  Maintained as part of open file information
  Discarded on close

Typical of most modern information storage
  Database systems
  Randomly accessible multi-media (CD, DVD, etc.)
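
Because a write can only replace or append, an application that wants to insert bytes in the middle has to move the tail itself. A minimal sketch follows; `insert_bytes` is a helper invented here, and reading the whole tail into memory is an assumption that only suits small files.

```python
import os
import tempfile


def insert_bytes(path, offset, data):
    """Insert data at offset by rewriting everything that follows it."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()      # save the bytes after the insertion point
        f.seek(offset)
        f.write(data)        # replace existing bytes with the new data...
        f.write(tail)        # ...then write the saved tail, extending the file


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"helloworld")
insert_bytes(path, 5, b", ")
with open(path, "rb") as f:
    print(f.read())
os.remove(path)
```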

Keyed (or indexed) Access Methods


Access items in file based on the contents of (part of) an item in the file
Provided in older commercial operating systems (IBM ISAM)
(Usually) handled separately by modern database systems
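
The idea of keyed access can be sketched on top of random access: scan once to build an index from key to byte offset, then seek straight to a record. This toy is not how ISAM or any database lays out its index; the comma-separated record format, the helper names, and the use of an in-memory `io.BytesIO` as a stand-in for a file are all assumptions.

```python
import io


def build_index(f):
    """Scan the file once and map each record's key to its byte offset."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        key = line.split(b",", 1)[0]   # the key is the text before the first comma
        index[key] = offset
    return index


def lookup(f, index, key):
    """Seek directly to the record for key and return it without the newline."""
    f.seek(index[key])
    return f.readline().rstrip(b"\n")


records = io.BytesIO(b"ann,12\nbob,34\ncy,56\n")
index = build_index(records)
print(lookup(records, index, b"bob"))
```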

Questions?

Directory: A Special Kind of File

A tool for users & applications to organize and find files
  User-friendly names
  Names that are meaningful over long periods of time

The data structure for OS to locate files (i.e., containers) on disk

Directory structures
Single level
  One directory per system, one entry pointing to each file
  Small, single-user or single-use systems
    PDA, cell phone, etc.

Two-level
  Single master directory per system
  Each entry points to one single-level directory per user
  Uncommon in modern operating systems

Hierarchical
  Any directory entry may point to
    Individual file
    Another directory
  Common in most modern operating systems


Directory Considerations
Efficiency: locating a file quickly.

Naming: convenient to users.
  Separate users can use same name for separate files.
  The same file can have different names for different users.
  Names need only be unique within a directory

Grouping: logical grouping of files by properties
  e.g., all Java programs, all games, etc.

Directory Organization: Hierarchical

Most systems support idea of current (working) directory
  Absolute names: fully qualified from root of file system
    /usr/group/foo.c, ~/kernelSrc/config.h
  Relative names: specified with respect to working directory
    foo.c, bar/bar2.h
  A special name: the working directory itself
    "."

Modified Hierarchical: Acyclic Graph (no loops) and General Graph
  Allow directories and files to have multiple names
  Links are file names (directory entries) that point to existing (source) files
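
How a relative name is combined with the working directory can be shown with Python's `posixpath` module. This is a purely lexical sketch: the working directory `/home/lauer` is assumed for the example, and `normpath` only rewrites the text of the path; it does not consult the disk or follow links.

```python
import posixpath


def to_absolute(name, working_directory):
    """Resolve a name against the working directory unless it is already absolute."""
    if posixpath.isabs(name):
        return posixpath.normpath(name)
    return posixpath.normpath(posixpath.join(working_directory, name))


cwd = "/home/lauer"  # assumed working directory
for name in ["/usr/group/foo.c", "foo.c", "bar/bar2.h", ".", "../foo.c"]:
    print(name, "->", to_absolute(name, cwd))
```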

Links
Symbolic (soft) links: uni-directional relationship between a file name and the file
  Directory entry contains text describing absolute or relative path name of original file
  If the source file is deleted, the link exists but pointer is invalid

Hard links: bi-directional relationship between file names and file
  A hard link is a directory entry that points to a source file's metadata
  Metadata maintains reference count of the number of hard links pointing to it (the link reference count)
  Link reference count is decremented when a hard link is deleted
  File data is deleted and space freed when the link reference count goes to zero
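
The difference between the two kinds of link shows up when the original name is removed. The sketch below assumes a POSIX system where the process may create both hard and symbolic links; the file names are placeholders.

```python
import os
import tempfile


def link_demo():
    """Create a hard link and a symbolic link, then delete the original name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("data")

        os.link(original, hard)        # second directory entry for the same metadata
        os.symlink(original, soft)     # entry that stores the path as text
        count_before = os.stat(original).st_nlink

        os.remove(original)            # removes one name, decrements the count
        count_after = os.stat(hard).st_nlink
        with open(hard) as f:
            hard_content = f.read()    # data survives while the count is above zero

        soft_entry_exists = os.path.islink(soft)    # the link itself is still there
        soft_target_exists = os.path.exists(soft)   # but it points at nothing
        return (count_before, count_after, hard_content,
                soft_entry_exists, soft_target_exists)


print(link_demo())
```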

Unix-Linux Hard Links


File may have more than one name or path
rm, mv: directory operations, not file operations!
  The real name of a Unix file is internal name of its metadata
    Known only to OS!

Hard links are not used very often in modern Unix practice
  Exception: Linked copies of large directory trees!
  (Usually) safe to regard last element of path as name of file

Path Name Translation


Assume that I want to open "/home/lauer/foo.c"
  fd = open("/home/lauer/foo.c", O_RDWR);

File System does the following
  Opens directory "/": the root directory is in a known place on disk
  Search root directory for the directory home and get its location
  Open home and search for the directory lauer and get its location
  Open lauer and search for the file foo.c and get its location
  Open the file foo.c
  Note that the process needs the appropriate permissions at every step
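
The steps above can be sketched as a loop over path components. The `DISK` table, its numbering, and the `translate` function are hypothetical stand-ins for on-disk directories, and the permission check required at every step is left out to keep the sketch short.

```python
# Toy "disk": location number -> directory (name -> location) or file contents.
DISK = {
    0: {"home": 1},                          # root directory, at a known location
    1: {"lauer": 2},                         # /home
    2: {"foo.c": 3},                         # /home/lauer
    3: b"int main(void) { return 0; }\n",    # /home/lauer/foo.c
}
ROOT_LOCATION = 0


def translate(path):
    """Walk an absolute path one component at a time.

    Returns the final location and the list of locations visited on the way.
    """
    if not path.startswith("/"):
        raise ValueError("this sketch handles absolute paths only")
    location = ROOT_LOCATION
    visited = [location]
    for component in path.split("/"):
        if not component:
            continue                         # skip the empty piece before the first "/"
        directory = DISK[location]
        if not isinstance(directory, dict):
            raise NotADirectoryError(component)
        if component not in directory:
            raise FileNotFoundError(path)
        location = directory[component]      # "get its location"
        visited.append(location)
    return location, visited


print(translate("/home/lauer/foo.c"))
```

Each loop iteration corresponds to one "open the directory and search it" step, which is why long paths cost more to open.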


Path Name Translation (continued)



File Systems spend a lot of time walking down directory paths
  This is why open calls are separate from other file operations
  File System attempts to cache prefix lookups to speed up common searches
    "~" for user's home directory
    "." for current working directory

Once open, file system caches the metadata of the file
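
Prefix caching can be illustrated by counting directory searches. The `Resolver` class and the `TREE` table below are assumptions made for this sketch; real file systems use their own cache structures and invalidation rules, which are not modelled here.

```python
# Toy directory tree: location -> {entry name -> location}.
TREE = {0: {"home": 1}, 1: {"lauer": 2}, 2: {"foo.c": 3, "bar.c": 4}}


class Resolver:
    """Resolve absolute paths, remembering every prefix already looked up."""

    def __init__(self):
        self.prefix_cache = {}
        self.directory_searches = 0

    def resolve(self, path):
        parts = [p for p in path.split("/") if p]
        location, start = 0, 0                   # default: begin at the root
        # Find the longest prefix that has been resolved before.
        for i in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:i])
            if prefix in self.prefix_cache:
                location, start = self.prefix_cache[prefix], i
                break
        # Search directories only for the components not covered by the cache.
        for i in range(start, len(parts)):
            self.directory_searches += 1
            location = TREE[location][parts[i]]
            self.prefix_cache["/" + "/".join(parts[: i + 1])] = location
        return location


r = Resolver()
r.resolve("/home/lauer/foo.c")
print(r.directory_searches)   # searches needed with an empty cache
r.resolve("/home/lauer/bar.c")
print(r.directory_searches)   # only one more search: /home/lauer was cached
```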


Directory Operations
Create:
  Make a new directory

Add, Delete entry:
  Invoked by file create & destroy, directory create & destroy

Find, List:
  Search or enumerate directory entries

Rename:
  Change name of an entry without changing anything else about it

Link, Unlink:
  Add or remove entry pointing to another entry elsewhere
  Introduces possibility of loops in directory graph

Destroy:
  Removes directory; must be empty
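
Several of these operations are available directly through Python's `os` module. The sketch below uses made-up names (`project`, `a.txt`, `b.txt`) and shows, in particular, that destroying a directory is refused while it still has entries.

```python
import os
import tempfile


def directory_operations_demo():
    """Create a directory, add, rename, list and delete an entry, then destroy it."""
    with tempfile.TemporaryDirectory() as base:
        d = os.path.join(base, "project")
        os.mkdir(d)                                       # Create
        open(os.path.join(d, "a.txt"), "w").close()       # Add entry (file create)
        os.rename(os.path.join(d, "a.txt"),
                  os.path.join(d, "b.txt"))               # Rename
        listing = sorted(os.listdir(d))                   # List

        try:
            os.rmdir(d)                                   # Destroy: not empty yet
            refused = False
        except OSError:
            refused = True

        os.remove(os.path.join(d, "b.txt"))               # Delete entry
        os.rmdir(d)                                       # Destroy: now succeeds
        return listing, refused, os.path.exists(d)


print(directory_operations_demo())
```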

Directories (continued)
Orphan: a file not named in any directory
  Cannot be opened by any application (or even OS)
  May not even have name!

Tools
  FSCK: check & repair file system, find orphans
  Delete_on_close attribute (in metadata)

Special directory entry: ".." (parent in hierarchy)
  Essential for maintaining integrity of directory system
  Useful for relative naming

Directories Summary
Fundamental mechanism for interpreting file names in an operating system
Widely used by system, applications, and users
