Unit 02
Additionally, file systems provide tools which allow the manipulation of files,
provide a logical organization as well as provide services which map the
logical organization of files to physical devices.
From the beginner's perspective, the Unix file system is essentially composed
of files and directories. Directories are special files that may contain other
files.
The Unix file system has a hierarchical (or tree-like) structure with its highest
level directory called root (denoted by /, pronounced slash). Immediately
below the root level directory are several subdirectories, most of which contain
system files. Below this can exist system files, application files, and/or user
data files. Similar to the parent-child relationship between processes, all
files on a Unix system are related to one another: every file has a parent
directory. Thus, all files except one share a common parental link; the
top-most file (i.e. /) is the exception, since it has no parent.
Below is a diagram (slice) of a "typical" Unix file system. As you can see, the
top-most directory is / (slash), with the directories directly beneath being
system directories. Note that as Unix implementations and vendors vary, so
will this file system hierarchy. However, the organization of most file systems
is similar.
While this diagram is not all-inclusive, the following system files (i.e.
directories) are present in most Unix filesystems:
bin - short for binaries, this is the directory where many commonly used
executable commands reside
dev - contains device specific files
etc - contains system configuration files
home - contains user directories and files
lib - contains essential library files (many more are usually found under
/usr/lib)
mnt - a mount point for temporarily mounted file systems, such as
removable drives
proc - contains files related to system processes
root - the root user's home directory (note this is different from /)
sbin - system administration binaries reside here. If there is no sbin
directory on your system, these files most likely reside in etc
tmp - storage for temporary files which are periodically removed from
the filesystem
usr - contains additional executable commands (e.g. in /usr/bin), as well
as libraries and documentation
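To check which of these directories actually exist on your own system, you can let the shell test each name. The following is a minimal sketch assuming a POSIX shell (sh); the helper name check_std_dirs is made up for illustration, and the output will differ from system to system.

```shell
#!/bin/sh
# Report which of the standard top-level directories listed above exist.
# check_std_dirs is a hypothetical helper name; results vary by system.
check_std_dirs() {
    for d in bin dev etc home lib mnt proc root sbin tmp usr; do
        # -d is true for a directory, or for a symbolic link to one
        if [ -d "/$d" ]; then
            echo "/$d present"
        else
            echo "/$d missing"
        fi
    done
}

check_std_dirs
```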
File Types
From a user perspective in a Unix system, everything is treated as a file,
even devices such as printers and disk drives.
How can this be, you ask? Since all data is essentially a stream of bytes, each
device can be viewed logically as a file.
All files in the Unix file system can be loosely categorized into 3 types,
specifically:
1. ordinary files
2. directory files
3. device files [1]
While the latter two may not intuitively seem like files, they are considered
"special" files.
The first type of file listed above is an ordinary file, that is, a file with no
"special-ness". Ordinary files consist of streams of data (bytes) stored
on some physical device. Examples of ordinary files include simple text files,
application data files, files containing high-level source code, executable text
files (scripts), and binary image files. Note that, unlike in some other operating
systems, files do not have to be binary images to be executable (more on this to come).
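As a small illustration of the last point, the sketch below writes an ordinary text file, grants it execute permission and runs it. It assumes a POSIX shell and the mktemp utility (present on Linux and the BSDs, though not required by POSIX); the helper run_text_script and the file name hello.sh are hypothetical, and the .sh ending is a convention rather than a requirement. On systems whose temporary directory is mounted without execute permission, the script will not run.

```shell
#!/bin/sh
# Sketch: a plain text file becomes executable once it has execute permission.
# run_text_script and hello.sh are hypothetical names.
run_text_script() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    # Write an ordinary text file whose first line names its interpreter.
    printf '#!/bin/sh\necho "hello from a text file"\n' > hello.sh
    chmod 755 hello.sh     # rwxr-xr-x: owner, group and others may execute
    ./hello.sh             # run it like any other command
    cd / && rm -rf "$dir"
)

run_text_script
```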
The second type of file listed above is a special file called a directory (please
don't call it a folder?). Directory files act as a container for other files, of any
category. Thus we can have a directory file contained within a directory file
(this is commonly referred to as a subdirectory). Directory files don't contain
data in the user sense of data; they merely contain references to the files
contained within them.
It is worth noting at this point that any "file" that has files directly
below (contained within) it in the hierarchy must be a directory, while
any "file" that does not have files below it in the hierarchy can be either an
ordinary file or an empty directory.
The third category of file mentioned above is a device file. This is another
special file that is used to describe a physical device, such as a printer or a
portable drive. This file contains no data whatsoever; it merely maps any data
sent to it to the physical device it describes.
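The shell's test command has an operator for each of these categories, so you can ask what kind of file a given file specification names. The sketch below assumes a POSIX shell; file_kind is a hypothetical helper, and it also reports the other special file types mentioned in the footnote. Symbolic links are checked first because -d and -f follow links.

```shell
#!/bin/sh
# Report which category a file specification falls into, using test operators.
# file_kind is a hypothetical helper name.
file_kind() {
    if   [ -L "$1" ]; then echo "symbolic link"     # checked first: -f/-d follow links
    elif [ -d "$1" ]; then echo "directory"
    elif [ -f "$1" ]; then echo "ordinary file"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -p "$1" ]; then echo "named pipe"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -e "$1" ]; then echo "other"
    else echo "does not exist"
    fi
}

file_kind /            # directory
file_kind /etc/passwd  # ordinary file on most systems
file_kind /dev/null    # character device
```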
[1] Special file types typically include character device files and block
device files, as well as Unix domain sockets, named pipes and symbolic links
(strictly speaking, these last three are special files but not device files).
However, not all of these file types may be present across various Unix
implementations.
Unix file names are not required to end in extensions (such as .txt
or .exe), as they are in some other operating systems. However, certain applications
with which you interact may require extensions, such as Adobe's Acrobat
Reader (.pdf) or a web browser (.html). And, as always, character case matters.
Thus the following are all valid Unix file names (note these may be any file
type):
While file names are certainly important, there is another important related
concept, and that is the concept of a file specification [1] (or file spec for short).
A file spec may simply consist of a file name, or it might also include more
information about a file, such as where it resides in the overall file system.
There are two techniques for describing file
specifications: absolute and relative.
With absolute file specifications, the file specification always begins at the
root directory, making it complete and unambiguous. Absolute file specs are sometimes
referred to as fully qualified path names [2]. Thus, absolute file specs always
begin with /. For example, the following are all absolute file specs:
/etc/passwd
/bin
/usr/bin
/home/mthomas/bin
/home/mthomas/class_stuff/foo
Note that the first slash indicates the top of the tree (root), but each succeeding
slash in the file spec acts merely as a separator. Also note that the files named bin
in the file specifications /bin, /usr/bin, and /home/mthomas/bin are different
bin files, due to their differing locations in the file system hierarchy.
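Because an absolute file spec is recognized purely by its leading slash, telling the two forms apart is a simple pattern match. A minimal sketch in POSIX sh, where is_absolute is a hypothetical helper name:

```shell
#!/bin/sh
# An absolute file specification is one whose first character is "/".
# is_absolute is a hypothetical helper name.
is_absolute() {
    case $1 in
        /*) echo absolute ;;
        *)  echo relative ;;
    esac
}

for spec in /etc/passwd /home/mthomas/bin bin ../bin class_stuff/foo; do
    printf '%-22s %s\n' "$spec" "$(is_absolute "$spec")"
done
```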
With relative file specifications, the file specification is always relative to the
user's current position or location in the file system. Thus, the beginning
(leftmost part) of a relative file spec describes one of the following:
an ordinary file, which implies the file is contained within the current
directory
a directory, which implies a child of the current directory (i.e. one level
down)
a reference to the parent of the current directory, written .. (i.e. one level up)
What this means, then, is that a relative file specification that is valid from one
file system position is generally not valid from another. Beginning
users often ask "How do I know where I am?" The command to use to find this
is the pwd (print working directory) command, which will indicate the user's
current position (in absolute form) in the file system.
To identify where we are, we type pwd and the system returns the following:
$ pwd [Enter]
/home/mthomas/class_stuff
$ pwd [Enter]
/home/mthomas
$ pwd [Enter]
/home/mthomas/class_stuff
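To see that a relative file spec depends on where you are, the sketch below builds a small copy of the example tree (home/mthomas/class_stuff/foo and home/mthomas/bin) inside a temporary directory, then tests the same relative spec from two working directories, printing pwd each time. It assumes a POSIX shell plus mktemp; try_relative is a hypothetical name, and the printed paths will include the temporary directory's name.

```shell
#!/bin/sh
# Sketch: one relative file spec, tried from two different working directories.
# The tree is a hypothetical copy of the example layout under a temp directory.
try_relative() (
    root=$(mktemp -d) || exit 1
    mkdir -p "$root/home/mthomas/class_stuff" "$root/home/mthomas/bin"
    : > "$root/home/mthomas/class_stuff/foo"   # an empty ordinary file

    cd "$root/home/mthomas" || exit 1
    echo "pwd: $(pwd)"                         # always printed in absolute form
    [ -e class_stuff/foo ] && echo "class_stuff/foo: valid here"

    cd bin || exit 1                           # relative move: one level down
    echo "pwd: $(pwd)"
    [ -e class_stuff/foo ] || echo "class_stuff/foo: not valid here"

    cd / && rm -rf "$root"
)

try_relative
```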
If we wish to change directories to the /home/mthomas/bin
directory, we can type
or
or
Novice users sometimes ask which file specification method they should use,
or which one is better. The simple and open-ended answer is "it depends."
That is, it depends upon where one currently is in the file system hierarchy
and what one is trying to do. It also depends upon how long the file
specification is, how easy it is to type (including any special characters), and
how familiar one is with the current location in the file system hierarchy.
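Either method can reach the same place. This sketch (POSIX sh plus mktemp, with a hypothetical copy of the example tree under a temporary directory standing in for /) changes to home/mthomas/bin once with an absolute spec and once with the relative spec ../bin from class_stuff, then compares what pwd reports. same_place is an illustrative name only.

```shell
#!/bin/sh
# Sketch: an absolute and a relative spec naming the same directory.
# $root stands in for the real "/", so the example tree is hypothetical.
same_place() (
    t=$(mktemp -d) || exit 1
    root=$(cd "$t" && pwd) || exit 1          # normalized form of the temp path
    mkdir -p "$root/home/mthomas/class_stuff" "$root/home/mthomas/bin"
    cd "$root/home/mthomas/class_stuff" || exit 1

    a=$(cd "$root/home/mthomas/bin" && pwd)   # absolute form
    r=$(cd ../bin && pwd)                     # relative form: up one, down into bin

    cd / && rm -rf "$root"
    # Strip the temporary prefix so the result reads like the example path.
    [ "$a" = "$r" ] && echo "both reach ${a#"$root"}"
)

same_place
```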
[1] Some might choose to call this a path name or path specifier, but I prefer to
call this a file specification since all the individual components are files.
[2] Fully qualified path names, also referred to as FQPN, are frequently used by
system administrators who are used to working with DNS. More information
can be found by searching on fully qualified path names in your favorite
search engine.
[3] A user can also move to their "home" directory using the command cd ~
(tilde), where the ~ character represents a user's "home" directory. If used in
this simple way, cd and cd ~ are equivalent. However, much more
sophisticated behavior can be achieved using the ~ character, as described
in your shell's documentation (look for "tilde expansion").
From the above output, we can observe 7 attribute fields listed for each file.
From right to left, the attribute fields are:
file name: the name associated with the file (recall, this can be any type
of file)
modification date: the date the file was last modified, i.e. a "timestamp".
If the modification time is more than six months in the past (or is in the
future), the year is displayed in place of the time of day.
size: the size of the file in bytes (i.e. characters, for plain ASCII text) [2]
group: the group associated with the file
owner: the owner of the file
number of links: the number of (hard) links to this file
permission modes: the permissions assigned to the file for the owner,
the group and all others.
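The first five fields can be picked out of an ls -ld line with the shell's own word splitting, which also makes it easy to watch the link count change when a second hard link is made with ln. This is a rough sketch assuming a POSIX shell and owner/group names without spaces (parsing ls output is fragile in general); ls_fields and links_demo are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: split an ls -ld line into fields and watch the link count change.
# ls_fields and links_demo are hypothetical helper names.
ls_fields() {
    # Deliberately unquoted: word splitting turns the line into $1, $2, ...
    set -- $(ls -ld "$1")
    echo "permissions=$1 links=$2 owner=$3 group=$4 size=$5"
}

links_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf 'hello\n' > notes     # 6 bytes: h e l l o and a newline
    ls_fields notes              # links=1
    ln notes notes.link          # a second hard link to the same file
    ls_fields notes              # links=2
    cd / && rm -rf "$dir"
)

links_demo
```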
[1] Note this listing is for example purposes and not necessarily an accurate or
complete representation of the root (/) directory.
[2] This is the number of bytes in the file, not necessarily the space it occupies
on disk, since files are written to disk in fixed-size blocks (commonly 1024 or
4096 bytes, depending on the file system). Note also that if the file is a
directory, this is the size of the structure used to store the directory's entries.
Understanding and Modifying File Permissions
If you look closely at the permission field above, you will notice the permission
field for each file consists of 10 characters as described by the diagram below:
The first (leftmost) character indicates the "type" of the file. Another way to
describe this is whether the file has any special attributes associated with it. If
it is an ordinary file (i.e. no special attributes), it will have a dash in this first
position. If it is a directory file, it will have the letter d in this position. Or, if it is
a symbolic link to another file, it will have the letter l (ell) in this first position. You can
see examples of an ordinary file and a directory in the ls -l output above.
Other special attributes exist but do not merit discussion here.
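You can see all three type characters by creating one of each in a scratch directory and looking at the first column of ls -ld. A sketch assuming a POSIX shell with mktemp; type_char and demo_types are made-up names. The -d flag makes ls describe a directory itself rather than its contents, and ls -l does not follow a symbolic link named on the command line, so the link shows as l.

```shell
#!/bin/sh
# Sketch: the first character of the ls -l permission field shows the type.
# type_char and demo_types are hypothetical helper names.
type_char() {
    ls -ld "$1" | cut -c1
}

demo_types() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    : > plain            # ordinary file
    mkdir sub            # directory
    ln -s plain link     # symbolic link to the ordinary file
    for f in plain sub link; do
        echo "$f $(type_char "$f")"
    done
    cd / && rm -rf "$dir"
)

demo_types   # plain -, sub d, link l
```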
To illustrate with another example, if the owner of a file wanted to set read and
execute permission for the user (owner), read-only permission for the group, and no
permissions at all for others, the command to set this would be
$ chmod 540 file_spec [Enter]
$ umask [Enter]
022
$ umask -S [Enter]
u=rwx,g=rx,o=rx
As with chmod, umask uses a single octal digit for each of the owner, group
and other fields. If a 3-digit mode value is given (without the -S), the mode
value specified is "removed from" 777 (i.e. drwxrwxrwx) for newly
created directories and "removed from" 666 (i.e. -rw-rw-rw-) for newly
created files. Note this is not subtraction [2]; the octal values determine what
permissions are disabled, as described in the table below:
Octal umask digit   Permissions disabled during file creation
0                   none; all original permissions remain
1                   execute permission is disabled
2                   write permission is disabled
3                   write and execute permissions are disabled
4                   read permission is disabled
5                   read and execute permissions are disabled
6                   read and write permissions are disabled
7                   all permissions are disabled
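The table can also be checked with shell arithmetic, which applies the AND NOT rule described in the footnote on subtraction (base mode AND the complement of the umask). This is a minimal sketch assuming a POSIX shell, whose arithmetic treats a leading 0 as octal; effective_mode is a hypothetical helper. The last call shows why this is not subtraction: a umask of 111 leaves 666 unchanged, since ordinary files start with no execute bits to remove.

```shell
#!/bin/sh
# Sketch: mode a new file or directory receives, computed as base & ~umask.
# effective_mode is a hypothetical helper; arguments are 3-digit octal strings.
effective_mode() {
    # Prefixing 0 makes shell arithmetic read the digits as octal.
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

effective_mode 666 022   # ordinary file -> 644 (rw-r--r--)
effective_mode 777 022   # directory     -> 755 (rwxr-xr-x)
effective_mode 666 077   # ordinary file -> 600 (rw-------)
effective_mode 777 077   # directory     -> 700 (rwx------)
effective_mode 666 111   # ordinary file -> 666, not 555
```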
Recall that a chmod value of 4 for an owner, group or other grants read access.
As mentioned, umask is somewhat the opposite of chmod: a umask digit of
4 disables read access for that owner, group or other. The most common
setting for the umask mode value is 022, set as follows:
$ umask 022 [Enter]
which leaves all permissions unchanged (from 777 for directories or 666 for
ordinary files) for the owner, disables write permission for the group, as well
as disables write permission for other (refer to table above). This results in
drwxr-xr-x for directories and -rw-r--r-- for ordinary files. Looking at another
example:
$ umask 077 [Enter]
leaves all permissions unchanged for the owner, disables all permissions for
the group, as well as disables all permissions for other, resulting in drwx------
for directories and -rw------- for ordinary files.
Note there is no file specification present in the umask command since this
command sets the default mode for all newly created files (and directories).
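To watch umask act on real files, the sketch below sets it inside a subshell (so your own setting is untouched), creates one file and one directory, and prints their permission fields. It assumes a POSIX shell and mktemp, and that the temporary directory carries no default ACLs; new_modes is a hypothetical name.

```shell
#!/bin/sh
# Sketch: observe umask on newly created files. The function body runs in a
# subshell, so the caller's umask is not changed. new_modes is hypothetical.
new_modes() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    umask "$1"
    : > newfile          # the shell requests mode 666 when creating a file
    mkdir newdir         # mkdir requests mode 777
    ls -ld newfile | cut -c1-10
    ls -ld newdir  | cut -c1-10
    cd / && rm -rf "$dir"
)

new_modes 022   # -rw-r--r--  then  drwxr-xr-x
new_modes 077   # -rw-------  then  drwx------
```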
[1] There is another way to use the chmod command, called symbolic mode,
which uses symbolic characters instead of numbers.
[2] One can use the table to see which permissions a given umask value disables,
but one can also use binary logic to calculate this: each default octal digit is
ANDed with the NOT of the corresponding umask octal digit. For example, if the
default digit is 7 (binary 111) and your umask digit is 2 (binary 010), then
111 AND NOT 010 = 111 AND 101 = 101 = 5 (r-x), i.e. write permission
disabled (see table above). Similarly, if the starting default digit is 6 (binary
110) and your umask digit is 2 (binary 010), then 110 AND NOT 010 = 110 AND
101 = 100 = 4 (r--), again write permission disabled. The latter example is
documented in the man pages as 666 & ~022 = 644; i.e., rw-r--r--.
mkdir file_spec
where file_spec is any valid file specification. Keep in mind that to create
ordinary or directory files, the user must have write permission on the directory
in which the new file will be created.
Once files are created and saved, users typically find the need to make copies
of various files. Files are copied in Unix using the cp command as follows:
cp source_file_spec(s) destination_file_spec(s)
cp source_file_spec(s) .
The dot is a placeholder for the second argument and results in all source files
being copied to the current working directory with the same file names. Note
that without the dot there is only one argument, which cp will reject with a
usage error (missing destination).
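The sketch below copies two files into the current working directory with . as the destination, then shows that leaving the dot off makes cp fail. It assumes a POSIX shell and mktemp; copy_here and the file names one and two are hypothetical.

```shell
#!/bin/sh
# Sketch: "." as the destination copies into the current working directory,
# keeping the same names. copy_here, one and two are hypothetical names.
copy_here() (
    top=$(mktemp -d) || exit 1
    mkdir "$top/src" "$top/work"
    printf 'first\n'  > "$top/src/one"
    printf 'second\n' > "$top/src/two"
    cd "$top/work" || exit 1

    cp "$top/src/one" "$top/src/two" .   # "." = current working directory
    ls                                   # one, two

    # With the dot left off, cp sees only one argument and refuses to run.
    cp "$top/src/one" 2>/dev/null || echo "error: no destination given"

    cd / && rm -rf "$top"
)

copy_here
```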
Moving files is very similar to copying files. The difference is that with copy, the
source files remain intact, while with move, the source files no longer exist in
their original location. The command to move files is mv and the syntax is
analogous to copy:
mv source_file_spec(s) destination_file_spec(s)
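A minimal sketch of the difference from cp: after mv, the file is found only at its new location. It assumes a POSIX shell and mktemp; move_demo, report and archive are illustrative names.

```shell
#!/bin/sh
# Sketch: after mv, the file exists only at the destination.
# move_demo, report and archive are hypothetical names.
move_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    mkdir archive
    printf 'draft\n' > report
    mv report archive/            # move into the archive directory
    [ -e report ]         || echo "report: gone from original location"
    [ -e archive/report ] && echo "archive/report: present"
    cd / && rm -rf "$dir"
)

move_demo
```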
Some examples using the cp command, based upon the diagram above and given
the following working directory:
$ pwd [Enter]
/home/stu1
Users can also remove files (including directories) from the file system. The
command to delete ordinary files from the file system is rm and its syntax is
rm file_spec(s)
Note: if a file has no write permission and the standard input comes from a
terminal, most newer versions of rm will prompt the user for whether to
remove the file (assuming no -f flag). If the response is yes (y or Y),
the file will be removed even though its write permission is disabled. This
behavior can be circumvented by disabling write permission on the parent
directory, since removing a file modifies the directory rather than the file itself.
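Removing a file changes the directory that lists it rather than the file itself, which is why a file with no write permission can still be deleted from a writable directory. In this sketch (POSIX shell plus mktemp; remove_readonly is a hypothetical helper) the -f option skips the confirmation prompt.

```shell
#!/bin/sh
# Sketch: a file with no write permission can still be removed when its
# directory is writable. remove_readonly is a hypothetical helper name.
remove_readonly() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    printf 'keep me?\n' > locked
    chmod 444 locked              # r--r--r--: nobody has write permission
    rm -f locked                  # -f: no prompt
    [ -e locked ] || echo "locked: removed"
    cd / && rm -rf "$dir"
)

remove_readonly
```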
In similar fashion, users can remove directories from the file system using
the rmdir command as follows:
rmdir file_spec(s)
Keep in mind that rmdir only removes empty directories, and that both of these
commands require valid file specifications as well as sufficient permissions.
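Here is that restriction in action: rmdir succeeds on an empty directory and refuses one that still contains a file. The sketch assumes a POSIX shell and mktemp; rmdir_demo and the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: rmdir succeeds only on an empty directory.
# rmdir_demo, empty and full are hypothetical names.
rmdir_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    mkdir empty full
    : > full/file                     # full now contains one file
    rmdir empty && echo "empty: removed"
    rmdir full 2>/dev/null || echo "full: not removed (not empty)"
    cd / && rm -rf "$dir"
)

rmdir_demo
```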
$ cp /home/mthomas/class_stuff/* .
would copy all files in the class_stuff directory to your current location in the
file system tree, keeping all names the same. Note the wildcard character
can only be used for the source specification. Similarly, the command:
$ chmod 755 *
would change the file permission modes to -rwxr-xr-x for all files in your
current directory. You can also use the wildcard character to selectively match
the names of files. For example, a* would match (or select) all files that start
with the letter a. Looking at an example:
$ ls net*
$ chmod 755 net*
The first command would list all files starting with the three characters "net",
while the second would change the mode (as mentioned above) for all files
starting with these three characters. Note that when using wildcards as part of a
file name, no spaces can appear between the literal characters and the wildcard
character.
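Because the shell expands net* into a list of matching names before chmod ever runs, only those files are affected. The sketch below (POSIX shell plus mktemp; glob_demo and the file names are hypothetical) prints the expansion, applies chmod 755 net*, and lists each file's mode.

```shell
#!/bin/sh
# Sketch: net* is expanded by the shell, so chmod only sees matching names.
# glob_demo, netlog, network and notes are hypothetical names.
glob_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    umask 022                        # new files start as -rw-r--r--
    : > netlog; : > network; : > notes
    echo net*                        # the names chmod will receive
    chmod 755 net*
    for f in *; do
        echo "$f $(ls -ld "$f" | cut -c1-10)"
    done
    cd / && rm -rf "$dir"
)

glob_demo
```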
One should be careful using the wildcard character though, as this can have
dangerous results. Can you tell what the following command does?
$ rm *
Another topic which can aid in working within the file system is using the copy
(cp) command when working with directories. If one wishes to copy all files
within a directory to another location, one could simply use the command:
$ cp * destination_file_spec
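One thing to expect from this command: without a recursive option, cp copies the ordinary files matched by * but skips any subdirectories, typically with an error message and a nonzero exit status. The sketch assumes a POSIX shell and mktemp; copy_all_demo and the names a, b and subdir are hypothetical.

```shell
#!/bin/sh
# Sketch: "cp * dest" copies ordinary files but skips directories.
# copy_all_demo, a, b and subdir are hypothetical names.
copy_all_demo() (
    top=$(mktemp -d) || exit 1
    mkdir "$top/src" "$top/dest" "$top/src/subdir"
    : > "$top/src/a"
    : > "$top/src/b"
    cd "$top/src" || exit 1
    # * expands to: a b subdir; cp complains about subdir and returns nonzero
    cp * ../dest 2>/dev/null || echo "cp reported an error; subdir was skipped"
    ls ../dest                       # a, b
    cd / && rm -rf "$top"
)

copy_all_demo
```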