
Comp 3000 - Exam Notes Before Midterm 1

Uploaded by

Lauda Vargas

COMP 3000 - EXAM NOTES

Important things to know / Definitions


Resource Management
CPU + RAM + STORAGE
Benefits of abstraction
Monolithic vs Microkernels
Kernel space, vs user space
Processes
Life cycle & Symbols (ASM)
Address Space Layout
Direct vs Indirect vs Limited Direct
Context Switching
Thrashing
When the overhead of switching process contexts is larger than the time slice

Fork
Unique PID
Libcall in C
Makes a copy of the parent

Exec

Replaces current image with another.


Syscall in OS

External Fragmentation
When free memory is not contiguous
EXAMPLE

Basically, there is free space in RAM outside any process's allocation


In general, it cannot be used because each free piece is too small
Internal Fragmentation
When a process uses less memory than it is allocated:

There is space inside a process's allocated memory that remains unused


Paging causes this to basically always be the case for the last block.

Paging
Removes the need for contiguous allocation in physical memory
Used to address EXTERNAL FRAGMENTATION
Pages need to be the same size so that there is no external fragmentation
Memory is managed and allocated in fixed-size pages, tracked in page tables

System Calls
Invoked when an OS service is needed.
I/O is done by the OS
Expensive

Library Calls
Implemented in library files (.so).
Used to increase portability and abstraction
You can dynamically link them so that they don't live in the program binary
Will often make sys calls.
Terminals
Virtual Terminals

Mimics a physical terminal

Pseudo Terminals

A type of virtual terminal


Support terminal Emulators

Terminal Emulator

Apps that mimic a real terminal


e.g. Terminal on macOS

Getty

Program to initialize a virtual terminal, then calls login

Password Files
/etc/passwd
Account info
- username
- userid
- group id
- full name
- home directory
- shell
Password not stored in here

/etc/shadow
Used to store hashes of passwords, salts, so that a user can login

Login Process
Init
getty
login
bash
programs
User and Group IDs
UID GID
User and group id represent the user and group of the user
For example
Root UID: 0

EUID EGID
By default it is the user id.
When a user calls the setuid() syscall it doesn't actually change the uid
It changes the EUID
- Effective user id.
EUID and EGID are used to check permission bits.
Only changed per process
And child processes
Parent and sibling processes don't inherit the new EUID and EGID
sudo changes the EUID and EGID

Permission Bits
Split up into 3 octal digits (triples of bits):
The first is OWNER permissions
The second is Group permissions (users in the file's group)
Other permissions (other users that are not the owner or in the file's group)
Can be represented by rwx or by an octal digit (a triple of bits: 000)
There is also an s or S that can be used in place of the x in the rwx.
These are the setuid and setgid bits. Any time the program runs, it runs with
the EUID of the owner or the EGID of the group, respectively.
A lowercase s means the execute bit is also set for the owner/group, and whoever
runs the program gets the euid/egid of the owner/group.
An uppercase S means setuid/setgid is set but the execute bit is not.
Bits:
mapped to rwx with a 1 or 0.
For the setuid and setgid parts, there is a 4th triple added onto the front that
represents whether setuid and setgid are set
1st bit is setuid
2nd bit is setgid
3rd is sticky (we don't need to know)
Octal
Taken from bits (each triple of bits is one octal digit)

Examples

letters        bits               octal   description

rwx rwx rwx    000 111 111 111    0777    All users have read, write, execute

rws r-S --x    110 111 100 001    6741    All users that run the program have their EUID (in the
                                          process) set to the owner's, and the owner can read,
                                          write, execute

                                          All users have their EGID set to the file's group, but
                                          the people in the group cannot execute it (seems
                                          useless, but can be useful in very specific scenarios)

                                          Others can execute (that means that the euid and
                                          egid can be set)

r-x r-x ---    000 101 101 000    0550    Owner and people in the file's group can read and
                                          execute
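
Quick check of the letters/octal mapping (a sketch, not from the lecture). It assumes Python's standard stat module; describe() and has_setuid() are made-up helper names. The letters rws r-S --x work out to octal 6741.

```python
import stat

def describe(mode_octal):
    """Return the ls-style letters for a permission value such as 0o6741."""
    # stat.filemode() also wants the file-type bits; S_IFREG marks a regular file.
    letters = stat.filemode(stat.S_IFREG | mode_octal)
    return letters[1:]  # drop the leading file-type character

def has_setuid(mode_octal):
    """True when the setuid bit (first bit of the 4th triple) is set."""
    return bool(mode_octal & stat.S_ISUID)

if __name__ == "__main__":
    for mode in (0o0777, 0o6741, 0o0550):
        # octal, the 12 bits, and the letters side by side
        print(f"{mode:04o}  {mode:012b}  {describe(mode)}")
```

The uppercase S in the group triple appears because setgid is set while the group execute bit is not.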

Zombie process
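Parent must use wait() on the child
If it does not, the exited child remains a zombie.
A zombie is "reaped" (removed from the process table) when its parent calls wait().
If the parent never waits, the only other way is to kill the parent, so that init
adopts the zombie and reaps it.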

File Descriptors (fd)


just an int
Refers to a data structure in the kernel
stdin, stdout, and stderr are special file descriptors
represented by 0, 1, 2
Represents an open file
always used for operations after the file is opened or created

Shell pipeline (|)


Takes the stdout of one program and feeds it to the stdin of the next
Dentry
Filename -> Inode Number

Special File Systems


Reside on RAM, not the disk
Can contain regular files or special files.

Special Files
Files that do special things
Device Drivers
ETC.

Drivers
Kernel code, exposed to user space as special files that live in the /dev file system

Driver Hierarchy

Device/Special File
Device Files have:
A major number
Represents type of device
Kernel does this
Minor number
Distinguishes between devices in type
Driver does this
Character/Block Devices
Special files that represent a physical device
Link driver in kernel space to user space

Physical vs Logical Size


Internal fragmentation and external fragmentation cause Phys > Log
Sparse files cause Log > Phys

What ifs?
Interruption while updating on disk structures

Crash inconsistency
Meta data is gone, super bad
Inodes are good with missing/inconsistent blocks
Random data on the block
Good blocks but missing inodes
Can't find data because
File becomes orphaned
We can find them again
Optimal outcome if it has to happen
Superblock is corrupted
Do a repair
Fetch backup copies

First-Half LECTURES
Intro
What is an OS?
OS = kernel + UI + language runtime

Role of an OS?
Resource management
Hardware
Software
Abstraction
What resources does it manage?
CPU

Time shares
Shares CPU between tasks
Illusion:
Each task owns the whole CPU
Memory
Memory allocation and placements
Illusion:
Each task owns the whole address space
Disk
- Simplified view of data, or file systems
- Unified storage management despite heterogeneous devices
This all helps with hardware support

Abstraction

Means to achieve

Simplicity

No need to consider other tasks

Security

Containing errors
Seg Fault
BSOD
Kernel Panic

Portability

One thing working on many devices


Lib calls > sys calls (lib calls are the more portable interface)

Create an Illusion

A view or interface
Present new semantics

Hide low level details

Comparison to virtualization

An abstraction has a different nature from what it hides


A virtualization has the same nature as what it copies

Types of Kernels

Monolithic

Used by:
Desktops
Mobile
Server

Almost everything is in the kernel space

Microkernel

Used by
- Real-Time operating systems
- IoT devices
Everything above the top red line is in the user space
Supervisor vs User mode
Kernel Space

Privileged
Runs on Ring 0
Can do anything
OS kernel
Device Driver

User Space

Runs at Ring 3
No I/O Access
needs to request from kernel
File reads
Network
Peripherals
Contained by the privileged (in the process)

Processes
Definition

Virtualized version of CPU


Virtualized memory (address space)

What makes a process

PID
Memory Image (content)
CPU Context
I/O Resources

Lifecycle of a program

What is a user to the OS?

Label (UID)
Purposes
Accountability
Security
Root is a user, a special user
UID = 0
Not ring 0

Files
What is a file?

Ways to organize data


A linear array of bytes to which we can read and write
Resides on persistent storage devices

Structure

Is up to the application creating it, not the OS

File Systems

Ways of organizing files


ext4, NTFS, FAT32
VFS (Virtual file system)

Function Calls, Lib Calls, Sys Calls


Within a process:
Direct function invocation

Function calls

Loaded into initial memory image

Lib Calls

Dynamic lib calls are made with the help of the OS (dynamically loading new code)

Sys Calls

Way for processes to talk to kernel and get OS services.

Note

Function and lib calls can't do anything with the outside on their own; that takes a sys call.

POSIX

syscalls not helpful for end users


Portability
OS-LEVEL
Lib-level
Standards for OS interfaces (OS-level)
Windows doesn't really implement POSIX
Both Lib and Syscalls are defined in POSIX

Abstraction
Key concepts
Made possible by
Omnipotence - (Higher Privilege)
Omniscience - (Sees everything)
OS must have both
Abstraction

New semantics

Virtualization

making a copy

Direct Exec

Task directly on CPU


Lose control (SW vs. SW)

Indirect Exec:

Interpreted (VM like JVM)


Not always possible due to
Too many layers
Too slow

Limited Direct Exec

Allow direct execution for certain things


This is what we do now

Virtualizing the CPU


Harder than RAM

Purposes

Security
Simplicity
Maximizing utilization

Expected Illusion

Processes operate without seeing or thinking about other processes

Execution Context

Processor is like a state machine


Program is input states
X86 STATES

Counter E/R IP (Instruction pointer for next instruction)


Status/flags E/R Flags
General purpose E/R AX, BX, CX
Other registers
Memory info
Much more!

IP - 16 bits
EIP 32 bits
RIP - 64 bits

Same register, just compatibility

GDB

List source code


Break points
Info registers - examine registers at break point
si - di source destination
bp - sp - stack

Process abstraction
Running a program
Loaded into mem as binary

Hidden

Other processes using the CPU


Privileged facilities

New Semantics

Processes with PID


APIS
Syscalls and Signals

Schedules
Mechanisms

How it is done
Policies

What is done (which choice gets made)

Principle 1: Keep the CPU Busy

Mechanism
Execution Context
Policy
Parameters of exec context
Maxing CPU utilization
How to get IO?
Ask outside resources
Pending IO it can finish
Containment

Principle 2: Finish tasks as soon as possible

Turn around time -> Between Arrival and completion


If you have multiple cores, take advantage
May hurt fairness
SHORTEST JOB FIRST -> SJF
Response time -> Between arrival and first execution
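
Worked example for turnaround time (a sketch with assumed numbers, not from the lecture): three jobs arrive together with burst times 100, 10, 10. average_turnaround() is a made-up helper that runs the jobs back to back in the given order.

```python
def average_turnaround(burst_times):
    """Average turnaround when all jobs arrive at t=0 and run in list order."""
    clock = 0
    total = 0
    for burst in burst_times:
        clock += burst      # this job completes at the current clock
        total += clock      # turnaround = completion - arrival (arrival is 0)
    return total / len(burst_times)

jobs = [100, 10, 10]
fifo = average_turnaround(jobs)          # (100 + 110 + 120) / 3
sjf = average_turnaround(sorted(jobs))   # (10 + 20 + 120) / 3

if __name__ == "__main__":
    print("FIFO:", fifo, "SJF:", sjf)
```

Running the short jobs first drops the average from 110 to 50, but the long job waits behind everyone, which is the fairness cost noted above.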

Principle 3: Be Fair to all Processes

Be fair

Round Robin

Mechanism to schedule tasks

Time Slice

A section of time to execute some instructions


Each process gets a time slice or many
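
A small round-robin simulation (a sketch, not the lecture's code). It assumes all jobs arrive at t=0 and that context switches cost nothing; round_robin() is a made-up name.

```python
from collections import deque

def round_robin(bursts, time_slice):
    """Return (first_run_times, completion_times) for jobs arriving at t=0."""
    remaining = list(bursts)
    first_run = [None] * len(bursts)
    completion = [None] * len(bursts)
    queue = deque(range(len(bursts)))
    clock = 0
    while queue:
        job = queue.popleft()
        if first_run[job] is None:
            first_run[job] = clock          # response time is measured here
        ran = min(time_slice, remaining[job])
        clock += ran
        remaining[job] -= ran
        if remaining[job] > 0:
            queue.append(job)               # not done: back of the line
        else:
            completion[job] = clock         # turnaround is measured here
    return first_run, completion

if __name__ == "__main__":
    print(round_robin([5, 5, 5], 1))
```

With three 5-unit jobs and a slice of 1, every job starts within 2 time units (good response time) but all of them finish late (poor turnaround).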

Context Switch
Mechanism to
Save context of one process
Restore another context
Data structure where it is stored is called
Process Control Block
Can cause:

Thrashing !!

When the overhead of switching is larger than the time slice

Concurrency vs. Parallelism


Concurrency

Multiple tasks happening in the same time frame

Parallelism

Happening at the exact same time on different cores

There is no true parallelism for all tasks because there are not enough cores.

Working with processes


The APIs are part of abstraction

Fork !

Unique PID
Libcall in C
Makes a copy of the parent
Exec !

Replaces current image with another.


Syscall in OS

Processes Vs Threads !
Parallelization of a program

Thread

Multiple execution contexts


Shares address space
Called a lightweight process

Saves Execution Context with the Process Control Block

Process

Doesn't share PID or address space


Info

The kernel doesn't see user threads


They're the same (low level)
Working with threads

Pthread (POSIX)
pthread_create() libcall
At OS level, syscall used to create thread
clone() on linux

Command Line Arguments


argc, argv
ls is one argument, so argc = 1
Parsing is defined by the program (e.g. bash)

Virtualizing Memory

Goals

Security
Reliability
Simplicity
Max efficiency

How is it organized?

Addressability
Granularity of bytes
Accessed
Processor architecture determines the access width (32 bits = 4-byte accesses)

Packing and Unpacking

in a struct, it could be
packed
There is no gap in data
Unpacked
Gap in data
Aligned to size of architecture
Consequence of packed
Data can go over a boundary and cause bad performance
Element can go across boundary
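
Packed vs unpacked sizes, shown with Python's struct module (a sketch, not from the lecture). The layout, one 1-byte char followed by one 4-byte unsigned int, is an assumed example.

```python
import struct

# "c" is a 1-byte char, "I" is a 4-byte unsigned int.
PACKED = struct.calcsize("=cI")    # "=": standard sizes, no padding between fields
ALIGNED = struct.calcsize("@cI")   # "@": native alignment, padding added before the int

# In the packed layout the int starts right after the char, at offset 1,
# so it is not aligned to its own size.
packed_bytes = struct.pack("=cI", b"A", 1)

if __name__ == "__main__":
    print("packed size:", PACKED, "aligned size:", ALIGNED)
```

The aligned size is platform-dependent (8 on common machines); the extra bytes are the gap.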
Problem to Solve: Space allocation

Mapping between the memory can cause fragmentation


External fragmentation
Leave holes between data
Have that much memory, but don't have contiguous units.
Hierarchy

Address Space Abstraction


Seems independent
No way to affect other processes
They are virtual
Determined by kernel
Has discretion, translation isn't fixed
Done by kernel
Hardware support is essential
If done by software it has overhead

Metadata

Stack
ASM, push pop
Heap
Lives elsewhere, requested at run time

Segmentation
Assigns different segments to different sections
Segmentation is no longer used, but seg fault is still used
An address is formed from the start of a segment (the base) plus an offset

Paging !
Removes the need for contiguous allocation in physical memory
Used to address EXTERNAL FRAGMENTATION
Pages need to be the same size so that there is no external fragmentation
Memory is managed and allocated in fixed-size pages, tracked in page tables
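
Toy address translation (a sketch, not how the hardware is programmed). It assumes a 4096-byte page size and a plain dict standing in for the page table; translate() and pages_needed() are made-up names.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def translate(vaddr, page_table):
    """Map a virtual address to a physical one using a {page: frame} table."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault: virtual page {page} is not mapped")
    return page_table[page] * PAGE_SIZE + offset

def pages_needed(size_bytes):
    """Return (pages, wasted_bytes); the waste is internal fragmentation."""
    pages = -(-size_bytes // PAGE_SIZE)          # ceiling division
    wasted = pages * PAGE_SIZE - size_bytes      # unused tail of the last page
    return pages, wasted

if __name__ == "__main__":
    table = {0: 7, 1: 3}          # virtual page -> physical frame, not contiguous
    print(translate(4100, table))
    print(pages_needed(10000))
```

Frames 7 and 3 are not adjacent, yet the process sees one contiguous address range; the only waste is inside the last page.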

Swapping
When there isn't enough ram on the system for everything to be run
Put less-used data from memory into a "pagefile" or swap partition on the disk so
that it can be accessed later if need be.
When those pages are needed, the access causes a page fault, and they are fetched back

Abstractions Provided By OS
Most of the abstractions are not seen in the kernel space
Similar mechanisms apply.

Linking And System Calls


Static Linking

Avoid dependency issues


All functions live in the text (code) segment of the binary
Dynamic Linking

Reuse of common functions


Loaded in memory
Make sure same version is used.

System Calls

Invoked when an OS service is needed.


I/O is done by the OS
Expensive

Facilities for Users/Programmers


Talking to the Computer
1. Connect to the computer
Using the terminal
Terminal is a device, physical or virtual
2. Log In
Authentication
3. Send commands
Using the shell
The shell is a program
4. Program Management
Control Programs
Process management
Feed data
Data exchange between programs or with the user
OS Services
System calls; without them there is no access to outside resources

What's a terminal
A device used to enter data

Virtual Terminals

Mimics a physical terminal

Pseudo Terminals

A type of virtual terminal


Support terminal Emulators

Terminal Emulator

Apps that mimic a real terminal


e.g. Terminal on macOS

Getty

Program to initialize a virtual terminal, then calls login

User and Access Control


File protection
Username -> Uid
User group -> GID

Login Process

Establishes a session

Password Files

/etc/passwd

Account info
- username
- userid
- group id
- full name
- home directory
- shell
Password not stored in here

/etc/shadow

Used to store hashes of passwords, salts, so that a user can login

Permissions
File based access control
Each object has an owner
there are permission bits
rwx, rws Permission Bits
The Shell
Command interpreter, like python

Steps of a shell

1. Init env (env vars)


2. Display a prompt
3. Parse user input
4. Fork (so that it can exec later; the shell itself still wants to keep running)
5. Try to find the program
6. Manipulate the file descriptors (0 stdin, 1 stdout, 2 stderr, files)
7. Exec
8. Report errors, terminate child
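
Steps 4 to 8 in miniature (a sketch assuming a Unix-like system, not a real shell). run_redirected() and the 127 exit code for "program not found" are illustrative choices; the function behaves roughly like `prog args > out_path`.

```python
import os
import sys

def run_redirected(argv, out_path):
    """Run argv as a child with stdout sent to out_path; return its exit code."""
    pid = os.fork()                       # step 4: the shell lives on in the parent
    if pid == 0:                          # child process
        try:
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)                # step 6: fd 1 (stdout) now refers to the file
            os.close(fd)
            os.execvp(argv[0], argv)      # steps 5 and 7: search PATH, replace the image
        except OSError:
            os._exit(127)                 # step 8: exec failed, end the child
    _, status = os.waitpid(pid, 0)        # parent waits, so no zombie is left behind
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    print(run_redirected([sys.executable, "-c", "print('hi')"], "out.txt"))
```

The redirection happens in the child between fork and exec, so the shell's own stdout is untouched.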

Controlling running programs

All processes have a parent (except init, PID 1)


This is because of fork()
The parent process id
Forms a tree
The wait system call
A process needs to be waited on (or it will be a zombie process)
If a parent doesn't wait on a child that has exited (read its exit status), the child
becomes a zombie process, so it will still be in the process table.

Reaping a zombie process

A zombie is "reaped" (removed from the process table) when its parent calls wait().
If the parent never waits, the only other way is to kill the parent, so that init
adopts the zombie and reaps it.
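
Watching a child stay in the process table until it is waited on (a sketch assuming a Unix-like system). pid_exists() and zombie_demo() are made-up names; signal 0 is used only as an existence check.

```python
import os

def pid_exists(pid):
    """True while pid still has a process-table entry (running or zombie)."""
    try:
        os.kill(pid, 0)              # signal 0 sends nothing, it only checks
        return True
    except ProcessLookupError:
        return False

def zombie_demo():
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os._exit(7)                  # child ends at once; its pipe ends close with it
    os.close(w)
    os.read(r, 1)                    # returns at EOF, i.e. once the child has exited
    os.close(r)
    before = pid_exists(pid)         # entry is still there: nobody has waited yet
    _, status = os.waitpid(pid, 0)   # the parent reaps the child here
    after = pid_exists(pid)          # entry is gone
    return before, os.WEXITSTATUS(status), after

if __name__ == "__main__":
    print(zombie_demo())
```

The exit status (7 here) is what the kernel keeps the entry around for; wait() collects it and frees the entry.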

Signals

A user wants to notify a program of something


The user or another program can send a signal

Signals are a limited form of IPC (Inter-Process Communication)

Asynchronous
Predefined
Use SIGUSR1 and SIGUSR2 to define custom signals (can't add any data)
An OS artifact
From POSIX

Signal Handling

OS interrupts the signalled process and calls the handler function


Processes are always listening
Signal handlers are defined by the process
Except for SIGKILL and SIGSTOP, which cannot be caught or ignored
They can be:
Regular functions
The C runtime
specified with sigaction()
To register signal handler functions
Concurrency
Can be invoked any time
Want to do as little as possible
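
Registering a handler and sending a signal (a sketch assuming a Unix-like system). Python's signal.signal() is used here in place of sigaction(); send_usr1_to_self() and can_install_handler() are made-up names.

```python
import os
import signal
import time

def send_usr1_to_self():
    """Install a SIGUSR1 handler, signal this process, return what arrived."""
    received = []

    def on_usr1(signum, frame):
        received.append(signum)          # keep the handler tiny: just record it

    previous = signal.signal(signal.SIGUSR1, on_usr1)   # register the handler
    try:
        os.kill(os.getpid(), signal.SIGUSR1)            # a notification, no data attached
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)             # let the interpreter run the handler
    finally:
        signal.signal(signal.SIGUSR1, previous)         # restore the old handler
    return received

def can_install_handler(sig):
    """False for signals the OS refuses to let a process handle."""
    try:
        old = signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        return False
    if old is not None:
        signal.signal(sig, old)
    return True

if __name__ == "__main__":
    print(send_usr1_to_self())
    print(can_install_handler(signal.SIGKILL))
```

Trying to register a handler for SIGKILL or SIGSTOP is rejected by the OS, which is why those two always work.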

Pipelines and Redirection


Both provided by shell

Pipe

Unidirectional
Output from one | Input to another
| = stdout -> stdin
|& = stdout+stderr -> stdin

Redirection

Redirecting stdin, out and err to different file descriptors

Why does this work?

separation of exec and fork()


If they weren't separated, you couldn't change the child's setup (e.g. its file
descriptors) before the new program starts.
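
What a shell might do for `left | right > out_path` (a sketch assuming a Unix-like system; pipeline() and the 127 exit code are illustrative). The pipe is wired up in the gap between fork and exec.

```python
import os
import sys

def pipeline(left_argv, right_argv, out_path):
    """Run left | right with right's stdout sent to out_path; return right's exit code."""
    r, w = os.pipe()
    left = os.fork()
    if left == 0:
        try:
            os.dup2(w, 1)                 # left: stdout -> write end of the pipe
            os.close(r)
            os.close(w)
            os.execvp(left_argv[0], left_argv)
        except OSError:
            os._exit(127)
    right = os.fork()
    if right == 0:
        try:
            os.dup2(r, 0)                 # right: stdin <- read end of the pipe
            out = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(out, 1)               # plus a redirection of stdout to a file
            os.close(out)
            os.close(r)
            os.close(w)
            os.execvp(right_argv[0], right_argv)
        except OSError:
            os._exit(127)
    os.close(r)                           # parent closes both ends, otherwise
    os.close(w)                           # right would never see end-of-file
    os.waitpid(left, 0)
    _, status = os.waitpid(right, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    producer = [sys.executable, "-c", "print('b'); print('a')"]
    sorter = [sys.executable, "-c",
              "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"]
    print(pipeline(producer, sorter, "sorted.txt"))
```

Neither program knows it is in a pipeline; each just reads fd 0 and writes fd 1.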

File abstractions and file systems


File:

Way to organize data


File System

Way to organize files

User perspective

Identifier: filename
Path+filename
Read and written
Meaning of a file name is just a name

Pathnames

Hierarchical
can be relative
CWD per process

Operations on Files

POSIX
creat()
path,
Permission bits
Open
A file descriptor needed for operations (open is used to get it)
Read Write Close
Seek
Something out of band
ioctl

File Systems

Types are determined by purpose


Not just an arbitrary choice
Special File Systems
Devfs, udev
configfs
sysfs
procfs
tmpfs
Mounting File Systems

Connected to the uniform file system tree


Mount-point (existing directory)
Pathnames become relative
Flexibility

File Systems and Storage Management


Permission bits
above

Memory vs Storage
Storage is persistent
Ram is volatile, higher speed, low capacity

Why can't programs run on a hard drive?

Not understood by Architecture


It is IO

Drivers

Piece of kernel code that supports io/device


Device, file system, network kernel
User space can't talk to the outside
Device Drivers

Physical Devices
CD, USB, HDD,
Exposing
Not readable semantics

Generic Block Layer

Unifies everything

Generic Block Layer

Key to persistence
leads to file systems and raw
Always accessed in the size of access unit (blocks)

File System

We don't work with blocks, we work with files


File System Driver exposes the new semantics
Consumes blocks from lower levels.

Raw

Can also get raw access to blocks (e.g. /dev/sda)

Application

always refers to file descriptor, not file

Device Driver

Actual Media
Has its own drivers
Maybe even many layers
Hidden by the operating system
Block size can differ from the system's
File systems can reside on any block devices.
Some cases they reside on ram
Performance is important too
Write sys call doesn't automatically cause a write to the block
It writes to ram until a threshold, then writes to the device.
File System Layer

Abstraction from blocks to files


For the time being, no ideal FS abstraction enables full portability
Tight coupling with the OS kernel, file access based control.

Types of Files

Regular file (seeing content)


Content
Directory
Folder
Symlink
Shortcut/link
FIFO (pipe)
|
Socket
Device file (block, character)

File Descriptors
Non negative integer that points to a data structure in the kernel
Stdin, out and err are special file descriptors
You can check the file descriptors of any process in:
/proc/<pid>/fd (or /proc/self/fd)
self refers to the calling process
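
A file descriptor in action (a sketch; fd_demo() and the file name notes.txt are made up). The /proc/self/fd lookup is Linux-only, so it is skipped elsewhere.

```python
import os
import tempfile

def fd_demo():
    """Open a file, use its descriptor, and look it up under /proc/self/fd."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "notes.txt")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)   # open gives back the fd
        try:
            os.write(fd, b"hello")            # every operation goes through the fd
            os.lseek(fd, 0, os.SEEK_SET)      # seek back to the start
            data = os.read(fd, 5)
            listed = None
            if os.path.isdir("/proc/self/fd"):               # Linux only
                listed = str(fd) in os.listdir("/proc/self/fd")
            return fd, data, listed
        finally:
            os.close(fd)

if __name__ == "__main__":
    print(fd_demo())
```

The returned fd is just a small non-negative integer, usually 3 or higher because 0, 1, and 2 are already taken.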

Tracing Down File Access


File Descriptor Table

There is an FD table per process


Lists fds usable in the process
Maps to file pointers

Open File Table

System-wide table that maps file pointers to inodes

Inode table

System wide table that stores metadata


There is an in memory copy of this table.
What is an inode?

POSIX (VFS) concept


The inode is basically the file
Stores meta data
Identified with an inode number (unique to each file system)

Inode Types

Directory
Regular file
Char device
Block Device
(named) pipe
symbolic link
socket

What is stored in an Inode?

A bunch of metadata
A set of pointers to the data blocks

Where are file names stored?

Dentry

Directory Entry

Also known as a dentry

Syscall: getdents()

lib call: readdir

What does a directory store?

A dentry
Filename -> inode
File -> Dentry -> Inode -> Data

Root Directory

Root directory inode is always 2 (on ext file systems)


System wide root per file system
When I mount to /mnt, it becomes its own file
Mounting points to another FS with its own root.

Hard Links and Sym Links


Sym Link

Link to a file/pathname
Stores the filename in the datablocks
If the target is deleted, then the link can't be used
If sym link is deleted that's okay.
Weak binding between symlink and file it points to

Hard Link

Technically no such thing (because of dentry)


Filename is associated with an inode through a dentry
Nature of a hardlink is a dentry
Hard links to the same inode
Everything is the same about the linked files except the name
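
Hard link vs symlink, checked with inode numbers (a sketch assuming a Unix-like file system that supports both kinds of link; link_demo() and the file names are made up).

```python
import os
import tempfile

def link_demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "file")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        with open(target, "w") as f:
            f.write("data")
        os.link(target, hard)         # second dentry pointing at the same inode
        os.symlink(target, soft)      # separate inode that stores the pathname
        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        link_count = os.stat(target).st_nlink
        symlink_has_own_inode = os.lstat(soft).st_ino != os.stat(target).st_ino
        os.remove(target)             # removes one dentry; the inode survives via hard
        with open(hard) as f:
            hard_still_readable = f.read() == "data"
        dangling = not os.path.exists(soft)    # exists() follows the symlink
        return (same_inode, link_count, symlink_has_own_inode,
                hard_still_readable, dangling)

if __name__ == "__main__":
    print(link_demo())
```

After the original name is removed the hard link still reads the data, while the symlink is left pointing at a pathname that no longer exists.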

Why do directories have multiple links

The name, and then the . in the directory


Name meaning the dentry itself
. as in the file inside the file that links to itself.
Every immediate child directory also has a link to it in the form of ..
Every directory has at LEAST 2 hard links

Copy/move/remove
Copy

creates a new inode


For a hard link, it creates a new identical copy (other than name)
For a symbolic link -> it will be dereferenced (as in there will be a copy of the file
it points to) (it wont copy the link)
Move

On same file system, it is just relinked to the new pathname


On different file systems new inodes are created

Remove

Removes that directory entry and decreases the link count
Removes the inode as well once the link count reaches 0 (i.e. it was 1 before)
In terms of directories, the . is deleted first, so it will be one
Then it will be deleted.

How do we access devices?


Anything that exists as a device is a special file
Mostly /dev/*
Kernel-mode code behind each special file

What are special files?

They are not files on a special file system

What is a special file system?

A file system that lives on ram


/proc

So what are they?

Files that do special things:


Device drivers
Represent a device
Can live on a regular file system /dev

Emulated vs Virtual terminal

Device access shows the difference between virtual and emulated terminal
Need a pseudo terminal to talk to the OS

Device Files
Represent physical or virtual hardware devices
File system interface between device drivers and user space applications
Identification

Major Number
Type
Minor Number
Distinguisher

/proc/devices

Names provided by driver


But type is fixed

Character Devices

Accessed at the granularity of bytes


Not addressable (it is a stream)
Terminal is a char device
USB ports are char devices

Block Devices

Accessed in blocks
Addressable
Storage device

Super Blocks
Filesystem meta data
Sort of like an inode for the filesystem
We need same metadata for file systems as we do for files
Block devices get further identification
There is a primary, and a backup super block.
What happens if it's corrupted

Mounting failure
Can't mount a filesystem
Data inconsistency
If using a backup, data may not be accessible because backups don't store
everything

Physical vs Logical Size


Logical size

The actual size of the data in the file

Physical Size

Amount of space that is allocated on the disk

Holes in a file

Sparse files

Where we have a large file with maybe a huge run of 0s, these 0s are part of the
logical size
But the physical size is less because the file system does not allocate blocks for
the holes; reads from a hole just return 0s.
Log > Phys
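
Making a sparse file and comparing the two sizes (a sketch; sparse_sizes() is a made-up name). Whether the hole really takes no space depends on the file system, so only the logical size is certain.

```python
import os
import tempfile

def sparse_sizes(hole_bytes=1 << 20):
    """Create a file with a hole; return (logical_size, physical_size) in bytes."""
    with tempfile.NamedTemporaryFile() as f:
        f.seek(hole_bytes)        # jump ahead without writing: this leaves a hole
        f.write(b"x")             # one real byte after the hole
        f.flush()
        st = os.stat(f.name)
        logical = st.st_size              # what ls -l reports
        physical = st.st_blocks * 512     # st_blocks is counted in 512-byte units
        return logical, physical

if __name__ == "__main__":
    print(sparse_sizes())
```

On file systems that support holes, the physical size comes out far smaller than the logical size of 1 MiB plus one byte.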

External Fragmentation

Fragmentation outside the file size


- To fix this, allocate space in blocks
Phys > Log

Internal Fragmentation

Shows 4 KB on disk, but the data inside needs less than that


Wasting space inside the allocated space
Phys > Log

dd command
Experimenting with real devices is bad
Use a virtual version
command to copy and convert data
uses an input and output file
Could be other special files

dd vs cp
They both copy, and cp works fine at the granularity of files
dd has more control over data, you can seek, skip, use block size
like a file based pipe

File System Corruption


All types of persistent storage share the same risk

On Disk

Lifespan is long and damage is persistent

RAM

Data can be recreated

What ifs
Interruption while updating on disk structures

Crash inconsistency
Meta data is gone, super bad
Inodes are good with missing/inconsistent blocks
Random data on the block
Good blocks but missing inodes
Can't find data because
File becomes orphaned
We can find them again
Optimal outcome if it has to happen
Superblock is corrupted
Do a repair
Fetch backup copies

Lazy approach

fsck tool:
Check super blocks
Link count
Allocation
Bad blocks

Like a lost and found

Only cares about fs integrity, not so much about data

Better approach

Journalling

We can journal our updates and go back to the previous working one

Special file systems


Procfs

Process information

Sysfs

Bidirectional interaction between kernel, hardware, and associated drivers


exposes kobject structures to kernel code and files to user space

FUSE

Moving to user space


Why?
Portability
Convenience
Security
Stability

Another layer of abstraction

This is how special files are generated

FUSE is a framework

Allows creation of file systems in user space

exposes /dev/fuse

Code talks to file, implement operations

Network File Systems

SSHFS
A FUSE file system

Why?

Showing remote files as local files

Allows traffic through the same port

NFS

not a fuse file system


Need dedicated server

RANDOM QUESTIONS THAT YOU SHOULD PROBABLY BE


ABLE TO ANSWER
If a program is contained how does it use IO? How does it get outside resources?

Why does true parallelism not need task scheduling

How does task scheduling work with containment?

Is there a case where TGID is not same as PID?

How are segments done now?


Paging vs Swapping? What does this mean? What is the real difference?

Paging: Moving pages

Swapping: entire process

-z lazy? What does that do (runtime or load time)

Static vs Runtime vs Loadtime linking?

Where are environment variables stored when they're not for the whole environment?

Video terminal vs Virtual terminal?

Difference?

Are pseudo terminals considered virtual or abstractions?

With ssh daemon, is bash a child?

What is FUSE?

What is reaping zombie processes?

Setuid allows a program to run on behalf of a user? How? What does it do?

Fork(), wait in the parent process, Why fork here?

What is setuid, seteuid, setgid, setegid?

What is a group?

ls /proc: what is shown?

Inter-process communication?

Is a process a virtualization or abstraction of the CPU?

| vs >>

Why do we need a file system?

What is a symbol in asm?

Why do I need to link math lib if using sqrt with lazy dynamic linking?

Is access unit same as block size on OS?


Difference of files and file systems?


What is a login?

Init->getty->login->bash?

What is /proc?

What are stat() lstat() fstat()?

Dentry: store path or name?

Do we always have to mount in the root?

Where is the dentry for a file stored?

How does symlink store the data of the filename?

Is root a special file system? Or virtual file system? Or what?

Where are special files located?

What is pre-emptive multitasking?

What is a file descriptor?

What is contained on a superblock?

Internal vs external fragmentation?

When can logical size be greater than physical?


How?

Sparse files?
