CompArch CS

The document provides an overview of Boolean algebra and its application in digital circuits, emphasizing the importance of NAND and NOR gates for constructing logic circuits. It covers fundamental concepts such as adders, latches, memory types, and the organization of memory in computers, including RAM and its addressing methods. Additionally, it discusses data representation for integers and floating-point numbers, including various arithmetic operations and standards like IEEE 754.

Uploaded by

Alexander Arzt
Copyright: © All Rights Reserved

1 Boolean Algebra and Logic
Boolean algebra is a branch of mathematics used in computing to represent logical operations on true/false values. It is essential for modelling and designing logic gates and circuits, because it allows complex computations to be built from simple building blocks such as AND, OR, and NOT gates.

1.1 Boolean Algebra - Fundamentals
· TRUE and FALSE are replaced by 1 and 0
· Propositions are replaced by variables, e.g. R = “The sky is blue”
· Operators are replaced by symbols:
– NOT: “'”
– AND: “·”
– OR: “+”
Their precedences, from highest to lowest, are: NOT, AND, OR.

1.2 Boolean Algebra - Rules
Note that A and B can be any Boolean expression. We then have the following rules:
[Figure: table of Boolean algebra rules]
Moreover, we have De Morgan's rule, where we negate the inputs and swap the operators:
(A · B)' = A' + B'
(A + B)' = A' · B'

1.3 Boolean Functions - Schematic Representation
Boolean functions (gates) are represented with the following symbols:
[Figure: gate symbols]
Note that NAND and NOR gates are the most commonly used building blocks for circuits. Moreover, a NAND gate is called a “complete gate” because any logical function can be implemented using NAND gates alone. This means that any logic circuit can be constructed using only NAND gates, without the need for any other type of gate.

2 Basic Circuits and Memory
An adder is a digital circuit which performs addition of numbers. It is a fundamental building block that is used not only in ALUs but also in other parts of the CPU, where adders calculate addresses, table indices and similar values. We will look at half adders, full adders, and latches (which allow us to store data).

2.1 Half Adder
A half adder is a digital circuit that adds two 1-bit binary numbers. It is called a “half” adder because it can only add two digits and cannot take into account a carry from a previous addition.
The half adder has
· two inputs A and B, representing the two 1-bit binary numbers
· and two outputs, namely
– the sum (S): the result of the addition of A and B
– and the carry (C): the carry that is generated when adding A and B
The truth table of the half adder looks as follows:
A B | S C
0 0 | 0 0
0 1 | 1 0
1 0 | 1 0
1 1 | 0 1
As can be seen from the truth table, the sum output (S) is the XOR of A and B, while the carry output (C) is the AND of A and B:
S = A XOR B
C = A · B
Half adders are often used as building blocks to construct more complex arithmetic circuits, such as full adders.

2.2 Full Adder
A full adder is a digital circuit that adds two binary digits and a carry-in digit to produce a binary sum and a carry-out digit. It extends the half adder, which can only add two binary digits without considering a carry-in digit.
The full adder has
· three inputs A, B, and C-in (the two 1-bit binary numbers and the carry-in bit)
· and two outputs, namely
– the sum (S): the result of adding the three inputs
– and the carry (C-out): the carry generated when adding the three inputs.
The truth table of the full adder looks as follows:
A B C-in | S C-out
0 0 0    | 0 0
0 0 1    | 1 0
0 1 0    | 1 0
0 1 1    | 0 1
1 0 0    | 1 0
1 0 1    | 0 1
1 1 0    | 0 1
1 1 1    | 1 1
As can be seen from the truth table, the sum output (S) is the XOR of the three inputs, while the carry-out output (C-out) is 1 whenever at least two of the three inputs are 1. C-out is built from a combination of AND and OR gates on the three inputs.
Full adders are commonly used in the design of arithmetic logic units (ALUs), microprocessors, and other digital circuits that require arithmetic operations.

2.2.1 Ripple Adders
If we chain full adders (the result is called a ripple adder), we can add numbers that are wider than one bit. A four-bit ripple adder adds two 4-bit numbers (A and B) and produces the result (S) and the carry output (C):
[Figure: four-bit ripple adder]

2.3 Latches
Let's recap: the functions (gates) we know (OR gate, AND gate, NOT gate, XOR gate, NAND gate) are the building blocks for combinational circuits, and all of them can be built using only NAND or only NOR gates.
But what if we would like to store values? We can use a feedback mechanism where the output values depend indirectly on themselves, which is the idea behind latches.
A latch has two stable states, often referred to as the set state (S) and the reset state (R), and it can be triggered to switch between these states by a control signal.
[Figure: SR latch]
As we can see, on both sides we have an output (Q and Q') that serves directly as an input on the other side. That is what creates the feedback mechanism.
An SR latch is an asynchronous circuit, which means that its output changes as soon as its inputs change. Furthermore, we can get race conditions between R and S.
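The half adder, full adder and ripple adder from section 2 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names and the choice to store bits least-significant-first are not from the notes.

```python
def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs: S = A XOR B, C = A AND B."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Return (sum, carry_out), built from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, c_in)
    return s, c1 | c2


def ripple_add(a_bits, b_bits):
    """Chain full adders over two equal-length bit lists (LSB first)."""
    carry = 0
    result = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry


def to_bits(value, width):
    """Helper (assumption): integer -> list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Helper (assumption): list of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_add(to_bits(6, 4), to_bits(7, 4))` yields the bits of 13 with a carry of 0, while adding 9 and 8 in four bits leaves the bits of 1 and sets the carry output to 1.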
2.4 Memory
A flip-flop is an improved version of an S/R latch. Instead of S and R being two different inputs, it uses just one input (D) and its negation (D') as inputs. Moreover, flip-flops are synchronous circuits, i.e. they are designed for digital systems that need to synchronize the timing of their operations. The synchronization is achieved using a global clock as a second input next to D. The clock input ensures that the flip-flop only updates its output at specific times, determined by the timing of the clock signal. Without a clock input, the output of the flip-flop would be updated continuously as the data input changes, which could lead to unpredictable behavior and errors.
[Figure: D flip-flop]
Now we can actually understand what memory is: conceptually, memory/RAM is nothing but a large collection of flip-flops.
We have two basic types of memory/RAM:
· Static RAM (SRAM): primarily used in cache
· Dynamic RAM (DRAM): primarily used in main memory
Any type of memory holds binary values:
· Data
· CPU instructions
· Memory addresses

3 Chip Design
3.1 (De-)Multiplexer
A multiplexer (mux) is an electronic device used in digital circuits to reduce the number of wires needed to connect different components. Instead of having separate wires for each input signal, a multiplexer selects one of several input signals and forwards it to a single output, so fewer wires are needed. I.e. a mux can be thought of as a multiple-input-single-output switch:
[Figure: multiplexer]
A demultiplexer (demux) is the opposite of a multiplexer. Instead of selecting one input signal to send to the output, it takes a single input signal and routes it to one of several outputs. A single shared wire can thus serve several destinations.

3.2 Decoder
A decoder is a multiple-input-multiple-output circuit which converts coded inputs into coded outputs. Decoders are used in the ALU to select which input operands to use for an operation. For example, in a 2-input ALU, the decoder can select either input A or input B as the operand for the operation, based on the control signals.

3.3 The Arithmetic Logic Unit (ALU)
The ALU
· is a digital circuit that performs arithmetic and logical operations
· is made up of all the components that we have looked at so far (full adder, decoder, logic gates)
· is a fundamental building block of the CPU
With all the components we have looked at, we can now build a 1-bit ALU (i.e. it can perform operations such as addition, subtraction, AND, OR, and NOT on two single-bit inputs and produce a single-bit output):
[Figure: 1-bit ALU]
In this ALU we have:
· The Logic Unit: performs the logical operations using the AND, OR and NOT gates.
· Full Adder: does the addition.
· Decoder: based on a control signal, the decoder decides which unit (full adder or logic unit) to activate.
We can chain multiple 1-bit ALUs together to create a multiple-bit ALU, which can perform operations on binary numbers that are wider than a single bit:
[Figure: multiple-bit ALU]

4 Main Memory Organisation
When we talk about main memory, we predominantly talk about RAM.

4.1 Addressing
In order to get the instructions and the data from RAM to the CPU, we need to know how and where to find them physically in RAM.
The smallest addressable unit of memory in a computer is 1 byte, i.e. 8 memory cells, each storing 1 bit using flip-flops.
We can visualize RAM as one big block of R rows, each W bits long:
[Figure: RAM as R rows of W bits]
· In a word-addressable memory system, each row (called a word) has an address. So we can access memory at the row level and manipulate one row at a time.
· In a byte-addressable memory system, each byte in memory has a unique address, which means that data can be accessed and manipulated at the byte level.
Moreover, when it comes to byte addressing, we distinguish between two formats:
– Big Endian: stores the most significant byte (MSByte) first, i.e. at the lowest address
– Little Endian: stores the least significant byte (LSByte) first, i.e. at the lowest address
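The two byte orders from section 4.1 can be made concrete with Python's built-in `int.to_bytes` and `int.from_bytes`. The sketch below is only an illustration: the helper name `store`, the dictionary standing in for byte-addressable memory, and the example values and address are assumptions, not part of the notes.

```python
def store(memory, address, value, nbytes, byteorder):
    """Write `value` into `memory` (a dict address -> byte) starting at `address`.

    `byteorder` is "big" (MSByte at the lowest address) or
    "little" (LSByte at the lowest address).
    """
    for offset, byte in enumerate(value.to_bytes(nbytes, byteorder=byteorder)):
        memory[address + offset] = byte
    return memory


# Store the 2-byte value 0x1234 at address 24 in both byte orders.
big = store({}, 24, 0x1234, 2, "big")        # {24: 0x12, 25: 0x34}
little = store({}, 24, 0x1234, 2, "little")  # {24: 0x34, 25: 0x12}

# Reading back requires knowing the byte order that was used for writing.
restored = int.from_bytes(bytes([little[24], little[25]]), byteorder="little")
```

A small value such as 5 stored in two bytes shows the same effect: big endian gives the bytes 00 05, little endian gives 05 00.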
Now if we store multi-byte items, it looks like this. Assume we want to store the number “5” at memory address 24 using 2 bytes, then:
[Figure: the number 5 stored in 2 bytes at address 24]
The word addresses would be the same, but the byte addresses would be different!

4.2 Memory Modules and Chips
In reality, RAM consists of memory modules rather than one monolithic block.
[Figure: RAM built from memory modules]
The reason we split it is parallel access: having different modules allows us to access them in parallel. So let's split the entire block into, e.g., two modules, each built from 64M x 8 bit chips:
[Figure: RAM split into two modules]

4.2.1 Memory Interleaving
However, now that we have split RAM into modules, we also have to know in which of the modules the byte that we want to address is located. Assume we have a RAM consisting of
· 4M words, each 32 bits wide.
· We organize those in 4 x 1M x 32 bit modules.
· Our addresses are 22 bits long: 2 bits to address the module and 20 bits to address the word within the module.
Now there are two different ways of organizing the memory addresses over the modules in RAM:
· Higher Order Interleave:
[Figure: higher order interleave]
Consecutive memory addresses are in the same module ⟹ instructions and data can be placed in different modules ⟹ allows pre-fetching: the next instruction can already be fetched from the instruction module while the data for the current instruction is being fetched from the data module.
· Lower Order Interleave:
[Figure: lower order interleave]
Consecutive memory addresses are in different modules ⟹ we can access the next word while the current word is being accessed, so array elements can be accessed in parallel.

5 Data Representation & Arithmetic: Integers
5.1 Why Binary Numbers?
Computers work with binary numbers because they are made of electronic circuits that can represent and manipulate data using two electrical states (on/off or high/low).

5.2 Hexadecimal vs. Binary
Hexadecimal is used by programmers as a compact way to write long binary values (each hexadecimal digit stands for 4 bits):
[Figure: binary values and their hexadecimal equivalents]

5.3 Representing Integers
There are several possibilities to represent integers within a computer.
5.3.1 Sign & Magnitude
· Leftmost (“most significant”) bit represents the sign of the integer.
· Remaining bits represent the integer's magnitude.
· Problem: 2 representations for 0 → +0 and −0
5.3.2 One's Complement
· Negative numbers are the bitwise complement of the corresponding positive numbers
5.3.3 Two's Complement
· The negative of an integer is obtained by inverting each of its bits and adding 1 to the result.
5.3.4 Excess-n
· We add whatever we need to each number to make the smallest number equal to 0. Now all stored values are ≥ 0 and thus directly comparable.
· Example: Assume we have an integer space of 3 bits. With 3 bits we can represent 2^3 = 8 integers. Assuming we start at −4, we can represent −4, −3, ..., 2, 3. The smallest value is −4, so we shift by 4 (excess-4):
−4 → 000, −3 → 001, ..., 0 → 100, ..., 3 → 111
5.3.5 Binary Coded Decimal
· Each decimal digit is represented by a fixed number of bits.
5.3.6 Summary
[Figure: summary table of the integer representations]

5.4 Integer Arithmetic
5.4.1 Unsigned
Addition
[Figure: unsigned addition example]
Subtraction
[Figure: unsigned subtraction example]
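The integer representations from section 5.3 can be compared side by side with a small Python sketch. The function names are assumptions chosen for illustration, and the functions do no range checking, so the caller has to pass values that fit into the given number of bits.

```python
def to_bin(value, bits):
    """Format a non-negative integer as a fixed-width bit string."""
    return format(value, "0{}b".format(bits))


def sign_magnitude(value, bits):
    """Leftmost bit is the sign, remaining bits hold the magnitude."""
    sign = 1 if value < 0 else 0
    return to_bin((sign << (bits - 1)) | abs(value), bits)


def ones_complement(value, bits):
    """Negative numbers are the bitwise complement of the positive number."""
    mask = (1 << bits) - 1
    return to_bin(value if value >= 0 else ~(-value) & mask, bits)


def twos_complement(value, bits):
    """Negative numbers: invert the bits of the positive number and add 1."""
    mask = (1 << bits) - 1
    return to_bin(value if value >= 0 else ((~(-value) & mask) + 1) & mask, bits)


def excess_n(value, bits, n):
    """Shift every value by n so that the smallest value is stored as 0."""
    return to_bin(value + n, bits)
```

With 4 bits, −3 becomes `1011` in sign and magnitude, `1100` in one's complement and `1101` in two's complement. The excess-4 example from section 5.3.4 gives `excess_n(-4, 3, 4) == "000"` and `excess_n(3, 3, 4) == "111"`.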
Multiplication
[Figure: unsigned multiplication example]
Division
The result satisfies dividend = quotient x divisor + remainder.

5.4.2 Signed
We will look at Two's Complement because of its widespread use.
Addition
· Add the values and discard any carry-out bit.
[Figure: Two's Complement addition example]
Subtraction
· Accomplished by negating the subtrahend and adding it to the minuend. Any carry-out bit is discarded.
[Figure: Two's Complement subtraction example]
Overflow occurs if two Two's Complement numbers with the same sign are added (or two numbers with different signs are subtracted) and the result has the opposite sign, i.e. we have left the representable range. Example:
[Figure: Two's Complement overflow example]

5.5 Representing Characters
· Characters are mapped to integer values
· Common mapping standards are ASCII and Unicode
Let's look at the ASCII character set and how the string “Fred” is encoded (F = 46H, r = 72H, e = 65H, d = 64H):
[Figure: ASCII table and the encoding of “Fred”]

6 Data Representation & Arithmetic: Floats
Recall scientific notation:
M × 10^E
This is the basis for most floating point representation schemes, where
· M is the coefficient (called significand or mantissa)
· E is the exponent
· 10 (or 2 for binary) is the base

6.1 Normalised Floating Point Numbers
Depending on the coefficient and exponent, the same floating point number can have multiple forms:
[Figure: several forms of the same floating point number]
For hardware implementations, it is necessary that each floating point number has exactly one representation. This is why we represent floating point numbers in normalised form:
[Figure: normalised form]

6.2 From Binary to Decimal
[Figure: binary to decimal conversion example]

6.3 From Decimal to Binary
[Figure: decimal to binary conversion example]

6.4 Arithmetic
6.4.1 Multiplication
[Figure: floating point multiplication example]
Note: For many computations, the coefficient of the result of a floating point operation has too many digits to be stored. This is why we round:
[Figure: rounding example]
6.4.2 Addition
A floating point addition such as 4.5 × 10^3 + 0.64 × 10^2 is not a simple coefficient addition unless the exponents are the same. Hence we take the smaller number and shift its coefficient such that it aligns with the coefficient of the other number:
4.5 × 10^3 + 0.064 × 10^3 = 4.564 × 10^3
6.4.3 Exponent Overflow and Underflow
· Exponent Overflow occurs when the result is too large, i.e. the result's exponent > max. exponent.
[Figure: exponent overflow example]
· Exponent Underflow occurs when the result is too small in magnitude, i.e. the result's exponent < min. exponent.
[Figure: exponent underflow example]

6.5 Comparing Floating Point Values
Since floating point numbers are not always exactly represented in a computer (because we try to model decimal numbers with a finite number of binary digits), we cannot simply compare them with one another like we do with integers. Instead, we compare them while allowing for some margin of error:
    a = 0.1 + 0.2
    b = 0.3
    epsilon = 1e-9  # tolerance, chosen to suit the magnitude of the values
    if abs(a - b) < epsilon:
        print("a and b are considered equal")
    else:
        print("a and b are not equal")

6.6 IEEE Standard
In a computer, floating-point numbers are represented using a standard format called the IEEE 754 floating-point standard. This standard defines how floating-point numbers are stored, how operations on them behave, and the rules for error conditions.
6.6.1 Single Precision Format (32 bit)
[Figure: single precision layout with sign bit, 8-bit exponent E and significand field F]
· The value represented is 1.F × 2^(E−127)
· Important: The 8-bit exponent uses excess-127 notation, i.e. the stored exponent E is the actual exponent plus 127. Reason: we can do an integer comparison to determine whether one floating point number is larger than another, given both have the same sign.
· The leading bit (the “1.”) is omitted from the significand field, i.e. we have a hidden bit.
· Single precision yields 24 bits of precision (the first bit of the significand is always assumed to be 1, so it does not need to be stored explicitly)
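The single precision layout from section 6.6.1 can be inspected with Python's standard `struct` module. The sketch below is an illustration under stated assumptions: the function names are made up for this example, and `fields_to_float` only handles normalised numbers (it ignores zero, denormals, infinities and NaN).

```python
import struct


def float_to_fields(x):
    """Split a number, stored as IEEE 754 single precision, into its fields.

    Returns (sign, stored_exponent, fraction) where stored_exponent is the
    excess-127 value E and fraction is the 23-bit field F.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, stored_exponent, fraction


def fields_to_float(sign, stored_exponent, fraction):
    """Rebuild the value as (-1)^sign * 1.F * 2^(E - 127), normalised case only."""
    significand = 1 + fraction / 2**23  # restore the hidden bit
    return (-1) ** sign * significand * 2.0 ** (stored_exponent - 127)


sign, exponent, fraction = float_to_fields(42.6875)
# 42.6875 = 101010.1011 (binary) = 1.010101011 x 2^5, so E = 5 + 127 = 132
```

For 42.6875 the fields are sign 0, stored exponent 132 and fraction bits 0101010110...0, and `fields_to_float` reproduces 42.6875 exactly because the number needs only a few significand bits.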
6.6.2 Double Precision Format (64 bit)
[Figure: double precision layout]
· The value represented is 1.F × 2^(E−1023)
· Double precision yields 53 bits of precision

6.6.3 Conversion To IEEE Format
What is 42.6875 in IEEE?
[Figure: worked conversion of 42.6875]

6.6.4 Conversion From IEEE Format
[Figure: worked conversion from IEEE format]

6.6.5 Addition with IEEE
Example: Carry out the addition of 42.6875 + 0.375 in IEEE single precision arithmetic.
[Figure: worked addition]
· To add these numbers we must make the exponents the same → shift the significand of the smaller number so that its exponent is the same as the exponent of the larger number.
· Note: we must restore the hidden bit when carrying out floating point operations

7 CPU Organisation & Operation
7.1 Introduction
The CPU is a chip which executes the instructions of a program. It receives the instructions from the computer's RAM (where the program is loaded), performs calculations on the data, and sends the results back to memory for storage or to other components for further processing. Thus, the CPU is the “brain” of any computer.
High-level languages, low-level languages, and machine code are different ways to write instructions for a CPU. They differ as follows:
· High-Level Languages: designed to be easily readable and understandable by humans (e.g. Java, Python, C++)
· Low-Level Languages: The CPU cannot understand high-level instructions, so they need to be broken down into lower-level instructions (assembly language; e.g. Pentium or Java Bytecode). The program that translates the high-level language to assembly is called the compiler.
· Machine Code: The final step is to convert the low-level instructions to binary code so the computer can understand them. A program called an assembler converts the low-level instructions (assembly) to machine code.

7.2 Toy 1 Architecture
Now we will look at a computer with a simple toy architecture (designed for illustration purposes):
· RAM (word-addressable): 1024 × 16 bit
· Integers are represented using Two's Complement
· 4 general purpose registers (16-bit each): R0, R1, R2, R3 (GPRs are small storage locations used to hold data that is currently being processed)
· 16 instructions, e.g. LOAD, STORE, ADD, SUBTRACT etc.
Let's look at some of the instructions in more detail:
[Figure: Toy1 instructions]

7.2.1 Instruction Format
In order to be able to compute in our toy architecture environment, we need the following instruction format:
[Figure: instruction format with the fields OPCODE, REG and ADDRESS]
where
· OPCODE: selects the instruction for the CPU (e.g. LOAD). We have 16 instructions, so we need 4-bit opcodes.
· REG: the first operand of the instruction, which is a register. We have four GPRs, so REG needs 2 bits.
· ADDRESS: the second operand of the instruction, which is a memory address. We have 1024 words, so we need 10 bits to address them.

7.2.2 CPU Organisation
[Figure: CPU organisation]
· All computations in our toy architecture are carried out in the CPU.
· The ALU inside the CPU does the arithmetic computations and logical operations.
· In order to access data, the CPU has to specify which memory location it wants to access.
· The address bus specifies the memory address of the data that the CPU needs to read from or write to.
· The data bus is responsible for transmitting data between the CPU and other components in a computer system.
· The control bus specifies whether the CPU reads from or writes to RAM.
Now:
· Remember, we have a set of 16 instructions, each 16 bits long.
· The assembler takes the instructions and puts them into memory locations in RAM in sequential order.
· Now that the instructions are loaded in RAM (assume the data of the program as well), the CPU can fetch them, decode them and execute them (known as the fetch-decode-execute cycle).
· The program counter inside the CPU points to the first instruction of the program.
· With each executed instruction, the program counter increases by 1. I.e. a program is executed sequentially (unless we have jump instructions), which is why it must be stored in memory in sequential order.

7.3 Toy1 Assembly Programming
Let's assume we want to write a program which does multiplication for us. The pseudocode would look as follows:
[Figure: multiplication pseudocode]
Assume that we allocate the following things to memory:
· Variable A to Memory[100H]
· Variable B to Memory[101H]
· Variable C to Memory[102H]
· Literal 0 to Memory[200H]
· Variable sum to R1
· Variable n to R2
· 1st instruction at 080H (i.e. this is where the program starts)
Then the program in Assembly could look like this:
[Figure: Toy1 assembly program]

8 Pentium Architecture
Having looked at a toy architecture before, we are now going to look at a real computer architecture called Pentium.

8.1 Registers
Pentium has the following registers (all 32-bit):
[Figure: Pentium registers]
Each of those 32-bit registers features a 16-bit register subset:
[Figure: 16-bit register subsets]
E.g. the register ax is a 16-bit subset of the eax register. The advantage of using just the ax register (16 bit) instead of the eax register (32 bit) is that if an operation only requires a 16-bit operand, there is no need to use the full 32 bits. Thus, we save time, because the longer the operand, the longer it takes to complete an operation.
Some of the 16-bit registers can even be broken down further into one most significant byte register and one least significant byte register:
[Figure: 8-bit register subsets]

8.1.1 Instruction Pointer Register (IPR)
In the Pentium architecture we have an IPR called eip, which holds the address of the next instruction to be executed. Thus, the eip register corresponds to the program counter register in other architectures.

8.1.2 Flag Register (eflags)
The Pentium architecture also has a flags register called eflags, which holds information about the current state of the CPU.
[Figure: eflags register]
Some of the eflags bits are set/cleared after arithmetic instructions are executed, and these bits are used by conditional branch instructions:
· Zero Flag (Bit 6): set to 1 if the result is zero, cleared (= 0) otherwise.
· Overflow Flag (Bit 11): set to 1 if the result is too large/small, cleared (= 0) otherwise.

8.2 Basic Data Types
[Figure: basic data types]

8.3 Main Memory
The RAM is byte addressable and uses little endian to organize the data.
[Figure: main memory layout]
If we want to do multi-byte reads or writes, we start at the address of the first byte and continue as long as needed (we need to know the length though). But always keep in mind that the RAM is a little endian system. I.e. if we want to read/write two bytes of data, the LSByte comes first.

8.4 Instruction Format
Most Pentium instructions have either 2, 1 or 0 operands and take one of the forms:
[Figure: instruction forms]
where label is an optional placeholder for a variable (i.e. a user-defined identifier whose value is the memory address of an instruction or some data).

8.5 Data Declaration Directives
We can use data declaration directives to declare global variables, i.e. data which is mapped to fixed memory locations and can be accessed using the name of the variable:
[Figure: data declaration examples]
For example, the data declaration directives above are:
· db: declare byte
· dw: declare word (2 bytes)
· dd: declare double word (4 bytes)
Moreover, we can declare constants with an equ directive:
[Figure: constant declaration example]
and we can reserve memory for uninitialised data:
[Figure: reserving uninitialised memory]

8.6 Operands
We have the following operands:
· Register Operands: refer to values stored in the CPU's internal registers.
– e.g. eax, dx
– Register operands are the fastest to access since they are located directly within the CPU.
· Immediate Operands: are constant values encoded directly within the instruction itself.
– e.g. 22
– Immediate operands do not need to be fetched from registers or memory but may require decoding or processing by the CPU, making their access speed intermediate between register and memory operands.
· Memory Operands: refer to values stored in the main memory.
– Accessing memory operands involves reading from or writing to memory locations, which is slower than accessing register operands due to the higher latency of memory access.
– Specify an address using expressions of the form:
[BaseReg + Scale*IndexReg + Displacement]
– e.g. [24], [bp], [esi+2], [ebp+8*edi+16]
– example instruction (displacement only): mov ax, [22]
[ ] means take the value stored at address 22
– example instruction (base + displacement): mov ax, [bx+4]
“[bx+4]” is the source operand, which specifies the memory location being accessed. The square brackets indicate that the value at the memory location pointed to by the bx register plus an offset of 4 bytes will be used as the source operand.
[Figure: base + displacement addressing]
– example instruction (scale*index + displacement): mov ax, [2*ecx+4]
The [ ] indicate that the value at the memory location pointed to by the expression “2 times the value of the ecx register plus 4” will be used as the source operand.
[Figure: scale*index + displacement addressing]
– example instruction (base + (scale*index) + displacement): mov eax, [ebx+4*edx+10]
[Figure: base + scale*index + displacement addressing]

8.7 Pentium Programming
8.7.1 Integer Addition & Subtraction
· Operands can be byte, word, or doubleword sized
[Figure: addition and subtraction instructions]
8.7.2 Integer Multiply
· Operands can be word or doubleword sized
[Figure: multiply instructions]
8.7.3 Integer Divide
· Operands must be registers or memory operands
[Figure: divide instructions]
8.7.4 Expressions
Assume we have
    int alpha=7, beta=4, gamma=-3 // global variables
and we want to evaluate the expression
    alpha = (alpha*beta + 5*gamma) * (alpha-beta)
First we have to declare the variables with their length and value:
    alpha dw 7      ; i.e. define word storing value 7 at memory address alpha
    beta  dw 4
    gamma dw -3
Now, to evaluate the expression, our program does the following steps:
    mov ax, [alpha]     ; ax = alpha
    imul ax, [beta]     ; ax = alpha*beta
    mov bx, 5           ; bx = 5
    imul bx, [gamma]    ; bx = 5*gamma
    add ax, bx          ; ax = alpha*beta + 5*gamma
    mov bx, [alpha]     ; bx = alpha
    sub bx, [beta]      ; bx = alpha - beta
    imul ax, bx         ; ax = (alpha*beta + 5*gamma) * (alpha-beta)
    mov [alpha], ax     ; alpha = ax
8.7.5 Integer Overflow
Arithmetic operations which result in an overflow set the overflow flag in the eflags register, which we can test:
    add ah, bh      ; assume overflow can occur here
    jo of_label     ; jump to of_label if overflow occurred
    ...
    of_label:       ; overflow handler
8.7.6 Integer Divide by Zero
We can guard against a division-by-zero exception by explicitly checking the divisor before the division:
    cmp bh, 0       ; check if divisor in register bh == 0
    je zero_div     ; if yes, jump to exception handler
    ...
    zero_div: ...   ; divide-by-0 exception handler
8.7.7 Booleans
We use a full byte to represent a Boolean value with the following interpretation:
False = 0000 0000B, True = 1111 1111B
For example, the expression
    okay = (man && rich) || (!man) // all variables are booleans
would be executed in assembly as follows:
    mov al, [man]   ; al = man
    and al, [rich]  ; al = man && rich
    mov ah, [man]   ; ah = man
    not ah          ; ah = not man
    or al, ah       ; al = (man && rich) || (!man)
    mov [okay], al  ; okay = al
8.7.8 Jump Instructions
All jump instructions are of the form
    OPCODE location ; location is the memory address to jump to
e.g.
    cmp ax, bx          ; Compare ax and bx
    je equal_case       ; If equal, jump to address "equal_case"
    jmp not_equal_case  ; Else jump to "not_equal_case"
8.7.9 If Statement
Let's look at a simple if-else statement:
[Figure: simple if-else statement in assembly]
And now a more sophisticated one:
[Figure: more sophisticated if statement in assembly]
8.7.10 While-Loop
[Figure: while-loop in assembly]
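The byte-wise Boolean evaluation from section 8.7.7 can be checked with a short Python sketch that mirrors the assembly steps one by one. The function name `evaluate_okay` is an assumption for illustration; the masking with 0xFF models the 8-bit width of the al and ah registers, because Python integers are not limited to 8 bits.

```python
FALSE = 0x00  # 0000 0000B
TRUE = 0xFF   # 1111 1111B


def evaluate_okay(man, rich):
    """Compute okay = (man && rich) || (!man) on byte-sized Booleans."""
    al = man            # mov al, [man]
    al = al & rich      # and al, [rich]
    ah = man            # mov ah, [man]
    ah = ~ah & 0xFF     # not ah (keep only the low 8 bits)
    al = al | ah        # or  al, ah
    return al           # mov [okay], al
```

Because True is stored as all ones, the bitwise `and`, `or` and `not` instructions give the same results as the logical operators: the expression is False only for `man = TRUE, rich = FALSE`.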
8.7.11 For-Loop pop wordop // wordop = mem[esp] and esp = esp+2 mov esp, ebp // restore stack pointer to that on entry,
i.e. free the stack of entire frame
pop ebp // restore caller’s frame pointer
· If we push or pop double word operators such as eax then the stack
grows and shrinks by 4 bytes, i.e.
push dwordop // esp=esp-4, i.e. mem addr.
in esp decreases by 4 and mem[esp]=dwordop Array & Object Variables
· For array and object we push the start address of the array/object onto
pop dwordop // dwordop = mem[esp] and esp = esp+4 stack rather than its value (i.e. they are passed by reference).
· I.e. within the function we access the array/object indirectly via its
address
Pentium stack in detail · The address of an array/object can be computed with the Load Effective
· The way things work sequentially are as follows: Address (lea) instruction which takes the general form

1. The calling function (caller): lea Register, [BaseReg+Scale*IndexReg+Displacement]


8.8 Function Calls
(a) push parameters for callee on stack in reversed order (i.e. last which performs the following assignement:
The following things are what we require for a function (call): parameter first and first parameter last). Register = BaseReg + Scale*Index + Displacement
· jump to the beginning of a function and on completion jump back to (b) calls the function using call instruction · Note that (lea) only computed the address and assigns it to the register.
the instruction that comes after the function call call func // = push eip + jump func , It does not access the memory location pointed to by the computed
· the ability to pass a result value of a function back to the calling i.e. push return address and jump to func address!
procedure
· the ability to pass parameters to a function 2. The called function (callee - which has control now): Saving & Restoring Registers
· the ability to allocate and access variable that are local to the function · Its is the callee’s responsibility to save/restore the registers of the caller
(a) sets up frame pointer (ebp)
· for object methods, we need to be able to access the attributes of the before/after execution
object (b) allocates local variables
· Moreover, by convention eax will be the register where the callee stores
· the ability to make recursive calls (c) save the current contents of the registers on the stack so that they the result in.
can be recovered after execution
8.8.1 Stack 8.9 Pentium I/O & Interrupt
(d) execute code inside body 8.9.1 Introduction
Functions are implemented using a stack.
(e) restore registers from stack Assume the user wrote typed in a string ”345” using some text editing
· A stack is data structure stored in a specific region in RAM. application. Now he wants to print this string to the printer. The ”journey”
It ensures that each function executes in the correct order (f) de-allocate local variables
of this printing operation looks as follows
and that its local variables and other data are kept separate (g) restore frame pointer (ebp)
from those of other functions. · The application makes a system call for a print job to the operating
(h) return result (by storing it in eax!) and control flow to caller system.
· There are 2 basic operations using return instruction:
· The OS sends it to the device driver of the printer (which is a software
– PUSH: When a function is called, its execution context, including ret // = pop return address in eip used by the OS to communicate with the printer. THe driver trans-
its local variables, parameters, and return address, is added to the lates the print job sent by the OS into a format that the printer can
top of the stack. understand).
3. Now the caller:
– POP: When the function completes execution, its context is removed · The device driver is split into user task (handles user-level requests for
from the stack, and control is returned to the calling function. (a) removes parameters the device) and an interrupt handler (excecutes in kernel mode and is
(b) copies or applies the result returned by the callee responsible for handling interrupts generated by the device).
· The stack operates in LIFO manner.
· The driver sends the print job to the printer’s controller, which is an
interface between the printer and its driver.
· The printer’s controller reads the print job and sends signals to the
printer’s hardware components, such as the print head, to create the
printed output.

The Pentium provides
· a stack for the OS ("system" stack) to manage the execution of programs
· a stack pointer register esp which holds the address of the top of
the stack
· a base pointer register ebp to access data on the stack (e.g. local
variables of a function)
· a push instruction (pushes an operand, i.e. some data, onto the top of
the stack)
· a pop instruction (pops an operand, i.e. some data, off the top of
the stack)

Pentium stack - preliminaries:
· On Pentium we grow the system stack downwards (i.e. from higher
addresses to lower addresses). So pushing/popping a (double)word
onto/off the stack would look like
    push wordop // esp=esp-2, i.e. the memory address in esp decreases by 2 and mem[esp]=wordop

Local Variables
· as we said, the lifetime of local variables is limited to the execution of
the function they are declared in
· we can allocate/deallocate local variables on the stack, although for
optimisation purposes registers are often used instead of the stack
· A local variable allocated on the stack is accessed indirectly via
the base pointer register (ebp). When used this way ebp is known as
the frame pointer
· Unlike the stack pointer, which can change during a function’s execution,
the frame pointer is "anchored" for the execution of the function.

Method Entry & Exit
· Entry: set up frame pointer and allocate space for local variables
    push ebp // save caller’s frame pointer on stack
    mov ebp, esp // set frame pointer for callee
    sub esp, nbytes // allocate nbytes for local variables
· Exit: de-allocate variables and restore frame pointer
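As a rough illustration of the stack conventions described in this part (downward growth, push/pop, and the frame-pointer entry sequence), here is a minimal Python sketch. The class name Stack32 and its methods are hypothetical helpers, not Pentium features. The sketch pushes 4-byte doublewords (so esp moves by 4 rather than 2), and its leave method assumes the conventional x86 epilogue (mov esp, ebp followed by pop ebp), which the notes only describe in words.

```python
class Stack32:
    """Toy model of a downward-growing stack (hypothetical helper)."""

    def __init__(self, top=0x1000):
        self.mem = {}      # address -> 32-bit value
        self.esp = top     # stack pointer: address of the top of the stack
        self.ebp = 0       # base (frame) pointer

    def push(self, value):
        # The stack grows towards lower addresses: decrement first, then store.
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        # Read the top element, then move esp back up.
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def enter(self, nbytes):
        # push ebp / mov ebp, esp / sub esp, nbytes
        self.push(self.ebp)
        self.ebp = self.esp
        self.esp -= nbytes

    def leave(self):
        # Assumed epilogue: mov esp, ebp / pop ebp
        self.esp = self.ebp
        self.ebp = self.pop()


if __name__ == "__main__":
    s = Stack32(top=0x1000)
    s.push(7)                      # e.g. a parameter pushed by the caller
    s.enter(8)                     # callee reserves 8 bytes for locals
    print(hex(s.ebp), hex(s.esp))  # frame pointer stays fixed while esp moved
    s.leave()
    print(s.pop(), hex(s.esp))
```

Note that between enter and leave only esp changes when further values are pushed, which is why locals are addressed relative to ebp.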
8.9.2 Pentium I/O Structure

8.9.3 I/O Controllers
I/O Controllers provide the interface between the CPU and various I/O
devices, such as hard drives, network cards, USB ports, and sound cards.

Within each I/O controller we have certain I/O ports:
· Data Port(s): used for passing data between the CPU and the I/O device
· Control Port(s): used to issue I/O commands (e.g. write char) and to
check status of a device (e.g. busy or error).
Each data and control port needs an address that the CPU can use. Pentium
provides two kinds of addressing to identify the I/O ports:

1. Separate I/O Address Space:
I/O ports have their own (very small) I/O address space and
the architecture provides special I/O instructions to access
them:
    in ax, 20 ; copy 16-bits from I/O port 20 into register ax
    out 35, al ; copy 8-bits from register al to I/O port 35
· The control bus is used to signal if a data transfer is for I/O address
space or RAM address space.
· Pentium provides 64K 8-bit I/O ports numbered from 0 to 65535

2. Memory-Mapped I/O:
I/O ports "appear" to the programmer as normal memory
locations.
· Any instruction that accepts a memory operand can be used to
manipulate an I/O port. E.g.
    mov [80004444H], al ; move data from register al to I/O port at 8...

8.9.4 I/O Schemes
Pentium has 4 I/O schemes to manage input/output operations between
the CPU and I/O devices:
1. Programmed I/O: Continually "poll" a device’s control port until
the device is ready, then initiate data transfer. Let’s look at an example
where we write a block of data to an I/O device:
Pro: Simple to program, Con: CPU time wasted due to busy
waiting
2. Interrupt Driven I/O: Initiate data transfer and then do something
else. The device will "interrupt" the CPU when it needs its attention.
This way the CPU can do other tasks while the I/O operation is being
performed. The CPU’s fetch-decode-execute cycle now becomes:
On detecting an interrupt, the CPU calls the device’s interrupt
handling procedure (interrupt handler).
Pro: Big improvement compared to programmed I/O, Con:
Interrupt processing time is expensive.
3. DMA I/O: The device transfers a data block directly to/from RAM and
then "interrupts" the CPU once the block is done. I.e. the device has
direct access to RAM and the CPU does not need to be involved.
We have a new controller here: DMA I/O Controller:
· The CPU writes the start address of the block, the number of bytes in
the block and the direction of transfer to the DMA controller’s I/O ports.
· The DMA controller transfers the block of data between device
and RAM without intervention of the CPU.
· Upon completion, the DMA controller interrupts the CPU.
Since we transfer entire blocks now, we have fewer interrupts
and thus are more efficient!
4. I/O Processor: Delegate complex I/O processing tasks to a dedicated
processor.
· Sometimes even DMA I/O is inadequate.
· A more powerful approach is to use one or more dedicated I/O
processors to relieve the CPU of I/O related tasks.

8.9.5 Device Drivers
Let’s recap our example from the introduction section where the user
wanted to print “345”.
1. We issue the writing operation - the receiving end of that is the OS
2. Inside the OS there is a piece of software - namely the printer device
driver.
3. Now, when we write “345” to the printer, the driver receives this
command.
4. The driver has 2 parts:
· User task: handles user requests
· Interrupt handler: handles the interrupts generated by the device
(i.e. in our example the printer generates an interrupt when
it has printed a char).

Let’s look at how the driver’s components work using pseudocode:

· User Task:
    copy 1st char from string to printer’s data port;
    issue write request by writing 1 to printer’s control port;
    OS.sleep(self);

· Interrupt Handler:
    IF not end-of-string {
        copy next char from string to data port;
        W=1 // set W bit in control port to initiate transfer of next byte
    } ELSE {
        OS.resume(user_task)
    }
    return from interrupt

The printer’s I/O controller would look as follows

The user task and interrupt handler are written in assembly as follows: Let
controlport, dataport be the relevant I/O ports of the printer, string
be an 8Kb string buffer and strptr be a pointer to a char in string. Then
· User Task
    mov eax, string ; get address of string
    mov [strptr], eax ; save pointer to 1st char
    mov al, [eax] ; get 1st char
    test al, al ; set flags (mov does not affect them)
    jz skip ; skip if char is zero-byte
    mov [dataport], al ; else copy char to data port
    or [controlport], 20H ; and issue write request
    call sleep ; suspend yourself
skip: ret
· Interrupt Handler
    sti ; re-enable interrupts
    push eax ; save current registers onto stack
    inc [strptr] ; advance to next char
    mov eax, [strptr] ; copy pointer to register
    mov al, [eax] ; and get char
    test al, al ; set flags (mov does not affect them)
    jz endofstr ; skip if end of string
    mov [dataport], al ; copy char to data port
    or [controlport], 20H ; issue write request
    jmp exit
endofstr: call OS.resume ; ask OS to resume user task
exit: pop eax ; restore register
    iret

Locating the Interrupt Handler
· There is a table in RAM called the interrupt descriptor table (IDT).
· When a device generates an interrupt, it sends a special signal to the
CPU along with an integer ID.
· That integer ID (called the interrupt vector number) is used as the index
into the IDT. The corresponding table entry holds the start address of
the interrupt handler.

Types of Interrupt
· I/O device generated Interrupts: the I/O device sends an interrupt vector
number to the CPU via buses.
· CPU-generated Interrupts (Exceptions): e.g. an attempt to execute an
illegal operation such as divide-by-zero. Vector numbers 0-18 are reserved
for exceptions.
· Software-generated Interrupts (System Calls): system calls are made
to the OS by user-level processes in order to request some OS service
(e.g. read/write from/to memory). System calls are also called TRAPS.

Calling the Interrupt Handler
When an interrupt occurs the Pentium CPU will
1. push the eflags register onto the stack because it contains important
information about the state of the CPU/system that needs to be saved.
Doing so ensures that the system can return to its previous state without
losing any important information.
2. clear the interrupt flag in the eflags register to disable further interrupts,
preventing the system from becoming overwhelmed with interrupts.
3. push the return address (contents of the eip register) onto the stack.
4. finally, call the interrupt handler using the entry in the IDT

Summary of Device Drivers
· Device Drivers control I/O devices by reading/writing to the I/O ports
of the device.
· I/O devices signal completion of I/O requests and errors by sending an
Interrupt Vector Number to the CPU. This causes the CPU to call the
device driver’s interrupt handler.
· The interrupt handler is a routine that services the interrupt, i.e. checks
for errors and copies data from/to the memory area it shares with the
device driver’s user task.
· The user task of the device driver runs as a thread within the OS and
interacts with the device (via I/O ports) and user tasks (via shared
memory) of other user-level processes.
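To tie the pieces together, the following Python sketch simulates the interrupt entry sequence (save eflags, clear the interrupt flag, save eip, dispatch through the IDT) together with the printer driver's user task and interrupt handler. It is a toy model under stated assumptions: ToyCPU, PrinterDriver, run_demo and the vector number 39 are hypothetical names chosen for illustration, the IDT is a plain dictionary, and the "data port" is a Python list. The only hardware fact relied on is that the interrupt flag is bit 9 (0x200) of eflags.

```python
IF_FLAG = 0x200          # interrupt-enable flag: bit 9 of eflags
PRINTER_VECTOR = 39      # hypothetical vector number (above the exception range)


class ToyCPU:
    def __init__(self):
        self.eflags = IF_FLAG
        self.eip = 0x1000
        self.stack = []
        self.idt = {}    # vector number -> handler (stands in for the IDT)

    def raise_interrupt(self, vector):
        if not (self.eflags & IF_FLAG):
            return False                  # interrupts currently disabled
        self.stack.append(self.eflags)    # 1. save eflags
        self.eflags &= ~IF_FLAG           # 2. disable further interrupts
        self.stack.append(self.eip)       # 3. save return address
        self.idt[vector](self)            # 4. call handler found via the IDT
        self.eip = self.stack.pop()       # iret: restore eip ...
        self.eflags = self.stack.pop()    # ... and eflags
        return True


class PrinterDriver:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.data_port = []      # characters "sent" to the printer
        self.sleeping = False

    def user_task(self):
        if not self.text:
            return               # nothing to print
        self.data_port.append(self.text[0])   # copy 1st char, issue write
        self.sleeping = True                  # OS.sleep(self)

    def interrupt_handler(self, cpu):
        self.pos += 1
        if self.pos < len(self.text):
            self.data_port.append(self.text[self.pos])  # next char
        else:
            self.sleeping = False                       # OS.resume(user_task)


def run_demo(text):
    cpu = ToyCPU()
    drv = PrinterDriver(text)
    cpu.idt[PRINTER_VECTOR] = drv.interrupt_handler
    drv.user_task()
    while drv.sleeping:          # the printer interrupts after each char
        cpu.raise_interrupt(PRINTER_VECTOR)
    return cpu, drv


if __name__ == "__main__":
    cpu, drv = run_demo("345")
    print("".join(drv.data_port))
```

The sketch leaves out details a real handler needs, such as saving registers and error checking, and it does not model the handler re-enabling interrupts with sti.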
