CompArch CS
Boolean algebra is a type of math used in computing to represent logical
operations with true or false values, which is essential in order to model
and design logic gates and circuits. It allows complex computations to be
made with simple building blocks, such as AND, OR, and NOT gates.

NAND and NOR gates are the most commonly used blocks to build circuits.
Moreover, a NAND gate is called a "complete gate" because any logical
function can be implemented using NAND gates alone. This means that any
logical circuit can be constructed using only NAND gates, without the need
for any other type of gate.
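The claim that NAND alone is sufficient can be checked with a small sketch. The helper names (`nand`, `not_`, `and_`, `or_`, `xor_`) are chosen here for illustration and do not come from the notes; each gate is built from `nand` only.

```python
def nand(a: int, b: int) -> int:
    """NAND of two bits (each 0 or 1)."""
    return 1 - (a & b)

def not_(a: int) -> int:
    # NOT a = a NAND a
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # AND is a NAND followed by a NOT
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # OR via De Morgan: a + b = (a' . b')'
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    # Classic four-NAND construction of XOR
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Print a truth table: a, b, AND, OR, XOR
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_(a, b), or_(a, b), xor_(a, b))
```

Every derived gate agrees with Python's built-in bit operators on all input combinations, which is what "complete gate" means in practice.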
1.1 Boolean Algebra - Fundamentals
· TRUE and FALSE are replaced by 1 and 0
· Propositions are replaced by variables - e.g. R = "The sky is blue"
· Operators are replaced by symbols:
– OR: "+"
– NOT: "'"
– AND: "·"
Their precedences are, from highest to lowest: NOT, AND, OR.
Moreover, we have DeMorgan's Rule, where we basically negate the inputs
and change the operators:
(A · B)' = A' + B'
(A + B)' = A' · B'

2 Basic Circuits and Memory
An adder is a digital circuit which performs addition of numbers.
It is a fundamental building block used not only in ALUs, but
also in other parts of the CPU, where adders are used to calculate
addresses, table indices and similar operations. We will look at the
following circuits: Half Adder, Full Adder, and Latches (which allow us
to store data).
2.1 Half Adder
A half adder is a digital circuit used to perform addition of two
1-bit binary numbers. It is called a "half" adder because it can
only add two digits and cannot take into account any carry from
a previous addition.
As can be seen from the truth table, the sum output (S) is the XOR of A
and B, while the carry output (C) is the AND of A and B:
S = A ⊕ B, C = A · B
2.2 Full Adder
As can be seen from the truth table, the sum output (S) is the XOR of
the three inputs, while the carry-out output (C-out) is obtained by
combining the three inputs with AND and OR gates.
2.3 Latches
Let's recap: The functions (gates) we know (OR gate, AND gate, NOT
gate, XOR gate, NAND gate) are the building blocks for combinatorial
circuits. And all of them can be built using NAND or NOR gates.
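The half adder and full adder described in Section 2 can be sketched in a few lines. The function names and the `ripple_add` helper (which chains full adders bit by bit) are illustrative assumptions, not part of the notes; building the full adder from two half adders plus an OR is one common construction.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two 1-bit inputs: S = A XOR B, C = A AND B."""
    return a ^ b, a & b

def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three 1-bit inputs."""
    s1, c1 = half_adder(a, b)        # add the two operand bits
    s, c2 = half_adder(s1, c_in)     # then add the incoming carry
    return s, c1 | c2                # a carry from either stage is a carry-out

def ripple_add(x: int, y: int, width: int = 4) -> tuple[int, int]:
    """Add two unsigned `width`-bit numbers by chaining full adders."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_add(5, 6))   # 4-bit sum and final carry-out
```

With a width of 4 bits, `ripple_add(9, 8)` overflows: the low four bits of 17 are returned together with a carry-out of 1.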
Division: simply use dividend = quotient × divisor + remainder.
5.4.2 Signed
We will look at Two's complement because of its widespread use.
Addition
· add the values and discard any carry-out bit

Numbers written in the form M × 10^E are the basis for most floating
point representation schemes, where
· M is the coefficient (called significand or mantissa)
· E is the exponent
· 10 (or 2 for binary) is the base
6.1 Normalised Floating Point Numbers
Depending on the coefficient and exponent, the same floating point number
can have multiple forms, for example 4.5 × 10^3 = 45 × 10^2 = 0.45 × 10^4.
6.4 Arithmetic
6.4.1 Multiplication
6.4.2 Addition
A floating point addition such as 4.5 × 10^3 + 0.64 × 10^2 is not a simple
coefficient addition unless the exponents are the same. Hence we take the
smaller number and shift its coefficient such that it aligns with the
coefficient of the other number: 0.64 × 10^2 = 0.064 × 10^3, so the sum is
4.564 × 10^3.
6.4.3 Exponent Overflow and Underflow
· Exponent Overflow occurs when the result is too large, i.e. the
result's exponent > max. exponent.
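The exponent alignment used for floating point addition in Section 6.4.2 can be sketched as follows. This is a minimal illustration with decimal (coefficient, exponent) pairs; the function name `fp_add` and the use of `Fraction` for exact arithmetic are assumptions made here, and no normalisation or hidden-bit handling is attempted.

```python
from fractions import Fraction

def fp_add(m1, e1, m2, e2, base=10):
    """Add m1*base**e1 and m2*base**e2 by aligning to the larger exponent.

    Returns (coefficient, exponent); the result is not normalised.
    """
    m1, m2 = Fraction(m1), Fraction(m2)
    if e1 < e2:
        # Swap so that (m1, e1) always carries the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Shift the coefficient of the smaller number to the larger exponent
    m2_aligned = m2 / base ** (e1 - e2)
    return m1 + m2_aligned, e1

m, e = fp_add("4.5", 3, "0.64", 2)
print(float(m), e)   # coefficient and exponent of the sum
```

For the example from the text, 0.64 × 10^2 becomes 0.064 × 10^3 before the coefficients are added.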
· To add these numbers we must make the exponents the same → shift the
exponent of the smaller number so that it is the same as the exponent of
the larger number.
· Note: we must restore the hidden bit when carrying out floating point
operations

7.2.1 Instruction Format
In order to be able to compute in our toy architecture environment, we
need the following instruction format:
where
· OPCODE: selects the instruction for the CPU (e.g. LOAD). We have 16
instructions, so we need 4-bit opcodes.
· REG: The first operand of the instruction, which is a register. We have
four GPRs, so REG needs 2 bits.
· ADDRESS: The second operand of the instruction, which is a memory
address. We have 1024 words, so we need 10 bits to address them.

Assume that we allocate the following things to memory:
· Variable A to Memory[100H]
· Variable B to Memory[101H]
· Variable C to Memory[102H]
· Literal 0 to Memory[200H]
· Variable sum to R1
· Variable n to R2
· 1st instruction at 080H (i.e. this is where the program starts)
Then the program in Assembly could look like this:

8.1.1 Instruction Pointer Register (IPR)
In the Pentium architecture we have an IPR called eip, which holds the
address of the next instruction to be executed. Thus, the eip register
corresponds to the program counter register in other architectures.
8.5 Data Declaration Directives
We can use data declaration directives to declare global variables, i.e. data
which is mapped to fixed memory locations and can be accessed using the
name of the variable, e.g. alpha dw 7 defines a word named alpha that
stores the value 7.
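The field widths of the toy instruction format in Section 7.2.1 (4-bit OPCODE, 2-bit REG, 10-bit ADDRESS) add up to a 16-bit word. The sketch below packs and unpacks such a word. The field order (OPCODE in the top bits, then REG, then ADDRESS) and the opcode number used for LOAD are assumptions for illustration; the notes only give the widths.

```python
OPCODE_BITS, REG_BITS, ADDR_BITS = 4, 2, 10   # widths from Section 7.2.1

def encode(opcode: int, reg: int, address: int) -> int:
    """Pack the three fields into one 16-bit instruction word (assumed order)."""
    if not (0 <= opcode < 2 ** OPCODE_BITS):
        raise ValueError("opcode needs 4 bits")
    if not (0 <= reg < 2 ** REG_BITS):
        raise ValueError("reg needs 2 bits")
    if not (0 <= address < 2 ** ADDR_BITS):
        raise ValueError("address needs 10 bits")
    return (opcode << (REG_BITS + ADDR_BITS)) | (reg << ADDR_BITS) | address

def decode(word: int) -> tuple[int, int, int]:
    """Unpack a 16-bit instruction word into (opcode, reg, address)."""
    opcode = (word >> (REG_BITS + ADDR_BITS)) & (2 ** OPCODE_BITS - 1)
    reg = (word >> ADDR_BITS) & (2 ** REG_BITS - 1)
    address = word & (2 ** ADDR_BITS - 1)
    return opcode, reg, address

LOAD = 1                          # hypothetical opcode number
word = encode(LOAD, 1, 0x100)     # e.g. "LOAD R1, [100H]"
print(hex(word), decode(word))
```

Note that a 10-bit address field reaches addresses 0 to 3FFH, which covers all the locations (080H, 100H to 102H, 200H) used in the allocation above.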
Each of those 32-bit registers features a 16-bit register subset:
E.g. the register ax is a 16-bit subset of the eax register. The
advantage of using just the ax register (16 bit) instead of the eax (32 bit)
register is that if an operation only requires a 16-bit operand then
there's no need to use the full 32 bits. Thus, we save time because the
longer the operand, the longer it will take to complete an operation.
Some of the 16-bit registers can even be broken down further into
one most significant byte register and one least significant byte register,
e.g. ax splits into ah and al.

8.3 Main Memory
The RAM is byte addressable and uses little endian to organize the data.
If we want to read or write multiple bytes, then we start at the
address of the first byte and continue as long as needed (we need to
know the length though). But always keep in mind that the RAM is
a little endian system, i.e. if we want to read/write two bytes
of data, the LSByte comes first.
8.4 Instruction Format
Most Pentium instructions have either 2, 1 or 0 operands and take one of
the forms:

8.6 Operands
We have the following operands:
· Register Operands: refer to values stored in the CPU's internal
registers.
– e.g. eax, dx
– Register operands are the fastest to access since they are located
directly within the CPU.
· Immediate Operands: are constant values encoded directly within
the instruction itself.
– e.g. 22
– Immediate operands do not need to be fetched from registers or
memory but may require decoding or processing by the CPU, making
their access speed intermediate between register and memory
operands.
· Memory Operands: refer to values stored in the main memory.
– Accessing memory operands involves reading from or writing to memory
locations, which is slower than accessing register operands due
to the higher latency of memory access.
– Specify an address using expressions of the form:
[BaseReg + Scale*IndexReg + Displacement]
– [24], [bp], [esi+2], [bp+8*di+16]
– example instruction (base): mov ax, [22]
[ ] means take the value stored at address 22
– example instruction (scale*index + displacement):
mov ax, [2*ecx+4]
The [ ] indicate that the value at the memory location
pointed to by the expression "2 times the value of the ECX
register plus 4" will be used as the source operand.
– example instruction (base + (scale*index) + displacement):
mov eax, [ebx+4*edx+10]

8.7.4 Expressions
Assume we have
int alpha=7, beta=4, gamma=-3 //global variables
and we want to execute the expression
alpha = (alpha*beta + 5*alpha) * (alpha-beta)
First we have to declare the variables with their length and value
alpha dw 7 ; i.e. define word storing value 7 at memory address alpha
beta dw 4
gamma dw -3
Now to execute the expression our program does the following steps
mov ax, [alpha] ; ax = alpha
imul ax, [beta] ; ax = alpha*beta
mov bx, 5
imul bx, [alpha] ; bx = 5*alpha
add ax, bx ; ax = alpha*beta + 5*alpha
mov bx, [alpha]
sub bx, [beta] ; bx = alpha-beta
imul ax, bx ; ax = (alpha*beta + 5*alpha) * (alpha-beta)
mov [alpha], ax ; store the result back in alpha
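To check the register-level evaluation of alpha = (alpha*beta + 5*alpha) * (alpha-beta), the steps can be traced in a short sketch. It follows the expression as written (the second product is 5 times alpha) and models ax and bx as 16-bit signed values; the helper `to_int16` and the dictionary `mem` standing in for memory are illustrative assumptions.

```python
def to_int16(x: int) -> int:
    """Wrap an integer to a signed 16-bit value, like a 16-bit register."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

mem = {"alpha": 7, "beta": 4, "gamma": -3}   # the three dw variables

ax = mem["alpha"]                    # mov  ax, [alpha]
ax = to_int16(ax * mem["beta"])      # imul ax, [beta]   -> alpha*beta
bx = 5                               # mov  bx, 5
bx = to_int16(bx * mem["alpha"])     # imul bx, [alpha]  -> 5*alpha
ax = to_int16(ax + bx)               # add  ax, bx
bx = mem["alpha"]                    # mov  bx, [alpha]
bx = to_int16(bx - mem["beta"])      # sub  bx, [beta]   -> alpha-beta
ax = to_int16(ax * bx)               # imul ax, bx
mem["alpha"] = ax                    # mov  [alpha], ax

print(mem["alpha"])   # 189, i.e. (28 + 35) * 3
```

The variable gamma is declared but not used by this particular expression.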
Within each I/O controller we have certain I/O ports:
· Data Port(s): used for passing data to/from CPU to I/O device
· Control Port(s): used to issue I/O commands (e.g. write char) and to
check the status of a device (e.g. busy or error).
The data and the control ports must be in the main memory. Pentium
provides two kinds of addressing to identify the memory address of the
I/O ports:
1. Separate I/O Address Space:
I/O ports have their own (very small) I/O address space and
the architecture provides special I/O instructions to access
them:
in ax, 20 ; copy 16 bits from I/O port 20 into register ax
out 35, al ; copy 8 bits from register al to I/O port 35
· The control bus is used to signal if a data transfer is for I/O address
space or RAM address space.
· Pentium provides 64K 8-bit I/O ports numbered from 0 to 65535
2. Memory-Mapped I/O:

Pro: Simple to program, Con: CPU time wasted due to busy
waiting
2. Interrupt Driven I/O: Initiate data transfer and then do something
else. The device will "interrupt" the CPU when it needs its attention.
This way the CPU can do other tasks while the I/O operation is being
performed. The CPU's fetch-decode-execution cycle now becomes:
On detecting an interrupt, the CPU calls the device's
interrupt handling procedure (interrupt handler).
Pro: Big improvement compared to programmed I/O, Con:
Interrupt processing time is expensive.
3. DMA I/O: Device will transfer a data block directly to/from RAM and
then "interrupt" the CPU after the block is done. I.e. the device has
direct access to RAM and the CPU does not need to be involved.
We have a new controller here: DMA I/O Controller:
· The CPU writes the start address of the block, the number of bytes of
the block and the direction of transfer to the DMA's I/O ports.
· The DMA controller transfers the block of data between device
and RAM without intervention of the CPU.

Let's look at how the driver's components work using pseudocode:
· User Task:
copy 1st char from string to printer's data port;
issue write request by writing 1 to printer's control port;
OS.sleep(self);
· Interrupt Handler:
IF not end-of-string {
copy next char from string to data port;
W=1 // set W bit in control port to initiate transfer of next byte
} ELSE {
OS.resume(user_task)
}
return from interrupt
The printer's I/O controller would look as follows

The user task and interrupt handler are written in assembly as follows: Let
controlport, dataport be the relevant I/O ports of the printer, string
be an 8Kb string buffer and strptr be a pointer to a char in string. Then
· User Task
mov eax, string ; get address of string
mov [strptr], eax ; save pointer to 1st char
mov al, [eax] ; get 1st char
test al, al ; set the zero flag from al (mov does not affect flags)
jz skip ; skip if char is zero-byte
mov [dataport], al ; else copy char to data port
or [controlport], 20H ; and issue write request
call sleep ; suspend yourself
skip: ret
· Interrupt Handler
sti ; re-enable interrupts
push eax ; save current registers onto stack
inc [strptr] ; advance to next char
mov eax, [strptr] ; copy pointer to register
mov al, [eax] ; and get char
test al, al ; set the zero flag from al
jz endofstr ; skip if end of string
mov [dataport], al ; copy char to data port
or [controlport], 20H ; issue write request
jmp exit
endofstr: call OS.resume ; ask OS to resume user task
exit: pop eax ; restore the saved register
iret ; return from interrupt

4. finally, call the interrupt handler using the entry in the IDT

Summary of Device Drivers
· Device Drivers control I/O devices by reading/writing to the I/O ports
of the device.
· I/O devices signal completion of I/O requests and errors by sending an
Interrupt Vector Number to the CPU. This causes the CPU to call the
device driver's interrupt handler.
· The interrupt handler is a routine that services the interrupt, i.e. checks
for errors and copies data from/to the memory area it shares with the
device driver's user task.
· The user task of the device driver runs as a thread within the OS and
interacts with the device (via I/O ports) and user tasks (via shared
memory) of other user-level processes.
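The interplay between the printer driver's user task and its interrupt handler can be simulated in a few lines. This is only a sketch: the classes `Printer` and `Driver`, the `tick` method that stands in for the device finishing a character and raising an interrupt, and the `sleeping` flag that stands in for OS.sleep / OS.resume are all hypothetical names introduced here.

```python
class Printer:
    """Hypothetical printer controller: one data port and a W (write) control bit."""
    def __init__(self):
        self.data_port = ""
        self.w_bit = 0
        self.paper = []          # characters printed so far

    def tick(self) -> bool:
        """Print the pending char, if any; return True to signal an interrupt."""
        if self.w_bit:
            self.paper.append(self.data_port)
            self.w_bit = 0
            return True
        return False

class Driver:
    def __init__(self, printer: Printer, string: str):
        self.printer, self.string, self.pos = printer, string, 0
        self.sleeping = False

    def user_task(self):
        if not self.string:                      # empty string: nothing to print
            return
        self.printer.data_port = self.string[0]  # copy 1st char to data port
        self.printer.w_bit = 1                   # issue write request
        self.sleeping = True                     # stands in for OS.sleep(self)

    def interrupt_handler(self):
        self.pos += 1                            # advance to next char
        if self.pos < len(self.string):
            self.printer.data_port = self.string[self.pos]
            self.printer.w_bit = 1               # initiate transfer of next char
        else:
            self.sleeping = False                # stands in for OS.resume(user_task)

printer = Printer()
driver = Driver(printer, "hello")
driver.user_task()
while driver.sleeping:           # the CPU would be free to do other work here
    if printer.tick():
        driver.interrupt_handler()
print("".join(printer.paper))
```

The user task only sends the first character; every further character is sent by the interrupt handler, one per interrupt, until the end of the string is reached.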
Types of Interrupt
· I/O device generated Interrupts: the I/O device sends an interrupt vector
number to the CPU via buses.
· CPU-generated Interrupts (Exceptions): e.g. an attempt to execute an
illegal operation such as divide-by-zero. Vector numbers 0-18 are reserved
for exceptions.
· Software-generated Interrupts (System Calls): system calls are made
to the OS by user-level processes in order to request some OS service
(e.g. read/write from/to memory). System calls are also called TRAPS.
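The way a vector number selects a handler through the IDT can be sketched with a lookup table. The table contents are illustrative assumptions: vector 0 is the divide error on this architecture, while the device vector 0x21 and the system call vector 0x80 are hypothetical choices made for this example.

```python
def divide_error():
    return "exception: divide error"

def printer_done():
    return "device: printer finished"

def system_call():
    return "trap: OS service requested"

# Simplified IDT: vector number -> handler (vector assignments partly hypothetical)
idt = {0: divide_error, 0x21: printer_done, 0x80: system_call}

def raise_interrupt(vector: int) -> str:
    """Look up the handler for `vector` in the IDT and call it."""
    handler = idt.get(vector)
    if handler is None:
        raise KeyError(f"no handler installed for vector {vector}")
    return handler()

def is_exception_vector(vector: int) -> bool:
    """Vectors 0-18 are reserved for CPU-generated exceptions."""
    return 0 <= vector <= 18

for v in (0, 0x21, 0x80):
    print(v, is_exception_vector(v), raise_interrupt(v))
```

All three types of interrupt go through the same table; only the source of the vector number differs.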