UNIT 5
❖ RISC Characteristics
The main idea behind RISC is to simplify the hardware by using an instruction set
composed of a few basic operations for loading, evaluating, and storing data: a
LOAD instruction loads data from memory, and a STORE instruction writes it back.
Each instruction requires only one clock cycle and passes through three stages:
fetch, decode, and execute.
Examples of RISC processors are SUN's SPARC, PowerPC, Microchip PIC
processors, RISC-V.
RISC Architecture
It is a highly customized set of instructions used in portable devices that require
system reliability, such as the Apple iPod and mobiles/smartphones.

[Figure: RISC Architecture]
Features of RISC Processor
1. One cycle execution time: For executing each instruction in a computer, RISC
processors require one CPI (clock cycle per instruction). Each cycle includes the
fetch, decode, and execute steps of the instruction.
2. Pipelining technique: The pipelining technique is used in RISC processors
to execute multiple parts or stages of instructions so that they perform more efficiently.
3. A large number of registers: RISC processors are designed with multiple
registers that can store instructions and respond quickly, minimizing interaction
with computer memory.
4. It supports simple addressing modes and a fixed instruction length for
executing the pipeline.
5. It uses LOAD and STORE instructions to access memory locations.
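The load/store discipline in item 5 can be sketched with a toy register machine; the register names, addresses, and helper functions here are illustrative, not any real ISA.

```python
# Toy sketch of the RISC load/store discipline: arithmetic works only on
# registers, and memory is touched solely by LOAD and STORE.
memory = {0x10: 7, 0x14: 5, 0x18: 0}
regs = {}

def load(reg, addr):            # LOAD: memory -> register
    regs[reg] = memory[addr]

def store(reg, addr):           # STORE: register -> memory
    memory[addr] = regs[reg]

def add(dst, src1, src2):       # ALU operation: registers only
    regs[dst] = regs[src1] + regs[src2]

load("r1", 0x10)
load("r2", 0x14)
add("r3", "r1", "r2")
store("r3", 0x18)
print(memory[0x18])  # -> 12
```

Note that the ALU never touches memory directly; this is exactly the separation the text describes.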
Advantages of RISC Processor
1. The RISC processor's performance is better due to the simple and limited
instruction set.
2. It requires fewer transistors, which makes it cheaper to design.
3. RISC frees up space on the microprocessor because of its simplicity.
Disadvantages of RISC Processor
1. The RISC processor's performance may vary according to the code executed
because subsequent instructions may depend on the previous instruction for
their execution in a cycle.
2. Programmers and compilers often rely on complex instructions, which RISC does not provide directly.
3. RISC processors require very fast memory to hold the many instructions, and
therefore need a large cache memory to respond to instructions in a
short time.
❖ CISC Characteristics
CISC stands for Complex Instruction Set Computer, developed by Intel. It
has a large collection of instructions, ranging from simple to very complex
and specialized, at the assembly-language level; these instructions can take a long time
to execute.
It emphasizes building complex instructions directly into the hardware, because
hardware is always faster than software. However, CISC chips are relatively slower than RISC chips, though a program needs fewer instructions than on RISC.
Examples of CISC processors are VAX, AMD, Intel x86 and the System/360.
Characteristics of CISC Processor
Following are the main characteristics of the CISC processor:
1. The length of the code is short, so it requires very little RAM.
2. CISC or complex instructions may take longer than a single clock cycle to
execute.
3. Fewer instructions are needed to write an application.
4. It provides easier programming in assembly language.
5. It supports complex data structures and easy compilation of high-level
languages.
6. It is composed of fewer registers and more addressing modes, typically 5 to 20.
7. Instructions can be larger than a single word.
CISC Processors Architecture
The CISC architecture helps reduce program code by embedding multiple
operations on each program instruction, which makes the CISC processor more
complex.
[Figure: CISC Architecture: control unit, instruction and data memory]

Advantages of CISC Processors
1. The compiler requires little effort to translate high-level programs or statement
languages into assembly or machine language in CISC processors.
2. The code length is quite short, which minimizes the memory requirement.
3. Storing the instructions on a CISC machine requires very little RAM.
4. Execution of a single instruction requires several low-level tasks.
5. CISC creates a process to manage power usage that adjusts clock speed and
voltage.
6. It uses fewer instructions to perform the same task as a RISC processor.
Disadvantages of CISC Processors
1. CISC chips are slower than RISC chips in executing each instruction cycle of a
program.
2. The performance of the machine decreases due to the slower clock
speed.
3. Pipelining in a CISC processor is complicated to implement.
4. CISC chips require more transistors compared to a RISC design.
5. Typically only about 20% of the existing instructions are used in a programming event.
❖ Parallel Processing
A parallel processing system can carry out simultaneous data-processing to achieve
faster execution time. For instance, while an instruction is being processed in the
ALU component of the CPU, the next instruction can be read from memory.
The primary purpose of parallel processing is to enhance the computer processing
capability and increase its throughput, i.e., the amount of processing that can be
accomplished during a given interval of time.
A parallel processing system can be achieved by having a multiplicity of functional
units that perform identical or different operations simultaneously. The data can be
distributed among the various functional units.

[Figure: Processor with multiple functional units: adder-subtractor, integer multiplier, logic and shift units, incrementer, and floating-point add-subtract, multiply, and divide units]
+ The adder and integer multiplier perform arithmetic operations with integer
numbers.
+ The floating-point operations are separated into three circuits operating in
parallel.
* The logic, shift, and increment operations can be performed concurrently on
different data. All units are independent of each other, so one number can be
shifted while another number is being incremented.
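As a rough software sketch (not hardware-accurate), independent functional units working on different data at the same time can be mimicked with a thread pool; the unit names below are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: two independent "functional units" operate on different data at once.
def integer_add(a, b):      # stands in for the integer adder unit
    return a + b

def shifter(x, n):          # stands in for the shift unit
    return x << n

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(integer_add, 3, 4)   # one number is added...
    f2 = pool.submit(shifter, 8, 2)       # ...while another is shifted
    print(f1.result(), f2.result())       # -> 7 32
```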
❖ Pipelining
Pipelining is a computer processor design technique used to enhance instruction
throughput by allowing multiple instructions to be in various stages of execution
simultaneously. It divides the processing of instructions into a series of discrete
stages, each handled by a different segment of the processor's circuitry.
The registers provide isolation between each segment so that each can operate on
distinct data simultaneously.
The structure of a pipeline organization can be represented simply by including an
input register for each segment followed by a combinational circuit.

Pipeline Processing:

[Figure: Pipeline with registers R1-R4 and combinational circuits]
Registers R1, R2, R3, and R4 hold the data and the combinational circuits operate in
a particular segment.
The output generated by the combinational circuit in a given segment is applied as
an input to the register of the next segment. For instance, from the block diagram, we can
see that register R3 is used as one of the input registers for the combinational
adder circuit.
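Assuming the diagram shows the classic two-segment arrangement that computes A_i * B_i + C_i (a common textbook example; the assumption is mine), its clocked register behavior can be sketched as follows.

```python
# Sketch of a two-segment pipeline computing A_i * B_i + C_i:
# R1/R2 feed a multiplier, R3 holds the product, R4 buffers C_i,
# and R3/R4 feed the combinational adder on the next clock.
A = [1, 2, 3]; B = [4, 5, 6]; C = [7, 8, 9]
R1 = R2 = R3 = R4 = None
results = []

for i in range(len(A) + 2):            # extra cycles drain the pipeline
    # Segment 2: the adder consumes the registers loaded on the previous cycle
    if R3 is not None:
        results.append(R3 + R4)
    # Segment 1: multiplier output and C_i are clocked into R3, R4
    if R1 is not None:
        R3, R4 = R1 * R2, C[i - 1]
    else:
        R3 = R4 = None
    # new operands are clocked into R1, R2 while earlier ones move forward
    R1, R2 = (A[i], B[i]) if i < len(A) else (None, None)

print(results)  # -> [11, 18, 27]
```

Each loop iteration is one clock tick: every segment works on a different operand pair at the same time, which is the point of the register isolation described above.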
Arithmetic Pipeline
Arithmetic Pipelines are mostly used in high-speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
To understand the concepts of arithmetic pipeline in a more convenient way, let us
consider an example of a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point
binary numbers defined as:

X = A × 2^a
Y = B × 2^b

where A and B are two fractions that represent the mantissas and a and b are the
exponents. (For the worked example below, decimal numbers are used:
X = 0.9504 × 10^3 and Y = 0.8200 × 10^2.)

The combined operation of floating-point addition and subtraction is divided into four
segments. Each segment contains the corresponding suboperation to be performed
in the given pipeline. The suboperations shown in the four segments are:
1. Compare the exponents by subtraction.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
Note: Registers are placed after each suboperation to store the intermediate
results.

Pipeline organization for floating-point addition and subtraction:

[Figure: Four-segment pipeline: Segment 1 compares the exponents by subtraction; Segment 2 chooses the exponent and aligns the mantissas; Segment 3 adds or subtracts the mantissas; Segment 4 adjusts the exponent and normalizes the result]
1. Compare exponents by subtraction:
The exponents are compared by subtracting them to determine their difference. The
larger exponent is chosen as the exponent of the result.
The difference of the exponents, i.e., 3 - 2 = 1, determines how many times the
mantissa associated with the smaller exponent must be shifted to the right.
2. Align the mantissas:
The mantissa associated with the smaller exponent is shifted according to the
difference of exponents determined in segment one:

X = 0.9504 × 10^3
Y = 0.08200 × 10^3
3. Add mantissas:
The two mantissas are added in segment three.
Z = X + Y = 1.0324 × 10^3
4. Normalize the result:
After normalization, the result is written as:

Z = 0.10324 × 10^4
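The four suboperations above can be sketched as a single Python function. This is a simplified decimal model that reproduces the worked example; the leading-zero normalization case is omitted, and the function name is illustrative.

```python
# Decimal sketch of the four-segment floating-point addition pipeline.
def fp_add(a_man, a_exp, b_man, b_exp):
    # Segment 1: compare exponents by subtraction; keep the larger exponent
    diff = a_exp - b_exp
    if diff < 0:
        a_man, a_exp, b_man, b_exp = b_man, b_exp, a_man, a_exp
        diff = -diff
    # Segment 2: align the mantissa with the smaller exponent
    b_man = b_man / (10 ** diff)
    # Segment 3: add the mantissas
    z_man, z_exp = a_man + b_man, a_exp
    # Segment 4: normalize (mantissa overflow case only; leading zeros omitted)
    while abs(z_man) >= 1.0:
        z_man /= 10
        z_exp += 1
    return z_man, z_exp

mantissa, exponent = fp_add(0.9504, 3, 0.8200, 2)
print(mantissa, exponent)   # approximately 0.10324, 4
```

In the real pipeline these four steps run in separate segments on different operand pairs simultaneously; here they are shown sequentially for one pair.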
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction
stream as well.
Most digital computers with complex instructions require an instruction pipeline to
carry out operations such as instruction fetch, decode, and execute.
The organization of an instruction pipeline will be more efficient if the instruction
cycle is divided into segments of equal duration. One of the most common examples
of this type of organization is a Four-segment instruction pipeline.
A four-segment instruction pipeline combines two or more different segments into
a single one. For instance, the decoding of the instruction can be
combined with the calculation of the effective address into one segment.

[Figure: Four-segment instruction pipeline]
Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO)
buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and
eventually, the effective address is calculated in a separate arithmetic circuit.
Segment 3:
An operand from memory is fetched in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
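The timing of a k-segment pipeline follows a simple rule: the first instruction needs k cycles, and each of the remaining n-1 instructions completes one cycle later. A minimal sketch:

```python
# Timing sketch for a k-segment pipeline processing n instructions.
def pipeline_cycles(k, n):
    # first instruction fills the pipeline (k cycles), the rest finish one per cycle
    return k + n - 1

def speedup(k, n):
    # non-pipelined execution takes n * k cycles
    return (n * k) / pipeline_cycles(k, n)

print(pipeline_cycles(4, 100))    # -> 103
print(round(speedup(4, 100), 2))  # -> 3.88 (approaches k as n grows)
```

For large n the speedup approaches k, which is why dividing the instruction cycle into segments of equal duration matters: the slowest segment sets the clock.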
❖ RISC Pipeline
RISC stands for Reduced Instruction Set Computers. It was introduced to execute
as fast as one instruction per clock cycle. This RISC pipeline helps to simplify the
computer architecture's design.
The main benefit of RISC, executing instructions at the rate of one per clock cycle,
is not always achievable, because not every instruction can be fetched from
memory and executed in one clock cycle under all circumstances.
Principles of the RISC Pipeline
The principles of the RISC pipeline are as follows:
+ Keep the most frequently accessed operands in CPU registers.
+ It can minimize the register-to-memory operations.
+ It can use a high number of registers to enhance operand referencing and
decrease the processor-memory traffic.
+ It can optimize the design of instruction pipelines such that minimum compiler
code generation can be achieved.
+ It can use a simplified instruction set and leave out complex and
unnecessary instructions.
Let us consider a three-segment instruction pipeline that shows how a compiler can
optimize the machine language program to compensate for pipeline conflicts.
A frequent collection of instructions for a RISC processor falls into three types,
as follows:
+ Data Manipulation Instructions - Manage the data in processor registers.
+ Data Transfer Instructions — These are load and store instructions that use an
effective address that is obtained by adding the contents of two registers or a
register and a displacement constant provided in the instruction.
+ Program Control Instructions - These instructions use register values and a
constant to evaluate the branch address, which is transferred to a register or the
program counter (PC).
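One pipeline conflict a compiler must compensate for is the delayed load: an instruction that reads the register the immediately preceding load writes. A hedged sketch of NOP insertion is below; the instruction tuples, register names, and addresses are illustrative, not a real ISA.

```python
# Sketch of delayed-load scheduling: if an instruction reads a register that
# the immediately preceding LOAD writes, insert a NOP (or an unrelated
# instruction) to fill the one-cycle delay slot.
def schedule(prog):
    out = []
    for ins in prog:
        prev = out[-1] if out else None
        # hazard: previous op is a LOAD whose destination is a source here
        if prev and prev[0] == "LOAD" and prev[1] in ins[2:]:
            out.append(("NOP",))
        out.append(ins)
    return out

prog = [("LOAD", "r1", "addr_a"),
        ("LOAD", "r2", "addr_b"),
        ("ADD",  "r3", "r1", "r2")]
for ins in schedule(prog):
    print(ins)   # a NOP appears between the second LOAD and the ADD
```

A smarter compiler would move an unrelated instruction into the delay slot instead of a NOP, recovering the lost cycle.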
❖ Vector Processing
Vector processing is a technique in which the central processing unit operates on
a complete vector of input data with a single instruction. It is a complete unit of
hardware resources that processes a sequential set of similar data elements in
memory using a single instruction. It is a form of parallel computing.

[Figure: Functional diagram of a vector computer: main memory, vector controller, and vector processor]
Here are some key characteristics related to vector processing:
1. Vector Instructions:
* Vector processors use specialized vector instructions that perform the same
operation on multiple data elements simultaneously. These instructions are
designed to take advantage of the inherent parallelism in vectorized data.
2. Vector Registers:
+ Vector processors have dedicated vector registers that store the vector
operands. These registers can hold multiple data elements, and vector
instructions operate on these registers in a single instruction.
3. SIMD (Single Instruction, Multiple Data):
+ Vector processing follows the SIMD paradigm, where a single instruction is
executed on multiple data elements at the same time. This contrasts with
traditional scalar processors, which operate on one data element per
instruction.
4. Vector Length:
+ The vector length determines the number of elements that can be processed
in parallel by a single vector instruction. Longer vectors result in higher
throughput but may require more resources.
5. Parallelism and Throughput:
+ Vector processing is designed to exploit parallelism in data processing. By
performing the same operation on multiple data elements simultaneously,
vector processors can achieve high throughput for certain types of
computations, such as numerical simulations, scientific computing, and
multimedia processing.
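The SIMD idea of one operation applied across a whole vector can be sketched in plain Python; a real vector processor (or a library such as NumPy) would perform this in hardware or in optimized loops rather than element by element.

```python
# SIMD-flavoured sketch: one conceptual "vector add" applies the same
# operation to every element, instead of a scalar loop issuing one
# instruction per element.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]   # one vector op, many elements

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(vector_add(a, b))  # -> [11.0, 22.0, 33.0, 44.0]
```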
❖ Array Processor
A processor that performs computations on a vast array of data is known as an array
processor. Multiprocessors and vector processors are other terms for array
processors. It only executes one instruction at a time on an array of data.
[Figure: Types of array processors: Attached Array Processor and SIMD Array Processor]
Attached Array Processor
The attached array processor is the auxiliary processor connected to a general-
purpose computer to enhance and improve the machine's performance in numerical
computational tasks.
The attached array processor includes a common processor with an input/output
interface and a local memory interface.
The main memory and the local memory are linked.

[Figure: The interconnection of an Attached Array Processor to a host computer (general-purpose computer, attached array processor, main memory, and local memory)]

SIMD Array Processor
SIMD refers to the organization of a single computer with multiple parallel
processors. The processing units are designed to work together under the
supervision of a single control unit, resulting in a single instruction stream and
multiple data streams.
An array processor's general block diagram is given below. It comprises several
identical processing elements (PEs), each with its local memory M. An ALU and
registers are included in each processor element. The master control unit controls
the processing elements’ actions. It also decodes instructions and determines how
they should be carried out.
The program is stored in the main memory. The control unit retrieves the
instructions. Vector instructions are sent to all PEs simultaneously, and the results
are stored in memory.
Usage of Array Processors
* Array processors enhance the total speed of instruction processing.
+ Most array processors are designed to optimize performance for repetitive arithmetic
operations, making them faster at vector arithmetic than the host CPU. Since most
array processors run asynchronously from the host CPU, the system's overall
capacity is thus improved.
* Array processors have their own local memory, providing extra
memory to systems with limited memory. This is an essential consideration for
systems with a limited physical memory or address space.

Applications of Array Processors
Array processing is used in various places, including:
+ Radar Systems
* Sonar Systems
* Anti-jamming
* Seismic Exploration
* Wireless communication
+ Medical applications
❖ Characteristics of Multiprocessors
A multiprocessor is a computer system in which two or more central processing units
(CPUs) share full access to a common RAM. The main objective of using a
multiprocessor is to boost the system's execution speed.
There are two types of multiprocessors: shared memory multiprocessors and
distributed memory multiprocessors. In a shared memory
multiprocessor, all the CPUs share the common memory, but in a distributed
memory multiprocessor, every CPU has its own private memory.
The major characteristics of multiprocessors are as follows:
+ Parallel Computing - This involves the simultaneous application of multiple
processors. These processors are developed using a single architecture to
execute a common task. In general, processors are identical and they work
together in such a way that the users are under the impression that they are the
only users of the system. In reality, however, many users are accessing the
system at a given time.
* Distributed Computing - This involves the usage of a network of processors.
Each processor in this network can be considered a computer in its own right
and has the capability to solve a problem. These processors are
heterogeneous, and generally, one task is allocated to a single processor.
+ Supercomputing - This involves the usage of the fastest machines to resolve
big and computationally complex problems. In the past, supercomputing machines were vector computers, but at present, vector or parallel computing is
accepted by most people.
+ Pipelining - This is a method wherein a specific task is divided into several
subtasks that must be performed in a sequence. The functional units help in
performing each subtask. The units are attached serially and all the units work
simultaneously.
+ Vector Computing - It involves the usage of vector processors, wherein
operations such as ‘multiplication’ are divided into many steps and are then
applied to a stream of operands (‘vectors’).
+ Systolic - This is similar to pipelining, but units are not arranged in a linear
order. The steps in systolic are normally small and more in number and
performed in a lockstep manner. This is more frequently applied in special-
purpose hardware such as image or signal processors.
❖ Interconnection Structures
In a multiprocessor system, the processors must be able to share a set of main
memory modules and I/O devices. This sharing capability is provided through
interconnection structures. The interconnection structures that are commonly used
are as follows:
1. Time-shared / Common Bus
2. Cross bar Switch
3. Multiport Memory
4. Multistage Switching Network
Time-shared / Common Bus (Interconnection structure in Multiprocessor System)
In a multiprocessor system, the time-shared bus interconnection provides a common
communication path connecting all the functional units, such as the processors, I/O
processor, and memory unit. The figure below shows multiple processors with a common
communication path (single bus).

[Figure: Single-Bus Multiprocessor Organization]
To communicate with any functional unit, a processor needs the bus to transfer
data. To do so, the processor first needs to check whether the bus is available
by examining its status: if the bus is being used by some other functional unit,
the status is busy; otherwise it is free.
A processor can use bus only when the bus is free. The sender processor puts the
address of the destination on the bus & the destination unit identifies it.
In order to communicate with any functional unit, a command is issued to tell that
unit what work is to be done. The other processors at that time will either be busy with
internal operations or will sit idle, waiting for the bus.
We can use a bus controller to resolve conflicts, if any. (Bus controller can set
priority of different functional units)
This single-bus multiprocessor organization is simple and the easiest to reconfigure.
The interconnection structure contains only passive elements; the bus interfaces of
the sender and receiver units control the transfer operation.
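The busy/free status check can be sketched with a lock standing in for the bus-busy line: only the unit currently holding the lock may drive the bus. The names and addresses here are illustrative.

```python
import threading

# Sketch of a time-shared bus: a single lock plays the role of the
# bus-busy status; a unit transfers only after acquiring the bus.
bus = threading.Lock()
shared_memory = {}

def transfer(src, dest_addr, value):
    with bus:                            # wait until the bus status is "free"
        shared_memory[dest_addr] = value # only one unit drives the bus at a time

threads = [threading.Thread(target=transfer, args=(i, i, i * 10))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(shared_memory.items()))  # -> [(0, 0), (1, 10), (2, 20), (3, 30)]
```

The serialization through the lock is exactly the decreased-throughput disadvantage listed above: transfers happen one at a time.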
Advantages —
* Inexpensive, as no extra hardware such as a switch is required.
+ Simple and easy to configure, as the functional units are directly connected to the
bus.
Disadvantages -
+ The major drawback of this configuration is that if a malfunction occurs in any of
the bus interface circuits, the complete system will fail.
+ Decreased throughput - at a time, only one processor can communicate with
any other functional unit.
❖ Interprocessor Arbitration
A system bus connects the CPU, the input-output processor, and the memory (the
main components of a computer). Only one of these components can be granted the
bus at a time, so an appropriate priority-resolving mechanism is required to decide
which unit should get control of the bus. The mechanism that handles such
competing requests for the bus is known as Interprocessor Arbitration.
Static Arbitration Techniques
In this technique, the assigned priorities are fixed. It has two types: serial
arbitration procedures and parallel arbitration logic.
Serial Arbitration
[Figure: Serial (daisy-chain) arbitration: bus arbiters chained from highest to lowest priority through PI (priority in) and PO (priority out) lines, sharing a common bus busy line]
It is also known as daisy-chain arbitration, obtained by the daisy-chain
connection of bus arbitration circuits. The scheme gets its name from the structure of
the grant line, which chains through each device from the highest to the lowest priority.
The highest-priority device passes the grant line to the lower-priority device only if it
does not want the bus; the grant is then forwarded to the next device in the sequence. All
devices use the same line for bus requests. If the busy bus line returns to its idle state,
the highest-priority requesting arbiter enables the busy line, and its corresponding processor
can then run the required bus transfer.
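The grant-passing logic can be sketched as a scan from the highest-priority arbiter downward; this is a minimal software model of the chain, with the device list ordered by priority as an assumed convention.

```python
# Daisy-chain sketch: the grant enters at the highest-priority arbiter and
# is passed down the chain until a requesting device keeps it.
def daisy_chain_grant(requests):
    """requests[0] is the highest-priority device; returns index granted."""
    for i, wants_bus in enumerate(requests):
        if wants_bus:
            return i        # this device keeps the grant
    return None             # grant falls off the end of the chain

print(daisy_chain_grant([False, True, True]))  # -> 1
```

Note how the last device is never considered while device 1 keeps requesting; that is the fairness problem listed under the disadvantages.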
Advantages
i) It is a simple design.
ii) Fewer control lines are used.
Disadvantages
i) Priority depends on the physical location of the device.
ii) Propagation delay due to the serial granting of the bus.
iii) Failure of one of the devices may fail the entire system.
iv) Fairness cannot be assured: a low-priority device may be locked out indefinitely.
Parallel Arbitration
[Figure: Parallel arbitration: bus arbiters with bus request (BREQ) and bus grant (BPRN) lines connected through a priority encoder and decoder, sharing a bus busy line]
It uses an external priority encoder and decoder. Each bus arbiter has a bus request
output line and a bus acknowledge input line, and each arbiter enables its request line
when its processor is requesting the system bus. The arbiter with the highest priority,
as determined by the output of the decoder, gets access to the bus.
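The priority encoder at the heart of parallel arbitration can be sketched as below; treating bit i of the request word as arbiter i, with lower i meaning higher priority, is an assumed convention for this sketch.

```python
# Parallel-arbitration sketch: all request lines are sampled at once and a
# priority encoder picks the highest-priority requester in a single step.
def priority_encode(request_lines):
    """Bit i set means arbiter i requests the bus; lower i = higher priority."""
    for i in range(request_lines.bit_length()):
        if request_lines & (1 << i):
            return i
    return None              # no requests pending

print(priority_encode(0b0110))  # arbiters 1 and 2 request -> 1 wins
```

Unlike the daisy chain, every request is examined in parallel, so the grant delay does not grow with the number of devices.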
Dynamic Arbitration Techniques
Serial and parallel bus arbitration are static, since the assigned priorities are fixed. In
dynamic arbitration, the priorities of the system change while the system is in operation.
The various algorithms used are:
+ Time Slice
+ Polling
❖ Cache Coherence
A cache is hardware or software that is used to store something, usually data,
temporarily in a computing environment.
A cache coherence issue results from the concurrent operation of several
processors and the possibility that various caches may hold different versions of the
identical memory block.
The cache coherence problem is the issue that arises when several copies of
the same data are kept at various levels of memory.
Cache coherence has three different levels:
+ Each writing operation seems to happen instantly.
+ Each operand's value changes are seen in every processor in precisely the
same order.
+ Non-coherent behavior results from many processors interpreting the same
action in various ways.

[Figure: Multiple processors, each with its own cache, sharing main memory]
Methods to resolve Cache Coherence
The two methods listed below can be used to resolve the cache coherence issue:
* Write Through
* Write Back
Write Through
The easiest and most popular method is write-through. Every memory write
operation updates the main memory. If the word is present in the cache memory at
the requested address, the cache memory is also updated simultaneously with the
main memory.
The benefit of this approach is that the RAM and cache always hold the same
information. In systems with direct memory access (DMA) transfer, this quality is crucial. It
makes sure the information in the main memory is up to date at all times, so that a
device communicating over DMA can access the most recent information.
Advantage - It provides the highest level of consistency.
Disadvantage - It requires a greater number of memory accesses.
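A minimal sketch of the write-through policy, using dictionaries for the cache and main memory; the addresses and values are illustrative.

```python
# Write-through sketch: every write updates main memory, and the cache
# copy (if present) is updated in the same operation, so the two never
# disagree.
cache, main_memory = {}, {}

def write_through(addr, value):
    main_memory[addr] = value          # memory is always updated
    if addr in cache:
        cache[addr] = value            # cache updated only on a hit

cache[0x1000] = 1                      # pretend this block is already cached
write_through(0x1000, 42)              # hit: both copies updated
write_through(0x2000, 7)               # miss: memory only
print(cache[0x1000], main_memory[0x1000], main_memory[0x2000])  # -> 42 42 7
```

Every call touches main memory, which is the extra-access cost named in the disadvantage above.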
Write Back
Only the cache location is changed during a write operation in this approach. When
the word is removed from the cache, the location is flagged so that it is copied back
into the main memory. The write-back approach was developed because words may be
updated numerous times while they are in the cache. However, as long as they remain
there, it does not matter whether the copy stored in the main memory is
outdated, because requests for words are fulfilled from the cache.
An accurate copy must only be transferred back to the main memory when the word
is separated from the cache. According to the analytical findings, between 10% and
30% of all memory references in a normal program are written into memory.
Advantage - A very small number of memory accesses and write operations.
Disadvantage - Inconsistency may occur in this approach.
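For comparison, a minimal sketch of the write-back policy with a dirty flag: memory is written only when the flagged word leaves the cache. Names and addresses are illustrative.

```python
# Write-back sketch: writes touch only the cache and set a dirty flag;
# main memory is updated once, when the block is evicted.
cache = {}          # addr -> (value, dirty)
main_memory = {}

def write_back(addr, value):
    cache[addr] = (value, True)        # mark the block dirty, skip memory

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:
        main_memory[addr] = value      # single memory write at eviction

write_back(0x1000, 1)
write_back(0x1000, 2)                  # updated twice, memory untouched
print(main_memory.get(0x1000))         # -> None (memory is stale)
evict(0x1000)
print(main_memory[0x1000])             # -> 2
```

The `None` printed before eviction is the inconsistency window named in the disadvantage: until the block is written back, main memory holds stale data.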
The important terms related to the data or information stored in the cache as well as
in the main memory are as follows:
+ Modified - The modified term signifies that the data stored in the cache and
main memory are different. This means the data in the cache has been modified,
and the changes need to be reflected in the main memory.
+ Exclusive - The exclusive term signifies that the data is clean, i.e., the cache
and the main memory hold identical data.
+ Shared - Shared refers to the fact that the cache value contains the most
current data copy, which is then shared across the whole cache as well as main
memory.
* Owned - The owned term indicates that the block is currently held by the cache
and that it has acquired ownership of it, i.e., complete privileges to that specific
block.
* Invalid - When a cache block is marked as invalid, it means that it needs to be
fetched from another cache or main memory.
Below is a list of the different cache coherence protocols used in multiprocessor
systems:
* MSI protocol (Modified, Shared, Invalid)
* MOSI protocol (Modified, Owned, Shared, Invalid)
* MESI protocol (Modified, Exclusive, Shared, Invalid)
* MOESI protocol (Modified, Owned, Exclusive, Shared, Invalid)