UNIT 5

RISC Characteristics

RISC stands for Reduced Instruction Set Computer. The main idea is to simplify the hardware by using an instruction set composed of a few basic operations for loading, evaluating, and storing data: a load instruction loads data from memory, and a store instruction stores it back. Each instruction is designed to complete in one clock cycle, which covers the fetch, decode, and execute steps. Examples of RISC processors are Sun's SPARC, PowerPC, Microchip PIC processors, and RISC-V.

RISC Architecture

It is a highly customized set of instructions used in portable devices where system reliability matters, such as the Apple iPod and mobiles/smartphones.

[Figure: RISC Architecture]

Features of RISC Processor
1. One-cycle execution time: RISC processors aim for one CPI (clock cycle per instruction), and each cycle includes the fetch, decode, and execute steps of the instruction.
2. Pipelining technique: the pipelining technique is used in RISC processors to execute multiple parts or stages of instructions concurrently, so the processor performs more efficiently.
3. A large number of registers: RISC processors are provided with many registers that can store operands, respond quickly, and minimize interaction with main memory.
4. It supports simple addressing modes and a fixed instruction length, which eases pipelining.
5. It uses LOAD and STORE instructions to access memory locations.

Advantages of RISC Processor
1. Performance is better due to the simple and limited instruction set.
2. It requires fewer transistors, which makes it cheaper to design.
3. The simplicity of the instructions frees space on the microprocessor for additional registers or functional units.

Disadvantages of RISC Processor
1. Performance may vary with the code executed, because subsequent instructions may depend on a previous instruction for their execution in a cycle.
2. Programmers and compilers must build complex operations out of many simple instructions, which makes code generation harder.
3. 
RISC processors require very fast memory: a large cache is needed so that instructions can be fetched and responded to in a short time.

CISC Characteristics

CISC stands for Complex Instruction Set Computer, developed by Intel. It has a large collection of instructions, ranging from simple to very complex and specialized at the assembly-language level, which can take many clock cycles to execute. It emphasizes building complex instructions directly into the hardware, on the reasoning that hardware is always faster than software. CISC chips are relatively slower per instruction than RISC chips, but a program needs fewer instructions than on RISC. Examples of CISC processors are the VAX, AMD and Intel x86 processors, and the System/360.

Characteristics of CISC Processor

Following are the main characteristics of the CISC processor:
1. The code length is short, so it requires very little RAM.
2. Complex instructions may take longer than a single clock cycle to execute.
3. Fewer instructions are needed to write an application.
4. It provides easier programming in assembly language.
5. It supports complex data structures and easy compilation of high-level languages.
6. It is composed of fewer registers and more addressing modes, typically 5 to 20.
7. Instructions can be larger than a single word.

CISC Processor Architecture

The CISC architecture helps reduce program code by embedding multiple operations in each program instruction, which makes the CISC processor more complex.

[Figure: CISC Architecture — control unit, instruction and data path, memory]

Advantages of CISC Processors
1. The compiler requires little effort to translate high-level programs or statements into assembly or machine language.
2. The code length is quite short, which minimizes the memory requirement.
3. Very little RAM is needed to store the instructions.
4. Execution of a single instruction accomplishes several low-level tasks.
5. 
CISC provides power-management features that adjust clock speed and voltage.
6. It needs fewer instructions than RISC to perform the same task.

Disadvantages of CISC Processors
1. CISC chips are slower than RISC chips in clock cycles per instruction.
2. Machine performance decreases because of the slower clock speed.
3. Pipelining is complicated to implement on a CISC processor.
4. CISC chips require more transistors than a RISC design.
5. Typically only about 20% of the existing instructions are used in practice.

Parallel Processing

A parallel processing system can carry out simultaneous data processing to achieve faster execution time. For instance, while one instruction is being processed in the ALU of the CPU, the next instruction can be read from memory. The primary purpose of parallel processing is to enhance the computer's processing capability and increase its throughput, i.e., the amount of processing that can be accomplished during a given interval of time. Parallel processing can be achieved with a multiplicity of functional units that perform identical or different operations simultaneously, with the data distributed among the multiple functional units.

[Figure: processor with multiple functional units — adder-subtractor, integer multiplier, logic and shift units, and separate floating-point add-subtract, multiply, and divide units]

* The adder and integer multiplier perform arithmetic operations on integer numbers.
* The floating-point operations are separated into three circuits operating in parallel.
* The logic, shift, and increment operations can be performed concurrently on different data. All units are independent of each other, so one number can be shifted while another number is being incremented.

Pipelining

Pipelining is a processor design technique used to enhance instruction throughput by allowing multiple instructions to be in different stages of execution simultaneously.
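The benefit of this overlapped execution can be illustrated with a rough cycle count. The following is a minimal sketch, assuming an idealized pipeline with one cycle per stage and no stalls; the function names are illustrative, not from any real simulator.

```python
# Illustrative sketch: clock cycles needed to run n instructions
# through a k-stage pipeline, versus a non-pipelined processor.
# Assumes one cycle per stage and no stalls or hazards.

def pipelined_cycles(k: int, n: int) -> int:
    # The first instruction takes k cycles; each later instruction
    # completes one cycle after its predecessor.
    return k + (n - 1)

def non_pipelined_cycles(k: int, n: int) -> int:
    # Each instruction occupies all k stages before the next starts.
    return k * n

k, n = 4, 100
print(pipelined_cycles(k, n))      # 103 cycles
print(non_pipelined_cycles(k, n))  # 400 cycles
```

For large n the speedup approaches k, the number of pipeline stages, which is why deeper pipelines (absent stalls) raise throughput.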
Pipelining divides the processing of instructions into a series of discrete stages, each handled by a different segment of the processor's circuitry. Registers provide isolation between segments so that each segment can operate on distinct data simultaneously. The structure of a pipeline organization can be represented simply as an input register for each segment followed by a combinational circuit.

Pipeline Processing

Registers R1, R2, R3, and R4 hold the data, and the combinational circuits operate within their particular segments. The output generated by the combinational circuit in a given segment is applied to the input register of the next segment. For instance, in the block diagram, register R3 is used as one of the input registers of the combinational adder circuit.

Arithmetic Pipeline

Arithmetic pipelines are mostly used in high-speed computers. They implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems. To understand the concept of the arithmetic pipeline in a convenient way, consider a pipeline unit for floating-point addition and subtraction. The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers of the form X = A × 2^a and Y = B × 2^b, where A and B are fractions that represent the mantissas and a and b are the exponents. For illustration, take the decimal values:

X = 0.9504 × 10^3
Y = 0.8200 × 10^2

The combined operation of floating-point addition and subtraction is divided into four segments, each containing the corresponding suboperation to be performed in the pipeline. The suboperations in the four segments are:
1. Compare the exponents by subtraction.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

Note: Registers are placed after each suboperation to store the intermediate results.
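The four suboperations above can be sketched in code. This is an illustrative model on decimal (mantissa, exponent) pairs, not a hardware description; a real pipeline performs these steps in binary and latches each intermediate result in an inter-segment register.

```python
# Sketch of the four-segment floating-point addition pipeline:
# numbers are modeled as (mantissa, exponent) pairs meaning
# mantissa * 10**exponent, with normalized mantissa < 1.

def fp_add(a, b):
    (ma, ea), (mb, eb) = a, b
    # Segment 1: compare exponents by subtraction;
    # the larger exponent becomes the result exponent.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    diff = ea - eb
    # Segment 2: align the mantissa of the smaller exponent
    # by shifting it right `diff` digit positions.
    mb = mb / (10 ** diff)
    # Segment 3: add the mantissas.
    mz, ez = ma + mb, ea
    # Segment 4: normalize the result (mantissa must be < 1).
    while mz >= 1.0:
        mz /= 10.0
        ez += 1
    return round(mz, 6), ez

print(fp_add((0.9504, 3), (0.8200, 2)))  # (0.10324, 4)
```

The printed result matches the worked example in the next section: the exponents differ by 1, 0.8200 is aligned to 0.08200, the sum 1.0324 × 10^3 is then normalized to 0.10324 × 10^4.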
Pipeline organization for floating-point addition and subtraction:

[Figure: four-segment pipeline — compare exponents, choose exponent and align mantissas, add or subtract mantissas, adjust exponent and normalize result]

1. Compare exponents by subtraction: the exponents are compared by subtracting them to determine their difference, and the larger exponent is chosen as the exponent of the result. The difference of the exponents, 3 − 2 = 1, determines how many times the mantissa associated with the smaller exponent must be shifted to the right.
2. Align the mantissas: the mantissa associated with the smaller exponent is shifted right according to the difference found in segment one:
   X = 0.9504 × 10^3
   Y = 0.08200 × 10^3
3. Add mantissas: the two mantissas are added in segment three:
   Z = X + Y = 1.0324 × 10^3
4. Normalize the result: after normalization, the result is written as:
   Z = 0.10324 × 10^4

Instruction Pipeline

Pipeline processing can occur not only in the data stream but in the instruction stream as well. Most digital computers with complex instructions require an instruction pipeline to carry out operations such as fetching, decoding, and executing instructions. The organization of an instruction pipeline is more efficient if the instruction cycle is divided into segments of equal duration. One of the most common examples of this type of organization is the four-segment instruction pipeline, in which two or more smaller steps are combined into a single segment. For instance, the decoding of the instruction can be combined with the calculation of the effective address into one segment.

Segment 1: The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.
Segment 2: The instruction fetched from memory is decoded in the second segment, and the effective address is calculated in a separate arithmetic circuit.
Segment 3: An operand is fetched from memory in the third segment.
Segment 4: The instruction is finally executed in the last segment of the pipeline organization.

RISC Pipeline

RISC stands for Reduced Instruction Set Computer. It was introduced to execute as fast as one instruction per clock cycle, and the RISC pipeline helps simplify the design of the computer architecture. The goal of one instruction per clock cycle is not always achievable, because not every instruction can be fetched from memory and executed in one clock cycle under all circumstances.

Principles of the RISC Pipeline

There are various principles of the RISC pipeline, which are as follows:
* Keep the most frequently accessed operands in CPU registers.
* Minimize register-to-memory operations: use a large number of registers to enhance operand referencing and decrease processor-memory traffic.
* Optimize the design of the instruction pipeline so that minimum compiler code generation can be achieved.
* Use a simplified instruction set and leave out complex and unnecessary instructions.

Consider a three-segment instruction pipeline, which shows how a compiler can optimize the machine-language program to compensate for pipeline conflicts. A frequent collection of instructions for a RISC processor falls into three types:
* Data manipulation instructions: manage the data in processor registers.
* Data transfer instructions: load and store instructions that use an effective address obtained by adding the contents of two registers, or a register and a displacement constant provided in the instruction.
* Program control instructions: use register values and a constant to evaluate the branch address, which is transferred to a register or the program counter (PC).

Vector Processing

Vector processing is performed by a central processing unit that can operate on an entire vector with a single instruction.
It is a complete unit of hardware resources that processes a sequential set of similar data elements in memory using a single instruction. It is a form of parallel computing.

[Figure: functional diagram of a vector computer — main memory, vector controller, vector processor]

Here are some key characteristics of vector processing:
1. Vector instructions: vector processors use specialized vector instructions that perform the same operation on multiple data elements simultaneously. These instructions are designed to take advantage of the inherent parallelism in vectorized data.
2. Vector registers: vector processors have dedicated vector registers that store the vector operands. These registers can hold multiple data elements, and vector instructions operate on these registers in a single instruction.
3. SIMD (Single Instruction, Multiple Data): vector processing follows the SIMD paradigm, in which a single instruction is executed on multiple data elements at the same time. This contrasts with traditional scalar processors, which operate on one data element per instruction.
4. Vector length: the vector length determines how many elements can be processed in parallel by a single vector instruction. Longer vectors result in higher throughput but may require more resources.
5. Parallelism and throughput: by performing the same operation on multiple data elements simultaneously, vector processors can achieve high throughput for certain types of computations, such as numerical simulations, scientific computing, and multimedia processing.

Array Processor

A processor that performs computations on a vast array of data is known as an array processor. Array processors are also referred to as multiprocessors or vector processors. An array processor executes one instruction at a time over an entire array of data.
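The SIMD contrast above — one "instruction" per element versus one per block of elements — can be sketched in plain Python. This is purely illustrative: real vector hardware applies the block operation in parallel lanes, whereas this sketch only models the instruction count.

```python
# Sketch of scalar vs. vector (SIMD-style) elementwise addition.
# In the vector version, each iteration of the outer loop stands
# for one vector instruction covering `vlen` elements.

def scalar_add(a, b):
    # One element per "instruction" (scalar processor).
    return [x + y for x, y in zip(a, b)]

def vector_add(a, b, vlen=4):
    # One "instruction" per vlen-element block (vector processor).
    out = []
    for i in range(0, len(a), vlen):
        out.extend(x + y for x, y in zip(a[i:i + vlen], b[i:i + vlen]))
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(vector_add(a, b))  # [11, 22, 33, 44, 55, 66, 77, 88]
```

With vlen = 4, the eight additions above issue as two block operations instead of eight scalar ones, which is the source of the throughput gain described in point 5.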
Array processors are of two types: the attached array processor and the SIMD array processor.

Attached Array Processor

The attached array processor is an auxiliary processor connected to a general-purpose computer to enhance and improve the machine's performance in numerical computational tasks. It includes a common processor with an input/output interface and a local memory interface; the main memory and the local memory are linked.

[Figure: interconnection of an attached array processor to a host computer — general-purpose computer with main memory, attached array processor with local memory]

SIMD Array Processor

SIMD refers to the organization of a single computer with multiple parallel processors. The processing units are designed to work together under the supervision of a single control unit, resulting in a single instruction stream and multiple data streams. An array processor comprises several identical processing elements (PEs), each with its own local memory M; an ALU and registers are included in each processing element. The master control unit controls the processing elements' actions: it decodes instructions and determines how they should be carried out. The program is stored in main memory, and the control unit retrieves the instructions. Vector instructions are broadcast to all PEs simultaneously, and the results are stored in memory.

Usage of Array Processors
* Array processors enhance the total speed of instruction processing.
* Most array processor designs are optimized for repetitive arithmetic operations, making them faster at vector arithmetic than the host CPU. Since most array processors run asynchronously from the host CPU, the system's overall capacity is improved.
* Array processors have their own local memory, providing additional memory to systems with limited memory. This is an essential consideration for systems with limited physical memory or address space.
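The SIMD array organization above can be sketched as a master control unit broadcasting one instruction to identical PEs, each applying it to its own local memory. The class and function names here are illustrative assumptions, not from any real machine description.

```python
# Sketch of a SIMD array processor: a single instruction stream
# (the broadcast) drives multiple data streams (each PE's local
# memory). Illustrative model only.

class PE:
    """One processing element with its own local memory M."""
    def __init__(self, local_data):
        self.local = list(local_data)

    def execute(self, op, operand):
        # Apply the broadcast operation to every local element.
        self.local = [op(x, operand) for x in self.local]

def broadcast(pes, op, operand):
    # The master control unit sends the same instruction to all PEs.
    for pe in pes:
        pe.execute(op, operand)

pes = [PE([1, 2]), PE([3, 4]), PE([5, 6])]
broadcast(pes, lambda x, c: x * c, 10)
print([pe.local for pe in pes])  # [[10, 20], [30, 40], [50, 60]]
```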
Applications of Array Processors

Array processing is used in various places, including:
* Radar systems
* Sonar systems
* Anti-jamming
* Seismic exploration
* Wireless communication
* Medical applications

Characteristics of Multiprocessors

A multiprocessor is a computer system in which two or more central processing units (CPUs) share full access to a common RAM. The main objective of using a multiprocessor is to boost the system's execution speed. There are two types of multiprocessors: the shared-memory multiprocessor and the distributed-memory multiprocessor. In a shared-memory multiprocessor, all the CPUs share the common memory; in a distributed-memory multiprocessor, every CPU has its own private memory.

The major characteristics of multiprocessors are as follows:
* Parallel computing: the simultaneous application of multiple processors, developed using a single architecture, to execute a common task. In general, the processors are identical and work together in such a way that each user is under the impression of being the only user of the system, while in reality many users access the system at a given time.
* Distributed computing: the use of a network of processors, each of which can be considered a computer in its own right with the capability to solve a problem. These processors are heterogeneous, and generally one task is allocated to a single processor.
* Supercomputing: the use of the fastest machines to resolve big and computationally complex problems. In the past, supercomputing machines were vector computers; at present, both vector and parallel computing are widely accepted.
* Pipelining: a method in which a specific task is divided into several subtasks that must be performed in a sequence, with a functional unit performing each subtask.
The units are attached serially and all the units work simultaneously.
* Vector computing: the use of vector processors, in which operations such as multiplication are divided into many steps and then applied to a stream of operands ("vectors").
* Systolic processing: similar to pipelining, but the units are not arranged in a linear order. The steps in a systolic design are normally small, more numerous, and performed in lockstep. This is most frequently applied in special-purpose hardware such as image or signal processors.

Interconnection Structures

In a multiprocessor system, the processors must be able to share a set of main memory modules and I/O devices. This sharing capability can be provided through interconnection structures. The commonly used interconnection structures are:
1. Time-shared / common bus
2. Crossbar switch
3. Multiport memory
4. Multistage switching network

Time-Shared / Common Bus

In a multiprocessor system, the time-shared bus interconnection provides a common communication path connecting all the functional units: processors, I/O processors, memory units, and so on.

[Figure: single-bus multiprocessor organization — multiple processors sharing one common bus]

To communicate with any functional unit, a processor needs the bus to transfer the data. It first checks whether the bus is available by examining its status: busy if some other functional unit is using it, free otherwise. A processor can use the bus only when it is free. The sender processor puts the address of the destination on the bus, and the destination unit identifies it. In order to communicate with any functional unit, a command is issued telling that unit what work is to be done.
The other processors at that time are either busy with internal operations or sit idle waiting for the bus. A bus controller can be used to resolve conflicts, if any, by setting priorities for the different functional units. This single-bus multiprocessor organization is simple and the easiest to reconfigure; the interconnection structure contains only passive elements, and the bus interfaces of the sender and receiver units control the transfer operation.

Advantages:
* Inexpensive, as no extra hardware such as a switch is required.
* Simple and easy to configure, since the functional units are directly connected to the bus.

Disadvantages:
* The major weakness of this configuration is that if a malfunction occurs in any of the bus interface circuits, the complete system fails.
* Decreased throughput: at any moment, only one processor can communicate with any other functional unit.

Interprocessor Arbitration

A system bus connects the CPU, the input/output processor, and memory (the main components of a computer). Disputes arise on the system bus because only one of these components can use it at a time, so an appropriate priority-resolving mechanism is required to decide which processor should get control of the bus. The mechanism that handles multiple simultaneous requests for the bus is known as interprocessor arbitration.

Static Arbitration Techniques

In static arbitration, the assigned priorities are fixed. There are two types: serial arbitration procedures and parallel arbitration logic.
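Both static schemes grant the bus to the highest-priority requester; only the hardware that finds that requester differs. A minimal sketch of the fixed-priority decision itself (the names and list encoding are illustrative assumptions):

```python
# Sketch of static (fixed-priority) bus arbitration: devices are
# ordered from highest to lowest priority, and the grant goes to
# the first device in that order that is requesting the bus —
# the same decision a daisy chain or priority encoder implements.

def arbitrate(requests):
    # requests[i] is True if device i (in priority order) wants the bus.
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device  # grant stops here; lower priorities wait
    return None  # no requests: the bus stays idle

print(arbitrate([False, True, True]))   # 1 — device 1 beats device 2
print(arbitrate([False, False, False])) # None
```

Note how a persistent request from device 0 would win every round: this is exactly the fairness problem (starvation of low-priority devices) listed under the disadvantages of serial arbitration below.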
Serial Arbitration

[Figure: serial (daisy-chain) arbitration — bus arbiters chained from highest priority to lowest priority, sharing the bus request and bus busy lines]

Serial arbitration is also known as daisy-chain arbitration. It is obtained by the daisy-chain connection of bus arbitration circuits. The scheme takes its name from the structure of the grant line, which chains through each device from the highest to the lowest priority: a higher-priority device passes the grant line on to the next, lower-priority device only if it does not itself want the bus. All devices use the same line for bus requests. When the bus-busy line returns to its idle state, the highest-priority requesting arbiter enables the busy line, and its corresponding processor can then run the required bus transfer.

Advantages:
i) It is a simple design.
ii) Fewer control lines are used.

Disadvantages:
i) Priority depends on the physical location of the device.
ii) Propagation delay arises from granting the bus serially.
iii) Failure of one of the devices may cause the entire system to fail.
iv) Fairness cannot be assured: a low-priority device may be locked out indefinitely.

Parallel Arbitration

[Figure: parallel arbitration — arbiters with bus request (BREQ) and bus grant (BPRN) lines connected through a priority encoder and decoder, sharing a bus-busy line]

Parallel arbitration uses an external priority encoder and decoder. Each bus arbiter has a bus-request output line and a bus-acknowledge input line, and each arbiter asserts its request line when its processor requests the system bus. The arbiter with the highest priority, determined by the output of the decoder, gets access to the bus.

Dynamic Arbitration Techniques

Serial and parallel bus arbitration are static, since the priorities assigned are fixed. In dynamic arbitration, the priorities of the system change while the system is in operation. The algorithms used include time slice and polling.

Cache Coherence

A cache is hardware or software used to store something, usually data, temporarily in a computing environment.
A cache coherence issue results from the concurrent operation of several processors and the possibility that different caches may hold different versions of the same memory block. The cache coherence problem is the issue that arises when several copies of the same data are kept at various levels of memory.

Cache coherence has three different levels:
* Every write operation appears to happen instantly.
* All processors see every change to an operand's value in exactly the same order.
* Non-coherent behavior results when different processors observe the same operation in different ways.

[Figure: several processors, each with its own cache, sharing main memory]

Methods to Resolve Cache Coherence

The two methods listed below can be used to resolve the cache coherence issue:
* Write-through
* Write-back

Write-Through

Write-through is the easiest and most popular method. Every memory write operation updates main memory, and if the word is present in the cache at the requested address, the cache is updated simultaneously with main memory. The benefit of this approach is that the cache and main memory always hold the same information. This property is crucial in systems with direct memory access (DMA) transfers: it ensures that the data in main memory is up to date at all times, so that a device communicating over DMA can access the most recent data.

Advantage: it provides the highest level of consistency.
Disadvantage: it requires a greater number of memory accesses.

Write-Back

In this approach, only the cache location is changed during a write operation. When the word is removed from the cache, the location is flagged so that the word is copied back to main memory. The write-back approach was developed because a word may be updated numerous times while it is in the cache; as long as it remains there, it does not matter that the copy stored in main memory is outdated, because requests for the word are fulfilled from the cache.
An accurate copy need only be transferred back to main memory when the word is evicted from the cache. According to analytical findings, between 10% and 30% of all memory references in a typical program are writes to memory.

Advantage: a very small number of memory accesses and write operations.
Disadvantage: inconsistency may occur in this approach.

The important terms related to the data stored in the cache as well as in main memory are as follows:
* Modified: the data stored in the cache and in main memory differ. The data in the cache has been modified, and the changes still need to be reflected in main memory.
* Exclusive: the data is clean, i.e., the cache and main memory hold identical data.
* Shared: the cache line contains the most current copy of the data, which may also be present in other caches and in main memory.
* Owned: the block is currently held by this cache, which has acquired ownership of it, i.e., complete privileges over that specific block.
* Invalid: the cache block does not hold valid data; it must be fetched from another cache or from main memory.

Below is a list of the different cache coherence protocols used in multiprocessor systems:
* MSI protocol (Modified, Shared, Invalid)
* MOSI protocol (Modified, Owned, Shared, Invalid)
* MESI protocol (Modified, Exclusive, Shared, Invalid)
* MOESI protocol (Modified, Owned, Exclusive, Shared, Invalid)
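The cost difference between the write-through and write-back policies described above can be contrasted with a short simulation. This is a deliberately minimal sketch — a single-line "cache" with an invented `Cache` class — that only counts main-memory accesses; real caches have many lines and the full protocol state machinery listed above.

```python
# Sketch contrasting the two write policies: write-through updates
# main memory on every write, while write-back marks the line dirty
# and copies it to memory only once, when the line is evicted.

class Cache:
    def __init__(self, write_through):
        self.write_through = write_through
        self.value = None
        self.dirty = False
        self.memory_writes = 0  # counts writes reaching main memory

    def write(self, value):
        self.value = value
        if self.write_through:
            self.memory_writes += 1  # memory updated immediately
        else:
            self.dirty = True        # memory updated later, on eviction

    def evict(self):
        if self.dirty:
            self.memory_writes += 1  # write the dirty line back once
            self.dirty = False

wt, wb = Cache(write_through=True), Cache(write_through=False)
for v in range(10):
    wt.write(v)
    wb.write(v)
wt.evict()
wb.evict()
print(wt.memory_writes, wb.memory_writes)  # 10 1
```

Ten repeated writes cost ten memory accesses under write-through but only one under write-back, which is why write-back wins on traffic while write-through wins on consistency (the trade-off stated in the advantages above).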
