UNIT-1 PRESENTED BY
R.RADHA
ASSISTANT PROFESSOR
DEPARTMENT OF CS
Evolution of computer systems:
 Relays and vacuum tubes -> diodes and transistors -> small- and medium-scale
integrated (SSI/MSI) circuits -> large-scale and very-large-scale integration (LSI/VLSI).
 Increased speed and reliability, and reductions in hardware cost and size.
 Von Neumann model -> Input -> Process -> Output.
 16th century -> the word "computer" came into use; early computing devices are attributed to the Chinese and Egyptians.
 Architecture:
 The arrangement of components with one another to execute a task.
 Harvard -> separate hardware units that must be connected to one another.
 Von Neumann -> constructed as a single unit; components communicate through a bus.
A modern computer system consists of:
 Processors.
 Memories.
 Functional units.
 Interconnection networks.
 Peripheral devices and databases.
Computer Architecture:
 The integration of system components (hardware, software, algorithms, and
languages) to perform large computations.
Generation determined by:
 Device technology.
 System architecture.
 Processing mode.
 Languages used.
First Generation (1938-1953):
 First electronic analog computer.
 First electronic digital computer:
 ENIAC -> Electronic Numerical Integrator and Computer.
 Electromechanical relays were used as switching devices (1940s).
 1950 -> vacuum tubes -> interconnected by insulated wires.
 Arithmetic was done on a bit-by-bit fixed-point basis, using a single
full adder and one bit of carry flag.
 Binary-coded (machine) language was used in early computers.
 First stored-program computers:
 EDVAC -> Electronic Discrete Variable Automatic Computer.
Second Generation (1952-1963):
 Transistors were invented in 1948.
 TRADIC -> built by Bell Laboratories,
 using 800 transistors.
 Printed circuits appeared.
 Coincident-current magnetic core memory was developed and subsequently appeared in many
machines.
 Assembly languages were used.
 FORTRAN (1956).
 ALGOL (1960).
 1959 -> Sperry Rand built the LARC; IBM started the Stretch project.
 LARC had an independent I/O processor operating in parallel with one or two processing units.
 COBOL (1959).
 Interchangeable disk packs (1963).
 Batch processing was popular.
Third Generation (1962-1975):
 SSI and MSI circuits as the basic building blocks.
 Multiprogramming was well developed to allow the simultaneous
execution of many program segments interleaved with I/O operations;
intelligent compilers appeared during this period.
 Time-sharing operating systems appeared in the 1960s.
 Virtual memory was developed by using hierarchically structured
memory systems.
Fourth Generation (1972-present):
 LSI circuits for both logic and memory sections.
 High-level languages are being extended to handle both scalar and vector
data.
 Most operating systems are time-sharing, using virtual memory.
 High-speed mainframes and supercomputers appear in multiprocessor systems.
 A high degree of pipelining and multiprocessing is greatly emphasized in
commercial supercomputers.
 Massively parallel processor (MPP) - 1982:
 16,384 bit-slice microprocessors under the control of one array
controller, for satellite image processing.
Trends towards parallel processing:
 4 ascending levels:
 1. Data processing
 2. Information processing
 3. Knowledge processing
 4. Intelligence processing
Data processing:
 Largest space.
 Numeric numbers, character symbols, multidimensional measures.
 Huge amounts of data are being generated daily in all walks of life:
 scientific, business, and government sectors.
 Information processing:
 A collection of data objects that are related by some syntactic structure or
relation.
 It forms a subspace of the data space.
 KNOWLEDGE:
 Consists of information items plus some semantic meaning.
 Forms a subspace of the information space.
 From an O/S point of view, computer systems have gone through 4 phases:
 Batch processing
 Multiprogramming
 Time sharing
 Multiprocessing
 The degree of parallelism increases sharply from phase to phase.
Definition of parallel processing:
 An efficient form of information processing.
 The exploitation of concurrent events in the computing process.
 Concurrency implies:
 Parallelism
 Simultaneity
 Pipelining
 Parallel events may
 occur in multiple resources during the same time interval.
 Simultaneous events -> may occur at the same time instant.
 Pipelined events -> may occur in overlapped time spans.
 Parallel processing -> demands concurrent execution.
 Cost-effective; improves system performance.
 The highest level of parallel processing is conducted among multiple jobs
or programs through multiprogramming, time sharing, and multiprocessing.
 It uses parallel-processable algorithms.
 The implementation of parallel algorithms depends on the efficient allocation of
limited hardware and software resources to the multiple programs being used to solve a large
computation problem.
 The next highest level of parallel processing is conducted among
procedures or tasks within the same program.
This requires the decomposition of a program into
multiple tasks.
 The third level exploits concurrency among multiple instructions.
 Levels of parallelism:
 Job or program level.
 Task or procedure level.
 Interinstruction level.
 Intrainstruction level.
 The highest (job) level is handled by -> algorithms.
 The lowest (intrainstruction) level is handled by -> hardware.
 Hardware roles increase from high to low levels.
 Software implementations increase from low to high levels.
3 major components:
 Main Memory.
 CPU.
 I/O subsystems.
 Super minicomputer VAX-11/780.
 Manufactured by Digital Equipment Corporation.
 The CPU contains the master controller of the VAX system.
 Sixteen 32-bit general-purpose registers, one of which serves as the program counter
(PC).
 A special CPU status register
 contains information about the current state of the processor and of the
program being executed.
 ALU -> with an optional floating-point accelerator.
 Some local cache memory with an optional diagnostic memory.
 The CPU, main memory, and I/O subsystems are connected by a common bus, the SBI
(Synchronous Backplane Interconnect).
 All I/O devices can communicate with each other through this bus.
 Peripheral storage or I/O devices can be connected directly to the SBI
through the Unibus and its controller, or through a Massbus and its controller.
 Main memory is divided into 4 units (logical storage units, LSUs).
 A storage controller provides multiport connections between the CPU and the 4 LSUs.
 Peripherals are connected to the system via high-speed I/O channels, which
operate asynchronously with the CPU.
 It is necessary to balance the processing rates of the various subsystems in
order to avoid bottlenecks and to increase total system throughput.
 Throughput -> the number of instructions performed per unit
time.
Parallel processing mechanisms in
uniprocessor computers:
 Multiplicity of functional units.
 Parallelism and pipelining within the CPU.
 Overlapped CPU and I/O operations.
 Use of a hierarchical memory system.
 Balancing of subsystem bandwidths.
 Multiprogramming and time sharing.
 1. Multiplicity of functional units:
 Early computers -> only one ALU in the CPU.
 The ALU could only perform one function at a time.
 CDC 6600 (1964):
 Has 10 functional units built into its CPU.
 These units are independent of each other and may operate simultaneously.
 Scoreboard (see the sketch below):
 Used to keep track of the availability of the functional units.
 10 functional units and 24 registers available.
 The instruction issue rate can be significantly increased.
 Another example is the IBM 360/91 (1968),
 which has two parallel execution (E) units:
 one for fixed-point arithmetic,
 one for floating-point arithmetic.
 Within the floating-point E unit are 2 functional units:
 floating-point add-subtract,
 floating-point multiply-divide.
 The 360/91 is a highly pipelined, multifunction, scientific uniprocessor.
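A minimal Python sketch of the scoreboard idea above: instruction issue is gated on functional-unit availability. The unit names and counts are illustrative, not the actual CDC 6600 organization.

```python
# Minimal scoreboard sketch: issue an instruction only when its
# functional unit is free; mark the unit busy until it completes.
# Unit names and counts are illustrative, not the CDC 6600's.

class Scoreboard:
    def __init__(self, units):
        # units: dict mapping unit name -> number of free copies
        self.free = dict(units)

    def try_issue(self, unit):
        """Issue succeeds only if a copy of the unit is free."""
        if self.free.get(unit, 0) > 0:
            self.free[unit] -= 1
            return True
        return False  # structural hazard: the instruction must wait

    def complete(self, unit):
        """Called when the unit finishes; it becomes available again."""
        self.free[unit] += 1

sb = Scoreboard({"fp_add": 1, "fp_mul": 1, "fixed": 1})
print(sb.try_issue("fp_mul"))  # True: unit was free
print(sb.try_issue("fp_mul"))  # False: unit busy, issue stalls
sb.complete("fp_mul")
print(sb.try_issue("fp_mul"))  # True again
```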
2. Parallelism and pipelining within the
CPU:
 Parallel adders -> using techniques such as
 carry-lookahead (sketched after this list) and
 carry-save.
 High-speed multiplier recoding and convergence division are techniques for
exploiting parallelism and the sharing of hardware resources for the functions of
multiply and divide.
 The various phases of instruction execution are now pipelined -> instruction fetch,
decode, operand fetch, arithmetic-logic execution, and store result.
 Instruction prefetch and data buffering techniques have been developed.
 Most commercial uniprocessor systems are now pipelined in their CPUs, with a
clock period between 10 and 500 ns.
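The carry-lookahead technique listed above can be illustrated with a minimal sketch: generate and propagate signals determine every carry directly, instead of rippling through a chain of full adders. The bit width and operand values are illustrative.

```python
# Carry-lookahead sketch: generate (g) and propagate (p) signals let
# every carry be computed from the inputs, instead of rippling bit by
# bit through a chain of full adders.

def cla_add(a_bits, b_bits, carry_in=0):
    """Add two little-endian bit lists using carry-lookahead logic."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # carry generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # carry propagate
    carries = [carry_in]
    for i in range(len(a_bits)):
        # c[i+1] = g[i] OR (p[i] AND c[i]); in hardware these terms are
        # expanded so that all carries are produced in parallel.
        carries.append(g[i] | (p[i] & carries[i]))
    total = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return total, carries[-1]

# 4-bit example: 6 (0110) + 7 (0111) = 13 (1101)
s, cout = cla_add([0, 1, 1, 0], [1, 1, 1, 0])
print(s, cout)  # [1, 0, 1, 1] (little-endian 13), carry-out 0
```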
3. Overlapped CPU and I/O operations:
 I/O operations can be performed simultaneously with CPU
computations by using separate I/O controllers, channels, or I/O processors.
 A DMA channel can be used to provide direct information transfer between the
I/O devices and the main memory.
 DMA is conducted on a cycle-stealing basis, which is transparent to the CPU (see the sketch below).
 Back-end database machines can be used to manage large databases stored on disks.
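A toy sketch of cycle stealing, assuming a made-up device rate: the DMA controller takes an occasional memory cycle to move a word, and the CPU uses all the others.

```python
# Toy model of DMA cycle stealing: the device delivers one word every
# DEVICE_PERIOD memory cycles; on those cycles the DMA controller
# "steals" the memory cycle, and the CPU uses the rest.
DEVICE_PERIOD = 3   # illustrative device rate, not from the text

for cycle in range(9):
    if cycle % DEVICE_PERIOD == 0:
        user = "DMA (steals cycle to move one word)"
    else:
        user = "CPU"
    print(f"memory cycle {cycle}: {user}")
# The CPU never executes the transfer itself; it only loses an
# occasional memory cycle, which is why DMA is transparent to it.
```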
4. Use of a hierarchical memory system:
 The innermost level is the register file, directly addressable by the ALU.
 A cache memory can be used to serve as a buffer between the CPU and the main
memory.
 Block access of the main memory can be achieved through multiway
interleaving across parallel memory modules (sketched below).
 Virtual memory space can be established with the use of disks and tape units
at the outer levels.
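A minimal sketch of multiway (low-order) interleaving, assuming a hypothetical 4-module memory: consecutive addresses land in different modules, so a block of words can be fetched in one overlapped access.

```python
# W-way low-order interleaving: consecutive word addresses fall in
# different memory modules, so a block access can proceed in parallel.
W = 4  # number of parallel memory modules (illustrative)

def module_and_offset(addr):
    return addr % W, addr // W   # (module index, word within module)

block = range(8, 12)             # 4 consecutive addresses
print([module_and_offset(a) for a in block])
# [(0, 2), (1, 2), (2, 2), (3, 2)] -> all four modules hit at once,
# so the whole block is delivered in roughly one memory cycle.
```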
5. Balance of subsystem bandwidths:
 In general, the CPU is the fastest unit in a computer, with a processor cycle
time tp of tens of nanoseconds.
 The main memory has a cycle time tm of hundreds of nanoseconds.
 The I/O devices are the slowest, with an average access time td of a few
milliseconds.
 td > tm > tp
Example:
 IBM 370/168
 td = 5 ms (disk)
 tm = 320 ns
 tp = 80 ns
 With these speed gaps between the subsystems, we need to match their
processing bandwidths in order to avoid a system bottleneck problem.
 The bandwidth of a system is defined as the number of operations performed
per unit time.
 The memory bandwidth is measured by the number of memory words that can
be accessed per unit time:
 Bm = W/tm (words/s or bytes/s), where W is the number of words delivered per memory cycle tm.
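Plugging in the IBM 370/168 numbers above gives a concrete memory bandwidth; the interleaving factor W = 4 is an assumed value for illustration.

```python
# Memory bandwidth Bm = W / tm, using the cycle time quoted above.
W  = 4          # words delivered per memory cycle (assumed value)
tm = 320e-9     # memory cycle time: 320 ns (from the example)

Bm = W / tm
print(f"Bm = {Bm:.2e} words/s")   # 1.25e+07 words/s = 12.5 M words/s
```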
6. Multiprogramming and time
sharing:
 Within the same time interval, there may be multiple processes active in
a computer, competing for memory, I/O, and CPU resources.
 Some computer programs are
 CPU-bound (computation intensive),
 I/O-bound (I/O intensive).
 Program interleaving is intended to promote better resource
utilization through overlapping I/O and CPU operations.
 Whenever a process P1 is tied up with I/O operations, the system scheduler
can switch the CPU to process P2. This allows the simultaneous execution
of several programs in the system.
 When P2 is done, the CPU can be switched to P3.
 With overlapped I/O and CPU operations, the CPU wait time is greatly
reduced.
 This interleaving of CPU and I/O operations among several programs is
called multiprogramming.
Figure 1.9b:
Time sharing (Fig 1.9c):
 Multiprogramming on a uniprocessor is centered around the sharing of the CPU by
many programs.
 Sometimes a high-priority program may occupy the CPU for too long to
allow others to share.
 This problem can be overcome by using a time-sharing operating system.
 It extends multiprogramming by assigning fixed or variable time slices to
multiple programs (see the sketch below).
 Equal opportunities are given to all programs competing for the use of the
CPU.
 The execution time saved with time sharing may be greater than with either
the batch or the multiprogramming processing mode.
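A minimal sketch of the fixed-time-slice policy: the CPU is handed to each competing program in turn, so no single program can monopolize it. The program names, workloads, and quantum length are illustrative.

```python
# Round-robin time slicing: each program gets a fixed quantum of CPU
# time before the scheduler switches to the next. Values are made up.
from collections import deque

QUANTUM = 2  # time units per slice (assumed)
ready = deque([("P1", 5), ("P2", 3), ("P3", 4)])  # (name, work left)

t = 0
while ready:
    name, work = ready.popleft()
    run = min(QUANTUM, work)
    print(f"t={t}: {name} runs for {run}")
    t += run
    if work - run > 0:
        ready.append((name, work - run))  # rejoin at the back of the queue
```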
Parallel Computer Structures:
 We divide parallel computers into 3 architectural configurations:
 1. Pipeline computers -> temporal parallelism.
 2. Array processors -> spatial parallelism (synchronous).
 3. Multiprocessor systems -> asynchronous parallelism.
1. Pipeline computers:
 The process of executing an instruction in a digital computer involves 4
major steps:
 IF -> instruction fetch from the main memory.
 ID -> instruction decoding -> identify the operation to be performed.
 OF -> operand fetch, if needed for the execution.
 EX -> execution of the decoded arithmetic-logic operation.
 In a nonpipelined computer, these 4 steps must be completed before the
next instruction can be issued.
 In a pipelined computer -> successive instructions are executed in an overlapped fashion (see the timing sketch below).
 The flow from stage to stage is triggered by a common clock of the pipeline.
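A minimal timing sketch of the four-stage overlap: instruction i enters IF at cycle i and advances one stage per clock, so once the pipe is full one instruction completes every cycle.

```python
# Four-stage pipeline timing: in each clock cycle the stages work on
# different instructions simultaneously (overlapped execution).
STAGES = ["IF", "ID", "OF", "EX"]
N_INSTR = 5

for cycle in range(N_INSTR + len(STAGES) - 1):
    active = []
    for i in range(N_INSTR):
        stage = cycle - i            # which stage instruction i occupies
        if 0 <= stage < len(STAGES):
            active.append(f"I{i}:{STAGES[stage]}")
    print(f"cycle {cycle}: " + "  ".join(active))
# Nonpipelined execution would need 4 cycles per instruction (20 total);
# the pipeline finishes all 5 instructions in 8 cycles.
```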
 Both scalar arithmetic pipelines and vector arithmetic pipelines are provided.
 The instruction preprocessing unit is itself pipelined, with three stages, as shown.
 The OF stage consists of two independent stages:
 one for fetching scalar operands,
 one for fetching vector operands.
 The scalar registers are fewer in quantity than the vector registers, because each
vector register implies a whole set of component registers.
 A scalar processor acts on a single data stream, whereas a vector processor
works on a 1-D array (vector) of numbers (multiple data streams).
 SIMD is an example of vector processing.
 Superscalar -> multiple instructions at once, but from the same instruction
stream.
 Superscalar should not be confused with MIMD, which is used more in parallel
computing architectures because there are multiple instruction streams operating
independently.
2. Array computers:
 An array processor is a synchronous parallel computer with multiple arithmetic logic units,
called processing elements (PEs), that can operate in parallel in a lock-step
fashion.
 The PEs are synchronized to perform the same function at the same time.
 An appropriate data-routing mechanism must be established among the
PEs.
Fig 1.12 (SIMD array processor):
 Scalar and control-type instructions are directly executed in the control unit
(CU).
 Each PE consists of an ALU with registers and a local memory.
 The PEs are interconnected by a data-routing network.
 The interconnection pattern to be established for a specific computation is under
program control from the CU.
 Vector instructions are broadcast to the PEs for distributed execution over
different component operands fetched directly from the local memories (see the sketch below).
 Instruction fetch (from local memory or from the control memory) and decode
are done by the control unit.
 The PEs are passive devices without instruction-decoding capabilities.
 Associative memory, which is content-addressable, will also be treated in
the context of parallel processing.
 Array processors designed with associative memories are called associative
processors.
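A minimal sketch of the broadcast behavior described above: the CU issues one operation, and every enabled PE applies it to its own local operand in the same lock step. The PE count, operands, and mask are illustrative.

```python
# Lock-step SIMD sketch: the control unit broadcasts one operation and
# all enabled PEs apply it to their local memories in the same step.
pe_local = [3, 1, 4, 1]           # one local operand per PE (made up)
mask = [True, True, False, True]  # PEs can be masked out of a step

def broadcast(op, operand):
    """CU broadcasts (op, operand); each enabled PE executes it locally."""
    for i in range(len(pe_local)):
        if mask[i]:
            pe_local[i] = op(pe_local[i], operand)

broadcast(lambda x, y: x + y, 10)   # one vector instruction
print(pe_local)                      # [13, 11, 4, 11] -> PE2 was masked
```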
3. Multiprocessor systems:
 Research and development of multiprocessor systems are aimed at
improving throughput, reliability, flexibility, and availability.
 The system contains two or more processors of approximately comparable
capabilities.
 All processors share access to common sets of memory modules, I/O
channels, and peripheral devices.
 Most importantly, the entire system must be controlled by a single integrated
operating system providing interactions between processors and their programs at various levels.
 Besides the shared memories and I/O devices, each processor has its own
local memory and private devices.
 Interprocessor communication can be done through the shared memories
or through an interrupt network.
 Multiprocessor hardware system organization is determined primarily by the
interconnection structure to be used between the memories and processors.
 Three different interconnections have been practiced in the past (a crossbar sketch follows this list):
 1. Time-shared common bus.
 2. Crossbar switch network.
 3. Multiport memories.
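A minimal sketch of the crossbar idea from the list above: requests to distinct memory modules are served simultaneously, and only requests to the same module conflict. The request pattern is hypothetical.

```python
# Crossbar switch sketch: processor i requests memory module reqs[i].
# Distinct modules are served at once; a shared module causes conflict.
reqs = {"P0": 2, "P1": 0, "P2": 2, "P3": 1}   # illustrative requests

granted, blocked = {}, []
for proc, mod in reqs.items():
    if mod not in granted.values():
        granted[proc] = mod           # this crosspoint closes
    else:
        blocked.append(proc)          # must retry in the next cycle

print("granted:", granted)   # {'P0': 2, 'P1': 0, 'P3': 1}
print("blocked:", blocked)   # ['P2'] -> contends with P0 for module 2
```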
