UNIT-1 PRESENTED BY
R.RADHA
ASSISTANT PROFESSOR
DEPARTMENT OF CS
Evolution of computer systems:
 Relays and vacuum tubes -> diodes and transistors -> small- and medium-scale
integrated (SSI/MSI) circuits -> large-scale and very-large-scale integration (LSI/VLSI).
 Increased speed and reliability, and reductions in hardware cost and size.
 Von Neumann model -> Input -> Process -> Output.
 16th century -> the word "computer" came into use; early computing devices are attributed to the Chinese and Egyptians.
 Architecture:
 The arrangement of components with one another to execute a task.
 Harvard -> separate hardware units that must be connected to one another.
 Von Neumann -> constructed as a single unit; components communicate through a bus.
A modern computer system consists of:
 Processors.
 Memories.
 Functional units.
 Interconnection networks.
 Peripheral devices and databases.
Computer Architecture:
 The integration of system components (hardware, software, algorithms, and
languages) to perform large computations.
Generation determined by:
 Device technology.
 System architecture.
 Processing mode.
 Languages used.
First Generation (1938-1953):
 First electronic analog computer.
 First electronic digital computer:
 ENIAC -> Electronic Numerical Integrator and Computer.
 Electromechanical relays were used as switching devices (1940s).
 1950 -> vacuum tubes -> interconnected by insulated wires.
 Arithmetic was done on a bit-by-bit fixed-point basis, using a single
full adder and one bit of carry flag.
 Binary-coded (machine) language was used in early computers.
 First stored-program computers:
 EDVAC -> Electronic Discrete Variable Automatic Computer.
Second Generation (1952-1963):
 Transistors were invented in 1948.
 TRADIC -> built by Bell Laboratories,
 using 800 transistors.
 Printed circuits appeared.
 Coincident-current magnetic core memory was developed and subsequently appeared in many
machines.
 Assembly languages were used.
 FORTRAN (1956).
 ALGOL (1960).
 1959 -> Sperry Rand built the LARC; IBM started the Stretch project.
 LARC had an independent I/O processor operating in parallel with one or two processing units.
 COBOL (1959).
 Interchangeable disk packs (1963).
 Batch processing was popular.
Third Generation (1962-1975):
 SSI and MSI circuits as the basic building blocks.
 Multiprogramming was well developed to allow the simultaneous
execution of many program segments interleaved with I/O operations;
intelligent compilers appeared during this period.
 Time-sharing operating systems appeared in the 1960s.
 Virtual memory was developed by using hierarchically structured
memory systems.
Fourth Generation (1972-present):
 LSI circuits for both logic and memory sections.
 High-level languages are being extended to handle both scalar and vector
data.
 Most operating systems are time-sharing, using virtual memory.
 High-speed mainframes and supercomputers appear in multiprocessor systems.
 A high degree of pipelining and multiprocessing is greatly emphasized in
commercial supercomputers.
 Massively parallel processor (MPP) - 1982:
 16,384 bit-slice microprocessors under the control of one array
controller, for satellite image processing.
Trends towards parallel processing:
 4 ascending levels:
 1. Data processing
 2. Information processing
 3. Knowledge processing
 4. Intelligence processing
Data processing:
 Largest space.
 Numeric numbers, character symbols, multidimensional measures.
 Huge amounts of data are being generated daily in all walks of life:
 scientific, business, and government sectors.
 Information processing:
 A collection of data objects that are related by some syntactic structure or
relation.
 It forms a subspace of the data space.
 KNOWLEDGE:
 Consists of information items plus some semantic meaning.
 Forms a subspace of the information space.
 From an O/S point of view, computer systems have gone through 4 phases:
 Batch processing
 Multiprogramming
 Time sharing
 Multiprocessing
 The degree of parallelism increases sharply from phase to phase.
Definition of parallel processing:
 An efficient form of information processing.
 The exploitation of concurrent events in the computing process.
 Concurrency implies:
 Parallelism
 Simultaneity
 Pipelining
 Parallel events may
 occur in multiple resources during the same time interval.
 Simultaneous events -> may occur at the same time instant.
 Pipelined events -> may occur in overlapped time spans.
 Parallel processing -> demands concurrent execution.
 Cost-effective; improves system performance.
 The highest level of parallel processing is conducted among multiple jobs
or programs through multiprogramming, time sharing, and multiprocessing.
 It uses parallel-processable algorithms.
 The implementation of parallel algorithms depends on the efficient allocation of
limited hardware and software resources to the multiple programs being used to solve a large
computation problem.
 The next highest level of parallel processing is conducted among
procedures or tasks within the same program.
This requires the decomposition of a program into
multiple tasks.
 The third level exploits concurrency among multiple instructions.
 Levels of parallelism:
 Job or program level.
 Task or procedure level.
 Interinstruction level.
 Intrainstruction level.
 The highest (job) level is handled by -> algorithms.
 The lowest (intrainstruction) level is handled by -> hardware.
 Hardware roles increase from high to low levels.
 Software implementations increase from low to high levels.
3 major components:
 Main Memory.
 CPU.
 I/O subsystems.
 Super minicomputer VAX-11/780.
 Manufactured by Digital Equipment Corporation.
 The CPU contains the master controller of the VAX system.
 Sixteen 32-bit general-purpose registers, one of which serves as the program counter
(PC).
 A special CPU status register
 contains information about the current state of the processor and of the
program being executed.
 ALU -> with an optional floating-point accelerator.
 Some local cache memory with an optional diagnostic memory.
 The CPU, main memory, and I/O subsystems are connected by a common bus, the SBI
(Synchronous Backplane Interconnect).
 All I/O devices can communicate with each other through this bus.
 Peripheral storage or I/O devices can be connected directly to the SBI
through the Unibus and its controller, or through a Massbus and its controller.
 Main memory is divided into 4 units (logical storage units, LSUs).
 A storage controller provides multiport connections between the CPU and the 4 LSUs.
 Peripherals are connected to the system via high-speed I/O channels, which
operate asynchronously with the CPU.
 It is necessary to balance the processing rates of the various subsystems in
order to avoid bottlenecks and to increase total system throughput.
 Throughput -> the number of instructions performed per unit
time.
Parallel processing mechanisms in
uniprocessor computers:
 Multiplicity of functional units.
 Parallelism and pipelining within the CPU.
 Overlapped CPU and I/O operations.
 Use of a hierarchical memory system.
 Balancing of subsystem bandwidths.
 Multiprogramming and time sharing.
 1. Multiplicity of functional units:
 Early computers -> only one ALU in the CPU.
 The ALU could only perform one function at a time.
 CDC 6600 (1964):
 Has 10 functional units built into its CPU.
 These units are independent of each other and may operate simultaneously.
 Scoreboard (see the sketch below):
 Used to keep track of the availability of the functional units.
 10 functional units and 24 registers available.
 The instruction issue rate can be significantly increased.
 Another example is the IBM 360/91 (1968),
 which has two parallel execution (E) units:
 one for fixed-point arithmetic,
 one for floating-point arithmetic.
 Within the floating-point E unit are 2 functional units:
 floating-point add-subtract,
 floating-point multiply-divide.
 The 360/91 is a highly pipelined, multifunction, scientific uniprocessor.
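A minimal Python sketch of the scoreboard idea above: instruction issue is gated on functional-unit availability. The unit names and counts are illustrative, not the actual CDC 6600 organization.

```python
# Minimal scoreboard sketch: issue an instruction only when its
# functional unit is free; mark the unit busy until it completes.
# Unit names and counts are illustrative, not the CDC 6600's.

class Scoreboard:
    def __init__(self, units):
        # units: dict mapping unit name -> number of free copies
        self.free = dict(units)

    def try_issue(self, unit):
        """Issue succeeds only if a copy of the unit is free."""
        if self.free.get(unit, 0) > 0:
            self.free[unit] -= 1
            return True
        return False  # structural hazard: the instruction must wait

    def complete(self, unit):
        """Called when the unit finishes; it becomes available again."""
        self.free[unit] += 1

sb = Scoreboard({"fp_add": 1, "fp_mul": 1, "fixed": 1})
print(sb.try_issue("fp_mul"))  # True: unit was free
print(sb.try_issue("fp_mul"))  # False: unit busy, issue stalls
sb.complete("fp_mul")
print(sb.try_issue("fp_mul"))  # True again
```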
2. Parallelism and pipelining within the
CPU:
 Parallel adders -> using techniques such as
 carry-lookahead (sketched after this list) and
 carry-save.
 High-speed multiplier recoding and convergence division are techniques for
exploiting parallelism and the sharing of hardware resources for the functions of
multiply and divide.
 The various phases of instruction execution are now pipelined -> instruction fetch,
decode, operand fetch, arithmetic-logic execution, and store result.
 Instruction prefetch and data buffering techniques have been developed.
 Most commercial uniprocessor systems are now pipelined in their CPUs, with a
clock period between 10 and 500 ns.
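The carry-lookahead technique listed above can be illustrated with a minimal sketch: generate and propagate signals determine every carry directly, instead of rippling through a chain of full adders. The bit width and operand values are illustrative.

```python
# Carry-lookahead sketch: generate (g) and propagate (p) signals let
# every carry be computed from the inputs, instead of rippling bit by
# bit through a chain of full adders.

def cla_add(a_bits, b_bits, carry_in=0):
    """Add two little-endian bit lists using carry-lookahead logic."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # carry generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # carry propagate
    carries = [carry_in]
    for i in range(len(a_bits)):
        # c[i+1] = g[i] OR (p[i] AND c[i]); in hardware these terms are
        # expanded so that all carries are produced in parallel.
        carries.append(g[i] | (p[i] & carries[i]))
    total = [p[i] ^ carries[i] for i in range(len(a_bits))]
    return total, carries[-1]

# 4-bit example: 6 (0110) + 7 (0111) = 13 (1101)
s, cout = cla_add([0, 1, 1, 0], [1, 1, 1, 0])
print(s, cout)  # [1, 0, 1, 1] (little-endian 13), carry-out 0
```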
3. Overlapped CPU and I/O operations:
 I/O operations can be performed simultaneously with CPU
computations by using separate I/O controllers, channels, or I/O processors.
 A DMA channel can be used to provide direct information transfer between the
I/O devices and the main memory.
 DMA is conducted on a cycle-stealing basis, which is transparent to the CPU (see the sketch below).
 Back-end database machines can be used to manage large databases stored on disks.
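A toy sketch of cycle stealing, assuming a made-up device rate: the DMA controller takes an occasional memory cycle to move a word, and the CPU uses all the others.

```python
# Toy model of DMA cycle stealing: the device delivers one word every
# DEVICE_PERIOD memory cycles; on those cycles the DMA controller
# "steals" the memory cycle, and the CPU uses the rest.
DEVICE_PERIOD = 3   # illustrative device rate, not from the text

for cycle in range(9):
    if cycle % DEVICE_PERIOD == 0:
        user = "DMA (steals cycle to move one word)"
    else:
        user = "CPU"
    print(f"memory cycle {cycle}: {user}")
# The CPU never executes the transfer itself; it only loses an
# occasional memory cycle, which is why DMA is transparent to it.
```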
4. Use of a hierarchical memory system:
 The innermost level is the register file, directly addressable by the ALU.
 A cache memory can be used to serve as a buffer between the CPU and the main
memory.
 Block access of the main memory can be achieved through multiway
interleaving across parallel memory modules (sketched below).
 Virtual memory space can be established with the use of disks and tape units
at the outer levels.
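A minimal sketch of multiway (low-order) interleaving, assuming a hypothetical 4-module memory: consecutive addresses land in different modules, so a block of words can be fetched in one overlapped access.

```python
# W-way low-order interleaving: consecutive word addresses fall in
# different memory modules, so a block access can proceed in parallel.
W = 4  # number of parallel memory modules (illustrative)

def module_and_offset(addr):
    return addr % W, addr // W   # (module index, word within module)

block = range(8, 12)             # 4 consecutive addresses
print([module_and_offset(a) for a in block])
# [(0, 2), (1, 2), (2, 2), (3, 2)] -> all four modules hit at once,
# so the whole block is delivered in roughly one memory cycle.
```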
5. Balance of subsystem bandwidths:
 In general, the CPU is the fastest unit in a computer, with a processor cycle
time tp of tens of nanoseconds.
 The main memory has a cycle time tm of hundreds of nanoseconds.
 The I/O devices are the slowest, with an average access time td of a few
milliseconds.
 td > tm > tp
Example:
 IBM 370/168
 td = 5 ms (disk)
 tm = 320 ns
 tp = 80 ns
 With these speed gaps between the subsystems, we need to match their
processing bandwidths in order to avoid a system bottleneck problem.
 The bandwidth of a system is defined as the number of operations performed
per unit time.
 The memory bandwidth is measured by the number of memory words that can
be accessed per unit time:
 Bm = W/tm (words/s or bytes/s), where W is the number of words delivered per memory cycle tm.
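Plugging in the IBM 370/168 numbers above gives a concrete memory bandwidth; the interleaving factor W = 4 is an assumed value for illustration.

```python
# Memory bandwidth Bm = W / tm, using the cycle time quoted above.
W  = 4          # words delivered per memory cycle (assumed value)
tm = 320e-9     # memory cycle time: 320 ns (from the example)

Bm = W / tm
print(f"Bm = {Bm:.2e} words/s")   # 1.25e+07 words/s = 12.5 M words/s
```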
6. Multiprogramming and time
sharing:
 Within the same time interval, there may be multiple processes active in
a computer, competing for memory, I/O, and CPU resources.
 Some computer programs are
 CPU-bound (computation intensive),
 I/O-bound (I/O intensive).
 Program interleaving is intended to promote better resource
utilization through overlapping I/O and CPU operations.
 Whenever a process P1 is tied up with I/O operations, the system scheduler
can switch the CPU to process P2. This allows the simultaneous execution
of several programs in the system.
 When P2 is done, the CPU can be switched to P3.
 With overlapped I/O and CPU operations, the CPU wait time is greatly
reduced.
 This interleaving of CPU and I/O operations among several programs is
called multiprogramming.
Figure 1.9b:
Time sharing (Fig 1.9c):
 Multiprogramming on a uniprocessor is centered around the sharing of the CPU by
many programs.
 Sometimes a high-priority program may occupy the CPU for too long to
allow others to share.
 This problem can be overcome by using a time-sharing operating system.
 It extends multiprogramming by assigning fixed or variable time slices to
multiple programs (see the sketch below).
 Equal opportunities are given to all programs competing for the use of the
CPU.
 The execution time saved with time sharing may be greater than with either
the batch or the multiprogramming processing mode.
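A minimal sketch of the fixed-time-slice policy: the CPU is handed to each competing program in turn, so no single program can monopolize it. The program names, workloads, and quantum length are illustrative.

```python
# Round-robin time slicing: each program gets a fixed quantum of CPU
# time before the scheduler switches to the next. Values are made up.
from collections import deque

QUANTUM = 2  # time units per slice (assumed)
ready = deque([("P1", 5), ("P2", 3), ("P3", 4)])  # (name, work left)

t = 0
while ready:
    name, work = ready.popleft()
    run = min(QUANTUM, work)
    print(f"t={t}: {name} runs for {run}")
    t += run
    if work - run > 0:
        ready.append((name, work - run))  # rejoin at the back of the queue
```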
Parallel Computer Structures:
 We divide parallel computers into 3 architectural configurations:
 1. Pipeline computers -> temporal parallelism.
 2. Array processors -> spatial parallelism (synchronous).
 3. Multiprocessor systems -> asynchronous parallelism.
1. Pipeline computers:
 The process of executing an instruction in a digital computer involves 4
major steps:
 IF -> instruction fetch from the main memory.
 ID -> instruction decoding -> identify the operation to be performed.
 OF -> operand fetch, if needed for the execution.
 EX -> execution of the decoded arithmetic-logic operation.
 In a nonpipelined computer, these 4 steps must be completed before the
next instruction can be issued.
 In a pipelined computer -> successive instructions are executed in an overlapped fashion (see the timing sketch below).
 The flow from stage to stage is triggered by a common clock of the pipeline.
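A minimal timing sketch of the four-stage overlap: instruction i enters IF at cycle i and advances one stage per clock, so once the pipe is full one instruction completes every cycle.

```python
# Four-stage pipeline timing: in each clock cycle the stages work on
# different instructions simultaneously (overlapped execution).
STAGES = ["IF", "ID", "OF", "EX"]
N_INSTR = 5

for cycle in range(N_INSTR + len(STAGES) - 1):
    active = []
    for i in range(N_INSTR):
        stage = cycle - i            # which stage instruction i occupies
        if 0 <= stage < len(STAGES):
            active.append(f"I{i}:{STAGES[stage]}")
    print(f"cycle {cycle}: " + "  ".join(active))
# Nonpipelined execution would need 4 cycles per instruction (20 total);
# the pipeline finishes all 5 instructions in 8 cycles.
```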
 Both scalar arithmetic pipelines and vector arithmetic pipelines are provided.
 The instruction preprocessing unit is itself pipelined, with three stages, as shown.
 The OF stage consists of two independent stages:
 one for fetching scalar operands,
 one for fetching vector operands.
 The scalar registers are fewer in quantity than the vector registers, because each
vector register implies a whole set of component registers.
 A scalar processor acts on a single data stream, whereas a vector processor
works on a 1-D array (vector) of numbers (multiple data streams).
 SIMD is an example of vector processing.
 Superscalar -> multiple instructions at once, but from the same instruction
stream.
 Superscalar should not be confused with MIMD, which is used more in parallel
computing architectures because there are multiple instruction streams operating
independently.
2. Array computers:
 An array processor is a synchronous parallel computer with multiple arithmetic logic units,
called processing elements (PEs), that can operate in parallel in a lock-step
fashion.
 The PEs are synchronized to perform the same function at the same time.
 An appropriate data-routing mechanism must be established among the
PEs.
Fig 1.12 (SIMD array processor):
 Scalar and control-type instructions are directly executed in the control unit
(CU).
 Each PE consists of an ALU with registers and a local memory.
 The PEs are interconnected by a data-routing network.
 The interconnection pattern to be established for a specific computation is under
program control from the CU.
 Vector instructions are broadcast to the PEs for distributed execution over
different component operands fetched directly from the local memories (see the sketch below).
 Instruction fetch (from local memory or from the control memory) and decode
are done by the control unit.
 The PEs are passive devices without instruction-decoding capabilities.
 Associative memory, which is content-addressable, will also be treated in
the context of parallel processing.
 Array processors designed with associative memories are called associative
processors.
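A minimal sketch of the broadcast behavior described above: the CU issues one operation, and every enabled PE applies it to its own local operand in the same lock step. The PE count, operands, and mask are illustrative.

```python
# Lock-step SIMD sketch: the control unit broadcasts one operation and
# all enabled PEs apply it to their local memories in the same step.
pe_local = [3, 1, 4, 1]           # one local operand per PE (made up)
mask = [True, True, False, True]  # PEs can be masked out of a step

def broadcast(op, operand):
    """CU broadcasts (op, operand); each enabled PE executes it locally."""
    for i in range(len(pe_local)):
        if mask[i]:
            pe_local[i] = op(pe_local[i], operand)

broadcast(lambda x, y: x + y, 10)   # one vector instruction
print(pe_local)                      # [13, 11, 4, 11] -> PE2 was masked
```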
3. Multiprocessor systems:
 Research and development of multiprocessor systems are aimed at
improving throughput, reliability, flexibility, and availability.
 The system contains two or more processors of approximately comparable
capabilities.
 All processors share access to common sets of memory modules, I/O
channels, and peripheral devices.
 Most importantly, the entire system must be controlled by a single integrated
operating system providing interactions between processors and their programs at various levels.
 Besides the shared memories and I/O devices, each processor has its own
local memory and private devices.
 Interprocessor communication can be done through the shared memories
or through an interrupt network.
 Multiprocessor hardware system organization is determined primarily by the
interconnection structure to be used between the memories and processors.
 Three different interconnections have been practiced in the past (a crossbar sketch follows this list):
 1. Time-shared common bus.
 2. Crossbar switch network.
 3. Multiport memories.
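A minimal sketch of the crossbar idea from the list above: requests to distinct memory modules are served simultaneously, and only requests to the same module conflict. The request pattern is hypothetical.

```python
# Crossbar switch sketch: processor i requests memory module reqs[i].
# Distinct modules are served at once; a shared module causes conflict.
reqs = {"P0": 2, "P1": 0, "P2": 2, "P3": 1}   # illustrative requests

granted, blocked = {}, []
for proc, mod in reqs.items():
    if mod not in granted.values():
        granted[proc] = mod           # this crosspoint closes
    else:
        blocked.append(proc)          # must retry in the next cycle

print("granted:", granted)   # {'P0': 2, 'P1': 0, 'P3': 1}
print("blocked:", blocked)   # ['P2'] -> contends with P0 for module 2
```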
