VECTOR
COMPUTING
VECTOR PROCESSOR
• Vector processors are special-purpose computers suited to a range of (scientific) computing tasks.
• Vector processors provide vector instructions, which operate in a pipeline.
OBJECTIVE
• Small program size
• No wasted fetch/decode work
• Efficient feeding of the functional units (FUs) and the register buses
HOW IT WORKS?
OPERATIONS
• Add two vectors to produce a third.
• Subtract two vectors to produce a third.
• Multiply two vectors to produce a third.
• Divide two vectors to produce a third.
• Load a vector from memory.
• Store a vector to memory.
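The six operations above can be modeled in software. A minimal sketch in plain Python (the register model and function names are ours, not part of any real ISA): each operation acts element-wise on whole vectors, which is exactly what a single vector instruction expresses.

```python
# Hypothetical software model of the six vector operations listed above.

def vadd(a, b):
    """Add two vectors to produce a third."""
    return [x + y for x, y in zip(a, b)]

def vsub(a, b):
    """Subtract two vectors to produce a third."""
    return [x - y for x, y in zip(a, b)]

def vmul(a, b):
    """Multiply two vectors to produce a third."""
    return [x * y for x, y in zip(a, b)]

def vdiv(a, b):
    """Divide two vectors to produce a third."""
    return [x / y for x, y in zip(a, b)]

def vload(memory, base, n):
    """Load an n-element vector from memory starting at base."""
    return memory[base:base + n]

def vstore(memory, base, v):
    """Store vector v to memory starting at base."""
    memory[base:base + len(v)] = v

mem = list(range(16))
v1 = vload(mem, 0, 4)         # [0, 1, 2, 3]
v2 = vload(mem, 4, 4)         # [4, 5, 6, 7]
vstore(mem, 8, vadd(v1, v2))  # mem[8:12] becomes [4, 6, 8, 10]
```

Each call stands in for one vector instruction: a single opcode that names N element-wise operations, which is the source of the "compact" encoding claimed later in the deck.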
ARCHITECTURE
PROPERTIES
• Vector processors reduce fetch and decode bandwidth, because fewer instructions are fetched.
• They also exploit data parallelism in large scientific and multimedia applications.
• Many performance-optimization schemes are used in vector processors.
• Strip mining is used to generate code so that vector operations are possible for vector operands whose size is less than, or greater than, the size of the vector registers.
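Strip mining can be sketched as follows — a plain-Python illustration, assuming a maximum vector length (MVL) of 64, of how a compiler splits a loop of arbitrary length into register-sized strips:

```python
# Hedged sketch of strip mining: a loop over n elements is split into
# strips of at most MVL elements (MVL = vector-register length, assumed
# 64 here). The first strip handles the odd-sized remainder n mod MVL.

MVL = 64  # assumed maximum vector length

def strip_mined_add(a, b):
    """Compute a[i] + b[i] for all i, one MVL-sized strip at a time."""
    n = len(a)
    result = [0] * n
    low = 0
    vl = n % MVL or min(n, MVL)   # first (possibly short) strip
    while low < n:
        for i in range(low, low + vl):   # stands in for one vector add
            result[i] = a[i] + b[i]
        low += vl
        vl = MVL                          # all later strips are full length
    return result
```

For n = 130 and MVL = 64 this issues three "vector instructions" with lengths 2, 64, and 64 — exactly the pattern the bullet above describes for operands longer than a vector register.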
PROPERTIES
• Vector chaining, the equivalent of forwarding in vector processors, is used when there is a data dependency among vector instructions.
• Special scatter and gather instructions are provided to operate efficiently on sparse matrices.
• Instructions are designed so that all vector arithmetic instructions only allow element N of one vector register to take part in operations with element N of other vector registers.
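The scatter and gather instructions mentioned above can be illustrated in plain Python (function names are ours): an index vector selects which memory locations participate, which is what makes sparse-matrix rows tractable for a vector unit.

```python
# Illustrative model of gather/scatter for sparse data (names assumed).

def vgather(memory, index_vector):
    """Gather: load memory[index[i]] into element i of a vector register."""
    return [memory[i] for i in index_vector]

def vscatter(memory, index_vector, v):
    """Scatter: store element i of v to memory[index[i]]."""
    for i, x in zip(index_vector, v):
        memory[i] = x

# A sparse row stored as (indices of nonzeros, values):
mem = [0.0] * 10
idx = [1, 4, 7]
vscatter(mem, idx, [2.0, 3.0, 5.0])  # place the nonzeros
dense = vgather(mem, idx)            # read them back as a packed vector
```

After the gather, the packed vector can be fed to the ordinary element-N-with-element-N arithmetic instructions described in the last bullet, even though the data is scattered in memory.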
PROPERTIES
• Based on how the operands are fetched, vector processors can be divided into two categories. In a memory-memory architecture, operands are streamed directly from memory to the functional units, and results are written back to memory as the vector operation proceeds.
• In a vector-register architecture, operands are read into vector registers, from which they are fed to the functional units, and the results of operations are written to vector registers.
ADVANTAGES
• Data can be represented at its original resolution and form, without generalization.
• Accurate location of data is maintained.
• Efficient encoding of topology and, as a result, more efficient operations.
• Mature, well-developed compiler technology.
• Compact: describes N operations with one short instruction.
SOME VECTOR PROCESSORS
NEW TERMS FOR VECTOR PROCESSORS
• Initiation rate: the rate at which a functional unit consumes operands and produces new results.
• Chime: a timing measure for a vector instruction sequence; it ignores the start-up overhead of a vector operation.
NEW TERMS FOR VECTOR PROCESSORS
• Convoy: the set of vector instructions that could potentially begin execution together in one clock period. A convoy must complete before new instructions can begin.
• Vector start-up time: the overhead to start execution of a vector instruction, related to the pipeline depth.
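These terms combine into the standard back-of-the-envelope timing model: each convoy costs roughly one chime (about n cycles for a length-n vector), and start-up overhead — the part the chime model ignores — is added on top. A hedged arithmetic sketch (the convoy count and start-up figure below are assumed examples, not from the deck):

```python
# Rough vector-timing model built from the terms above: convoys, chimes,
# and start-up time. All concrete numbers are illustrative assumptions.

def vector_time(n, convoys, startup_per_convoy):
    """Approximate cycles for a vector sequence of length n.

    Each convoy costs one chime (~n cycles); start-up overhead is
    charged once per convoy on top of the chime estimate.
    """
    chime_cycles = convoys * n
    return chime_cycles + convoys * startup_per_convoy

# Example: a 3-convoy sequence (e.g. load+multiply, load+add, store),
# vector length 64, assumed 12-cycle start-up per convoy:
cycles = vector_time(64, 3, 12)   # 3*64 + 3*12 = 228 cycles
```

Note how for long vectors the chime term (convoys × n) dominates, which is why the chime is a useful first-order measure despite ignoring start-up.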
PROPOSED VECTOR PROCESSOR
• CODE (Clustered Organization for Decoupled Execution) is a proposed vector architecture intended to overcome some limitations of conventional vector processors.
REASONS
• Complexity of a central vector register file (VRF): in a processor with N vector functional units (VFUs), the register file needs approximately 3N access ports. VRF area, power consumption, and latency grow as O(N*N), O(log N), and O(N), respectively.
• Difficulty of implementing precise exceptions: in order to implement in-order commit, a large ROB is needed, with at least one vector register per VFU.
REASONS
• In order to support virtual memory, a large TLB is needed, so that the TLB has enough entries to translate all the virtual addresses generated by a vector instruction.
• Vector processors need expensive on-chip memory for low latency.
SOME FEATURES OF CODE
• Vector registers are organized in the form of clusters in the CODE architecture.
• CODE can hide communication latency by forcing the output interface to look ahead into the instruction queue and start executing register-move instructions.
• CODE supports precise exceptions using a history buffer.
• In order to reduce the size of the TLB, CODE proposes an ISA-level change.
The effect of cache design on vector computers
• Numerical programs often have data sets that are too large for current cache sizes; sweep accesses of a large vector result in complete reloading of the cache.
• To achieve high memory bandwidth, vector computers rely on register files and highly interleaved memories.
• Address sequencing.
Proposals of cache schemes
• Prime-mapped cache schemes have been proposed and studied. This cache organization minimizes cache misses caused by cache-line interference, which has been shown to be critical in numerical applications.
• The cache lookup time of the new mapping scheme remains the same as in conventional caches: generation of cache addresses for accessing the prime-mapped cache can be done in parallel with normal address calculations.
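The intuition behind prime mapping can be shown with a few lines of Python (set counts below are our assumptions): indexing the cache with address mod a prime, instead of mod a power of two, spreads the strided sweeps typical of vector code across many more cache sets.

```python
# Hedged sketch of why prime mapping reduces line interference.
# A conventional cache indexes with addr mod 2^k; a prime-mapped cache
# indexes with addr mod p for a prime p. Sizes here are illustrative.

SETS_POW2 = 16    # conventional cache: index = addr mod 16
SETS_PRIME = 17   # prime-mapped cache: index = addr mod 17

def touched_sets(stride, n, sets):
    """Return the cache sets hit by a stride-`stride` sweep of n accesses."""
    return {(i * stride) % sets for i in range(n)}

# A stride-16 sweep (e.g. walking a column of a 16-wide matrix):
conventional = touched_sets(16, 32, SETS_POW2)   # every access hits set 0
prime_mapped = touched_sets(16, 32, SETS_PRIME)  # spread over all 17 sets
```

Because the stride shares no factor with a prime set count, the sweep cycles through every set instead of thrashing one, while the modulo can still be generated in parallel with the normal address calculation, as the bullet above notes.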
Conclusion
• Vector supercomputers
• Vector instruction
• Commodity technology like SMT
• Superscalar microprocessor
• Embedded and multimedia applications
References:
• J.L. Hennessy and D.A. Patterson, Computer Architecture, A
Quantitative Approach. Morgan Kaufmann, 1990.
• http://csep1.phy.ornl.gov/ca/node24.html
• http://www.comp.nus.edu.sg/~johnm/cs3220/l21.htm
• http://penta-performance.com/sager/vector/Default_vector2.htm