Tanzilur Rahman (Tnr)
CSE 332
Computer Organization and
Architecture
Lecture 2: Computer Performance
North South University
Performance
• Performance is the key to understanding the underlying motivation
for the hardware and its organization
• Why is some hardware better than others for different
programs?
• What factors of system performance are hardware related?
(e.g., do we need a new machine, or a new operating system?)
What do we measure?
Define performance….
• How much faster is the Concorde compared to the 747?
• How much bigger is the Boeing 747 than the Douglas DC-8?
Defining Performance
Computer Performance: TIME, TIME, TIME!!!
• Response Time (elapsed time, latency):
• How long does it take to complete (start to finish) a task?
• E.g., how long must I wait for the database query?
An individual user is more interested in response time. For the user of a
smartphone or laptop, the device that responds faster is better!
Response time (computer): the total time the computer requires to complete a
task, including:
• Disk access
• Memory access
• I/O activities
• OS overheads
• CPU execution time, etc.
Individual user
concerns…
Computer Performance: TIME, TIME, TIME!!!
• Throughput:
• Total work done per unit time (per hour, per day, etc.)
• how many jobs can the machine run at once?
• what is the average execution rate?
• how much work is getting done?
Systems manager
concerns…
Response Time and Throughput
• If we upgrade a machine with a new processor what do we increase?
Relative Performance
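A minimal sketch of relative performance, assuming the usual textbook definition that performance is the reciprocal of execution time, so machine X is n times as fast as machine Y when n = time_Y / time_X. The function names and the sample timings are illustrative and do not come from the lecture.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (assumed definition)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def relative_performance(time_x_s: float, time_y_s: float) -> float:
    """How many times as fast X is compared to Y: perf_X / perf_Y = time_Y / time_X."""
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings: the same program takes 10 s on X and 15 s on Y.
n = relative_performance(10.0, 15.0)
print(f"X is {n:.2f} times as fast as Y")
```

With these assumed timings, X comes out 1.5 times as fast as Y.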
Execution Time
• Elapsed Time
• counts everything (disk and memory accesses, waiting for I/O, running other
programs, etc.) from start to finish
• a useful number, but often not good for comparison purposes
Elapsed time = CPU time + wait time (I/O, other programs, etc.)
• CPU time
• doesn't count waiting for I/O or time spent running other programs
• can be divided into user CPU time and system CPU time (OS calls)
CPU time = user CPU time + system CPU time
• Our focus:
• user CPU time (CPU execution time or, simply, execution time)
• time spent executing the lines of code that are in our program
• For brevity, user CPU time is called simply CPU time in the rest of
this lecture.
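The difference between elapsed time and CPU time can be observed directly. The sketch below is one possible way to do it in Python, using `time.perf_counter` for elapsed time and `os.times` for user and system CPU time; the `measure` and `workload` helpers are hypothetical names introduced for illustration.

```python
import os
import time


def measure(func):
    """Run func once; return elapsed, user CPU, and system CPU time in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "elapsed": wall_after - wall_before,
        "user": cpu_after.user - cpu_before.user,
        "system": cpu_after.system - cpu_before.system,
    }


def workload():
    """Illustrative task: some computation followed by a wait."""
    sum(i * i for i in range(200_000))  # consumes user CPU time
    time.sleep(0.05)                    # waiting: adds to elapsed time only


timings = measure(workload)
cpu_time = timings["user"] + timings["system"]
print(f"elapsed = {timings['elapsed']:.3f} s, CPU = {cpu_time:.3f} s "
      f"(user {timings['user']:.3f} s + system {timings['system']:.3f} s)")
```

The sleep stands in for wait time (I/O, other programs), so the elapsed time should exceed the CPU time. Exact values depend on the machine and on the timer resolution of the operating system.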
Summary of Execution Time
CPU Clocking
[Figure labels: Processor, Clock, Transistors]
Performance Equation - I
CPU Time
Example
• Our favorite program runs in 10 seconds on computer A, which has a
2 GHz clock.
• We are trying to help a computer designer build a new machine B, that
will run this program in 6 seconds. The designer can use new (or
perhaps more expensive) technology to substantially increase the clock
rate, but has informed us that this increase will affect the rest of the CPU
design, causing machine B to require 1.2 times as many clock cycles as
machine A for the same program.
• What clock rate should we tell the designer to target?
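The question can be worked through with the relation CPU time = clock cycles / clock rate. The sketch below is one way to carry out the arithmetic; the function names are illustrative, and the numbers are the ones given in the example.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock rate = clock cycles / CPU time."""
    return cycles / target_time_s


cycles_a = clock_cycles(10.0, 2e9)           # A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                    # B needs 1.2 times as many cycles
rate_b = required_clock_rate(cycles_b, 6.0)  # B must finish in 6 s
print(f"Machine B clock rate: {rate_b / 1e9:.1f} GHz")
```

Machine A executes 20 billion cycles, so machine B needs 24 billion cycles in 6 seconds, which works out to a 4 GHz clock: twice the clock rate of A for a 10/6 speedup.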
CPU Time Example
No. of Clock Cycles
Performance Equation - II
Instruction Count and CPI
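The relation between instruction count, CPI, and CPU time can be written as follows. The symbols are notation assumed here rather than taken from the slides: IC is the instruction count, CPI the average clock cycles per instruction, T_c the clock cycle time, and f the clock rate.

```latex
\begin{align*}
\text{Clock cycles} &= IC \times CPI \\
\text{CPU time} &= IC \times CPI \times T_c = \frac{IC \times CPI}{f}
\end{align*}
```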
Factors Influencing Performance
More to follow in ALU and Pipeline chapter
Self Help
• Suppose we have two implementations of the same instruction set
architecture (ISA). For some program:
• machine A has a clock cycle time of 10 ns and a CPI of 2.0
• machine B has a clock cycle time of 20 ns and a CPI of 1.2
• Which machine is faster for this program, and by how much?
• If two machines have the same ISA, which of our quantities (e.g., clock
rate, CPI, execution time, # of instructions, MIPS) will always be
identical?
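One way to approach the first question is sketched below. It assumes both machines run the same compiled program, so the instruction count is the same and the comparison reduces to the average time per instruction (CPI × clock cycle time). The function name is illustrative.

```python
def time_per_instruction_ns(cycle_time_ns: float, cpi: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ns


tpi_a = time_per_instruction_ns(cycle_time_ns=10.0, cpi=2.0)  # machine A
tpi_b = time_per_instruction_ns(cycle_time_ns=20.0, cpi=1.2)  # machine B

# Same instruction count on both machines (assumption), so the ratio of CPU
# times equals the ratio of average times per instruction.
ratio = tpi_b / tpi_a
print(f"A: {tpi_a:.0f} ns/instr, B: {tpi_b:.0f} ns/instr, "
      f"A is {ratio:.1f} times as fast as B")
```

Machine A averages 20 ns per instruction and machine B 24 ns, so A is 1.2 times as fast for this program even though B has the lower CPI.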
CPI Example
• A compiler designer is trying to decide between two code sequences for
a particular machine.
• Based on the hardware implementation, there are three different classes
of instructions: Class A, Class B, and Class C.
• Which code sequence has the most instructions? Which sequence will be
faster? By how much? What is the CPI for each sequence?
For different classes of instructions
CPI Example
Which code sequence has the most instructions?
Which sequence will be faster?
What is the CPI for each sequence?
Self Help
• Two different compilers are being tested for a 500 MHz machine with
three different classes of instructions: Class A, Class B, and Class C,
which require 1, 2 and 3 cycles (respectively). Both compilers are used
to produce code for a large piece of software.
• Compiler 1 generates code with 5 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.
• Compiler 2 generates code with 10 billion Class A instructions, 1 billion
Class B instructions, and 1 billion Class C instructions.
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?
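The two questions can be checked numerically. The sketch below assumes the common definition MIPS = instruction count / (execution time × 10^6); the `evaluate` helper and the dictionary layout are illustrative choices, and the numbers are the ones given in the exercise.

```python
CLOCK_RATE_HZ = 500e6                       # 500 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class


def evaluate(instruction_counts):
    """Return (execution time in seconds, MIPS) for a mix of instruction counts."""
    total_instructions = sum(instruction_counts.values())
    total_cycles = sum(CYCLES_PER_CLASS[cls] * count
                       for cls, count in instruction_counts.items())
    exec_time_s = total_cycles / CLOCK_RATE_HZ
    mips = total_instructions / (exec_time_s * 1e6)
    return exec_time_s, mips


compiler_1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler_2 = {"A": 10e9, "B": 1e9, "C": 1e9}

time_1, mips_1 = evaluate(compiler_1)
time_2, mips_2 = evaluate(compiler_2)
print(f"Compiler 1: {time_1:.0f} s, {mips_1:.0f} MIPS")
print(f"Compiler 2: {time_2:.0f} s, {mips_2:.0f} MIPS")
```

Compiler 1's code runs in 20 s at 350 MIPS, while compiler 2's code runs in 30 s at 400 MIPS. The MIPS figure favors compiler 2, but execution time shows compiler 1's code is faster, which is why MIPS can mislead when instruction counts differ.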
Example
Benchmarks
• Performance best determined by running a real application
• use programs typical of expected workload
• or, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics, etc.
• Benchmark suites
• Each vendor announces a SPEC rating for their system
• a measure of execution time for a fixed collection of programs
• is a function of a specific CPU, memory system, I/O system, operating
system, and compiler
• enables easy comparison of different systems
• The key is coming up with a collection of relevant programs
SPEC (Standard Performance Evaluation
Corporation)
• Sponsored by industry but independent and self-managed – trusted by
code developers and machine vendors
• Clear guides for testing, see www.spec.org
• Regular updates (benchmarks are dropped and new ones added
periodically according to relevance)
• Specialized benchmarks for particular classes of applications
SPEC CPU
• The 2006 version includes 12 integer and 17 floating-point applications
• The SPEC rating specifies how much faster a system is compared to
a baseline machine: a system with SPEC rating 600 is 1.5 times as fast
as a system with SPEC rating 400
• Note that this rating incorporates the behavior of all 29 programs – this
may not necessarily predict performance for your favorite program!
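A rough sketch of how a single rating can summarize many programs, assuming (as SPEC CPU2006 does) that each program contributes a ratio of reference time to measured time and that the overall rating is the geometric mean of those ratios. The timings below are made up for illustration and are not real SPEC results.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Per-benchmark ratio: how many times as fast as the baseline machine."""
    return reference_time_s / measured_time_s


def geometric_mean(values) -> float:
    """n-th root of the product, computed via logarithms for stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) times in seconds for three programs.
timings = [(1000.0, 250.0), (2000.0, 1000.0), (900.0, 900.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in timings]
rating = geometric_mean(ratios)
print(f"ratios = {ratios}, overall rating = {rating:.2f}")
```

Here the per-program ratios are 4, 2, and 1, yet the overall rating is 2: a single summary number hides how differently individual programs behave.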
Summary
• Performance is specific to a particular program
• total execution time is a consistent summary of performance
• For a given architecture performance increases come from:
• increases in clock rate (without adverse CPI effects)
• improvements in processor organization that lower CPI
• compiler enhancements that lower CPI and/or instruction count
Important Trends
• Running out of ideas to improve single-thread performance
• Power wall makes it harder to add complex features
• Power wall makes it harder to increase frequency
Power Wall
• The energy of a pulse during the logic transition of 0 → 1 → 0 or 1 → 0 → 1:
Energy ∝ Capacitive load × Voltage²
• The energy of a single transition:
Energy ∝ ½ × Capacitive load × Voltage²
• The power required per transistor:
Power ∝ ½ × Capacitive load × Voltage² × Frequency switched
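The power relation can be explored numerically. The sketch below treats the proportionality as an equality in arbitrary units, which is enough for comparing two designs; the 15% reductions are hypothetical values chosen for illustration, not figures from the lecture.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency: float) -> float:
    """Relative dynamic power: 1/2 x capacitive load x voltage^2 x frequency switched."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency


# Baseline design in normalized (arbitrary) units.
old_power = dynamic_power(capacitive_load=1.0, voltage=1.0, frequency=1.0)

# Hypothetical new design: capacitive load, voltage, and frequency each 15% lower.
new_power = dynamic_power(capacitive_load=0.85, voltage=0.85, frequency=0.85)

ratio = new_power / old_power
print(f"new design uses {ratio:.2f} times the power of the old one")
```

Because voltage enters squared, the ratio is 0.85 to the fourth power, about 0.52. Lowering voltage was the main lever for keeping power in check, and the difficulty of lowering it further is what makes additional frequency increases hard.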
