Lecture1_Computer Abstractions and Technology v2
Computer Organization
NYCU EE / IEE
The Computer Revolution
P2
Hollerith Tabulating Machine – CTR (1890)
P3
From CTR to IBM (1896 → 1924)
P4
Turing Machine
P5
Von Neumann Model
P7
ENIAC (1945)
P8
First Transistor (1947)
P9
Google’s first server (1999)
P10
Moore’s Law and Technology Scaling
❖ “… the performance of an IC, including the number of components
on it, doubles every 18-24 months with the chip price …”
– Gordon Moore 1965
P11
Moore’s Law in Action
❖ Example:
❖ Intel Core i7 microprocessor evolution
➢ Six-core Core i7 (Gulftown), 2010
➢ 32nm technology
➢ 1,170 M transistors
➢ Die size: 240 mm²
➢ Transistor density: 4.875 M/mm²
Source: Intel
P12
Reference Size
P14
Classes of Computers
❖ Personal computers
❖ General purpose, variety of software
❖ Subject to cost/performance tradeoff
❖ Server computers
❖ Network based
❖ High capacity, performance, reliability
❖ Range from small servers to building sized
❖ Supercomputers
❖ High-end scientific and engineering calculations
❖ Highest capability but represent a small fraction of the overall computer
market
❖ Embedded computers
❖ Hidden as components of systems
❖ Stringent power/performance/cost constraints
P15
Understanding Performance
❖ Algorithm
❖ Determines number of operations executed
P16
Eight Great Ideas
❖ Hierarchy of memories
P17
Below Your Program
❖ Application software
❖ Written in high-level language
❖ System software
❖ Compiler: translates HLL code to machine code
❖ Operating System: service code
➢ Handling input/output
➢ Managing memory and storage
➢ Scheduling tasks & sharing resources
❖ Hardware
❖ Processor, memory, I/O controllers
P18
Levels of Program Code
❖ High-level language
❖ Level of abstraction closer to problem
domain
❖ Provides for productivity and portability
❖ Assembly language
❖ Textual representation of instructions
❖ Hardware representation
❖ Binary digits (bits)
❖ Encoded instructions and data
P19
Semiconductor Technology
❖ Silicon: semiconductor
❖ Add materials to transform properties:
❖ Conductors
❖ Insulators
❖ Switch
P20
Technology Trends
❖ Electronics technology
continues to evolve
❖ Increased capacity and
performance
❖ Reduced cost
DRAM capacity
P21
VLSI IC Technology
Wiring levels 7 9 10 10
P23
Intel Core i7 Wafer
P24
Integrated Circuit Cost
P25
Performance
❖ Both require
❖ basis for comparison
❖ metric for evaluation
❖ Speedup of X relative to Y
❖ Execution time_Y / Execution time_X
❖ B is n times faster than A
➢ Means exec_time_A / exec_time_B = rate_B / rate_A = n
❖ Execution time
❖ Wall clock time (or response time, or elapsed time): includes all system
overheads
❖ CPU time: only computation time
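The speedup definition above can be sketched in a few lines. The timings are hypothetical values chosen for illustration, not measurements from the lecture.

```python
def speedup(exec_time_y: float, exec_time_x: float) -> float:
    """Speedup of X relative to Y = execution time of Y / execution time of X."""
    return exec_time_y / exec_time_x

# Hypothetical wall-clock timings in seconds (assumed, not from the slides).
time_a = 15.0
time_b = 10.0

n = speedup(time_a, time_b)
print(f"B is {n:.2f} times faster than A")
```

A value of n greater than 1 means B finishes sooner than A; the same ratio applies whether the times are wall-clock times or CPU times, as long as both machines are measured the same way.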
P27
Measuring Performance
P28
Real Performance Measurement
❖ Benchmark suites
❖ Attempts at running programs that are much simpler than a real application
have led to performance pitfalls
P29
Improve Performance by
❖ Changing the
❖ algorithm (ex. bubble sort vs. quicksort)
❖ data structures (ex. Structure vs. pointer)
❖ programming language (ex. C vs. Matlab)
❖ compiler
❖ compiler optimization flags
❖ OS parameters (ex. Windows vs. Linux)
P30
CPU Clocking
❖ Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram. Within each clock cycle, data transfer and computation take place, followed by a state update.]
P31
CPU Time
P32
CPU Time Example
❖ Computer A: 2GHz clock, 10s CPU time
❖ Designing Computer B
❖ Aim for 6s CPU time
❖ A faster clock is possible, but it requires 1.2 × the clock cycles of Computer A
❖ How fast must Computer B clock be?
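One way to work the example above is to use the relation CPU time = clock cycles / clock rate. The variable names below are illustrative; the numbers come from the example.

```python
# Given values from the example.
clock_rate_a = 2e9   # Computer A clock rate in Hz (2 GHz)
cpu_time_a = 10.0    # Computer A CPU time in seconds
cpu_time_b = 6.0     # target CPU time for Computer B in seconds
cycle_factor = 1.2   # B needs 1.2 times as many clock cycles as A

# Clock cycles = CPU time x clock rate.
cycles_a = cpu_time_a * clock_rate_a   # 20e9 cycles
cycles_b = cycle_factor * cycles_a     # 24e9 cycles

# Clock rate = clock cycles / CPU time.
clock_rate_b = cycles_b / cpu_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
```

Under these numbers Computer B must run at 4 GHz, twice the clock rate of A, to finish in 6 s.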
P34
What is Instruction Count?
      sub  $r1,$r2,$r3
Loop: beq  $r9,$r0,End
      add  $r8,$r8,$r10
      addi $r9,$r9,-1
      j    Loop
End:
Loop runs 10 times => 41 instructions
Dynamic Instruction Count
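The difference between static and dynamic instruction count can be checked with a small simulation of the loop. This is a sketch under two assumptions that the slide does not state: that $r9 holds 10 on entry, and that the figure of 41 counts only the loop instructions (4 per iteration, plus the final beq that exits). The function name is hypothetical.

```python
from typing import Tuple

def dynamic_instruction_count(initial_r9: int) -> Tuple[int, int]:
    """Return (instructions executed inside the loop, total including the initial sub)."""
    r9 = initial_r9              # assumed loop counter value on entry
    loop_count = 0
    while True:
        loop_count += 1          # beq $r9,$r0,End
        if r9 == 0:
            break                # branch taken: leave the loop
        loop_count += 3          # add, addi, j
        r9 -= 1                  # effect of addi $r9,$r9,-1
    total_count = 1 + loop_count # the sub before the loop executes once
    return loop_count, total_count

loop_count, total_count = dynamic_instruction_count(10)
print(loop_count, total_count)
```

Under these assumptions the loop executes 41 instructions and the whole fragment 42, while the static count (instructions written in the program) is only 5.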
P35
CPI Example
❖ Computer A: Cycle Time = 250ps, CPI = 2.0
❖ Computer B: Cycle Time = 500ps, CPI = 1.2
❖ Same ISA
❖ Which is faster, and by how much?
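A short calculation answers the question above. Because both computers run the same ISA, the instruction count is identical and cancels out, so it is enough to compare the average time per instruction (CPI × cycle time). Variable names are illustrative.

```python
# Given values from the example.
cycle_time_a_ps, cpi_a = 250.0, 2.0
cycle_time_b_ps, cpi_b = 500.0, 1.2

# CPU time = instruction count x CPI x cycle time.
# With the same instruction count on both machines, compare time per instruction.
time_per_instr_a_ps = cpi_a * cycle_time_a_ps   # 500 ps
time_per_instr_b_ps = cpi_b * cycle_time_b_ps   # 600 ps

ratio = time_per_instr_b_ps / time_per_instr_a_ps
faster = "A" if time_per_instr_a_ps < time_per_instr_b_ps else "B"
print(f"Computer {faster} is faster by a factor of {ratio:.1f}")
```

Computer A is faster by a factor of 1.2, even though its CPI is higher, because its cycle time is half that of B.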
P36
CPI in More Detail
❖ If different instruction classes take different numbers of cycles
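$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
Relative frequency (instruction frequency): the ratio Instruction Count_i / Instruction Count for class i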
P37
CPI Example
❖ Alternative compiled code sequences using instructions in classes A, B,
C
Class              A    B    C
CPI for class      1    2    3
IC in sequence 1   2    1    2
IC in sequence 2   4    1    1
❖ Sequence 1: IC = 5
❖ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
❖ Avg. CPI = 10/5 = 2.0
❖ Sequence 2: IC = 6
❖ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
❖ Avg. CPI = 9/6 = 1.5
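The two sequences above can be checked with a small script. The dictionary layout and the helper name `summarize` are illustrative choices, not part of the lecture.

```python
# CPI per instruction class, from the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each compiled sequence.
SEQUENCE_1 = {"A": 2, "B": 1, "C": 2}
SEQUENCE_2 = {"A": 4, "B": 1, "C": 1}

def summarize(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return instruction_count, clock_cycles, clock_cycles / instruction_count

for name, counts in (("Sequence 1", SEQUENCE_1), ("Sequence 2", SEQUENCE_2)):
    ic, cycles, avg_cpi = summarize(counts)
    print(f"{name}: IC = {ic}, clock cycles = {cycles}, avg. CPI = {avg_cpi:.1f}")
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not predict which sequence is faster.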
P38
Performance Summary
P39
Performance Summary
❖ Performance depends on
❖ Algorithm: affects IC, possibly CPI
❖ Programming language: affects IC, CPI
❖ Compiler: affects IC, CPI
❖ Instruction set architecture: affects IC, CPI, Tc (i.e., clock cycle time)
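The factors listed above combine in the standard CPU time equation, written here with IC for instruction count and T_c for clock cycle time as in the bullets:

```latex
\text{CPU Time}
  = \text{IC} \times \text{CPI} \times T_c
  = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}
```

Each item in the list influences one or more of the three terms, which is why a change that lowers IC can still lose if it raises CPI or T_c.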
P40
MIPS as a Performance Measure
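$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$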
P41
Pitfall: MIPS as a Performance Metric
❖ MIPS: Millions of Instructions Per Second
❖ Doesn’t account for
➢ Differences in ISAs between computers
➢ Differences in complexity between instructions
P42
Example
P43
Example
Instruction counts (in billions) for each instruction class
Code from     A    B    C
Compiler 1    5    1    1
Compiler 2   10    1    1
CPI           1    2    3
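The table above can be used to show why MIPS is a risky metric. The sketch below assumes a 4 GHz clock, since the clock rate is not given in the extracted slide; the ranking of the two compilers does not depend on that choice. The function name `evaluate` is illustrative.

```python
CLOCK_RATE_HZ = 4e9  # ASSUMPTION: clock rate is not stated in the slide

CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, converted from billions.
COUNTS = {
    "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}

def evaluate(counts):
    """Return (execution time in seconds, MIPS rating) for one compiled program."""
    instruction_count = sum(counts.values())
    clock_cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    exec_time = clock_cycles / CLOCK_RATE_HZ
    mips = instruction_count / (exec_time * 1e6)
    return exec_time, mips

for name, counts in COUNTS.items():
    exec_time, mips = evaluate(counts)
    print(f"{name}: execution time = {exec_time:.2f} s, MIPS = {mips:.0f}")
```

With these inputs the code from Compiler 2 earns the higher MIPS rating (3200 vs. 2800) yet takes longer to run (3.75 s vs. 2.50 s), because it executes many more simple instructions.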
P44
Amdahl’s Law: A method to quantify
performance speedup
Gene Amdahl
Computer Pioneer
9/8/2024
P45
Amdahl’s Law
❖ Amdahl’s law gives us a quick way to find the speedup from some
enhancement, which depends on two factors:
❖ Fraction of the computation time in the original computer that can be
converted to take advantage of the enhancement
➢ What fraction of the computation time can be enhanced?
➢ The fraction is always less than or equal to 1
❖ How much faster the task would run if the enhanced mode were used for
the entire program
➢ Speedup = (execution time without enhancement) / (execution time with enhancement)
= T_{w/o} / T_{w/}
P46
Example
Answer:
Fraction_enhanced = 0.4
Speedup_enhanced = 10
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$
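The answer above follows the general form of Amdahl's Law, Speedup_overall = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). A minimal sketch, with a hypothetical function name:

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is enhanced."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced              # part that runs at the old speed
    enhanced = fraction_enhanced / speedup_enhanced   # part that runs faster
    return 1.0 / (unenhanced + enhanced)

overall = amdahl_speedup(0.4, 10.0)
print(f"Overall speedup: {overall:.2f}")

# Upper bound as the enhancement becomes arbitrarily fast: 1 / (1 - fraction).
limit = 1.0 / (1.0 - 0.4)
print(f"Limit for an infinitely fast enhancement: {limit:.2f}")
```

Even an infinitely fast enhancement could not push the overall speedup past about 1.67 here, because 60% of the original time is untouched.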
P47
Power Trends
❖ In CMOS IC technology
×40 5V → 1V ×1000
P48
Reducing Power
❖ Suppose a new CPU has
❖ 85% of capacitive load of old CPU
❖ 15% voltage and 15% frequency reduction
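The example above can be evaluated with the standard CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. That relation is assumed here because the formula itself is not present in the extracted slide text.

```python
# Ratios of new CPU to old CPU, from the example.
capacitive_load_ratio = 0.85   # 85% of the old capacitive load
voltage_ratio = 0.85           # 15% voltage reduction
frequency_ratio = 0.85         # 15% frequency reduction

# Dynamic power scales as C x V^2 x f (assumed standard CMOS model).
power_ratio = capacitive_load_ratio * voltage_ratio ** 2 * frequency_ratio

print(f"P_new / P_old = {power_ratio:.2f}")
```

The new CPU draws roughly 52% of the old CPU's dynamic power; voltage has the largest effect because it enters the relation squared.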
P49
Power and Energy
P50
Case Study: Profile of Intel i7-8850H
P51
Power
P52
Power vs. Thermal Issue
❖ Distributing the power, removing the heat, and preventing hot spots
have become increasingly difficult challenges.
❖ Techniques for reducing power
❖ Do nothing well
❖ Dynamic Voltage-Frequency Scaling
❖ Low power state for DRAM, disks
❖ Overclocking, turning off cores
P53
Case Study: Intel® Turbo Boost Technology 2.0
❖ Features
❖ Within the Thermal Design Power (TDP) limit, the maximum frequency state
depends on the workload and:
➢ Number of active cores
➢ Estimated current consumption
➢ Estimated power consumption
➢ Processor temperature
❖ Commercial Products
❖ Intel Core i7-2xxx series
❖ Intel Core i5-2xxx series
❖ Intel Xeon E3-12xx series
RISC
P55
Multiprocessors
❖ Multicore microprocessors
❖ More than one processor per chip
P56
Why Do I Want to Know These?
[Figure: the computer architect at the center, drawing on applications, technology, historic background and trends, interfaces (API, ISA, link, I/O channel), machine organization (IR, registers), and measurement & evaluation.]
P57
In Fact, Architecture Design Is an Iterative Process
[Figure: iterative design loop. Estimate cost & performance, then sort ideas into good ideas and mediocre ideas.]
❖ Historical background and understanding of trends help the selection process
P58