Lecture1_Computer Abstractions and Technology v2

The document discusses the evolution of computer organization, highlighting key milestones such as the Hollerith Tabulating Machine, the Turing Machine, and the Von Neumann model. It explains the impact of Moore's Law on technology scaling and the classification of computers into categories like personal, server, and supercomputers. Additionally, it covers performance metrics, instruction count, and the factors influencing CPU performance.

Computer Organization (計算機組織)

Computer Abstractions and Technology

Kun-Chih (Jimmy) Chen 陳坤志
[email protected]
Institute of Electronics,
National Yang Ming Chiao Tung University

NYCU EE / IEE
The Computer Revolution

❖ Progress in computer technology
  ❖ Underpinned by Moore's Law
❖ Makes novel applications feasible
  ❖ Computers in automobiles
  ❖ Cell phones
  ❖ Human genome project
  ❖ World Wide Web
  ❖ Search engines
❖ Computers are pervasive

P2
Hollerith Tabulating Machine – CTR (1890)

❖ It opened the world's eyes to the very idea of data processing.
❖ Beginning with the 1890 US census
  ❖ Reduced the data-collection time from 144.5 hours to 72.5 hours
  ❖ Reduced the tabulating time from 55.5 hours to 5.5 hours

P3
From CTR to IBM (1896 → 1924)

❖ Herman Hollerith was a founder of the Computing-Tabulating-Recording
  Company (CTR)
  ❖ Its main business was the census
❖ Thomas J. Watson rebranded CTR as International Business Machines
  Corporation (IBM)

P4
Turing Machine

❖ The idea of a universal computational device was first described by
  Alan Turing in 1937
  ❖ All computation could be performed by a special kind of machine

P5
Von Neumann Model

❖ Around 1944–1945, John von Neumann proposed that


❖ since program and data are logically the same, programs should also be
stored in the memory of a computer.

Von Neumann with the first Institute computer


P6
Four Subsystems
❖ Computers built on the von Neumann model divide the computer
hardware into four subsystems
❖ Memory: Circuit Theory, Electronics, VLSI Design
❖ Arithmetic logic unit (ALU): Logic Design, Computer Architecture,
  Digital System Design, VLSI Design
❖ Control unit: Logic Design, Computer Architecture, Digital System Design, …
❖ Input/output

P7
ENIAC (1945)

❖ ENIAC (Electronic Numerical Integrator and Computer) was the first


programmable, electronic, general-purpose digital computer

P8
First Transistor (1947)

❖ In celebration of ENIAC's 50th anniversary, the machine was
  reimplemented using CMOS technology in 1995. The room-sized
  computer could now fit in the palm of your hand!

P9
Google’s first server (1999)

P10
Moore’s Law and Technology Scaling
❖ "… the performance of an IC, including the number of components on
  it, doubles every 18–24 months with the chip price …"
  – Gordon Moore, 1965

P11
Moore’s Law in Action
❖ Example: Intel Core i7 microprocessor evolution
  ➢ Six-core Core i7 (Gulftown), 2010: 32nm technology, 1,170 M
    transistors, die size 240 mm², transistor density 4.875 M/mm²
  ➢ 10-core Core i7 (Broadwell-E), 2016: 14nm technology, 3,200 M
    transistors, die size 246 mm², transistor density 13.008 M/mm²

Source: Intel
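The density figures above follow directly from transistor count divided by die area; a quick check in Python, using only the numbers from this slide:

```python
# Transistor density = transistor count / die area, for the two
# Core i7 data points above.
gulftown = {"transistors_m": 1170, "die_mm2": 240}      # 32nm, 2010
broadwell_e = {"transistors_m": 3200, "die_mm2": 246}   # 14nm, 2016

def density(chip):
    """Millions of transistors per mm^2."""
    return chip["transistors_m"] / chip["die_mm2"]

print(round(density(gulftown), 3))     # 4.875
print(round(density(broadwell_e), 3))  # 13.008
```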
P12
Reference Size

❖ Cell size (diameter): 10–100 µm
❖ Erythrocyte size (diameter): 6–9 µm
❖ Protein size (diameter): 1–20 nm
❖ DNA molecule size (diameter): 2 nm
P13
The IC at 60 (1958–2018)

P14
Classes of Computers

❖ Personal computers
❖ General purpose, variety of software
❖ Subject to cost/performance tradeoff

❖ Server computers
❖ Network based
❖ High capacity, performance, reliability
❖ Range from small servers to building sized

❖ Supercomputers
❖ High-end scientific and engineering calculations
❖ Highest capability but represent a small fraction of the overall computer
market

❖ Embedded computers
❖ Hidden as components of systems
❖ Stringent power/performance/cost constraints
P15
Understanding Performance

❖ Algorithm
❖ Determines number of operations executed

❖ Programming language, compiler, architecture


❖ Determine number of machine instructions executed per operation

❖ Processor and memory system


❖ Determine how fast instructions are executed

❖ I/O system (including OS)


❖ Determines how fast I/O operations are executed

P16
Eight Great Ideas

❖ Design for Moore’s Law

❖ Use abstraction to simplify design

❖ Make the common case fast

❖ Performance via parallelism

❖ Performance via pipelining

❖ Performance via prediction

❖ Hierarchy of memories

❖ Dependability via redundancy

P17
Below Your Program
❖ Application software
❖ Written in high-level language

❖ System software
❖ Compiler: translates HLL code to machine code
❖ Operating System: service code
➢ Handling input/output
➢ Managing memory and storage
➢ Scheduling tasks & sharing resources

❖ Hardware
❖ Processor, memory, I/O controllers

P18
Levels of Program Code
❖ High-level language
❖ Level of abstraction closer to problem
domain
❖ Provides for productivity and portability

❖ Assembly language
❖ Textual representation of instructions

❖ Hardware representation
❖ Binary digits (bits)
❖ Encoded instructions and data

P19
Semiconductor Technology

❖ Silicon: semiconductor
❖ Add materials to transform properties:
❖ Conductors
❖ Insulators
❖ Switch

P20
Technology Trends

❖ Electronics technology
continues to evolve
❖ Increased capacity and
performance
❖ Reduced cost

(Figure: DRAM capacity growth over time)

Year   Technology                     Relative performance/cost
1951   Vacuum tube                                  1
1965   Transistor                                  35
1975   Integrated circuit (IC)                    900
1995   Very large scale IC (VLSI)           2,400,000
2013   Ultra large scale IC           250,000,000,000

P21
VLSI IC Technology

                               2001    2005    2010    2016
Line width (nm)                 130      80      45      22
Clock (GHz)                     1.7     5.2    11.5    28.8
DRAM cost (microcents/bit)      7.7     1.9    0.34   0.042
MPU cost (microcents/transistor) 97      24    4.31    0.54
Supply voltage (V)              1.2     1.0     0.8     0.6
Wiring levels                     7       9      10      10

(Figure: cost per transistor falls while chip density rises)


P22
Manufacturing ICs

❖ Yield: proportion of working dies per wafer

P23
Intel Core i7 Wafer

❖ 300mm wafer, 280 chips, 32nm technology


❖ Each chip is 20.7 x 10.5 mm

P24
Integrated Circuit Cost

Cost per die = Cost per wafer / (Dies per wafer × Yield)

Dies per wafer ≈ Wafer area / Die area

Yield = 1 / (1 + (Defects per area × Die area / 2))²

❖ Nonlinear relation to area and defect rate


❖ Wafer cost and area are fixed
❖ Defect rate determined by manufacturing process
❖ Die area determined by architecture and circuit design

P25
Performance

❖ Purchasing perspective: given a collection of machines, which has the
  ❖ best performance? least cost?
  ❖ best performance/cost?

❖ Design perspective: faced with design options, which has the
  ❖ best performance improvement? least cost?
  ❖ best performance improvement/cost?

❖ Both require
  ❖ a basis for comparison
  ❖ a metric for evaluation

❖ Goal: understand cost and


performance implications of
architectural choices
P26
Performance Measures

❖ Typical performance metrics
  ❖ Response time (latency): time between start and completion
  ❖ Throughput (bandwidth): rate of work done per unit time

❖ Speedup of X relative to Y
  ❖ Execution timeY / Execution timeX
  ❖ "B is n times faster than A" means
    ➢ exec_timeA / exec_timeB = rateB / rateA = n

❖ Execution time
  ❖ Wall-clock time (or response time, or elapsed time): includes all
    system overheads
  ❖ CPU time: only computation time

P27
Measuring Performance

❖ Time is the measure of computer performance

❖ Elapsed time (Wall-clock time) = program execution + I/O + wait


❖ important to user

❖ Execution time = user time + system time (but OS self measurement


may be inaccurate)

❖ CPU performance = user time on unloaded system


❖ important to architect

P28
Real Performance Measurement

❖ Benchmark suites
  ❖ Attempts at running programs that are much simpler than a real
    application have led to performance pitfalls

❖ Performance is the result of executing a workload on a configuration

❖ Workload = program + input

❖ Configuration = CPU + cache + memory + I/O + OS + compiler


optimizations

❖ compiler optimizations can make a huge difference!

P29
Improve Performance by

❖ Changing the
  ❖ algorithm (e.g., bubble sort vs. quicksort)
  ❖ data structures (e.g., structure vs. pointer)
  ❖ programming language (e.g., C vs. Matlab)
  ❖ compiler
  ❖ compiler optimization flags
  ❖ OS parameters (e.g., Windows vs. Linux)

❖ Improving locality of memory or I/O accesses

❖ Overlapping I/O (e.g., unidirectional port vs. bidirectional port)

❖ On multiprocessors, you can improve performance by avoiding cache
  coherency problems (e.g., false sharing) and synchronization problems

P30
CPU Clocking
❖ Operation of digital hardware governed by a constant-rate clock

❖ Clock period: duration of a clock cycle
  ❖ e.g., 250 ps = 0.25 ns = 250×10⁻¹² s

❖ Clock frequency (rate): cycles per second
  ❖ e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz

(Figure: clock waveform; data transfer and computation occur within each
clock cycle, and state is updated at the cycle boundary)

P31
CPU Time

CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
❖ Performance improved by
❖ Reducing number of clock cycles
❖ Increasing clock rate
❖ Hardware designer must often trade off clock rate against cycle count

P32
CPU Time Example
❖ Computer A: 2GHz clock, 10s CPU time
❖ Designing Computer B
❖ Aim for 6s CPU time
❖ Can do faster clock, but causes 1.2 × clock cyclesA
❖ How fast must Computer B clock be?

Clock RateB = Clock CyclesB / CPU TimeB = (1.2 × Clock CyclesA) / 6s

Clock CyclesA = CPU TimeA × Clock RateA
             = 10s × 2GHz = 20×10⁹

Clock RateB = (1.2 × 20×10⁹) / 6s = (24×10⁹) / 6s = 4GHz
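The derivation above can be checked numerically; a minimal sketch:

```python
# Computer A: 2GHz clock, 10s CPU time. Computer B targets 6s but
# its faster clock costs 1.2x the clock cycles of A.
clock_rate_a = 2e9
cpu_time_a = 10.0
target_time_b = 6.0
cycle_inflation = 1.2

clock_cycles_a = cpu_time_a * clock_rate_a          # 20e9 cycles
clock_cycles_b = cycle_inflation * clock_cycles_a   # 24e9 cycles
clock_rate_b = clock_cycles_b / target_time_b
print(round(clock_rate_b / 1e9, 6))                 # 4.0 (GHz)
```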
P33
Instruction Count and CPI

Clock Cycles = Instruction Count × Cycles per Instruction (CPI)

CPU Time = Instruction Count × CPI × Clock Cycle Time
         = (Instruction Count × CPI) / Clock Rate
❖ Instruction Count for a program
❖ Determined by program, ISA and compiler
❖ Average cycles per instruction
❖ Determined by CPU hardware
❖ If different instructions have different CPI
➢ Average CPI affected by instruction mix

P34
What is Instruction Count?

❖ How many C operations?

    for(i=0; i<10; i++)
        a[i] = b[i] * c[i];

❖ How many assembly instructions?

          sub  $r1,$r2,$r3
    Loop: beq  $r9,$r0,End
          add  $r8,$r8,$r10
          addi $r9,$r9,-1
          j    Loop
    End:

  ❖ The loop executes 10 times => 41 instructions

❖ This is the Dynamic Instruction Count

P35
CPI Example
❖ Computer A: Cycle Time = 250ps, CPI = 2.0
❖ Computer B: Cycle Time = 500ps, CPI = 1.2
❖ Same ISA
❖ Which is faster, and by how much?

CPU TimeA = Instruction Count × CPIA × Cycle TimeA
          = I × 2.0 × 250ps = I × 500ps            A is faster…

CPU TimeB = Instruction Count × CPIB × Cycle TimeB
          = I × 1.2 × 500ps = I × 600ps

CPU TimeB / CPU TimeA = (I × 600ps) / (I × 500ps) = 1.2   …by this much
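Because both machines run the same instruction count I, the comparison can be checked with any concrete I (it cancels in the ratio); a sketch with I = 10⁹:

```python
I = 1e9                       # same instruction count on both machines
time_a = I * 2.0 * 250e-12    # CPI_A = 2.0, cycle time 250ps
time_b = I * 1.2 * 500e-12    # CPI_B = 1.2, cycle time 500ps
print(round(time_a, 12), round(time_b, 12))  # 0.5 0.6 -> A is faster
print(round(time_b / time_a, 3))             # 1.2
```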

P36
CPI in More Detail
❖ If different instruction classes take different numbers of cycles

Clock Cycles = Σ (over i = 1..n) CPIᵢ × Instruction Countᵢ

❖ Weighted average CPI

CPI = Clock Cycles / Instruction Count
    = Σ (over i = 1..n) CPIᵢ × (Instruction Countᵢ / Instruction Count)

where Instruction Countᵢ / Instruction Count is the relative frequency
of instruction class i

P37
CPI Example
❖ Alternative compiled code sequences using instructions in classes A, B,
C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

❖ Sequence 1: IC = 5
  ❖ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  ❖ Avg. CPI = 10/5 = 2.0

❖ Sequence 2: IC = 6
  ❖ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  ❖ Avg. CPI = 9/6 = 1.5
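The weighted-average CPI formula from the previous slide reproduces both results; a minimal sketch using the table's numbers:

```python
cpi = {"A": 1, "B": 2, "C": 3}       # cycles per instruction class
seq1 = {"A": 2, "B": 1, "C": 2}      # instruction counts, sequence 1
seq2 = {"A": 4, "B": 1, "C": 1}      # instruction counts, sequence 2

def clock_cycles(seq):
    return sum(cpi[c] * n for c, n in seq.items())

def avg_cpi(seq):
    # Weighted average: total cycles / total instruction count
    return clock_cycles(seq) / sum(seq.values())

print(clock_cycles(seq1), avg_cpi(seq1))  # 10 2.0
print(clock_cycles(seq2), avg_cpi(seq2))  # 9 1.5
```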

P38
Performance Summary

CPU execution time for program (designer's view)
  = Clock Cycles for program × Clock Cycle Time

❖ Substituting for clock cycles:

CPU execution time for program (user's view)
  = Instruction Count × CPI × Clock Cycle Time

CPU Time = (Instructions / Program) × (Cycles / Instruction)
           × (Seconds / Cycle)

P39
Performance Summary

The BIG Picture

CPU Time = (Instructions / Program) × (Clock cycles / Instruction)
           × (Seconds / Clock cycle)

❖ Performance depends on
❖ Algorithm: affects IC, possibly CPI
❖ Programming language: affects IC, CPI
❖ Compiler: affects IC, CPI
❖ Instruction set architecture: affects IC, CPI, Tc (i.e., clock cycle time)

P40
MIPS as a Performance Measure

❖ MIPS: Million Instructions Per Second

MIPS = Instruction count / (Execution time × 10⁶)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10⁶)
     = Clock rate / (CPI × 10⁶)

P41
Pitfall: MIPS as a Performance Metric
❖ MIPS: Millions of Instructions Per Second
❖ Doesn’t account for
➢ Differences in ISAs between computers
➢ Differences in complexity between instructions

MIPS = Instruction count / (Execution time × 10⁶)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10⁶)
     = Clock rate / (CPI × 10⁶)

➢ CPI varies between programs on a given CPU

P42
Example

❖ Consider two compilers and three instruction classes, A, B, and C,
  with CPIs of 1, 2, and 3, respectively. Suppose we measure the code
  for the same program from the two compilers and obtain the following
  data:

Code from     Instruction counts (in billions = 10⁹)
                  A     B     C
Compiler 1        5     1     1
Compiler 2       10     1     1

❖ Assume that the machine's clock rate is 4 GHz. Which code sequence
  executes faster according to the MIPS rate? According to execution
  time?

P43
Example
Code from     Instruction counts (in billions)
                  A     B     C
Compiler 1        5     1     1
Compiler 2       10     1     1
CPI               1     2     3

❖ CPU clock cycles1 = (5×1 + 1×2 + 1×3)×10⁹ = 10×10⁹
❖ CPU clock cycles2 = (10×1 + 1×2 + 1×3)×10⁹ = 15×10⁹
❖ Execution time1 = (10×10⁹) / (4×10⁹) = 2.5 sec
❖ Execution time2 = (15×10⁹) / (4×10⁹) = 3.75 sec
❖ MIPS = IC / (Execution time × 10⁶)
  ❖ MIPS1 = [(5+1+1)×10⁹] / (2.5×10⁶) = 2800
  ❖ MIPS2 = [(10+1+1)×10⁹] / (3.75×10⁶) = 3200
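The numbers above can be reproduced in a few lines, which also makes the pitfall visible: compiler 2 wins on MIPS yet loses on execution time:

```python
cpi = (1, 2, 3)             # CPIs of classes A, B, C
ic1 = (5e9, 1e9, 1e9)       # compiler 1 instruction counts
ic2 = (10e9, 1e9, 1e9)      # compiler 2 instruction counts
clock_rate = 4e9            # 4GHz

def exec_time(ic):
    cycles = sum(c * n for c, n in zip(cpi, ic))
    return cycles / clock_rate

def mips(ic):
    return sum(ic) / (exec_time(ic) * 1e6)

print(exec_time(ic1), mips(ic1))  # 2.5 2800.0
print(exec_time(ic2), mips(ic2))  # 3.75 3200.0
```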

P44
Amdahl’s Law: A method to quantify
performance speedup

Gene Amdahl
Computer Pioneer

9/8/2024
P45
Amdahl’s Law

❖ Amdahl’s law gives us a quick way to find the speedup from some
enhancement, which depends on two factors:
❖ Fraction of the computation time in the original computer that can be
converted to take advantage of the enhancement
➢ How much fraction of the computation time can be enhanced?
➢ The fraction is always less than 1
❖ How much faster the task would run if the enhanced mode were used for
the entire program
➢ Speedup = (execution time without enhancement) /
            (execution time with enhancement) = T_without / T_with

P46
Example

❖ Suppose we are considering an enhancement to a web server. The
  enhanced CPU is 10 times faster on computation but the same speed on
  I/O. Suppose also that 60% of the time is spent waiting on I/O.

Answer:
Fraction_enhanced = 0.4
Speedup_enhanced = 10

Speedup_overall = 1 / (0.6 + 0.4/10) = 1 / 0.64 ≈ 1.56
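Plugging the same numbers into Amdahl's law written as a function:

```python
def speedup_overall(fraction_enhanced, speedup_enhanced):
    # Amdahl's law: the unenhanced fraction runs at the old speed,
    # the enhanced fraction is divided by its speedup.
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

s = speedup_overall(0.4, 10)   # 40% of time is computation, sped up 10x
print(round(s, 2))             # 1.56
```

Even an infinite computation speedup would only give 1/0.6 ≈ 1.67×, since the I/O fraction is untouched.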

P47
Power Trends

❖ In CMOS IC technology

Power = Capacitive load × Voltage² × Frequency

(Figure: supply voltage scaled from 5V to 1V while frequency grew ~1000×,
so power grew ~40×)

P48
Reducing Power
❖ Suppose a new CPU has
❖ 85% of capacitive load of old CPU
❖ 15% voltage and 15% frequency reduction

Pnew / Pold = [(Cold × 0.85) × (Vold × 0.85)² × (Fold × 0.85)]
              / (Cold × Vold² × Fold)
            = 0.85⁴ ≈ 0.52
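The 0.52 ratio is just 0.85 raised to the fourth power: one factor from capacitance, two from the squared voltage, and one from frequency. A one-line check:

```python
scale = 0.85
# P ~ C * V^2 * F: capacitance x0.85, voltage x0.85 (squared), freq x0.85
power_ratio = scale * scale**2 * scale   # = 0.85^4
print(round(power_ratio, 2))             # 0.52
```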

❖ The power wall


❖ We can’t reduce voltage further
❖ We can’t remove more heat

❖ How else can we improve performance?

P49
Power and Energy

❖ Problem: Get power in, get power out

❖ Thermal Design Power (TDP)


❖ Characterizes sustained power consumption
❖ Lower than peak power, higher than average power consumption
❖ Used as target for power supply and cooling system
➢ Two kinds of thermal management in the modern processors

❖ Clock rate can be reduced dynamically to limit power consumption

❖ Energy per task is often a better measurement

P50
Case Study: Profile of Intel i7-8850H

P51
Power

❖ The Intel 80386 consumed ~2 W

❖ A 3.3 GHz Intel Core i7 consumes 130 W

❖ Heat must be dissipated from a 1.5 × 1.5 cm chip

❖ This is about the limit of what can be cooled by air

P52
Power vs. Thermal Issue

❖ Distributing the power, removing the heat, and preventing hot spots
  have become increasingly difficult challenges.
❖ Techniques for reducing power
  ❖ Do nothing well
  ❖ Dynamic voltage-frequency scaling (DVFS)
  ❖ Low-power states for DRAM and disks
  ❖ Overclocking, turning off cores

Leakage current increases ~5% per 10°C

P53
Case Study: Intel® Turbo Boost Technology 2.0

❖ Features
  ❖ Under the Thermal Design Power (TDP), maximize the frequency state
    depending on the workload and:
    ➢ Number of active cores
    ➢ Estimated current consumption
    ➢ Estimated power consumption
    ➢ Processor temperature

❖ Commercial products
  ❖ Intel Core i7-2xxx series
  ❖ Intel Core i5-2xxx series
  ❖ Intel Xeon E3-12xx series

❖ Near-zero power for inactive cores


P54
Single Processor Performance
(Figure: single-processor performance over time; RISC-driven growth,
followed by the move to multiprocessors as growth slowed)

P55
Multiprocessors

❖ Multicore microprocessors
❖ More than one processor per chip

❖ Requires explicitly parallel programming


❖ Compare with instruction level parallelism
➢ Hardware executes multiple instructions at once
➢ Hidden from the programmer
❖ Hard to do
➢ Programming for performance
➢ Load balancing
➢ Optimizing communication and synchronization

P56
Why Do I Want to Know These?

(Figure: the computer architect works at the intersection of technology,
interfaces (ISA, API, I/O channels, links, IR, registers), applications,
historical background and trends, machine organization, and
measurement & evaluation)

P57
In Fact, Architecture Design Is an Iterative
Process

-- searching the space of possible designs
-- at all levels of computer systems

(Figure: design loop: new concepts are created, their cost & performance
are estimated, and candidates are sorted into good ideas and mediocre
ideas; historical background and an understanding of trends help the
selection process)

P58