EEE 415

Microprocessors and Embedded Systems
1
Objective: To learn about

1. ARM Microprocessor
• ARM Architecture
• ARM Programming
• Operating Principles
2. ARM Assembly Language
• ARM Instruction Set
• Assembly Interpretation
3. Embedded Systems & RTOS
• Embedded Systems
• RTOS Concepts

2
Course Instructors

• Dr. Sajid Muhaimin Choudhury (Section A)
• Dr. Zabir Ahmed (Sections B and C)

3
Course Instructor (Sections B and C)
Dr. Zabir Ahmed

https://sites.google.com/view/zabirahmed/
4
Course Instructor (Sections B and C)
Dr. Zabir Ahmed
Prior Research
• Neurophotonic and Neural Interfaces
1. Integrated photonic ultra-sensitive electro-optic sensor for sub-mV neural signal detection
2. Development of NHP rigid and flexible neural interfaces
• Plasmonics and Nanophotonics
1. Plasmonic Solar Cell
2. Plasmonic nanolaser
[Image: Brain-Computer Interface]
Research Interests
• Neurophotonic and Neuroplasmonic Devices
1. Voltage Sensing
2. Glucose and Neurotransmitter sensing
• Flexible wearable sensors
• Nanobiophotonics
• Integrated Silicon Photonics + Embedded Photonic Computing Systems!
5
New Syllabus
•Fundamentals of microprocessor and computer design, processor data path, architecture,
microarchitecture, complexity, metrics, and benchmark; Instruction Set Architecture, introduction
to CISC and RISC, Instruction-Level Parallelism, pipelining, pipelining hazards and data
dependency, branch prediction, exceptions and limits, super-pipelined vs superscalar processing;
Memory hierarchy and management, Direct Memory Access, Translation Lookaside Buffer; cache,
cache policies, multi-level cache, cache performance; Multicore computing, message passing,
shared memory, cache-coherence protocol, memory consistency, paging, Vector Processor,
Graphics Processing Unit, IP Blocks, Single Instruction Multiple Data and SoC with
microprocessors. Simple Arm/RISC-V based processor design with VerilogHDL
•Introduction to embedded systems design, software concurrency and Realtime Operating
Systems, Arm Cortex M / RISC-V microcontroller architecture, registers and I/O, memory map and
instruction sets, endianness and image, Assembly language programming of Arm Cortex M / RISC-
V based embedded microprocessors (jump, call-return, stack, push and pop, shift, rotate, logic
instructions, port operations, serial communication and interfacing), system clock, exceptions and
interrupt handling, timing analysis of interrupts, general purpose digital interfacing, analog
interfacing, timers: PWM, real-time clock, serial communication, SPI, I2C, UART protocols,
Embedded Systems for Internet of Things (IoT)

6
How it relates to other courses in BUET EEE

[Abstraction stack, from high level to low, with the courses that cover each level:]
• programs, device drivers - CSE / Comp Engg
• instructions, registers, datapaths, controllers - EEE 415
• adders, memories, AND gates, NOT gates - EEE 303, EEE 467
• amplifiers, filters - EEE 101, 105, 207, 315, 465*
• transistors, diodes - EEE 201, 313
• electrons - PHY 165, 209, 461*

Related courses:
• PHY 165 - Electricity and Magnetism, Modern Physics and Mechanics
• EEE 101 - Electrical Circuits I
• EEE 105 - Electrical Circuits II
• EEE 201 - Electronic Circuits I
• EEE 203 - Energy Conversion I
• EEE 207 - Electronic Circuits II
• EEE 209 - Engineering Electromagnetics
• EEE 303 - Digital Electronics
• EEE 313 - Solid State Devices
• EEE 315 - Power Electronics
• EEE 415 - Microprocessors and Embedded Systems
• EEE 465* - Analog Integrated Circuits
• EEE 467* - VLSI Circuits and Design
7
Textbooks

[Harris] Sarah Harris, David Harris – “Digital Design and Computer Architecture, ARM Edition,” Morgan Kaufmann (2015)

[Zhu] Yifeng Zhu – “Embedded Systems with ARM Cortex-M Microcontrollers in Assembly Language and C”

[Patterson] David A. Patterson and John L. Hennessy – “Computer Organization and Design: The Hardware/Software Interface, ARM Edition,” Morgan Kaufmann
8
Lecture Plan (Tentative)
Week Lectures Topic Textbook
01 1-3 Fundamentals of microprocessor and computer design, processor data path, architecture, microarchitecture, introduction to CISC and RISC, complexity, metrics, and benchmark Patterson 1
02 4-6 Assembly Language Harris 6.1-6.3
03 7-9 Assembly Language Programming Harris 6.3
04 10-12 Machine Language, Compiling, Assembling Harris 6.4
05 13-15 Performance Analysis, Single Cycle Processor Harris 7.1-7.3
06 16-18 Multicycle Processor, Pipelining, Hazards Harris 7.4,7.7
07 19-21 Advanced Microarchitecture, Memory Systems – Cache Harris 8.2-8.3
08 22-24 Introducing Embedded System Design, IoT, Arm Cortex M4 Lecture Slides
09 25-27 General Purpose Input Output Zhu 14
10 28-30 General Purpose Timers Zhu 15.1-15.3
11 31-33 Interrupts Zhu 11, 15.4
12 34-36 ADC + DAC Zhu 20,21
13 37-39 Serial Communication Zhu 22
9
Next Up

• What is computer architecture?


• Computer architecture arena and design goals
• Historical performance of computer architecture
• Future trends with multicore processors, systems on chip (SoCs),
and beyond

10
Fundamentals of microprocessor and computer architecture
11
Introduction to Computers
• The modern computer is less than 100 years old.
• The first electromechanical and valve-based machines were produced in the 1930s and 1940s.
• Today’s machines are many orders of magnitude faster, lower power, more reliable, and cheaper.

[Images: EDSAC replica (2018) [1]; Raspberry Pi 2; Arm Cortex-M0 [2]]

1. EDSAC photo, CC BY-SA 4.0
2. The Courier Mail, CC BY-SA 4.0

12
Computers are everywhere!

BU CS101
13
But application/purpose can be vastly different!

It can also be single user or multiuser!

BU CS101 14
What’s inside a Computer

15
What’s on a motherboard?

16
What’s inside a microprocessor?

Computer Architecture
In EEE 415, we will learn:
• How a microprocessor is programmed
  • Instruction Set Architecture (ISA), or just Architecture
• How a microprocessor is designed
  • Logic level
  • Component level
  • (Commonly known as microarchitecture)

[Image: 8008 microprocessor (1972), https://www.righto.com/2017/03/analyzing-vintage-8008-processor-from.html]

• These microprocessors are everywhere: mobile phone, TV, smart TV, laptop, smart watch, motor vehicles, airplanes, Xbox, PS, etc.
• Depending on the application, the architecture and microarchitecture of the processors can be different
17
How are microprocessors made?

• Core i7 chips on a 12 inch wafer
• How are these chips made?
  • Essentially from very pure sand!
  • Make wafer from sand (silicon)
  • Then fabricate chips on that wafer
• 300mm wafer, 280 chips, 32nm technology
• Each chip is 20.7 x 10.5 mm

18
Making the wafer

[Process steps: pure sand → melt in furnace → ingot formation → cut and polish into wafers]

19
Making the chip

[Process steps: grow silicon dioxide (SiO2) on the silicon wafer → apply photoresist → expose to UV through a mask (using a lithography tool from ASML, ~$380 million, ~2 nm features) → patterned resist → etch SiO2 (or not) → deposit metal]

20
Chip manufacturing process

21
Building blocks
Transistors and Wires/Interconnects

22
Major Players in the Microprocessor World
[Logos grouped by ISA style: CISC vs. RISC]

23
Intel Raptor Lake Architecture

ISA: x86

24
Intel Raptor Lake Architecture

ISA: x86

25
AMD Zen 4 Core under Microscope

ISA: x86

https://www.techpowerup.com/298338/amd-zen-4-dies-transistor-counts-cache-sizes-and-latencies-detailed

26
Apple M4 (SoC)

ISA: ARMv9

27
Snapdragon 8 Series (SoC)

ISA: ARMv9

28
Self-driving car (AI Accelerator)

Tesla's third-generation full self-driving car
29


NVIDIA H100 (GPU Accelerator)

30
Takeaways
• Some chips share the same instruction set (e.g., x86 or ARM)
• But their physical designs are vastly different
• Even the same instruction may be executed differently across
architectures
• ISA ≠ microarchitecture — same software, different hardware
behavior
• Design choices depend on target use cases: performance, power,
scalability, etc.
31
What is Computer Architecture?
•In computer engineering, computer architecture is a
description of the structure of a computer system made from
component parts.[1]

•It can sometimes be a high-level description that ignores details


of the implementation.[2]

•At a more detailed level, the description may include the


instruction set architecture design, microarchitecture design,
logic design, and implementation.[3]
1. Dragoni, Nicole (n.d.). "Introduction to peer to peer computing" (PDF). DTU Compute – Department of Applied Mathematics and Computer Science. Lyngby, Denmark.
2. Clements, Alan. Principles of Computer Hardware (Fourth ed.).
3. Hennessy, John; Patterson, David. Computer Architecture: A Quantitative Approach (Fifth ed.). p. 11.

32
Levels of Abstractions

• Architecture
• A set of specifications that allows developers to
write software and firmware
• These include the instruction set.
• Microarchitecture
• The logical organization of the inner structure of the
computer
• Hardware or Implementation
• The realization or the physical structure, i.e., logic
design and chip packaging

33
Another view of Abstractions
Physics/Materials → Devices → Microarchitecture → Architectures → Processors
(This course: the microarchitecture and architecture levels.)

34
Computer Architecture
• Computer architecture is concerned with how best to exploit
fabrication technology to meet marketplace demands.
• e.g., how best might we use five billion transistors and a power
budget of two watts to design the chip at the heart of a mobile
phone?
• Computer architecture builds on a few simple concepts, but
is challenging as we must constantly seek new solutions.
• What constitutes the “best” design changes over time and depends on our use-case. It involves considering many different trade-offs.

35
Forces acting on Computer Architecture

[Figure: computer architecture at the center, shaped by application characteristics, markets, new applications, and technology]

Source: “Early 21st Century Processors,” S. Vajapeyam and M. Valero, IEEE Computer, April 2004

36
Design Goals
• Functional – errors are hard to correct after fabrication (unlike software). Verification is perhaps the highest single cost in the design process. We also need to test our chips once they have been manufactured; again, this can be a costly process and requires careful thought at the design stage.

• Performance – what does this mean? No single best answer, e.g., sports car vs.
off-road 4x4 vehicle – performance will always depend on the “workload”

• Power – a first-order design constraint for most designs today. Power limits the
performance of most systems.

• Security – e.g., the ability to control access to sensitive data or prevent carefully
crafted malicious inputs from hijacking control of the processor

• Cost – design cost (complexity), die costs (i.e., the size or area of our chip),
packaging, etc.

• Reliability – do we need to try to detect and/or tolerate faults during operation?

37
Why study computer architecture?

• Get a fundamental understanding of how computers work.
• Great job opportunities: ASIC and SoC design are on the rise!
  • Application-Specific IC (ASIC)
  • System on Chip (SoC)
• It may even get you to work in high-frequency trading!

38
8 Great Ideas for Comp. Arch.
• Design for Moore’s Law

• Use abstraction to simplify design

• Make the common case fast

• Performance via parallelism

• Performance via pipelining

• Performance via prediction

• Hierarchy of memories

• Dependability via redundancy


[Patterson]
39
Stored Program Computer/Concept
(Von Neumann Arch.)

• Instructions and data are stored together in memory (in binary form).

• The program is then fetched from memory one instruction at a time and executed.

Questions…
• How are programs represented?
• How do we implement an algorithm in the computer?
• How does a computer interpret a program?
40
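To make the fetch-execute cycle concrete, here is a minimal sketch in C of a stored-program machine: a tiny program and its data share one memory array, and the loop fetches, decodes, and executes one instruction at a time. The 16-bit instruction format, opcodes, and register count are hypothetical, invented only for this illustration.

```c
/* Minimal sketch of the stored-program idea: instructions and data live
 * in the same memory, and the machine repeatedly fetches, decodes, and
 * executes. The encoding below is hypothetical, for illustration only. */
#include <stdint.h>
#include <stdio.h>

enum { OP_LOAD = 0, OP_ADD = 1, OP_STORE = 2, OP_HALT = 3 };

int main(void) {
    /* Memory holds both the program (words 0..4) and data (words 8..10). */
    uint16_t mem[16] = {
        (OP_LOAD  << 8) | (0 << 4) | 8,   /* r0 = mem[8]  */
        (OP_LOAD  << 8) | (1 << 4) | 9,   /* r1 = mem[9]  */
        (OP_ADD   << 8) | (0 << 4) | 1,   /* r0 = r0 + r1 */
        (OP_STORE << 8) | (0 << 4) | 10,  /* mem[10] = r0 */
        (OP_HALT  << 8),
        [8] = 2, [9] = 3
    };
    uint16_t reg[2] = {0}, pc = 0;

    for (;;) {
        uint16_t inst = mem[pc++];           /* fetch   */
        uint16_t op = inst >> 8;             /* decode  */
        uint16_t a  = (inst >> 4) & 0xF;
        uint16_t b  = inst & 0xF;
        if (op == OP_LOAD)       reg[a] = mem[b];           /* execute */
        else if (op == OP_ADD)   reg[a] = reg[a] + reg[b];
        else if (op == OP_STORE) mem[b] = reg[a];
        else break;                           /* OP_HALT */
    }
    printf("mem[10] = %d\n", mem[10]);        /* prints 5 */
    return 0;
}
```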
Representing Programs
• We need some basic building blocks -- call them “instructions”

• What does “execute a program” mean?

• What instructions do we need?

• What should instructions look like?

• Is it enough to just specify the instructions?

• How complex should an instruction be?

41
Levels of Program Code

• High-level language
• Level of abstraction closer to
problem domain
• Provides for productivity and
portability
• Assembly language
• Textual representation of instructions
• Hardware representation
• Binary digits (bits)
• Encoded instructions and data
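As a concrete illustration of the three levels, the sketch below pairs one C statement with a possible ARM (A32) translation and its 32-bit encodings; a real compiler may emit different instructions, so treat the comments as representative rather than definitive.

```c
/* The same computation at three levels of abstraction. The assembly and
 * machine code in the comments are one possible ARM (A32) translation;
 * an actual compiler's output may differ. */
int scale(int x, int y) {
    /* High-level language (C): */
    return x + 2 * y;

    /* Assembly language (one possible ARM translation; x in r0, y in r1):
     *     ADD  r0, r0, r1, LSL #1   ; r0 = x + (y << 1)
     *     BX   lr                   ; return to caller
     *
     * Hardware representation (the same two instructions as 32-bit words):
     *     0xE0800081                ; ADD r0, r0, r1, LSL #1
     *     0xE12FFF1E                ; BX lr
     */
}
```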

42
Things to consider…

What instructions do we need?
• Depends on the application
• Add, subtract, multiply, bitwise operations, branches, jumps, function calls, load, store, etc.
• Trade-offs: performance, power, efficiency, pipelining, etc.

What should instructions look like?
• Binary codes (strings of bits)
• All the same size or different sizes?
• What size? How many bits? 32 or 64?

The complexity of instructions
• Complex instructions
  • Complex hardware
  • Difficult to pipeline
  • Different types of instructions required
  • Good code density
• Less complex instructions
  • Easier to pipeline
  • Poor code density
  • Less readable
43
“Instruction Set Architecture (ISA)” or simply “Architecture”
• An ISA is “the agreed-upon interface between all the software that runs on the machine and the hardware that executes it.”
• The “contract” between software and hardware
• Functional definition of operations, modes, and storage locations supported by hardware
• Precise description of how to invoke and access them
• The same ISA or architecture can be implemented by different microarchitectures (hardware designs)

Instruction Set Architecture (ISA)   Type
x86                                  CISC
ARM                                  RISC
MIPS                                 RISC
RISC-V                               RISC
PowerPC                              RISC
IA-64 (Itanium)                      VLIW
SPARC                                RISC

CISC: Complex instruction set computer
RISC: Reduced instruction set computer
44
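One concrete thing an ISA pins down is the bit-level layout of every instruction. The decoder sketch below uses a hypothetical 32-bit format (8-bit opcode plus three 4-bit register fields) purely for illustration; real ISAs such as ARM or RISC-V define their own field positions and widths.

```c
/* Sketch of instruction decoding under a hypothetical 32-bit format. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int opcode;   /* what operation to perform */
    int rd;       /* destination register      */
    int rn, rm;   /* source registers          */
} decoded_t;

static decoded_t decode(uint32_t word) {
    decoded_t d;
    d.opcode = (word >> 24) & 0xFF;   /* bits 31..24 */
    d.rd     = (word >> 16) & 0x0F;   /* bits 19..16 */
    d.rn     = (word >>  8) & 0x0F;   /* bits 11..8  */
    d.rm     =  word        & 0x0F;   /* bits 3..0   */
    return d;
}

int main(void) {
    decoded_t d = decode(0x01020304);  /* opcode 0x01 might mean ADD rd, rn, rm */
    printf("op=%d rd=%d rn=%d rm=%d\n", d.opcode, d.rd, d.rn, d.rm);
    return 0;
}
```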
Trends in Computer Performance and Architecture
45
Historical Performance Trends

• By 1985, it was possible to integrate a complete microprocessor onto a single die or “chip.”
• As fabrication technology improved, and transistors got smaller, the performance of a single core improved quickly.
• Performance improved at the rate of 52% per year for nearly 20 years (measured using SPEC benchmark data).

46
Historical Performance Trends

• From 1985 to 2002, performance improved by ~800 times (52% per year compounded over roughly 16 years gives about 800x).
• Over time, technology scaling provided much greater numbers of faster and lower power transistors.
• The “iron law” of processor performance:
  • Time = instructions executed x clocks per instruction (CPI) x clock period
• Early machines were limited by transistor count. As a result, they often required multiple clock cycles to execute each instruction (CPI >> 1).
• As transistor budgets improved, we could aim to get closer to a CPI of 1.

[Figure: clock frequency (MHz) versus year]
A. Danowitz, K. Kelley, J. Mao, J. P. Stevenson, and M. Horowitz. Clock Frequency, Stanford CPU DB. Accessed on Nov. 5, 2019. [Online]. Available: http://cpudb.stanford.edu/visualize/clock_frequency
47
Optimizing performance with Assembly language
• RollerCoaster Tycoon
• Developer: Chris Sawyer
• Wrote 99% of the code in
Assembly language
• Architecture: x86
• Time: 2 years
• Remaining 1% in C

Chris Sawyer
48
Moore’s law

• Moore’s Law predicted that the number of transistors we can integrate onto a chip, for the same cost (or minimal increase), doubles every 2 years.
Gordon Moore and Robert Noyce at Intel in 1970


Source: IntelFreePress, CC BY-SA-2.0

49
Moore’s law

• Moore’s Law predicted that the number of transistors we can integrate onto a chip, for the same cost (or minimal increase), doubles every 2 years.

• 1985 – Intel 386: 275K transistors, die size = 43 mm2
• 2002 – Intel Pentium 4: 42M transistors, die size = 217 mm2
• 2018 – Intel i9-9900K: 4.2 billion transistors!
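Written as a rough formula (an illustrative reading of the slide, not a precise law): N(t) ≈ N(1985) x 2^((t - 1985)/2). Starting from the 386’s 275K transistors, this projects about 275K x 2^8.5 ≈ 100M transistors by 2002; the Pentium 4’s actual 42M shows the two-year doubling is only an approximate trend.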

50
Clocks Per Instruction (CPI)
• Eventually, the industry was also able to
fetch and execute multiple instructions per
clock cycle. This reduced CPI to below 1.
• When we fetch and execute multiple
instructions together, we often refer to
Instructions Per Cycle (IPC), which is
1/CPI.
• For instructions to be executed at the same time, they must be independent.
• Again, growing transistor budgets were
exploited to help find and exploit this
Instruction Level Parallelism (ILP).
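A small C fragment (illustrative, not from the textbook) showing what “independent” means to a core that can issue multiple instructions per cycle:

```c
/* Independent vs. dependent operations, as seen by a superscalar core.
 * This is ordinary C; the point is only which operations could be issued
 * in the same cycle, assuming a core with two or more ALUs. */
void ilp_example(int *out, int a, int b, int c, int d) {
    int x = a + b;     /* independent of the next line:              */
    int y = c + d;     /* x and y can execute in the same cycle      */

    int z = x + y;     /* dependent: must wait for both x and y,     */
    out[0] = z * 2;    /* and this must wait for z (a serial chain)  */
}
```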

51
Pipelining and Parallelism

52
Pipelining and Parallelism (Another example)

[Figure: laundry analogy, without pipelining vs. with pipelining]

https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/pipelining/index.html#:~:text=We%20could%20put%20the%20the,the%20third%20and%20fourth%20loads.
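A back-of-the-envelope sketch of the ideal pipelining gain, assuming a k-stage pipeline with no stalls (real pipelines lose some of this to hazards, covered later in the course):

```c
/* Ideal pipeline timing: without pipelining each instruction takes k
 * cycles; with pipelining the first result appears after k cycles and
 * one more instruction completes every cycle after that. */
#include <stdio.h>

int main(void) {
    const long n = 1000;   /* instructions    */
    const long k = 5;      /* pipeline stages */

    long cycles_unpipelined = n * k;        /* 5000 cycles */
    long cycles_pipelined   = k + (n - 1);  /* 1004 cycles */

    printf("speedup = %.2fx\n",
           (double)cycles_unpipelined / cycles_pipelined);  /* ~4.98x */
    return 0;
}
```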
53
There are limits to these performance gains!
• Slowing single-core performance gains
  • The limits of pipelining
  • The limits of Instruction-Level Parallelism (ILP)
  • On-chip wiring: wire delays -> logic delay
  • Power consumption
• Clock rate increased quickly from the 1980s to 2004, then slowed down. Why?
  • We can potentially scale V and f together
  • But we hit a “Power Wall”
  • Challenges: leakage power from short-channel effects, and other issues!
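A common way to see the power wall (a standard first-order approximation, not from the slides): dynamic power scales roughly as P_dynamic ≈ α x C x V^2 x f, where α is the switching activity and C the switched capacitance. While supply voltage V could be reduced alongside each frequency increase, power stayed within budget; once V could no longer be lowered much (leakage and reliability limits), raising f alone drove power past what the chip can dissipate.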

54
So, what’s next?

• Multicore Processors:
• Eventually, it made sense to shift from
single-core to multicore designs.
• From ~2005, multicore designs became
mainstream.
• The number of cores on a single chip
increased over time.
• Clock frequencies increased more slowly.
• Individual cores were designed to be as
power efficient as possible.
e.g., 4 x Arm Cortex-A72 processors, each with their own L1 caches and a shared L2 cache
55
Multicore Processors
Exploiting multiple cores comes with its own set of challenges and
limitations:
• Power consumption may still limit performance.
• We need to write scalable and correct parallel programs to exploit them (a minimal sketch follows below).
• We might not be able to find enough parallel threads to take
advantage of our cores.
• On-chip and off-chip communication will limit performance gains.
• Off-chip bandwidth is limited and may throttle our many cores.
• Cores also need to communicate to maintain a coherent view of memory.
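As a minimal sketch of what writing a scalable parallel program involves (the thread count, array size, and even partitioning below are illustrative assumptions, not course requirements), the C program splits an array sum across POSIX threads:

```c
/* Parallel array sum with POSIX threads; compile with e.g. gcc sum.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static int data[N];

typedef struct { int lo, hi; long long partial; } chunk_t;

static void *sum_chunk(void *arg) {
    chunk_t *c = arg;
    c->partial = 0;
    for (int i = c->lo; i < c->hi; i++)
        c->partial += data[i];            /* each thread touches its own range */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    chunk_t   chunk[NTHREADS];

    for (int i = 0; i < N; i++) data[i] = 1;

    /* Fork: one chunk of the array per thread (ideally one per core). */
    for (int t = 0; t < NTHREADS; t++) {
        chunk[t].lo = t * (N / NTHREADS);
        chunk[t].hi = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, sum_chunk, &chunk[t]);
    }

    /* Join: combine partial results; this step is inherently serial. */
    long long total = 0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += chunk[t].partial;
    }
    printf("total = %lld\n", total);       /* 1000000 */
    return 0;
}
```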

56
Processors for targeted applications

Graphics Processing Unit (GPU)
57
Limits to specialization

• There are costs associated with designing each new accelerator.


• The chip, or “ASIC,” produced may only be competitive in a smaller
target market, reducing profitability.
• Specialization reduces flexibility.
• The logic invested in specialized accelerators is no longer general-purpose.
• Algorithm changes may render specialized hardware obsolete.
• Once we’ve specialized, further gains may be difficult to achieve.
• Specialization isn’t immune to the concept of diminishing returns.

58
Today’s SoC Designs

• A modern mobile phone SoC (2019) may contain more than 7 billion transistors.
• It will integrate:
  • Multiple processor cores
  • A GPU
  • A large number of specialized accelerators
  • Large amounts of on-chip memory
  • High bandwidth interfaces to off-chip memory

[Figure: a high-level block diagram of a mobile phone SoC - memory interfaces, L3 cache, GPU, Neural Processing Unit (NPU), 4 "big" cores, 4 "small" cores, and other accelerators]
59
Trends in Computer Architecture

Over time:
• Early computers: gains from bit-level parallelism
• Pipelining and superscalar issue: + instruction-level parallelism
• Multicore/GPUs: + thread-level parallelism / data-level parallelism
• Greater integration (large SoCs), heterogeneity, and specialization: + accelerator-level parallelism

Note: Memory hierarchy developments have also been significant. The memory hierarchy typically consumes a large fraction of the transistor budget.

60
The Future – The End of Moore’s Law?
• The end of Moore’s Law has been predicted many times.
• Scaling has perhaps slowed in recent years, but transistor
density continues to improve.
• Eventually, 2D scaling will have to slow down.
• We are ultimately limited by the size of atoms!
• Where next?
• Going 3D - Future designs may take advantage of multiple layers
of transistors on a single chip.
• Note: the gains are linear rather than exponential.
• Better packaging and integration technologies (e.g., chip stacking)
• New types of memory (phase-change memory, STT-RAM, etc)
• New materials and devices (nanowire, nanosheet transistors, etc)
61
