ARM PROCESSORS
Processor Architecture
ARM stands for “Advanced RISC Machine”.
Based on Reduced Instruction Set Computer
(RISC) architecture
Trades simpler hardware circuitry for greater
software complexity (& size)
A whole family of ARM processors exists.
Share similar design principles and common
instruction set
But latest ARM processors utilize more than 100
instructions
A Bit of ARM History
Originally conceived to be a processor for the
desktop system (Acorn®)
now entrenched in embedded markets
First well-known product
Apple®’s Newton™ PDA (1993)
based on an ARM6 core
Significant breakthrough
Apple®’s iPod® (2001)
based on an ARM7 core
Apple®’s iPhone® (2007), Nokia N93
(2006), N100
based on an ARM11 core
ARM Processor
By having relatively simpler hardware, the ARM
processor is targeted for applications that
demand:
low power consumption
i.e. battery powered devices, mobile
devices
Biggest market for the ARM processor:
mobile phones and smart phones
RISC Design philosophy
Instructions
Pipelining
Registers
Load-store architecture
ARM Design Philosophy
Low power consumption
High code density
Price sensitive
Ability to use low-cost memory devices
Able to reduce the area of die taken up by the
embedded processor
Able to incorporate hardware debug
technology
ARM Deviations from RISC characteristics
Variable cycle execution for certain
instructions
Inline barrel shifter leading to more complex
instructions
Thumb 16-bit instruction set
Conditional execution
Enhanced instructions
ARM Processor Main Features
Typical ARM processors:
Run at a relatively low clock frequency (a few hundred
MHz).
[But newer families, like the dual-core
Cortex™-A9 Osprey, are capable of achieving up to
a 2 GHz clock.]
32-bit instructions, with extensions to support
16-bit Thumb® & Thumb-2 instructions.
Single unified memory address space (i.e. all
peripherals and I/O are accessed like normal
memory, at certain specific memory locations).
Relatively low power consumption.
Data Sizes and Instruction Sets
The ARM is a 32-bit architecture.
When used in relation to the ARM:
Byte means 8 bits
Halfword means 16 bits (two bytes)
Word means 32 bits (four bytes)
Most ARM cores implement two instruction sets
32-bit ARM Instruction Set
16-bit Thumb Instruction Set
Jazelle cores can also execute Java bytecode
ARM Partners
The ARM processor is not sold as a processor chip but as
a hardware IP license.
Licensees add their own logic and customized peripherals
and then manufacture the silicon processor chip.
Typically sold as ASIC/SOC for embedded applications
Some of the present and past licensees (ARM calls them
Partners) include:
Texas Instruments, Philips, Analog Devices, Qualcomm
Intel (StrongARM® and XScale®)
Atmel – its processor is used on the ARM9 board
ARM Nomenclature
ARM Revision History
ARM Revision History (Contd..)
ARM Processor Family
ARM Core Data Flow Model
Operation unit /
Storage area
Buses
Flow of data
Registers
The active registers available in user mode (one
of the modes of operation of the ARM) are shown
here.
There are up to 18 active registers
16 data registers visible to the programmer as
(r0-r15)
General Purpose registers (r0 – r12)
Special Function registers:
r13 – sp (Stack Pointer) (can also be used as general
purpose)
r14 – lr (Link Register) (can also be used as general purpose)
r15 – pc (Program Counter)
2 processor status registers.
cpsr (Current Program Status Register)
spsr (Saved Program Status Register)
r0 – r13 are orthogonal – any instruction that you
can apply to r0 you can equally well apply to any
of the other registers.
Current Program Status Register
Generic psr format shown above
To monitor and control internal operations
Condition code flags
N = Negative result from ALU
Z = Zero result from ALU
C = ALU operation Carried out
V = ALU operation oVerflowed
Interrupt Disable bits
I = 1: Disables the IRQ.
F = 1: Disables the FIQ.
T Bit
Architecture xT only
T = 0: Processor in ARM state
T = 1: Processor in Thumb state
Mode bits
Specify the processor mode
Processor Modes
Determine which registers are active and the access rights to the cpsr
register itself.
Each processor mode is either
Privileged : allows full read-write access to cpsr
Nonprivileged : allows only read access to control field of cpsr but still
allows read-write access to condition flags.
The ARM has seven basic operating modes:
User : nonprivileged mode under which most tasks (applications)
run
FIQ : entered when a high priority (fast) interrupt is raised
IRQ : entered when a low priority (normal) interrupt is raised
Supervisor : entered on reset and generally the mode that OS
kernel operates in
Abort : used to handle memory access violations
Undefined : used to handle undefined instructions
System : privileged mode using the same registers as user mode
that allows full read-write access to cpsr
Complete ARM Register Set
Changing Mode on an Exception
ARM Processor Mode Select bits
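The flag, interrupt-disable, T and mode fields described above can be pulled out of a 32-bit cpsr value with shifts and masks. The sketch below is only an illustration: the function name decode_cpsr and the example value 0x600000D3 are made up for this purpose, and the mode bit patterns are the standard encodings for the seven modes listed earlier.

```python
# Mode-select bit patterns for cpsr[4:0] (the seven basic modes).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit cpsr value into the fields named on the slides."""
    return {
        "N": (cpsr >> 31) & 1,  # negative
        "Z": (cpsr >> 30) & 1,  # zero
        "C": (cpsr >> 29) & 1,  # carry
        "V": (cpsr >> 28) & 1,  # overflow
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }


# Hypothetical example value: Z and C set, IRQ and FIQ disabled, Supervisor mode.
print(decode_cpsr(0x600000D3))
```

In the lower-case/upper-case notation used later in the examples, this value would be written nZCvIFt_SVC.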
Program Counter (r15)
When the processor is executing in ARM state:
All instructions are 32 bits wide
All instructions must be word aligned
Therefore the value is stored in bits [31:2] with bits
[1:0] undefined (as instruction cannot be half word or
byte aligned)
When the processor is executing in Thumb state:
All instructions are 16 bits wide
All instructions must be half word aligned
Therefore the value is stored in bits [31:1] with bit [0]
undefined (as instruction cannot be byte aligned)
When the processor is executing in Jazelle state:
All instructions are 8 bits wide
Processor performs a word access to read 4 instructions
at once
Pipeline
What is pipelining:
A mechanism for overlapped execution of several
input sets by partitioning some computation into a
set of k sub-computations (or stages)
Very nominal increase in the cost of implementation
Very significant speed-up (ideally, k)
To attain k times speed-up for some computation
Alternative 1: Replicate the hardware k times
(cost also goes up k times)
Alternative 2: Split the computation into k stages
(very nominal cost increase)
Needs buffering between stages
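The "ideally, k" speed-up can be checked with a small cycle count. This is a sketch under simplifying assumptions that the slides do not state: every stage takes exactly one cycle and there are no stalls or flushes. The function names are illustrative.

```python
def unpipelined_cycles(n_inputs, k_stages):
    """Each input passes through all k stages before the next one starts."""
    return n_inputs * k_stages


def pipelined_cycles(n_inputs, k_stages):
    """k cycles to fill the pipeline, then one result per cycle."""
    return k_stages + n_inputs - 1


def speedup(n_inputs, k_stages):
    return unpipelined_cycles(n_inputs, k_stages) / pipelined_cycles(n_inputs, k_stages)


# The speed-up approaches k as the number of inputs grows.
for n in (1, 10, 1000):
    print(n, round(speedup(n, 3), 3))
```

For a single input there is no gain at all; the benefit comes only from keeping the stages busy with a long stream of instructions.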
Pipeline
ARM7 Three Stage Pipeline
Filling the Pipeline
Pipeline changes for ARM7TDMI and above
ARM7TDMI
FETCH: Instruction fetch
DECODE: Thumb®→ARM decompress, ARM decode, Reg select
EXECUTE: Reg read, Shift, ALU, Reg write
ARM9TDMI
FETCH: Instruction fetch
DECODE: ARM or Thumb inst decode, Reg decode, Reg read
EXECUTE: Shift + ALU
MEMORY: Memory access
WRITE: Reg write
Pipeline changes for ARM7TDMI and above
ARM10
FETCH: Branch prediction, Instruction fetch
ISSUE: ARM or Thumb instruction decode
DECODE: Reg read
EXECUTE: Shift + ALU, Multiply
MEMORY: Memory access, Multiply add
WRITE: Reg write
Pipeline (Contd..)
ARM9 five-stage pipeline (about 13% higher instruction throughput than ARM7)
ARM10 six-stage pipeline (about 34% higher instruction throughput than ARM7)
Increased pipeline length
Reduces the amount of work done at each stage
Higher Operating frequency
Increases pipeline latency
Increase the data dependency between the stages.
Pipeline Executing Characteristics
From pipeline filling:
ARM state: PC = address of executing instruction + 8 (2 instructions ahead)
Thumb state: PC = address of executing instruction + 4 (likewise 2 instructions ahead)
Execution of Branch instruction causes the
ARM core to flush its pipeline.
Branch prediction used by ARM10 reduces
the effect of pipeline flush.
If an interrupt occurs, other instructions in the
pipeline will be abandoned and ARM starts
filling the pipeline from the appropriate entry in
the vector table.
Exception Handling
When an exception occurs, the ARM:
Copies CPSR into SPSR_<mode>.
Sets appropriate CPSR bits:
Change to ARM state.
Change to exception mode.
Disable interrupts (if appropriate).
Stores the return address in LR_<mode>.
Sets PC to vector address.
To return, exception handler needs to:
Restore CPSR from SPSR_<mode>.
Restore PC from LR_<mode>.
This can only be done in ARM state.
The Vector Table
0x1C  Fast interrupt request (FIQ)
0x18  Interrupt request (IRQ)
0x14  Reserved
0x10  Data abort
0x0C  Prefetch abort
0x08  Software interrupt
0x04  Undefined instruction
0x00  Reset
Vector table can be at 0xFFFF0000 on ARM720T and
on ARM9/10 family devices.
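The exception entry and return sequence can be modelled on a dictionary that stands in for the processor state. This is a simplified sketch, not a description of any particular core: the state layout and function names are invented for illustration, reset is left out, and the return address is passed in by the caller because the per-exception LR offset rules are not modelled.

```python
# Low vector addresses and the mode each exception enters.
VECTORS = {"undefined": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
           "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C}
EXC_MODE = {"undefined": 0b11011, "swi": 0b10011, "prefetch_abort": 0b10111,
            "data_abort": 0b10111, "irq": 0b10010, "fiq": 0b10001}
I_BIT, F_BIT, T_BIT, MODE_MASK = 1 << 7, 1 << 6, 1 << 5, 0x1F


def enter_exception(state, kind, return_address):
    mode = EXC_MODE[kind]
    state["spsr"][mode] = state["cpsr"]                    # copy CPSR into SPSR_<mode>
    cpsr = (state["cpsr"] & ~(T_BIT | MODE_MASK)) | mode   # ARM state, exception mode
    cpsr |= I_BIT                                          # IRQs are always disabled
    if kind == "fiq":
        cpsr |= F_BIT                                      # FIQ entry also disables FIQs
    state["cpsr"] = cpsr & 0xFFFFFFFF
    state["lr"][mode] = return_address                     # return address in LR_<mode>
    state["pc"] = VECTORS[kind]                            # PC = vector address


def return_from_exception(state):
    mode = state["cpsr"] & MODE_MASK
    state["pc"] = state["lr"][mode]                        # restore PC from LR_<mode>
    state["cpsr"] = state["spsr"][mode]                    # restore CPSR from SPSR_<mode>


# Hypothetical starting point: User mode with Z and C set.
state = {"cpsr": 0x60000010, "pc": 0x8000, "spsr": {}, "lr": {}}
enter_exception(state, "irq", 0x8004)
print(hex(state["pc"]), hex(state["cpsr"]))
return_from_exception(state)
print(hex(state["pc"]), hex(state["cpsr"]))
```

Keeping one SPSR and LR slot per mode mirrors the banked registers: an IRQ handler cannot overwrite the Supervisor-mode return address.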
ARM Instruction Set
Data processing Instructions
Branch Instructions
Load store Instructions
Software Interrupt Instructions
Program status register Instructions
Data Processing Instructions
Consist of:
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN
These instructions only work on registers, NOT memory.
Syntax:
<Operation>{<cond>}{S} Rd, Rn, Operand2
Comparisons set flags only - they do not specify Rd
Data movement does not specify Rn
Second operand is sent to the ALU via barrel shifter.
Conditional Execution and Flags
ARM instructions can be made to execute conditionally by
postfixing them with the appropriate condition code field.
This improves code density and performance by reducing the
number of forward branch instructions.
With a branch:
CMP r3,#0
BEQ skip
ADD r0,r1,r2
skip
With conditional execution:
CMP r3,#0
ADDNE r0,r1,r2
By default, data processing instructions do not affect the
condition code flags but the flags can be optionally set by using
“S”. CMP does not need “S”.
loop
…
SUBS r1,r1,#1 ; decrement r1 and set flags
BNE loop ; if Z flag clear then branch
Condition Codes
The possible condition codes are listed below
Note AL is the default and does not need to be specified
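How a condition suffix is tested against the N, Z, C and V flags can be sketched as below. The mnemonics and flag tests are the standard ARM ones; the helper names cmp_flags and condition_passed are illustrative only, and cmp_flags models the flags that a CMP of two 32-bit values would produce.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by CMP a, b, i.e. by computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                       # C set means no borrow occurred
    v = ((a ^ b) & (a ^ result)) >> 31    # signed overflow of the subtraction
    return n, z, c, v


def condition_passed(cond, n, z, c, v):
    """Evaluate a two-letter condition code against the flags."""
    checks = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "CC": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return checks[cond]


# CMP r3,#0 followed by ADDNE: the ADD runs only when r3 is non-zero.
for r3 in (0, 5):
    print(r3, condition_passed("NE", *cmp_flags(r3, 0)))
```

The signed conditions (GE, LT, GT, LE) compare N with V, while the unsigned ones (HI, LS, CS, CC) look at C and Z, which is why the same CMP serves both kinds of comparison.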
Immediate Constant
No ARM instruction can contain a 32-bit immediate
constant, because every ARM instruction is itself 32 bits long
The data processing instruction format has 12 bits available
for operand2: an 8-bit immediate and a 4-bit rotate value
4-bit rotate value (0-15) is multiplied by 2 to give a rotate-right
range of 0-30 in steps of 2
Rule to remember is “8 bits rotated right by an even number of bit
positions”
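Whether a given 32-bit constant fits the "8 bits rotated by an even amount" rule can be tested by trying all 16 rotations. The sketch below is an illustration with made-up function names; it returns the 4-bit rotate field and the 8-bit immediate, or None when the constant cannot be encoded in a single data processing instruction.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(value):
    """Return (rotate_field, imm8) such that imm8 ROR (2 * rotate_field) == value."""
    value &= MASK32
    for rotate_field in range(16):
        # Rotating left by 2*rotate_field undoes the hardware's rotate right.
        imm8 = ror32(value, (32 - 2 * rotate_field) % 32)
        if imm8 <= 0xFF:
            return rotate_field, imm8
    return None


for constant in (0xFF, 0xFF000000, 0x3FC, 0x101):
    print(hex(constant), encode_immediate(constant))
```

A constant such as 0x101 has set bits more than 8 positions apart, so no rotation brings it into a single byte and it must be loaded another way.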
Using a Barrel Shifter: The 2nd Operand
[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter before reaching the ALU; the ALU produces the result]
Register, optionally with shift operation
Shift value can be either:
5-bit unsigned integer
Specified in bottom byte of another register.
Used for multiplication by constant
Immediate value
8-bit number with a range 0-255
Rotated through even number of positions
Allows increased range of 32-bit constants to be loaded
directly into registers
The Barrel Shifter
Barrel Shifter Operations
Examples for Data Processing
1. MVN r6, #0 ; move NOT of the 32-bit value to the register (r6 = 0xFFFFFFFF)
2. MOVS r7,r7 ; set the flags
RSBMI r7,r7,#0 ; if neg, r7=0-r7
3. ADD r9,r8,r8,LSL #2 ; r9=r8*5
RSB r10,r9,r9,LSL #3 ; r10=r9*7
Examples for Data Processing
PRE
cpsr = nzcvqiFt_USER
r0 = 0x0000 0000
r1 = 0x8000 0004
MOVS r0, r1, LSL #1
POST
cpsr = nzCvqiFt_USER
r0 = 0x0000 0008
r1 = 0x8000 0004
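The PRE/POST example and the shift-and-add multiplications from the earlier examples can be reproduced with plain integer arithmetic. This is a sketch with illustrative function names; movs_lsl only models immediate shift amounts from 1 to 31.

```python
MASK32 = 0xFFFFFFFF


def movs_lsl(rm, shift):
    """Model MOVS Rd, Rm, LSL #shift (1 <= shift <= 31): return (result, N, Z, C)."""
    rm &= MASK32
    carry = (rm >> (32 - shift)) & 1      # last bit shifted out of the top
    result = (rm << shift) & MASK32
    return result, result >> 31, int(result == 0), carry


def times5(r8):
    """ADD r9, r8, r8, LSL #2  ->  r8 + 4*r8"""
    return (r8 + (r8 << 2)) & MASK32


def times7(r9):
    """RSB r10, r9, r9, LSL #3  ->  8*r9 - r9"""
    return ((r9 << 3) - r9) & MASK32


result, n, z, c = movs_lsl(0x80000004, 1)
print(hex(result), n, z, c)
print(times5(3), times7(times5(3)))
```

The top bit of r1 falls out of the register and lands in the carry flag, which is why the cpsr changes from c to C in the POST state.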
Multiply and Divide
There are 2 classes of multiply - producing 32-bit and
64-bit results
32-bit versions on an ARM7TDMI will execute in 2 - 5
cycles
MUL r0, r1, r2 ; r0 = r1 * r2
MLA r0, r1, r2, r3 ; r0 = (r1 * r2) + r3
64-bit multiply instructions offer both signed and
unsigned versions
For these instruction there are 2 destination registers
[U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
[U|S]MLAL r4, r5, r2, r3 ; r5:r4 = (r2 * r3) + r5:r4
Most ARM cores do not offer integer divide instructions
Division operations will be performed by C library
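On a core without a divide instruction, division has to be built from shifts, compares and subtracts. The sketch below shows the textbook shift-and-subtract method for unsigned 32-bit values; it is only an illustration of the idea, and real C library routines are typically more heavily optimised. The name udiv32 is made up.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit division by shift-and-subtract: return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(100, 7))
```

Each loop iteration maps onto a handful of data processing instructions, which is why a software divide costs tens of cycles compared with 2 - 5 for a multiply.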
Load / Store Instructions
Single register data transfer
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load
Memory system must support all access sizes
Syntax:
LDR{<cond>}{<size>} Rd, <address>
STR{<cond>}{<size>} Rd, <address>
Single register load and store addressing
Pre or Post Indexed addressing
Address accessed by LDR/STR is specified by a base register with
an offset
For word and unsigned byte accesses, offset can be:
An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes)
A register, optionally shifted by an immediate value
This can be either added or subtracted from the base register
For halfword and signed halfword / byte accesses, offset can be:
An unsigned 8-bit immediate value (i.e. 0 - 255 bytes)
A register (unshifted)
Choice of pre-indexed or post-indexed addressing
Choice of whether to update the base pointer (pre-indexed only)
Pre-index:
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL#2]
LDR r0, [r1, #-8]
LDR r0, [r1, -r2, LSL#2]
Pre-index, update base pointer:
LDR r0, [r1, #-8]! ;r0=[r1-8], r1=r1-8
Post-index:
LDR r0, [r1], #8 ;r0=[r1], r1=r1+8
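The difference between the three addressing forms is easiest to see by tracking which address is read and what happens to the base register. The sketch below models a word load on a dictionary-based register file and memory; the function name ldr and the mode labels are illustrative, not ARM syntax.

```python
MASK32 = 0xFFFFFFFF


def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR rd with base rn.

    mode "offset": LDR rd, [rn, #offset]    base unchanged
    mode "pre":    LDR rd, [rn, #offset]!   base updated before the access
    mode "post":   LDR rd, [rn], #offset    base updated after the access
    """
    base = regs[rn]
    updated = (base + offset) & MASK32
    address = base if mode == "post" else updated
    regs[rd] = mem[address]
    if mode in ("pre", "post"):
        regs[rn] = updated


# Hypothetical memory contents for the demonstration.
mem = {0x0FF8: 33, 0x1000: 11, 0x1008: 22}
regs = {"r0": 0, "r1": 0x1000}
ldr(regs, mem, "r0", "r1", 8, "post")
print(regs["r0"], hex(regs["r1"]))
```

Post-indexing suits walking through an array (use the pointer, then advance it), while pre-indexing with writeback suits stack-style access.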
ARM addressing Modes
Load and Store Multiples
Syntax:
<LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
4 addressing modes:
LDMIA / STMIA increment after
LDMIB / STMIB increment before
LDMDA / STMDA decrement after
LDMDB / STMDB decrement before
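LDMxx r10!, {r0,r1,r4}
STMxx r10!, {r0,r1,r4}
Base Register (Rb) = r10; the lowest-numbered register always uses the lowest address
Mode       IA      IB       DA          DB
St. add.   r10     r10+4    r10-4*3+4   r10-4*3
r0 at      r10     r10+4    r10-8       r10-12
r1 at      r10+4   r10+8    r10-4       r10-8
r4 at      r10+8   r10+12   r10         r10-4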
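The start address and per-register addresses for the four addressing modes can be computed from the base register and the number of registers in the list. This is a sketch with an illustrative function name; registers are given as plain numbers (0 for r0, 4 for r4) and the base value 0x1000 is an arbitrary example.

```python
def block_transfer_addresses(base, reg_list, mode):
    """Return ({register: address}, updated_base) for LDM/STM with mode IA, IB, DA or DB."""
    n = len(reg_list)
    start = {
        "IA": base,               # increment after
        "IB": base + 4,           # increment before
        "DA": base - 4 * n + 4,   # decrement after
        "DB": base - 4 * n,       # decrement before
    }[mode]
    # The lowest-numbered register always goes to the lowest address.
    addresses = {reg: start + 4 * i for i, reg in enumerate(sorted(reg_list))}
    updated_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    return addresses, updated_base


for mode in ("IA", "IB", "DA", "DB"):
    addresses, new_base = block_transfer_addresses(0x1000, [0, 1, 4], mode)
    print(mode, {f"r{r}": hex(a) for r, a in addresses.items()}, hex(new_base))
```

The updated base is only written back when the ! suffix is present; the function returns it in all cases so the caller can decide.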
Load and Store multiple pairs when base update is used
Store Multiple   Load Multiple
STMIA            LDMDB
STMIB            LDMDA
STMDA            LDMIB
STMDB            LDMIA
Example
Stack Operations
Push operation – Placing data onto stack – Store
multiple instruction
Pop operation – Removing data from stack – Load
multiple instruction
Addressing methods
Ascending (A) or Descending (D)
Stack pointer points to Full (F) or Empty (E)
location
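As one concrete case of these choices, a Full Descending stack (the FD in STMFD / LDMFD) grows towards lower addresses and its stack pointer always points at the last item pushed. The sketch below models it with a dictionary as memory; the class name and the starting stack pointer value are illustrative.

```python
class FullDescendingStack:
    """Push behaves like STMFD sp!, {...} (same as STMDB); pop like LDMFD sp!, {...} (same as LDMIA)."""

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}

    def push(self, values):
        """Store values given in ascending register order; the first ends up at the lowest address."""
        self.sp -= 4 * len(values)            # decrement before storing
        for i, value in enumerate(values):
            self.mem[self.sp + 4 * i] = value

    def pop(self, count):
        """Load count words starting at sp, then increment sp past them."""
        values = [self.mem[self.sp + 4 * i] for i in range(count)]
        self.sp += 4 * count                  # increment after loading
        return values


stack = FullDescendingStack(0x8000)
stack.push([1, 2, 3])
print(hex(stack.sp), stack.pop(3), hex(stack.sp))
```

Because push uses decrement-before and pop uses increment-after, the pair leaves the stack pointer exactly where it started.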
Example
Swap Instructions
Swap (SWP)
Swap byte (SWPB)
Ex:
SWP R12, R10, [R9] ; Load R12 from address R9 and
; store R10 to address R9
SWPB R3, R5, [R6] ; Load a byte to R3 from address R6 and
; store byte from R5 to address R6
Example
Branch instructions
Branch : B {<cond>} label
Branch with Link : BL{<cond>} subroutine_label
The address label is stored in the instruction as a signed PC-relative offset and must
be within ± 32 Mbyte range
Bits    Field
31-28   Cond (condition field)
27-25   1 0 1
24      L (link bit): 0 = Branch, 1 = Branch with link
23-0    Offset
The processor core shifts the offset field left by 2 positions,
sign-extends it and adds it to the PC
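Combining the branch format with the PC being 8 bytes ahead in ARM state gives the arithmetic an assembler performs when encoding B and BL. The sketch below is illustrative: the function names are made up, the default condition 0b1110 is AL (always), and the addresses are arbitrary examples.

```python
def encode_branch(instr_address, target, link=False, cond=0b1110):
    """Build the 32-bit encoding of B/BL at instr_address jumping to target."""
    offset = target - (instr_address + 8)      # PC reads 8 bytes ahead in ARM state
    if offset % 4:
        raise ValueError("target must be word aligned")
    word_offset = offset >> 2                  # the core shifts left by 2 again
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target outside the +/- 32 Mbyte range")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (word_offset & 0xFFFFFF)


def decode_branch_target(instr_address, instruction):
    """Recover the target address from an encoded B/BL instruction."""
    offset24 = instruction & 0xFFFFFF
    if offset24 & 0x800000:                    # sign-extend the 24-bit field
        offset24 -= 1 << 24
    return (instr_address + 8 + (offset24 << 2)) & 0xFFFFFFFF


print(hex(encode_branch(0x8000, 0x8000)))            # a branch to itself
print(hex(encode_branch(0x8000, 0x8010, link=True)))
```

The 24-bit signed word offset is what gives the ± 32 Mbyte reach: 2^23 words of 4 bytes in either direction.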
ARM Branches and Subroutines
B <label>
PC relative. ±32 Mbyte range.
BL <subroutine>
Stores return address in LR
Returning implemented by restoring the PC from LR
For non-leaf functions, LR will have to be stacked
Caller:
:
BL func1
:
func1:
STMFD sp!, {regs,lr}
:
BL func2
:
LDMFD sp!, {regs,pc}
func2:
:
:
MOV pc, lr
Branch instructions (contd..)
Branch exchange : BX{<cond>} Rm
Copies the contents of general purpose register Rm
into PC (PC = Rm & 0xfffffffe)
T bit of cpsr is updated from LSB of Rm
Branch exchange with Link :
BLX{<cond>} subroutine_label | Rm
Copies the contents of general purpose register Rm or
label into PC (PC = Rm & 0xfffffffe)
Additionally sets the link register with the return
address
Available in T variants of ARM architecture versions 4 and
above
It is primarily used to branch to and from the Thumb code
Software Interrupt (SWI)
SWI causes a software interrupt exception,
which provides a mechanism for applications
to call operating system routines.
Each SWI instruction has an associated SWI
number, which is used to represent a
particular function call or feature.
PSR access
Bits    Field
31-28   N Z C V
27      Q
26-25   IT (de)
24      J
19-16   GE[3:0]
15-10   IT (cond_abc)
9       E
8       A
7       I
6       F
5       T
4-0     mode
Byte fields: flag [31:24], status [23:16], extension [15:8], control [7:0]
MRS and MSR allow contents of CPSR / SPSR to be transferred to / from
a general purpose register or take an immediate value
MSR allows the whole status register, or just parts of it to be updated
Interrupts can be enabled/disabled and modes changed by writing to the
CPSR
Typically a read/modify/write strategy should be used:
MRS r0,CPSR ; read CPSR into r0
BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
MSR CPSR_c,r0 ; write modified value to ‘c’ byte only
In User Mode, all bits can be read but only the condition flags (_f) can be
modified
Loading Constants
Two pseudo instructions to move a 32 bit
value into a register
LDR Rd, =Constant
It writes a 32 bit constant into a register
ADR Rd, label
It writes a relative address into a register
ARM Instruction set format
Questions?