Unit 4 - ARM Processors

Uploaded by

pyarlachiru
ARM PROCESSORS

Processor Architecture
ARM stands for “Advanced RISC Machine” (originally “Acorn RISC Machine”).
Based on the Reduced Instruction Set Computer (RISC) architecture
• Trades simpler hardware circuitry for greater software complexity (and code size)
A whole family of ARM processors exists.
• They share similar design principles and a common instruction set
• The latest ARM processors, however, use more than 100 instructions
A Bit of ARM History
Originally conceived as a processor for a desktop system (Acorn®)
• Now entrenched in embedded markets
First well-known product
• Apple®’s Newton™ PDA (1993), based on an ARM6 core
Significant breakthroughs
• Apple®’s iPod® (2001), based on an ARM7 core
• Apple®’s iPhone (2007), Nokia N93 (2006), N100, based on an ARM11 core
ARM Processor
By having relatively simple hardware, the ARM processor is targeted at applications that demand:
• Low power consumption
  • e.g. battery-powered devices, mobile devices
Biggest market for the ARM processor:
• Mobile phones and smartphones
RISC Design philosophy
Instructions
Pipelining
Registers
Load-store architecture
ARM Design Philosophy
Low power consumption
High code density
Price sensitive
Ability to use low-cost memory devices
Able to reduce the area of die taken up by the
embedded processor
Able to incorporate hardware debug
technology
ARM Deviations from RISC characteristics
Variable cycle execution for certain
instructions
Inline barrel shifter leading to more complex
instructions
Thumb 16-bit instruction set
Conditional execution
Enhanced instructions
ARM Processor Main Features
Typical ARM processors:
• Run at a relatively low clock frequency (a few hundred MHz).
  [Newer families, such as the dual-core Cortex™-A9 "Osprey", are capable of reaching clock rates of up to 2 GHz.]
• Use 32-bit instructions, with extensions to support 16-bit Thumb® and Thumb-2 instructions.
• Have a single unified memory address space (i.e. all peripherals and I/O are accessed like normal memory, at specific memory locations).
• Have relatively low power consumption.
Data Sizes and Instruction Sets
The ARM is a 32-bit architecture.
When used in relation to the ARM:
• Byte means 8 bits
• Halfword means 16 bits (two bytes)
• Word means 32 bits (four bytes)
Most ARM cores implement two instruction sets:
• 32-bit ARM instruction set
• 16-bit Thumb instruction set
Jazelle cores can also execute Java bytecode
ARM Partners
The ARM processor is not sold as a processor chip but as a hardware IP license.
Licensees add their own logic and customized peripherals and then manufacture the silicon processor chip.
• Typically sold as an ASIC/SoC for embedded applications
Some of the present and past licensees (ARM calls them Partners) include:
• Texas Instruments, Philips, Analog Devices, Qualcomm
• Intel (StrongARM® and XScale®)
• Atmel (its processor is used on the ARM9 board)
ARM Nomenclature
ARM Revision History
ARM Revision History (Contd..)
ARM Processor Family
ARM Core Data Flow Model
[Figure: ARM core data flow model. Legend: operation unit / storage area; buses; flow of data]
Registers
The active registers available in user mode (one of the ARM's modes of operation) are shown here.
There are up to 18 active registers:
• 16 data registers, visible to the programmer as r0-r15
  • General-purpose registers (r0 - r12)
  • Special-function registers:
    • r13 - sp (Stack Pointer) (can also be used as general purpose)
    • r14 - lr (Link Register) (can also be used as general purpose)
    • r15 - pc (Program Counter)
• 2 processor status registers
  • cpsr (Current Program Status Register)
  • spsr (Saved Program Status Register)
r0 - r13 are orthogonal: any instruction that you can apply to r0 you can equally well apply to any of the other registers.
Current Program Status Register

• Generic psr format shown above
• Used to monitor and control internal operations
• Condition code flags
  • N = Negative result from ALU
  • Z = Zero result from ALU
  • C = ALU operation Carried out
  • V = ALU operation oVerflowed
• Interrupt disable bits
  • I = 1: disables the IRQ
  • F = 1: disables the FIQ
• T bit
  • Architecture xT only
  • T = 0: processor in ARM state
  • T = 1: processor in Thumb state
• Mode bits
  • Specify the processor mode
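The psr fields listed above can be sketched as a small decoder. This is a minimal Python model, not ARM code: the bit positions (N, Z, C, V in bits 31-28; I, F, T in bits 7-5; mode in bits 4-0) and the mode encodings are the standard ARM ones and are assumed here because the psr figure is not reproduced in the text. The name decode_cpsr and the sample value are hypothetical.

```python
# Assumed standard psr layout: flags in the top bits, control bits at the bottom.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 7, "F": 6, "T": 5}

# Mode field (bits 4-0) encodings for the seven basic operating modes.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit psr value into named flag bits, state and mode."""
    fields = {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["state"] = "Thumb" if fields["T"] else "ARM"
    fields["mode"] = MODES.get(value & 0x1F, "invalid")
    return fields


# Hypothetical value: Z and C set, IRQ disabled, ARM state, Supervisor mode.
info = decode_cpsr(0x60000093)
print(info)
```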
Processor Modes
• Determines which registers are active and the access rights to the cpsr register itself.
• Each processor mode is either:
  • Privileged: allows full read-write access to the cpsr
  • Nonprivileged: allows only read access to the control field of the cpsr, but still allows read-write access to the condition flags
• The ARM has seven basic operating modes:
  • User: nonprivileged mode under which most tasks (applications) run
  • FIQ: entered when a high-priority (fast) interrupt is raised
  • IRQ: entered when a low-priority (normal) interrupt is raised
  • Supervisor: entered on reset and generally the mode that the OS kernel operates in
  • Abort: used to handle memory access violations
  • Undefined: used to handle undefined instructions
  • System: privileged mode using the same registers as user mode, which allows full read-write access to the cpsr
Complete ARM Register Set
Changing Mode on an Exception
ARM Processor Mode Select bits
Program Counter (r15)
• When the processor is executing in ARM state:
  • All instructions are 32 bits wide
  • All instructions must be word aligned
  • Therefore the pc value is stored in bits [31:2], with bits [1:0] undefined (as an instruction cannot be halfword or byte aligned)
• When the processor is executing in Thumb state:
  • All instructions are 16 bits wide
  • All instructions must be halfword aligned
  • Therefore the pc value is stored in bits [31:1], with bit [0] undefined (as an instruction cannot be byte aligned)
• When the processor is executing in Jazelle state:
  • All instructions are 8 bits wide
  • The processor performs a word access to read 4 instructions at once
Pipeline
What is pipelining?
A mechanism for the overlapped execution of several input sets, achieved by partitioning a computation into a set of k sub-computations (or stages)
• Very nominal increase in the cost of implementation
• Very significant speed-up (ideally, k)
To attain a k-times speed-up for some computation:
• Alternative 1: replicate the hardware k times (cost also goes up k times)
• Alternative 2: split the computation into k stages (very nominal cost increase)
  • Needs buffering between stages
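Why the speed-up is only "ideally" k can be seen with a little arithmetic. The sketch below assumes an idealized pipeline (equal stage delays, one cycle per stage, no stalls or flushes, buffering overhead ignored); the function names are illustrative only.

```python
def pipelined_cycles(n_inputs, k_stages):
    """Cycles to push n inputs through a k-stage pipeline.

    The first input needs k cycles to fill the pipeline; after that,
    one result completes every cycle.
    """
    return k_stages + n_inputs - 1


def speedup(n_inputs, k_stages):
    """Speed-up over an unpipelined unit that needs k cycles per input."""
    return (n_inputs * k_stages) / pipelined_cycles(n_inputs, k_stages)


for n in (1, 10, 1000):
    print(n, round(speedup(n, 3), 3))
```

For a three-stage pipeline the speed-up is 1 for a single input and approaches 3 only as the number of inputs grows, which is why pipeline flushes are costly.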
Pipeline
ARM7 Three Stage Pipeline

Filling the Pipeline


Pipeline changes for ARM7TDMI and above

ARM7TDMI
    FETCH:    Instruction fetch
    DECODE:   Thumb®→ARM decompress, ARM decode, Reg select
    EXECUTE:  Reg read, Shift, ALU, Reg write

ARM9TDMI
    FETCH:    Instruction fetch
    DECODE:   ARM or Thumb inst decode, Reg decode, Reg read
    EXECUTE:  Shift + ALU
    MEMORY:   Memory access
    WRITE:    Reg write

ARM10
    FETCH:    Instruction fetch, Branch prediction
    ISSUE:    ARM or Thumb instruction decode
    DECODE:   Reg read
    EXECUTE:  Shift + ALU, Multiply
    MEMORY:   Memory access, Multiply add
    WRITE:    Reg write
Pipeline (Contd..)
ARM9: five-stage pipeline (around 13% higher instruction throughput than the ARM7)
ARM10: six-stage pipeline (around 34% higher instruction throughput than the ARM7)
Increased pipeline length:
• Reduces the amount of work done at each stage
• Allows a higher operating frequency
• Increases pipeline latency
• Increases the data dependency between the stages
Pipeline Executing Characteristics
From pipeline filling:
• ARM state: the pc reads as the address of the executing instruction + 8 (2 instructions ahead)
• Thumb state: the pc reads as the address of the executing instruction + 4 (also 2 instructions ahead)
Execution of a branch instruction causes the ARM core to flush its pipeline.
Branch prediction, used by the ARM10, reduces the effect of a pipeline flush.
If an interrupt occurs, the other instructions in the pipeline are abandoned and the ARM starts filling the pipeline from the appropriate entry in the vector table.
Exception Handling
When an exception occurs, the ARM:
• Copies CPSR into SPSR_<mode>
• Sets appropriate CPSR bits
  • Change to ARM state
  • Change to exception mode
  • Disable interrupts (if appropriate)
• Stores the return address in LR_<mode>
• Sets PC to vector address
To return, the exception handler needs to:
• Restore CPSR from SPSR_<mode>
• Restore PC from LR_<mode>
This can only be done in ARM state.

Vector table (it can also be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices):
    Address   Exception
    0x1C      Fast interrupt request (FIQ)
    0x18      Interrupt request (IRQ)
    0x14      Reserved
    0x10      Data abort
    0x0C      Prefetch abort
    0x08      Software interrupt
    0x04      Undefined instruction
    0x00      Reset
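The entry and return steps can be traced with a small dictionary-based model. This is a Python sketch under stated assumptions, not ARM code: the names (EXCEPTIONS, enter_exception, return_from_exception) are hypothetical, the mode entered by each exception and the rule that only FIQ and reset also disable FIQ are the standard ARM behaviour rather than something stated on the slide, and the return address is passed in already adjusted for the pipeline.

```python
# Exception name -> (vector offset, mode entered, FIQ also disabled?)
EXCEPTIONS = {
    "reset":          (0x00, "svc", True),
    "undefined":      (0x04, "und", False),
    "swi":            (0x08, "svc", False),
    "prefetch_abort": (0x0C, "abt", False),
    "data_abort":     (0x10, "abt", False),
    "irq":            (0x18, "irq", False),
    "fiq":            (0x1C, "fiq", True),
}


def enter_exception(cpu, name, return_address, vector_base=0x00000000):
    """Apply the exception-entry steps to a dictionary-based CPU model."""
    offset, mode, disable_fiq = EXCEPTIONS[name]
    cpu["spsr_" + mode] = dict(cpu["cpsr"])   # copy CPSR into SPSR_<mode>
    cpu["cpsr"]["T"] = 0                      # change to ARM state
    cpu["cpsr"]["mode"] = mode                # change to exception mode
    cpu["cpsr"]["I"] = 1                      # disable IRQ
    if disable_fiq:
        cpu["cpsr"]["F"] = 1                  # FIQ and reset also disable FIQ
    cpu["lr_" + mode] = return_address        # store return address in LR_<mode>
    cpu["pc"] = vector_base + offset          # set PC to the vector address


def return_from_exception(cpu):
    """Undo the entry steps: restore PC from LR_<mode>, CPSR from SPSR_<mode>."""
    mode = cpu["cpsr"]["mode"]
    cpu["pc"] = cpu["lr_" + mode]
    cpu["cpsr"] = dict(cpu["spsr_" + mode])


# Hypothetical starting point: user-mode Thumb code interrupted by an IRQ.
cpu = {"cpsr": {"mode": "usr", "T": 1, "I": 0, "F": 0}, "pc": 0x8000}
enter_exception(cpu, "irq", return_address=0x8004)
print(hex(cpu["pc"]), cpu["cpsr"])
return_from_exception(cpu)
print(hex(cpu["pc"]), cpu["cpsr"])
```

Passing vector_base=0xFFFF0000 models the high-vector configuration mentioned for the ARM720T and ARM9/10 devices.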
The Vector Table
ARM Instruction Set
Data processing Instructions
Branch Instructions
Load store Instructions
Software Interrupt Instructions
Program status register Instructions
Data Processing Instructions
• Consist of:
  • Arithmetic: ADD ADC SUB SBC RSB RSC
  • Logical: AND ORR EOR BIC
  • Comparisons: CMP CMN TST TEQ
  • Data movement: MOV MVN
• These instructions only work on registers, NOT memory.
• Syntax:
    <Operation>{<cond>}{S} Rd, Rn, Operand2
  • Comparisons set flags only; they do not specify Rd
  • Data movement does not specify Rn
• The second operand is sent to the ALU via the barrel shifter.
Conditional Execution and Flags
ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
• This improves code density and performance by reducing the number of forward branch instructions.

    With a branch:              With conditional execution:
        CMP   r3,#0                 CMP   r3,#0
        BEQ   skip                  ADDNE r0,r1,r2
        ADD   r0,r1,r2
    skip

By default, data processing instructions do not affect the condition code flags, but the flags can optionally be set by using “S”. CMP does not need “S”.

    loop
        …
        SUBS  r1,r1,#1      ; decrement r1 and set flags
        BNE   loop          ; if Z flag clear then branch
Condition Codes
The possible condition codes are listed below
Note AL is the default and does not need to be specified
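Since the condition code table itself is not reproduced here, the following Python sketch shows how each two-letter suffix is evaluated against the N, Z, C and V flags. The flag tests are the standard ARM definitions; the names CONDITIONS and condition_passed are hypothetical.

```python
# Each condition suffix maps to a test on the flags (n, z, c, v are 0 or 1).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,              # equal
    "NE": lambda n, z, c, v: z == 0,              # not equal
    "CS": lambda n, z, c, v: c == 1,              # carry set (also written HS)
    "CC": lambda n, z, c, v: c == 0,              # carry clear (also written LO)
    "MI": lambda n, z, c, v: n == 1,              # minus / negative
    "PL": lambda n, z, c, v: n == 0,              # plus / positive or zero
    "VS": lambda n, z, c, v: v == 1,              # overflow set
    "VC": lambda n, z, c, v: v == 0,              # overflow clear
    "HI": lambda n, z, c, v: c == 1 and z == 0,   # unsigned higher
    "LS": lambda n, z, c, v: c == 0 or z == 1,    # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,              # signed greater or equal
    "LT": lambda n, z, c, v: n != v,              # signed less than
    "GT": lambda n, z, c, v: z == 0 and n == v,   # signed greater than
    "LE": lambda n, z, c, v: z == 1 or n != v,    # signed less or equal
    "AL": lambda n, z, c, v: True,                # always (the default)
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this suffix would execute."""
    return CONDITIONS[cond or "AL"](n, z, c, v)


# Flags after CMP r3,#0 with r3 == 0: Z set, C set (no borrow).
print(condition_passed("NE", 0, 1, 1, 0))   # ADDNE is skipped
print(condition_passed("EQ", 0, 1, 1, 0))   # BEQ is taken
```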
Immediate Constant
• No ARM instruction can contain a 32-bit immediate constant
  • All ARM instructions are a fixed 32 bits long
• The data processing instruction format has 12 bits available for operand2: an 8-bit immediate and a 4-bit rotate value
• The 4-bit rotate value (0-15) is multiplied by 2 to give a rotation in the range 0-30, in steps of 2
• Rule to remember: “an 8-bit value rotated right by an even number of bit positions”
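The rule can be checked mechanically. The Python sketch below tests whether a 32-bit constant fits the immediate form of operand2, i.e. whether it equals some 8-bit value rotated right by an even amount. The function names are illustrative, and returning the first matching rotate field is a choice of this sketch, not necessarily what an assembler does.

```python
def rotate_right(value, amount):
    """Rotate a 32-bit value right by the given number of bit positions."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(constant):
    """Return (rotate_field, imm8) if the constant fits operand2, else None.

    The encoded constant is imm8 rotated right by 2 * rotate_field, so
    rotating the constant the other way must leave a value of at most 0xFF.
    """
    constant &= 0xFFFFFFFF
    for rotate_field in range(16):
        imm8 = rotate_right(constant, 32 - 2 * rotate_field)
        if imm8 <= 0xFF:
            return rotate_field, imm8
    return None


for value in (0xFF, 0x104, 0xFF000000, 0x101, 0x102):
    print(hex(value), encode_immediate(value))
```

0x102 is a useful counter-example: its set bits span only 8 positions, but lining them up would need an odd rotation, so it cannot be encoded.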
Using a Barrel Shifter: The 2nd Operand
[Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter on its way to the ALU; the ALU produces the result]
Register, optionally with shift operation
• Shift value can be either:
  • A 5-bit unsigned integer
  • Specified in the bottom byte of another register
• Used for multiplication by a constant
Immediate value
• 8-bit number, with a range of 0-255
  • Rotated right through an even number of positions
• Allows an increased range of 32-bit constants to be loaded directly into registers
The Barrel Shifter
Barrel Shifter Operations
Examples for Data Processing
1.  MVN   r6, #0              ; move the bitwise NOT of the 32-bit value
                              ; into the register
2.  MOVS  r7, r7              ; set the flags
    RSBMI r7, r7, #0          ; if neg, r7 = 0 - r7
3.  ADD   r9, r8, r8, LSL #2  ; r9 = r8*5
    RSB   r10, r9, r9, LSL #3 ; r10 = r9*7
Examples for Data Processing
PRE
    cpsr = nzcvqiFt_USER
    r0 = 0x00000000
    r1 = 0x80000004

    MOVS r0, r1, LSL #1

POST
    cpsr = nzCvqiFt_USER
    r0 = 0x00000008
    r1 = 0x80000004
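The PRE/POST example can be reproduced with a small model of the barrel shifter's LSL operation. This is a Python sketch with hypothetical function names; it covers only immediate shifts of 0 to 31 and models only the N, Z and C flags that MOVS updates here.

```python
def lsl_with_carry(value, shift, carry_in=0):
    """Logical shift left of a 32-bit value by 0..31.

    Returns (result, carry_out); the carry out is the last bit shifted
    off the top, or the incoming carry when the shift amount is 0.
    """
    if shift == 0:
        return value & 0xFFFFFFFF, carry_in
    carry_out = (value >> (32 - shift)) & 1
    return (value << shift) & 0xFFFFFFFF, carry_out


def movs_lsl(rm, shift, carry_in=0):
    """Model MOVS Rd, Rm, LSL #shift; returns (Rd, flags)."""
    result, carry = lsl_with_carry(rm, shift, carry_in)
    flags = {"N": result >> 31, "Z": int(result == 0), "C": carry}
    return result, flags


r0, flags = movs_lsl(0x80000004, 1)
print(hex(r0), flags)
```

Bit 31 of r1 is shifted out into the carry, which is why the C flag becomes set while the result is only 0x00000008.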
Multiply and Divide
• There are 2 classes of multiply, producing 32-bit and 64-bit results
• 32-bit versions on an ARM7TDMI will execute in 2 - 5 cycles
    MUL r0, r1, r2        ; r0 = r1 * r2
    MLA r0, r1, r2, r3    ; r0 = (r1 * r2) + r3
• 64-bit multiply instructions offer both signed and unsigned versions
  • For these instructions there are 2 destination registers
    [U|S]MULL r4, r5, r2, r3    ; r5:r4 = r2 * r3
    [U|S]MLAL r4, r5, r2, r3    ; r5:r4 = (r2 * r3) + r5:r4
• Most ARM cores do not offer integer divide instructions
  • Division operations will be performed by C library routines
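How the 64-bit result is split across the two destination registers, and how the signed and unsigned versions differ, can be sketched as follows. The Python functions below are illustrative models named after the instructions; each returns the pair (RdLo, RdHi), matching r4 and r5 in the examples above.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def umull(rm, rs):
    """Unsigned 32 x 32 -> 64 multiply; returns (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """Signed 32 x 32 -> 64 multiply; returns (RdLo, RdHi)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(rd_lo, rd_hi, rm, rs):
    """Unsigned multiply-accumulate into the 64-bit value RdHi:RdLo."""
    accumulator = ((rd_hi & MASK32) << 32) | (rd_lo & MASK32)
    total = (accumulator + (rm & MASK32) * (rs & MASK32)) & MASK64
    return total & MASK32, total >> 32


print([hex(x) for x in umull(0xFFFFFFFF, 2)])
print([hex(x) for x in smull(0xFFFFFFFF, 2)])
```

The same bit pattern 0xFFFFFFFF is 4294967295 for UMULL but -1 for SMULL, so only the high word of the result differs.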
Load / Store Instructions
Single register data transfer
    LDR     STR     Word
    LDRB    STRB    Byte
    LDRH    STRH    Halfword
    LDRSB           Signed byte load
    LDRSH           Signed halfword load
Memory system must support all access sizes
Syntax:
• LDR{<cond>}{<size>} Rd, <address>
• STR{<cond>}{<size>} Rd, <address>
Single register load and store addressing
• Address accessed by LDR/STR is specified by a base register with an offset
• For word and unsigned byte accesses, the offset can be:
  • An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes)
  • A register, optionally shifted by an immediate value
• This can be either added to or subtracted from the base register
• For halfword and signed halfword / byte accesses, the offset can be:
  • An unsigned 8-bit immediate value (i.e. 0 - 255 bytes)
  • A register (unshifted)
• Choice of pre-indexed or post-indexed addressing
• Choice of whether to update the base pointer (pre-indexed only)
Pre or Post Indexed addressing
Pre-index (register offset, optionally shifted):
    LDR r0, [r1, r2]
    LDR r0, [r1, r2, LSL#2]
Pre-index (offset subtracted from the base register):
    LDR r0, [r1, #-8]
    LDR r0, [r1, -r2, LSL#2]
Pre-index, update base pointer:
    LDR r0, [r1, #-8]!    ; r0=[r1-8], r1=r1-8
Post-index:
    LDR r0, [r1], #8      ; r0=[r1], r1=r1+8
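The difference between the indexing choices comes down to which address is accessed and whether the base register changes. A minimal Python sketch, assuming a word-addressed dictionary as memory and hypothetical mode names ("pre", "pre_wb" for pre-index with "!", and "post"):

```python
def ldr(regs, memory, rd, rn, offset, mode):
    """Model LDR with an immediate offset; returns the address accessed.

    mode "pre":    LDR rd, [rn, #offset]    base register unchanged
    mode "pre_wb": LDR rd, [rn, #offset]!   base register updated
    mode "post":   LDR rd, [rn], #offset    access at base, then update
    """
    base = regs[rn]
    address = base if mode == "post" else base + offset
    regs[rd] = memory[address]
    if mode in ("pre_wb", "post"):
        regs[rn] = base + offset
    return address


# Hypothetical memory contents around address 0x1000.
memory = {0x0FF8: 0xAA, 0x1000: 0xBB, 0x1008: 0xCC}

regs = {"r0": 0, "r1": 0x1000}
ldr(regs, memory, "r0", "r1", -8, "pre_wb")   # LDR r0, [r1, #-8]!
print(hex(regs["r0"]), hex(regs["r1"]))

regs = {"r0": 0, "r1": 0x1000}
ldr(regs, memory, "r0", "r1", 8, "post")      # LDR r0, [r1], #8
print(hex(regs["r0"]), hex(regs["r1"]))
```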
ARM addressing Modes
Load and Store Multiples
Syntax:
• <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
4 addressing modes:
• LDMIA / STMIA   increment after
• LDMIB / STMIB   increment before
• LDMDA / STMDA   decrement after
• LDMDB / STMDB   decrement before

Memory layout for LDMxx r10!, {r0,r1,r4} or STMxx r10!, {r0,r1,r4}, with base register (Rb) r10
(addresses increase towards the top of the table):

    Address    IA    IB    DA    DB
    r10+12           r4
    r10+8      r4    r1
    r10+4      r1    r0
    r10        r0          r4
    r10-4                  r1    r4
    r10-8                  r0    r1
    r10-12                       r0

Start address: IA = r10, IB = r10+4, DA = r10-4*3+4, DB = r10-4*3
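The start addresses for the four modes follow one pattern: the lowest-numbered register always goes to the lowest address, and only the start of the block moves. A Python sketch with a hypothetical function name; the written-back base value is the standard behaviour when "!" is used.

```python
def block_transfer_addresses(base, registers, mode):
    """Return ({register: address}, written_back_base) for LDM/STM.

    registers is a list of register numbers; mode is IA, IB, DA or DB.
    """
    size = 4 * len(registers)
    if mode == "IA":
        start, new_base = base, base + size
    elif mode == "IB":
        start, new_base = base + 4, base + size
    elif mode == "DA":
        start, new_base = base - size + 4, base - size
    elif mode == "DB":
        start, new_base = base - size, base - size
    else:
        raise ValueError("unknown addressing mode: " + mode)
    # Lowest register number is always transferred at the lowest address.
    ordered = sorted(registers)
    return {reg: start + 4 * i for i, reg in enumerate(ordered)}, new_base


# Registers r0, r1, r4 with a hypothetical base value of 0x1000 in r10.
for mode in ("IA", "IB", "DA", "DB"):
    addresses, new_base = block_transfer_addresses(0x1000, [0, 1, 4], mode)
    print(mode, {f"r{r}": hex(a) for r, a in addresses.items()}, hex(new_base))
```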
Load and Store Multiples
Load and store multiple pairs when base update is used:

    Store Multiple    Load Multiple
    STMIA             LDMDB
    STMIB             LDMDA
    STMDA             LDMIB
    STMDB             LDMIA
Example
Stack Operations
Push operation: placing data onto the stack, using a store multiple instruction
Pop operation: removing data from the stack, using a load multiple instruction
Addressing methods
• Ascending (A) or Descending (D): the stack grows towards higher or lower memory addresses
• Full (F) or Empty (E): the stack pointer points to the last full location or to the next empty location
Swap Instructions
Swap (SWP)
Swap byte (SWPB)

Ex:
    SWP  R12, R10, [R9]   ; load R12 from address R9 and
                          ; store R10 to address R9
    SWPB R3, R5, [R6]     ; load a byte to R3 from address R6 and
                          ; store a byte from R5 to address R6
Example
Branch instructions
• Branch: B{<cond>} label
• Branch with Link: BL{<cond>} subroutine_label
• The address of the label is stored in the instruction as a signed PC-relative offset and must be within a ±32 Mbyte range

    Bits     31-28    27-25    24    23-0
    Field    Cond     1 0 1    L     Offset

  • Cond: condition field
  • L: link bit (0 = Branch, 1 = Branch with link)
• The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
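The offset arithmetic can be sketched in Python as follows. The function names are hypothetical; the sketch assumes ARM state, where the pc reads 8 bytes ahead of the branch instruction's own address, and it computes only the 24-bit offset field, not the whole instruction word.

```python
def encode_branch_offset(instruction_address, target_address):
    """Return the 24-bit offset field for a B/BL at instruction_address."""
    byte_offset = target_address - (instruction_address + 8)
    if byte_offset % 4 != 0:
        raise ValueError("target must be word aligned")
    word_offset = byte_offset // 4
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target is outside the branch range")
    return word_offset & 0xFFFFFF


def decode_branch_target(instruction_address, offset_field):
    """Sign-extend the 24-bit field, shift left by 2, add to the pc."""
    if offset_field & 0x800000:
        offset_field -= 1 << 24
    return (instruction_address + 8 + (offset_field << 2)) & 0xFFFFFFFF


# A branch to itself: the field encodes -2 words because of the pc offset.
field = encode_branch_offset(0x8000, 0x8000)
print(hex(field), hex(decode_branch_target(0x8000, field)))
```

A 24-bit signed word offset covers 2^23 words in each direction, which is where the ±32 Mbyte range comes from.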
ARM Branches and Subroutines
B <label>
 PC relative. ±32 Mbyte range.
BL <subroutine>
 Stores return address in LR
 Returning implemented by restoring the PC from LR
 For non-leaf functions, LR will have to be stacked
                      func1                        func2
        :             STMFD sp!, {regs,lr}           :
        :               :                            :
    BL func1            :                            :
        :             BL func2                       :
        :               :                            :
                      LDMFD sp!, {regs,pc}         MOV pc, lr
Branch instructions (contd..)
Branch exchange: BX{<cond>} Rm
• Copies the contents of general-purpose register Rm into the PC (PC = Rm & 0xfffffffe)
• The T bit of the cpsr is updated from the LSB of Rm
Branch exchange with Link: BLX{<cond>} subroutine_label | Rm
• Copies the contents of general-purpose register Rm, or the address of the label, into the PC (PC = Rm & 0xfffffffe)
• Additionally sets the link register to the return address
Available in T variants of ARM architecture version 4 and above (BLX requires version 5T and above)
It is primarily used to branch to and from Thumb code
Software Interrupt (SWI)

SWI causes a software interrupt exception, which provides a mechanism for applications to call operating system routines.
Each SWI instruction has an associated SWI number, which is used to represent a particular function call or feature.
Software Interrupt (SWI)
PSR access
    Bits     31 30 29 28   27   26-25   24   23-20   19-16     15-10         9   8   7   6   5   4-0
    Field    N  Z  C  V    Q    de      J            GE[3:0]   IT cond_abc   E   A   I   F   T   mode

Fields of the psr: flag (bits 31-24), status (bits 23-16), extension (bits 15-8), control (bits 7-0)

• MRS and MSR allow the contents of the CPSR / SPSR to be transferred to / from a general-purpose register; MSR can also take an immediate value
• MSR allows the whole status register, or just parts of it, to be updated
• Interrupts can be enabled/disabled and modes changed by writing to the CPSR
  • Typically a read/modify/write strategy should be used:

        MRS r0,CPSR        ; read CPSR into r0
        BIC r0,r0,#0x80    ; clear bit 7 to enable IRQ
        MSR CPSR_c,r0      ; write modified value to ‘c’ byte only

• In User mode, all bits can be read but only the condition flags (_f) can be modified
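The effect of the field suffix on MSR can be modelled with byte masks. In this Python sketch the function name msr and the starting CPSR value are hypothetical; the four field masks follow the byte-wide f, s, x and c fields of the psr, and privilege checks are not modelled.

```python
# Byte masks for the flag, status, extension and control fields.
FIELD_MASKS = {"f": 0xFF000000, "s": 0x00FF0000, "x": 0x0000FF00, "c": 0x000000FF}


def msr(psr, value, fields):
    """Write only the bytes of psr named in fields (e.g. "c" or "fc")."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)


# Hypothetical CPSR: Z and C set, IRQ and FIQ disabled, Supervisor mode.
cpsr = 0x600000D3
r0 = cpsr                      # MRS r0, CPSR
r0 &= ~0x80 & 0xFFFFFFFF       # BIC r0, r0, #0x80
cpsr = msr(cpsr, r0, "c")      # MSR CPSR_c, r0
print(hex(cpsr))
```

Writing only the control byte leaves the condition flags untouched even if r0 was modified elsewhere in between.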
Loading Constants
Two pseudo-instructions are available to move a 32-bit value into a register:
LDR Rd, =Constant
• Writes a 32-bit constant into a register
ADR Rd, label
• Writes a PC-relative address into a register
ARM Instruction set format
Questions?
