ASSEMBLERS
An assembler is a program that converts an assembly language program into
machine language.
The assembly language program given as input to the assembler is
called the source program, and the machine language output produced by
the assembler is called the object program.
Role of an Assembler
Functions:
Translate mnemonic operation codes to their
machine language equivalents.
Assign machine addresses to the symbolic labels
used in the program.
Produce information for the loader.
Check the syntactic correctness of the assembly
language program.
The assembler must also detect errors and
report them to the user.
Assembly Language
An assembly language is a low-level
programming language designed for a specific type
of processor.
Assembly code can be converted to machine code
using an assembler.
Assembly language makes use of symbols and
mnemonics to represent instructions and addresses.
For this reason, assembly language is also
known as symbolic language.
It depends heavily on the internal organization of
the computer. Architectural features such as memory
word size, number formats, internal character codes,
index registers and general purpose registers affect
the way assembler instructions are written.
Elements of assembly language programming
Format of assembly language statement
[Label] <opcode> <operand specification> [, <operand specification> …]
1. […]: optional field
2. Label: symbolic name associated with the memory words
generated for the statement
3. Opcode: mnemonic or pseudo-op
4. Operand specification: type of operand
Example
ADD AREG, ONE
ADD AREG, ONE+10
Simple set of instructions
[Table: mnemonics and their instructions]
Examples
MOVER AREG, ONE
MOVEM AREG, ONE
Condition Codes
Machine Instruction Format
Here, the opcode occupies 2 digits, the register operand
occupies 1 digit and the memory operand occupies 3 digits.
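The 2-1-3 digit layout can be illustrated with a minimal Python sketch. The numeric opcode and register codes below are assumptions made for illustration, since the original tables are not reproduced in this text; `encode` is a hypothetical helper name.

```python
# Assumed numeric codes: illustrative values only, not taken from the slides.
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def encode(opcode: str, register: str, address: int) -> str:
    """Pack one instruction as 2 opcode digits, 1 register digit, 3 address digits."""
    if not 0 <= address <= 999:
        raise ValueError("memory operand must fit in 3 digits")
    return f"{OPCODES[opcode]:02d}{REGISTERS[register]:1d}{address:03d}"


print(encode("MOVER", "AREG", 214))  # 041214
```

Under these assumed codes, MOVER AREG, ONE with ONE at address 214 becomes the six-digit word 041214.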
Typical assembly language program
Types of Assembly language
statements
Assembly language has 3 kinds of
statements:
1. Imperative Statements
2. Declarative Statements
3. Assembler directives
Imperative Statements
An imperative statement indicates an action to be
performed during the execution of the assembled program.
Each imperative statement typically translates into
one machine instruction.
Examples of imperative statements are MOVER,
MOVEM, ADD, SUB, MULT etc.
Declarative Statements
Declarative statements are used to reserve storage
space or to declare constants and variables.
There are 2 declarative statements: DC and DS
DS Statement
DS stands for declare storage.
A DS statement reserves a block of memory words.
It also associates a symbolic name with the allocated
block.
The syntax for DS is: [LABEL] DS <Constant>
Examples: A DS 1
X DS 100
DC Statement
DC stands for declare constant.
DC constructs memory words containing constants.
The syntax for the DC statement is:
[LABEL] DC <Value>
Example: TWO DC ‘2’
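The effect of DS and DC on memory allocation can be sketched in Python as follows. This is a minimal illustration, assuming a start address of 200, one memory word per DC constant, and a hypothetical helper named `allocate`; none of these details come from the original slides.

```python
def allocate(statements, start=200):
    """Process (label, directive, operand) triples; return (symbol table, memory)."""
    lc = start                      # location counter: next free address
    symtab, memory = {}, {}
    for label, directive, operand in statements:
        if label:
            symtab[label] = lc      # the label names the first allocated word
        if directive == "DS":
            lc += int(operand)      # reserve <constant> words, left uninitialised
        elif directive == "DC":
            memory[lc] = int(operand.strip("'"))  # one word holding the constant
            lc += 1
        else:
            raise ValueError(f"not a declarative statement: {directive}")
    return symtab, memory


symtab, memory = allocate([("A", "DS", "1"), ("X", "DS", "100"), ("TWO", "DC", "'2'")])
print(symtab)  # {'A': 200, 'X': 201, 'TWO': 301}
print(memory)  # {301: 2}
```

DS only advances the location counter, while DC also records an initial value for the word it allocates.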
Assembler directives
Assembler directives direct or instruct the
assembler to perform particular actions
during the assembly of program.
No memory space is reserved for them.
Some commonly used assembler directives
are: START, END, ORIGIN, LTORG, EQU
Of these, ORIGIN, EQU and
LTORG are called advanced assembler directives.
START
This statement specifies the memory address
where first word of the target program
generated by the assembler should be placed.
The syntax for START is: START <Constant>
In the assembly program shown earlier, the first
statement is START 200, which means that READ N
will be placed at address 200.
END
This assembler directive indicates the
end of the source program.
The syntax for END statement is:
END [<Operand Specification>]
Constants and Literals
The DC statement means ‘declare constant’.
It simply initializes memory words to the given value.
The statement ONE DC ‘1’ means that the memory
word at location 214 holds the value ‘1’, i.e. decimal 1.
Such values are constants, as in the C statement int A=5;
An assembly program uses constants in two ways:
Immediate operands
Literals
A literal is an operand with syntax:
=‘<Value>’
A literal is different from a constant in two
ways:
A literal is so called because its value is
stated literally in the instruction.
A literal is therefore identified by the prefix =
followed by a specification of its value.
Use of a literal in assembly language
Assembly Scheme
Specify the problem and identify the
information necessary to perform a task.
Specify the data structures to be used
Define the format of the data structures
required to record the information.
Specify the algorithm to be used to obtain
and maintain the information.
Phases of Assembler
Analysis Phase
Synthesis Phase
Analysis Phase
It is used to create the symbol table and literal table.
The symbol table contains each symbol along with its address
in the program.
In order to create the symbol table, the analysis phase
determines the addresses of the various symbols.
It performs memory allocation. This memory allocation
is implemented using a data structure called the
location counter (LC).
The LC contains the address of the next memory word in the target
program.
Whenever analysis phase sees a label in an assembly
statement, it enters the label and the contents of LC in a
new entry of the symbol table.
Analysis phase then finds the memory words required by
the assembly statement and updates the contents of LC.
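The interplay between the LC and the symbol table can be sketched in Python. This is a minimal illustration, assuming each statement has already been reduced to a (label, size in words) pair; `build_symbol_table` and the sample addresses are hypothetical.

```python
def build_symbol_table(statements, start=0):
    """Walk (label, size) pairs, recording each label at the current LC value."""
    lc = start
    symtab = {}
    for label, size in statements:
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate label: {label}")
            symtab[label] = lc      # enter (symbol, <LC contents>)
        lc += size                  # advance LC by the words the statement needs
    return symtab, lc


symtab, lc = build_symbol_table(
    [(None, 1), ("LOOP", 1), (None, 1), ("N", 1)], start=200
)
print(symtab)  # {'LOOP': 201, 'N': 203}
print(lc)      # 204
```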
The analysis phase performs the following tasks:
Isolates the label, mnemonic opcode and operand
fields of a statement. For example, in the statement
AGAIN MULT BREG, TERM
the label is AGAIN, the mnemonic opcode is MULT and the operands are BREG and TERM.
If a label is present, the pair (symbol, <LC
contents>) is entered into a new entry of the symbol table.
The validity of the mnemonic opcode is checked through a
look-up in the mnemonics table.
[Figure: information processed in the analysis and synthesis phases]
LC processing is performed, i.e. the value
contained in LC is updated by considering
the opcode and operands of the
statement.
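Field isolation and mnemonic validation, two of the tasks listed above, can be sketched in Python. This is a minimal illustration that assumes a label never shares its name with a mnemonic; the `MNEMONICS` set holds only the mnemonics named in this text, and `split_fields` is a hypothetical helper.

```python
# Mnemonics and directives mentioned in this text (not a complete table).
MNEMONICS = {"MOVER", "MOVEM", "ADD", "SUB", "MULT", "READ",
             "DS", "DC", "START", "END", "ORIGIN", "EQU", "LTORG"}


def split_fields(line):
    """Return (label, mnemonic, operands) for one assembly statement."""
    tokens = line.replace(",", " ").split()
    if not tokens:
        raise ValueError("empty statement")
    label = None
    if tokens[0].upper() not in MNEMONICS:
        label = tokens.pop(0)       # first token is not a mnemonic, so it is a label
    if not tokens or tokens[0].upper() not in MNEMONICS:
        raise ValueError(f"invalid mnemonic in: {line!r}")
    return label, tokens[0].upper(), tokens[1:]


print(split_fields("AGAIN MULT BREG, TERM"))
# ('AGAIN', 'MULT', ['BREG', 'TERM'])
```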
Synthesis Phase
The main purpose of synthesis phase is to generate
the machine code for the assembly program.
To generate object
code, it makes use of data structures such as the
symbol table, literal table and mnemonics table.
This phase obtains the machine addresses of the
symbols used in the program from the symbol
table generated by the analysis phase.
The machine opcode of the mnemonics such as
ADD, SUB,MULT is obtained from mnemonics table.
This phase obtains all this information by
searching the respective tables with the symbol
name and the mnemonic as keys.
To summarize, the synthesis phase performs the
following tasks:
Obtains machine opcode from mnemonics
table.
Obtains address of memory operands from the
symbol table.
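The two look-ups performed by the synthesis phase can be sketched in Python. This is a minimal illustration: the opcode and register codes and the address of TERM are assumed values, and `synthesize` is a hypothetical helper.

```python
MOT = {"ADD": 1, "SUB": 2, "MULT": 3}       # assumed machine opcodes
REGS = {"AREG": 1, "BREG": 2, "CREG": 3}    # assumed register codes


def synthesize(mnemonic, register, symbol, symtab):
    """Resolve one statement to (machine opcode, register code, operand address)."""
    if mnemonic not in MOT:
        raise KeyError(f"unknown mnemonic: {mnemonic}")
    if symbol not in symtab:
        raise KeyError(f"undefined symbol: {symbol}")
    return MOT[mnemonic], REGS[register], symtab[symbol]


symtab = {"TERM": 215}                      # as produced by the analysis phase
print(synthesize("MULT", "BREG", "TERM", symtab))  # (3, 2, 215)
```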
Data structures used by assembler
The various data structures used by assembler are:
I. Symbol Table (SYMTAB)
II. Literal Table (LITTAB)
III. Mnemonic Table or Machine Operation
Table (MOT)
IV. Pseudo-opcode Table (POT) or operation code
table (OPTAB)
V. Location Counter (LC)
VI. Pool Table (POOLTAB)
Symbol Table (SYMTAB)
Literal Table
Mnemonic table or Machine
Operation table(MOT)
Operation code table (OPTAB) or
(POT)
Location Counter (LC)
Pool Table(POOLTAB)
Pass Structure of
Assemblers
Single pass assembler
Two pass assembler
Single pass assembler
Scans the input file once
Faster than a two pass assembler
Constructs the symbol table and literal table, and uses the mnemonic
table and operation table.
Performs LC processing
The problem of forward references is handled by backpatching
A TII (Table of Incomplete Instructions) is created to support
backpatching.
The format of a TII entry is:
(<Instruction address>, <Symbol>)
By the time END statement is processed, the SYMTAB would
contain the addresses of all symbols defined in the source
program.
TII would contain information regarding all forward references.
The single pass assembler then processes each entry in TII to complete the
instruction concerned.
It generates an object file by simply writing the object program
from memory to a file.
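Backpatching with a TII can be sketched in Python. This is a minimal illustration that omits register operands and literals for brevity; the opcode values, the start address and the helper name `single_pass` are assumptions.

```python
OPCODES = {"ADD": 1, "MOVER": 4, "MOVEM": 5}    # assumed machine opcodes


def single_pass(program, start=200):
    """Assemble (label, mnemonic, operand) triples in one scan, then backpatch."""
    lc = start
    symtab, tii, code = {}, [], {}
    for label, mnemonic, operand in program:
        if label:
            symtab[label] = lc
        if mnemonic == "END":
            break
        if mnemonic == "DS":
            lc += int(operand)
            continue
        address = symtab.get(operand)
        if address is None:
            tii.append((lc, operand))           # (<instruction address>, <symbol>)
        code[lc] = [OPCODES[mnemonic], address]
        lc += 1
    for address, symbol in tii:                 # complete the forward references
        if symbol not in symtab:
            raise ValueError(f"undefined symbol: {symbol}")
        code[address][1] = symtab[symbol]
    return code, symtab


program = [(None, "MOVER", "A"), (None, "ADD", "B"), (None, "MOVEM", "A"),
           ("A", "DS", "1"), ("B", "DS", "1"), (None, "END", None)]
code, symtab = single_pass(program)
print(code)    # {200: [4, 203], 201: [1, 204], 202: [5, 203]}
```

Every operand in this sample program is a forward reference, so all three instructions are first emitted with an empty address field and completed only after END.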
Two Pass Assembler
It scans the input assembly language program twice.
These scans are normally called Pass I and Pass II.
In Pass I all the symbols used in the program are entered
into symbol table and all the literals are entered into
literal table.
Pass II generates the target form using the address
information from symbol table and literal table.
Thus Pass I performs analysis of Source program
and Pass II performs synthesis of target program.
Two Pass assembler can handle forward reference easily.
Apart from SYMTAB and LITTAB, Pass I also generates
an intermediate representation of the source program, i.e. a
processed form of the source program. This partially
processed form is known as intermediate code (IC).
Thus in two pass assembler, Pass I generates symbol
table, literal table and IC and Pass II uses all these
data structures to generate machine code.
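The division of work between the two passes can be sketched in Python. This is a minimal illustration that leaves out the literal table and register operands; the opcode values, the tuple form of the IC, and the helper names `pass_one` and `pass_two` are assumptions.

```python
OPCODES = {"ADD": 1, "MOVER": 4, "MOVEM": 5}    # assumed machine opcodes


def pass_one(program, start=200):
    """Analysis: build the symbol table and an intermediate code (IC) list."""
    lc, symtab, ic = start, {}, []
    for label, mnemonic, operand in program:
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(operand)
        else:
            ic.append((lc, mnemonic, operand))  # operand stays symbolic in the IC
            lc += 1
    return symtab, ic


def pass_two(symtab, ic):
    """Synthesis: every symbol is now defined, so addresses resolve directly."""
    return {addr: (OPCODES[m], symtab[op]) for addr, m, op in ic}


program = [(None, "MOVER", "A"), (None, "ADD", "B"), (None, "MOVEM", "A"),
           ("A", "DS", "1"), ("B", "DS", "1")]
symtab, ic = pass_one(program)
print(pass_two(symtab, ic))  # {200: (4, 203), 201: (1, 204), 202: (5, 203)}
```

Because Pass II runs only after the whole symbol table exists, forward references need no special handling.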
Databases used by the passes of the assembler
Advanced Assembler Directives
1. ORIGIN
This assembler directive is used to set the value of
location counter (LC) to a particular address.
The syntax for ORIGIN is:
ORIGIN <address specification>
The <address specification> can be either an
<operand specification> or a <constant>. For
example:
ORIGIN LOOP
ORIGIN 200
The figure on the following slides shows the usage of the ORIGIN
statement. Statement number 20, i.e. ORIGIN LOOP+30,
sets the LC to the value 532, since the symbol LOOP is
associated with address 502 and 502+30=532. The statement
MULT CREG, M is therefore placed at address 532.
Similarly, statement number 22, ORIGIN 520, sets the LC to the
value 520, so the next statement N DS 1 is placed at 520.
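The address arithmetic behind ORIGIN can be sketched in Python. This is a minimal illustration, assuming an address specification is a symbol or constant with an optional +/- constant offset; `evaluate_address` is a hypothetical helper.

```python
import re


def evaluate_address(spec, symtab):
    """Evaluate a symbol or constant, optionally followed by +/- a constant."""
    match = re.fullmatch(r"\s*([A-Za-z]\w*|\d+)\s*(?:([+-])\s*(\d+))?\s*", spec)
    if match is None:
        raise ValueError(f"bad address specification: {spec!r}")
    base, sign, offset = match.groups()
    value = int(base) if base.isdigit() else symtab[base]
    if offset is not None:
        value += int(offset) if sign == "+" else -int(offset)
    return value


symtab = {"LOOP": 502}
lc = evaluate_address("LOOP+30", symtab)   # ORIGIN LOOP+30
print(lc)                                  # 532
lc = evaluate_address("520", symtab)       # ORIGIN 520
print(lc)                                  # 520
```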
2. EQU
The EQU statement defines a symbol to represent an
address.
The syntax for EQU is:
<Symbol> EQU <address specification>
The EQU statement simply associates the name <Symbol>
with <address specification>.
Here <address specification> can be an <operand
specification> or a <constant>. For example:
I. ONE EQU SUM
II. ONE EQU 550
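The effect of EQU on the symbol table can be sketched in Python. This is a minimal illustration: the address 540 for SUM is an assumed value and `apply_equ` is a hypothetical helper. Note that EQU does not change the LC.

```python
def apply_equ(symbol, spec, symtab):
    """Bind symbol to the address named by spec (a constant or a known symbol)."""
    address = int(spec) if spec.isdigit() else symtab[spec]
    symtab[symbol] = address
    return symtab


symtab = {"SUM": 540}                      # assumed address of SUM
apply_equ("ONE", "SUM", symtab)            # ONE EQU SUM
print(symtab["ONE"])                       # 540
apply_equ("ONE", "550", symtab)            # ONE EQU 550
print(symtab["ONE"])                       # 550
```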
Assembly program showing usage of ORIGIN, EQU and
LTORG
3. LTORG
LTORG is used to assign memory locations to various literals
used in the program.
Assembler allocates memory to the literals of a literal pool
whenever LTORG statement is encountered.
Literal pool contains all the literals used in the program since
the last LTORG statement.
By default, assembler places the literals after the END
statement.
In the program shown in the previous figure, the literals =‘6’ and =‘1’ are
added to the literal pool in statements 2 and 6 respectively.
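Literal pool allocation at LTORG or END can be sketched in Python. This is a minimal illustration, assuming each POOLTAB entry holds the LITTAB index of the first literal in a pool and each literal occupies one word; the LC values and the helper name `allocate_pool` are assumptions.

```python
def allocate_pool(littab, pooltab, lc):
    """Assign addresses to the literals of the current pool and open a new pool."""
    first = pooltab[-1]                 # index of the first literal in this pool
    for entry in littab[first:]:
        entry["address"] = lc
        lc += 1
    pooltab.append(len(littab))         # the next pool starts after these literals
    return lc


littab = [{"literal": "='6'", "address": None},
          {"literal": "='1'", "address": None}]
pooltab = [0]
lc = allocate_pool(littab, pooltab, 211)    # LTORG seen with LC = 211 (assumed)
print([e["address"] for e in littab])       # [211, 212]
print(pooltab, lc)                          # [0, 2] 213
```

Calling the same routine again at END would allocate only the literals entered since the last LTORG, which is the default placement described above.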
Other Assembler Directives
1. USING
• USING is a pseudo-opcode that indicates to the assembler which general
register to use as a base register and what its contents will be.
• The base register is used to store the address of the program in core just prior to
execution time.
• The USING statement tells the assembler what is in the base
register.
• The Syntax for USING is:
2. BALR
BALR instruction loads the base register.
It instructs the computer to load a register with the next
address and branch to the address in second field.
3. PURGE
EQU defines symbolic names to represent values. The names
so defined can be undefined by PURGE statement.
Design of Two Pass Assembler
Pass I of the Assembler
Pass II of the Assembler
Pass I of the Assembler
The main purpose is to assign a location to each instruction, to build the Symbol Table and Literal Table,
and to generate IC.
The operation code field of each statement is examined to determine what
kind of statement it is.
In the case of an imperative statement, the length of the machine instruction is added to the LC and
the length is also entered into the SYMTAB entry.
For a declarative statement or assembler directive (AD), the routine mentioned in the mnemonic info field is
called to perform processing of the LC.
The operand field of the instruction is scanned for literals. If a new literal is found,
it is entered into the literal table (LITTAB).
The different literal pools are maintained with the help of a table called POOLTAB.
When an LTORG (or END) statement is encountered, all the literals in the current literal pool are
allocated addresses starting with the current value of the LC, and the LC is incremented accordingly.
When the END statement is encountered, Pass I is terminated.
Table: OPTAB
SYMTAB
LITTAB and POOLTAB
Algorithm for PASS I
Algorithm for PASS I…
Flowchart for PASS I
Flowchart for PASS II
Comparison of PASS I and PASS II