Compilers Project
Topic: Compiler for Flat Tiny C (FLTC)
Compilers
A compiler translates a program in a source language into an equivalent program in a target language.
Source Program → Compiler → Target Program
Typically:
Source languages - C, C++, Ada, FORTRAN, etc.
Target languages - the instruction set of a microprocessor
An assembler translates assembly language programs into object code.
GCC
Structure of a Three-Phase Compiler
Compiler = Front End + Optimizer + Back End
Source Program → Front End → IR → Optimizer → IR → Back End → Target ALP (assembly language program)
An Optimizer
Analyzes the IR and rewrites (or transforms) it
Primary goal: reduce the running time of the compiled code
May also improve space usage or power consumption
Must preserve the meaning of the code, as measured by the values of named variables
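Constant folding is a simple example of an IR-to-IR rewrite that keeps the meaning intact. The sketch below assumes a hypothetical expression-tree IR; `Node`, `mk` and `fold` are illustrative names and are not part of the project code. Any subtree whose operands are both constants is replaced by its value. Subtrees that involve a named variable are left alone. A division by a constant zero is also left unfolded, so the runtime behaviour of the program stays the same.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical expression IR: constants, variables, binary operators. */
typedef enum { K_CONST, K_VAR, K_ADD, K_SUB, K_MUL, K_DIV } Kind;

typedef struct Node {
    Kind kind;
    int value;                 /* K_CONST: the constant */
    char name;                 /* K_VAR: single-letter variable name */
    struct Node *left, *right; /* binary operators only */
} Node;

static Node *mk(Kind k, int value, char name, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = k; n->value = value; n->name = name; n->left = l; n->right = r;
    return n;
}

/* Fold constant subtrees bottom-up and return the (possibly rewritten) node.
   Integer overflow is ignored in this sketch. */
static Node *fold(Node *n) {
    if (n->kind == K_CONST || n->kind == K_VAR)
        return n;
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind != K_CONST || n->right->kind != K_CONST)
        return n;                          /* depends on a variable: keep */
    int a = n->left->value, b = n->right->value, v;
    switch (n->kind) {
    case K_ADD: v = a + b; break;
    case K_SUB: v = a - b; break;
    case K_MUL: v = a * b; break;
    case K_DIV: if (b == 0) return n;      /* keep the runtime error */
                v = a / b; break;
    default:    return n;
    }
    free(n->left);
    free(n->right);
    n->kind = K_CONST; n->value = v; n->left = n->right = NULL;
    return n;
}
```

For example, folding `x + (2 * 3)` keeps the `+` node, because it depends on `x`, and replaces `2 * 3` with the constant 6.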
Structure of a Typical Compiler
Front End: program → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation
Machine-Independent Code Optimizer → intermediate representation
Back End: Code Generator → target-machine code → Machine-Dependent Code Optimizer → target-machine code
Back-end tasks: Instruction Selection, Register Allocation, Instruction Scheduling
Symbol Table (used alongside these phases)
The Front End
program → Lexical Analyzer/Scanner → token stream → Syntax Analyzer/Parser → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation
Symbol Table (used alongside these phases)
The Front End: Scanner and Parser
Source Code → Scanner → Tokens → Parser → IR
Both the Scanner and the Parser report Errors
Parser
Takes as input a stream of tokens
Checks whether the stream of tokens constitutes a syntactically valid program of the language
If the input program is syntactically correct
Outputs an intermediate representation of the code (such as an AST)
If the input program has syntactic errors
Outputs relevant diagnostic information
Context Free Grammars and Programming Languages
Expr → Expr Binop Expr | - Expr | ! Expr | ( Expr )
Binop → Arithop | Relop | Eqop | Condop
Arithop → + | - | * | / | % | << | >>
Relop → < | > | <= | >=
Eqop → == | !=
Condop → && | ||
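On its own, the rule `Expr → Expr Binop Expr` is ambiguous. For example, `2 - 3 - 4` and `1 + 2 * 3` each have two parse trees, so a real parser needs precedence and associativity rules to pick one. The sketch below is a precedence-climbing evaluator for integer expressions over these operators. It makes several assumptions that the slides do not fix. It uses C-like precedence levels. It treats every binary operator as left-associative. It evaluates `&&` and `||` without short-circuiting. It computes values directly instead of building a tree. The names `eval_expr`, `parse_expr`, `parse_unary`, `peek_op` and `apply` are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;   /* input cursor */
static int err;         /* set on a syntax error or division by zero */

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

/* If a binary operator starts at p, copy it into op and return its
   precedence (higher binds tighter); otherwise return 0. Two-character
   operators are listed first so "<=" is not read as "<". */
static int peek_op(char op[3]) {
    static const struct { const char *s; int prec; } ops[] = {
        {"||", 1}, {"&&", 2}, {"==", 3}, {"!=", 3}, {"<=", 4}, {">=", 4},
        {"<<", 5}, {">>", 5}, {"<", 4},  {">", 4},  {"+", 6},  {"-", 6},
        {"*", 7},  {"/", 7},  {"%", 7},
    };
    skip_ws();
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i].s);
        if (strncmp(p, ops[i].s, n) == 0) { strcpy(op, ops[i].s); return ops[i].prec; }
    }
    return 0;
}

static long apply(const char *op, long a, long b) {
    if (!strcmp(op, "+"))  return a + b;
    if (!strcmp(op, "-"))  return a - b;
    if (!strcmp(op, "*"))  return a * b;
    if (!strcmp(op, "/") || !strcmp(op, "%")) {
        if (b == 0) { err = 1; return 0; }
        return op[0] == '/' ? a / b : a % b;
    }
    if (!strcmp(op, "<<")) return a << b;
    if (!strcmp(op, ">>")) return a >> b;
    if (!strcmp(op, "<"))  return a < b;
    if (!strcmp(op, ">"))  return a > b;
    if (!strcmp(op, "<=")) return a <= b;
    if (!strcmp(op, ">=")) return a >= b;
    if (!strcmp(op, "==")) return a == b;
    if (!strcmp(op, "!=")) return a != b;
    if (!strcmp(op, "&&")) return a && b;
    return a || b;                                  /* "||" */
}

static long parse_expr(int min_prec);

/* Unary -, unary !, parenthesised expressions and integer literals. */
static long parse_unary(void) {
    skip_ws();
    if (*p == '-') { p++; return -parse_unary(); }
    if (*p == '!') { p++; return !parse_unary(); }
    if (*p == '(') {
        p++;
        long v = parse_expr(1);
        skip_ws();
        if (*p == ')') p++; else err = 1;
        return v;
    }
    if (isdigit((unsigned char)*p)) {
        long v = 0;
        while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
        return v;
    }
    err = 1;                                        /* unexpected token */
    return 0;
}

/* Precedence climbing: consume operators whose precedence is >= min_prec.
   Parsing the right operand at prec + 1 makes operators left-associative. */
static long parse_expr(int min_prec) {
    long lhs = parse_unary();
    char op[3];
    int prec;
    while (!err && (prec = peek_op(op)) >= min_prec) {
        p += strlen(op);
        long rhs = parse_expr(prec + 1);
        lhs = apply(op, lhs, rhs);
    }
    return lhs;
}

/* Returns 1 and stores the value in *out on success, 0 on error. */
static int eval_expr(const char *s, long *out) {
    p = s;
    err = 0;
    *out = parse_expr(1);
    skip_ws();
    return !err && *p == '\0';
}
```

With these rules, `eval_expr("2 - 3 - 4", &v)` groups the input as `(2 - 3) - 4`, and `1 + 2 * 3` groups as `1 + (2 * 3)`.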
CFGs and Programming Languages
Statement → Location = Expr ;
  | MethodCall ;
  | if ( Expr ) Block
  | if ( Expr ) Block else Block
  | while ( Expr ) Block
  | continue ;
  | Block
Block → { VarDeclList StatementList }
StatementList → Statement | Statement StatementList
Context Free Grammars and Programming Languages
Key Idea: The syntax of modern programming languages can be expressed using context-free grammars (by design!)
Programs have recursive structure:
A program is a collection of functions.
A function is a sequence of statements.
A statement can be an if, while, for, or assignment statement, etc.
The body of a while loop is a sequence of statements.
An arithmetic expression can be a sum or product of two arithmetic expressions.
CFGs are a natural way of expressing programs with recursive structure.
CFGs and Programming Languages
Advantages of using CFGs to specify the syntactic structure of languages:
Clear and concise syntactic specification of the language
The language can be developed or evolved iteratively, and new constructs can be added with relative ease
Programming languages can be specified using a special subclass of CFGs for which efficient parsing techniques and automatic parser generators exist
These special classes of CFGs also allow ambiguities in the language to be detected automatically
CFGs impose a structure on the program that makes translation to intermediate or target object code easier
Grammar for FLTC
Program → class main { Field_Decl* Statement* }
Field_Decl → Type { id | id [ int_literal ] }+, ;
// { x }+, means one or more x, separated by commas
Type → int | boolean | char
Statement → Labelled_Statement
  | Location = Expr ;
  | if Expr then goto label ;
  | goto label ;
  | Method_Call ;
Labelled_Statement → label Statement
// Think of label as "id:"; label is a token, like id
Grammar for FLTC
Location → id | id [ Expr ]
Expr → Literal | Location | Expr Binop Expr | - Expr | ! Expr | ( Expr )
Binop → Arithop | Relop | Condop
Arithop → + | - | * | /
Relop → < | > | <= | >= | == | !=
Condop → && | ||
Method_Call → print ( Expr+, ) | read ( Location )
Literal → int_literal | string_literal | char_literal | bool_literal
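FLTC has no structured loops, so control flow is expressed with labels, `if ... then goto`, and `goto`. The sketch below shows one way such a flat statement list could be executed using a program counter. Its design is hypothetical and is not the project's required design. It uses the illustrative names `Stmt`, `run` and `find_label`. It also simplifies the language: variables are single letters, and each statement has at most two operands. Assignments support only `+` and `-`, and conditions support only `<` and `>`. The example program in the comment sums 1 through 5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* An operand is a variable (letter index) or an integer constant. */
typedef struct { int is_var; int v; } Operand;
#define VAR(c) {1, (c) - 'a'}
#define K(n)   {0, (n)}

typedef enum { S_ASSIGN, S_IF_GOTO, S_GOTO, S_PRINT } SKind;

typedef struct {
    const char *label;   /* label on this statement, or NULL */
    SKind kind;
    int dst;             /* S_ASSIGN: destination letter index */
    Operand a, b;
    char op;             /* S_ASSIGN: '+' or '-'; S_IF_GOTO: '<' or '>' */
    const char *target;  /* S_IF_GOTO / S_GOTO: label to jump to */
} Stmt;

static int vars[26];

static int val(Operand o) { return o.is_var ? vars[o.v] : o.v; }

/* Index of the statement carrying label l, or -1 if it is undefined. */
static int find_label(const Stmt *prog, int n, const char *l) {
    for (int i = 0; i < n; i++)
        if (prog[i].label && strcmp(prog[i].label, l) == 0) return i;
    return -1;
}

/* Run statements in order; a jump just overwrites pc, and an undefined
   label stops execution. Printed values are also stored in out[]. */
static int run(const Stmt *prog, int n, int *out) {
    int pc = 0, printed = 0;
    memset(vars, 0, sizeof vars);
    while (pc >= 0 && pc < n) {
        const Stmt *s = &prog[pc++];
        switch (s->kind) {
        case S_ASSIGN:
            vars[s->dst] = s->op == '+' ? val(s->a) + val(s->b) : val(s->a) - val(s->b);
            break;
        case S_IF_GOTO:
            if (s->op == '<' ? val(s->a) < val(s->b) : val(s->a) > val(s->b))
                pc = find_label(prog, n, s->target);
            break;
        case S_GOTO:
            pc = find_label(prog, n, s->target);
            break;
        case S_PRINT:
            printf("%d\n", val(s->a));
            out[printed++] = val(s->a);
            break;
        }
    }
    return printed;
}

/*   s = 0 + 0 ;  i = 1 + 0 ;
     loop: if i > 5 then goto done ;
     s = s + i ;  i = i + 1 ;  goto loop ;
     done: print(s) ;                                  */
static const Stmt example[] = {
    {NULL,   S_ASSIGN,  's' - 'a', K(0),     K(0),     '+', NULL},
    {NULL,   S_ASSIGN,  'i' - 'a', K(1),     K(0),     '+', NULL},
    {"loop", S_IF_GOTO, 0,         VAR('i'), K(5),     '>', "done"},
    {NULL,   S_ASSIGN,  's' - 'a', VAR('s'), VAR('i'), '+', NULL},
    {NULL,   S_ASSIGN,  'i' - 'a', VAR('i'), K(1),     '+', NULL},
    {NULL,   S_GOTO,    0,         K(0),     K(0),     0,   "loop"},
    {"done", S_PRINT,   0,         VAR('s'), K(0),     0,   NULL},
};
```

Resolving labels by a linear search on every jump keeps the sketch short. A real compiler would resolve them once, when it builds the IR.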
Parsing Approaches
The Cocke-Younger-Kasami (CYK) algorithm can construct a parse tree for a given string and a CFG in Chomsky normal form in O(n^3) worst-case time.
Earley's algorithm
O(n^3) for general CFGs
O(n^2) for unambiguous grammars
We would like to have linear-time algorithms for parsing programs.
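To make the cubic bound concrete, here is a small CYK recognizer. CYK requires the grammar to be in Chomsky normal form, meaning every rule has the form `A → B C` or `A → a`. The grammar used here is a made-up example for balanced parentheses: S → S S | L R | L X, X → S R, L → (, R → ). The function name `cyk`, the table layout and the `MAXN` limit are assumptions for this sketch. The three nested loops over substring length, start position and split point are what give the O(n^3) time, which is multiplied by the number of rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAXN 32                 /* longest input this sketch accepts */
enum { S, X, L, R, NSYM };      /* nonterminals of the example grammar */

/* Binary rules A -> B C, stored as {A, B, C}. */
static const int binary[][3] = {
    {S, S, S}, {S, L, R}, {S, L, X}, {X, S, R},
};

/* table[i][len][A] is true when A derives w[i .. i+len-1]. */
static bool table[MAXN][MAXN + 1][NSYM];

static bool cyk(const char *w) {
    int n = (int)strlen(w);
    if (n == 0 || n > MAXN) return false;
    memset(table, 0, sizeof table);
    /* Length-1 substrings: terminal rules L -> '(' and R -> ')'. */
    for (int i = 0; i < n; i++) {
        table[i][1][L] = (w[i] == '(');
        table[i][1][R] = (w[i] == ')');
    }
    /* Longer substrings: every split point k and every binary rule. */
    for (int len = 2; len <= n; len++)
        for (int i = 0; i + len <= n; i++)
            for (int k = 1; k < len; k++)
                for (size_t r = 0; r < sizeof binary / sizeof binary[0]; r++) {
                    const int *rule = binary[r];
                    if (table[i][k][rule[1]] && table[i + k][len - k][rule[2]])
                        table[i][len][rule[0]] = true;
                }
    return table[0][n][S];
}
```

Building an actual parse tree needs only one more step. For each true entry, also record which rule and split point set it, then follow those back-pointers down from `table[0][n][S]`.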
Yacc (Bison)
Structure of a Yacc Specification File: declarations %% translation rules %% supporting C routines
Yacc
flex can generate the yylex() function from a lexical specification. The Yacc-generated parser calls yylex() each time it needs the next token.
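A scanner like the one flex generates returns one token code per call to `yylex()` and passes the token's attribute through `yylval`. The scanner below is hand-written and only approximates what flex would produce, while keeping the same calling convention. Several parts are assumptions for this sketch. The token codes `NUMBER` and `ID` would normally come from the header that Bison generates. The `input` string cursor stands in for flex's `yyin` stream. Finally, `yylval` is a plain `int` here rather than a `YYSTYPE`.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes. Values above 255 keep them distinct from
   single-character tokens such as '+', which are returned as themselves. */
enum { NUMBER = 258, ID = 259 };

int yylval;                 /* attribute of the most recent token */
static const char *input;   /* stand-in for flex's yyin */

/* Return the next token code, or 0 at end of input (the yacc convention). */
int yylex(void) {
    while (isspace((unsigned char)*input)) input++;
    if (*input == '\0') return 0;
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }
    if (isalpha((unsigned char)*input) || *input == '_') {
        yylval = 0;         /* a real scanner would also record the name */
        while (isalnum((unsigned char)*input) || *input == '_') input++;
        return ID;
    }
    return (unsigned char)*input++;   /* any other char is its own token */
}
```

For the input `x1 = 42 + y`, successive calls return `ID`, `'='`, `NUMBER` (with `yylval` set to 42), `'+'`, `ID`, and finally 0.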
Abstract Syntax Trees (ASTs)
Compilers often use an abstract syntax tree instead of a parse tree.
The AST summarizes the grammatical structure without including details about the derivation.
Example: x+2-y
The AST is much more concise than the parse tree.
ASTs are one kind of intermediate representation (IR).
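Because `+` and `-` are left-associative, `x+2-y` groups as `(x+2)-y`. The AST therefore has `-` at the root, the subtree for `x+2` as its left child, and `y` as its right child. None of the intermediate `Expr` nodes that a parse tree would carry appear in it. The sketch below builds that tree by hand and prints it in prefix form. The `Ast` type and the functions `mk_leaf`, `mk_binary` and `to_prefix` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: an operator with two children, or a leaf
   holding an identifier or number as text. */
typedef struct Ast {
    char op;                    /* '+', '-', ... or 0 for a leaf */
    const char *leaf;           /* leaf text when op == 0 */
    struct Ast *left, *right;
} Ast;

static Ast *mk_leaf(const char *text) {
    Ast *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = 0; n->leaf = text; n->left = n->right = NULL;
    return n;
}

static Ast *mk_binary(char op, Ast *l, Ast *r) {
    Ast *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op; n->leaf = NULL; n->left = l; n->right = r;
    return n;
}

/* Append the tree in prefix form, e.g. "(- (+ x 2) y)", to buf, which
   must already hold a NUL-terminated (possibly empty) string. */
static void to_prefix(const Ast *n, char *buf, size_t size) {
    size_t used = strlen(buf);
    if (n->op == 0) {
        snprintf(buf + used, size - used, "%s", n->leaf);
        return;
    }
    snprintf(buf + used, size - used, "(%c ", n->op);
    to_prefix(n->left, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, " ");
    to_prefix(n->right, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, ")");
}
```

The tree for `x+2-y` is `mk_binary('-', mk_binary('+', mk_leaf("x"), mk_leaf("2")), mk_leaf("y"))`, which prints as `(- (+ x 2) y)`.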
Abstract Syntax Trees
while node: children are an expr subtree and a statement subtree
if node: children are an expr subtree and a statement subtree
if-else node: children are an expr subtree, an if-statement subtree, and an else-statement subtree
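These three statement shapes could be represented in C by a tagged node whose child pointers correspond to the subtrees listed above. In this sketch, the `Stmt` layout, the `ST_*` tags and the functions `mk` and `show` are hypothetical. To keep it short, the condition (expr subtree) and simple statements are stored as plain text; a real compiler would use separate expression nodes and statement-list nodes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ST_WHILE, ST_IF, ST_IF_ELSE, ST_SIMPLE } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    const char *text;        /* condition text, or the whole simple statement */
    struct Stmt *body;       /* while body / if-statement subtree */
    struct Stmt *else_body;  /* else-statement subtree (ST_IF_ELSE only) */
} Stmt;

static Stmt *mk(StmtKind k, const char *text, Stmt *body, Stmt *else_body) {
    Stmt *s = malloc(sizeof *s);
    assert(s != NULL);
    s->kind = k; s->text = text; s->body = body; s->else_body = else_body;
    return s;
}

/* Append a bracketed view of the tree to buf (NUL-terminated on entry). */
static void show(const Stmt *s, char *buf, size_t size) {
    size_t used = strlen(buf);
    switch (s->kind) {
    case ST_SIMPLE:
        snprintf(buf + used, size - used, "%s", s->text);
        break;
    case ST_WHILE:
    case ST_IF:
        snprintf(buf + used, size - used, "%s(%s, ",
                 s->kind == ST_WHILE ? "while" : "if", s->text);
        show(s->body, buf, size);
        used = strlen(buf);
        snprintf(buf + used, size - used, ")");
        break;
    case ST_IF_ELSE:
        snprintf(buf + used, size - used, "if-else(%s, ", s->text);
        show(s->body, buf, size);
        used = strlen(buf);
        snprintf(buf + used, size - used, ", ");
        show(s->else_body, buf, size);
        used = strlen(buf);
        snprintf(buf + used, size - used, ")");
        break;
    }
}
```

With this layout, a `while` loop whose body is an if-else prints as `while(i < n, if-else(a > b, m = a, m = b))`.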
AST Construction for Expression Grammar
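Note: This yacc specification is not complete. I highlighted only the important parts.
%{
/* Operator kinds. Upper-case names avoid a clash with the standard
   library function div() if <stdlib.h> is included. */
enum Op { PLUS, MINUS, MULT, DIV };
struct astnode {
    enum Op op;
    struct astnode *left;
    struct astnode *right;
};
/* Semantic value type: every grammar symbol carries an AST pointer. */
#define YYSTYPE struct astnode *
struct astnode *getNewAstnode(void);   /* allocates a node; defined elsewhere */
%}
%token NUMBER
%left '-' '+'
%left '*' '/'
%%
expr: expr '+' expr { $$ = getNewAstnode(); $$->op = PLUS;  $$->left = $1; $$->right = $3; }
    | expr '-' expr { $$ = getNewAstnode(); $$->op = MINUS; $$->left = $1; $$->right = $3; }
    | expr '*' expr { $$ = getNewAstnode(); $$->op = MULT;  $$->left = $1; $$->right = $3; }
    | expr '/' expr { $$ = getNewAstnode(); $$->op = DIV;   $$->left = $1; $$->right = $3; }
    ;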
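The actions above call `getNewAstnode()`, but the slide does not define it. Below is a minimal sketch of that function together with a post-order evaluator. The sketch declares a similar node type so that it compiles on its own. The `OP_NUM` leaf kind, the `value` field, `new_num` and `eval` are all assumptions, because the `NUMBER` rule is not shown. The test builds, by hand, the tree the actions would produce for `8 - 2 * 3`: `'*'` is declared on a later `%left` line, so it binds tighter and `2 * 3` is reduced first.

```c
#include <assert.h>
#include <stdlib.h>

/* Node type modelled on the yacc declarations; OP_NUM leaves and the
   value field are assumptions for this sketch. */
enum Op { OP_PLUS, OP_MINUS, OP_MULT, OP_DIV, OP_NUM };

struct astnode {
    enum Op op;
    int value;                      /* OP_NUM leaves only */
    struct astnode *left, *right;
};

/* Allocate a zeroed node; the caller fills in op and the children. */
struct astnode *getNewAstnode(void) {
    struct astnode *n = calloc(1, sizeof *n);
    if (n == NULL) abort();
    n->left = n->right = NULL;
    return n;
}

static struct astnode *new_num(int v) {
    struct astnode *n = getNewAstnode();
    n->op = OP_NUM;
    n->value = v;
    return n;
}

/* Post-order walk: evaluate both children, then apply the operator. */
static int eval(const struct astnode *n) {
    if (n->op == OP_NUM) return n->value;
    int l = eval(n->left), r = eval(n->right);
    switch (n->op) {
    case OP_PLUS:  return l + r;
    case OP_MINUS: return l - r;
    case OP_MULT:  return l * r;
    default:       assert(r != 0);  /* OP_DIV */
                   return l / r;
    }
}
```

The same post-order walk is the skeleton that later passes, such as type checking or code generation, typically follow over the AST.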
AST Construction
Note: The way statement lists are handled here is different from the code I showed in class.
Symbol Tables