Final Project Report
Department of Computer Science
University of Karachi – UBIT
Project Title:
Zypher – A Custom Python-Inspired Programming Language and Compiler
Submitted by:
Maha Khan - EB23210106055
Afreen Ilyas - EB23210106012
Jaweria Ali - EB23210106045
Muhammad Hassan - EB23210106083
Abdul Rehman - EB23210106008
Muhammad Rameez Qureshi - EB23210106090
Supervisor:
MISS FARHEEN SIDDIQUE
Course:
Compiler Construction
Submission Date:
22 May 2025
1. Project Overview
Zypher is a custom-designed, Python-inspired programming language aimed at
providing a simple, readable, and structured syntax for writing small programs.
This language was built from scratch with a full compiler pipeline including:
Lexical Analysis
Context-Free Grammar-based Parsing
Syntax Error Handling
Intermediate Representation
Execution via a Virtual Machine
Token Tracking & Reporting
The Zypher compiler reads code from an input.txt file and analyzes it. If the code
contains lexical or syntax errors, it reports them; otherwise it executes the program
and displays the output along with token information.
2. Objectives
To design a new custom language inspired by Python
To build a complete compiler using Python
To demonstrate error detection in the lexical and parsing phases
To support assignment statements, conditionals, and arithmetic expressions
To produce output and token reports upon successful execution
3. Project Components
3.1 Lexical Analyzer
Reads source code from input.txt
Tokenizes input using regular expressions
Detects invalid characters
Returns tokens with line numbers and types
Tokens:
Token Type | Description | Examples
Identifier | Variable names or user-defined symbols | x, total, sum1
Keyword | Reserved words in the language | print, if
Number | Integer literals | 5, 42, 1000
Operator | Mathematical and comparison operators | =, +, -, *, /, >, <, ==
Parenthesis | Grouping or function call syntax | (, )
Colon | Marks the start of code blocks | :
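As an illustration of regex-based tokenizing with line tracking, here is a minimal sketch. The report does not show lexer.py, so the patterns, the names TOKEN_SPEC, KEYWORDS, LexicalError, and tokenize, and the (type, text, line) tuple layout are all assumptions chosen to mirror the token names used in this report.

```python
import re

# Hypothetical token specification (not taken from lexer.py).
# Order matters: "==" must be tried before the single "=".
TOKEN_SPEC = [
    ("NUMBER",   r"\d+"),
    ("ID",       r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",       r"==|[+\-*/<>]"),
    ("ASSIGN",   r"="),
    ("LPAREN",   r"\("),
    ("RPAREN",   r"\)"),
    ("COLON",    r":"),
    ("SKIP",     r"[ \t]+"),
    ("MISMATCH", r"."),   # any other character is illegal
]
KEYWORDS = {"print": "PRINT", "if": "IF"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


class LexicalError(Exception):
    """Raised when the source contains a character no token pattern accepts."""


def tokenize(source):
    """Return a list of (type, text, line_number) tuples for the whole source."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in MASTER.finditer(line):
            kind, text = match.lastgroup, match.group()
            if kind == "SKIP":
                continue
            if kind == "MISMATCH":
                raise LexicalError(
                    f"Lexical Error: Invalid character '{text}' on line {line_no}"
                )
            if kind == "ID" and text in KEYWORDS:
                kind = KEYWORDS[text]   # reserved words get their own token type
            tokens.append((kind, text, line_no))
    return tokens


for kind, text, line_no in tokenize("x=5\nprint(x)"):
    print(f"Line {line_no}: ({kind}, {text})")
```

Putting the catch-all MISMATCH pattern last means every character is either matched by a real token pattern or reported as illegal, so nothing is skipped silently.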
3.2 Parser (CFG-based)
Converts the token stream into an Abstract Syntax Tree (AST)
Implemented as a recursive descent parser
Validates syntax against the CFG rules
Supported features:
Feature | Description | Example
Assignment Statements | Assign values to variables | x=5
Print Statements | Output values to the screen | print(x)
If Statements | Conditional branching based on expressions | if x > 5:
Arithmetic Expressions | Mathematical operations with precedence | x + y * (z - 1)
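The sketch below shows one way recursive descent can give * and / higher precedence than + and -: each precedence level gets its own method (expr, term, factor). The report does not list the actual CFG or parser.py, so the grammar in the comments, the class and function names, and the simplified lex helper are assumptions. The tuple shapes follow the AST structures listed in Section 3.4.

```python
import re


class ParseError(Exception):
    """Raised on the first syntax error; the message includes the line number."""


def lex(line):
    # Stand-in tokenizer so the sketch is self-contained. It keeps only the
    # token text and silently skips characters it does not recognise.
    return re.findall(r"\d+|[A-Za-z_]\w*|==|[-+*/<>=():]", line)


class Parser:
    def __init__(self, tokens, line_no=1):
        self.tokens, self.pos, self.line_no = tokens, 0, line_no

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def fail(self, expected, found):
        shown = "end of line" if found is None else f"'{found}'"
        raise ParseError(
            f"Syntax Error: Expected {expected}, found {shown} on line {self.line_no}"
        )

    def expect(self, text):
        if self.peek() != text:
            self.fail(f"'{text}'", self.peek())
        return self.advance()

    def statement(self):
        # statement -> 'print' '(' expr ')' | 'if' expr cmp expr ':' | ID '=' expr
        tok = self.peek()
        if tok == "print":
            self.advance()
            self.expect("(")
            node = ("print", self.expr())
            self.expect(")")
        elif tok == "if":
            self.advance()
            left = self.expr()
            op = self.advance()
            if op not in (">", "<", "=="):
                self.fail("comparison operator", op)
            right = self.expr()
            self.expect(":")
            node = ("if", left, op, right)
        else:
            name = self.advance()
            if name is None or not re.fullmatch(r"[A-Za-z_]\w*", name):
                self.fail("identifier", name)
            self.expect("=")
            node = ("assign", name, self.expr())
        if self.peek() is not None:
            self.fail("end of statement", self.peek())
        return node

    def expr(self):
        # expr -> term (('+' | '-') term)*   lowest precedence, left-associative
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = ("binop", op, node, self.term())
        return node

    def term(self):
        # term -> factor (('*' | '/') factor)*   binds tighter than + and -
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = ("binop", op, node, self.factor())
        return node

    def factor(self):
        # factor -> NUMBER | ID | '(' expr ')'
        tok = self.advance()
        if tok is not None and tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        if tok is not None and re.fullmatch(r"[A-Za-z_]\w*", tok):
            return ("var", tok)
        self.fail("number, variable or '('", tok)


def parse(line, line_no=1):
    return Parser(lex(line), line_no).statement()


print(parse("y=x*2+3"))
```

Because term is called from inside expr, the multiplication in y=x*2+3 is grouped first and becomes the left operand of the addition.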
3.3 Error Handling
Lexical Errors:
Illegal characters (e.g., $ or @)
Parsing Errors:
Missing tokens (e.g., a missing ':' or ')')
Invalid syntax (e.g., print x) instead of print(x))
When an error occurs, the compiler halts and prints a detailed error message with the line number.
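The halt-on-first-error behaviour can be sketched with exceptions that carry a line number. Everything named here (ZypherError and its subclasses, check_line, compile_source) is hypothetical, and check_line is only a toy stand-in for the real lexer and parser checks, which the report does not show.

```python
class ZypherError(Exception):
    """Hypothetical base class: every compiler error records its source line."""

    def __init__(self, message, line):
        super().__init__(f"{message} on line {line}")
        self.line = line


class ZypherLexicalError(ZypherError):
    pass


class ZypherSyntaxError(ZypherError):
    pass


ILLEGAL_CHARS = set("$@")   # the two examples named in this section


def check_line(text, line_no):
    """Toy checks standing in for the real lexical and parsing phases."""
    for ch in text:
        if ch in ILLEGAL_CHARS:
            raise ZypherLexicalError(
                f"Lexical Error: Invalid character '{ch}'", line_no
            )
    if text.startswith("if ") and not text.rstrip().endswith(":"):
        raise ZypherSyntaxError(
            "Syntax Error: Missing ':' at end of if statement", line_no
        )


def compile_source(source):
    """Return 'OK', or the message of the first error (later lines are not checked)."""
    try:
        for line_no, text in enumerate(source.splitlines(), start=1):
            check_line(text, line_no)
    except ZypherError as err:
        return str(err)
    return "OK"


print(compile_source("x=5\nval = 100 @ 2\nif x > 5"))
```

Raising an exception at the point of failure is what makes the compiler stop at the first problem: in the usage line above, the missing colon on line 3 is never reached because line 2 already failed.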
3.4 Intermediate Representation
AST node types
AST Node Type | Structure | Purpose
Assignment | ('assign', var, expr) | Represents a variable assignment
Print Statement | ('print', expr) | Represents a print command
If Statement | ('if', left_expr, operator, right_expr) | Represents a conditional if-statement
Binary Operation | ('binop', '+', left, right) | Represents binary arithmetic expressions
Number Literal | ('num', 5) | Represents a numeric constant
Variable Reference | ('var', x) | Represents usage of a variable
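To make the nesting concrete, the sketch below writes out by hand the tuple AST that the statement y=x*2+3 would correspond to under the structures above, and prints it as an indented tree. The dump helper is hypothetical and not part of the project; variable names are assumed to be stored as strings.

```python
def dump(node, depth=0):
    """Return an indented, one-node-per-line rendering of a tuple AST."""
    pad = "  " * depth
    kind = node[0]
    if kind in ("num", "var"):
        return f"{pad}{kind} {node[1]}"
    if kind == "binop":
        _, op, left, right = node
        return "\n".join(
            [f"{pad}binop {op}", dump(left, depth + 1), dump(right, depth + 1)]
        )
    if kind == "assign":
        _, name, expr = node
        return "\n".join([f"{pad}assign {name}", dump(expr, depth + 1)])
    if kind == "print":
        return "\n".join([f"{pad}print", dump(node[1], depth + 1)])
    if kind == "if":
        _, left, op, right = node
        return "\n".join(
            [f"{pad}if {op}", dump(left, depth + 1), dump(right, depth + 1)]
        )
    raise ValueError(f"unknown node type: {kind!r}")


# y = x*2 + 3: the multiplication is nested inside the addition
tree = ("assign", "y",
        ("binop", "+",
         ("binop", "*", ("var", "x"), ("num", 2)),
         ("num", 3)))

print(dump(tree))
# assign y
#   binop +
#     binop *
#       var x
#       num 2
#     num 3
```

The deeper a node sits in the tree, the earlier it is evaluated, which is how operator precedence is carried from the parser to the execution stage.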
3.5 Virtual Machine
Evaluates AST instructions in order
Maintains a runtime environment (symbol table)
Supports expression evaluation, variable storage, and conditionals
Displays the output of print statements
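A tree-walk evaluator over the tuple nodes from Section 3.4 might look like the sketch below. The report does not show vm.py, so the function names, the use of a plain dict as the symbol table, and the mapping of / to integer division are assumptions. The report also does not describe how an if-statement's body is attached to its node, so this sketch only reports whether the condition passed, in the style of the sample output.

```python
import operator

# Assumption: "/" is integer division, since the language only has integer literals.
ARITHMETIC = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.floordiv}
COMPARISON = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def evaluate(node, env):
    """Recursively compute the value of an expression node."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        if node[1] not in env:
            raise NameError(f"Undefined variable '{node[1]}'")
        return env[node[1]]
    if kind == "binop":
        _, op, left, right = node
        return ARITHMETIC[op](evaluate(left, env), evaluate(right, env))
    raise ValueError(f"not an expression node: {kind!r}")


def run(program):
    """Execute statement nodes in order; return (symbol table, output lines)."""
    env, output = {}, []
    for node in program:
        kind = node[0]
        if kind == "assign":
            env[node[1]] = evaluate(node[2], env)
        elif kind == "print":
            output.append(str(evaluate(node[1], env)))
        elif kind == "if":
            _, left, op, right = node
            a, b = evaluate(left, env), evaluate(right, env)
            if COMPARISON[op](a, b):
                output.append(f"Condition passed: {a} {op} {b}")
        else:
            raise ValueError(f"unknown statement node: {kind!r}")
    return env, output


program = [
    ("assign", "x", ("num", 5)),
    ("assign", "y", ("binop", "+", ("binop", "*", ("var", "x"), ("num", 2)), ("num", 3))),
    ("print", ("var", "y")),
    ("if", ("var", "y"), ">", ("num", 10)),
    ("print", ("var", "x")),
]
env, output = run(program)
print("\n".join(output))
```

Keeping the operators in lookup tables means a new operator can be added with one dictionary entry rather than another branch in the evaluator.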
4. Features
Feature | Status
Assignment Statements | ✅ Supported
Arithmetic Expressions | ✅ Supported
Print Statements | ✅ Supported
If Statements | ✅ Supported
Else Statements | ❌ Not yet implemented
Error Detection | ✅ Supported
Token Reporting with Line Numbers | ✅ Supported
Total Token Count Display | ✅ Supported
Input from File (input.txt) | ✅ Supported
Output to Terminal | ✅ Supported
5. Example Input and Output
Input Code (input.txt):
x=5
y=x*2+3
print(y)
if y > 10:
print(x)
Output:
=== OUTPUT ===
13
Condition passed: 13 > 10
5
=== TOKENS ===
Line 1: (ID, x)
Line 1: (ASSIGN, =)
Line 1: (NUMBER, 5)
Line 2: (ID, y)
Line 2: (ASSIGN, =)
Line 2: (ID, x)
Line 2: (OP, *)
Line 2: (NUMBER, 2)
Line 2: (OP, +)
Line 2: (NUMBER, 3)
Line 3: (PRINT, print)
Line 3: (LPAREN, ()
Line 3: (ID, y)
Line 3: (RPAREN, ))
Line 4: (IF, if)
Line 4: (ID, y)
Line 4: (OP, >)
Line 4: (NUMBER, 10)
Line 4: (COLON, :)
Line 5: (PRINT, print)
Line 5: (LPAREN, ()
Line 5: (ID, x)
Line 5: (RPAREN, ))
Total tokens: 27
6. Technologies Used
Component | Technology / Tool
Language | Python 3
Libraries | re (Regular Expressions)
Development Environment | VS Code, Terminal
Parsing Structure | Abstract Syntax Tree (AST)
AST Construction | Recursive Descent Parsing
AST Nodes | Custom node structures such as assign, print, and binop
Execution Model | Tree-walk interpreter via vm.py
7. Directory Structure
zypher_project/
├── lexer.py
├── parser.py
├── vm.py
├── main.py
├── input.txt
└── bytecodegen.py
8. Challenges Faced
Designing a proper CFG to handle nested expressions
Implementing recursive descent parsing for operator precedence
Balancing between flexibility and simplicity of the language
Ensuring error reporting is clear and user-friendly
9. Examples in VS Code
A. Sample Code Snippets
a=3
b=a
c=b+2
print(c)
B. Screenshots
A. Sample Code Snippets
val = 100 @ 2
print(val)
B. Screenshots
10. Error Examples
Error Type | Example Code | Error Message
Lexical Error | x=5$3 | Lexical Error: Invalid character '$' on line 1
Syntax Error | print x) | Syntax Error: Expected '(', found 'x' on line 1
Missing Colon | if x > 5 | Syntax Error: Missing ':' at end of if statement
Unbalanced Parentheses | print((x + 2) | Syntax Error: Unmatched '(' on line 1
Invalid Assignment | =5 | Syntax Error: Expected identifier before '='
Unknown Keyword | pront(x) | Syntax Error: Unknown keyword 'pront' on line 1
Invalid Operator | x => 5 | Syntax Error: Invalid operator '=>' on line 1
Unexpected Token | x5= | Syntax Error: Unexpected token '5' after identifier
11. Conclusion
Zypher is a minimal yet extensible custom language designed to simulate real-world
compiler construction. Through lexical analysis, parsing, error handling, and
execution, Zypher demonstrates the main phases of compiler design. This project
provided deep insights into language design, parsing theory, and virtual machine
execution.