CSC 401 2
LANGUAGES
1.0 Introduction
Before getting into computer programming, we need to understand what computer programs
are and what they do. A computer program is a sequence of instructions, written in a
computer programming language, that directs the computer to perform a specified task. The
two important terms used in the above definition are:
• Sequence of instructions
• Computer Programming Language
To understand these terms, consider a situation in which someone asks you how to get to a
nearby supermarket. What exactly do you do to tell him the way to Todays Shop? You use
human language to give the directions, something like this: first go straight; after half a
kilometer, turn left at the red light; then drive around one kilometer and you will find
Todays Shop on the right. Here, you have used the English language to give several steps
to be taken to reach Todays Shop. If they are followed in the following sequence, you will
reach Todays Shop:
1. Go straight
2. Drive half a kilometer
3. Turn left
4. Drive around one kilometer
5. Look for Todays Shop on your right side.
Now, try to map this situation onto a computer program. The above sequence of
instructions is actually a human program written in the English language, which instructs
how to reach Todays Shop from a given starting point. The same sequence could have been
given in Spanish, Hindi, Arabic, or any other human language, provided the person seeking
directions knows that language. Now, let's go back and try to understand a computer
program, which is a sequence of instructions written in a computer language to direct the
computer to perform a specified task. Following is a simple program written in the Python
programming language:
print("Hello, World!")
The above computer program instructs the computer to print "Hello, World!" on the
computer screen. A computer program is also called computer software, and it can range
from two lines to millions of lines of instructions.
• Computer program instructions are also called program source code and computer
programming is also called program coding.
• A computer without a computer program is just a dumb box; it is programs that
make computers active.
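The directions to Todays Shop given earlier can themselves be written as a small Python
program. This is only an illustrative sketch: the function name directions_to_shop is an
assumption made here, and the point is simply that the instructions are carried out
strictly in the order in which they are written.

```python
def directions_to_shop():
    """Return the steps to Todays Shop in the order they must be followed."""
    steps = [
        "Go straight",
        "Drive half a kilometer",
        "Turn left",
        "Drive around one kilometer",
        "Look for Todays Shop on your right side",
    ]
    return steps


# Carry out the instructions one after another, numbering them from 1.
for number, step in enumerate(directions_to_shop(), start=1):
    print(f"{number}. {step}")
```

Changing the order of the entries in the list changes the route, which is why the sequence
of instructions matters as much as the instructions themselves.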
Computer programming is the process that professionals use to write code that instructs
how a computer, application or software program performs. At its most basic, computer
programming is the writing of a set of instructions to carry out specific actions. A
programming language is a computer language that is used by programmers (developers) to
communicate with computers. It provides a notation in which instructions are written to
perform a specific task. It is made up of a series of symbols that serve as a bridge,
allowing humans to translate their thoughts into instructions computers can understand.
Generations of computers have seen changes based on evolving technologies. With each
new generation, computer circuitry and parts have been miniaturized, processing speed has
increased, memory has grown larger, and usability and reliability have improved. Note that
the timeline specified for each generation is tentative and not definite. The generations are
actually based on evolving chip technology rather than any particular time frame.
The five generations of computers are characterized by the electrical current flowing
through the processing mechanisms listed below:
assembly languages is that programs are still machine dependent and, in general, only
readable by the authors.
Fourth generation languages (non-procedural languages) deal with two fields which have
become more and more important: database and query languages, and program or
application generators. The steadily increasing use of software packages such as database
systems, spreadsheets, statistical packages, and other special-purpose packages makes it
necessary to have a medium of control that can easily be used by non-specialists. In
fourth generation languages the user describes what is to be solved, instead of how to
solve it, as is done in procedural languages. In general, fourth generation languages are
not only languages but interactive programming environments. An example is SQL
(Structured Query Language), a query language for relational databases based on Codd's
requirements for non-procedural query languages. Another example is NATURAL, which
emphasizes a structured programming style. Program or application generators are often
based on a certain specification method and produce an output (e.g. a high-level program)
from an appropriate specification. A great number of fourth generation languages already
exist.
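The difference between describing what is wanted and describing how to obtain it can be
sketched with Python's standard sqlite3 module. The table name, column names and sample
rows below are made up for illustration; the first function spells out the procedure step
by step, while the second hands a SQL query to the database engine and lets it decide how
to find the rows.

```python
import sqlite3

# Hypothetical sample data: (name, department, salary).
rows = [
    ("Ada", "Engineering", 5200),
    ("Bola", "Sales", 3100),
    ("Chen", "Engineering", 4800),
]


def engineers_procedural(records):
    """Procedural style: state HOW to find the answer, one step at a time."""
    result = []
    for name, department, _salary in records:
        if department == "Engineering":
            result.append(name)
    result.sort()
    return result


def engineers_declarative(records):
    """Non-procedural style: state WHAT is wanted and let the engine decide how."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)"
        )
        connection.executemany("INSERT INTO employee VALUES (?, ?, ?)", records)
        cursor = connection.execute(
            "SELECT name FROM employee WHERE department = ? ORDER BY name",
            ("Engineering",),
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()


print(engineers_procedural(rows))
print(engineers_declarative(rows))
```

Both functions return the same names; only the second leaves the choice of search strategy
to the system, which is the property the text attributes to fourth generation languages.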
Fifth generation languages (5GL) are programming languages based on solving problems
using constraints given to the program, rather than using an algorithm written by a
programmer. They are based on the concept of artificial intelligence: rather than solving
a problem algorithmically, an application is built to solve it based on some constraints,
i.e., we make computers learn to solve the problem. 5GL are intended to let computers
work out their own inferences from the programmed information held in large databases,
and they gave birth to the dream of robots with AI and fuzzy logic. Parallel processing and
superconductors are associated with this generation as a means of making real artificial
intelligence. The advantages claimed for this generation are that machines can make
decisions, that programmer effort to solve a problem is reduced, and that the languages
are easier to learn and use than 3GL or 4GL. Examples are PROLOG, LISP, etc.
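The idea of stating constraints and leaving the search to the system can be imitated in
Python. The sketch below is not how PROLOG or any real 5GL works internally; it is a plain
exhaustive search, and the function name solve and the small puzzle are assumptions made
here for illustration. The programmer supplies only the conditions a solution must meet.

```python
from itertools import product


def solve(variables, domain, constraints):
    """Return every assignment of domain values to variables meeting all constraints."""
    solutions = []
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(constraint(assignment) for constraint in constraints):
            solutions.append(assignment)
    return solutions


# Hypothetical puzzle: find x and y between 0 and 9 with x + y = 10 and x - y = 4.
constraints = [
    lambda a: a["x"] + a["y"] == 10,
    lambda a: a["x"] - a["y"] == 4,
]
solutions = solve(["x", "y"], range(10), constraints)
print(solutions)
```

Nothing in the two constraints says how to find x and y; the search strategy lives
entirely inside solve and could be replaced without changing the problem statement.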
Example – ENIAC, UNIVAC, Mark-I, Mark-III, IBM 700 series, IBM 701 series, IBM 709 series
etc.
Second Generation
Speed – Relatively fast as compared to first generation, thousand instructions per second.
Example – IBM-7000, CDC 3000 series, PDP1, PDP3, PDP 5, PDP8, ATLAS, IBM-7094 etc.
Third Generation
Size – Smaller than Second Generation Computers. Disk size mini computers.
Speed – Relatively fast as compared to second generation, million instructions per second
(MIPS).
Fourth Generation
A) Computer Characteristics and Capabilities 4GL
Size – Typewriter size micro computer.
Speed – Relatively fast as compared to third generation, tens of millions of instructions per
second.
Cost – Cost lower than third generation.
Language – High level languages like C++, KL1, RPG, SQL.
Reliability – Failure of circuits in months.
Power – Low power consumption.
Fifth Generation
A) Computer Characteristics and Capabilities 5GL
Size – Credit card size microcomputers.
Speed – Billions of instructions per second.
Cost – Cost slightly lower than fourth generation.
Language – Artificial Intelligence (AI) languages like LISP, PROLOG etc.
Power – Low power consumption.
It is natural for students to wonder how they will benefit from the study of programming
language concepts. After all, many other topics in computer science are worthy of serious
study. The following is what we believe to be a compelling list of potential benefits of
studying concepts of programming languages.
1.3.1 Increased ability to express ideas/algorithms
In natural language, the depth at which people think is influenced by the expressive power
of the language they use. In programming, the complexity of the algorithms that people
implement is likewise influenced by the set of constructs available in the programming
language. The language in which they develop software places limits on the kinds of
control structures, data structures, and abstractions they can use, and thus limits the
forms of algorithms they can construct. Awareness of a wider variety of programming
language features can reduce such limitations in software development. Programmers can
increase the range of their software development thought processes by learning new
language constructs. In other words, the study of programming language concepts builds an
appreciation for valuable language features and constructs and encourages programmers
to use them, even when the language they are using does not directly support such
features and constructs.
Many programmers use the language with which they are most familiar, even when it is
poorly suited to their new project. It is ideal to use the most appropriate language. If
these programmers were familiar with a wider range of languages and language constructs,
they would be better able to choose the language with the features that best address the
problem. A feature can sometimes be simulated in a language that lacks it; however, it is
preferable to use a feature whose design has been integrated into a language than to use a
simulation of that feature, which is often less elegant, more cumbersome, and less safe.
For instance, knowing the concepts of object-oriented programming (OOP) makes learning
Java significantly easier, just as knowing the grammar of one's native language makes it
easier to learn another language. Once a thorough understanding of the fundamental
concepts of languages is acquired, it becomes far easier to see how these concepts are
incorporated into the design of the language being learned. It is therefore essential that
practicing programmers know the vocabulary and fundamental concepts of programming
languages so they can read and understand programming language descriptions and
evaluations, as well as promotional literature for languages and compilers.
This leads to an understanding of why languages are designed the way they are, and to the
ability to use a language more intelligently, as it was designed to be used. We can
become better programmers by understanding the choices among programming language
constructs and the consequences of those choices. Certain kinds of program bugs can be
found and fixed only by a
Many contemporary programming languages are large and complex. It is uncommon for a
programmer to be familiar with and use all of the features of a language he or she uses.
By studying the concepts of programming languages, programmers can learn about
previously unknown and unused parts of the languages they already use and begin to use
those features.
The study of programming language concepts is also justified by its effect on computing
as a whole: if those who choose languages were better informed, better languages would
eventually squeeze out poorer ones.
Computers have been applied to a myriad of different areas, from controlling nuclear
power plants to playing video games on mobile phones. Because of this great diversity in
computer use, programming languages with very different goals have been developed.
This section discusses areas of computer applications and their associated languages.
One of the prerequisites for the development of a programming language is a definition
and a clear understanding of the contents of the application domain concerned. The
application domain is the part of an organization for which application software is
developed. This means that the application domain is the starting point and the context
for applying a programming language to software development. Many development
methodologies take this understanding of the application domain for granted. They assume
that the developers somehow know what domain they have to deal with.
elaborate reports, precise ways of describing and storing decimal numbers and character
data, and the ability to specify decimal arithmetic operations.
These are characterized as those whose principal activity involves the manipulation of
natural language text, rather than numbers as their data. SNOBOL and C language have
strong text processing capabilities.
These are characterized as those programs which are designed principally to emulate
intelligent behavior. They include game playing algorithms such as chess, natural language
understanding programs, computer vision, robotics and expert systems. LISP has been the
predominant AI programming language, alongside PROLOG, which uses the principle of
'logic programming'. Lately, AI applications are also written in Java, C++ and Python.
System programming applications involve developing those programs that interface the
computer system (the hardware) with the programmer and the operator. These programs
include compilers, assemblers, interpreters, input-output routines, program management
facilities and schedulers for utilizing and serving the various resources that comprise the
system. Ada, C and Modula-2 are examples of programming languages used.
The World Wide Web is supported by an eclectic collection of languages, ranging from
markup languages, such as HTML, which is not a programming language, to general-
purpose programming languages, such as Java. Because of the pervasive need for dynamic
web content, some computation capability is often included in the technology of content
presentation. This functionality can be provided by embedding programming code in an
HTML document. Such code is often in the form of a scripting language, such as JavaScript
or PHP. There are also some markup-like languages that have been extended to include
constructs that control document processing. The collection of languages includes:
• Markup languages, e.g. XHTML.
• Client-side scripting for dynamic content, using scripts embedded in the XHTML
documents, e.g. JavaScript.
• Server-side scripting for dynamic content, using the common gateway interface, e.g.
JSP, ASP, PHP.
• General-purpose languages executed on the web server, e.g. Java, Python, etc.
1. Expressivity: means the ability of a language to clearly reflect the meaning intended
by the algorithm designer (the programmer). Thus, an expressive language permits
an utterance to be compactly stated, and encourages the use of statement forms
associated with structured programming (usually while loops and if-then-else
statements).
2. Well-definedness: By well-definedness, we mean that the language’s syntax and
semantics are free of ambiguity, internally consistent and complete. Thus, the
implementer of a well-defined language should have, within its definition, a
complete specification of all the language’s expressive forms and their meanings.
The programmer, by the same virtue, should be able to predict exactly the behavior
of each expression before it is actually executed.
3. Data types and structures: By data types and structures, we mean the ability of a
language to support a variety of data values (integers, real, strings, pointers etc.)
and non-elementary collections of these.
4. Readability: One of the most important criteria for judging a programming language
is the ease with which programs can be read and understood. Maintenance came to be
recognized as a major part of the software life cycle, particularly in terms of cost, and
since ease of maintenance is determined in large part by the readability of programs,
readability became an important measure of the quality of programs and
programming languages.
5. Overall Simplicity: The overall simplicity of a programming language strongly affects
its readability. A language with a large number of basic constructs is more difficult
to learn than one with a smaller number.
6. Modularity: Modularity has two aspects: the language’s support for
sub-programming and the language’s extensibility in the sense of allowing
programmer-defined operators and data types. By sub-programming, we mean the ability
to define independent procedures and functions (subprograms), and communicate
via parameters or global variables with the invoking program.
7. Input-Output facilities: In evaluating a language’s input-output facilities, we are
looking at its support for sequential, indexed, and random-access files, as well as its
support for database and information retrieval functions.
8. Portability: A portable language is one which is implemented on a variety of
computers. That is, its design is relatively machine independent. Languages which
are well-defined tend to be more portable than others.
9. Efficiency: An efficient language is one which permits fast compilation and
execution on the machines where it is implemented. Traditionally, FORTRAN and
COBOL have been relatively efficient languages in their respective application
areas.
10. Orthogonality: This in a programming language means that a relatively small set of
primitive constructs can be combined in a relatively small number of ways to build
the control and data structures of the language. Furthermore, every possible
combination of primitives is legal and meaningful.
11. Pedagogy: Some languages have better pedagogy than others. That is, they are
intrinsically easier to teach and to learn, they have better textbooks, they are
implemented in a better program development environment, and they are widely known
and used by the best programmers in an application area.
12. Generality: It means that a language is useful in a wide range of programming
applications. For instance, APL has been used in mathematical applications
involving matrix algebra and in business applications as well.
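The two aspects of modularity named in criterion 6 can be sketched in Python. The Money
type and the total function are hypothetical names chosen for this illustration: Money is
a programmer-defined data type with a programmer-defined meaning for the + operator, and
total is an independent subprogram that communicates with its caller only through a
parameter and a return value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """A programmer-defined data type: an amount held as a whole number of cents."""

    cents: int

    def __add__(self, other):
        # Programmer-defined meaning for the + operator on Money values.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


def total(prices):
    """An independent subprogram: it receives its data only through its parameter."""
    running = Money(0)
    for price in prices:
        running = running + price
    return running


basket = [Money(250), Money(1000), Money(75)]
print(total(basket))
```

Because total does not touch any global variable, it can be reused by any invoking
program that passes it a list of Money values.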
Programming languages can be divided into multiple types based on their approach to
writing and solving problems. Some major categories include imperative, object-oriented,
functional, and logic-based languages.
Object-oriented languages use the concept of objects, encapsulating data and behaviour
in a single unit. This abstraction allows for the creation of reusable, modular code.
Examples include:
• Java: Platform-independent and widely employed, Java follows the "write once, run
anywhere" philosophy, enabling the development of portable, secure, and robust
applications.
• C++: With its support for both procedural and object-oriented programming, C++
offers powerful features, such as classes, inheritance, and polymorphism, for
efficient memory management and code reuse.
• C#: Developed by Microsoft, C# is a versatile language, commonly used for building
Windows applications and games, thanks to its integration with the .NET
framework.
• Python: Python's versatile nature, syntactical simplicity, and extensive libraries
make it an attractive choice for various applications, from web development to data
science and machine learning.
• Ruby: With an emphasis on simplicity and productivity, Ruby's expressive, readable
syntax and object-oriented nature make it a popular choice for web application
development.
• Prolog: Prolog is a logic programming language suited for tasks involving symbolic
reasoning and manipulation, such as natural language processing, expert systems
development, and database query languages.
• Mercury: A purely declarative, logic/functional programming language, Mercury
offers strong typing, mode, and determinism systems for creating efficient, reliable,
and maintainable code in various applications.
• Logtalk: An object-oriented extension of Prolog, Logtalk adds modularity,
encapsulation, and inheritance, allowing for the creation of reusable and
maintainable logic-based programs.
• Datalog: A subset of Prolog, Datalog is a rule-based language focused on deductive
database queries and reasoning, offering efficient evaluation strategies and
recursion over large datasets.
• ASP (Answer Set Programming): ASP is a declarative language used to solve complex
search problems, leveraging efficient solvers and logic programming paradigms to find
solutions in areas such as planning, scheduling, and configuration.
There are several factors to take into account when choosing the right programming
language for a particular project. These factors ultimately influence the performance,
maintainability, and success of your project. Key factors to consider include:
• Project goals: Clearly define the objective of your project and list the features that
the chosen programming language should support. This will help you determine
which language possesses the necessary capabilities to achieve the desired
outcome.
• Type of application: Some programming languages are better suited for specific
types of applications, such as web development, mobile app development, or data
science. Consider the target environment and platform before selecting the
language.
• Development speed: Depending on the project's timeline, select a language that
enables rapid development, or allows for quick prototyping and iteration.
Languages with extensive libraries, frameworks, or tools can help accelerate
development.
• Developer expertise: Choose a language that is familiar to your development team
or can be learned relatively quickly. The team's proficiency in the language is a
significant factor in determining the quality of the final product.
• Performance and scalability: Opt for a language that delivers the required
performance and can handle the application's potential growth over time. The
chosen language should be efficient in terms of processing time and memory
usage.
• Community support: Programming languages with active communities and
extensive documentation help solve technical challenges more quickly and provide
valuable resources for learning and support.
• Maintenance and long-term support: Consider the language's support and
maintainability in the long run, including the availability of updates, bug fixes, and
security patches.
Considering the type of project you are working on is vital when selecting a suitable
programming language. Here is a comprehensive list of popular programming languages
and their suitability for various project types:
When it comes to web development projects, there are multiple languages suitable for
different aspects of the project:
• For front-end development, HTML, CSS, and JavaScript are essential for building
responsive and user-friendly websites.
• For server-side or back-end development, Python with Django or Flask, Ruby with Ruby
on Rails, PHP with Laravel or Symfony, and Java with Spring or JavaServer Faces (JSF)
are notable options.
• For full-stack development, using JavaScript with technologies like Node.js, Express,
and React can help streamline the development process by using a single language for
both front-end and back-end.
In mobile app development, the choice of language often depends on the target platform:
• For native Android app development, Java or Kotlin are the primary choices, with
Kotlin gaining popularity in recent years due to its concise syntax and modern
features.
• For native iOS app development, Swift or Objective-C can be used, with Swift being
the more modern and recommended choice for newer projects.
• For cross-platform app development, React Native (JavaScript) or Flutter (Dart) can
enable the creation of apps for both Android and iOS using a single codebase,
saving development time and resources.
In the field of data science, machine learning and artificial intelligence, several languages
have emerged as popular choices due to their extensive libraries and community support:
5.1 Introduction
When programs are developed to solve real-life problems like inventory management,
payroll processing, student admissions, examination result processing, etc. they tend to
be huge and complex. The approach to analyzing such complex problems, planning for
software development and controlling the development process is called programming
methodology. New software development methodologies (e.g. Object-Oriented Software
Development) led to new paradigms in programming and by extension, to new
programming languages. A programming paradigm is a pattern of problem-solving thought
that underlies a particular genre of programs and languages. It is also the concept to
which the methodology of a programming language adheres. A paradigm is a model or world
view. Paradigms are important because they define a programming language and how it
works. A useful way to think about a paradigm is as a set of ideas that a programming
language uses to express tasks at a much higher level than machine code. These different
approaches can be better in some cases, and worse in others. A good rule of thumb when
exploring paradigms is to understand what each is good at. While it is true that most
modern programming languages are general-purpose and can do just about anything, it
might be more difficult to develop a game, for example, in a functional language than in
an object-oriented language. Many people classify languages into these main paradigms:
These are mostly influenced by the von Neumann computer architecture. The problem is
broken down into procedures, or blocks of code that perform one task each. All
procedures taken together form the whole program. This approach is best suited to small
programs that have a low level of complexity. Typical elements of such languages are
assignment statements, data structures and type binding, as well as control mechanisms;
active procedures manipulate passive data objects. For example, in a calculator program
that does addition, subtraction, multiplication, division, square root and comparison,
each of these operations can be developed as a separate procedure. In the main program,
each procedure would be invoked on the basis of the user's choice. E.g. FORTRAN, Algol,
Pascal, C/C++, C#, Java, Perl, JavaScript, Visual BASIC.NET.
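The calculator example can be sketched in Python as follows. Each operation is its own
procedure, and the main program invokes one of them according to the user's choice. The
choice strings and the PROCEDURES lookup table are assumptions of this sketch, not part of
the original description.

```python
import math


# Each operation is a separate procedure that performs one task.
def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def multiply(a, b):
    return a * b


def divide(a, b):
    return a / b


def square_root(a):
    return math.sqrt(a)


def compare(a, b):
    """Return -1, 0 or 1 when a is less than, equal to or greater than b."""
    return (a > b) - (a < b)


# Lookup table (an assumption of this sketch) from the user's choice to a procedure.
PROCEDURES = {
    "add": add,
    "subtract": subtract,
    "multiply": multiply,
    "divide": divide,
    "sqrt": square_root,
    "compare": compare,
}


def main(choice, *operands):
    """Main program: invoke the procedure that matches the user's choice."""
    return PROCEDURES[choice](*operands)


print(main("add", 2, 3))
print(main("sqrt", 16))
print(main("compare", 2, 5))
```

The data (the operands) is passive and is simply handed from the main program to whichever
procedure is selected.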
Pure languages of this type have no assignment statements. Their syntax is closely related
to the formulation of mathematical functions. Thus, functions are central to functional
programming languages. Here the problem, or the desired solution, is broken down into
functional units. Each unit performs its own task and is self-sufficient. These units are then
stitched together to form the complete solution. For example, payroll processing can have
functional units like employee data maintenance, basic salary calculation, gross salary
calculation, leave processing, loan repayment processing, etc. E.g. LISP, Scala, Haskell,
Python, Clojure, Erlang. Such languages may also include OO (Object Oriented) concepts.
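A rough Python sketch of the payroll example follows. Python is not a pure functional
language, so this only imitates the style: each unit is a function whose result depends
solely on its arguments, and the units are stitched together by composition. The rates and
the compose helper are hypothetical values and names chosen for illustration.

```python
from functools import reduce

# Hypothetical figures, chosen only for illustration.
ALLOWANCE_PERCENT = 20
LOAN_REPAYMENT = 50


def basic_salary(hours, hourly_rate):
    return hours * hourly_rate


def gross_salary(basic):
    return basic + basic * ALLOWANCE_PERCENT // 100


def after_loan_repayment(gross):
    return gross - LOAN_REPAYMENT


def compose(*functions):
    """Stitch single-argument functions together, applied from left to right."""
    return lambda value: reduce(lambda acc, f: f(acc), functions, value)


# The complete solution is built by combining the functional units.
net_pay = compose(gross_salary, after_loan_repayment)

print(net_pay(basic_salary(160, 10)))
```

None of the units modifies shared state, so each can be tested and reused on its own.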
Facts and rules (the logic) are used to represent information (or knowledge) and a logical
inference process is used to produce results. In contrast to the other paradigms, control
structures are not explicitly defined in a program; they are part of the programming
language (the inference mechanism). Here the problem is broken down into logical units
rather than functional units. Example: in a school management system, users have clearly
defined roles like class teacher, subject teacher, lab assistant, coordinator, academic
in-charge, etc. So the software can be divided into units depending on user roles. Each
user can have a different interface, permissions, etc. E.g. PROLOG, PERL. Such languages
may also include OO concepts.
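The combination of facts, rules and an inference process can be imitated in Python. This
is a simplified sketch, not how PROLOG is implemented: the names are hypothetical, and the
inference is a simple loop that keeps applying the rule until no new conclusions appear.
The program states the facts and the rules; the loop that derives results plays the role
of the built-in inference mechanism.

```python
# Facts: (parent, child) pairs. The names are hypothetical.
PARENT = {("amina", "bayo"), ("bayo", "chidi"), ("bayo", "dayo")}


def ancestors(facts):
    """Derive all (ancestor, descendant) pairs from the parent facts.

    Rule 1: ancestor(X, Y) if parent(X, Y).
    Rule 2: ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z).
    """
    known = set(facts)  # Rule 1
    while True:
        derived = {
            (x, z)
            for (x, y) in facts
            for (y2, z) in known
            if y == y2
        }  # Rule 2
        if derived <= known:  # nothing new can be inferred
            return known
        known |= derived


def query(x, y):
    """Ask whether x is an ancestor of y."""
    return (x, y) in ancestors(PARENT)


print(query("amina", "chidi"))
print(query("chidi", "amina"))
```

Adding a new fact to PARENT changes the answers without any change to the control flow,
which is the sense in which control is part of the language rather than the program.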
An object represents data as well as procedures. Data structures and their
appropriate manipulation processes are packed together to form a syntactical unit. Here
the solution revolves around entities or objects that are part of the problem. The solution
deals with how to store data related to the entities, how the entities behave and how they
interact with each other to give a cohesive solution. For example, if we have to develop a
payroll management system, we will have entities like employees, salary structure, leave
rules, etc. around which the solution must be built. E.g. SIMULA 67, SMALLTALK, C++, Java,
Python, C#, Perl, Lisp or EIFFEL.
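A minimal Python sketch of the payroll entities is given below. The class names, the
figures and the simple leave rule (pay is reduced in proportion to unpaid leave days) are
assumptions made for illustration. Each object keeps its own data together with the
procedures that act on it, and the objects interact to produce the result.

```python
class SalaryStructure:
    """Entity holding pay rules; the figures used below are hypothetical."""

    def __init__(self, basic, allowance):
        self.basic = basic
        self.allowance = allowance

    def gross(self):
        return self.basic + self.allowance


class Employee:
    """Entity that keeps its own data and the behaviour that acts on it."""

    def __init__(self, name, salary_structure):
        self.name = name
        self.salary_structure = salary_structure
        self.unpaid_leave_days = 0

    def take_unpaid_leave(self, days):
        self.unpaid_leave_days += days

    def monthly_pay(self, working_days=20):
        # The employee object asks the salary structure object for the gross pay.
        gross = self.salary_structure.gross()
        deduction = gross * self.unpaid_leave_days // working_days
        return gross - deduction


employee = Employee("Ngozi", SalaryStructure(basic=1800, allowance=200))
employee.take_unpaid_leave(2)
print(employee.monthly_pay())
```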
5.2.5.1 Top-down approach: The problem is broken down into smaller units, which may be
further broken down into even smaller units. Each unit is called a module. Each module is
a self-sufficient unit that has everything necessary to perform its task.
A trade-off is made when using an interpreted language: one trades speed of
development for higher execution costs. Because each line of an interpreted program must
be translated each time it is executed, there is a higher overhead. Thus, an interpreted
language is generally more suited to ad hoc requests than predefined requests. That is
especially true for programs that are based around manipulating state over a long term.
Another trade-off is a clear, logically simple structure that makes complex algorithms
easy to build correctly and scales well, but makes stateful systems harder to build. There
are many trade-offs in language design such as:
• Reliability: This takes into account the time required for malfunction detection and
reconfiguration or repair.
• Expandability: Measures the computer system’s ability to conveniently
accommodate increased requirements by higher speed or by physical expansion
without the cost of a major redesign. Modularity is a desirable method for providing
expandability and should be incorporated whenever feasible.
• Programmability: There should be a balance between programming simplicity and
hardware complexity to prevent the cost of programming from becoming
overwhelming. The degree of software sophistication and the availability of support
software should be considered during the design.
• Maintainability: Should not be neglected when designing the computer; repair
should be readily accomplished during ground operation.
• Compatibility: This should be developed between the computer and its interfaces,
software and power levels to facilitate programming.
• Adaptability: Is defined as the ability of the system to meet a wide range of
functional requirements without requiring physical modifications.
• Availability: Is the possibility that the computer is operating satisfactorily at a given
time. It is closely related to reliability.
• Development status and cost: Are complex management factors which can have
significant effects on the design as well. They require the estimation of a number of
items such as the extent of off-the-shelf hardware use, design risks in developing
new equipment using advanced technologies, potential progress in the state of the
art during the design and development.
7.1 Compilation
7.2 Interpretation
1. Reads an expression in the input language (usually translating it into some internal
form).
2. Evaluates the internal form of the expression.
3. Prints the result of the evaluation.
4. Loops and reads the next input expression until exit.
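The four steps above can be sketched as a small read-evaluate-print loop in Python. This
is only an illustration for arithmetic expressions: it uses Python's standard ast module
to build the internal form, and the function names evaluate and repl, the "exit" keyword
and the limited operator table are assumptions of this sketch. Input is passed in as a
list of lines so that the loop can be run without a keyboard.

```python
import ast
import operator

# Operators this sketch understands; anything else is rejected.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Step 2: evaluate the internal form (a syntax tree) of an expression."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported expression")


def repl(lines):
    """Run the read-evaluate-print loop over an iterable of input lines."""
    results = []
    for line in lines:  # Step 4: loop until exit
        if line.strip() == "exit":
            break
        tree = ast.parse(line.strip(), mode="eval")  # Step 1: read into internal form
        value = evaluate(tree)  # Step 2: evaluate
        print(value)  # Step 3: print
        results.append(value)
    return results


repl(["1 + 2", "2 * (3 + 4)", "exit", "10 / 4"])
```

The last input line is never read, because the loop stops as soon as it meets "exit".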
7.3 Hybrid
Some language implementation systems are a compromise between compilers and pure
interpreters; they translate high-level language programs to an intermediate language
designed to allow easy interpretation. This method is faster than pure interpretation
because the source language statements are decoded only once. Such implementations
are called hybrid implementation systems.
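Python's own standard implementation (CPython) can be used to illustrate the idea. In the
sketch below, the built-in compile function translates source text once into a code object
containing bytecode, which plays the role of the intermediate language, and eval then
interprets that bytecode several times. The expression and the variable names price and
quantity are made up for illustration.

```python
import dis

source = "price * quantity"

# Translate once: compile() turns the source text into a code object holding
# bytecode, which plays the role of the intermediate language.
code_object = compile(source, "<example>", "eval")

# Interpret many times: the bytecode is executed without decoding the source again.
totals = [
    eval(code_object, {"price": price, "quantity": 3})
    for price in (10, 20, 30)
]
print(totals)

# Show the intermediate instructions (the exact listing varies by Python version).
dis.dis(code_object)
```

The source string is decoded only in the call to compile; every later evaluation works
from the intermediate form.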
Just-in-time (JIT) implementation systems go a step beyond hybrid systems by delaying
compilation: they compile subprogram code the first time it is called. This
implementation initially translates programs to an intermediate language, then compiles
the intermediate language of the subprograms into machine code when they are called.
The machine code version is kept for subsequent calls. Just-in-time systems are widely
used for Java programs. Also, .NET languages are implemented with a JIT system.
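The "compile on first call, then reuse" behaviour of a just-in-time system can be modelled
with a toy Python class. This is a simplified model only: Python's compile produces
bytecode rather than machine code, and the class name JustInTimeSketch, its attributes and
the two sample subprograms are hypothetical. What the sketch does show is that each
subprogram is translated at most once, at the moment it is first called.

```python
class JustInTimeSketch:
    """Toy model: compile a subprogram on its first call and reuse the result."""

    def __init__(self, sources):
        self.sources = sources  # subprogram name -> source text
        self.compiled = {}  # subprogram name -> compiled callable
        self.compile_count = 0

    def call(self, name, *args):
        if name not in self.compiled:
            # First call: translate this subprogram now and keep the result.
            namespace = {}
            exec(compile(self.sources[name], f"<{name}>", "exec"), namespace)
            self.compiled[name] = namespace[name]
            self.compile_count += 1
        # Subsequent calls go straight to the stored version.
        return self.compiled[name](*args)


jit = JustInTimeSketch({
    "square": "def square(x):\n    return x * x\n",
    "cube": "def cube(x):\n    return x * x * x\n",
})
print(jit.call("square", 4), jit.call("square", 5), jit.compile_count)
```

The subprogram cube is never compiled in this run because it is never called, which is
the saving a just-in-time system aims for.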
Finally, language users must be able to determine how to encode software solutions by
referring to a language reference manual. Textbooks and courses enter into this process,
but language manuals are usually the only authoritative printed information source about a
language. The study of programming languages, like the study of natural languages, can be
divided into examinations of syntax and semantics. The syntax of a programming language
is the form of its expressions, statements, and program units. Its semantics is the meaning
of those expressions, statements, and program units.
The semantics of this statement form is that when the current value of the Boolean
expression is true, the embedded statement is executed. Then control implicitly returns
to the Boolean expression to repeat the process. When the expression is false, control
continues after the while construct. Although they are often separated for discussion
purposes, syntax and
semantics are closely related. In a well-designed programming language, semantics
should follow directly from syntax; that is, the appearance of a statement should strongly
suggest what the statement is meant to accomplish. Describing syntax is easier than
describing semantics, partly because a concise and universally accepted notation is
available for syntax description, but none has yet been developed for semantics.
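To make the semantics of a while loop concrete, the sketch below writes the same loop
twice in Python: once in its usual form, and once with the test and the exit spelled out
step by step. The function names are made up for this illustration.

```python
def countdown_with_while(n):
    steps = []
    while n > 0:         # Boolean expression, tested before every iteration
        steps.append(n)  # embedded statement
        n -= 1
    return steps


def countdown_spelled_out(n):
    steps = []
    while True:
        if not (n > 0):  # expression is false: control continues after the loop
            break
        steps.append(n)  # expression is true: the embedded statement runs
        n -= 1           # control then returns to the test
    return steps


print(countdown_with_while(3))   # [3, 2, 1]
print(countdown_spelled_out(3))  # [3, 2, 1]
```

The two functions differ in form (syntax) but describe the same behaviour (semantics).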
A language, whether natural (such as English) or artificial (such as Java), is a set of strings
of characters from some alphabet. The strings of a language are called sentences or
statements. The syntax rules of a language specify which strings of characters from the
language's alphabet are in the language. English, for example, has a large and complex
collection of rules for specifying the syntax of its sentences. By comparison, even the
largest and most complex programming languages are syntactically very simple.
Syntax is the set of rules that define which combinations of symbols form correctly
structured programs. This tells the computer how to read the code. In writing code,
syntax means using a very specific set of words in a very specific order when we give the
computer instructions. This order and strict structure is what enables us to
communicate effectively with a computer. Syntax is to code what grammar is to English or
any other language. A big difference, though, is that computers are far more exacting
about how we structure that grammar. This strict syntax is why programming is often
called coding. Each programming language uses different words and a different structure
for giving the computer the instructions it should follow. Syntax analysis is a task
performed by a compiler, which examines whether the program has a proper associated
derivation tree or not. The syntax of a programming language can be described using the
following formal and informal techniques:
• Lexical syntax: For defining the rules for basic symbols involving identifiers, literals,
punctuators, and operators.
• Concrete syntax: Specifies the actual written representation of programs in terms of
lexical symbols drawn from the language's alphabet.
• Abstract syntax: Conveys only the vital program information.
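The lexical level can be sketched with a small tokenizer that splits source text into
identifiers, literals, operators, and punctuators. The token classes and patterns below
are assumptions chosen for this example, not the rules of any particular language.

```python
import re

# Hypothetical lexical rules: one regular expression per class of basic symbol.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),  # literals such as 12 or 0.24
    ("IDENT", r"[A-Za-z_]\w*"),    # identifiers
    ("OP", r"[+\-*/=]"),           # operators
    ("PUNCT", r"[();]"),           # punctuators
    ("SKIP", r"\s+"),              # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, rejecting unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("total = price * 2;"))
```

For the input shown, the tokenizer yields an identifier, the operator =, another
identifier, the operator *, a number, and the punctuator ; in that order.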
The syntax of a programming language describes the structure of programs without
considering their meaning. It emphasizes the structure and layout of a program, that is,
its appearance. It consists of a collection of rules that validate the sequence of
symbols and instructions used in a program. In general, languages can be formally
defined in two distinct ways: by recognition and by generation.
The syntax analysis part of a compiler is a recognizer for the language the compiler
translates. In this role, the recognizer need not test all possible strings of characters from
some set to determine whether each is in the language. Rather, it need only determine
whether given programs are in the language. In effect then, the syntax analyzer determines
whether the given programs are syntactically correct. Syntax analyzers are also known
as parsers, as discussed before. A language recognizer is like a filter, separating
legal sentences from those that are incorrectly formed.
A language generator is a device that can be used to generate the sentences of a language.
A generator seems to be a device of limited usefulness as a language descriptor. However,
people prefer certain forms of generators over recognizers because they can more easily
read and understand them. By contrast, the syntax-checking portion of a compiler (a
language recognizer) is not as useful a language description for a programmer because it
can be used only in trial-and-error mode. For example, to determine the correct syntax of a
particular statement using a compiler, the programmer can only submit a speculated
version and note whether the compiler accepts it. On the other hand, it is often possible to
determine whether the syntax of a particular statement is correct by comparing it with the
structure of the generator. There is a close connection between formal generation and
recognition devices for the same language, a discovery that led to much of the theory of
formal languages.
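The difference between a generator and a recognizer can be seen on a very small language.
The sketch below assumes a made-up grammar with a single rule, S -> a S b | a b, whose
sentences are ab, aabb, aaabbb, and so on; generate and recognize are hypothetical names.

```python
def generate(depth):
    """Return the sentences derivable from S with at most `depth` rule applications."""
    if depth == 0:
        return []
    shorter = generate(depth - 1)
    # S -> a b gives "ab"; S -> a S b wraps every shorter sentence in a ... b.
    return ["ab"] + ["a" + sentence + "b" for sentence in shorter]


def recognize(text):
    """Decide whether text is a sentence of the same language."""
    if text == "ab":
        return True
    if len(text) >= 4 and text[0] == "a" and text[-1] == "b":
        return recognize(text[1:-1])
    return False


print(generate(3))        # ['ab', 'aabb', 'aaabbb']
print(recognize("aabb"))  # True
print(recognize("abab"))  # False
```

Reading generate shows directly what the sentences look like, whereas recognize can only
answer yes or no for each string that is submitted to it.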
8.2 Parsing
In linguistics, parsing is the process of analyzing a text, made of a sequence of tokens (for
example, words), to determine its grammatical structure with respect to a given (more or
less) formal grammar. Parsing can also be used as a linguistic term, especially in reference
to how phrases are divided up in garden path sentences.
8.2.1 Parser
The task of the parser is essentially to determine if and how the input can be derived from
the start symbol of the grammar. This can be done in essentially two ways:
• Top-down parsing: Starts from the start symbol and tries to derive the input from it.
• Bottom-up parsing: Starts from the input and tries to rewrite it back to the start
symbol.
Syntax analysis is the phase of a compiler in which the given input string is checked
for conformance to the rules and structure of the formal grammar. It analyses the
syntactic structure and checks whether the given input is in the correct syntax of the
programming language or not.
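A minimal sketch of a parser is given below, using recursive descent (a top-down
technique) on an assumed grammar for arithmetic expressions:
expr -> term (('+' | '-') term)*, term -> factor (('*' | '/') factor)*,
factor -> NUMBER | '(' expr ')'. The grammar, the tuple-based tree, and all names are
choices made for this illustration.

```python
import re


def tokenize(text):
    """Split the input into number, operator, and parenthesis tokens."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    if "".join(tokens) != text.replace(" ", ""):
        raise SyntaxError("illegal character in input")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):  # expr -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):  # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):  # factor -> NUMBER | '(' expr ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    return Parser(tokenize(text)).parse()


print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

An input such as "1 +" is rejected with a SyntaxError, because no derivation from expr
produces it.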
Parsing only verifies that the program consists of tokens arranged in a syntactically valid
combination. Now we’ll move forward to semantic analysis, where we delve even deeper
to check whether the tokens form a sensible set of instructions in the programming language.
Whereas any old noun phrase followed by some verb phrase makes a syntactically correct
English sentence, a semantically correct one has subject-verb agreement, proper use of
gender, and the components go together to express an idea that makes sense. For a
program to be semantically valid, all variables, functions, classes, etc. must be properly
defined, expressions and variables must be used in ways that respect the type system,
access control must be respected, and so forth. Semantic analysis is the front end’s
penultimate phase and the compiler’s last chance to weed out incorrect programs. We
need to ensure the program is sound enough to carry on to code generation.
Semantics in a programming language describes the relationship between the syntax and
the model of computation. It emphasizes the interpretation of a program, so that the
programmer can understand it easily or predict the outcome of program execution. An
approach known as syntax-directed semantics maps syntactic constructs to the
computational model with the help of a function. The task of semantic analysis is to
accept only the statements of a semantically correct program. The following are styles
of semantics.
9.1 Operational
Operational semantics defines the meaning of a program in terms of the computation steps
that an idealized execution would take. Some definitions use structural operational
semantics, in which intermediate states are described in terms of the language itself;
others use abstract machines built from more ad hoc mathematical constructions. The
operational semantics of a programming language is usually understood as a set of rules
for how its expressions, statements, programs, etc., are evaluated or executed. These
rules tell how a possible implementation of the programming language should work, and it
is not difficult to sketch an interpreter for the language in any programming language
simply by following its operational semantics and translating the rules into the
implementation language.
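As a small illustration of turning evaluation rules into an interpreter, the sketch
below evaluates arithmetic expressions one step at a time. The expression format (nested
tuples such as ("+", 1, ("*", 2, 3))) and the left-to-right evaluation order are
assumptions made for this example.

```python
def step(expr):
    """Apply one evaluation rule to an expression that is not yet a value."""
    op, left, right = expr
    if not isinstance(left, int):
        return (op, step(left), right)  # rule: reduce the left operand first
    if not isinstance(right, int):
        return (op, left, step(right))  # rule: then reduce the right operand
    if op == "+":
        return left + right             # rule: both operands are values, so add
    if op == "*":
        return left * right             # rule: both operands are values, so multiply
    raise ValueError(f"unknown operator {op!r}")


def evaluate(expr):
    """Repeat single steps until a value remains; return every intermediate state."""
    trace = [expr]
    while not isinstance(expr, int):
        expr = step(expr)
        trace.append(expr)
    return trace


for state in evaluate(("+", 1, ("*", 2, 3))):
    print(state)
```

The printed trace is ('+', 1, ('*', 2, 3)), then ('+', 1, 6), then 7, which mirrors the
sequence of computation steps that the rules prescribe.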
9.2 Denotational
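Denotational semantics defines the meaning of a program by mapping each construct to a
mathematical object (its denotation), such as a function from inputs to outputs. A
related style, axiomatic semantics, defines the meaning of a program indirectly, by
providing logical axioms about the properties of the program. Compare with specification
and verification.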
Semantic analysis involves two types of semantics: static and dynamic.
The static semantics defines restrictions on the structure of valid texts that are hard or
impossible to express in standard syntactic formalisms. For compiled languages, static
semantics essentially include those semantic rules that can be checked at compile time.
Examples include checking that every identifier is declared before it is used (in languages
that require such declarations) or that the labels on the arms of a case statement are
distinct. Many important restrictions of this type, like checking that identifiers are used in
the appropriate context (e.g. not adding an integer to a function name), or that subroutine
calls have the appropriate number and type of arguments, can be enforced by defining
them as rules in a logic called a type system. Other forms of static analyses like data flow
analysis may also be part of static semantics. Newer programming languages like Java and
C# have definite assignment analysis, a form of data flow analysis, as part of their static
semantics.
Once data has been specified, the machine must be instructed to perform operations on
the data. For example, the semantics may define the strategy by which expressions are
evaluated to values, or the manner in which control structures conditionally execute
statements. The dynamic semantics (also known as execution semantics) of a language
defines how and when the various constructs of a language should produce a program
behavior. There are many ways of defining execution semantics. Natural language is often
used to specify the execution semantics of languages commonly used in practice. A
significant amount of academic research went into formal semantics of programming
languages, which allow execution semantics to be specified in a formal manner. Results
from this field of research have seen limited application to programming language design
and implementation outside academia.
The semantic analyzer uses the syntax tree and symbol table to check whether the given
program is semantically consistent with the language definition. It gathers type
information and stores it in either the syntax tree or the symbol table. This type
information is subsequently used by the compiler during intermediate code generation.
Some of the semantic errors that the semantic analyzer is expected to recognize:
• Type mismatch.
• Undeclared variable.
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.
Semantic analysis involves the following checks:
1. Type Checking: Ensures that data types are used in a way consistent with their
definition.
2. Label Checking: Ensures that every label referenced in the program is defined.
3. Flow Control Check: Ensures that control structures are used in a proper
manner (example: no break statement outside a loop).
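Two of these checks can be sketched in Python. The first function is a deliberately
simplified checker that reports names read before any assignment in straight-line code;
it ignores functions, loops, and built-in names, so it is an illustration rather than a
real analyzer. The second uses the built-in compile() to show that Python itself rejects
a break statement outside a loop before the program runs. Both function names are made up.

```python
import ast


def undefined_names(source):
    """Report names that are read before being assigned (straight-line code only)."""
    defined = set()
    problems = []
    for statement in ast.parse(source).body:
        # Every name read by this statement must already have been assigned.
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in defined:
                    problems.append(node.id)
        # Names assigned by this statement are defined from here on.
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                defined.add(node.id)
    return problems


def is_accepted(source):
    """Return True if Python's own compile-time checks accept the source."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


print(undefined_names("x = 1\ny = x + z\n"))      # ['z']
print(is_accepted("break\n"))                     # False
print(is_accepted("while True:\n    break\n"))    # True
```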
9.6 Fundamental Semantic Issues of Variables, Nature of Names and Special Words in
Programming Languages
9.6.1 Variables
Variables in programming describe how data is represented, which can range from a very
simple value to a complex one. The value a variable contains can change depending on
conditions. In many languages, when creating a variable we also need to declare the data
type it contains, because the program will use different types of data in different
ways. Programming languages define data types differently. A variable can hold a very
simple value, like the age of a person, or something very complex, like a record of a
student's performance over a whole year. A variable is a symbolic name given to some
known or unknown quantity or information, for the purpose of allowing the name to be
used independently of the information it represents. Compilers have to replace
variables' symbolic names with the actual locations of the data. While the variable
name, type, and location generally remain fixed, the data stored in the location may get
altered during program execution.
For example, almost all languages differentiate between integers (or whole numbers, e.g.,
12), non-integers (numbers with decimals, e.g., 0.24), and characters (letters of the
alphabet or words). In programming languages, we can distinguish between different type
levels which from the user's point of view form a hierarchy of complexity, i.e., each level
allows new data types or operations of greater complexity.
• Elementary level: Elementary (sometimes also called basic or simple) types, such
as integers, reals, booleans, and characters, are supported by nearly every
programming language. Data objects of these types can be manipulated by well-
known operators, like +, -, *, or /, on the programming level. It is the task of the
compiler to translate the operators onto the correct machine instructions, e.g.,
fixed-point and floating-point operations.
• Structured level: Most high-level programming languages allow the definition of
structured types which are based on simple types. We distinguish between static
and dynamic structures. Static structures are arrays, records, and sets, while
dynamic structures are a bit more complicated, since they are recursively defined
and may vary in size and shape during the execution of a program. Lists and trees
are dynamic structures.
• Abstract level: Programmer-defined abstract data types are a set of data objects
with declared operations on these data objects. The implementation or internal
representation of abstract data types is hidden to the users of these types to avoid
uncontrolled manipulation of the data objects (i.e., the concept of encapsulation).
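The three levels can be illustrated in Python as follows. The names Student, Node,
length, and Stack are invented for this sketch, and note that Python hides the internal
list of the stack only by convention (the leading underscore), not by enforcement.

```python
from dataclasses import dataclass
from typing import Optional

# Elementary level: simple built-in types and their operators.
age = 12
ratio = 0.24
is_adult = age >= 18


# Structured level (static): a record with a fixed set of fields.
@dataclass
class Student:
    name: str
    age: int


# Structured level (dynamic): a recursively defined list that can grow at run time.
@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def length(node):
    count = 0
    while node is not None:
        count += 1
        node = node.next
    return count


# Abstract level: the operations are declared, the representation is hidden.
class Stack:
    def __init__(self):
        self._items = []  # internal representation, not meant for direct use

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


chain = Node(1, Node(2, Node(3)))
print(length(chain))  # 3

stack = Stack()
stack.push(Student("Ada", 20))
print(stack.pop().name)  # Ada
print(stack.is_empty())  # True
```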
9.6.3 Binding
Binding describes how a variable is created and used (or "bound") by and within the given
program, and, possibly, by other programs as well. There are two types of binding:
dynamic and static.
• Dynamic Binding: A binding is dynamic if it first occurs during execution or can
change during execution of the program. A common example is dynamic dispatch, the
process of mapping a message to a specific sequence of code (method) at runtime. This
is done to support the cases where the appropriate method cannot be determined at
compile time.
• Static Binding: A binding is static if it first occurs before run time and remains
unchanged throughout program execution.
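Dynamic dispatch can be seen in the short Python sketch below, where the method that runs
for shape.area() is chosen at run time from the class of each object. The classes Shape,
Square, and Rectangle and the function total_area are hypothetical names for this example.

```python
class Shape:
    def area(self):
        raise NotImplementedError("subclasses must define area")


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def total_area(shapes):
    # Which area() runs is decided for each object while the program executes.
    return sum(shape.area() for shape in shapes)


print(total_area([Square(2), Rectangle(2, 3)]))  # 10
```

Nothing in total_area names Square or Rectangle, so the call cannot be bound to one
specific method by reading total_area alone.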
9.6.4 Scope
The scope of a variable describes where in a program's text, the variable may be used,
while the extent (or lifetime) describes when in a program's execution a variable has a
(meaningful) value. Scope is a lexical aspect of a variable. Most languages define a specific
scope for each variable (as well as any other named entity), which may differ within a given
program. The scope of a variable is the portion of the program code for which the variable's
name has meaning and for which the variable is said to be "visible". Scope is of two
types: static and dynamic scope.
• Static Scope: The static scope of a variable is the most immediately enclosing
block, excluding any enclosed blocks where the variable has been re-declared. The
static scope of a variable in a program can be determined by simply studying the
text of the program. Static scope is not affected by the order in which procedures
are called during the execution of the program.
• Dynamic Scope: The dynamic scope of a variable extends to all the procedures
called thereafter during program execution, until the first procedure to be called
that re-declares the variable.
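Python uses static scope, so the first half of the sketch below shows the static rule
directly. Dynamic scope is then imitated with an explicit stack of bindings, since Python
has no built-in dynamic scoping; dynamic_env, lookup, and the function names are
assumptions made for this illustration.

```python
x = "global"


def show():
    return x  # static scope: always the x of the enclosing program text


def caller():
    x = "local to caller"  # a different variable; show() cannot see it
    return show()


# Imitation of dynamic scope: a stack of binding frames, newest last.
dynamic_env = [{"x": "global"}]


def lookup(name):
    """Find the most recent binding of name among the active procedures."""
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)


def show_dynamic():
    return lookup("x")


def caller_dynamic():
    dynamic_env.append({"x": "local to caller"})  # re-declare x for this call
    try:
        return show_dynamic()
    finally:
        dynamic_env.pop()  # the binding ends when the procedure returns


print(caller())          # global
print(caller_dynamic())  # local to caller
print(show_dynamic())    # global
```

Under static scope the result depends only on where show is written; under the imitated
dynamic scope it depends on which procedure called it.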
9.6.5 Referencing
The referencing environment is the collection of variables that can be used at a given
point in the program. In a statically scoped language, one can only reference the
variables in the static referencing environment. A function in a statically scoped
language does have dynamic ancestors (i.e., its callers), but cannot reference variables
declared in those ancestors.
A declaration in a program refers to a statement that provides the data about the name and
type of data objects to the programming language translators. For example, consider the
following C declaration:
int a, b;
This declaration provides the programming language translator with the information that a
and b are the data objects of type integer that are needed during the execution of the
subprogram. The declaration also defines the binding of the data objects to the names a
and b during their lifetimes. Declarations serve the following purposes:
• Type Checking: The declaration allows static type checking, i.e., checking the types
of data objects at compile time rather than at execution time.
• Choice of Storage Representation: The declaration provides information about the
type of the declared data object, which helps the programming language translator
to decide the best possible storage representation for that data object. This helps
in reducing the overall storage requirement and execution time for the program
being translated.
• Storage Management: The declarations also serve to indicate the desired lifetime
of the data object, which makes it possible to use a more effective storage
management process during program execution. For example, in C some of the
data objects can be declared at the beginning of a subprogram while some other
data objects are created dynamically by calling the function malloc.
• Polymorphic Operations: An operation is said to be polymorphic if it may take on a
variety of implementations depending upon the types of its arguments. The declaration
allows the programming language translator to decide at compile time the specific
operation named by an overloaded operation symbol. For example, in C, the declarations
of data objects a and b help in determining which addition operation (integer addition
or float addition) is named by a + b.
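The idea of one operation symbol naming several implementations can be shown in Python,
with the caveat that Python has no declarations of this kind and selects the
implementation at run time from the operand types, not at compile time as in C. The
Money class is a made-up example type.

```python
class Money:
    """A hypothetical type that supplies its own implementation of +."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


print(2 + 3)                            # 5     (integer addition)
print(2.5 + 0.5)                        # 3.0   (floating-point addition)
print("ab" + "cd")                      # abcd  (string concatenation)
print((Money(150) + Money(250)).cents)  # 400   (Money.__add__)
```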
10.3.1 Scope
The scope of a variable is the range of program statements that can access that variable.
The lifetime of a variable is the interval of time in which storage is bound to the variable. A
variable is visible within its scope and invisible or hidden outside it. The scope of an
identifier is that portion of the program code in which it is visible, that is, it can be used.
Variables can be bound to a scope either statically or dynamically.
• Static Scope: Static scope defines the scope of a variable in terms of lexical
structure of a program. Using static scope, each reference to a variable is statically
bound to a particular (implicit or explicit) variable declaration. All variable
references can be resolved by looking at the program’s source code, independently of
execution. Static scope rules are used by most traditional imperative
programming languages.
• Dynamic Scope: It defines the scope of a variable in terms of program execution.
Each variable declaration extends its effects over all subsequent statement
execution, until a new declaration for the identifier is encountered. Dynamic scope
rules are easy to implement but have drawbacks. Dynamic scope is most often
used by interpreted languages such as APL, LISP, and SNOBOL.
The scope of a variable can be either local or global. A global variable’s scope includes
all the statements in a program. The scope of a local variable includes only statements inside
the function in which it is declared. The same identifier can be reused inside different
functions to name different variables. A name is local if it is declared in the current scope,
and it is global if declared in an outer scope.
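A short Python sketch of local and global scope follows. The names counter, increment,
and shadow are chosen for this example; in Python, a function must use the global
statement before it can assign to a global variable.

```python
counter = 0  # global variable: visible throughout the module


def increment():
    global counter  # refer to the global variable instead of creating a local one
    counter += 1
    return counter


def shadow():
    counter = 100   # local variable: same identifier, different variable
    return counter


print(increment())  # 1
print(shadow())     # 100
print(counter)      # 1 (the assignment inside shadow() did not touch the global)
```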