Automata Theory Lecture Notes
Automata theory is the foundational study of computational models that perform calculations, recognize patterns, and
solve problems. Its relevance lies in the following domains:
1. Understanding Computation:
Automata theory defines the fundamental concepts of computation, helping us understand what problems can and
cannot be solved by machines.
It provides formal tools to classify computational problems based on their complexity and solvability.
2. Practical Applications:
Automata are widely used in software design, hardware design, natural language processing, and other fields.
Examples include lexical analyzers in compilers, regular expression engines, and control systems in hardware.
Automata theory serves as a bridge between theoretical computer science and practical applications by providing
models for real-world systems.
The study of automata helps in creating efficient algorithms and robust computational models.
Finite automata are one of the simplest yet most powerful models of computation. Informally, they can be thought of as machines that read an input string one symbol at a time, move between a finite set of states according to fixed rules, and accept or reject the string once the input is exhausted.
Key Characteristics:
1. Deterministic Behavior:
In deterministic finite automata (DFA), for each input symbol, the machine transitions to exactly one state.
2. Non-Deterministic Behavior:
In non-deterministic finite automata (NFA), multiple transitions for a single input symbol are allowed, offering
multiple paths for computation.
Real-World Analogy:
A vending machine behaves much like a finite automaton:
It transitions through states based on user inputs (coin insertion, button presses).
It determines the output (dispense item) based on its current state and input sequence.
Limitations:
Finite automata have no memory beyond their current state, making them unsuitable for recognizing patterns that
require counting (e.g., matching nested parentheses).
Structural Representations
1. State Diagrams:
A directed graph in which nodes represent states and labeled edges represent transitions. A special node denotes the start state, and one or more nodes are designated as accept states.
2. Transition Tables:
Tabular representation showing transitions for each state and input symbol.
3. Formal Definition:
A finite automaton can be specified precisely as a 5-tuple (Q, Σ, δ, q0, F): a finite set of states Q, an input alphabet Σ, a transition function δ, a start state q0, and a set of accepting states F.
Automata and Computational Complexity
1. Complexity Classes:
Automata are crucial for understanding computational complexity classes such as P, NP, and PSPACE.
2. Key Concepts:
Decidability: Automata help formalize the concept of decidability—problems that can be solved algorithmically.
Efficiency: Automata provide models to analyze the time and space complexity of algorithms.
3. Applications in Complexity:
Lexical analysis in compilers leverages finite automata to tokenize input efficiently.
Pushdown automata form the basis of parsing algorithms for context-free grammars.
Conclusion:
Automata theory provides a theoretical framework to study computation and its limitations. It is indispensable for
understanding how computational systems are designed and optimized. The introductory concepts of finite automata, their
structural representations, and their relation to complexity establish a foundation for advanced study in computation and
language theory.
Deductive Proofs
A deductive proof establishes the truth of a statement by logically deriving it from previously known facts, axioms, or
theorems. It ensures that if the premises are true, the conclusion must also be true.
Example:
1. Premises:
x and y are even integers, so x = 2a and y = 2b for some integers a and b.
2. Proof:
x + y = 2a + 2b = 2(a + b), which is twice an integer.
3. Conclusion:
The sum of two even integers is even.
Deductive proofs like this rely on the clear use of definitions and logical steps to establish results.
Reduction to Definitions
Reduction to definitions involves proving a statement by directly using the formal definitions of the terms involved. This
approach is particularly useful for simple properties and fundamental results.
Example:
Prove that the product of two odd numbers is always odd.
1. Definitions: An odd number can be written as 2k + 1 for some integer k, so let m = 2a + 1 and n = 2b + 1.
2. Computation: mn = (2a + 1)(2b + 1) = 4ab + 2a + 2b + 1 = 2(2ab + a + b) + 1.
3. Conclusion:
Since mn has the form 2c + 1 with c = 2ab + a + b, the product of two odd numbers is odd.
Reduction to definitions provides direct insight into why the statement is true.
While many theorems are presented in "if-then" form, others may involve equivalent statements, constructive proofs, or
claims of uniqueness.
Example: Show that there is exactly one real number x satisfying 2x + 3 = 7.
1. Existence:
2x = 7 − 3 = 4 ⇒ x = 2
Thus, x = 2 satisfies the equation.
2. Uniqueness:
If 2x + 3 = 7, then subtracting 3 and dividing by 2 forces x = 2, so no other solution exists.
3. Conclusion:
The equation 2x + 3 = 7 has the unique solution x = 2.
Theorem forms like this often combine proof techniques to establish both existence and uniqueness.
Some theorems are not explicitly written as "if-then" statements but can often be reformulated in this way for clarity.
Example:
2. Proof:
3. Conclusion:
Such theorems illustrate the flexibility in how statements can be expressed and proven.
Conclusion:
Formal proofs, through deductive reasoning, reduction to definitions, and alternative theorem forms, provide a structured
method for verifying mathematical claims. Understanding these techniques lays the foundation for rigorous reasoning
across mathematics and other theoretical disciplines.
To prove that two sets A and B are equivalent (A = B ), we must show two properties:
1. Subset Relation: A ⊆ B and B ⊆ A.
2. Bijection (in some contexts): If relevant, establish a one-to-one correspondence between the elements of A and B .
Example:
Since A ⊆ B and B ⊆ A, A = B .
Proving equivalence about sets often requires carefully verifying both subset relations.
The Contrapositive
A contrapositive reformulates an "if-then" statement P ⟹ Q as ¬Q ⟹ ¬P . Both forms are logically equivalent, and
the contrapositive is often easier to prove.
Example:
Statement: If n² is even, then n is even.
1. Contrapositive Formulation:
If n is odd, then n² is odd. Since n = 2k + 1 gives n² = 2(2k² + 2k) + 1, the contrapositive holds, and therefore so does the original statement.
The contrapositive is particularly useful when proving the original statement directly is challenging.
Proof by Contradiction
Proof by contradiction establishes the truth of a statement P by assuming ¬P (its negation) and deriving a logical
inconsistency.
Example:
Prove: √2 is irrational.
1. Assume the Negation:
Suppose √2 is rational, so √2 = p/q for integers p and q with no common factors (the fraction is in lowest terms).
2. Derive Contradiction:
Squaring both sides gives 2 = p^2 / q^2, or p^2 = 2q^2.
Thus, p^2 is even, implying p is even (p = 2k).
Substituting p = 2k into p^2 = 2q^2 gives (2k)^2 = 2q^2, or 4k^2 = 2q^2, which simplifies to q^2 = 2k^2.
Hence, q^2 is even, implying q is even.
3. Contradiction:
Both p and q are even, contradicting the assumption that p/q was in lowest terms. Hence √2 cannot be rational.
Proof by contradiction is a powerful tool for establishing the impossibility of certain claims.
Counterexamples
A counterexample disproves a universal statement by providing a specific instance where the statement fails.
Example: Consider the universal claim "All prime numbers are odd."
1. Counterexample:
The number 2 is prime but even.
2. Conclusion:
Since there exists a prime number that is not odd, the statement is false.
Counterexamples provide direct evidence against universal claims and are invaluable for clarifying misconceptions.
Conclusion:
These additional forms of proof—proving equivalence about sets, the contrapositive, proof by contradiction, and
counterexamples—enhance the mathematical toolkit for rigorous reasoning. Each method has its strengths and is suited to
specific types of problems, offering flexibility in constructing logical arguments.
Inductions to Integers
The principle of mathematical induction proves statements about integers by demonstrating two key steps:
1. Base Case:
Show that the statement holds for the initial value, typically n = 0 or n = 1.
2. Inductive Step:
Assume the statement holds for n = k and show that it then holds for n = k + 1.
Example:
Prove: 1 + 2 + ⋯ + n = n(n+1)/2 for all n ≥ 1.
1. Base Case (n = 1):
Left-hand side: 1.
Right-hand side: 1(1+1)/2 = 1.
Thus, the statement holds for n = 1.
2. Inductive Step:
Assume the statement holds for n = k:
1 + 2 + ⋯ + k = k(k+1)/2.
Then for n = k + 1:
1 + 2 + ⋯ + k + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2,
so the statement also holds for k + 1.
3. Conclusion:
By induction, 1 + 2 + ⋯ + n = n(n+1)/2 for all n ≥ 1.
While standard induction proves statements for n ≥ 1, more general forms extend this principle:
1. Strong Induction:
In the inductive step, the statement is assumed for all values up to k (not only for k itself) when proving it for k + 1.
Example:
3. Conclusion:
2. Reverse (Backward) Induction:
Used for proving statements about finite sequences by reversing the direction of induction.
Structural Inductions
Structural induction generalizes mathematical induction to prove properties of recursively defined objects, such as trees or
strings.
Example:
Prove: For a full binary tree T (one in which every internal node has exactly two children), the number of leaves is one more than the number of internal nodes.
1. Base Case:
A tree with a single node (leaf) has 0 internal nodes and 1 leaf. The statement holds.
2. Inductive Step:
Combining T1 and T2 under a new root R adds one internal node (the root R) and no new leaves: if each subtree satisfies leaves = internal + 1, then T has leaves(T1) + leaves(T2) leaves and internal(T1) + internal(T2) + 1 internal nodes, so the relationship is preserved.
3. Conclusion:
By structural induction, the property holds for every full binary tree.
Mutual Inductions
Mutual induction proves interdependent statements simultaneously by leveraging their recursive definitions.
Example:
Prove:
Conclusion:
Inductive proofs, including standard, strong, and structural induction, along with mutual induction, provide a systematic
framework for proving propositions over integers and recursively defined structures. These methods are essential for
verifying properties in mathematics and theoretical computer science.
1. Alphabets (Σ)
An alphabet is a finite, non-empty set of symbols used as the building blocks for constructing strings.
Examples:
The binary alphabet Σ = {0, 1}.
The lowercase English letters Σ = {a, b, …, z}.
Properties:
Alphabets are essential for defining strings, which in turn form the basis of languages.
2. Strings
A string is a finite sequence of symbols from an alphabet Σ. The set of all possible strings over Σ is denoted Σ∗ .
Key Terms:
Length ∣w∣: the number of symbols in the string w.
Empty string ϵ: the unique string of length zero.
Operations on Strings:
Suffix: A substring that ends at the end of w . Example: cde is a suffix of abcde.
5. Lengthening (String Power): Repeating w k times, denoted w^k.
Example: If w = ab, then w^3 = ababab.
6. Empty String Property:
wϵ = ϵw = w.
Strings form the building blocks for languages, and operations on strings enable transformations essential for language
manipulation.
3. Languages
A language is a set of strings over a given alphabet Σ. Formally, L ⊆ Σ∗ . Languages can be finite or infinite.
Examples of Languages:
1. Finite Language:
L = {a, ab, abc} over Σ = {a, b, c}.
2. Infinite Language:
L = {a^n ∣ n ≥ 1} (strings of one or more a's) over Σ = {a}.
Operations on Languages:
For the examples below, let L1 = {a, b} and L2 = {b, c}.
1. Union (L1 ∪ L2): The set of strings in L1 or in L2 (or both). Example: L1 ∪ L2 = {a, b, c}.
2. Intersection (L1 ∩ L2): The set of strings in both L1 and L2. Example: L1 ∩ L2 = {b}.
3. Concatenation (L1 L2 ): The set of strings formed by concatenating a string from L1 with a string from L2 .
4. Kleene Star (L∗ ): The set of all strings formed by concatenating zero or more strings from L.
Example: If L = {a}, L∗ = {ϵ, a, aa, aaa, … }.
5. Difference (L1− L2 ): The set of strings in L1 but not in L2 .
Example: L1 − L2 = {a}.
Languages are the formal objects that automata recognize and manipulate.
4. Problems
In automata theory, a problem is represented as a language, with strings encoding problem instances and decisions about
membership indicating solutions. A problem is considered decidable if there exists an algorithm that can determine, for any
string, whether it belongs to the language.
For example, let L be the set of all palindromes over Σ = {0, 1}. The problem is to decide whether a given string w belongs to L (i.e., is w a palindrome?).
These examples illustrate how computational problems can be formalized as languages, enabling their study within the
automata theory framework.
Conclusion:
The central concepts of automata theory—alphabets, strings, languages, and problems—provide the foundation for
understanding computation. By abstracting problems as languages, automata theory enables a precise analysis of their
computational properties, including decidability and complexity.
Imagine a scenario where a Customer interacts with a Store to make a purchase, and the Store communicates with a Bank
to process the payment. The interactions between these entities follow specific rules:
The Customer can make requests, such as adding items to a cart or attempting to purchase.
The Store accepts payments only after validating them with the Bank.
The Bank authorizes or denies payments based on account balance and transaction validity.
These rules ensure the smooth functioning of the system. The finite automaton representing each entity enforces these
ground rules by defining states and transitions based on specific actions or events.
2. The Protocol
The protocol governs how interactions occur between the Customer, Store, and Bank. It can be outlined as follows:
1. Customer Actions:
The Customer can initiate a transaction, such as selecting items or requesting payment.
2. Store Actions:
The Store waits for customer actions and forwards payment requests to the Bank.
The Store transitions to a "waiting for authorization" state upon initiating communication with the Bank.
3. Bank Actions:
The Bank processes the payment request and responds with either an authorization or denial.
Example: If the Customer's account balance is sufficient, the Bank authorizes the transaction, and the Store
completes the purchase.
Each entity is modeled as a finite automaton with states representing the current status (e.g., "Idle," "Awaiting Payment")
and transitions triggered by specific actions.
Not all actions in a system are relevant to every automaton. For example:
The Bank does not care about the items the Customer adds to the cart; it only processes payment requests.
The Customer need not know whether the Bank is processing other transactions while waiting for their request.
This capability to ignore irrelevant actions is critical. In finite automata, this is achieved by allowing certain inputs to trigger
transitions that effectively "skip" or ignore those actions, ensuring the automaton focuses on relevant events.
Example:
The Store automaton might transition from "Idle" to "Waiting for Payment Authorization" regardless of whether the
Customer selects one or multiple items.
This flexibility simplifies the automaton's design and prevents unnecessary complexity.
Each component (Customer, Store, Bank) operates as an individual finite automaton, but the entire system can be viewed as
a single composite automaton.
The state of the system is a combination of the states of the individual automata. For example, if the Customer, Store, and Bank are in states c, s, and b respectively, the composite automaton is in the state (c, s, b).
Transitions of the System:
The system transitions occur when an action from one automaton triggers a response in another. For example:
The Customer transitions to "Requesting Payment," prompting the Store to transition to "Waiting for Bank Response."
The Store's payment request triggers the Bank to transition to "Processing Payment."
The composite automaton models the entire sequence of interactions, ensuring that the system's behavior adheres to the
defined protocol.
The product automaton combines the individual automata of the Customer, Store, and Bank into a single automaton. It
allows us to validate the correctness of the protocol by verifying all possible sequences of actions and states.
Example:
1. Initial State:
Store: Idle.
Bank: Idle.
2. Transition 1:
Result:
Bank: Idle.
3. Transition 2:
Result:
4. Transition 3:
Result:
Bank: Idle.
Validation:
No invalid sequences (e.g., Bank processing payment without a request) are allowed.
The system returns to a valid state (e.g., all automata idle) after each transaction.
Conclusion:
Through the example of the Customer, Store, and Bank, we have explored the informal workings of finite automata. The
ground rules, protocol, and interactions define the system's behavior, while the product automaton validates its correctness.
This approach provides an intuitive understanding of finite automata, preparing us for their formal representation in
subsequent discussions.
1. Definition of a DFA
A deterministic finite automaton (DFA) is a 5-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is a finite input alphabet, δ : Q × Σ → Q is the transition function, q0 ∈ Q is the start state, and F ⊆ Q is the set of accepting states.
Key Points:
The automaton is deterministic, meaning for each state and input symbol, there is exactly one state transition.
The automaton has a finite number of states and processes input strings of finite length.
It either accepts or rejects a string based on whether the final state is an accepting state.
1. Initialization:
The DFA starts in the initial state q0 .
2. Processing Input:
The DFA reads the input string one symbol at a time. For each symbol σ from the string, it transitions from its current
state to a new state, as defined by the transition function δ . This continues until all symbols have been processed.
3. Final Decision:
After the entire input string has been processed, the DFA checks if it is in an accepting state (i.e., if the current state is in
F ).
If the DFA is in an accepting state, the string is accepted; otherwise, it is rejected.
Example:
Let's consider a simple DFA that accepts binary strings ending with '10'; its start state is q0 and its only accepting state is q2 (F = {q2}).
Q = {q0 , q1 , q2 }
Σ = {0, 1}
Transition function δ :
δ(q0 , 0) = q0
δ(q0 , 1) = q1
δ(q1 , 0) = q2
δ(q1 , 1) = q1
δ(q2 , 0) = q0
δ(q2 , 1) = q1
For the string '1101', the DFA would process the symbols as follows:
Read '1', transition from q0 to q1.
Read '1', stay in q1.
Read '0', transition to q2.
Read '1', transition back to q1. Since the DFA ends in q1 (not an accepting state), the string is rejected ('1101' does not end with '10').
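The processing just described is easy to mirror in code. Below is a minimal Python sketch of a DFA simulator run on this example; the helper name run_dfa and the dictionary encoding of δ are illustrative choices, not part of the formal definition.

```python
# Minimal DFA simulator (illustrative sketch).
def run_dfa(delta, start, accepting, string):
    """Return True if the DFA accepts the string, False otherwise."""
    state = start
    for symbol in string:
        state = delta[(state, symbol)]  # exactly one next state: deterministic
    return state in accepting

# Transition function of the example DFA (accepts binary strings ending in '10').
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q2", ("q1", "1"): "q1",
    ("q2", "0"): "q0", ("q2", "1"): "q1",
}

print(run_dfa(delta, "q0", {"q2"}, "1101"))  # False: the run ends in q1
print(run_dfa(delta, "q0", {"q2"}, "0110"))  # True: the run ends in q2
```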
To make the representation of DFAs more intuitive, we often use transition diagrams and transition tables.
Transition Diagrams:
A transition diagram is a graphical representation of a DFA where each state is represented by a circle, and transitions
between states are represented by directed edges labeled with input symbols.
Example:
Transitions between states are labeled with the appropriate input symbols.
Transition Tables:
A transition table is a tabular representation where each row corresponds to a state, each column corresponds to an
input symbol, and each cell shows the next state resulting from applying the input symbol to the state.
| State | 0  | 1  |
|-------|----|----|
| q0    | q0 | q1 |
| q1    | q2 | q1 |
| q2    | q0 | q1 |
Both notations are equivalent, and the choice of notation depends on the specific use case (e.g., clarity in visualization or
simplicity in computation).
The transition function δ can be extended to handle entire strings. Instead of processing one symbol at a time, we apply δ
iteratively for each symbol in the string.
Formally, for a string w = w1 w2 … wn (where each wi ∈ Σ), the extended transition function δ*(q, w) is defined as:
δ*(q, ϵ) = q, and δ*(q, w1 w2 … wn) = δ(δ*(q, w1 … wn−1), wn).
In other words, the extended transition function processes each symbol in the string, starting from the initial state, and
transitions through the states as defined by δ .
Example:
Process ‘1’, transition to q1 . The DFA ends in q1 , which is not an accepting state, so the string is rejected.
5. Language of a DFA
The language of a DFA is the set of strings that the DFA accepts. Formally, it is defined as:
L(M ) = {w ∈ Σ∗ ∣ δ ∗ (q0 , w) ∈ F }
In other words, the language of a DFA M is the set of all strings w such that, starting from the initial state q0, the DFA ends in an accepting state after processing all of w.
Example:
For the DFA accepting binary strings ending with '10', the language is the set of all strings that end with '10', including strings like 10, 110, and 0110. Strings such as 0, 1, 100, and 111 are not in the language (as these do not end with '10').
Conclusion:
In this lecture, we introduced Deterministic Finite Automata (DFA) as a model for recognizing regular languages. We
explored how DFAs process strings through state transitions and defined the language of a DFA. We also discussed simpler
notations for representing DFAs, such as transition diagrams and tables, and extended the transition function to handle
entire strings. Understanding DFAs is fundamental for studying more complex computational models and understanding
regular languages.
A Non-Deterministic Finite Automaton (NFA) is similar to a DFA, but with one key difference: an NFA can have multiple
possible next states for a given state and input symbol. This non-deterministic behavior means that at any point during the
computation, the automaton can "choose" between several transitions, rather than being forced into a single transition as in
the case of a DFA. This flexibility allows NFAs to potentially recognize the same languages as DFAs, but the way they process
strings is different.
Multiple Transitions: For a given state and input symbol, there can be more than one next state (or none).
Epsilon Transitions: An NFA can transition to a new state without consuming any input symbol (via epsilon transitions ϵ
).
2. Definition of NFA
Notable Features:
The transition function δ is a set-valued function, meaning for a given state and input symbol, the NFA may transition
to several different states, or even no state at all.
Epsilon transitions: The NFA may transition to a state without consuming any input symbol, denoted as δ(q, ϵ).
The extended transition function δ ∗ for an NFA is used to determine the states that the automaton can reach after
processing an entire string. It extends the original transition function to handle strings rather than individual symbols.
δ ∗ (q, ϵ) = {q} for any state q (the empty string leaves the state unchanged).
δ*(q, w1 w2 … wn) = ⋃_{p ∈ δ(q, w1)} δ*(p, w2 … wn) for any string w1 w2 … wn.
In other words, given a current state q and input string w , the automaton can non-deterministically choose one of the
possible transitions for the first symbol, then continue processing the rest of the string from the resulting states.
If the NFA can reach any of the accepting states after processing the string, the string is accepted. If no accepting state is
reached, the string is rejected.
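The extended transition function can be evaluated directly by tracking the set of states the NFA may currently occupy. The sketch below (for an NFA without epsilon-transitions; names such as nfa_accepts are illustrative) implements exactly this set-based reading of δ*.

```python
# Set-based simulation of an NFA's extended transition function (sketch).
def nfa_accepts(delta, start, accepting, string):
    """delta maps (state, symbol) to a set of states; a missing key means no move."""
    current = {start}
    for symbol in string:
        current = set().union(*[delta.get((q, symbol), set()) for q in current])
    return bool(current & accepting)

# NFA over {a} accepting strings that contain at least one 'a'
# (the same machine is used in the subset-construction example below).
delta = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): {"q1"}}
print(nfa_accepts(delta, "q0", {"q1"}, "aaa"))  # True
print(nfa_accepts(delta, "q0", {"q1"}, ""))     # False
```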
The language of an NFA M = (Q, Σ, δ, q0, F) is the set of strings that the automaton accepts, i.e., the set of strings for which there exists at least one sequence of state choices, ending in an accepting state, such that:
For each symbol wi in the input string, the automaton can transition from qi to qi+1 according to the transition function
δ.
Formally:
L(M ) = {w ∈ Σ∗ ∣ δ ∗ (q0 , w) ∩ F =
∅}
This means that a string w is accepted by an NFA if there exists some sequence of transitions starting from q0, possibly one of many, that ends in an accepting state.
Despite the non-deterministic nature of NFAs, every NFA has an equivalent DFA. This equivalence is crucial because it shows
that NFAs and DFAs recognize exactly the same class of languages, i.e., regular languages.
Given an NFA N = (Q, Σ, δ, q0 , F ), we can construct a deterministic finite automaton (DFA) D = (Q′ , Σ, δ ′ , q0′ , F ′ ) that
recognizes the same language. The construction involves the subset construction or powerset construction algorithm.
States of the DFA: The states of the DFA correspond to subsets of the states of the NFA. If the NFA has ∣Q∣ states, the
DFA will have at most 2∣Q∣ states.
Initial state of the DFA: The initial state of the DFA is the set of NFA states that can be reached from the initial state q0 without consuming any input (for an NFA without epsilon-transitions this is simply {q0}; otherwise it is the epsilon-closure of q0).
Transition function of the DFA: For each state in the DFA, we determine the possible states it can transition to for each
input symbol. This is done by considering all possible NFA states the NFA can transition to for the input symbol and then
taking the epsilon closure of the resulting set of states.
Accepting states of the DFA: The accepting states of the DFA correspond to any subset of NFA states that contains at
least one accepting state of the NFA.
This construction shows that for every NFA, there exists an equivalent DFA, and both automata recognize the same
language. However, the number of states in the resulting DFA can grow exponentially compared to the NFA, which is a key
difference between the two models.
While the subset construction guarantees an equivalent DFA for any NFA, it can sometimes lead to a large number of states
in the resulting DFA, even if the NFA itself is relatively simple. This exponential growth in the number of states is often
referred to as the "state explosion problem."
Formal Example:
Consider the following NFA N with states Q = {q0 , q1 }, alphabet Σ = {a}, initial state q0 , and accepting state F = {q1 }.
δ(q0 , a) = {q0 , q1 }
δ(q1 , a) = {q1 }
This NFA accepts strings that contain at least one 'a'. Now, let's construct the equivalent DFA using the subset construction
method:
1. The DFA will have states corresponding to subsets of Q, i.e., the power set of Q, which is {∅, {q0 }, {q1 }, {q0 , q1 }}.
4. The DFA will have four states, and it will need to process all these states for every input symbol. This results in
exponential growth in the number of states, which is inefficient.
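The subset construction itself can be sketched in a few lines of Python (helper names are illustrative). Note that the sketch only generates subsets reachable from the start set, so for this example it produces just two DFA states, {q0} and {q0, q1}, rather than all four members of the power set.

```python
from itertools import chain

# Subset (powerset) construction for an NFA without epsilon-transitions (sketch).
def nfa_to_dfa(nfa_delta, nfa_start, nfa_accepting, alphabet):
    start = frozenset({nfa_start})
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for symbol in alphabet:
            target = frozenset(chain.from_iterable(
                nfa_delta.get((q, symbol), set()) for q in subset))
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {s for s in seen if s & nfa_accepting}
    return dfa_delta, start, accepting

nfa_delta = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): {"q1"}}
dfa_delta, start, accepting = nfa_to_dfa(nfa_delta, "q0", {"q1"}, {"a"})
print(len({s for (s, _) in dfa_delta}))  # 2 reachable DFA states: {q0} and {q0, q1}
```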
Conclusion:
In this lecture, we introduced Non-Deterministic Finite Automata (NFA), defined them formally, and discussed how they
process strings. We extended the transition function to handle entire strings and defined the language of an NFA. The
equivalence of DFAs and NFAs was formally established through the subset construction, which converts an NFA into an
equivalent DFA. However, we also highlighted the "state explosion" problem, where the number of states in the resulting
DFA can grow exponentially, leading to inefficiency in certain cases.
Finding a string (or substring) within a larger text is a common problem in computer science. Given a pattern P and a text
T , the goal is to determine whether P appears in T . This problem is crucial in text processing, search engines, and many
other applications that require string matching.
Basic Approach:
For each character in the text, we check if it matches the first character of the pattern. If it matches, we continue
checking the subsequent characters of the text to see if the full pattern appears.
If the full pattern matches, we have found the string; otherwise, we continue scanning the text.
However, a more efficient approach to string searching can be achieved using automata.
To efficiently perform string searching, we can build an automaton that recognizes the target string as a pattern. The
automaton will transition through its states as we scan the text, and when we reach an accepting state, we will know that
the pattern has been found.
A Non-Deterministic Finite Automaton (NFA) can be constructed to search for a string in a text. This NFA will simulate the
process of matching a given pattern by making transitions through multiple possible states as it reads each symbol in the
text.
Consider searching for the string "ab" in a text. We can build an NFA for this search task as follows:
States: Q = {q0 , q1 , q2 }
Alphabet: Σ = {a, b}
Transition Function δ :
δ(q0, a) = {q1} (After reading 'a', move to state q1).
δ(q1, b) = {q2} (After reading 'b' while in q1, the pattern "ab" has been matched; q2 is the accepting state).
δ(q2, a) = {q1} (After matching "ab", restart the matching process by reading 'a').
Initial State: q0
How it works:
At each state, the NFA checks the current character of the text and decides whether to transition to another state. If the
automaton reaches the accepting state q2 , we have successfully matched the pattern.
As the NFA is non-deterministic, it can try multiple possible paths in parallel, enabling it to handle situations where the
pattern partially matches at multiple locations in the text.
Example Execution:
Continue reading the text, re-entering q1 on 'a', and transitioning back to q2 on 'b', accepting another occurrence of
"ab".
Now, we will focus on using a Deterministic Finite Automaton (DFA) to recognize a set of keywords. This is a more efficient
approach compared to NFAs when we need to recognize multiple patterns simultaneously in the text.
Given a set of keywords {P1 , P2 , … , Pn }, the goal is to construct a DFA that can recognize any of these keywords in a
given text. This can be done using a generalized DFA where each state represents the progress made in matching any of
the keywords.
Consider the keywords "ab" and "bc". The DFA needs to be designed to transition between states that represent the
progress of matching any of these two patterns. Here’s how we can approach the construction:
States: The DFA has a state for each possible combination of progress through both keywords.
Alphabet: Σ = {a, b, c}
Transition Function δ:
δ(q0, a) = q1 and δ(q0, b) = q3 (begin matching "ab" or "bc" respectively),
δ(q1, b) = q2 (the keyword "ab" has been matched),
δ(q3, c) = q4 (the keyword "bc" has been matched),
with the remaining transitions returning to the appropriate progress state.
Initial State: q0
How it works:
The DFA processes the text character by character, transitioning between states as it matches characters from either of
the keywords.
If the DFA reaches state q2 or q4 , the text contains one of the keywords and the string is accepted.
Example Execution:
Continue reading the text, re-entering q3 on 'b', and transitioning to q4 on 'c', accepting the keyword "bc".
Conclusion:
In this lecture, we explored the application of automata theory in text searching. We discussed how an NFA can be
constructed to search for a single string or pattern in a text, and how a DFA can be used to recognize a set of keywords
efficiently. Both methods are powerful tools in text processing, and automata theory provides a formal foundation for
understanding and implementing text search algorithms, which are crucial in fields such as information retrieval, document
indexing, and web searching.
1. Use of Epsilon-Transitions
Epsilon-transitions (ε-transitions) are transitions in an automaton that do not require the consumption of any input
symbol. Instead, these transitions allow the automaton to "move" between states without reading any characters from the
input string. This feature gives the automaton more flexibility in recognizing languages and constructing more compact
representations of regular languages.
Key Points:
Non-consumptive: Epsilon-transitions do not consume any input symbol, which means that they allow the automaton
to jump between states without advancing in the input string.
Generalization: Epsilon-transitions generalize the idea of non-determinism, since an automaton can "choose" to take an
epsilon-transition at any point in its execution.
Epsilon-transitions are especially useful in the construction of automata for certain regular languages, and they make the
automaton more efficient by reducing the number of states and transitions required.
An epsilon-NFA (ε-NFA) is a type of Non-Deterministic Finite Automaton (NFA) that allows epsilon-transitions. It is formally
defined as a 5-tuple:
N = (Q, Σ, δ, q0 , F )
Where:
Q is a finite set of states, Σ is the input alphabet, q0 ∈ Q is the start state, and F ⊆ Q is the set of accepting states.
Transition Function:
δ : Q × (Σ ∪ {ϵ}) → 2^Q, i.e., on an input symbol or on ϵ the automaton may move to any member of a set of states.
The use of epsilon-transitions adds additional complexity to the transition function, as it introduces the possibility of
reaching a state without consuming an input symbol.
3. Epsilon-Closures
The epsilon-closure of a state q in an epsilon-NFA, denoted as ϵ-closure(q), is the set of states that can be reached from q
by taking epsilon-transitions, including q itself.
In other words, the epsilon-closure of a state is the set of all states that can be reached from q by following epsilon-
transitions alone. This closure is important for computing the next states during the processing of input strings, as we need
to account for all possible states that can be reached without consuming symbols.
Example:
Consider an ε-NFA with the transitions:
δ(q0, a) = {q1}
δ(q1, ϵ) = {q2}
δ(q2, b) = {q3}
Here ϵ-closure(q1) = {q1, q2}, since q2 is reachable from q1 by an epsilon-transition, while ϵ-closure(q0) = {q0} and ϵ-closure(q2) = {q2}.
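Computing an epsilon-closure is just a reachability search over the epsilon-edges. Below is a small Python sketch for the example above; the helper name and the use of the empty string as the epsilon label are illustrative.

```python
EPSILON = ""  # the empty string stands in for the epsilon label

delta = {
    ("q0", "a"): {"q1"},
    ("q1", EPSILON): {"q2"},
    ("q2", "b"): {"q3"},
}

def epsilon_closure(states, delta):
    """All states reachable from `states` via epsilon-transitions alone."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPSILON), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure

print(epsilon_closure({"q1"}, delta))  # {'q1', 'q2'}
print(epsilon_closure({"q0"}, delta))  # {'q0'}
```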
The extended transition function δ ∗ for an epsilon-NFA is an extension of the transition function that handles strings of
arbitrary length, including the epsilon transition.
δ ∗ (q, ϵ) = ϵ-closure(q) (the epsilon-closure of state q is the set of states that can be reached with epsilon-transitions).
For a non-empty string w = a1 a2 ⋯ an , the extended transition function is defined recursively:
δ*(q, a1 a2 ⋯ an) = ⋃_{p ∈ δ(q, a1)} δ*(p, a2 ⋯ an)
In other words, δ ∗ (q, w) is the set of states reachable from q after processing the string w , including the epsilon-closures
for each state along the way.
Language of an ε-NFA:
The language of an epsilon-NFA N = (Q, Σ, δ, q0 , F ), denoted L(N ), is the set of strings that lead the automaton from
the initial state q0 to any accepting state in F , while possibly passing through epsilon-transitions.
Formally:
L(N) = {w ∈ Σ* ∣ δ*(q0, w) ∩ F ≠ ∅}
This means that a string w is accepted by an epsilon-NFA if, after processing w , the automaton can reach at least one of the
accepting states, possibly using epsilon-transitions along the way.
5. Eliminating Epsilon-Transitions
While epsilon-transitions can simplify the construction of automata, they can also complicate the processing of strings. To
convert an epsilon-NFA into an equivalent NFA without epsilon-transitions, we can use an algorithm called epsilon
elimination.
1. Compute the Epsilon-Closure: For each state q , compute the epsilon-closure ϵ-closure(q).
2. Update the Transition Function: For each state q and input symbol a, update the transition function δ by adding
transitions from the epsilon-closure of q to the epsilon-closures of the states reached by the symbol a.
3. Handle Epsilon-Closure of Accepting States: If a state in the epsilon-closure of a state is accepting, mark the
corresponding state as accepting in the new automaton.
4. Create the New NFA: After the transition function has been updated for all states, construct the new NFA that no longer
contains epsilon-transitions.
Example:
δ(q0 , a) = {q1 }
δ(q1 , ϵ) = {q2 }
δ(q2 , b) = {q3 }
In the new NFA, reading 'a' from q0 leads to both q1 and q2 (the epsilon-closure of q1), and reading 'b' from q1 leads directly to q3, since q1 reaches q2 by an epsilon-transition and δ(q2, b) = {q3}.
Thus, the new transition function eliminates the epsilon transitions, resulting in an equivalent NFA without epsilon-
transitions.
Conclusion:
In this lecture, we explored Finite Automata with Epsilon-Transitions (ε-NFAs). We discussed how epsilon-transitions work,
how to compute epsilon-closures, and how to extend the transition function to handle strings. We also examined the
process of eliminating epsilon-transitions to convert an ε-NFA into an NFA, which is a crucial step in making automata
easier to process and implement. The ability to handle epsilon-transitions extends the power and flexibility of finite
automata, but eliminating them often simplifies the automaton for practical use.
Regular expressions use a set of operators to define patterns in strings. These operators allow for the specification of
complex string patterns, such as repetitions, choices, and groupings.
Key Operators:
1. Concatenation:
This operator specifies that two patterns must appear consecutively in the string.
Example: The regular expression ab matches the string "ab", where 'a' is followed by 'b'.
2. Union (Alternation):
Represented by the vertical bar | , it specifies that either one pattern or another pattern can appear.
Example: The regular expression a|b matches either "a" or "b". The expression abc|def matches either "abc" or
"def".
3. Kleene Star (Closure):
Represented by the asterisk * , it matches zero or more occurrences of the preceding pattern.
Example: The regular expression a* matches the empty string "" , "a", "aa", "aaa", and so on.
4. Plus (One or More):
Represented by the plus sign + , it matches one or more occurrences of the preceding pattern.
Example: The regular expression a+ matches "a", "aa", "aaa", but does not match the empty string "" .
5. Question Mark (Optional):
Represented by the question mark ? , it matches zero or one occurrence of the preceding pattern.
Example: The regular expression a? matches the empty string "" or "a".
6. Character Classes:
Character classes define a set of characters that a single character can be matched against. They are enclosed in
square brackets [] .
Example: The regular expression [a-z] matches any lowercase letter from 'a' to 'z'. The expression [0-9] matches
any digit.
7. Negated Character Classes:
A negated character class, represented by [^...] , matches any character that is not in the specified set.
Example: The regular expression [^0-9] matches any character that is not a digit.
8. Anchors:
The caret ^ matches the start of the string, and the dollar sign $ matches the end of the string.
Example: The regular expression ^a matches "a" at the start of the string, and the expression b$ matches "b" at
the end of the string.
9. Grouping:
Parentheses () group a sub-expression so that operators apply to the group as a whole.
Example: The regular expression (ab)+ matches one or more occurrences of "ab".
10. Escaping:
Special characters can be escaped using a backslash \ to match the literal character.
Example: The regular expression \. matches a literal period (dot), whereas . without the backslash matches any
character.
2. Building Regular Expressions
Regular expressions can be built incrementally by combining operators to form complex patterns. Let's go through some
examples to understand how different components are used to construct regular expressions.
We want to create a regular expression that matches a phone number in the format (xxx) xxx-xxxx , where x is a digit.
Step 1: Match the opening parenthesis ( : This is just a literal character, so we use \( .
Step 2: Match three digits: This can be done with the character class [0-9] repeated three times: [0-9]{3} .
Step 3: Match the closing parenthesis ) : Again, this is just a literal character, so we use \) .
Step 4: Match a space: this is a literal space character.
Step 5: Match three more digits: [0-9]{3} .
Step 6: Match a hyphen: the literal character - .
Step 7: Match the final four digits: [0-9]{4} .
Putting the pieces together (using \d as a shorthand for [0-9]):
^\(\d{3}\) \d{3}-\d{4}$
An email address typically consists of a local part, an "@" symbol, and a domain part. The domain part can be a string with
periods separating the parts (e.g., example.com ).
Step 1: Match the local part, which can include alphanumeric characters and some special symbols: [a-zA-Z0-
9._%+-]+ .
Step 2: Match the "@" symbol: the literal character @ .
Step 3: Match the domain part, which consists of alphanumeric characters, periods, and hyphens: [a-zA-Z0-9.-]+ .
Step 4: Match the top-level domain: a literal dot followed by two or more letters: \.[a-zA-Z]{2,} .
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
A URL typically has a protocol (e.g., http or https ), followed by :// , and then a domain name and possibly a path.
Step 1: Match the protocol: https? (the 's' is optional).
Step 2: Match the separator: the literal characters :// .
Step 3: Match the domain name, which consists of alphanumeric characters and dots: [a-zA-Z0-9.-]+ .
Step 4: Optionally, match a path: (/.*)? .
^https?://[a-zA-Z0-9.-]+(/.*)?$
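The three patterns can be exercised with Python's re module; re.fullmatch already anchors the match to the whole string, so the ^ and $ anchors may be omitted there. The sample values below are illustrative.

```python
import re

phone = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
email = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
url = re.compile(r"https?://[a-zA-Z0-9.-]+(/.*)?")

print(bool(phone.fullmatch("(555) 123-4567")))          # True
print(bool(phone.fullmatch("555-123-4567")))             # False: no parentheses
print(bool(email.fullmatch("user@example.com")))         # True
print(bool(url.fullmatch("https://example.com/notes")))  # True
```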
3. Precedence of Regular Expression Operators
Regular expression operators have a defined precedence, which determines the order in which they are applied in the
absence of parentheses. The standard precedence is as follows (from highest to lowest):
1. Parentheses () : Used for grouping and capturing sub-expressions. Expressions inside parentheses are evaluated first.
2. Kleene Star * , Plus + , and Question Mark ? : These operators are applied to the preceding sub-expression and have
the second-highest precedence.
3. Concatenation (implicit): The concatenation operator is applied after Kleene star, plus, and question mark, but before
alternation.
4. Union/Alternation | : The alternation operator has the lowest precedence.
Consider the expression a|b* . Because the alternation operator ( | ) has lower precedence than the Kleene star ( * ), the expression is interpreted as a | (b*) . Thus, it matches either "a" or zero or more occurrences of 'b'.
If we wanted to change the meaning of this expression, we could add parentheses, like (a|b)* , which would match zero or
more occurrences of "a" or "b".
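The effect of precedence is easy to observe with Python's re module (a quick illustrative check; re uses the same precedence rules for these operators):

```python
import re

# "a|b*" groups as "a|(b*)"; "(a|b)*" repeats the whole alternation.
print(bool(re.fullmatch(r"a|b*", "bbb")))   # True: matched by the b* branch
print(bool(re.fullmatch(r"a|b*", "ab")))    # False: neither "a" nor a run of b's
print(bool(re.fullmatch(r"(a|b)*", "ab")))  # True: zero or more a's or b's
```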
Conclusion:
In this lecture, we have covered Regular Expressions—a vital tool for pattern matching and string manipulation. We
discussed the basic operators used in regular expressions, how to construct regular expressions through examples, and the
importance of operator precedence. Regular expressions are essential for many text processing tasks, such as searching,
replacing, and validating strings, and understanding how to construct and interpret them is crucial for working with
patterns in strings efficiently.
1. From DFA to RE: We will explore the formal proof for converting a Deterministic Finite Automaton (DFA) to a regular
expression.
2. Converting DFA to RE by State Elimination: This method is a systematic procedure to convert a DFA into a regular
expression.
3. Converting RE to Automata: We will also explore how to convert a regular expression back into a finite automaton.
These conversions will be discussed through formal proofs and examples to ensure a deep understanding of how finite
automata and regular expressions are equivalent in their ability to recognize regular languages.
1. From DFA to RE (Formal Discussion and Proofs)
The process of converting a Deterministic Finite Automaton (DFA) to a Regular Expression (RE) is based on the idea that
every language accepted by a DFA can be described by a regular expression. We will prove this by showing that for any DFA,
there exists an equivalent regular expression that describes the same language.
We aim to construct a regular expression R such that L(D) = L(R), where L(D) is the language accepted by the DFA.
We can represent the transitions of the DFA in a matrix form, where each element Tij represents the transition
from state i to state j on some input symbol. If there is no direct transition, we represent it as ∅ (the empty expression, meaning no path).
Over time, we can update this matrix to represent all possible paths between states using regular expressions.
We start by initializing the regular expressions for each transition between states. These initial expressions
correspond to the individual symbols in the alphabet that lead from one state to another.
We then iteratively compute the regular expressions for longer paths (using the Kleene star and concatenation)
between states by considering multiple transitions.
The final regular expression is derived by considering all paths from the start state q0 to any accepting state f ∈ F, incorporating any intermediate states and transitions, and combining them using the appropriate operators.
Example:
States: Q = {q0 , q1 }
Alphabet: Σ = {a, b}
Start state: q0
δ(q0 , a) = q0
δ(q0 , b) = q1
δ(q1 , a) = q1
δ(q1 , b) = q1
The regular expression for this DFA can be found as follows (taking q1 as the accepting state):
Starting in q0, the DFA may loop on 'a'; the transition on 'b' then moves it to q1, and once at q1 it can loop on both 'a' and 'b'.
Thus, the language accepted by the DFA is described by the regular expression a*b(a∣b)*.
1. Initial Setup: Begin with the transition table of the DFA. For each transition, assign a regular expression corresponding
to the transition between states.
2. Eliminate States:
Choose a state q (except the start and accepting states) and eliminate it.
For each pair of states p and r that are connected via state q , update the transition between p and r to reflect the
new path that goes through q .
This involves combining the regular expressions for paths that pass through q using union and concatenation
operators.
3. Repeat the process until only the start state and accepting states remain.
4. Final Regular Expression: Once all non-start and non-accepting states are eliminated, the regular expression for the
transition between the start state and accepting states is the final regular expression that describes the language of the
DFA.
Example:
States: Q = {q0 , q1 , q2 }
Alphabet: Σ = {a, b}
Start state: q0
Accepting state: q2
Transition function δ :
δ(q0 , a) = q1 , δ(q0 , b) = q0
δ(q1 , a) = q1 , δ(q1 , b) = q2
δ(q2 , a) = q2 , δ(q2 , b) = q2
1. Eliminate q1:
The path from q0 through q1 to q2 is replaced by the expression a a*b: an 'a' enters q1, q1 may loop on 'a', and a 'b' then reaches q2.
2. Fold in the remaining self-loops:
The start state q0 loops on 'b', contributing a leading b*, and the accepting state q2 loops on 'a' and 'b', contributing a trailing (a∣b)*.
The resulting regular expression between the start state and the accepting state is b* a a* b (a∣b)*, which describes the language of the DFA.
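A derivation like this can be sanity-checked by brute force: simulate the DFA on every short string and compare with Python's re module applied to the derived expression (a quick illustrative check, assuming re's operators behave like the textbook ones for this simple pattern).

```python
import re
from itertools import product

delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q2",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}

def dfa_accepts(word):
    state = "q0"
    for symbol in word:
        state = delta[(state, symbol)]
    return state == "q2"

pattern = re.compile(r"b*aa*b(a|b)*")
for n in range(7):
    for word in map("".join, product("ab", repeat=n)):
        assert dfa_accepts(word) == bool(pattern.fullmatch(word)), word
print("DFA and derived regular expression agree on all strings up to length 6")
```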
3. Converting RE to Automata
The conversion of a regular expression (RE) to a finite automaton (FA) involves constructing an automaton that recognizes
the same language described by the regular expression. This can be done using methods such as the Thompson's
construction for Non-deterministic Finite Automata (NFA).
1. Base Case:
If the regular expression is a single symbol, create a simple NFA with two states: one for the start and one for the
accepting state, with a transition labeled by the symbol.
2. Recursive Case:
For each operator in the regular expression (concatenation, alternation, or Kleene star), build an NFA for that
operator by combining smaller NFAs based on the specific construction rules.
Concatenation: If R1 and R2 are regular expressions, construct an NFA for R1 R2 by linking the accepting state of the NFA for R1 with an epsilon transition to the start state of the NFA for R2.
Alternation: For R1 ∣ R2, add a new start state with epsilon transitions to the start states of the NFAs for R1 and R2, and a new accepting state reachable by epsilon transitions from each of their accepting states.
Kleene Star: For a regular expression R, construct an NFA for R∗ by creating a new start state with an epsilon transition
to both the start state of R and a new accepting state, and adding an epsilon transition from the accepting state of R
back to the start state of R.
Example:
For the regular expression (a∣b)∗ , Thompson's construction produces an NFA that accepts any string consisting of zero or
more 'a's or 'b's.
Conclusion
In this lecture, we explored the connections between Finite Automata and Regular Expressions. We covered the conversion
process from DFA to RE, including formal proofs and examples, as well as the state elimination method for converting a
DFA into a regular expression. Additionally, we discussed how to convert a regular expression to an automaton using
Thompson’s construction. These conversions show the equivalence of regular expressions and finite automata, both of
which describe the class of regular languages. Understanding these conversions is essential for analyzing and constructing
automata and regular expressions.
Lecture 13: Applications of Regular Expressions
In this lecture, we explore practical applications of Regular Expressions (RE), focusing on their usage in Unix systems,
lexical analysis, and pattern matching in text. Regular expressions are powerful tools for searching, manipulating, and
analyzing strings, and they are widely used in both academic and industrial applications.
Functionality: grep searches through text or files for lines that match a specified pattern. It supports both basic and
extended regular expressions.
Syntax:
bash
grep 'pattern' filename
Example: To find all lines containing the word "error" in a log file:
bash
grep "error" logfile.log
You can also use regular expression operators to make searches more sophisticated, such as matching specific patterns
or character classes.
Functionality: sed is used for text transformation and stream editing. It allows for search and replace operations, and
it supports regular expressions to match patterns in the input text.
Syntax:
bash
sed 's/pattern/replacement/' filename
Example: To replace every occurrence of "colour" with "color" throughout a file:
bash
sed 's/colour/color/g' document.txt
Regular expressions in sed allow for advanced text manipulation, such as deleting lines matching a pattern, inserting
text, or modifying specific parts of lines.
Functionality: awk is a powerful text-processing tool that uses regular expressions to pattern match text and perform
actions on it, such as printing selected fields, performing calculations, or reformatting the text.
Syntax:
bash
awk '/pattern/ { action }' filename
Example: To print the second field of each line that contains the word "apple":
bash
awk '/apple/ { print $2 }' data.txt
Regular expressions are integral to awk 's ability to perform complex text manipulation tasks.
2. Lexical Analysis
Lexical analysis is the process of converting a sequence of characters (such as source code or text) into a sequence of
tokens, which are meaningful chunks of information. This process is often the first step in the compilation of programming
languages, but it is also applicable in many other areas, such as natural language processing and data extraction.
Tokenization: Regular expressions are used to define the patterns of valid tokens in a language. For example, in a
programming language, tokens might include keywords, operators, identifiers, and literals, all of which can be
described using regular expressions.
Finite Automata: The process of lexical analysis can be modeled using finite automata. Each regular expression can be
converted into a finite automaton (either deterministic or non-deterministic), which then performs the tokenization by
matching the input against the regular expressions.
Example:
Identifiers: Any sequence of letters and digits starting with a letter (e.g., var1 , x , hello ).
Keyword Regular Expression:
regex
if|else|while
Identifier Regular Expression:
regex
[a-zA-Z][a-zA-Z0-9]*
Number Regular Expression:
regex
[0-9]+
A lexical analyzer would use these regular expressions to scan the input string and extract tokens such as if , var1 , and
123 .
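A minimal tokenizer along these lines can be written with Python's re module; the token names, the ordering (keywords tried before identifiers), and the whitespace-skipping rule are illustrative assumptions.

```python
import re

TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENTIFIER", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (token type, lexeme) pairs, skipping whitespace."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenize("if var1 123")))
# [('KEYWORD', 'if'), ('IDENTIFIER', 'var1'), ('NUMBER', '123')]
```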
Search and Replace: A common use of regular expressions is to find a pattern in a text and replace it with another
string. This is particularly useful for tasks such as correcting errors, formatting text, or performing batch modifications
on documents.
Pattern Matching: Regular expressions allow for the detection of complex patterns, such as email addresses, phone
numbers, or dates, within large bodies of text. These patterns can be used for data extraction or validation.
Example:
Consider the task of extracting email addresses from a block of text. A regular expression for matching most common email
formats could be:
regex
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
This regular expression matches email addresses with the following components: a local part ([a-zA-Z0-9._%+-]+), the '@' symbol, a domain name ([a-zA-Z0-9.-]+), and a top-level domain of at least two letters (\.[a-zA-Z]{2,}).
By using this regular expression in a text search utility, we can extract all email addresses from the text.
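In Python this extraction is a one-liner with re.findall (the sample text is illustrative):

```python
import re

text = "Contact alice@example.com or bob.smith@mail.example.org for details."
pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
print(re.findall(pattern, text))
# ['alice@example.com', 'bob.smith@mail.example.org']
```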
Regular expressions are also used in more advanced scenarios, such as searching for patterns that span multiple lines or
handling cases with optional or repeated text elements. This flexibility is critical for working with real-world text, where the
structure can vary significantly.
Conclusion
Regular expressions have a broad range of applications, from basic text processing to complex tasks like lexical analysis and
data mining. In Unix systems, they are essential tools for text search, manipulation, and processing, enabling users to
efficiently work with large datasets and files. In lexical analysis, regular expressions help define the structure of tokens,
playing a crucial role in the parsing of programming languages and other formal grammars. Finally, pattern matching in
text is one of the most widely used applications of regular expressions, enabling users to search, extract, and manipulate
data in powerful ways. Understanding these applications equips you with the tools needed for a wide variety of practical
problems in text processing and automation.
In this lecture we examine the principal algebraic laws of regular expressions:
1. Associativity and Commutativity
2. Distributive Laws
3. Idempotent Law
4. Closure Laws
1. Associativity and Commutativity
a) Associativity
The associative property of regular expressions refers to the grouping of operations in a regular expression. The grouping
does not affect the result of the operation when combined with other operations.
Formal Statement:
Concatenation:
(R1 ⋅ R2) ⋅ R3 = R1 ⋅ (R2 ⋅ R3)
The order in which concatenation is grouped does not affect the result.
Union:
(R1 ∪ R2) ∪ R3 = R1 ∪ (R2 ∪ R3)
The grouping of operands in a union does not change the resulting language.
Example:
(a ⋅ (b ⋅ c)) = (a ⋅ b ⋅ c) = ((a ⋅ b) ⋅ c)
Union example: Let R1 = a, R2 = b, and R3 = c. Then:
(a ∪ (b ∪ c)) = (a ∪ b ∪ c)
b) Commutativity
The commutative property states that the order of the two operands does not affect the result. For regular expressions this holds for union (concatenation, in general, is not commutative).
Formal Statement:
Union:
(R1 ∪ R2 ) = (R2 ∪ R1 )
The order of the operands in a union operation does not change the resulting language.
Example:
(a ∪ b) = (b ∪ a)
2. Distributive Laws
The distributive property involves the distribution of one operation over another. This property is essential in simplifying
regular expressions and is frequently used in the manipulation and optimization of patterns.
Formal Statement:
Left distribution: R1 ⋅ (R2 ∪ R3) = (R1 ⋅ R2) ∪ (R1 ⋅ R3)
Right distribution: (R1 ∪ R2) ⋅ R3 = (R1 ⋅ R3) ∪ (R2 ⋅ R3)
Example:
Let R1 = a, R2 = b, and R3 = c. Then a ⋅ (b ∪ c) = (a ⋅ b) ∪ (a ⋅ c): both sides describe the language {ab, ac}.
3. Idempotent Law
The idempotent law states that applying the same operation multiple times does not change the result. This is useful for
simplifying regular expressions and eliminating redundant parts of a pattern.
Formal Statement:
Union:
(R ∪ R) = R
Concatenation is not idempotent: in general, (R ⋅ R) ≠ R.
Example:
Let R = a.
For union:
(a ∪ a) = a
For concatenation, idempotence fails: a ⋅ a = aa, which is a different language from a.
4. Closure Laws
The closure laws apply specifically to the Kleene star operation, which denotes zero or more repetitions of a pattern.
Formal Statement:
Star of a star:
(R*)* = R*
Repeatedly applying the Kleene star to a regular expression does not change its meaning.
Identity for concatenation:
R ⋅ ϵ = R = ϵ ⋅ R
Concatenating a regular expression with the empty string (ϵ) does not affect the regular expression.
Example:
Let R = a.
For closure:
(a∗ ) = ((a∗ )∗ )
For identity:
(a ⋅ ϵ) = a
Reverse Law: For a regular expression R, the reverse of the language L(R) is the language generated by reversing the
string of each element in L(R). For concatenation, it can be expressed as:
(R1 ⋅ R2)^R = R2^R ⋅ R1^R
This law indicates that the reverse of a concatenated expression is the concatenation of the reverses of the individual expressions, in reverse order.
For example, let R1 = a and R2 = b, so that the language of R1 ⋅ R2 is {ab}. The reverse of this language is {ba}, which is the same as the language of R2^R ⋅ R1^R.
Thus, the reverse law holds for this simple case, and we can generalize it to all regular expressions.
As a second example, suppose R1 ⋅ R2 = a(b∣c)*d. The language of R1 ⋅ R2 is the set of strings that start with 'a', followed by any combination of 'b' and 'c', and ending with 'd'.
The reverse of this language is a set of strings starting with 'd' and followed by any combination of 'b' and 'c', ending
with 'a'.
Thus:
(R1 ⋅ R2)^R = R2^R ⋅ R1^R; for instance, the reverse of the language of a(b∣c)*d is the language of d(b∣c)*a.
Conclusion
In this lecture, we explored several important algebraic laws of regular expressions, including associativity,
commutativity, distributive laws, idempotent law, and closure laws. We also derived a new law for regular expressions,
the reverse law, and tested it through examples. These algebraic properties are foundational for simplifying, optimizing,
and reasoning about regular expressions, and they play a critical role in various applications like text processing, lexical
analysis, and pattern matching. Understanding and applying these laws will improve your ability to work with regular
expressions efficiently.
The Pumping Lemma for Regular Languages provides a necessary condition for a language to be regular. It states that for
any regular language L, there exists a constant p (called the pumping length) such that any string s ∈ L with length
greater than or equal to p can be split into three parts s = xyz with the following properties:
1. ∣xy∣ ≤ p
2. ∣y∣ > 0
3. xy^k z ∈ L for all k ≥ 0
In other words, if L is regular, then any sufficiently long string in L can be decomposed into three parts such that the
middle part (denoted y ) can be repeated any number of times (including zero) while still remaining in L.
The Pumping Lemma itself is proved directly, using the pigeonhole principle; it is then typically applied in proofs by contradiction to show that particular languages are not regular.
Since L is regular, there exists a deterministic finite automaton (DFA) M that recognizes L. Let the number of states in M be p (called the pumping length).
Consider any string s ∈ L with ∣s∣ ≥ p. While reading the first p symbols of s, M passes through p + 1 states (counting the start state), so by the pigeonhole principle some state is visited twice. Let y be the non-empty substring read between the first two visits to that repeated state, x the prefix read before it, and z the rest of s; then ∣xy∣ ≤ p and ∣y∣ > 0.
Because M loops on y, repeating the loop any number of times will result in a string that is still accepted by M. Therefore, xy^k z ∈ L for all k ≥ 0, which proves the pumping lemma.
This proof shows that any string s in a regular language can be decomposed into three parts, and the middle part can be
pumped (repeated) without leaving the language.
Example 1: Prove that L = {a^n b^n ∣ n ≥ 0} is not regular.
Assume L is regular with pumping length p, and choose s = a^p b^p. By the lemma, s can be decomposed as s = xyz with:
1. ∣xy∣ ≤ p,
2. ∣y∣ > 0,
3. xy^k z ∈ L for all k ≥ 0.
The string s = a^p b^p consists of p 'a's followed by p 'b's. The part y must consist only of 'a's, since ∣xy∣ ≤ p, meaning that y = a^i for some i > 0. Pumping with k = 2 gives:
xy^2 z = a^{p+i} b^p
This string a^{p+i} b^p is not in L because the number of 'a's and 'b's is no longer equal. Thus, the pumping lemma is violated, and we have a contradiction.
Since we cannot decompose s in such a way that xy k z ∈ L for all k ≥ 0, we conclude that L is not regular.
Example 2: Prove that L = {a^n b^n c^n ∣ n ≥ 0} is not regular. Assume L is regular with pumping length p and choose s = a^p b^p c^p.
The string s = a^p b^p c^p consists of p 'a's, p 'b's, and p 'c's. Since ∣xy∣ ≤ p, the part y must consist only of 'a's, so y = a^i for some i > 0. Pumping with k = 2 gives:
xy^2 z = a^{p+i} b^p c^p
This string a^{p+i} b^p c^p is not in L because the number of 'a's, 'b's, and 'c's is no longer the same. Thus, the pumping lemma is violated, and we have a contradiction.
Since we cannot decompose s in such a way that xy^k z ∈ L for all k ≥ 0, we conclude that L is not regular.
Example 3: Prove that L = {ww ∣ w ∈ {a, b}*} is not regular. Assume L is regular with pumping length p and choose s = a^p b^p a^p b^p. Now, we decompose s = xyz, where:
1. ∣xy∣ ≤ p,
2. ∣y∣ > 0,
3. xy^k z ∈ L for all k ≥ 0.
The string s = a^p b^p a^p b^p consists of p 'a's, followed by p 'b's, followed by another p 'a's, and then another p 'b's. Since ∣xy∣ ≤ p, the part y must consist of 'a's from the first half of the string, so y = a^i for some i > 0. Pumping with k = 2 gives:
xy^2 z = a^{p+i} b^p a^p b^p
This string is not in L because it no longer has the form ww: the two halves can no longer be equal. Thus, the pumping lemma is violated, and we have a contradiction.
Since we cannot decompose s in such a way that xy k z ∈ L for all k ≥ 0, we conclude that L is not regular.
Conclusion
In this lecture, we discussed the Pumping Lemma for Regular Languages, which provides a powerful tool for proving that
certain languages are not regular. By assuming a language is regular and showing that no decomposition of a sufficiently
long string satisfies the pumping lemma, we can demonstrate that the language cannot be recognized by a finite
automaton. Through several examples, we applied the Pumping Lemma to prove that languages such as {a^n b^n ∣ n ≥ 0}, {a^n b^n c^n ∣ n ≥ 0}, and {ww ∣ w ∈ {a, b}*} are not regular.
Each of these operations will be discussed with detailed proofs demonstrating that the resulting languages remain regular.
Union:
Regular languages are closed under union, meaning that if L1 and L2 are regular languages, then L1 ∪ L2 is also regular.
We will show this through a construction of a nondeterministic finite automaton (NFA) for the union of two regular
languages.
Proof:
Let L1 and L2 be regular languages, and let M1 = (Q1, Σ, δ1, q10, F1) and M2 = (Q2, Σ, δ2, q20, F2) be NFAs recognizing them. Construct an NFA M = (Q, Σ, δ, q0, F) with a new start state q0, where:
Q = Q1 ∪ Q2 ∪ {q0},
F = F1 ∪ F2 (the accepting states are the union of the accepting states of M1 and M2),
δ(q0, ϵ) = {q10, q20} (an epsilon-transition makes a nondeterministic choice to start either in M1 or M2),
δ1 and δ2 are used for transitions within the individual NFAs M1 and M2.
Since an NFA recognizing the union of L1 and L2 can be constructed, the union of regular languages is regular.
Intersection:
Regular languages are closed under intersection. We will prove this closure property by showing that the intersection of
two regular languages can be recognized by a deterministic finite automaton (DFA).
Proof:
Let M1 = (Q1, Σ, δ1, q10, F1) and M2 = (Q2, Σ, δ2, q20, F2) be DFAs recognizing L1 and L2. Construct the product DFA M = (Q1 × Q2, Σ, δ, q0, F), where:
δ((p, q), a) = (δ1(p, a), δ2(q, a)) (both machines run in parallel on the same input),
q0 = (q10, q20) (the initial state is the pair of initial states of M1 and M2),
F = F1 × F2 (the accepting states are the pairs of accepting states from M1 and M2).
Since a DFA recognizing the intersection of L1 and L2 can be constructed, the intersection of regular languages is regular.
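The product construction translates almost literally into code. Below is a sketch (function and state names are illustrative; the two toy DFAs track "even number of a's" and "ends in b"):

```python
from itertools import product

def intersect_dfas(dfa1, dfa2, alphabet):
    """Each DFA is (states, delta, start, accepting); returns the product DFA."""
    q1, d1, s1, f1 = dfa1
    q2, d2, s2, f2 = dfa2
    states = set(product(q1, q2))
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in alphabet}
    return states, delta, (s1, s2), set(product(f1, f2))

even_a = ({"e", "o"},
          {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
          "e", {"e"})
ends_b = ({"x", "y"},
          {("x", "a"): "x", ("y", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"},
          "x", {"y"})

states, delta, start, accepting = intersect_dfas(even_a, ends_b, {"a", "b"})
print(len(states), start, accepting)  # 4 ('e', 'x') {('e', 'y')}
```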
Complement:
Regular languages are closed under complement. We will show that if a language is regular, its complement is also regular
by using the construction of a DFA and applying the complement operation on its accepting states.
Proof:
Let M = (Q, Σ, δ, q0, F) be a DFA recognizing L.
To construct a DFA that recognizes the complement of L, we can use the same DFA M but change the set of accepting states. The new DFA M′ = (Q, Σ, δ, q0, Q ∖ F) recognizes the complement of L: it accepts exactly those strings that M rejects.
Since the new DFA recognizes the complement of L, regular languages are closed under complement.
Reversal:
Regular languages are closed under reversal.
Proof:
Given an NFA (or DFA) M recognizing L, construct an NFA for L^R by reversing the direction of every transition, making the original accepting states the possible start states (for instance, by adding a new start state with epsilon-transitions to them), and making the original start state the unique accepting state. This automaton accepts exactly the reversals of the strings accepted by M.
Since an NFA recognizing the reverse of L can be constructed, the reversal of a regular language is regular.
Homomorphism:
A homomorphism h is a mapping h : Σ* → Γ* that replaces each symbol of Σ with a string over some alphabet Γ. If w = a1 a2 … an is a string over Σ, then h(w) = h(a1) h(a2) … h(an).
Proof:
Given a regular expression R describing L, replace every occurrence of each symbol a ∈ Σ in R by a regular expression describing h(a); the result is a regular expression describing h(L).
Since we can construct a regular expression (and from it a DFA) recognizing h(L), regular languages are closed under homomorphisms.
Inverse Homomorphism:
Regular languages are closed under inverse homomorphisms. We define the inverse homomorphism h−1 (L) as the set of strings w ∈ Σ∗ such that h(w) ∈ L.
Proof:
Let M = (Q, Γ, δ, q0 , F ) be a DFA recognizing L. Construct a DFA M ′ = (Q, Σ, δ ′ , q0 , F ) in which δ ′ (q, a) is the state that M reaches from q after reading the entire string h(a). In other words, on each input symbol a, M ′ simulates M on h(a), so M ′ accepts w exactly when h(w) ∈ L.
Since an automaton recognizing h−1 (L) can be constructed, regular languages are closed under inverse homomorphisms.
Conclusion
In this lecture, we explored several closure properties of regular languages. Specifically, we demonstrated that regular
languages are closed under the following operations:
Union,
Intersection,
Complement,
Reversal,
Homomorphism,
Inverse homomorphism.
For each property, we provided detailed proofs that show how regular languages remain regular after applying these
operations. These closure properties are essential tools for analyzing and manipulating regular languages in automata
theory.
1. Converting between different representations of regular languages (such as from automata to regular expressions),
2. Testing emptiness of a regular language,
3. Testing membership of a string in a regular language.
Each of these properties plays a key role in the analysis and manipulation of regular languages in automata theory.
Recall that a regular language can be represented in several equivalent ways:
Finite Automata (DFAs or NFAs),
Regular Expressions,
Regular Grammars.
One of the fundamental decision problems is determining how to convert between these different representations.
Specifically, we will focus on conversions from finite automata to regular expressions, since this is a commonly
encountered task.
We will now discuss how to convert a deterministic finite automaton (DFA) to a regular expression (RE).
Proof/Procedure:
1. Setup: Let M = (Q, Σ, δ, q0 , F ) be the given DFA.
2. Goal: We aim to find a regular expression R such that R represents the language recognized by M , i.e., L(M ) = L(R).
3. State Elimination Method: One of the most common methods to convert a DFA to a regular expression is state
elimination. In this approach, we eliminate states from the DFA one by one while updating the transition relations to
reflect the removal of each state. The process can be outlined as follows:
Start with the original DFA.
For each state q ∈ Q, for every pair of states p, r ∈ Q, replace the transitions in the automaton with regular
expressions.
Eliminate states one by one by updating the transitions for all pairs of states that could be affected by the removal.
After all states are eliminated (except for the initial and final states), the regular expression corresponding to the
language accepted by the DFA is formed.
4. Example:
States: Q = {q0 , q1 , q2 },
Start state: q0 ,
After applying state elimination, the DFA is converted into a regular expression describing its language.
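The state-elimination procedure can also be sketched in code. The following Python function is a rough illustration under simplifying assumptions (a total transition table, edge labels kept as plain strings, no simplification of the resulting expression); it is not an optimized or canonical implementation.
python
def dfa_to_regex(states, delta, start, accepting):
    """Convert a DFA to a regular expression by eliminating states one at a time."""
    edge = {}                                   # edges of a generalized NFA, labelled by regexes
    def add(p, q, r):
        edge[(p, q)] = r if (p, q) not in edge else f"({edge[(p, q)]}|{r})"
    add("START", start, "")                     # epsilon edge into the original start state
    for q in accepting:
        add(q, "ACCEPT", "")                    # epsilon edges out of the accepting states
    for (p, a), q in delta.items():
        add(p, q, a)
    for q in states:                            # eliminate the original states one by one
        loop = f"({edge[(q, q)]})*" if (q, q) in edge else ""
        ins = [(p, r) for (p, t), r in edge.items() if t == q and p != q]
        outs = [(t, r) for (s, t), r in edge.items() if s == q and t != q]
        for p, r_in in ins:
            for t, r_out in outs:
                add(p, t, f"{r_in}{loop}{r_out}")
        edge = {k: r for k, r in edge.items() if q not in k}
    return edge.get(("START", "ACCEPT"), "∅")   # empty language if no edge survives

# A two-state DFA over {a} accepting the strings of odd length:
print(dfa_to_regex({"q0", "q1"}, {("q0", "a"): "q1", ("q1", "a"): "q0"}, "q0", {"q1"}))
# prints an expression equivalent to a(aa)*, e.g. a(aa)* or (aa)*a depending on elimination order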
The reverse process is also possible: converting a regular expression to a DFA. The process involves:
Converting the regular expression to an NFA using standard construction methods (e.g., Thompson's construction),
Converting the resulting NFA to a DFA using the subset construction algorithm.
Definition:
The emptiness problem for a regular language asks whether the language L recognized by an automaton M is empty, i.e.,
L(M ) = ∅.
Procedure:
Given a DFA M = (Q, Σ, δ, q0 , F ), we want to determine whether L(M ) = ∅. The basic idea is to check whether there is
any path from the start state q0 to any accepting state in F . If no such path exists, the language is empty.
1. Reachability Check:
Perform a breadth-first search (BFS) or depth-first search (DFS) starting from the initial state q0 . If any accepting state in F is reachable, then L(M ) ≠ ∅; otherwise, L(M ) = ∅.
2. Example:
Let M = (Q, Σ, δ, q0 , F ) be a DFA with the following components:
States: Q = {q0 , q1 },
Alphabet: Σ = {a},
Transitions: δ(q0 , a) = q1 , δ(q1 , a) = q0 ,
Start state: q0 ,
Accepting states: F = {q1 }.
A BFS or DFS starting from q0 shows that the accepting state q1 is reachable, and hence L(M ) ≠ ∅.
3. Time Complexity: The time complexity of this check is O(∣Q∣ ⋅ ∣Σ∣), proportional to the number of states and transitions of the DFA, as we are simply performing a reachability check.
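As a sketch, the reachability check can be written in a few lines of Python (the transition-table representation below is assumed just for this example):
python
from collections import deque

def is_empty(delta, start, accepting, alphabet):
    """Return True iff no accepting state is reachable from the start state."""
    visited, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False                 # an accepting state is reachable, so L(M) ≠ ∅
        for a in alphabet:
            nxt = delta[(q, a)]
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return True                          # no accepting state is reachable, so L(M) = ∅

# The two-state example above: q0 --a--> q1 --a--> q0, with F = {q1}.
print(is_empty({("q0", "a"): "q1", ("q1", "a"): "q0"}, "q0", {"q1"}, {"a"}))   # False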
Procedure:
Given a DFA M = (Q, Σ, δ, q0 , F ) and a string w = a1 a2 … an ∈ Σ∗ , the task is to determine whether w ∈ L(M ). This is done by simulating the DFA on w :
1. Simulation: Start in the initial state q0 . For each symbol ai in the string w , update the current state to δ(qi−1 , ai ).
After processing the entire string w , check if the current state is in the set of accepting states F .
2. Example:
States: Q = {q0 , q1 },
Alphabet: Σ = {a},
Transitions: δ(q0 , a) = q1 , δ(q1 , a) = q0 ,
Start state: q0 ,
Accepting states: F = {q1 }.
For the input string w = a, the DFA starts at q0 , processes the symbol a, transitions to q1 , and accepts the string because q1 is an accepting state. For w = aa, the DFA transitions back to q0 after processing the second a, and since q0 is not an accepting state, the string is rejected.
3. Time Complexity: The time complexity of testing membership is O(∣w∣), where ∣w∣ is the length of the string. Each
transition is made once for each symbol in the string.
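A minimal Python sketch of this simulation (again with a hypothetical dictionary-based transition table):
python
def accepts(delta, start, accepting, w):
    """Simulate the DFA on w; one transition per symbol, O(|w|) time overall."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

delta = {("q0", "a"): "q1", ("q1", "a"): "q0"}
print(accepts(delta, "q0", {"q1"}, "a"))    # True  (ends in the accepting state q1)
print(accepts(delta, "q0", {"q1"}, "aa"))   # False (ends back in q0)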
Conclusion
In this lecture, we discussed several decision properties of regular languages, including:
1. Converting between different representations of regular languages (e.g., from a DFA to a regular expression),
2. Testing emptiness of a regular language, by checking for reachable accepting states in the DFA,
3. Testing membership of a string in a regular language, by simulating the DFA on the input string.
These decision problems are fundamental tools for working with regular languages, as they allow for efficient analysis and
manipulation of languages represented by automata and regular expressions.
Two states are considered equivalent if, for every string in the alphabet Σ, the automaton transitions to equivalent states
for that string, and ultimately either both accept or both reject the string.
Formal Definition:
Let M = (Q, Σ, δ, q0 , F ) be a DFA. Two states q1 , q2 ∈ Q are equivalent if for all strings w ∈ Σ∗ , the following holds:
δ(q1 , w) ∈ F ⟺ δ(q2 , w) ∈ F
In other words, q1 and q2 are equivalent if, for every string over the alphabet, they either both lead to an accepting state or both lead to a non-accepting state.
1. Table-Filling (Inductive) Method: The standard way to test whether two states are equivalent is to mark distinguishable pairs inductively:
Initialize a table T containing every pair of states (q1 , q2 ), with all pairs initially unmarked.
Mark every pair in which exactly one of the two states is accepting; such pairs are distinguished by the empty string.
Repeatedly mark a pair (q1 , q2 ) if, for some input symbol a, the pair (δ(q1 , a), δ(q2 , a)) is already marked.
When no more pairs can be marked, the unmarked pairs are exactly the pairs of equivalent states. For each marked (non-equivalent) pair, one state eventually leads to an accepting state and the other to a rejecting state for some input string.
Example:
Q = {q0 , q1 , q2 },
Σ = {a, b},
Transitions: δ(q0 , a) = q1 , δ(q0 , b) = q2 , δ(q1 , a) = q1 , δ(q1 , b) = q0 , δ(q2 , a) = q0 , δ(q2 , b) = q2 ,
Start state: q0 ,
Accepting states: F = {q0 }.
To test whether q0 and q1 are equivalent, we compare their behavior on strings:
For w = a, δ(q0 , a) = q1 and δ(q1 , a) = q1 , which are both non-accepting.
For w = b, δ(q0 , b) = q2 and δ(q1 , b) = q0 , where q0 is accepting and q2 is non-accepting. Thus, q0 and q1 are not
equivalent.
We repeat this process for all state pairs, ultimately determining the equivalence of all states in the automaton.
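A compact Python sketch of this marking procedure is shown below, using the example DFA above with F = {q0 }; the dictionary-based representation of the DFA is an assumption made for the illustration.
python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, accepting):
    """Mark state pairs that are distinguishable; the unmarked pairs are equivalent."""
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accepting) != (p[1] in accepting)}    # distinguished by the empty string
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            if any(frozenset((delta[(p, a)], delta[(q, a)])) in marked for a in alphabet):
                marked.add(pair)
                changed = True
    return marked

delta = {("q0", "a"): "q1", ("q0", "b"): "q2",
         ("q1", "a"): "q1", ("q1", "b"): "q0",
         ("q2", "a"): "q0", ("q2", "b"): "q2"}
print(distinguishable_pairs({"q0", "q1", "q2"}, {"a", "b"}, delta, {"q0"}))
# With these particular transitions, all three pairs end up marked, so no two states
# are equivalent and this DFA is already minimal.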
2. Testing Equivalence of Regular Expressions
To determine whether two regular expressions R1 and R2 describe the same language, we use the following procedure:
1. Converting the regular expressions R1 and R2 into their equivalent DFAs M1 and M2 ,
2. Testing the equivalence of the resulting DFAs using the state equivalence procedure outlined above.
If the DFAs derived from R1 and R2 are equivalent, then the original regular expressions R1 and R2 are equivalent as well.
Example:
Let:
R1 = (a ∣ b)* ,
R2 = (a* ∣ b*)* .
Both regular expressions generate the same language, i.e., the set of all strings over Σ = {a, b}. However, if we convert
them to DFAs, we find that the structure of the DFAs may differ, yet they recognize the same language, hence R1 and R2 are
equivalent.
3. Minimization of DFAs
The goal of minimization of a DFA is to reduce the number of states in the DFA while preserving the language it recognizes.
The process eliminates redundant states that do not affect the language accepted by the automaton.
1. States that are equivalent can be merged into a single state in the minimized DFA.
2. After merging equivalent states, the resulting DFA will have fewer states but recognize the same language.
3. Distinguish States:
The key idea is to mark distinguishable and indistinguishable states by examining their behavior on all possible
strings.
Start by marking all pairs of states that are obviously distinguishable (i.e., one is accepting and the other is
rejecting).
Continue by refining the partition of states by considering all transitions for each string in the alphabet.
Example:
States: Q = {q0 , q1 , q2 },
Start state: q0 ,
Thus, the minimized DFA has the following states: {q0 }, {q1 , q2 }.
The minimized DFA has two states and is simpler than the original DFA.
Proof (Intuition):
1. For any regular language L, a DFA must distinguish between strings that belong to different equivalence classes (strings after which different continuations lead to acceptance).
2. The number of these equivalence classes is therefore a lower bound on the number of states of any DFA recognizing L.
3. The minimized DFA achieves this bound by merging all states that are equivalent, leading to the smallest number of states.
4. Therefore, a minimized DFA is optimal in terms of state count and cannot be further reduced without losing its ability to
recognize the language.
This minimality is not just theoretical—it provides a concrete method for efficiently implementing DFAs in practice,
particularly in lexical analysis and text search algorithms.
Conclusion
In this lecture, we thoroughly discussed:
1. Testing equivalence of states in DFAs, which involves checking whether two states behave the same for all input
strings,
2. Testing equivalence of regular expressions, where we convert the expressions into DFAs and check if the DFAs are
equivalent,
3. Minimization of DFAs, a process of reducing the number of states in a DFA while preserving its language recognition
capability,
4. Why the minimized DFA is the most efficient and cannot be further reduced.
These techniques are central to the optimization and analysis of finite automata, and they have important applications in
areas like compiler design and text processing.
Expressions can either be a number or an expression followed by an operator and another expression:
1. A single number (digit) is an expression.
2. If an expression is the result of an addition, it consists of two expressions separated by a "+" symbol.
3. If an expression is the result of multiplication, it consists of two expressions separated by a "*" symbol.
For example, consider the expression
3 + 4 ∗ 5
This should be interpreted according to the precedence of operations, where multiplication takes precedence over
addition.
Formally, a context-free grammar is a 4-tuple G = (V , Σ, R, S), where:
V is a finite set of variables (non-terminal symbols),
Σ is a finite set of terminal symbols, disjoint from V ,
R is a finite set of production rules of the form A → α, with A ∈ V and α ∈ (V ∪ Σ)∗ ,
S ∈ V is the start symbol, the initial variable from which the derivation process begins.
Key Points:
A production rule A → α means that the non-terminal A can be replaced with the string α during derivation.
The language generated by the grammar is the set of strings derived from the start symbol S that consist solely of
terminal symbols.
1. E → E + E,
2. E → E ∗ E,
3. E → (E),
4. E → digit,
S = E is the start symbol.
This grammar allows us to generate expressions like 3 + 4 ∗ 5 by starting with E and applying the appropriate production
rules.
Example of Derivation:
1. Start with E .
2. Apply rule E → E + E , giving E + E .
3. Apply rule E → digit to the first E , giving 3 + E .
4. Apply rule E → E ∗ E to the remaining E , giving 3 + E ∗ E .
5. Apply rule E → digit to the first of the new E 's, giving 3 + 4 ∗ E .
6. Apply rule E → digit to the last E , giving 3 + 4 ∗ 5.
A leftmost derivation is a derivation where, at each step, the leftmost non-terminal is replaced by one of its productions.
Similarly, a rightmost derivation involves replacing the rightmost non-terminal at each step.
Consider the expression 3 + 4 ∗ 5 again. Starting with E , the leftmost derivation proceeds as follows:
1. Start with E ,
2. Apply E → E + E , yielding E + E ,
3. Apply E → digit to the leftmost E , yielding 3 + E ,
4. Apply E → E ∗ E to the remaining E , yielding 3 + E ∗ E ,
5. Apply E → digit to the leftmost remaining E , yielding 3 + 4 ∗ E ,
6. Apply E → digit to the last E , yielding 3 + 4 ∗ 5.
5. Language of a Grammar
The language of a grammar, L(G), is the set of all strings that can be derived from the start symbol S using the production
rules. The language is a subset of the set of terminal symbols Σ∗ .
Example:
For the arithmetic grammar above, the language consists of all valid arithmetic expressions formed by the production rules,
such as:
3 + 4,
5 ∗ 6,
7 + 8 ∗ 9,
(3 + 4) ∗ 5.
6. Sentential Forms
A sentential form is any string of terminals and non-terminals that can be derived from the start symbol S in one or more
derivation steps. A sentential form is not necessarily a string of terminal symbols, but it is a step in the derivation process.
Example:
The sentential form 3 + E appears before the final step,
The sentential form 3 + 4 ∗ 5 is a string of only terminal symbols, and it is a sentential form that is also part of the
language generated by the grammar.
Conclusion
In this lecture, we have:
Introduced Context-Free Grammars (CFGs), providing both an informal example and a formal definition.
Discussed how to perform derivations using CFGs, including both leftmost and rightmost derivations.
Examined the language of a grammar, which is the set of all strings derivable from the start symbol.
Defined sentential forms, which are intermediate strings during the derivation process.
CFGs are foundational for understanding syntax in formal languages and play a crucial role in parsing algorithms for
programming languages.
Each leaf node represents a terminal symbol (or a string of terminal symbols).
The root of the tree represents the start symbol of the grammar.
To construct a parse tree:
1. Begin with the start symbol as the root of the tree.
2. Apply the production rules step by step, replacing non-terminals with their right-hand side (RHS) rules.
For the expression 3 + 4 ∗ 5, we use the grammar:
E →E+E
E →E∗E
E → digit
1. Start with E .
2. Apply E → E + E to get: E + E .
3. For the left E , apply E → digit (since 3 is a digit): E + E becomes 3 + E .
4. For the right E , apply E → E ∗ E (since the second part involves multiplication): 3 + E becomes 3 + (E ∗ E).
5. Now, for each E on the right, apply E → digit for both 4 and 5: 3 + (E ∗ E) becomes 3 + (4 ∗ 5).
mathematica
              E
          /   |   \
         E    +    E
         |       /  |  \
       digit    E   *   E
         |      |       |
         3    digit   digit
                |       |
                4       5
Example:
For the above parse tree of 3 + 4 ∗ 5, the yield is the sequence of terminal symbols (the digits and operators) in the leaves
of the tree, read from left to right:
Yield: 3 + 4 ∗ 5.
This is exactly the string we set out to parse, demonstrating that the parse tree represents the correct derivation of the
string from the grammar.
Derivation is the entire process of applying a series of production rules, starting from the start symbol and producing a
string in the language.
A parse tree represents one possible way of applying rules to derive a string, and it uniquely corresponds to a specific
derivation.
Example:
1. E → E + E,
2. E → digit, yielding 3 + E ,
3. E → E ∗ E , yielding 3 + (E ∗ E),
4. E → digit for both right-hand E 's, yielding 3 + (4 ∗ 5).
An inference where we apply a production A → α becomes a parent node A with children representing the symbols in
α.
Example:
Applying E → E + E creates a node labeled E with three children, labeled E , +, and E .
Then apply E → digit for the left E , resulting in a leaf node labeled "3".
Apply E → E ∗ E for the right E , creating a node with children labeled E , *, and E ; the two new E nodes are then
expanded with E → digit into the leaves "4" and "5".
The entire process is reflected in the parse tree that grows with each rule application.
1. Start at the root and use the production rule that led to the child nodes.
2. Recursively apply the same process to each subtree, tracing back through the tree to find the sequence of production
rules.
Example:
Starting with the parse tree for 3 + 4 ∗ 5, we can reconstruct the derivation as follows:
1. From the root node E , the rule applied was E → E + E.
2. For the left E , the rule applied was E → digit, yielding 3.
3. For the right E , the rule applied was E → E ∗ E , with each E becoming a digit (4 and 5).
4. Thus, the sequence of production rules is:
E → E + E,
E → digit,
E → E ∗ E,
E → digit,
E → digit.
For example, in the derivation of 3 + 4 ∗ 5, the recursive inferences would look like this:
First, we replace E → E + E,
Then we recursively infer E → digit for the left E (yielding 3),
Next, for the right E , we apply E → E ∗ E , recursively inferring digits for both E 's.
Each recursive inference is a step towards fully replacing non-terminals with terminal symbols.
Conclusion
In this lecture, we have:
Defined and illustrated how to construct parse trees, using a step-by-step process.
Discussed the yield of a parse tree, which corresponds to the string derived by the grammar.
Explored the relationship between inferences, derivations, and parse trees, and how these concepts are connected.
Demonstrated how to move from inferences to parse trees, from trees to derivations, and from derivations to
recursive inferences.
Parse trees are crucial for understanding the structure of strings in a language and are foundational in the design of
parsers for programming languages.
1. Parsers
A parser is a component of a compiler or interpreter that analyzes a string of symbols (often program code or data) to
determine its grammatical structure with respect to a given formal grammar. The parser takes an input string and attempts
to build a parse tree that represents the syntactic structure of the string according to the grammar.
Role of CFGs in Parsers: Context-Free Grammars are widely used for defining the syntax of programming languages
and data formats. CFGs provide a formal way of specifying the structure of a language, which is essential for designing
parsers that can check whether a string belongs to the language and how it can be structured.
Types of Parsers:
Top-down Parsers: These start from the start symbol and try to rewrite it into the input string, matching the string
from left to right. Examples include recursive descent parsers.
Bottom-up Parsers: These begin with the input string and attempt to reduce it to the start symbol by applying
productions in reverse. Examples include shift-reduce parsers.
Both types of parsers rely heavily on the grammar of the language being parsed, and CFGs provide the structure
necessary for such parsing algorithms to function.
Example:
E →E+E
E →E∗E
E → (E)
E → digit
A parser for this language would take an expression like 3 + 4 ∗ 5, try to match it against these production rules, and build
a parse tree representing the syntactic structure of the expression. The parser helps ensure that the expression is valid
according to the grammar, and it can also provide a structure for evaluating the expression.
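To illustrate, here is a toy recursive descent parser in Python for this expression language. Because recursive descent cannot use the left-recursive grammar above directly, the sketch is based on the equivalent layered form expr → term ('+' term)*, term → factor ('*' factor)*, factor → digit | '(' expr ')'; the function names and token handling are illustrative assumptions, not a standard implementation.
python
def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r} at position {pos}")
        pos += 1
    def factor():
        tok = peek()
        if tok is not None and tok.isdigit():
            eat(tok)
            return ("digit", tok)
        eat("(")
        node = expr()
        eat(")")
        return node
    def term():                          # '*' binds tighter than '+'
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node
    def expr():
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node
    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

print(parse(list("3+4*5")))
# ('+', ('digit', '3'), ('*', ('digit', '4'), ('digit', '5')))  -- multiplication grouped first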
2. YACC Parser-Generator
YACC (Yet Another Compiler Compiler) is a tool used to generate parsers for context-free grammars. It is one of the most
widely used parser generators for C and C++ compilers.
1. Grammar Specification: A programmer defines the grammar of the language they want to parse, typically using a
BNF (Backus-Naur Form) or a similar notation.
2. YACC Input: The grammar is input into YACC, which automatically generates C code for a parser.
3. Parser Construction: YACC generates a parser that can take an input string and construct a parse tree based on the
provided grammar.
4. Action Code: YACC also allows the inclusion of action code that is executed when a specific rule is applied, allowing
the parser to not only validate but also perform computations or transformations as it parses the input.
Example Usage: A typical YACC specification might define a grammar for arithmetic expressions like the one shown
earlier, and when YACC processes this input, it generates a C program with parsing functions that can evaluate or
analyze such expressions.
YACC in Practice: YACC is typically used in combination with lex (a lexical analyzer generator) to build complete
compilers. Lex handles tokenizing the input, while YACC takes care of parsing the structure of the tokens according to
the grammar. The combination allows the construction of parsers for complex languages.
3. Markup Languages
Markup languages use a system of tags to annotate text, typically to define the structure and presentation of documents.
Context-Free Grammars are widely used to define the syntax of these languages.
Defining Structure with CFGs: The syntax of markup languages, such as HTML and XML, can be described by context-
free grammars. For instance, the rules for nested tags in XML documents can be formalized using a CFG that defines
how elements and attributes should be structured and nested.
This grammar captures the essence of how an XML document is structured, with nested elements and content.
DTD (Document Type Definition): DTDs define the structure and rules for XML documents. While DTDs are often
described in terms of regular expressions or simpler context-free grammars, they specify the valid structure of XML
elements, attributes, and their relationships.
For example, a simple DTD for an XML document might define that a document contains a sequence of <book>
elements, each containing a <title> , <author> , and <price> :
xml
<!ELEMENT catalog (book+)>
<!ELEMENT book (title, author, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
Here, the DTD specifies that a catalog contains one or more book elements, and each book contains a title ,
author , and price . This structure can be represented as a context-free grammar, where each rule corresponds to a
non-terminal, and the structure of the tags and content is captured.
XML Schema: An XML Schema is more powerful than a DTD and allows for more precise definition of data types,
constraints, and more complex structures. Though XML Schema is more expressive, it often relies on the underlying
idea of CFGs for defining structural rules.
Example:
xml
<catalog>
<book>
<title>Introduction to Automata Theory</title>
<author>John Doe</author>
<price>39.99</price>
</book>
</catalog>
A parser (such as one generated by YACC) for this XML document would validate that the document follows the structure
defined by the XML grammar or DTD, ensuring that elements are correctly nested and properly ordered.
Conclusion
In this lecture, we have discussed the applications of Context-Free Grammars in several key areas:
1. Parsers: CFGs are essential for designing parsers that validate and process strings in programming languages and data
formats.
2. YACC Parser-Generator: YACC automates the generation of parsers from CFGs, greatly simplifying the process of
building compilers and interpreters.
3. Markup Languages: CFGs are used to define the structure of markup languages like XML, ensuring that documents are
correctly structured and follow the required syntax.
4. XML and DTDs: Context-free grammars provide a formal basis for defining the valid structure of XML documents
through DTDs, enabling validation and parsing.
These applications illustrate the powerful role that context-free grammars play in formal language design, data
representation, and the processing of complex languages in computing.
1. Ambiguous Grammars
A grammar is said to be ambiguous if there exists at least one string in the language it generates for which there are
multiple distinct leftmost or rightmost derivations (or equivalently, multiple distinct parse trees).
Definition: A context-free grammar (CFG) is ambiguous if there is a string w that can be derived in more than one way,
i.e., it has more than one leftmost derivation, rightmost derivation, or parse tree.
S →S+S ∣S×S ∣a
This grammar describes expressions involving addition and multiplication. Now, consider the string w = a + a × a. This
string can be derived in two different ways, leading to different parse trees.
First parse tree (addition applied last, corresponding to a + (a × a)):
css
           S
        /  |  \
       S   +   S
       |     /  |  \
       a    S   ×   S
            |       |
            a       a
Second parse tree (multiplication applied last, corresponding to (a + a) × a):
css
           S
        /  |  \
       S   ×   S
     / | \     |
    S  +  S    a
    |     |
    a     a
As shown, the string a + a × a can be derived in two different ways, with two distinct parse trees. This shows that the
grammar is ambiguous.
Restructuring the Grammar: We can change the structure of the grammar to make the order of operations explicit.
Using Operator Precedence and Associativity: In many cases, the ambiguity arises from operations with different
precedence (e.g., multiplication has higher precedence than addition). By introducing rules that specify the precedence
and associativity, we can resolve ambiguity.
S →S+S ∣S×S ∣a
can be rewritten to avoid ambiguity by ensuring that multiplication has higher precedence than addition. This can be done
by introducing a new non-terminal for terms involving multiplication:
S →S+T ∣T
T →T ×a∣a
Now, consider the string w = a + a × a. This string will always be parsed in a single, unambiguous way:
Derivation:
css
           S
        /  |  \
       S   +   T
       |     /  |  \
       T    T   ×   a
       |    |
       a    a
With this revised grammar, the string a + a × a is always parsed with multiplication taking precedence over addition,
resolving the ambiguity.
Example:
Consider again the ambiguous grammar
S → S + S ∣ S × S ∣ a
and the string a + a × a.
First Leftmost Derivation (addition first):
S ⇒ S + S ⇒ a + S ⇒ a + S × S ⇒ a + a × S ⇒ a + a × a
Second Leftmost Derivation (multiplication first):
S ⇒ S × S ⇒ S + S × S ⇒ a + S × S ⇒ a + a × S ⇒ a + a × a
Thus, we have two different leftmost derivations for the same string, demonstrating the ambiguity in the grammar.
4. Inherent Ambiguity
An inherent ambiguity refers to a situation where no matter how the grammar is rewritten, there will always be ambiguity
for certain strings. Some languages, by their very nature, are inherently ambiguous. Context-free languages can
sometimes have inherent ambiguity, where there is no way to define a grammar that avoids multiple parse trees for certain
strings.
A classic example is the language
L = {a^i b^j c^k ∣ i = j or j = k}.
One natural grammar for L uses one group of rules to match the a's with the b's and another group to match the b's with the c's:
S → XC ∣ AY
X → aXb ∣ ϵ
Y → bYc ∣ ϵ
A → aA ∣ ϵ
C → cC ∣ ϵ
Now, consider a string of the form a^n b^n c^n . Such a string satisfies both conditions (i = j and j = k), so it has two essentially different derivations:
First Derivation (matching a's with b's): S ⇒ XC ⇒ ⋯ ⇒ a^n b^n C ⇒ ⋯ ⇒ a^n b^n c^n
Second Derivation (matching b's with c's): S ⇒ AY ⇒ ⋯ ⇒ a^n Y ⇒ ⋯ ⇒ a^n b^n c^n
Here, for the same string, there are two different parse trees. It can be shown that every grammar for L has this defect: no matter how the grammar is rewritten, some strings will always have more than one parse tree. Hence L is inherently ambiguous. (The language of palindromes, by contrast, is not inherently ambiguous: the grammar S → aSa ∣ bSb ∣ a ∣ b ∣ ϵ gives each palindrome a unique parse tree.)
Conclusion
In this lecture, we have covered the following important topics:
1. Ambiguous Grammars: A grammar is ambiguous if there is a string that can be derived in more than one way,
resulting in different parse trees.
2. Removing Ambiguity in Grammars: Ambiguity can often be resolved by restructuring the grammar and introducing
precedence rules, ensuring that each string has a unique parse tree.
3. Leftmost Derivations: Ambiguity can be expressed through different leftmost derivations of the same string, showing
the multiple ways the grammar can be applied.
4. Inherent Ambiguity: Some context-free languages, such as {a^i b^j c^k ∣ i = j or j = k}, are inherently ambiguous, and there is no way
to define a grammar for them that completely avoids this ambiguity.
The concept of ambiguity is critical in formal language theory, as it affects the ease with which we can process and interpret
strings. Understanding how to deal with ambiguous grammars is an important skill for designing parsers and compilers.
Lecture 23: Pushdown Automata
In this lecture, we will explore Pushdown Automata (PDA), a type of automaton that extends finite automata with the ability
to use a stack for additional memory. We will start with an informal introduction to PDAs, followed by their formal
definition and graphical notation. Finally, we will discuss instantaneous descriptions of PDAs, which are critical in
understanding their operation.
A finite automaton can only remember a limited amount of information (typically about the current state). In contrast, a PDA
can "push" symbols onto a stack and "pop" symbols from the stack, allowing it to recognize patterns that require memory
beyond the current state. This capability makes PDAs suitable for parsing context-free languages, such as programming
languages or arithmetic expressions.
Example:
Consider a language like L = {an bn ∣ n ≥ 0}, which consists of strings with an equal number of a 's followed by b 's. This
language cannot be recognized by a finite automaton because it requires the ability to "remember" how many a 's have
been encountered to match them with b 's later in the string. A PDA can handle this by pushing a 's onto the stack and
popping them when it encounters b 's.
Formally, a PDA is a 7-tuple P = (Q, Σ, Γ, δ, q0 , Z0 , F ), where:
Q is a finite set of states.
Σ is a finite input alphabet (the set of symbols that can appear in the input string).
Γ is a finite stack alphabet (the set of symbols that can appear on the stack).
δ is the transition function, defined as:
δ : Q × (Σ ∪ {ϵ}) × Γ → P(Q × Γ∗ )
This function describes the behavior of the PDA based on the current state, the current input symbol (or an empty input
symbol for epsilon transitions), and the top of the stack.
q0 ∈ Q is the start state.
Z0 ∈ Γ is the initial stack symbol, which marks the bottom of the stack.
F ⊆ Q is the set of accepting (final) states.
It reads an input symbol from the string, updates its state, and modifies the stack (by pushing or popping symbols).
The machine can either process an input symbol, or transition based on the top of the stack (without reading an input
symbol, using an epsilon transition).
The PDA accepts a string if, after reading all the input symbols, it reaches an accepting state and the stack is in an
appropriate configuration.
Make an epsilon transition (move without reading an input symbol) and possibly modify the stack.
This additional power, using a stack, allows the PDA to recognize languages that require a form of memory beyond just the
current state.
Stack Operations: Indicated in the transition labels (e.g., pop a, push b).
4. If the stack is empty and the input is exhausted, the PDA reaches an accepting state.
(A state transition diagram, omitted here, would illustrate this PDA's process of reading the string and manipulating the stack.)
4. Instantaneous Descriptions of Pushdown Automata
An instantaneous description (ID) of a PDA represents the current configuration of the machine at any point in time during
its computation. It consists of three components:
1. Current State: The state the PDA is currently in.
2. Remaining Input: The remaining string to be processed, starting from the current position.
3. Current Stack Contents: The symbols currently on the stack, with the topmost symbol at the front.
⟨q, w, γ⟩
where q is the current state, w is the remaining input, and γ is the current stack contents (with the topmost symbol written first).
Consider the string w = aab and a PDA in state q0 with stack symbol Z0 as the initial stack symbol.
1. The initial ID is: ⟨q0 , aab, Z0 ⟩.
2. After reading the first a and pushing a onto the stack, the ID is: ⟨q0 , ab, aZ0 ⟩.
3. After reading the second a and pushing another a onto the stack, the ID is: ⟨q0 , b, aaZ0 ⟩.
4. After reading the b and popping a from the stack, the ID is: ⟨q0 , ϵ, aZ0 ⟩.
5. Finally, after an epsilon transition that pops the remaining a and moves to the accepting state, the ID is: ⟨qaccept , ϵ, Z0 ⟩.
The instantaneous descriptions provide a step-by-step view of how the PDA processes the string and manipulates the stack.
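The following Python sketch traces the sequence of instantaneous descriptions (state, remaining input, stack) for a simple PDA recognizing {a^n b^n ∣ n ≥ 0}; the specific transition rules and state names are assumptions chosen for this illustration.
python
def trace(w):
    """Print the IDs of a toy PDA for {a^n b^n}; the stack is a list with its top at the end."""
    state, stack = "q0", ["Z0"]
    print((state, w, "".join(reversed(stack))))          # initial ID
    for i, symbol in enumerate(w):
        if symbol == "a" and state == "q0":
            stack.append("a")                            # push an 'a' for every 'a' read
        elif symbol == "b" and stack[-1] == "a":
            state = "q1"                                 # switch to the matching phase
            stack.pop()                                  # pop one 'a' per 'b'
        else:
            print("reject")
            return
        print((state, w[i + 1:], "".join(reversed(stack))))
    if stack == ["Z0"]:
        print(("q_accept", "", "Z0"))                    # epsilon move into the accepting state
        print("accept")
    else:
        print("reject")

trace("aabb")   # prints the IDs step by step, ending in the accepting configuration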
Conclusion
In this lecture, we introduced the Pushdown Automaton (PDA), focusing on the following key points:
1. Informal Introduction: PDAs extend finite automata by using a stack, enabling them to recognize context-free
languages.
2. Formal Definition: A PDA is formally defined as a 7-tuple, with a transition function that includes both input symbols
and stack operations.
3. Graphical Notation: PDAs can be represented visually using state transition diagrams that include stack manipulations.
4. Instantaneous Descriptions: An instantaneous description provides a snapshot of the PDA’s current state, input, and
stack contents at any point during the computation.
Understanding PDAs is essential for parsing context-free languages, which are foundational in the design of compilers and
interpreters.
Lecture 24: Languages of a Pushdown Automaton (PDA)
In this lecture, we will explore the various ways in which a Pushdown Automaton (PDA) can accept strings from its input
alphabet. Specifically, we will focus on two primary methods of acceptance:
1. Acceptance by Final State: A PDA accepts a string if it reaches an accepting (final) state after processing the entire input
string.
2. Acceptance by Empty Stack: A PDA accepts a string if, after processing the entire string, the stack is empty.
Additionally, we will explore the relationship between these two methods of acceptance, particularly how one can be used
to simulate the other.
Formal Definition:
A string w is accepted by the PDA if there exists a sequence of transitions that leads the PDA to an accepting state after
consuming all of w .
Example:
Consider the PDA P that recognizes the language L = {an bn ∣ n ≥ 0}, where the PDA accepts by final state. The PDA
operates as follows:
1. Initially, in state q0 , the PDA pushes a onto the stack for every a in the input.
2. When the PDA starts reading b 's, it pops one a from the stack for each b read.
3. If, after reading all b 's, the input is exhausted and the PDA is in an accepting state (say q1 ), then the string is accepted.
For example, on the input aabb :
On reading the first a , the PDA stays in q0 and pushes a onto the stack, resulting in the stack aZ0 .
On reading the second a , it stays in q0 and pushes another a , resulting in the stack aaZ0 .
On reading the first b , it pops an a from the stack, resulting in the stack aZ0 .
On reading the second b , it pops the remaining a , resulting in the stack Z0 .
After processing all the input, the PDA moves to the accepting state q1 , and the string is accepted.
Formal Definition:
A string w is accepted by the PDA if there exists a sequence of transitions that causes the PDA’s stack to be empty after
processing all of w .
Example:
Consider the same PDA P recognizing L = {an bn ∣ n ≥ 0}, but this time the PDA accepts by empty stack.
1. Initially, the PDA is in state q0 with the stack initialized to Z0 .
2. For every a in the input, the PDA pushes an a onto the stack.
3. For every b , it pops one a from the stack.
4. After reading all input, the PDA pops Z0 with an epsilon transition; if the stack is then empty, the string is accepted.
For example, on the input ab :
On reading the a , the PDA stays in q0 and pushes a , resulting in the stack aZ0 .
On reading the b , it pops the a from the stack, resulting in the stack Z0 .
It then pops Z0 with an epsilon move. Since the stack is empty at the end of the input, the string is accepted.
Construction:
Given a PDA P that accepts by empty stack, we construct a new PDA P ′ that accepts by final state as follows:
1. Add a new final state qf and a new bottom-of-stack marker X0 , pushed below P 's initial stack symbol, so that P ′ can detect when P has emptied its stack.
2. Modify the transition function such that when the PDA reaches the empty stack configuration (i.e., only X0 remains on the stack), it makes an epsilon transition to the new final state qf .
3. The PDA P ′ accepts the string if it reaches the state qf after consuming the entire input.
This construction ensures that the PDA accepts by final state whenever the original PDA would have accepted by empty
stack.
Example:
For the PDA P that accepts by empty stack for L = {an bn ∣ n ≥ 0}:
1. Construct P ′ , where the state qf is added as a new final state.
This transformation ensures equivalence between acceptance by empty stack and acceptance by final state.
4. From Final State to Empty Stack
Conversely, we can also show that a PDA that accepts by final state can be transformed into a PDA that accepts by empty
stack.
Construction:
Given a PDA P that accepts by final state, we construct a new PDA P ′ that accepts by empty stack as follows:
1. Add a new bottom-of-stack marker X0 below P 's initial stack symbol (so that P ′ cannot accidentally empty its stack in the middle of a computation), and add a new "cleanup" state qe .
2. From every final state of P , add epsilon transitions to qe ; in qe , the PDA pops every remaining symbol from the stack (including X0 ), leaving the stack empty.
This construction ensures that the PDA P ′ accepts by empty stack whenever the original PDA accepted by final state.
Example:
For the PDA P that accepts by final state for L = {an bn ∣ n ≥ 0}:
1. Construct P ′ , where we add a new stack symbol Zf to mark the bottom of the stack.
2. Modify P ′ 's transition function to ensure the stack is emptied before reaching the final state.
This transformation ensures equivalence between acceptance by final state and acceptance by empty stack.
Conclusion
In this lecture, we have discussed the two primary methods by which a Pushdown Automaton (PDA) can accept strings:
1. Acceptance by Final State: The PDA accepts a string if it reaches an accepting state after processing the entire input.
2. Acceptance by Empty Stack: The PDA accepts a string if, after processing the input, the stack is empty.
We also explored the equivalence between these two methods, showing that a PDA that accepts by empty stack can be
transformed to one that accepts by final state and vice versa. Understanding both methods is crucial in studying the
expressive power of PDAs and their ability to recognize context-free languages.
1. From Grammars to PDAs: How to construct a PDA that accepts the language generated by a given CFG.
2. From PDAs to Grammars: How to construct a CFG that generates the language recognized by a given PDA.
Both of these directions are fundamental in understanding the relationship between these two models of computation,
both of which recognize context-free languages.
Given a Context-Free Grammar (CFG), we can construct a Pushdown Automaton (PDA) that recognizes the same language.
The construction works by simulating the derivation process of the grammar with the stack of the PDA.
Construction Steps:
1. States of the PDA: The PDA will have a single state q0 where it stays during the entire computation.
2. Stack Alphabet: The stack of the PDA will use symbols from V ∪ Σ ∪ {#}, where # is a new stack symbol
representing the bottom of the stack.
3. Start Symbol: The initial stack symbol is S , the start symbol of the grammar G.
4. Transition Function: The PDA will make transitions based on the top of the stack:
If the top of the stack is a terminal symbol (from Σ), the PDA will match it with the input string.
If the top of the stack is a non-terminal symbol (from V ), the PDA will apply the corresponding production rule from
G, replacing the non-terminal with the right-hand side of the production.
5. Acceptance Condition: The PDA accepts by empty stack, meaning that when the input is completely consumed, the
stack must be empty.
Example:
From q0 , on reading a and having S on top of the stack, replace S with aSb.
From q0 , on reading b and having S on top of the stack, pop S from the stack.
If the stack is empty after processing the entire input, the PDA accepts the string.
Key Idea: The PDA simulates the leftmost derivation of the grammar G, and by following the production rules, it ensures
that each a read corresponds to a matching b .
Given a Pushdown Automaton (PDA), we can construct a Context-Free Grammar (CFG) that generates the same language.
The construction works by simulating the computation of the PDA and producing derivations that correspond to its
behavior.
Construction Steps:
1. Variables of the Grammar: Each variable in G corresponds to a pair of a PDA state and a stack symbol, i.e., a variable of
the form [p, X], where p ∈ Q and X ∈ Γ. The variable [p, X] represents the derivation of strings from state p with X
on the stack.
2. Start Symbol: The start symbol of the grammar is [q0 , Z0 ], representing the initial configuration of the PDA.
3. Production Rules: The production rules of the CFG are constructed based on the transitions of the PDA:
If there is a transition δ(p, a, X) = (q, α), where a ∈ Σ and X ∈ Γ, and α ∈ Γ∗ , we add the rule [p, X] → a[q, α]
to the grammar.
If there is a transition δ(p, ϵ, X) = (q, α), we add the rule [p, X] → [q, α] to the grammar.
If the PDA has a final state pf ∈ F , we add the rule [pf , Z0 ] → ϵ to allow the empty string to be derived when the
stack is empty.
Example:
Consider a PDA P that recognizes the language L = {an bn ∣ n ≥ 0}. The PDA has:
States Q = {q0 , q1 },
5. δ(q1 , ϵ, Z0 ) = (q1 , Z0 ).
Production rules:
[q0 , a] → b[q1 , Z0 ],
[q1 , Z0 ] → ϵ.
This CFG generates the same language {an bn ∣ n ≥ 0} as the original PDA.
Conclusion
In this lecture, we explored the equivalence between Pushdown Automata (PDAs) and Context-Free Grammars (CFGs).
Specifically, we discussed the two constructions:
1. From Grammars to PDAs: We constructed a PDA that simulates the derivation process of a CFG.
2. From PDAs to Grammars: We constructed a CFG that generates the language recognized by a PDA by simulating the
PDA’s computation.
These constructions show that the class of context-free languages can be recognized both by PDAs and generated by CFGs,
demonstrating the deep equivalence between these two models.
δ is the transition function δ : Q × Σ × Γ → Q × Γ∗ , which is deterministic: for each combination of state, input
symbol, and stack symbol, there is at most one transition.
The stack is empty at the end of the computation (or it reaches an accepting state depending on the acceptance
condition).
In a Non-Deterministic PDA (NPDA), for a given state, input symbol, and stack symbol, there may be multiple possible
transitions. In a Deterministic PDA (DPDA), there is at most one applicable transition for each such combination (counting
possible ϵ-moves), making the computation deterministic.
Any regular language can be recognized by a DPDA. In fact, regular languages are a subset of deterministic context-
free languages.
Since regular languages can be recognized by finite automata (FAs), and deterministic pushdown automata (DPDAs)
can simulate finite automata (since they have the capability to ignore the stack), any regular language is also recognized
by a DPDA.
Example:
Consider the regular language L = {w ∈ {a, b}^* ∣ w ends in b}. This language can be recognized by a finite automaton, and therefore also by a DPDA that simply simulates the finite automaton and never touches its stack. (By contrast, a language such as {a^n b^n ∣ n ≥ 0} is not regular, yet it is still deterministic context-free: a DPDA can use its stack to count the a 's and then match them against the corresponding b 's.)
CFLs: The set of all context-free languages is the set of languages that can be recognized by a non-deterministic
pushdown automaton (NPDA). NPDAs allow multiple possible transitions for a given input symbol and stack symbol,
which means they can recognize a broader set of languages.
DCFLs: The set of deterministic context-free languages is a proper subset of CFLs. These are the languages that can
be recognized by a deterministic pushdown automaton (DPDA).
Example of DCFL:
Consider the language L = {an bn ∣ n ≥ 0}. This is a deterministic context-free language, and a DPDA can recognize it by
pushing an a onto the stack and popping it when it reads a b . The DPDA can deterministically match each a with a b .
Example of a CFL that is not a DCFL:
The language of even-length palindromes, L = {w w^R ∣ w ∈ {a, b}^* }, is a context-free language (CFL), but it is not deterministic. A DPDA cannot recognize this language, because it would have to know where the middle of the string is before it can start matching symbols; a non-deterministic PDA (NPDA) can guess the midpoint and recognize the language, but a DPDA cannot.
Every deterministic context-free language has an unambiguous grammar. Consequently, a DPDA cannot recognize an inherently ambiguous language, i.e., a language for which every grammar is ambiguous.
Non-deterministic PDAs (NPDAs), on the other hand, can handle such languages because they can "guess" the correct parse tree non-deterministically.
Example:
Consider the inherently ambiguous language L = {a^i b^j c^k ∣ i = j or j = k} from the lecture on ambiguity. Strings of the form a^n b^n c^n satisfy both conditions and always admit two different parse trees, no matter which grammar for L is used. Because L has no unambiguous grammar, it is not a deterministic context-free language: an NPDA can recognize it (by guessing whether to match the a 's with the b 's or the b 's with the c 's), but a DPDA cannot.
Summary
Deterministic Pushdown Automata (DPDA) are a restricted form of pushdown automata where for every combination
of state, input symbol, and stack symbol, there is at most one transition.
Regular Languages (RL) can be recognized by both Finite Automata (FA) and DPDA.
Deterministic Context-Free Languages (DCFLs) are a proper subset of Context-Free Languages (CFLs); they are the languages that can be recognized
by DPDAs. Some context-free languages (e.g., the even-length palindromes {w w^R }) cannot be recognized by any DPDA.
Inherently ambiguous languages cannot be recognized by DPDAs: every deterministic context-free language has an unambiguous grammar, so a language all of whose grammars are ambiguous cannot be deterministic.
This concludes the lecture on Deterministic Pushdown Automata (DPDA) and their properties.
1. Generating symbols: A symbol is generating if there exists a derivation from it to a string of terminals.
2. Reachable symbols: A symbol is reachable if it can be reached from the start symbol.
Step-by-step Process:
Identify Generating Symbols: A non-terminal is generating if there exists a derivation from it to a terminal string
(possibly using other non-terminals).
Identify Reachable Symbols: A non-terminal is reachable if there exists a derivation from the start symbol to it.
Example:
S → AB
A → aA ∣ a
B → bB ∣ b
C→c
A → aA (generating), A → a (generating).
B → bB (generating), B → b (generating).
C → c (generating).
So, A, B , and C are all generating symbols, and S is generating because it can produce AB , where both A and B are generating.
(Although C is generating, we will see next that it is still useless, because it is not reachable.)
The start symbol is S , and from S we can reach A and B , so A and B are reachable.
C is not reachable from S (there is no production starting from S that uses C ).
Updated Grammar:
S → AB
A → aA ∣ a
B → bB ∣ b
For non-terminal symbols, iteratively check whether there is a production rule that can lead to a string of terminals.
For reachable symbols, start from the start symbol and mark all non-terminals that can be reached through the
production rules.
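A small Python sketch of both marking procedures is given below. The grammar is represented as a dictionary from non-terminals to lists of right-hand sides, with lower-case letters as terminals and upper-case letters as non-terminals; this encoding is an assumption made for the example.
python
def generating_symbols(grammar):
    """Iteratively mark non-terminals that can derive a string of terminals."""
    gen, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in gen and any(all(s.islower() or s in gen for s in b) for b in bodies):
                gen.add(head)
                changed = True
    return gen

def reachable_symbols(grammar, start):
    """Mark non-terminals reachable from the start symbol through production rules."""
    reach, frontier = {start}, [start]
    while frontier:
        for body in grammar.get(frontier.pop(), []):
            for s in body:
                if s.isupper() and s not in reach:
                    reach.add(s)
                    frontier.append(s)
    return reach

grammar = {"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "b"], "C": ["c"]}
print(generating_symbols(grammar))        # {'S', 'A', 'B', 'C'}  -- all can derive terminal strings
print(reachable_symbols(grammar, "S"))    # {'S', 'A', 'B'}       -- C is unreachable, hence useless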
3. Eliminating Epsilon-Productions
An epsilon-production is a production rule of the form A → ϵ, where ϵ represents the empty string. Eliminating epsilon-
productions involves the following steps:
First, identify the nullable non-terminals, i.e., those that can derive ϵ. Then, for each production A → X1 X2 ⋯ Xn , generate new productions by considering every way of omitting the nullable symbols among the Xi (without omitting all of them, which would recreate an ϵ-production).
Example:
S→A∣b
A → ϵ ∣ aA
B → bB ∣ ϵ
Step 1: Identify the Nullable Non-Terminals
A and B are nullable, since each has an ϵ-production; S is also nullable, because S → A and A is nullable.
Step 2: Modify the Productions
For every production that includes a nullable non-terminal, add new productions in which that non-terminal is omitted:
Updated Grammar:
S → A ∣ b
A → aA ∣ a
B → bB ∣ b
(If the empty string belongs to the language, a single production S → ϵ is kept for the start symbol.)
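The nullable-symbol computation and the expansion of productions can be sketched in Python as follows; the grammar encoding (dictionary of non-terminal → list of bodies, with "" standing for ϵ) is again an assumption for the example.
python
from itertools import product

def nullable_symbols(grammar):
    """Non-terminals that can derive the empty string."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)       # a body of all-nullable symbols (or "") makes head nullable
                changed = True
    return nullable

def eliminate_epsilon(grammar):
    nullable = nullable_symbols(grammar)
    new_grammar = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # For each nullable symbol in the body, choose independently to keep it or drop it.
            options = [("", s) if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:            # never reintroduce an epsilon-production
                    new_bodies.add(candidate)
        new_grammar[head] = sorted(new_bodies)
    return new_grammar

grammar = {"S": ["A", "b"], "A": ["", "aA"], "B": ["bB", ""]}
print(eliminate_epsilon(grammar))
# {'S': ['A', 'b'], 'A': ['a', 'aA'], 'B': ['b', 'bB']}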
4. Eliminating Unit Productions
A unit production is a production of the form A → B , where both A and B are non-terminals. Unit productions are removed by replacing A → B with copies of all of B 's productions.
Example:
S → A ∣ b
A → B ∣ aA
B → b
Here S → A and A → B are unit productions. Substituting B 's production into A, and A's resulting productions into S , gives:
Updated Grammar:
S → aA ∣ b
A → aA ∣ b
B → b
(The non-terminal B has become unreachable and can now be removed as a useless symbol.)
5. Chomsky Normal Form (CNF)
A grammar is in Chomsky Normal Form if every production has one of the following forms:
1. A → BC , where B and C are non-terminal symbols,
2. A → a, where a is a terminal symbol,
3. The start symbol can produce ϵ (if the language includes the empty string).
To convert a grammar into CNF:
1. Eliminate epsilon-productions.
2. Eliminate unit productions.
3. Eliminate useless symbols.
4. Replace terminals occurring in long right-hand sides by new non-terminals, and break right-hand sides longer than two symbols into chains of productions of the form A → BC .
Example:
S → AB
A → aA ∣ a
B→b
S → AB
A → XA ∣ a
B→b
X→a
Now, all the productions are of the form A → BC or A → a, so the grammar is in Chomsky Normal Form (CNF).
Summary
Eliminating Useless Symbols: We identify and remove symbols that do not contribute to generating terminal strings.
Generating and Reachable Symbols: Symbols that can derive terminal strings and can be reached from the start
symbol are useful for grammar.
Eliminating Epsilon-Productions: We eliminate A → ϵ productions by adjusting other productions that might involve
non-terminals that can derive ϵ.
Eliminating Unit Productions: We remove unit productions (i.e., A → B ) by directly substituting the rules for B into A.
Chomsky Normal Form (CNF): We transform the grammar so that each production is either of the form A → BC or
A → a, ensuring that the grammar is in the desired normal form for further analysis or parsing algorithms.
This concludes the lecture on Normal Forms of Context-Free Grammars.
For context-free grammars in Chomsky Normal Form, there is a tight relationship between the height of a parse tree and the length of the string it yields: a parse tree of height h can yield a string of length at most 2^(h−1). Consequently, a sufficiently long string forces a tall parse tree, and a tall parse tree must contain a root-to-leaf path on which some variable repeats.
The pumping lemma exploits the structure of these parse trees, particularly when the string length is sufficiently large, to
show that some parts of the string can be "pumped" (repeated) while still belonging to the language.
For any context-free language L, there exists some constant p (called the pumping length) such that any string s ∈L
with ∣s∣ ≥ p can be divided into five substrings s = uvwxy satisfying the following conditions:
1. Length of the substrings: ∣vwx∣ ≤p
2. Non-empty substrings: ∣vx∣ ≥1
3. Pumping property: For all i ≥ 0, the string u v^i w x^i y is in L, i.e., the string formed by repeating the substrings v
and x any number of times still belongs to L.
This lemma asserts that in any sufficiently long string derived from a context-free grammar, we can "pump" (repeat) some
part of the string, and the resulting string will still be part of the language. This is an important tool for proving that certain
languages are not context-free by showing that no decomposition can satisfy the conditions of the pumping lemma.
3. Detailed Proof of the Pumping Lemma for CFGs
We now sketch a proof of the Pumping Lemma for context-free languages. The argument is a pigeonhole argument on the height of parse trees.
Proof Outline:
1. Assumption: Suppose L is a context-free language. By the definition of context-free languages, there exists a context-
free grammar G = (V , Σ, R, S) that generates L; we may assume G is in Chomsky Normal Form and has m variables.
2. Choose the pumping length p = 2^m . We must show that any string s ∈ L with ∣s∣ ≥ p can be decomposed as s =
uvwxy satisfying the conditions stated earlier.
3. Since ∣s∣ ≥ 2^m , any parse tree for s has height greater than m, so its longest root-to-leaf path contains more than m
variable occurrences. By the pigeonhole principle, some variable A appears at least twice on this path.
4. The two occurrences of A delimit the substrings v and x: replacing the subtree rooted at the upper occurrence of A by
the subtree rooted at the lower occurrence removes v and x, while repeating the upper subtree pumps them. In both cases
the resulting string is still generated by G, which is exactly the "pumping" described by the lemma.
The proof leverages the structure of the parse tree and the properties of context-free grammars, particularly their ability to
rewrite non-terminals recursively. Since a context-free grammar has a finite number of non-terminals and rules, there will
always be repetitions in the derivation process for long enough strings. These repetitions form the substrings v and x,
which can be pumped as described by the lemma.
Let's consider a few examples of applying the Pumping Lemma to prove that a language is not context-free.
Example 1: L = {a^n b^n c^n ∣ n ≥ 0}
1. Assume, for contradiction, that L is context-free, and let p be its pumping length.
2. Choose the string s = a^p b^p c^p ∈ L; clearly ∣s∣ ≥ p.
3. By the pumping lemma, s can be written as s = uvwxy with:
∣vwx∣ ≤ p,
∣vx∣ ≥ 1,
For all i ≥ 0, u v^i w x^i y ∈ L.
4. The string s = a^p b^p c^p consists of three distinct segments: a block of a's, a block of b's, and a block of c's. Since
∣vwx∣ ≤ p, the substring vwx can touch at most two of these blocks.
5. Therefore, pumping v and x changes the number of symbols in at most two of the blocks while leaving the third block
unchanged (and, if v or x straddles a block boundary, additionally scrambles the order of the symbols), so the pumped
string no longer has the form a^n b^n c^n .
This contradiction shows that L is not context-free.
Example 2: L = {ww ∣ w ∈ {a, b}^* }
1. Assume L is context-free with pumping length p, and choose s = a^p b^p a^p b^p ∈ L.
2. Write s = uvwxy with ∣vwx∣ ≤ p and ∣vx∣ ≥ 1.
3. Because ∣vwx∣ ≤ p, the substring vwx can overlap at most two adjacent blocks of s.
4. A short case analysis shows that, wherever vwx falls, pumping v and x (for example, pumping down with i = 0) produces a string that cannot be split into two identical halves: either the two halves end up with a-blocks and b-blocks of different lengths, or their contents differ.
Thus, when we pump v and x, we disrupt the structure of the string, making it impossible to maintain the form ww
where both halves are identical. This contradiction shows that L is not context-free.
Summary
Pumping Lemma for Context-Free Languages provides a necessary condition for a language to be context-free.
Specifically, it says that sufficiently long strings in a CFL can be "pumped" (with repeated substrings) without leaving the
language.
Proof of the Pumping Lemma involves showing that for any sufficiently long string, parts of its derivation tree can be
repeated (pumped) without violating the structure of the language.
Applications of the Pumping Lemma: We applied the lemma to show that certain languages, such as {an bn cn } and
{ww}, are not context-free by demonstrating that no valid pumping can preserve the structure of the language.
The Pumping Lemma is a powerful tool in the theory of formal languages, especially for proving that certain languages are
not context-free.
We will also provide formal proofs and examples to demonstrate how these properties hold.
1. Substitutions
The substitution operation refers to replacing each symbol in a string with a string from a certain language. Specifically, if
L1 and L2 are languages, and G1 = (V1 , Σ1 , R1 , S1 ) is a context-free grammar for L1 , we can define a substitution as
follows:
For each symbol X in the alphabet of the language L1 , replace X with a string derived from the language L2 .
The Substitution Theorem for context-free languages states that the class of context-free languages is closed under
substitution. That is, if we substitute a non-terminal in a context-free grammar with another context-free language, the
resulting language will still be context-free.
2. We then define a new grammar G′ that generates the resulting language, where we replace the occurrences of X in
G1 with the rules of G2 .
By the definition of context-free grammars and the properties of non-terminals and derivations, we can construct a
grammar that generates the substituted language. Hence, the language formed by substitution of L2 into L1 is still context-
free.
Example of Substitution
Let’s consider the languages L1 = {an bn ∣ n ≥ 0} (which is context-free) and L2 = {am bm ∣ m ≥ 0} (which is also
context-free).
S1 → aSb ∣ ϵ
The resulting language will still be context-free because substitution preserves the context-free property.
Combining different context-free languages: By substituting one context-free language into another, we can create
new context-free languages.
Grammar transformations: Substitution can be used to transform grammars for complex constructs in programming
languages.
Example Application
Let's consider two context-free languages L1 = {a^n b^n ∣ n ≥ 0} and L2 = {c^m d^m ∣ m ≥ 0}. If we substitute each occurrence of the symbol b by a string from L2 , we obtain the language L = {a^n x1 x2 ⋯ xn ∣ n ≥ 0, each xi ∈ L2 }, in which every b of the original string is replaced (independently) by some string c^m d^m . This language is still context-free by the closure
under substitution.
3. Reversal
The reversal of a language L, denoted L^R , is the language formed by reversing each string in L. Specifically, if L =
{w1 , w2 , w3 , … }, then L^R = {w1^R , w2^R , w3^R , … }.
The Reversal Theorem for context-free languages states that the class of context-free languages is closed under reversal.
That is, if L is a context-free language, then its reversal LR is also context-free.
1. Reverse all production rules of G. Specifically, for each production A → X1 X2 … Xk in G, replace it with A →
Xk Xk−1 … X1 .
By reversing all the production rules and the start symbol, the resulting grammar will generate the language LR . Therefore,
the class of context-free languages is closed under reversal.
Example of Reversal
Consider the language L = {an bn ∣ n ≥ 0}, which is context-free. The reversal of L is LR = {bn an ∣ n ≥ 0}. We can
easily observe that this language is still context-free because we can construct a grammar for LR using similar rules to
those of L, but in reversed order.
4. Intersection
Unlike regular languages, context-free languages are not closed under intersection of two context-free languages (they are, however, closed under intersection with a regular language).
Counterexample:
Let L1 = {a^n b^n c^m ∣ n, m ≥ 0} and L2 = {a^m b^n c^n ∣ n, m ≥ 0}. Both languages are context-free, but their intersection
L1 ∩ L2 = {a^n b^n c^n ∣ n ≥ 0}
is not context-free, as we showed with the pumping lemma for context-free languages. Hence the intersection of two context-free languages need not be context-free.
5. Inverse Homomorphisms
An inverse homomorphism is an operation that maps a language by reversing the process of a homomorphism. Given a
homomorphism h from Σ to Γ, the inverse homomorphism is defined as:
h−1 (L) = {w ∣ h(w) ∈ L}
The Inverse Homomorphism Theorem for context-free languages states that the class of context-free languages is closed
under inverse homomorphisms. That is, if L is a context-free language and h is a homomorphism, then h−1 (L) is also
context-free.
Let L be a context-free language over Γ, recognized by a PDA P , and let h : Σ∗ → Γ∗ be a homomorphism. We construct a PDA P ′ for h−1 (L) as follows: on reading an input symbol a ∈ Σ, P ′ stores the string h(a) in a finite buffer kept in its state and then simulates P symbol by symbol on the contents of that buffer. In this way P ′ accepts w exactly when P accepts h(w).
Since a PDA recognizing h−1 (L) can be constructed, the resulting language is context-free.
Example of Inverse Homomorphism:
Let L = {a^n b^n ∣ n ≥ 0} and let h : {c, d}∗ → {a, b}∗ be the homomorphism with h(c) = a and h(d) = b. Then
h−1 (L) = {c^n d^n ∣ n ≥ 0},
which is again a context-free language.
Summary
Substitutions: The class of context-free languages is closed under substitution.
Reversal: The class of context-free languages is closed under reversal.
Intersection: The class of context-free languages is closed under intersection with regular languages, but it is not closed
under intersection of two context-free languages.
Inverse Homomorphisms: The class of context-free languages is closed under inverse homomorphisms.
We have explored these closure properties in depth through formal proofs and examples, demonstrating the versatility and
limitations of context-free languages under various operations.
Each topic will be discussed with proofs and examples where applicable.
A Context-Free Grammar (CFG) and a Pushdown Automaton (PDA) are two equivalent formal models of computation for
context-free languages. The question of converting between CFGs and PDAs is important because it allows us to move
between two representations of a context-free language.
Conversion: Every context-free grammar can be converted into an equivalent pushdown automaton. The PDA simulates
the derivation process of the grammar by pushing symbols onto the stack and popping them as rules are applied.
Complexity: Converting a context-free grammar G to a PDA involves constructing a PDA that simulates the leftmost
derivation of G. This can be done in linear time with respect to the size of the grammar: the PDA needs only a single state,
each grammar symbol becomes a stack symbol, and each production rule corresponds to a transition of the PDA.
Conversion: Similarly, every pushdown automaton can be converted into an equivalent context-free grammar. This is
done by simulating the push and pop operations of the PDA and translating them into grammar production rules.
Complexity: The conversion process from a PDA to a CFG can be more complex because it involves handling the non-
deterministic transitions and the stack behavior of the PDA. However, the process can be achieved in polynomial time in
the size of the PDA.
Thus, both conversions from a CFG to a PDA and from a PDA to a CFG can be done in polynomial time, but the specific
complexities depend on the details of the grammar or automaton.
The conversion of a CFG to CNF involves removing useless symbols, eliminating ε-productions, eliminating unit
productions, and ensuring the grammar meets the CNF structure.
1. Eliminating Useless Symbols: First, remove any non-terminal symbols that do not generate any terminal strings or
cannot be reached from the start symbol. This step involves identifying the "reachable" and "generating" non-terminals,
and it can be done in linear time.
2. Eliminating ε-productions: ε-productions (productions of the form A → ϵ) need to be eliminated. This process involves
finding all nullable non-terminals and replacing the ε-productions with the appropriate modifications to other rules. The
complexity of this step is linear in the size of the grammar.
3. Eliminating Unit Productions: Unit productions are those of the form A → B , where A and B are non-terminals.
These can be eliminated by substituting all productions of B into A's rules. This step takes polynomial time.
4. Breaking Down Long Productions: The final step is to ensure all productions are either of the form A → BC or A →
a. For any production longer than two symbols, new non-terminals are introduced to shorten the right-hand side of the
production. This process takes linear time.
Thus, the total running time for converting a CFG into CNF is polynomial in the size of the grammar.
Example:
S → AB
A→a
B→b
This grammar is already in CNF. If we had longer productions, we would split them into smaller ones by introducing new
non-terminals.
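For instance (an assumed example), a production S → ABC would be replaced by S → AX1 and X1 → BC, and a production A → aB would become A → Ca B together with Ca → a, where X1 and Ca are fresh non-terminals. Each replacement preserves the generated language while bringing the productions into CNF form.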
To test whether a context-free language L is empty, we need to check whether the start symbol S of the CFG can derive
any string. This can be done by performing a reachability analysis where we compute the set of symbols that can derive
strings, starting from the start symbol.
1. Finding all the non-terminal symbols that can generate terminal strings (the "generating" symbols).
2. Checking whether the start symbol S is among them; the language is non-empty if and only if S is generating.
The complexity of this procedure is linear in the size of the grammar because we can iteratively mark non-terminals
that can eventually derive terminal strings.
Example:
S → aSb ∣ ϵ
We can see that S can derive the empty string (because S → ϵ). Therefore, the language is not empty.
PDA Simulation: To test whether a string w belongs to the language L, we can simulate a pushdown automaton (PDA)
that accepts the language L. The PDA will push and pop symbols from the stack according to the production rules of
the CFG.
Dynamic Programming: Another approach is to use dynamic programming (CYK algorithm), which runs in cubic time
with respect to the length of the string. This algorithm constructs a table where each cell T [i, j] represents the set of
non-terminals that can derive the substring of w from position i to j .
Thus, testing membership in a CFL can be done in cubic time in the length of the string using the CYK algorithm.
Example:
S → aSb ∣ ϵ
To test membership for the string "ab", the CYK algorithm will check if there is a derivation for the string using the grammar.
It will confirm that "ab" is part of the language.
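To make the algorithm concrete, here is a minimal C sketch added for illustration. It assumes a small hard-coded grammar in CNF that is equivalent, for non-empty strings, to S → aSb ∣ ε (namely S → AT ∣ AB, T → SB, A → a, B → b); each table cell is a bitmask recording which variables derive the corresponding substring.

#include <stdio.h>
#include <string.h>

/* CYK membership test for a small hard-coded CNF grammar:
   S -> A T | A B,  T -> S B,  A -> a,  B -> b
   (equivalent to S -> aSb | epsilon on non-empty strings).
   Variables are bit flags, so each table cell is a bitmask. */
enum { S = 1, T = 2, A = 4, B = 8 };

int cyk_accepts(const char *w) {
    int n = (int)strlen(w);
    if (n == 0) return 1;                 /* the empty string is in the language */
    if (n > 100) return 0;                /* sketch only: fixed-size table */
    static int table[100][100];           /* table[i][j]: variables deriving w[i..j] */
    memset(table, 0, sizeof table);

    for (int i = 0; i < n; i++)           /* length-1 substrings: terminal rules */
        table[i][i] = (w[i] == 'a') ? A : (w[i] == 'b') ? B : 0;

    for (int len = 2; len <= n; len++) {
        for (int i = 0; i + len - 1 < n; i++) {
            int j = i + len - 1;
            for (int k = i; k < j; k++) { /* try every split point */
                int left = table[i][k], right = table[k + 1][j];
                if ((left & A) && (right & T)) table[i][j] |= S;  /* S -> A T */
                if ((left & A) && (right & B)) table[i][j] |= S;  /* S -> A B */
                if ((left & S) && (right & B)) table[i][j] |= T;  /* T -> S B */
            }
        }
    }
    return (table[0][n - 1] & S) != 0;    /* accept iff S derives the whole string */
}

int main(void) {
    printf("%d %d %d\n", cyk_accepts("ab"), cyk_accepts("aabb"), cyk_accepts("aba"));
    /* prints: 1 1 0 */
    return 0;
}

For the input "ab" the top table cell contains S, so the string is accepted, matching the discussion above; the three nested loops over substring lengths, start positions, and split points give the cubic running time.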
1. Universality Problem: Given a CFG G, is the language generated by G equal to Σ∗ ? This is undecidable because it
involves determining whether every possible string is generated by the grammar.
2. Ambiguity Problem: Given a CFG G, is G ambiguous, i.e., does some string have more than one leftmost derivation in G? This is undecidable; it can be
shown by a reduction from Post's Correspondence Problem.
3. Equivalence Problem: Given two context-free grammars G1 and G2 , do they generate the same language? This is
undecidable; it follows, for example, from the undecidability of the universality problem, since testing whether L(G) = Σ* is a special case of testing equivalence.
These problems have been proven to be undecidable, and there is no algorithm that can solve them for all context-free
languages.
Summary
In this lecture, we have discussed the following decision properties of context-free languages (CFLs):
Testing emptiness of a CFL can be done in linear time with respect to the size of the grammar.
Testing membership in a CFL can be done in cubic time using the CYK algorithm.
There are several undecidable problems related to context-free languages, such as the universality, ambiguity, and
equivalence problems.
These results provide a strong foundation for understanding the computational limitations and capabilities of context-free
languages and grammars.
In this lecture, we explore unsolvable problems in computation, which involve the limits of computation and the inability
of computers to solve certain classes of problems. We will address this concept through the following points:
This discussion will involve formal proofs, examples, and a C code example for better understanding.
#include <stdio.h>
int main() {
printf("Hello World\n");
return 0;
}
However, imagine we frame Fermat's Last Theorem in a similar context. Fermat's Last Theorem states that there are no
three positive integers a, b, and c that satisfy the equation
a^n + b^n = c^n
for any integer n > 2.
Let’s consider the following C program that "prints" a proof of Fermat’s Last Theorem:
#include <stdio.h>

/* Integer exponentiation, used instead of pow() so that the equality test
   below compares exact integer values rather than rounded doubles. */
long long ipow(long long base, int exp) {
    long long result = 1;
    for (int i = 0; i < exp; i++) result *= base;
    return result;
}

int main() {
    int a, b, c, n;
    /* Bounds kept small (50) so that ipow stays within the range of long long. */
    for (n = 3; n <= 10; n++) {           // Looping over some possible values of n
        for (a = 1; a <= 50; a++) {
            for (b = 1; b <= 50; b++) {
                for (c = 1; c <= 50; c++) {
                    if (ipow(a, n) + ipow(b, n) == ipow(c, n)) {
                        printf("Counterexample found for n=%d: a=%d, b=%d, c=%d\n", n, a, b, c);
                        return 0;
                    }
                }
            }
        }
    }
    printf("No counterexample found in the searched range.\n");
    return 0;
}
This code attempts to find a counterexample to Fermat’s Last Theorem. However, the theorem has already been proven, so
the program will never find any valid solution for n > 2. The key point here is:
No solution exists that would violate Fermat's Last Theorem for integers n > 2, so the search never succeeds. The deeper point is that a brute-force
program of this kind can never settle the theorem on its own: it would have to examine infinitely many combinations of
a, b, c, and n, and no finite amount of searching can establish the result for all integers.
Formal Insight:
This illustrates how certain problems (like Fermat's Last Theorem) may not be computable in a simple algorithmic sense,
and it exemplifies how unsolvable problems are often linked to intractable computational tasks.
The Hypothesis:
We imagine a program that takes another program P as input and determines if P prints the string "Hello World". Let's call
this program the Hello World tester. It would have the following structure:
#include <stdio.h>

/* Hypothetical tester, assumed to exist for the sake of argument:
   returns 1 if the given program prints "Hello World", 0 otherwise. */
int test_hello_world(const char *program_path);

int main() {
    // Hypothetically test a given program
    if (test_hello_world("some_program.c") == 1) {
        printf("It's a valid Hello World program.\n");
    } else {
        printf("It's not a Hello World program.\n");
    }
    return 0;
}
1. Suppose the program test_hello_world works as intended for all possible C programs.
2. Consider the following contradiction using the program test_hello_world :
#include <stdio.h>

/* The same hypothetical tester, this time applied to this very program's
   own source file (a hypothetical self-reference). */
int test_hello_world(const char *program_path);

int main() {
    if (test_hello_world("this_program.c") == 1) {
        // If the tester says this program prints "Hello World", print nothing
    } else {
        printf("Hello World\n");
    }
    return 0;
}
Thus, it causes a logical contradiction. It cannot both print and not print "Hello World", which is a paradox — it essentially
defies the definition of a Hello World program. This paradox reflects the limitations of computability.
This example demonstrates the Halting Problem in disguise: whether a program halts or produces a specific output (in this
case, "Hello World") is not always decidable.
Let’s assume we have a problem X that is undecidable (like the Halting Problem). Suppose we want to determine whether a
given C program halts when run. The Halting Problem states that there is no general algorithm that can decide whether an
arbitrary program will halt on a given input.
Let's reduce the Halting Problem to another problem Y. We will show that if we could solve Y, we could also solve the
Halting Problem, so Y must be undecidable as well.
The Halting Problem is: Given a program P and input I , does P halt on input I ?
Let Y be the problem: given a program that takes no input, does it halt? Assume we had a solver for Y.
Now, we reduce the Halting Problem to Y by creating a new program P ′ based on P and input I :
#include <stdio.h>

/* Hypothetical helper, assumed for illustration: simulates program P on
   input I and returns only if and when P halts on I. */
void run_program(const char *P, const char *I);

int P_prime(void) {
    // P' simulates P on input I and then halts.
    // Therefore P' halts if and only if P halts on I.
    run_program("P.c", "I");   // placeholder names standing for the given P and I
    return 0;
}
Now, solving the problem of whether P' halts is equivalent to solving the Halting Problem for the original program P . This
shows that Y is at least as hard as the Halting Problem, and since the Halting Problem is undecidable, Y must also be
undecidable.
Summary
In this lecture, we have discussed several key ideas related to unsolvable problems and computational limits:
1. Programs that Print "Hello World": We framed Fermat's Last Theorem as a Hello World program and demonstrated
that some problems cannot be solved by computers.
2. The Hypothetical "Hello World" Tester: We explored the paradox of testing whether a program is a valid Hello World
program, which leads to a contradiction and relates to the Halting Problem.
3. Reducing One Problem to Another: We introduced the concept of reductions in computation and showed how solving
one problem (like the Halting Problem) could lead to solving another, demonstrating the undecidability of certain
problems.
These examples highlight the fundamental limits of computation and underscore the importance of understanding the
boundaries of what computers can and cannot do.
In the early 1930s, mathematicians were confronted with Gödel’s Incompleteness Theorems, which demonstrated that any
sufficiently powerful formal system could not be both complete and consistent. This result shattered the dream of finding a
mechanical method (or algorithm) that could decide every mathematical question.
Around the same time, Alan Turing and Alonzo Church developed the concept of a computational model to address these
issues. Turing introduced the Turing Machine in 1936 as a theoretical construct to formalize the idea of computation.
Turing’s model would provide a rigorous definition of what it means for a problem to be computable.
Turing's work, along with the work of Church (via the lambda calculus), showed that several independent formalizations of
"effective computation" define exactly the same class of solvable problems. This led to
the Church-Turing Thesis, which asserts that anything computable by an effective (algorithmic) procedure is computable by a Turing machine.
Thus, the Turing Machine became the foundation for understanding computation and its limitations, particularly with
respect to decidability.
Tape: An infinite sequence of cells, each containing a symbol from a finite alphabet. The tape can be thought of as a
sequence of memory locations, where each location can hold a symbol. The tape extends infinitely in both directions.
Alphabets: The input alphabet Σ is the set of symbols the input may contain. The tape alphabet Γ is the set of symbols the machine can read and write on the tape; it contains Σ together with a special blank symbol
⊔ that represents an empty cell.
Head: A read/write head that scans the tape. The head can move left or right, and it can also read or write a symbol in
the current tape cell.
States: The machine operates in one of a finite set of states Q, and there is a designated start state and one or more
halt states. The machine’s operation is determined by its current state and the symbol it is reading on the tape.
Transition Function: The transition function δ is a set of rules that determine the next state, the symbol to write, and
the direction to move the head, given the current state and symbol on the tape. Formally, this is written as:
δ : Q × Γ → Q × Γ × {L, R}
where the first component of the result is the next state, the second is the symbol written into the scanned cell, and the third is the direction in which the head moves.
An instantaneous description (ID) of the machine records three things: the current state, the contents of the tape (the string of symbols currently written on the tape), and the position of the head. For example:
(ID) q1: ...011010... (head on 1)
This tells us that the machine is in state q1, the tape contents are "...011010...", and the head is on the second symbol, which
is 1.
An instantaneous description provides a complete description of the machine’s state at any given moment during its
computation.
Directed edges between states represent transitions based on the current symbol being read.
Labels on the edges show the symbol written, the direction of head movement (L for left, R for right), and the state
transition.
For the machine described below, the transitions can be written as:
δ(q0, 0) = (q0, 0, R)    δ(q0, 1) = (q1, 1, R)
δ(q1, 0) = (q1, 0, R)    δ(q1, 1) = (q0, 1, R)
This Turing Machine checks whether the string has an even number of 1s by switching between two states q0 and q1. If the
machine ends in q0, it accepts the string (even number of 1s); otherwise, it rejects the string.
To illustrate halting, let's consider a simple Turing Machine that halts once it has scanned its whole input and reports whether the input contains an even number of 1s:
1. Starting in state q0, if the machine reads a 1, it moves to state q1, indicating an odd number of 1s seen so far.
2. If the machine reads another 1, it returns to state q0, indicating an even number of 1s.
3. The machine halts when it reaches the end of the string, and it is in q0 exactly when the number of 1s is even.
For example, on the input 11:
Read the first 1, move to state q1.
Read the second 1, return to state q0.
The machine halts in q0, meaning the input has an even number of 1s.
However, a Turing Machine is not guaranteed to halt in all cases. For instance, consider a machine that runs in an infinite
loop or one that processes an undecidable problem. In those cases, the machine may never halt.
For example, consider a Turing Machine that recognizes the language L = {w ∣ w has an even number of 1s}. The
language of this machine is the set of all strings over {0, 1} that contain an even number of 1s.
Summary
In this lecture, we introduced the concept of the Turing Machine (TM), a central model of computation that helps define the
limits of what can be computed. We discussed its historical context in the quest to decide all mathematical questions, the
formal notation for Turing Machines, instantaneous descriptions, and transition diagrams. We also explored the concept of
halting and proved that the Halting Problem is undecidable. Finally, we defined the language of a Turing Machine, which
consists of the set of strings that the machine accepts.
Formalism:
The set of states Q in a Turing Machine M can be enlarged to include states that encode additional information. Thus,
the set of states Q′ becomes a super-set of Q, where the new states represent specific computations or data encoded
in the machine’s configuration.
This storage technique is essentially an encoding scheme where states can carry encoded data such as binary
counters, flags, or memory values that would otherwise require additional tape cells.
Example:
Consider a Turing Machine that needs to count the number of occurrences of a symbol a in a string, up to some fixed bound k (the number of states must remain finite). Rather than writing the
count to the tape, we can encode the count directly in the machine's states: the machine has states
q0 , q1 , q2 , … , qk , where q0 is the initial state and each state qi records that i occurrences of a have been read so
far.
By storing the count in the states, the machine can track the number of a's without modifying the tape.
This technique is useful for problems where the machine needs to keep track of intermediate results or flags without
consuming tape space.
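One way to picture the enlarged state set is as a pair consisting of the original control state and a bounded piece of stored data, as in the following small C illustration added to these notes:

#include <stdio.h>

/* "Storage in the state": the enlarged state is the original control state
   together with a bounded piece of remembered data (here, one tape symbol). */
struct enlarged_state {
    int  control;   /* the original finite-control state q */
    char stored;    /* a remembered symbol, e.g. the first symbol of the input */
};

int main(void) {
    struct enlarged_state s = { 0, 'a' };   /* represents the state (q0, 'a') */
    printf("control=%d stored=%c\n", s.control, s.stored);
    return 0;
}

Because the stored component ranges over a finite set, the total number of enlarged states remains finite, so this is still an ordinary Turing Machine.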
2. Multiple Tracks
Another powerful technique is to use multiple tracks on the tape. A Turing Machine traditionally has one tape with a single
head, but we can extend the model to allow for multiple tracks or multiple tapes. In this extended model, the tape is
divided into several parallel tracks, each of which holds a separate tape alphabet.
Each track can store a different piece of information, and the head can read from and write to each track independently.
Multiple tracks allow for a more sophisticated approach to computation, where different types of data can be manipulated
simultaneously.
Formalism:
A multi-track tape can be described as a set of k tapes, each with its own alphabet Σ1 , Σ2 , … , Σk , where k is the
number of tracks.
The machine’s transition function now takes into account the current state and the symbols read from each of the
tracks. Formally, the transition function δ is extended to:
δ : Q × (Σ1 × Σ2 × ⋯ × Σk) → Q × (Σ1 × Σ2 × ⋯ × Σk) × {L, R}
where each tuple from Σ1 × Σ2 × ⋯ × Σk represents the symbols on the different tracks under the current head position.
Example:
Let’s consider a Turing Machine with two tracks. On the first track, it stores a binary number (e.g., 101), and on the second
track, it stores the reverse of the binary number (e.g., 101). The machine’s task is to check whether the two tracks store the
same sequence.
Initially, the head scans both tracks and starts in the initial state q0 .
The machine checks if the first and last symbols are the same. If they are, the machine proceeds to compare the next
pair of symbols, moving towards the center of the tape.
The transition function would read from both tracks at once, updating both tracks as needed and checking the symbols
simultaneously.
Using multiple tracks like this simplifies the task, as the machine can work with two separate pieces of information at once.
3. Subroutines
A subroutine in the context of a Turing Machine refers to a modular computation method where the machine can invoke a
set of predefined states (a subroutine) to perform a specific task. This technique helps in organizing the computation into
smaller, reusable parts, making complex tasks easier to manage and reason about.
While Turing Machines do not inherently have the concept of functions or procedures like in high-level programming
languages, we can simulate subroutines by creating states that perform a specific task and then returning to the main
computation after completion.
Formalism:
A subroutine is essentially a set of states and transitions designed to accomplish a specific task. Once the task is
complete, the machine transitions back to the state that invoked the subroutine.
The subroutine can be invoked by setting up an initial state in the main program that transitions to the first state of the
subroutine.
A subroutine can return control to the calling state by using a special return state or by modifying the machine's state
to indicate the subroutine has completed.
Example:
Let’s consider a Turing Machine that needs to compute the sum of two binary numbers. This task can be broken down into
smaller subroutines, such as:
1. Subroutine for Adding Two Digits: This subroutine takes two digits (one from each binary number) and adds them,
handling the carry if necessary.
2. Subroutine for Handling Carry: This subroutine deals with carry-over from one digit to the next during the addition.
The main program would invoke these subroutines at appropriate points to add digits and propagate carries. Once a
subroutine completes its task, the machine would return to the main program to continue the overall computation.
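As a rough analogue in C (an illustration added to these notes, not a Turing-Machine construction), the decomposition into the two subroutines looks like this: one function adds a single column of digits, and the carry is threaded between calls by the main routine.

#include <stdio.h>
#include <string.h>

/* Subroutine 1: add one column of bits; the carry is handled via carry_out. */
static int add_digits(int x, int y, int carry_in, int *carry_out) {
    int sum = x + y + carry_in;
    *carry_out = sum / 2;     /* 1 if this column overflows, 0 otherwise */
    return sum % 2;           /* the bit written for this column */
}

/* Main routine: adds two binary strings (most significant bit first). */
static void add_binary(const char *a, const char *b, char *result) {
    int la = (int)strlen(a), lb = (int)strlen(b);
    int n = (la > lb ? la : lb), carry = 0;
    result[n + 1] = '\0';
    for (int i = 0; i < n; i++) {
        int x = (i < la) ? a[la - 1 - i] - '0' : 0;
        int y = (i < lb) ? b[lb - 1 - i] - '0' : 0;
        result[n - i] = (char)('0' + add_digits(x, y, carry, &carry));
    }
    result[0] = (char)('0' + carry);  /* the final carry becomes the leading bit */
}

int main(void) {
    char out[64];
    add_binary("110", "101", out);  /* 6 + 5 = 11 */
    printf("%s\n", out);            /* prints 1011 (decimal 11) */
    return 0;
}

Here add_digits plays the role of the "add two digits" subroutine, while the carry variable threaded through the loop corresponds to the "handle carry" subroutine returning control to the main computation.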
Summary
In this lecture, we explored three advanced programming techniques for Turing Machines: storage in the state, multiple
tracks, and subroutines. These techniques significantly enhance the Turing Machine’s ability to perform more complex
computations efficiently.
1. Storage in the state allows the machine to use its states as additional memory, enabling efficient tracking of
intermediate results.
2. Multiple tracks provide a way to store and manipulate multiple types of information simultaneously, enhancing the
machine’s computational power.
3. Subroutines allow the machine to perform modular computations by invoking predefined sets of states, making
complex tasks more manageable.
These techniques reflect how Turing Machines can be adapted and programmed in sophisticated ways, although they
remain abstract computational models, far removed from practical implementation.
Formal Definition:
A k-tape Turing Machine is a tuple M = (Q, Σ, Γ1, …, Γk, δ, q0, qaccept, qreject), where:
qaccept and qreject are the accept and reject states, respectively, and the transition function has the form
δ : Q × Γ1 × Γ2 × ⋯ × Γk → Q × Γ1 × Γ2 × ⋯ × Γk × {L, R}^k
where Γ1 , Γ2 , … , Γk are the tape alphabets for each tape, and {L, R}^k denotes the movement of each head (left or right).
Example:
Consider a 2-tape Turing Machine tasked with checking whether a string is a palindrome. The machine could:
1. Copy the string from tape 1 to tape 2, ensuring the second tape mirrors the first.
2. Then, it can compare symbols on both tapes simultaneously to verify if they match, moving heads in tandem.
While multitape TMs provide significant computational convenience, it is important to prove that multitape TMs are not
more powerful than single-tape TMs in terms of computational power (they can compute the same class of languages).
1. Representing the Tapes: The idea is to use the single tape of M1 to represent all the tapes of Mk. Suppose that each of the k tapes holds a string of symbols. We can represent these strings on the single tape by writing them one after another,
separating the contents of each tape with a special separator symbol (e.g., #) and marking the currently scanned cell of each tape with a special head marker ∣.
2. Encoding a Configuration: Let the configuration of the k-tape machine at any point in time be represented on M1's tape as:
# (tape 1 contents, with ∣ before the scanned symbol) # (tape 2 contents, with ∣ before the scanned symbol) # … # (tape k contents) #
3. Simulating Transitions:
Identify the positions of the heads for each tape by searching for the special head marker ∣.
Copy the appropriate symbol from the simulated tapes on M1 's tape.
Perform the necessary state transitions and move the tape head accordingly.
After each step, the Turing machine moves to the next configuration by updating the tape and head positions,
simulating all the operations of the multitape machine.
Conclusion:
A single-tape Turing Machine can simulate a multitape Turing Machine by encoding the multiple tapes on one tape and
simulating the transitions. The number of steps required for the simulation is bounded by a polynomial (in fact, quadratic) function of the
running time of the multitape machine, so the computational power remains the same, although the single-tape machine may take more
time.
The time complexity of simulating a multitape Turing Machine using a single-tape Turing Machine is an important
consideration. When simulating a k-tape Turing Machine using a single-tape Turing Machine, we must account for the
additional time required for accessing and updating the information on multiple tapes.
Formal Construction:
Suppose Mk is a k-tape Turing Machine that runs in Tk (n) time, where n is the size of the input.
A one-tape Turing Machine M1 that simulates Mk may take T1 (n) time, where:
T1(n) = O(Tk(n)^2)
This quadratic increase in time is due to the need for simulating multiple tape heads on a single tape. Each step of the
multitape machine may require additional steps for managing the tape head positions and for simulating the
movement on different tapes.
Specifically, the simulation of a move on a single tape requires moving the head back and forth over the interleaved
tape configuration and reading and writing to multiple parts of the tape. Therefore, the time complexity is at most
quadratic in the number of tapes.
Example:
Consider a 2-tape Turing Machine that adds two binary numbers. The time complexity of this machine is T2 (n), where n is
the size of the input. A single-tape Turing Machine simulating the 2-tape machine might require O(n2 ) time due to the
need to traverse the interleaved configuration of both tapes.
Formal Definition:
δ : Q × Γ → P(Q × Γ × {L, R})
where P(S) denotes the power set of S , meaning that at each step the machine may transition to any one of several possible
configurations.
Example:
Consider a Non-deterministic Turing Machine that decides whether a binary string has an even number of 1's. Whenever several transitions are defined for the current state and scanned symbol, the NDTM may follow any one of them, so different computation branches can be pursued on the same input.
This type of machine allows for parallel exploration of possible computational paths. If any path leads to an accept state, the
machine accepts the input.
Significance:
NDTMs can be more efficient than DTMs for certain problems: the nondeterministic polynomial time class NP consists of problems that can be solved by an NDTM in polynomial
time. In terms of the class of languages they can recognize at all, however, NDTMs are no more powerful than DTMs, since every NDTM can be simulated by a DTM (possibly with an exponential slowdown). Whether every problem in NP can also be solved deterministically in polynomial time (the P versus NP question) remains open.
Summary
In this lecture, we explored several important extensions to the basic Turing Machine model:
1. Multitape Turing Machines: These machines provide multiple tapes and heads, simplifying computations. We showed
that multitape TMs are not more powerful than single-tape TMs in terms of the class of languages they can recognize.
2. Equivalence of One-Tape and Multitape TMs: We proved that a single-tape Turing Machine can simulate a multitape
Turing Machine with a quadratic increase in time complexity.
3. Non-Deterministic Turing Machines: NDTMs allow for multiple possible transitions at each step, enabling them to
explore multiple computational paths simultaneously. We briefly discussed the power and implications of non-
determinism in computational complexity theory.
1. Turing Machines with Semi-Infinite Tapes
2. Multistack Machines
3. Counter Machines
4. Counting Machines
Each section will be detailed with formal definitions, examples, and proofs where applicable.
Formal Definition:
A Turing Machine with a semi-infinite tape is a tuple M = (Q, Σ, Γ, δ, q0, qaccept, qreject), where:
Γ is the tape alphabet, which includes the blank symbol,
δ is the transition function, which now considers the head moving only to the left or right but with the restriction that
the tape is infinite in one direction,
In this model, the tape is infinite to the right but has a fixed left endpoint. That is, the tape starts at a certain position
(usually represented as the leftmost cell), but the head can only move to the right infinitely.
Example:
Imagine a semi-infinite Turing Machine that operates on a binary string. The machine can move to the right across the
string, but if it attempts to move left beyond the start of the string, it is prevented by the fixed boundary. The behavior of
this machine is similar to that of a one-way machine, but it allows the machine to process input from the left without losing
the ability to process infinitely in one direction.
A semi-infinite Turing Machine is equivalent in computational power to a standard Turing Machine. The restriction on the
tape does not reduce its ability to compute the same class of languages (i.e., recursively enumerable languages). The key
difference is that the machine may not be able to move freely in both directions, which could influence the efficiency of
certain algorithms.
2. Multistack Machines
A Multistack Machine is a restricted form of Turing Machine where the tape is replaced by several stacks, each with a single
head that can move along the stack. These stacks operate as LIFO (Last In, First Out) data structures. In a k-stack machine,
there are k stacks and the machine can perform operations on any of them, such as pushing or popping symbols, in
addition to moving between the stacks.
Formal Definition:
A k-stack machine can be described as a tuple M = (Q, Σ, Γ, δ, q0, Z0, F), where:
Q is the finite set of states, Σ the input alphabet, Γ the stack alphabet, δ the transition function, q0 the start state, Z0 the initial stack symbol, and F the set of accepting states.
In this machine, the stacks are represented as S1 , S2 , … , Sk , each of which has its own head and can perform operations
such as pushing a symbol onto the stack, popping a symbol from the stack, or reading the top of the stack. The heads of the
stacks can move independently of one another.
Example:
A 2-stack machine can be used to recognize the language of palindromes. One stack might store the first half of the string,
while the second stack stores the reverse of the string. As the machine moves through the input, it will pop elements from
the second stack and compare them to the symbols from the first half of the string to determine if the string is a
palindrome.
A 2-stack machine is equivalent to a Turing Machine in terms of computational power. This is because the two stacks together can simulate a tape: one stack holds the symbols to the left of the head and the other holds the symbols to the right, so the machine can recognize any
recursively enumerable language. A single stack is not as powerful: a one-stack machine is essentially a pushdown automaton and recognizes exactly the context-free languages, a strict subset of the recursively enumerable languages.
3. Counter Machines
A Counter Machine is a restricted form of computation model that uses a finite number of counters instead of a tape. The
counters can hold non-negative integers, and the machine can increment, decrement, or test for zero on the counters.
Counter Machines are simpler than Turing Machines but are still useful for certain types of problems.
Formal Definition:
A counter machine with k counters can be described as a tuple M = (Q, Σ, δ, q0, F), where:
Q is the finite set of states, Σ the input alphabet, δ the transition function (which, based on the current state, input symbol, and which counters are zero, may increment or decrement counters and change state), q0 the start state, and F the set of accepting states.
The machine operates by reading an input, manipulating its counters based on the current state and input symbol, and
transitioning to new states. It may stop if it reaches the accept or reject states.
Example:
A counter machine with a single counter can be used to recognize the language {a^n b^n ∣ n ≥ 0}. The machine increments the
counter for each a and decrements it for each b, rejecting if it ever reads an a after a b or if the counter would go below zero. If the counter is back at zero when the input ends, the
input string is accepted; otherwise, it is rejected.
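The following small C sketch, added here for illustration, carries out exactly this single-counter check:

#include <stdio.h>

/* Single-counter check for strings of the form a^n b^n. */
int accepts_anbn(const char *w) {
    long counter = 0;      /* the machine's counter */
    int seen_b = 0;        /* finite-state information: have we entered the b-block? */
    for (const char *p = w; *p; p++) {
        if (*p == 'a') {
            if (seen_b) return 0;       /* an 'a' after a 'b': reject */
            counter++;                  /* increment for each a */
        } else if (*p == 'b') {
            seen_b = 1;
            if (counter == 0) return 0; /* decrement would go below zero: reject */
            counter--;                  /* decrement for each b */
        } else {
            return 0;                   /* symbol outside the alphabet */
        }
    }
    return counter == 0;                /* accept iff the counter returned to zero */
}

int main(void) {
    printf("%d %d %d\n", accepts_anbn("aabb"), accepts_anbn("ab"), accepts_anbn("aab"));
    /* prints: 1 1 0 */
    return 0;
}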
A single-counter machine is not as powerful as a Turing Machine: it can recognize only certain context-free languages and
counting problems (languages that require a single unbounded count but no arbitrary manipulation of symbols). However, a counter machine with two or more counters can simulate a Turing
Machine, so multi-counter machines are Turing-complete.
Formal Definition:
A counting machine is similar to a counter machine but may include additional operations such as:
Threshold operations: The machine can test whether the count reaches certain thresholds.
A Counting Machine can recognize certain types of languages that are beyond the capabilities of basic counter machines,
particularly in problems related to counting, such as counting occurrences of a symbol or determining whether a count is even or odd. These
machines are useful in situations where the computation primarily involves counting occurrences, or counting occurrences modulo
certain numbers.
Counting Machines are more powerful than regular Counter Machines but still not as powerful as Turing Machines in terms
of general-purpose computation.
Summary
In this lecture, we explored various restricted Turing Machines:
1. Turing Machines with Semi-Infinite Tapes: These machines have a tape that is infinite in one direction, and we showed
that they are equivalent in computational power to standard Turing Machines.
2. Multistack Machines: Machines that use multiple stacks as the computational medium, and we showed that a 2-stack
machine is equivalent in power to a Turing Machine.
3. Counter Machines: Machines that use counters instead of tapes, with operations like increment, decrement, and zero-
check. We discussed their ability to recognize context-free languages and the limitations of a single-counter machine.
4. Counting Machines: A more powerful class of machines that deal with counting operations and thresholds. These
machines have applications in problems related to counting but are still less powerful than full Turing Machines.
These restricted models offer insights into computation within various constraints, helping us understand the relationship
between different computational models.
1. Simulating a Turing Machine by a Computer
2. Simulating a Computer by a Turing Machine
3. Comparing the Running Times
We will discuss the theoretical and practical aspects of computation, comparing the abstract Turing Machine model with the
physical models of real-world computers. This discussion will involve formal descriptions, examples, and a focus on
computational complexity.
Formal Simulation:
To simulate a Turing Machine on a computer, we first need to encode the following elements:
Tape: A computer’s memory (e.g., an array or a dynamic list) can serve as the tape. The tape of a Turing Machine is
infinite in one direction, but the computer memory is finite. However, for any computation that a Turing Machine can
perform, a computer can simulate a tape of sufficient size.
Head: A Turing Machine's head can move left or right across the tape, performing read and write operations. On a
computer, this is represented by a pointer or an index that moves across the memory array.
States: The current state of the Turing Machine can be mapped to a variable in the computer’s memory.
Transition Function: The transition function δ of the Turing Machine is implemented as a set of conditional statements
(e.g., if-else or switch-case ).
Example:
Consider a Turing Machine that adds two binary numbers. The algorithm can be translated into a sequence of operations on
a computer:
1. Input Representation: The input numbers are stored in an array (acting as the tape).
2. Head Movement: The computer maintains an index or pointer to simulate the head’s movement across the tape.
3. State Transitions: The current state of the Turing Machine (such as whether it is in the "start", "add", or "halt" state) can
be tracked by a simple variable.
While the Turing Machine operates in a theoretical, infinite space, the computer's memory is finite. Nonetheless, with
sufficient memory and proper encoding, a computer can simulate any Turing Machine’s operation.
A simulation of a Turing Machine that adds two binary numbers a = 110 (base 2) and b = 101 (base 2) might use:
1. An array holding the tape contents (the two numbers and a work area for the result).
2. A loop to simulate the head moving from right to left.
3. At each step, a comparison of the digits, writing the sum digit, handling the carry, and updating the state variable.
Thus, a computer can simulate a Turing Machine by mimicking these operations and managing memory accordingly.
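As a concrete sketch added for illustration (the machine simulated below is the simple two-state "even number of 1s" checker from earlier, rather than the binary adder), the following C program encodes the tape as an array, the head as an index, the state as a variable, and the transition function as conditional statements:

#include <stdio.h>
#include <string.h>

#define TAPE_LEN 256
#define BLANK    '_'

enum state { Q0, Q1, ACCEPT, REJECT };   /* Q0: even number of 1s so far, Q1: odd */

int run_tm(const char *input) {
    char tape[TAPE_LEN];
    memset(tape, BLANK, sizeof tape);                 /* the tape, initially blank */
    memcpy(tape, input, strlen(input));               /* write the input on the tape */

    int head = 0;                                     /* head position (array index) */
    enum state q = Q0;                                /* current state */

    while (q != ACCEPT && q != REJECT) {
        char sym = tape[head];                        /* read the scanned symbol */
        if (sym == BLANK) {                           /* end of input reached */
            q = (q == Q0) ? ACCEPT : REJECT;          /* accept iff even number of 1s */
        } else {
            if (sym == '1') q = (q == Q0) ? Q1 : Q0;  /* toggle parity on reading 1 */
            /* on '0' the state is unchanged */
            tape[head] = sym;                         /* write (here: the same symbol) */
            head++;                                   /* move the head right */
            if (head >= TAPE_LEN) return -1;          /* sketch only: fixed-size tape */
        }
    }
    return q == ACCEPT;
}

int main(void) {
    printf("%d %d\n", run_tm("0110"), run_tm("0111"));   /* prints: 1 0 */
    return 0;
}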
Formal Simulation:
When simulating a real-world computer on a Turing Machine, we need to break down the computer’s operations into the
fundamental operations of the Turing Machine. These operations include:
Memory Access: The Turing Machine uses its tape to simulate the computer’s memory. A memory location can be
represented as a cell on the tape.
Arithmetic Operations: A Turing Machine can perform arithmetic by simulating addition, subtraction, and other
operations using its transition rules. For example, adding two numbers on a Turing Machine can be implemented by a
series of state transitions that simulate carrying over digits.
Control Flow: A Turing Machine can simulate a computer's program counter by transitioning between states in a
predefined sequence, based on input symbols.
The key insight here is that any computation performed by a computer can be reduced to a sequence of Turing Machine
operations.
Example:
Consider a simple computer program that increments a variable x by 1. The corresponding Turing Machine simulation will perform:
1. A scan to the tape cell (memory location) that holds the value of x.
2. A sequence of state transitions that add 1 to the binary value, propagating carries as needed.
3. Transitions between states to represent the storing of the incremented value back into memory.
Equivalence:
Since a Turing Machine can simulate a computer, the class of problems solvable by a computer is equivalent to the class of
problems solvable by a Turing Machine. Both are capable of solving exactly the same problems (i.e., they can compute the
same set of functions).
3. Comparing the Running Times
In this section, we compare the running times of Turing Machines and real-world computers. While Turing Machines are
theoretically universal, real-world computers are constrained by physical factors like memory, processor speed, and
input/output operations.
For a given Turing Machine, the running time of an algorithm is typically described in terms of the number of steps the
machine takes to complete the computation. This is usually measured by:
For a Turing Machine, the time complexity is often expressed as a function of the input size n. The time complexity is
denoted as O(f (n)), where f (n) is the number of steps taken relative to the input size.
In real-world computers, the running time of an algorithm is typically expressed in terms of the number of basic machine operations (or processor cycles) executed, again as a function of the input size n.
While a real computer is a finite machine with bounded resources, it is widely believed that any algorithm solvable by a
Turing Machine can be simulated by a computer with an equivalent running time, though it may require more resources
in practice.
In a Turing Machine simulation of binary addition, the number of steps is proportional to the number of bits in the
input. If the inputs have size n, the Turing Machine might require O(n) steps to add two binary numbers.
On a real computer, the addition operation is executed in constant time, assuming that the binary numbers are
represented in fixed-size memory cells. This is effectively O(1) for the addition, although memory operations could
make the total running time depend on the size of the input.
Turing Machines operate in a theoretical, infinite model, so the computation time is measured in terms of the number
of steps taken. While this is useful for understanding the inherent computational complexity, real-world computers face
practical limitations like memory size and processor speed.
A real-world computer executes operations based on its hardware architecture and is subject to finite memory and
processing power, meaning its actual performance can differ from that of a Turing Machine.
Summary
1. Simulating a Turing Machine by a Computer: A real-world computer can simulate a Turing Machine by encoding the
tape and transition functions within its memory, moving the head across the tape, and transitioning between states.
2. Simulating a Computer by a Turing Machine: A Turing Machine can simulate the operations of a computer by using its
tape for memory and simulating arithmetic and control flow with state transitions.
3. Comparing Running Times: While both Turing Machines and computers solve the same class of problems, their
running times can differ. Turing Machines are abstract and have no physical limitations, while real-world computers are
subject to memory, speed, and I/O constraints.
In conclusion, Turing Machines provide a theoretical model for computation that is equivalent in power to real-world
computers, but the practical efficiency and performance of these machines can vary. The study of running times in both
models helps us understand the complexities of computational problems and the limits of what can be computed.
The non-empty binary strings can be listed in canonical order (shorter strings first, and strings of equal length in numerical order):
0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, …
The process of enumerating binary strings involves assigning each string a unique natural number n in a sequence:
0 → 1st string
1 → 2nd string
00 → 3rd string
01 → 4th string
10 → 5th string
11 → 6th string
and so on.
This numbering provides a method to index the strings in Σ∗ , which is essential when we attempt to reason about the
computability of certain languages. Each binary string can be identified by its index, allowing us to enumerate all strings in a
sequential manner.
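A convenient way to compute this index (a small observation added here, consistent with the numbering above): prepend a 1 to the string, read the result as a binary number, and subtract 1. For example, the string 01 yields 101 in binary, which is 5, and 5 − 1 = 4, so 01 is the 4th string; the string 00 yields 100 in binary, which is 4, so 00 is the 3rd string.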
Recall that a Turing Machine is specified formally as a 7-tuple
M = (Q, Σ, Γ, δ, q0, qaccept, qreject)
where Q is the finite set of states, Σ the input alphabet, Γ the tape alphabet (with Σ ⊆ Γ), δ the transition function, q0 the start state, and qaccept and qreject the accept and reject states.
Turing Machines are abstract machines, but for any given Turing Machine, we can represent it as a binary string. This string
acts as a code for the machine. Essentially, every Turing Machine can be encoded into a unique binary string, and thus, we
can also enumerate the set of all possible Turing Machines in the same way we enumerated binary strings.
A universal Turing machine can simulate any Turing Machine by reading its description encoded as a binary string. Thus,
for any Turing Machine M , we can associate it with a unique binary string wM that encodes the machine. The set of all
Turing Machine codes is also countable because it corresponds to the set of binary strings.
Now, let us consider the language of Turing Machines. We are interested in a specific language L, which may or may not
be recursively enumerable. Our goal is to identify whether certain languages are RE or not, using Turing Machines as the
fundamental model of computation.
Ld = {wi ∣ wi ∉ L(Mi)}
where wi is the i-th binary string in the enumeration of all binary strings, and L(Mi ) is the language recognized by the i-th
Turing Machine Mi .
L(Mi ) is the language of the i-th Turing Machine, i.e., the set of strings that the Turing Machine Mi accepts.
The language Ld consists of those strings wi such that wi is not accepted by the Turing Machine Mi . In other words, Ld
includes the diagonal elements where a string does not appear in the language of its corresponding Turing Machine.
Diagonalization is a technique used to construct languages that cannot be decided by any Turing Machine. The idea is to
exploit the enumeration of Turing Machines and their corresponding languages to create a language that differs from every
L(Mi ) by at least one string (the diagonal element).
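Concretely (an illustrative reading of the definition above): if M1 accepts w1, then w1 is left out of Ld; if M2 does not accept w2, then w2 is put into Ld; and so on. By flipping each diagonal entry in this way, Ld is guaranteed to disagree with every L(Mi) on at least the string wi.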
By the diagonalization process, we ensure that for each i, the language Ld differs from L(Mi) on the string wi: wi belongs to Ld exactly when wi does not belong to L(Mi). This sets up a contradiction: if Ld were recursively enumerable, there would be some Turing Machine Mj that accepts exactly Ld, and then wj would belong to Ld if and only if it did not belong to Ld.
4. Proof that Ld is not Recursively Enumerable
We now prove that the language Ld is not recursively enumerable. The proof proceeds by contradiction, using the diagonalization argument.
Suppose that Ld is recursively enumerable. This means there exists a Turing Machine Md that accepts exactly the strings in Ld.
Constructing a contradiction:
1. Enumerate all binary strings w1 , w2 , w3 , …, and for each i, associate it with the Turing Machine Mi .
2. Since Md is itself a Turing Machine, it appears somewhere in this enumeration, say Md = Mj for some index j.
3. Ask whether wj ∈ Ld. If wj ∈ Ld, then by the definition of Ld we have wj ∉ L(Mj) = Ld; if wj ∉ Ld, then wj ∈ L(Mj) = Ld.
In either case we obtain a contradiction. Hence no Turing Machine can recognize Ld, since any such machine would lead to this paradoxical situation.
Thus, there is no Turing Machine that can accept exactly the strings in Ld , meaning that Ld is not recursively enumerable.
Conclusion
In summary, we have constructed a language Ld using diagonalization and proved that it is not recursively enumerable.
The key idea is that Ld is defined to be different from every language L(Mi ) by at least one string, ensuring that it cannot
be accepted by any Turing Machine. This result demonstrates the limitations of the class of recursively enumerable
languages, highlighting the existence of languages that are fundamentally beyond the reach of Turing Machines.
1. Recursive Languages
1. Recursive Languages
A recursive language is a language that can be decided by a Turing Machine. In other words, there exists a Turing Machine
M that halts for every input and correctly decides whether any given string belongs to the language or not. More formally:
L is recursive if ∃M such that M halts on every input and M accepts strings in L and rejects strings not in L.
For example, the set of even-length strings over a binary alphabet Σ = {0, 1} is a recursive language because we can
construct a Turing Machine that counts the length of the string and halts, accepting if the length is even and rejecting if the
length is odd.
The class of recursive languages is decidable because a Turing Machine can determine membership in the language by
halting for all inputs. This implies that every recursive language is also a decidable problem.
If L is a recursive language, its complement is also recursive. This is because, if a Turing Machine decides
membership in L, we can construct another Turing Machine that decides membership in the complement by swapping the accept and
reject states of the original machine.
Complement of RE Languages:
If L is recursively enumerable (RE), its complement is not necessarily RE. This means that while we may have a Turing Machine that
enumerates the strings in L, there is no guarantee that we can enumerate the strings of the complement, or that there exists a
Turing Machine that halts on all inputs when checking membership in the complement.
The universal language LU is defined as:
LU = {⟨M, w⟩ ∣ M is a Turing Machine that accepts the string w}
In essence, LU is the language of all Turing Machine computations: it consists of all pairs of Turing Machines and strings
such that the Turing Machine accepts the string. This language is recursively enumerable because we can construct a
Turing Machine that, given a pair ⟨M , w⟩, simulates M on input w and accepts if M accepts w .
However, while LU is RE, it is not recursive, because it is closely related to the halting problem, which is undecidable. We can summarize the reason as follows:
LU is not recursive because we cannot decide, for every ⟨M , w⟩, whether M accepts w; an algorithm that could do so could be used to solve the halting problem, which is
undecidable.
Suppose, for contradiction, that there exists a Turing Machine MU that decides LU. This means that MU takes an input ⟨M, w⟩, always halts, and accepts exactly when M accepts w.
Now, let's use MU to decide the halting problem, which is known to be undecidable. Given an instance ⟨M, w⟩ of the halting problem, construct a Turing Machine M′ that, on any input x, simulates M on w and accepts x if and only if that simulation halts. Then, for any fixed string x:
If M halts on w , then M ′ will accept all inputs, including x, and MU will accept ⟨M ′ , x⟩.
If M does not halt on w , then M ′ will never halt on any input, and MU will reject ⟨M ′ , x⟩.
Thus, using MU to decide whether M ′ accepts x is equivalent to solving the halting problem for M on w. Since the halting problem is undecidable, no such MU can exist, and therefore LU is not recursive.
Conclusion
In this lecture, we have explored the concept of recursive languages, recursively enumerable (RE) languages, and their
complements. We examined the universal language, which is RE but not recursive, and proved that it is undecidable. The
key takeaway is that while some problems can be recognized by a Turing Machine (i.e., they are RE), they may still be
undecidable, meaning that no Turing Machine can decide them for all inputs. The universal language LU serves as a prime example of a language that is recursively enumerable but not recursive.
Outline:
1. Reductions
1. Reductions
A reduction is a technique used to prove undecidability by showing that if we could decide one problem, we could use that
solution to decide another problem that is known to be undecidable. This is one of the most powerful methods in
computability theory and complexity theory.
Definition of Reductions:
If we want to show that a problem P1 is undecidable, we reduce an already-known undecidable problem P2 to P1 . The idea
is that if we could solve P1 , we could also solve P2 , but since P2 is undecidable, P1 must also be undecidable.
A many-one reduction from a problem P2 to a problem P1 is a computable function f such that for any input x of P2 :
x is a "yes" instance of P2 if and only if f(x) is a "yes" instance of P1.
Suppose we want to prove that the emptiness problem is undecidable. We will reduce the Halting Problem to the
emptiness problem.
The Halting Problem asks whether a given Turing Machine M halts on an input w .
The Emptiness Problem asks whether a given Turing Machine M ′ accepts no strings, i.e., the language L(M ′ ) is
empty.
Given an instance of the Halting Problem, which is ⟨M , w⟩, construct a new Turing Machine M ′ such that:
M ′ behaves as follows: on any input x, it ignores x, simulates M on w, and accepts x if the simulation halts. Consequently, L(M′) is non-empty exactly when M halts on w.
By checking whether L(M ′ ) is empty, we can determine whether M halts on w . Therefore, since the Halting Problem is
undecidable, the Emptiness Problem is also undecidable.
Given a Turing Machine M , decide whether L(M ) = ∅, i.e., whether M accepts no strings.
Proof of Undecidability:
To prove that this problem is undecidable, we can reduce the Halting Problem to this problem. We know that the Halting
Problem is undecidable, so we will use it to show that determining whether a Turing Machine accepts the empty language is
also undecidable.
Given an instance of the Halting Problem ⟨M , w⟩, construct a new Turing Machine M ′ such that:
M ′ behaves as follows: on any input x, it simulates M on w.
If M halts on w , M ′ accepts x; otherwise M′ accepts nothing. Hence L(M′) = ∅ if and only if M does not halt on w, so a decider for the emptiness problem would decide the Halting Problem.
Let P be a property of languages. Then the problem of deciding whether a given Turing Machine M accepts a language
L(M ) that has property P is undecidable if P is a non-trivial property.
Non-trivial property means that there exists some Turing Machine whose language has property P , and some Turing
Machine whose language does not have property P .
Consider the problem of determining whether a Turing Machine M accepts a non-empty language, i.e., L(M) ≠ ∅. This is
a non-trivial property because:
There exists a Turing Machine that accepts the empty language (e.g., one that rejects all inputs).
There exists a Turing Machine that accepts a non-empty language (e.g., one that accepts a specific string).
According to Rice’s Theorem, this problem is undecidable because it is a non-trivial property of the language of the Turing
Machine.
1. The Halting Problem: Determining whether a given Turing Machine M halts on a given input w is undecidable.
2. The Equivalence Problem: Given two Turing Machines M1 and M2 , determining whether L(M1) = L(M2) is undecidable.
3. The Universality Problem: Given a Turing Machine M , determining whether L(M ) = Σ∗ , i.e., whether M accepts all
possible strings, is undecidable.
Proof of Undecidability (Equivalence Problem):
The equivalence problem asks whether two Turing Machines M1 and M2 recognize the same language, i.e., L(M1) = L(M2). We can prove this is undecidable by reducing the Halting Problem to it:
Given an instance ⟨M , w⟩ of the Halting Problem, construct two Turing Machines M1 and M2 as follows: M1 accepts every input (so L(M1) = Σ*), while M2, on any input x, first simulates M on w and then accepts x if that simulation halts (so L(M2) = Σ* if M halts on w, and L(M2) = ∅ otherwise).
If M1 and M2 are equivalent, then M must halt on w . If they are not equivalent, then M does not halt on w .
Thus, by checking whether M1 and M2 are equivalent, we could decide the Halting Problem, which is undecidable. Therefore, the equivalence problem is also undecidable.
Conclusion
In this lecture, we explored several important undecidable problems related to Turing Machines. We used reductions to
prove undecidability, explored the problem of determining whether a Turing Machine accepts the empty language, and
discussed Rice's Theorem as a tool for understanding undecidability in terms of properties of languages. We also covered
various undecidable problems related to Turing-Machine specifications. These results underscore the inherent limitations
of computation and the impossibility of deciding certain properties of Turing Machines.
Outline:
1. Definition of Post's Correspondence Problem (PCP)
Post's Correspondence Problem is stated as follows. Given an alphabet Σ and two lists of n strings over Σ:
{w1 , w2 , … , wn } (Set 1)
{v1 , v2 , … , vn } (Set 2)
The task is to determine whether there exists a sequence of indices i1 , i2 , … , ik such that:
w i 1 w i 2 … w i k = vi 1 vi 2 … vi k
In other words, the strings from the first set {w1 , w2 , … , wn } can be concatenated to form a string that is exactly the
same as the string formed by concatenating the strings from the second set {v1 , v2 , … , vn }, using the same indices in
both sequences.
Example 1:
Let W = {w1 = "a", w2 = "ab"} and V = {v1 = "aa", v2 = "b"}. We look for indices satisfying
wi1 wi2 … wik = vi1 vi2 … vik
Choosing i1 = 1, i2 = 2 gives:
w1 w2 = "a" "ab" = "aab"
v1 v2 = "aa" "b" = "aab"
Since the two concatenated strings are identical, this sequence of indices is a solution to the problem.
Modified Definition:
In the Modified PCP (MPCP), we are again asked whether there exists a sequence of indices i1 , i2 , … , ik such that:
wi1 wi2 … wik = vi1 vi2 … vik
with the additional requirement that the sequence must begin with the first pair, i.e., i1 = 1. The MPCP is used as an intermediate step in undecidability proofs and can itself be reduced to the ordinary PCP.
Example 2:
Let:
W = {w1 = "a", w2 = "b"}
V = {v1 = "ab", v2 = "ba"}
We are tasked with determining whether there is a sequence of indices such that the concatenation of strings from W
matches the concatenation of strings from V . In this case, no such sequence exists: every string in W has length 1 and every string in V has length 2, so for any choice of k indices the W-concatenation has length k while the V-concatenation has length 2k, and the two can never be equal.
We will reduce the Halting Problem to PCP. The Halting Problem asks whether a given Turing Machine M halts on an input
w. We can reduce this problem to PCP as follows:
Given an instance of the Halting Problem ⟨M , w⟩, we construct two sets W and V such that solving the corresponding
PCP will tell us whether M halts on w .
Let M be a Turing Machine and w an input. The construction is based on simulating M on w and creating strings that will
match if M halts.
1. Set W :
W = {w1 , w2 }, where:
w1 = "a" (This is a placeholder symbol representing the start of the simulation.)
w2 = "b" (This is another placeholder symbol for the Turing Machine to consume.)
2. Set V :
V = {v1 , v2 }, where:
v1 = "a"
v2 = "ab"
The idea here is that the pairs (wi, vi) are meant to simulate the transitions of the Turing Machine M on the input w . Specifically:
w1 represents the transition from the initial state of the Turing Machine to the state where it reads the first symbol of
the input w, and w2 represents a subsequent step of the simulation. (The sets above are only a simplified illustration; in the full construction, the pairs encode M's transition rules so that a matching index sequence spells out, symbol by symbol, an accepting computation of M on w.)
Now, the PCP instance asks whether we can find a sequence of indices such that the concatenation of the strings from W
matches the concatenation of the strings from V .
Key Idea:
If M halts on w , we can find such a sequence because the Turing Machine will eventually enter a halting state, and the
corresponding strings will match.
If M does not halt, no such sequence exists because the Turing Machine will keep running indefinitely, and the two
concatenated strings will never match.
Thus, by solving this PCP, we can determine whether M halts on w . Since the Halting Problem is undecidable, this reduction
implies that the PCP is also undecidable.
Conclusion
In this lecture, we introduced Post's Correspondence Problem (PCP), defined it formally, and discussed a modified version
of the problem. We then proved the undecidability of PCP through a reduction from the Halting Problem. The
undecidability of PCP highlights the complexity and limitations of computational problems, underscoring the challenges in
decision-making for Turing Machines and their behavior.
Outline:
1. Problems about Programs
Given a program P and an input w , the task is to determine whether P does not halt when run on input w . This problem is
known as the Non-Halting Problem, and it is undecidable.
Proof of Undecidability:
To prove the undecidability of the Non-Halting Problem, we will reduce the Halting Problem to it. Recall that the Halting
Problem is defined as follows: given a program P and an input w , determine if P halts on input w .
The Non-Halting Problem asks if a program P does not halt on input w . We can reduce the Halting Problem to the Non-
Halting Problem by negating the solution. Specifically:
Given an instance ⟨P , w⟩ of the Halting Problem, we construct a new program P ′ that behaves as follows: P′ simply runs P on its input, so P′ halts on w exactly when P halts on w.
Now, solving the Non-Halting Problem on P ′ and w gives us the negation of the solution to the Halting Problem: if the answer is "yes" (P′ does not halt on w), then P does not halt on w; if the answer is "no", then P halts on w.
Thus, by solving the Non-Halting Problem for P ′ and w , we can solve the Halting Problem. Since the Halting Problem is
undecidable, the Non-Halting Problem is also undecidable.
Given a context-free grammar G and a string w , the task is to determine whether w has more than one leftmost derivation
(or equivalently, more than one parse tree) in G. This problem is known as the Ambiguity Problem for CFGs.
Proof of Undecidability:
We will prove the undecidability of the Ambiguity Problem by reducing the Post Correspondence Problem (PCP) to it. Recall
that PCP is undecidable, so if we can reduce PCP to the Ambiguity Problem, we will show that the Ambiguity Problem is
undecidable.
We will construct a context-free grammar G that generates a string w if and only if the PCP instance has a solution.
1. PCP Instance:
Let W = {w1 , w2 , … , wn } and V = {v1 , v2 , … , vn } be two sets of strings, as defined in the PCP.
The task is to find whether there exists a sequence of indices i1 , i2 , … , ik such that:
w i 1 w i 2 … w i k = vi 1 vi 2 … vi k
2. Grammar Construction: Construct a context-free grammar G with two sub-grammars: one whose derivations generate the strings wi1 wi2 … wik (together with a record of the chosen indices), and one whose derivations generate the strings vi1 vi2 … vik (with the same kind of index record). The start symbol of G can derive through either sub-grammar.
3. Ambiguity: If the PCP instance has a solution, the grammar G generates some string with more than one leftmost derivation, since the matching concatenation (with its index record) can be derived through both sub-grammars, using the two sequences of rules corresponding to the matching pairs of wi and vi. If the instance has no solution, no string has two such derivations.
By solving the Ambiguity Problem for this grammar G, we can solve the PCP. Since PCP is undecidable, the Ambiguity
Problem for CFGs is also undecidable.
Problem: The Complement of a List Language
Given a list language L defined by a finite set of strings {w1 , w2 , … , wn }, the task is to determine whether the
complement of L is a regular language. In other words, we want to determine if the complement of the language
consisting of all concatenations of strings from {w1 , w2 , … , wn } is a regular language.
Proof of Undecidability:
We will prove that this problem is undecidable by reducing from the Halting Problem.
1. Halting Problem: The Halting Problem asks whether a given Turing Machine M halts on a specific input w .
2. Reduction: Given an instance ⟨M , w⟩ of the Halting Problem, we effectively construct (from M and w alone, without knowing whether M halts) a finite set of strings {w1 , w2 , … , wn } defining a list language L, designed so that:
The complement of L is regular if and only if M halts on w .
This construction reduces the Halting Problem to the
problem of determining whether the complement of a list language is regular, proving that the problem is undecidable.
Conclusion
In this lecture, we discussed three important undecidable problems:
1. Problems about Programs: We demonstrated the undecidability of the Non-Halting Problem by reducing it from the
Halting Problem.
2. Undecidability of Ambiguity for Context-Free Grammars (CFGs): We showed the undecidability of the Ambiguity
Problem by reducing from the Post Correspondence Problem (PCP).
3. The Complement of a List Language: We proved the undecidability of determining whether the complement of a list
language is regular by reducing from the Halting Problem.
These problems highlight the inherent limitations in computational theory and underscore the challenges in determining
properties of programs and formal languages.
Outline:
1. Problems Solvable in Polynomial Time (Class P)
3. Non-Deterministic Polynomial Time (Class NP)
A problem A is in P if there exists a deterministic Turing machine M such that for all inputs w of size n, M (w) halts and
gives the correct answer (yes or no) in O(n^k) time for some constant k.
Problem Definition:
Given a graph G = (V , E), where V is the set of vertices and E is the set of edges with associated weights, the task is to
find the minimum spanning tree of the graph.
Kruskal's algorithm sorts the edges in non-decreasing order of weight and considers them one at a time: if the edge connects two vertices in different trees, add it to the MST.
The algorithm ensures the inclusion of edges with the smallest weight that do not form a cycle, which guarantees that
the MST is formed.
Sorting the edges takes O(E log E), and each union-find operation (used to detect cycles) takes nearly constant amortized time, O(α(V)), where α is the inverse Ackermann function. Therefore, the overall time complexity of Kruskal's algorithm is dominated by O(E log E), which is polynomial in the size of the input.
Since Kruskal’s algorithm runs in polynomial time, it is an example of a problem solvable in class P .
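As an illustration of this polynomial bound, here is a compact sketch of Kruskal's algorithm with a union-find structure; the function name and edge format are choices made for this example rather than anything fixed by the lecture.

```python
def kruskal_mst(num_vertices, edges):
    """Kruskal's algorithm: edges is a list of (weight, u, v) tuples with
    vertices numbered 0 .. num_vertices-1.  Returns the list of MST edges."""
    parent = list(range(num_vertices))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):     # the O(E log E) sort dominates
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two different trees
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

# Example: a small 4-vertex graph.
print(kruskal_mst(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]))
```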
Class NP consists of decision problems for which a proposed solution can be verified in polynomial time by a deterministic
Turing machine. More formally, a problem is in NP if, for every "yes" instance of the problem, there exists a certificate (a
proposed solution) that can be verified in polynomial time.
A problem A is in NP if there exists a nondeterministic Turing machine M such that for every input w of size n:
If the answer is "yes," then there exists a certificate c (of size polynomial in n) such that M (w, c) accepts in polynomial
time.
In simpler terms, if the solution exists, a verifier can check it in polynomial time.
Problem Definition:
Given a set of cities and the distances between each pair of cities, is there a tour that visits every city exactly once and
returns to the starting city such that the total length of the tour is at most k ?
To verify a proposed solution (a sequence of cities representing the tour), we can check:
1. that the sequence visits every city exactly once and returns to the starting city, and
2. that the total length of the tour is at most k.
Both of these checks can be done in polynomial time, so the TSP decision problem is in NP.
The best known exact algorithms for solving TSP in its optimization form (finding a shortest tour) are not polynomial-time algorithms; brute force takes factorial time O(n!), and even the best exact methods take exponential time. The optimization version is NP-hard, but as a decision problem (whether there exists a tour of length at most k), TSP is in NP.
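The verification step can be made concrete with a short sketch of a polynomial-time verifier for the TSP decision problem; the distance-matrix format and the function name are illustrative assumptions, not part of the lecture.

```python
def verify_tsp_tour(dist, tour, k):
    """Polynomial-time verifier for the TSP decision problem.
    dist[i][j] is the distance between cities i and j, tour is a proposed
    ordering of all cities, and k is the bound from the problem statement."""
    n = len(dist)
    # Check 1: the tour visits every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    # Check 2: the total length, including the return edge, is at most k.
    total = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return total <= k

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(verify_tsp_tour(dist, [0, 1, 3, 2], 21))   # 2 + 4 + 3 + 9 = 18 <= 21 -> True
```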
5. Polynomial Time Reductions
A polynomial-time reduction from a problem A to a problem B is a function, computable in polynomial time, that transforms any instance of problem A into an instance of problem B such that the answer to the transformed instance of B is the same as the answer to the original instance of A.
Polynomial-time reductions are often used to show that one problem is at least as hard as another, and they play a key role
in the theory of NP-completeness.
6. NP-Complete Problems (Introductory Definition)
A problem A is said to be NP-complete if it satisfies two conditions:
1. A ∈ NP.
2. Every other problem in NP can be reduced to it in polynomial time: for every problem B ∈ NP, B ≤p A (i.e., B can be reduced to A in polynomial time). In other words, if a polynomial-time algorithm exists for an NP-complete problem, then every problem in NP can also be solved in polynomial time.
The concept of NP-completeness was introduced by Stephen Cook in 1971 with Cook’s Theorem, which showed that the
Boolean satisfiability problem (SAT) is NP-complete. Many other problems, such as the Traveling Salesman Problem, the
Knapsack Problem, and the Hamiltonian Cycle Problem, have been shown to be NP-complete.
Conclusion
In this lecture, we covered the following topics related to the complexity classes P and NP :
1. Class P: Problems solvable in polynomial time. Example: Kruskal's Algorithm for finding a minimum spanning tree.
2. Class NP: Problems where a proposed solution can be verified in polynomial time. Example: The Traveling Salesman
Problem.
3. Polynomial Time Reductions: The concept of transforming one problem into another in polynomial time.
4. NP-Complete Problems: Problems that are both in NP and as hard as any other problem in NP.
The study of P and NP is foundational in understanding the computational complexity of problems and has profound
implications for fields such as cryptography, optimization, and algorithm design. The central question of whether P = NP
remains one of the most important unsolved problems in computer science.
Outline:
1. The Satisfiability Problem (SAT)
3. NP-Completeness of SAT
4. Proof of Cook's Theorem (Formal Proof)
Definition of SAT:
A Boolean formula is a logical expression involving variables, logical operators (AND, OR, NOT), and constants (True and
False). The variables in the formula can be assigned truth values (True or False), and the goal is to determine if there is an
assignment of these truth values that makes the formula evaluate to True.
A formula is in conjunctive normal form (CNF) if it is a conjunction (AND) of clauses, where each clause is a disjunction (OR)
of literals, and a literal is either a variable or its negation.
SAT Problem: Given a Boolean formula F in CNF, determine whether there exists a truth assignment to the variables of F
that makes F evaluate to True.
Example:
Consider a CNF formula over the variables x1, x2, x3, such as the formula F shown below. We want to determine if there exists a truth assignment to x1, x2, x3 that makes the formula True.
Variables: Denoted as x1 , x2 , … , xn , where each variable can take a truth value (True or False).
Clauses: Each clause is a disjunction (OR) of literals. A literal is a variable xi or its negation ¬xi .
Formula: A conjunction (AND) of such clauses. For example, the SAT instance F = (x1 ∨ ¬x2) ∧ (x2 ∨ x3) ∧ (¬x3 ∨ x1) is a conjunction of three clauses.
To check whether an assignment of truth values exists that satisfies the formula, one needs to evaluate the formula for every possible assignment of the variables. Since the number of variables is n, there are 2^n possible truth assignments, which can be computationally expensive as n increases.
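The exhaustive approach can be sketched directly; the clause encoding below (signed integers, in the style of the common DIMACS convention) is an assumption made for this example.

```python
from itertools import product

def brute_force_sat(clauses, n):
    """Exhaustive SAT check for a CNF formula over variables 1..n.
    A clause is a list of nonzero integers: i means x_i, -i means NOT x_i.
    Tries all 2**n assignments, which is exponential in n."""
    for bits in product([False, True], repeat=n):
        assign = {i + 1: bits[i] for i in range(n)}
        if all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in clauses):
            return assign            # a satisfying assignment was found
    return None                      # the formula is unsatisfiable

# (x1 OR NOT x2) AND (x2 OR x3)
print(brute_force_sat([[1, -2], [2, 3]], 3))
```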
3. NP-Completeness of SAT
The SAT problem was the first problem proven to be NP-complete. This was done through Cook’s Theorem, which shows
that SAT is both in NP and NP-hard. To show that SAT is NP-complete, we must establish two things:
1. SAT is in NP: Given a Boolean formula and an assignment of truth values, it is easy to verify whether the assignment satisfies the formula by evaluating the formula with the given truth values. This verification process takes polynomial time in the size of the formula, so SAT is in NP.
2. SAT is NP-hard: The core of Cook’s Theorem is showing that any decision problem in NP can be transformed into a SAT instance in polynomial time.
Step 1: SAT is in NP
We have already established that SAT is in NP by showing that a proposed solution (a truth assignment) can be verified in
polynomial time.
Step 2: SAT is NP-hard
To show that SAT is NP-hard, we need to prove that every problem in NP can be reduced to SAT in polynomial time. Cook’s reduction works with an arbitrary problem in NP and constructs a polynomial-time reduction from the computation of any nondeterministic Turing machine (NDTM) to a SAT instance.
Concept: The key idea in Cook’s proof is that we can simulate the computation of a nondeterministic Turing machine
(NDTM) on any input string using a Boolean formula. The formula will encode the possible configurations of the NDTM, and
the satisfiability of the formula will correspond to whether there exists a sequence of valid transitions that leads to an
accepting state.
Let us consider a nondeterministic Turing machine M with input string w of length n. We need to simulate the computation of M on input w. The computation consists of a sequence of configurations, where each configuration specifies the current state of M, the position of the tape head, and the contents of the tape.
Let qi be a Boolean variable that indicates whether the Turing machine is in state qi at a particular step.
Let ti,j represent the symbol in the j -th tape cell at the i-th step.
The goal is to create a formula that encodes the transitions of the Turing machine. This formula must be satisfiable if and
only if the NDTM accepts the input w . The formula will consist of:
1. Initial Configuration: A clause ensuring that the initial configuration is consistent with the given input.
2. Transition Constraints: Clauses that encode the valid transitions between configurations of the Turing machine.
3. Acceptance Conditions: A clause that ensures the NDTM enters an accepting state.
Thus, the Boolean formula will represent all possible configurations and transitions of the Turing machine. If there exists a
sequence of configurations leading from the initial state to an accepting state, the formula will be satisfiable.
Constructing the Boolean formula involves encoding the transitions of the NDTM and ensuring the satisfiability corresponds
to a valid computation. The size of the formula grows polynomially in the size of the input string w and the number of steps
the NDTM takes, making the construction of the formula polynomial in time.
Step 3: Conclusion
Since we can transform any NDTM computation into a SAT instance in polynomial time, and since we know that a solution to
SAT can be verified in polynomial time, we conclude that SAT is NP-complete.
Conclusion
In this lecture, we:
1. Discussed the Satisfiability Problem (SAT) and how it is represented in conjunctive normal form (CNF).
3. Presented Cook’s Theorem, which proves that SAT is NP-complete, by constructing a polynomial-time reduction from an
arbitrary NP problem (simulated by a nondeterministic Turing machine) to a SAT instance.
Cook’s Theorem is foundational in complexity theory, as it was the first to establish the concept of NP-completeness, leading
to the identification of many other NP-complete problems. The problem of determining whether P = NP remains one of
the most important open questions in computer science.
Outline:
1. Normal Forms for Boolean Expressions
3. NP-Completeness of CSAT
4. NP-Completeness of 3-SAT
To understand the restricted versions of SAT, we first need to discuss the concept of normal forms for Boolean expressions.
These normal forms allow us to represent logical formulas in a standardized and structured way, which is crucial for
analyzing their computational complexity.
A Boolean expression is in Conjunctive Normal Form (CNF) if it is a conjunction (AND) of clauses, where each clause is a
disjunction (OR) of literals. A literal is either a variable xi or its negation ¬xi .
Example of CNF:
In CNF, each clause is a disjunction of literals, and the entire expression is a conjunction of such clauses.
In contrast, a Boolean expression is in Disjunctive Normal Form (DNF) if it is a disjunction (OR) of conjunctions (AND) of
literals. DNF is not used directly in this lecture, but it is important to note the distinction.
Example of DNF:
(x1 ∧ x2 ) ∨ (¬x1 ∧ x3 )
There are other normal forms such as Negation Normal Form (NNF), which is a Boolean expression where negations only
appear directly in front of literals. These are not the focus of this lecture but serve as important concepts in logic and
complexity theory.
Step 1: Eliminate Implications and Biconditionals
Implications (p → q) and biconditionals (p ↔ q) are not allowed in CNF. We replace them using the following equivalences:
p → q ≡ ¬p ∨ q
p ↔ q ≡ (p → q) ∧ (q → p)
Step 2: Push Negations Inward
Negations should be applied only to literals. To ensure this, we apply De Morgan’s laws:
¬(p ∧ q) ≡ ¬p ∨ ¬q
¬(p ∨ q) ≡ ¬p ∧ ¬q
We repeatedly push negations inward until they appear only in front of literals.
Step 3: Apply the Distributive Law
The next step is to apply the distributive law to get the formula into CNF:
(p ∨ (q ∧ r)) ≡ (p ∨ q) ∧ (p ∨ r)
Step 4: Simplify the Expression
Finally, we simplify the formula to remove any redundant terms, ensuring the formula is in valid CNF form. This step often
involves eliminating tautologies or redundant clauses.
1. Apply distributivity where needed, in the CNF direction: (x1 ∨ (x2 ∧ x3)) ≡ (x1 ∨ x2) ∧ (x1 ∨ x3). (A quick machine check of the equivalences used in Steps 1–3 is sketched below.)
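As a sanity check, the equivalences used in the conversion can be verified mechanically by comparing truth tables; the small helper below is only an illustration, not part of the lecture.

```python
from itertools import product

def equivalent(f, g, nvars):
    """Truth-table equivalence: do f and g agree on every assignment?"""
    return all(f(*vals) == g(*vals) for vals in product([False, True], repeat=nvars))

# Implication elimination:  (p -> q)  ==  (NOT p OR q)
print(equivalent(lambda p, q: (q if p else True), lambda p, q: (not p) or q, 2))
# De Morgan:  NOT (p AND q)  ==  (NOT p) OR (NOT q)
print(equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q), 2))
# Distributivity:  p OR (q AND r)  ==  (p OR q) AND (p OR r)
print(equivalent(lambda p, q, r: p or (q and r), lambda p, q, r: (p or q) and (p or r), 3))
```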
3. NP-Completeness of CSAT
We now introduce the restricted version of SAT called CSAT (Conjunctive Satisfiability), where we restrict the formula to be
in conjunctive normal form (CNF). Specifically, CSAT is the problem of determining whether a given CNF formula has a
satisfying assignment.
Definition of CSAT:
Given a Boolean formula F in CNF, determine whether there exists a truth assignment to the variables that satisfies the
formula.
For example, any CNF formula over x1, x2, x3, such as (x1 ∨ ¬x2) ∧ (x2 ∨ x3) ∧ (¬x1 ∨ ¬x3), is a CSAT instance. The task is to find a truth assignment to x1, x2, x3 that makes the formula true.
CSAT is in NP: Given a Boolean formula in CNF and a proposed truth assignment, we can verify if the assignment
satisfies the formula in polynomial time by evaluating each clause.
CSAT is NP-hard: We reduce the general SAT problem (which is NP-complete) to CSAT in polynomial time. Since SAT can
be reduced to CSAT in polynomial time, CSAT is NP-hard.
4. NP-Completeness of 3-SAT
Next, we consider the problem of 3-SAT, a special case of SAT where each clause contains exactly three literals. The 3-SAT problem is of particular importance because it was among the first problems shown to be NP-complete by a polynomial-time reduction (from CSAT) rather than directly from the definition, and it is the standard starting point for many later reductions.
Definition of 3-SAT:
Given a Boolean formula in CNF where each clause contains exactly three literals, determine if there exists an assignment of
truth values to the variables that makes the formula true.
For example, (x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3) is a 3-SAT instance.
3-SAT is in NP: Given a truth assignment to the variables, we can easily check if all the clauses are satisfied, which takes
polynomial time.
3-SAT is NP-hard: To show that 3-SAT is NP-hard, we reduce CSAT to 3-SAT in polynomial time. This can be done by
transforming any CNF formula into an equivalent formula where each clause has exactly three literals. The
transformation process is as follows:
Step 1: If a clause has more than three literals, introduce new variables to break the clause into smaller clauses,
each with exactly three literals.
Step 2: If a clause has fewer than three literals, we introduce new dummy variables (such as x4 or ¬x4) to pad the clause until it contains exactly three literals, in a way that preserves satisfiability. (A sketch of both steps appears below.)
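A minimal sketch of this clause transformation (splitting long clauses and padding short ones) might look as follows; the integer clause encoding and the numbering of the fresh variables are assumptions of this example.

```python
def to_3sat(clauses, num_vars):
    """Transform a CNF instance into an equisatisfiable one in which every
    clause has exactly three literals.  Clauses are lists of nonzero ints
    (i means x_i, -i means NOT x_i); fresh variables get numbers above num_vars."""
    new_clauses, v = [], num_vars
    for c in clauses:
        if len(c) == 1:                       # pad with two fresh variables y, z
            y, z = v + 1, v + 2
            v += 2
            new_clauses += [[c[0], y, z], [c[0], y, -z],
                            [c[0], -y, z], [c[0], -y, -z]]
        elif len(c) == 2:                     # pad with one fresh variable y
            y = v + 1
            v += 1
            new_clauses += [[c[0], c[1], y], [c[0], c[1], -y]]
        elif len(c) == 3:
            new_clauses.append(list(c))
        else:                                 # split a long clause into a chain
            y = v + 1
            v += 1
            new_clauses.append([c[0], c[1], y])
            for lit in c[2:-2]:
                new_y = v + 1
                v += 1
                new_clauses.append([-y, lit, new_y])
                y = new_y
            new_clauses.append([-y, c[-2], c[-1]])
    return new_clauses, v

# (x1 OR x2 OR x3 OR x4 OR x5) AND (x1): the first clause is split, the second padded.
print(to_3sat([[1, 2, 3, 4, 5], [1]], 5))
```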
Conclusion
In this lecture, we:
1. Introduced normal forms for Boolean expressions, focusing on Conjunctive Normal Form (CNF).
2. Described how an arbitrary Boolean expression can be converted into CNF.
3. Proved the NP-completeness of CSAT.
4. Proved the NP-completeness of 3-SAT, demonstrating its NP-hardness through a polynomial-time reduction from CSAT.
The NP-completeness of 3-SAT is a foundational result in computational complexity, and it plays a central role in
understanding the difficulty of many other NP-complete problems.
Outline: each of the problems below is presented with a Problem Definition, Formalisms, and an NP-Completeness Proof.
1. The Problem of Independent Sets
Problem Definition:
Given a graph G = (V , E), an independent set is a subset of vertices I ⊆ V such that no two vertices in I are adjacent.
The Independent Set Problem is to determine whether there exists an independent set of size at least k in the graph.
Formalisms:
G = (V , E): An undirected graph where V is the set of vertices and E is the set of edges.
Independent Set: A subset I ⊆ V where for all pairs u, v ∈ I, (u, v) ∉ E.
Decision Problem: Given G and an integer k , is there an independent set of size at least k ?
NP-Completeness Proof:
Independent Set is in NP: Given a set I, we can verify in polynomial time if I is an independent set by checking for each pair of vertices u, v ∈ I that (u, v) ∉ E.
Independent Set is NP-hard: We reduce Vertex Cover (an NP-complete problem) to Independent Set in polynomial time. This is because a set I ⊆ V is an independent set if and only if V ∖ I is a vertex cover; hence G has an independent set of size at least k if and only if G has a vertex cover of size at most ∣V ∣ − k. (A small check of this complement relationship is sketched below.)
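The complement relationship can be checked directly; the small verifiers below are illustrative and use an edge-list representation chosen only for this example.

```python
def is_independent_set(edges, I):
    """Polynomial-time check: no edge has both endpoints in I."""
    I = set(I)
    return all(not (u in I and v in I) for (u, v) in edges)

def is_vertex_cover(edges, C):
    """Polynomial-time check: every edge has at least one endpoint in C."""
    C = set(C)
    return all(u in C or v in C for (u, v) in edges)

# On any graph, I is an independent set exactly when V \ I is a vertex cover.
V = {0, 1, 2, 3}
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
I = {0, 2}
print(is_independent_set(E, I))        # True
print(is_vertex_cover(E, V - I))       # True
```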
2. The Node-Cover Problem
Problem Definition:
Given a graph G = (V , E) and an integer k , the Node-Cover Problem asks whether there exists a set C ⊆ V of size at
most k such that every edge in G is incident to at least one vertex in C .
Formalisms:
Node Cover: A subset C ⊆ V such that every edge (u, v) ∈ E has at least one endpoint in C.
Decision Problem: Given G and an integer k, is there a node cover of size at most k?
NP-Completeness Proof:
Node-Cover is in NP: Given a set C , we can verify in polynomial time if C is a node cover by checking that each edge is
incident to at least one vertex in C .
Node-Cover is NP-hard: We reduce the Independent Set Problem to Node-Cover in polynomial time. This is because:
A set C ⊆ V is a node cover if and only if V ∖ C is an independent set.
Thus, if we can solve the Node-Cover Problem, we can solve the Independent Set Problem by checking whether a node cover of size at most ∣V ∣ − k exists.
Since Node-Cover is both in NP and NP-hard, it is NP-complete.
3. The Directed Hamiltonian Circuit Problem
Problem Definition:
Given a directed graph G = (V , E), the Directed Hamiltonian Circuit Problem asks whether there exists a directed cycle
that visits every vertex in V exactly once.
Formalisms:
G = (V , E): A directed graph where each edge is directed from one vertex to another.
Hamiltonian Circuit: A cycle in the graph that visits every vertex exactly once and returns to the starting vertex.
NP-Completeness Proof:
Directed Hamiltonian Circuit is in NP: Given a cycle, we can verify in polynomial time if it visits every vertex exactly
once and returns to the starting vertex.
Directed Hamiltonian Circuit is NP-hard: We reduce the Hamiltonian Cycle Problem (for undirected graphs) to the
Directed Hamiltonian Circuit Problem in polynomial time. The reduction works by replacing each undirected edge (u, v)
in the Hamiltonian cycle problem with two directed edges (u, v) and (v, u) in the directed graph. This guarantees that
solving the directed version will solve the undirected version.
4. The Undirected Hamiltonian Circuit Problem and the Traveling Salesman Problem
Problem Definition:
Undirected Hamiltonian Circuit Problem: Given an undirected graph G = (V , E), determine if there exists a cycle
that visits each vertex exactly once.
Traveling Salesman Problem (TSP): Given a complete graph G = (V , E) with weights on the edges, is there a tour
(Hamiltonian circuit) whose total weight is less than or equal to a specified bound B ?
Formalisms:
Undirected Hamiltonian Circuit: A cycle that visits every vertex in G exactly once and returns to the starting point.
TSP: A Hamiltonian cycle where the sum of the edge weights is at most B .
Undirected Hamiltonian Circuit is in NP: Given a cycle, we can verify in polynomial time if it is a Hamiltonian cycle.
Undirected Hamiltonian Circuit is NP-hard: We reduce the Hamiltonian Path Problem (which is itself NP-complete) to the Undirected Hamiltonian Circuit Problem in polynomial time. Add a new vertex adjacent to every vertex of G; the resulting graph has a Hamiltonian circuit if and only if G has a Hamiltonian path. This reduction proves NP-hardness.
TSP NP-Completeness:
The Traveling Salesman Problem is NP-complete as well. We can reduce the Hamiltonian Circuit Problem to TSP by building a complete graph in which every edge of G gets weight 1 and every non-edge gets weight 2, and setting the bound B = ∣V ∣. A tour of total weight at most ∣V ∣ exists if and only if G has a Hamiltonian circuit.
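A sketch of this reduction, using the conventional choice of weight 2 for non-edges, might look as follows; the function name and output format are assumptions of this example.

```python
def hamiltonian_to_tsp(n, edges):
    """Reduce Hamiltonian Circuit to the TSP decision problem:
    edges of G get weight 1, non-edges get weight 2, and the bound is B = n.
    G has a Hamiltonian circuit iff the TSP instance has a tour of weight <= B."""
    edge_set = {frozenset(e) for e in edges}
    dist = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            w = 1 if frozenset((u, v)) in edge_set else 2
            dist[u][v] = dist[v][u] = w
    return dist, n                      # (distance matrix, bound B)

dist, B = hamiltonian_to_tsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(B)    # 4: a tour of weight <= 4 exists iff the original 4-cycle is Hamiltonian
```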
To summarize the chain of reductions discussed:
Vertex Cover reduces to Independent Set, and Independent Set reduces to Node-Cover.
Hamiltonian Cycle reduces to Directed Hamiltonian Circuit and Undirected Hamiltonian Circuit, which in turn reduce to TSP.
Finally, as a key observation, all these NP-complete problems can be ultimately reduced to SAT, demonstrating that they all
belong to the same complexity class, NP-complete.
Conclusion:
In this lecture, we:
1. Defined and formally analyzed several additional NP-complete problems, including Independent Set, Node-Cover,
Directed Hamiltonian Circuit, and Undirected Hamiltonian Circuit.
3. Demonstrated how all these problems connect to each other and can be reduced to SAT, illustrating the
interconnectedness of NP-complete problems.
This comprehensive exploration of NP-complete problems enriches our understanding of the vast landscape of
computational complexity and how many different problems can be reduced to one another.
1. The Class of Languages Co-NP
Definition of Co-NP:
NP (Nondeterministic Polynomial time) is the class of decision problems for which a "yes" answer can be verified in
polynomial time. More formally, a language L belongs to NP if there exists a polynomial-time verifier for it, meaning
that for any string x in L, there exists a certificate (or witness) that can be verified in polynomial time.
Co-NP is defined as the class of languages whose complements are in NP. Specifically, a language L belongs to Co-NP if and only if the complement of L, denoted L̄, belongs to NP. In other words, a language is in Co-NP if for every "no" answer to a decision problem, there exists a polynomial-time verifier for that "no" answer.
Formally:
Co-NP = { L ∣ L̄ ∈ NP }
This means that if there is a polynomial-time verifier for the complement of L, then L is in Co-NP.
Tautology Problem: Given a Boolean formula, determine whether the formula is true for all possible truth assignments.
This problem is in Co-NP because checking whether a formula is not a tautology (i.e., there exists an assignment that
makes the formula false) is an NP problem.
The Non-Emptiness Problem for CFLs: Given a context-free grammar G, determine whether the language generated by G is non-empty. This problem is in Co-NP because its complement, the emptiness problem, is decidable in polynomial time (and hence certainly in NP) by checking whether the start symbol can derive any terminal string.
NP vs Co-NP: The relationship between NP and Co-NP is one of the most fundamental open questions in computational
complexity. The key question is whether NP = Co-NP. If NP and Co-NP were equal, then every problem for which we can
verify a "yes" answer efficiently (NP problems) would also have a complement problem that can be verified efficiently
(Co-NP problems).
It is widely believed that NP ≠ Co-NP, although this has not been proven definitively. Most complexity theorists
conjecture that there are some problems that belong to NP but not to Co-NP, and vice versa.
NP-Complete Problems: These are problems that are both in NP and are at least as hard as any other NP problem. This
means that if we can solve an NP-complete problem in polynomial time, we can solve all NP problems in polynomial
time.
Examples: SAT, Independent Set, Traveling Salesman Problem, Hamiltonian Circuit, etc.
Co-NP-Complete Problems: These are problems that are co-NP-hard and also in Co-NP. In other words, these problems
are at least as hard as any other problem in Co-NP. If we could solve a Co-NP-complete problem in polynomial time, we
would prove that Co-NP = NP.
Tautology Problem (checking whether a Boolean formula is true for all possible inputs) is an example of a Co-NP-
complete problem.
While there is no direct equivalence between NP-complete problems and Co-NP problems, it is important to recognize that
for every NP problem, there is a corresponding Co-NP problem. For example:
SAT (NP-complete problem): Given a Boolean formula, is there an assignment of variables that makes the formula true?
TAUT (Co-NP-complete problem): Given a Boolean formula, is the formula true for all possible assignments? TAUT is essentially the complement of SAT: a formula F is a tautology if and only if ¬F is unsatisfiable, and F is unsatisfiable if and only if ¬F is a tautology. (A small check of this duality is sketched below.)
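This duality is easy to check by brute force for small formulas; the sketch below is illustrative only and runs in exponential time.

```python
from itertools import product

def satisfiable(f, n):
    """Does some assignment of n Boolean variables make f true?"""
    return any(f(*vals) for vals in product([False, True], repeat=n))

def tautology(f, n):
    """Is f true under every assignment?  Equivalently: NOT f is unsatisfiable."""
    return not satisfiable(lambda *vals: not f(*vals), n)

# x OR NOT x is a tautology; x AND NOT x is unsatisfiable.
print(tautology(lambda x: x or (not x), 1))       # True
print(satisfiable(lambda x: x and (not x), 1))    # False
```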
This shows that NP and Co-NP are conceptually related but distinct classes. The main distinction is that NP deals with
verifying "yes" answers, while Co-NP deals with verifying "no" answers.
A language L in Co-NP has a polynomial-time verifier for "no" instances of the problem.
Thus, the existence of a verifier for "no" instances of a problem corresponds to the problem being in Co-NP.
Polynomial-Time Completeness:
NP-complete problems are those problems to which every problem in NP can be reduced in polynomial time. Similarly,
Co-NP-complete problems are the hardest problems in Co-NP, to which every problem in Co-NP can be reduced in
polynomial time.
Key Observation:
There is no known polynomial-time reduction from an NP-complete problem to a Co-NP-complete problem or vice versa,
unless NP = Co-NP.
3. Conclusion
In this lecture, we:
Defined the class Co-NP, which consists of languages whose complements are in NP.
Explored the fundamental question of whether NP = Co-NP, noting that most complexity theorists believe they are
distinct but have not been proven so.
Discussed NP-complete problems and Co-NP-complete problems, highlighting how certain problems in NP have
complementary problems in Co-NP.
Examined examples like the Tautology Problem, which is Co-NP-complete, and compared them with NP-complete
problems like SAT.
This topic remains an essential part of computational complexity theory, with significant open questions, especially the one
about the relationship between NP and Co-NP.
In this lecture, we will delve into the class of problems solvable in polynomial space and explore the concept of Polynomial
Space Turing Machines (PS-TMs). We will examine the relationship of PS (polynomial space) and NPS (non-deterministic
polynomial space) with previously defined complexity classes, and distinguish between deterministic and non-
deterministic polynomial space.
A Turing machine (TM) operates on an infinite tape and can use an arbitrary amount of space to solve a problem, but we
impose a restriction on the amount of space it can use. Specifically, we are concerned with Turing machines that use at most
polynomial space with respect to the size of the input.
Polynomial Space (PS): A Turing machine is said to operate within polynomial space if the amount of tape it uses is
bounded by a polynomial function of the input size. This means that if the size of the input is n, the machine's tape
usage is bounded by O(n^k), where k is a constant.
Formally:
Let the input size be n. If a Turing machine uses space at most O(n^k) for some constant k, it is said to operate in
polynomial space. In terms of language recognition, the class of languages that can be recognized by such machines is
denoted PSPACE.
Quantified Boolean Formula (QBF): The problem of determining the truth value of a quantified Boolean formula,
where the variables in the formula are quantified using existential and universal quantifiers, is PSPACE-complete.
Generalized Reachability Problem: Given a graph and two nodes, determine if there is a path between the nodes that
satisfies certain conditions. This problem can be solved in polynomial space.
P (Polynomial Time): The class of problems that can be solved by a deterministic Turing machine in polynomial time.
This is a time-bound class.
NP (Nondeterministic Polynomial Time): The class of problems for which a solution can be verified in polynomial time
by a deterministic Turing machine, or equivalently, can be solved by a nondeterministic Turing machine in polynomial
time.
PSPACE (Polynomial Space): The class of problems that can be solved by a deterministic Turing machine in polynomial
space, irrespective of the time complexity.
NPSPACE (Nondeterministic Polynomial Space): The class of problems that can be solved by a nondeterministic Turing
machine in polynomial space.
Key Relationships:
1. PSPACE and P: It is known that P is a subset of PSPACE, i.e., P ⊆ PSPACE. This is because any problem that can be solved
in polynomial time can also be solved in polynomial space, as time complexity inherently places limits on the amount of
space required.
2. PSPACE and NP: It is known that NP ⊆ PSPACE: a nondeterministic machine running in polynomial time can touch only polynomially many tape cells, and its choices can be explored deterministically by reusing space. PSPACE also contains problems that appear to require exponential time while still using only polynomial space, so PSPACE is widely believed to be strictly larger than NP, although this has not been proven.
3. PSPACE and NPSPACE: A critical result in computational complexity theory is that PSPACE = NPSPACE, which is a
consequence of Savitch's Theorem. Savitch's theorem states that any problem solvable by a nondeterministic Turing
machine in polynomial space can also be solved by a deterministic Turing machine in polynomial space.
Savitch’s Theorem: For any function f(n) ≥ log n, every language decidable by a nondeterministic Turing machine in f(n) space is decidable by a deterministic Turing machine in f(n)^2 space; that is, NSPACE(f(n)) ⊆ DSPACE(f(n)^2).
Thus, nondeterminism does not add more computational power in terms of space when the space is polynomial,
making PSPACE = NPSPACE.
In deterministic polynomial space, a Turing machine makes a series of decisions based on its current state and input
symbol, and the computation is deterministic. The machine uses a polynomial amount of tape (space) to process the
input and produce an output.
The Quantified Boolean Formula (QBF) problem: Determining whether a given Boolean formula with quantifiers is
true or false is a PSPACE-complete problem.
Game Theory Problems: Certain two-player game problems, such as determining the winner in generalized games whose length is polynomially bounded (for example, generalized geography), can be solved in polynomial space.
In nondeterministic polynomial space, the Turing machine has the ability to make "guesses" at each step of the
computation, and then "verify" the correctness of those guesses. This allows the machine to explore multiple
computational paths simultaneously.
NPSPACE allows nondeterministic Turing machines to use polynomial space, but it is still limited by the space constraint.
Importantly, it has been proven that NPSPACE = PSPACE (by Savitch's Theorem).
Graph Reachability: Determining whether there is a path between two nodes in a graph, under certain conditions,
can be solved using NPSPACE.
Generalized Games: Some generalized game problems that involve multiple players and strategies can be
formulated as NPSPACE problems.
Savitch's theorem formalizes the equivalence of PSPACE and NPSPACE, providing a deep insight into how nondeterminism
can be simulated using deterministic algorithms in polynomial space:
Theorem: For any language decidable by a nondeterministic Turing machine in polynomial space, there exists a
deterministic Turing machine that can decide the same language in polynomial space.
This result shows that nondeterminism does not provide additional power when restricted to polynomial space.
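The midpoint idea behind Savitch's simulation can be sketched as a recursive reachability test; the adjacency-dictionary format is an assumption of this example, and the point is the small recursion depth (and hence small space), not the running time, which is large.

```python
import math

def reachable(adj, u, v, k):
    """Savitch-style midpoint recursion: is there a path from u to v of
    length at most 2**k?  The recursion depth is k = O(log n), and each level
    remembers only a few vertices, so only a small (polylogarithmic) amount
    of working storage is needed -- no path is ever stored explicitly."""
    if k == 0:
        return u == v or v in adj.get(u, ())
    return any(reachable(adj, u, m, k - 1) and reachable(adj, m, v, k - 1)
               for m in adj)               # try every possible midpoint m

adj = {0: [1], 1: [2], 2: [3], 3: []}
n = len(adj)
print(reachable(adj, 0, 3, math.ceil(math.log2(n))))   # True: 0 -> 1 -> 2 -> 3
```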
4. Conclusion
In this lecture, we:
Introduced the concept of Polynomial Space Turing Machines (PS-TMs) and defined the complexity class PSPACE.
Discussed the relationship between PSPACE, NPSPACE, and previously defined classes like P and NP, emphasizing that PSPACE is believed (though not proven) to be strictly larger than NP, and that PSPACE = NPSPACE by Savitch's Theorem.
Explained the distinction between deterministic and nondeterministic polynomial space, emphasizing that
nondeterminism does not provide extra power in polynomial space.
The exploration of polynomial space helps in understanding the boundaries of tractability when only space usage is
constrained, rather than time. PSPACE contains many significant problems in computational complexity, including those
with exponential time complexity but polynomial space requirements.
1. PS-Completeness
Definition of PS-Completeness:
A problem P is said to be PS-complete (PSPACE-complete) if it satisfies two conditions:
1. It is in PSPACE: The problem can be solved by a deterministic Turing machine using polynomial space. In other words, the space required to solve the problem grows at most polynomially with the input size.
2. Every problem in PSPACE can be reduced to it in polynomial time: For any problem A in PSPACE, there exists a
polynomial-time reduction from A to the PS-complete problem P . This means that any PSPACE problem can be reduced
to P in polynomial time, showing that P is as "hard" as any other problem in PSPACE.
Thus, PS-complete problems serve as the hardest problems in PSPACE, and solving one PS-complete problem efficiently
would imply an efficient solution for all problems in PSPACE.
Definition of QBF:
A Quantified Boolean Formula (QBF) is an extension of the standard Boolean formulas. In a QBF, Boolean variables are
quantifiable, meaning they can be universally or existentially quantified. Specifically, a QBF is a Boolean formula that
includes quantifiers over the variables in the formula. These quantifiers can either be:
Existential quantifiers (∃): There exists a variable assignment that makes the formula true.
Universal quantifiers (∀): The formula must be true for all variable assignments.
For example, consider the formula ∃x1 ∀x2 (x1 ∧ ¬x2). This QBF expresses that there exists a value for x1 such that, for every value of x2, the formula x1 ∧ ¬x2 holds true (this particular formula is false, since the choice x2 = true falsifies it).
The formula can be interpreted as a two-player game between the existential quantifier (which picks values for the
variables) and the universal quantifier (which must check whether the formula holds for all variable assignments).
Problem Definition:
The QBF Evaluation Problem is the problem of determining the truth value of a quantified Boolean formula. Given a QBF
formula Q consisting of Boolean variables and quantifiers, the task is to evaluate whether the formula is true or false based
on the values assigned to the variables.
For an existential quantifier (∃x), the formula is true if there exists a value of x that makes the formula true.
For a universal quantifier (∀x), the formula is true if, for every possible value of x, the formula remains true.
Example: evaluate the formula ∃x1 ∀x2 (x1 ∨ x2).
1. For ∃x1, we check if there exists an assignment for x1 such that, for all x2, the formula x1 ∨ x2 holds.
2. For each assignment of x1, we then evaluate whether the formula holds for all possible values of x2.
For x1 = true, the formula is true regardless of x2's value, thus making the entire formula true. (A recursive evaluator along these lines is sketched below.)
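A recursive evaluator that follows exactly this quantifier-by-quantifier strategy can be sketched as follows; the representation of the quantifier prefix and the matrix is an assumption made for this example.

```python
def eval_qbf(quantifiers, matrix, assignment=()):
    """Evaluate a fully quantified Boolean formula.
    quantifiers: a list like ['E', 'A', ...] (existential / universal), one per variable.
    matrix: a function of the Boolean values, e.g. lambda x1, x2: x1 or x2.
    The recursion depth equals the number of variables, so only polynomial
    space is used even though all 2**n branches may be explored."""
    if not quantifiers:
        return matrix(*assignment)
    q, rest = quantifiers[0], quantifiers[1:]
    branches = (eval_qbf(rest, matrix, assignment + (val,)) for val in (False, True))
    return any(branches) if q == 'E' else all(branches)

# ∃x1 ∀x2 (x1 ∨ x2) is true; ∃x1 ∀x2 (x1 ∧ ¬x2) is false.
print(eval_qbf(['E', 'A'], lambda x1, x2: x1 or x2))        # True
print(eval_qbf(['E', 'A'], lambda x1, x2: x1 and not x2))   # False
```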
To prove that the QBF problem is PS-complete, we need to show two things:
1. QBF is in PSPACE: We need to demonstrate that evaluating a QBF can be done using polynomial space. This is done by
observing that for each quantifier in the formula, the evaluation can be done recursively without requiring more than
polynomial space.
Recursive Space Evaluation: To evaluate a QBF, we traverse the formula from the outermost quantifier to the
innermost one, processing each quantifier sequentially. For each quantifier, we need only store the current partial
evaluation and the variables currently being considered, which requires polynomial space.
Space Complexity: The space required for storing the QBF evaluation at each level is proportional to the number of
variables in the formula. Since the formula has a polynomial size in the number of variables, the space required to
evaluate the QBF is also polynomial.
2. Every problem in PSPACE reduces to QBF: Next, we need to show that any problem in PSPACE can be reduced to the QBF evaluation problem in polynomial time. This step involves encoding the polynomial-space-bounded computation of an arbitrary Turing machine, viewed as a reachability question between configurations, into a QBF formula; a recursive, Savitch-style encoding keeps the resulting formula polynomial in size. If this reduction can be performed in polynomial time, then the QBF problem is PS-complete.
Reduction from PSPACE Problems: The process of reducing a PSPACE problem to QBF involves converting the
space-bounded computations of the problem into a sequence of quantifiers and Boolean operations. These
quantifiers model the nondeterministic choices made by a machine during its computation. The Boolean variables
represent the states of the machine, and the quantifiers specify whether the machine can reach a certain state
under the given constraints.
Example of Reduction: Consider the problem of determining whether a given graph has a path from node A to
node B . This problem can be encoded into a QBF formula, where each quantifier represents a decision at each step
of the pathfinding process, and the Boolean variables represent the edges in the graph. By evaluating the QBF, we
can determine if a path exists.
Since both conditions hold—QBF is in PSPACE and any PSPACE problem can be reduced to QBF—we conclude that QBF is
PS-complete.
5. Conclusion
In this lecture, we have:
Introduced PS-completeness, which characterizes the hardest problems in the PSPACE complexity class.
Discussed the Quantified Boolean Formula (QBF) problem, which is a natural candidate for PS-completeness.
Evaluated a QBF formula and discussed how the evaluation process works by handling the existential and universal
quantifiers.
Provided a formal proof for the PS-completeness of the QBF problem, demonstrating that it is both in PSPACE and that
every problem in PSPACE can be polynomial-time reduced to QBF.
Understanding PS-completeness is crucial in theoretical computer science, as it provides insight into the structure of PSPACE
and identifies some of the most difficult problems within this class. The QBF problem is one of the key problems in this area
and serves as a benchmark for understanding the complexity of PSPACE.
1. Quicksort: An Example of a Randomized Algorithm
Quicksort Overview:
Quicksort is a well-known divide-and-conquer algorithm for sorting arrays. The key idea behind Quicksort is to select a
pivot element from the array and partition the other elements into two subarrays, according to whether they are less than
or greater than the pivot. These subarrays are then sorted recursively.
Randomized Quicksort:
In the randomized version of Quicksort, the pivot is selected randomly from the array rather than choosing a fixed pivot
(such as the first element). This random selection introduces an element of randomness in the algorithm, which affects the
runtime performance.
The main advantage of the randomized pivot selection is that it can ensure the algorithm performs well on average, even
for arrays with adversarial orderings, thus avoiding the worst-case time complexity O(n^2) in many situations.
Expected Time Complexity: The expected time complexity of randomized Quicksort is O(n log n), where n is the
number of elements in the array. This is because each partitioning step divides the array into two parts, and on
average, the pivot splits the array in half.
Worst-Case Time Complexity: In the worst case, the algorithm could behave like a deterministic version of Quicksort,
resulting in O(n^2) time complexity. However, this happens only with very low probability, namely when the pivot is consistently
chosen poorly (e.g., always the smallest or largest element).
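A minimal sketch of randomized Quicksort (using list comprehensions for clarity rather than in-place partitioning) might look as follows.

```python
import random

def randomized_quicksort(a):
    """Randomized Quicksort: the pivot is chosen uniformly at random,
    which gives expected O(n log n) comparisons on every input."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]
```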
A randomized Turing machine (RTM) is a Turing machine that has access to random bits. At each step of computation, the
machine can make a random choice (such as choosing between multiple possible transitions). The randomness is typically
modeled by a special "random tape" that provides random binary values, which can be accessed during computation.
An RTM operates by reading from its input tape and the random tape, making decisions based on both the input and the
random bits. The main difference between a standard deterministic Turing machine (DTM) and an RTM is the presence of
randomness, which allows the RTM to explore multiple computational paths simultaneously.
Transition Function: The transition function δ of an RTM is similar to a DTM, but it takes an additional random input. At
each step, the RTM moves to a new state based not only on the current state and input symbol but also based on the
value of a random bit.
Computation Path: An RTM does not follow a single deterministic path; instead, it explores multiple possible paths,
each corresponding to a different sequence of random choices.
Acceptance Criteria: The RTM accepts or rejects an input string based on the outcome of the randomly chosen path. Unlike a nondeterministic machine, which accepts if at least one path accepts, a randomized machine is judged by the probability, over its random bits, that it reaches an accepting state.
Definition of a Language Recognized by an RTM:
The language of a randomized Turing machine is the set of all strings that the machine accepts with a certain probability.
For an input string w , the machine may either accept or reject it based on the sequence of random choices it makes during
computation.
The key point is that the machine does not necessarily accept the input with certainty but with high probability. Specifically,
we define the following:
Success Probability: The machine is said to accept a string w if the probability of accepting w over all possible random
paths is at least a certain threshold (often 1/2).
Error Probability: A machine is allowed to make errors, but the probability of error (i.e., rejecting an input when it is in
the language or accepting it when it is not) must be bounded by a small constant (such as 1/3).
4. The Class RP
Definition of RP:
The class RP (Randomized Polynomial time) consists of decision problems for which a randomized algorithm exists that can
decide the problem in polynomial time with the following properties:
Acceptance of "yes" instances: If the input string belongs to the language (i.e., the answer is "yes"), the machine accepts with probability at least 1/2.
Rejection of "no" instances: If the input string does not belong to the language (i.e., the answer is "no"), the machine rejects with probability 1; it never accepts a "no" instance.
Error Bound: The error is one-sided. The machine never wrongly accepts, and the probability that it wrongly rejects a "yes" instance is at most 1/2; this bound can be driven down by repeating the algorithm.
RP = { L ∣ ∃ an RTM M such that M runs in polynomial time and decides L with one-sided error probability at most 1/2 }
Example:
An example involves the Primality Testing Problem (checking if a given number is prime). The Miller-Rabin primality test is a randomized polynomial-time algorithm with one-sided error: it never declares a prime composite, and it declares a composite prime only with small probability. This places compositeness testing in RP (and primality testing in Co-RP).
5. Recognizing Languages in RP
Given a language L ∈ RP, the recognition process involves the following steps:
1. Input: The input string w is provided to the RTM.
2. Randomized Computation: The RTM performs its computation by reading the input tape and making random decisions
at each step based on the random tape.
Since the error probability is bounded, the RTM may need to repeat the process multiple times to reduce the error probability; a sketch of this amplification-by-repetition idea is given below. However, each run of the machine always halts in polynomial time.
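The repetition idea can be sketched in a few lines; the toy trial function below is purely hypothetical and stands in for a one-sided-error test.

```python
import random

def amplify(rp_trial, x, k):
    """Run a one-sided-error (RP-style) test k times.
    rp_trial(x) never accepts a 'no' instance and accepts a 'yes' instance
    with probability >= 1/2.  Accepting if any trial accepts leaves a
    probability of at most (1/2)**k of wrongly rejecting a 'yes' instance."""
    return any(rp_trial(x) for _ in range(k))

# A toy trial for a 'yes' instance that accepts with probability exactly 1/2.
toy_trial = lambda x: random.random() < 0.5
print(amplify(toy_trial, None, 20))   # True with probability 1 - 2**-20
```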
Definition of ZPP:
The class ZPP (Zero-error Probabilistic Polynomial time) consists of decision problems for which there exists a randomized
algorithm that runs in expected polynomial time and always produces correct answers (i.e., no error probability).
Correctness: The machine always either accepts or rejects the input with certainty, i.e., no errors occur.
Expected Polynomial Time: The expected running time of the algorithm is polynomial in the size of the input.
ZPP = {L ∣ ∃ a probabilistic algorithm A such that A always returns the correct answer in expected polynomial time}
Example:
An example of a ZPP problem is the Parity Problem, where the task is to determine whether a given sequence has an even
or odd number of 1's. A probabilistic algorithm can be designed such that it always produces the correct answer with
certainty, and the expected time complexity is polynomial.
ZPP and RP: In fact, ZPP = RP ∩ Co-RP, so ZPP ⊆ RP. A zero-error algorithm with expected polynomial running time can be converted into a one-sided-error algorithm by cutting it off after a fixed polynomial number of steps and rejecting if it has not yet answered; by Markov's inequality, the cut-off is reached with probability at most 1/2.
P ⊆ ZPP: Every deterministic polynomial-time algorithm is trivially a zero-error randomized algorithm, so P ⊆ ZPP. Whether ZPP = P is an open question; an expected polynomial time bound does not by itself give a worst-case polynomial bound.
RP and NP: RP ⊆ NP, because the sequence of random bits used by an accepting computation can serve as a certificate that is verifiable deterministically in polynomial time. Whether RP ⊆ P (i.e., whether this use of randomness can always be eliminated) remains open.
9. Conclusion
In this lecture, we:
Examined the concept of randomized algorithms and how they can be modeled using randomized Turing machines
(RTMs).
Introduced the classes RP and ZPP, providing formal definitions and exploring examples of problems in these classes.
Discussed the relationships between RP, ZPP, P, and NP, highlighting their connections and distinctions.
Understanding these randomized complexity classes is crucial for analyzing problems that involve probabilistic decision-
making and for designing efficient algorithms that leverage randomness.
Cryptography: Many cryptographic algorithms, such as RSA, rely on the difficulty of factoring large numbers, and
efficient primality testing is crucial for selecting large primes for cryptographic keys.
Randomized Algorithms: Primality testing is often used in the generation of random primes, which are necessary in
cryptography for secure key generation.
Mathematical Applications: Primality testing is essential for various algorithms in number theory, such as in the search
for large primes used in prime factorization and other problems.
Efficient primality testing is important because brute-force checking of divisibility by every integer up to n − 1 (or even only up to √n) takes a number of division steps that is exponential in the number of digits of n, which is impractical for the large numbers used in practice.
Multiplication: (a ⋅ b) mod m = [(a mod m) ⋅ (b mod m)] mod m
Exponentiation: a^k mod m is computed efficiently using modular exponentiation.
Many primality tests make use of modular arithmetic, particularly in the context of modular exponentiation, which allows
us to compute powers of numbers modulo a value efficiently. This is crucial in algorithms such as Fermat’s Little Theorem-
based tests and Miller-Rabin primality tests.
Exponentiation by Squaring:
Exponentiation by squaring is a method to compute powers of numbers modulo some modulus efficiently. It works as follows: to compute a^k mod m, use the fact that a^k = (a^{k/2})^2 when k is even and a^k = a ⋅ a^{k−1} when k is odd, reducing modulo m after every multiplication.
By recursively applying this approach, we reduce the number of multiplications needed to compute a^k mod m, making it run in O(log k) time.
Example: to compute a^13 mod m, write 13 = 8 + 4 + 1, so a^13 = a^8 ⋅ a^4 ⋅ a, which needs only three squarings and two further multiplications. This method drastically reduces the time complexity compared to directly multiplying a by itself 13 times.
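An iterative sketch of exponentiation by squaring (working through the bits of the exponent) might look as follows; the function name mod_pow is an assumption of this example.

```python
def mod_pow(a, k, m):
    """Exponentiation by squaring: compute a**k mod m with O(log k) multiplications."""
    result = 1
    a %= m
    while k > 0:
        if k & 1:               # if the current low bit of the exponent is 1
            result = (result * a) % m
        a = (a * a) % m         # square the base
        k >>= 1                 # move to the next bit of the exponent
    return result

print(mod_pow(7, 13, 11))       # 7**13 mod 11 = 2, matching Python's built-in pow(7, 13, 11)
```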
The Miller–Rabin test: given an odd number n > 2, the algorithm tests whether n is prime using the following steps:
1. Factor out powers of two: write n − 1 = 2^s ⋅ d with d odd.
2. Select a random base a with 1 < a < n − 1 for the test.
3. Compute a^d mod n: if the result is 1 or n − 1, then n passes this round of testing (n may be prime).
4. Perform repeated squaring for higher powers: compute a^{2^r ⋅ d} mod n for r = 1, 2, … , s − 1 using modular exponentiation (each value is the square of the previous one). If any of these results equals n − 1, n passes this round of testing; otherwise a witnesses that n is composite.
5. Repeat for multiple values of a:
If n passes the test for several different random values of a, the probability of n being prime increases. The
algorithm repeats for different bases to reduce the probability of error.
The time complexity of the Miller–Rabin test is O(k log^3 n), where k is the number of iterations (random bases tried) and n is the number being tested. This is polynomial in the number of digits of n, making the test very efficient even for large numbers.
Example:
Let n = 17. We perform the Miller-Rabin test for random bases a to test primality. After several iterations (testing different
values of a), we conclude that 17 is prime with high probability.
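Putting the pieces together, a compact sketch of the Miller–Rabin test might look as follows; the choice of k = 20 rounds and the small-prime pre-check are assumptions made for this example.

```python
import random

def miller_rabin(n, k=20):
    """Miller-Rabin probabilistic primality test with k random bases.
    Never reports a prime as composite; for a composite n, the probability
    of wrongly reporting "prime" is at most 4**(-k)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):                   # handle small numbers and even inputs
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)                     # a**d mod n by modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)                 # repeated squaring: a**(2**r * d) mod n
            if x == n - 1:
                break
        else:
            return False                     # a is a witness that n is composite
    return True

print(miller_rabin(17), miller_rabin(221))   # True False  (221 = 13 * 17)
```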
Deterministic alternatives also exist. A notable example is the AKS primality test, which was developed as a deterministic primality test that runs in polynomial time. The AKS test is a landmark result because it was the first primality test proven to run in polynomial time without any reliance on randomization.
The AKS test, developed in 2002, is a deterministic algorithm for primality testing. It runs in time polynomial in the number of digits of n; improved analyses give a running time of roughly O((log n)^7.5) bit operations.
The algorithm is based on concepts from algebraic number theory, in particular a polynomial generalization of Fermat's Little Theorem.
It uses the fact that if a number n is prime, certain polynomial properties hold that can be verified in polynomial time.
The AKS test is important because it provides a deterministic approach to primality testing, removing the reliance on
randomization (as in Miller-Rabin) and making primality testing more predictable.
Conclusion
In this lecture, we explored the complexity of primality testing through the following key concepts:
2. The Miller-Rabin primality test, a randomized polynomial-time algorithm that tests whether a number is prime with
high probability.
3. The AKS primality test, which is a deterministic polynomial-time algorithm for primality testing.
4. Non-deterministic primality tests, and their relation to problems that can be solved by a non-deterministic machine.
Efficient primality testing is crucial for modern cryptographic systems and number-theoretic algorithms, and understanding
the complexity of these tests is essential for designing efficient algorithms in these domains.