AI Full Notes
According to the father of Artificial Intelligence, John McCarthy, it is “The science and engineering
of making intelligent machines, especially intelligent computer programs”.
Artificial Intelligence is a way of making a computer, a computer-controlled robot, or software
think intelligently, in a manner similar to how intelligent humans think.
AI is accomplished by studying how human brain thinks, and how humans learn, decide, and work
while trying to solve a problem, and then using the outcomes of this study as a basis of developing
intelligent software and systems.
Philosophy of AI
While exploiting the power of computer systems, human curiosity led us to
wonder, "Can a machine think and behave as humans do?"
Thus, the development of AI started with the intention of creating similar intelligence in machines
that we find and regard high in humans.
Goals of AI
To Create Expert Systems − Systems which exhibit intelligent behavior, learn,
demonstrate, explain, and advise their users.
To Implement Human Intelligence in Machines − Creating systems that understand, think,
learn, and behave like humans.
Replicate human intelligence
Solve Knowledge-intensive tasks
An intelligent connection of perception and action
Building a machine which can perform tasks that require human intelligence, such as:
o Proving a theorem
o Playing chess
o Planning a surgical operation
o Driving a car in traffic
Creating systems which can exhibit intelligent behavior, learn new things by themselves,
and demonstrate, explain, and advise their users.
Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial
means "man-made" and Intelligence means "thinking power"; hence AI means "man-made
thinking power."
"It is a branch of computer science by which we can create intelligent machines which can behave
like a human, think like humans, and able to make decisions."
Artificial Intelligence exists when a machine can have human based skills such as learning, reasoning,
and solving problems
With Artificial Intelligence, you do not need to preprogram a machine to do some work; instead,
you can create a machine with programmed algorithms which can work with its own intelligence,
and that is the awesomeness of AI.
It is believed that AI is not a new technology, and some people say that, as per Greek myth, there were
mechanical men in early days which could work and behave like humans.
Before learning about Artificial Intelligence, we should know the importance of AI and
why we should learn it. Following are some main reasons to learn about AI:
o With the help of AI, you can create software or devices which can solve real-world
problems easily and with accuracy, such as health issues, marketing, traffic issues, etc.
o With the help of AI, you can create your own personal virtual assistant, such as Cortana,
Google Assistant, Siri, etc.
o With the help of AI, you can build robots which can work in environments where
human survival is at risk.
o AI opens a path for other new technologies, new devices, and new opportunities.
What Contributes to AI?
Artificial intelligence is a science and technology based on disciplines such as Computer Science,
Biology, Psychology, Linguistics, Mathematics, and Engineering. A major thrust of AI is in the
development of computer functions associated with human intelligence, such as reasoning, learning,
and problem solving.
One or more of these areas can contribute to building an intelligent system.
What is AI Technique?
In the real world, knowledge has some unwelcome properties −
Applications of AI
History of AI
Here is the history of AI during 20th century −
1923 − Karel Čapek's play "Rossum's Universal Robots" (R.U.R.) opens in London, the first use
of the word "robot" in English.
1945 − Isaac Asimov, a Columbia University alumnus, coined the term Robotics.
1950 − Alan Turing introduced the Turing Test for the evaluation of intelligence and published
Computing Machinery and Intelligence. Claude Shannon published a detailed analysis of chess
playing as search.
1956 − John McCarthy coined the term Artificial Intelligence. Demonstration of the first running
AI program at Carnegie Mellon University.
1964 − Danny Bobrow's dissertation at MIT showed that computers can understand natural
language well enough to solve algebra word problems correctly.
1965 − Joseph Weizenbaum at MIT built ELIZA, an interactive program that carries on a dialogue
in English.
1973 − The Assembly Robotics group at Edinburgh University built Freddy, the Famous Scottish
Robot, capable of using vision to locate and assemble models.
1979 − The first computer-controlled autonomous vehicle, the Stanford Cart, was built.
1985 − Harold Cohen created and demonstrated the drawing program Aaron.
1997 − The Deep Blue chess program beats the then world chess champion, Garry Kasparov.
2000 − Interactive robot pets become commercially available. MIT displays Kismet, a robot with
a face that expresses emotions. The robot Nomad explores remote regions of Antarctica and
locates meteorites.
Advantages of Artificial Intelligence
o High accuracy with fewer errors: AI machines or systems are less prone to errors and achieve
high accuracy, as they take decisions based on prior experience and information.
o High speed: AI systems can be very fast at decision-making; because of this, an AI
system can beat a chess champion at the game of chess.
o High reliability: AI machines are highly reliable and can perform the same action multiple
times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or
exploring the ocean floor, where employing a human can be risky.
o Digital assistant: AI can be very useful as a digital assistant for users; for example, AI
technology is currently used by various e-commerce websites to show products matching
customer requirements.
o Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars
which can make our journeys safer and hassle-free, facial recognition for security purposes,
natural language processing to communicate with humans in human language, etc.
Every technology has some disadvantages, and the same goes for Artificial Intelligence.
Advantageous as it is, AI still has some disadvantages which we need to keep in mind while
creating an AI system. Following are the disadvantages of AI:
o High cost: The hardware and software requirements of AI are very costly, as they require a
lot of maintenance to meet current world requirements.
o Can't think outside the box: Even though we are making smarter machines with AI, they
still cannot think outside the box; a robot will only do the work for which it is trained or
programmed.
o No feelings and emotions: AI machines can be outstanding performers, but they do not
have feelings, so they cannot form any kind of emotional attachment with humans and may
sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: As technology advances, people are becoming
more dependent on devices and hence are losing some of their mental capabilities.
o No original creativity: Humans are creative and can imagine new ideas, but AI machines
cannot match this power of human intelligence; they cannot be creative and
imaginative.
Task Classification of AI
The domain of AI is classified into Formal tasks, Mundane tasks, and Expert tasks.
Task Domains of Artificial Intelligence
[Table not reproduced: it listed example tasks in each domain, including planning, creativity, robotics, and locomotion.]
Humans learn mundane (ordinary) tasks from birth. They learn through perception, speech,
language use, and locomotion. They learn formal tasks and expert tasks later, in that order.
For humans, the mundane tasks are easiest to learn. The same was considered true before trying to
implement mundane tasks in machines. Earlier, all work of AI was concentrated in the mundane task
domain.
Later, it turned out that the machine requires more knowledge, complex knowledge representation,
and complicated algorithms for handling mundane tasks. This is the reason why AI work is more
prospering in the Expert Tasks domain now, as the expert task domain needs expert knowledge
without common sense, which can be easier to represent and handle.
Types of Artificial Intelligence:
Artificial Intelligence can be divided in various types, there are mainly two types of main
categorization which are based on capabilities and based on functionally of AI. Following is flow
diagram which explain the types of AI.
3. Super AI:
o Super AI is a level of intelligence of systems at which machines could surpass human
intelligence and can perform any task better than a human, with cognitive properties. It is an
outcome of General AI.
o Some key characteristics of Super AI include the ability to think, reason, solve puzzles,
make judgments, plan, learn, and communicate on its own.
o Super AI is still a hypothetical concept of Artificial Intelligence. Developing such systems
in reality is still a world-changing task.
2. Limited Memory
o Limited memory machines can store past experiences or some data for a short period of time.
o These machines can use stored data for a limited time period only.
o Self-driving cars are one of the best examples of limited-memory systems. These cars can
store the recent speed of nearby cars, the distance to other cars, speed limits, and other
information needed to navigate the road.
3. Theory of Mind
o Theory of Mind AI should understand human emotions, people, and beliefs, and be able to
interact socially like humans.
o This type of AI machine has not been developed yet, but researchers are making many
efforts and improvements toward developing such machines.
4. Self-Awareness
o Self-aware AI is the future of Artificial Intelligence. These machines will be super
intelligent and will have their own consciousness, sentiments, and self-awareness.
o These machines will be smarter than the human mind.
o Self-aware AI does not yet exist in reality; it is a hypothetical concept.
Agent Terminology
Performance Measure of Agent − It is the criterion which determines how successful an agent
is.
Behavior of Agent − It is the action that an agent performs after any given sequence of percepts.
Percept − It is an agent's perceptual input at a given instance.
Percept Sequence − It is the history of all that an agent has perceived to date.
Agent Function − It is a map from the percept sequence to an action.
Rationality
Rationality is nothing but the status of being reasonable, sensible, and having a good sense of judgment.
Rationality is concerned with expected actions and results depending upon what the agent has
perceived. Performing actions with the aim of obtaining useful information is an important part of
rationality.
Some programs operate in an entirely artificial environment confined to keyboard input, databases,
computer file systems, and character output on a screen.
In contrast, some software agents (software robots or softbots) exist in rich, unlimited softbot
domains. The simulator has a very detailed, complex environment. The software agent needs to
choose from a long array of actions in real time. A softbot designed to scan the online preferences of
a customer and show interesting items to the customer works in a real as well as
an artificial environment.
The most famous artificial environment is the Turing Test environment, in which one real and
other artificial agents are tested on equal ground. This is a very challenging environment as it is highly
difficult for a software agent to perform as well as a human.
Turing Test
The success of an intelligent behavior of a system can be measured with Turing Test.
Two persons and the machine to be evaluated participate in the test. One of the two persons plays
the role of the tester. Each of them sits in a different room. The tester is unaware of who is the machine
and who is the human. The tester interrogates both by typing questions and sending them to both
intelligences, receiving typed responses in return.
This test aims at fooling the tester. If the tester fails to distinguish the machine's responses from the
human's responses, then the machine is said to be intelligent.
Properties of Environment
The environment has multifold properties −
Discrete / Continuous − If there are a limited number of distinct, clearly defined, states of the
environment, the environment is discrete (For example, chess); otherwise it is continuous (For
example, driving).
Observable / Partially Observable − If it is possible to determine the complete state of the
environment at each time point from the percepts it is observable; otherwise it is only partially
observable.
Static / Dynamic − If the environment does not change while an agent is acting, then it is
static; otherwise it is dynamic.
Single agent / Multiple agents − The environment may contain other agents which may be of
the same or different kind as that of the agent.
Accessible / Inaccessible − If the agent’s sensory apparatus can have access to the complete
state of the environment, then the environment is accessible to that agent.
Deterministic / Non-deterministic − If the next state of the environment is completely
determined by the current state and the actions of the agent, then the environment is
deterministic; otherwise it is non-deterministic.
Episodic / Non-episodic − In an episodic environment, each episode consists of the agent
perceiving and then acting. The quality of its action depends just on the episode itself.
Subsequent episodes do not depend on the actions in the previous episodes. Episodic
environments are much simpler because the agent does not need to think ahead.
Problem Solving
Problem:
A problem can arise for different reasons and, if solvable, can usually be solved in a number of
different ways; likewise, it can be defined in a number of different ways.
The state space is the set of legal positions reachable from the initial state by applying the set of
rules to move from one state to another while attempting to end up in a goal state.
Searching Algorithm
Search Algorithm Terminologies:
Based on the search problem, we can classify search algorithms into uninformed (blind)
search and informed (heuristic) search algorithms.
Uninformed/Blind Search:
Uninformed search does not use any domain knowledge, such as the closeness or location of the
goal. It operates in a brute-force way, as it only includes information about how to traverse the tree and
how to identify leaf and goal nodes. Uninformed search traverses the search tree without any
information about the search space beyond the initial state, the operators, and the goal test, so it is
also called blind search. It examines each node of the tree until it reaches the goal node.
o Breadth-first search
o Uniform cost search
o Depth-first search
o Iterative deepening depth-first search
o Bidirectional Search
Informed Search
Informed search algorithms use domain knowledge. In an informed search, problem information is
available which can guide the search. Informed search strategies can find a solution more efficiently
than an uninformed search strategy. Informed search is also called a Heuristic search.
A heuristic is a technique which is not always guaranteed to find the best solution, but is guaranteed
to find a good solution in reasonable time.
Informed search can solve much more complex problems which could not be solved in any other way.
1. Greedy Search
2. A* Search
3. AO* Search
1. Breadth-first Search
2. Depth-first Search
3. Depth-limited Search
4. Iterative deepening depth-first search
5. Uniform cost search
6. Bidirectional Search
1. Breadth-first Search:
o Breadth-first search is the most common search strategy for traversing a tree or graph. This
algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
o The BFS algorithm starts searching from the root node of the tree and expands all successor
nodes at the current level before moving to nodes of the next level.
o The breadth-first search algorithm is an example of a general graph-search algorithm.
o Breadth-first search is implemented using a FIFO queue data structure.
Advantages:
o If there is more than one solution for a given problem, then BFS will provide the
minimal solution, which requires the least number of steps.
Disadvantages:
o It requires lots of memory since each level of the tree must be saved into memory
to expand the next level.
o BFS needs lots of time if the solution is far away from the root node.
Example:
In the tree structure referenced below (figure not included), we show the traversal of the tree using
the BFS algorithm from root node S to goal node K. The BFS algorithm traverses in layers, so it
follows the path shown by the dotted arrow, and the traversed path will be:
1. S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
Time Complexity: The time complexity of the BFS algorithm is measured by the number of nodes
traversed until the shallowest goal node: T(b) = 1 + b + b^2 + ... + b^d = O(b^d), where d is the
depth of the shallowest solution and b is the branching factor (the number of successors of every node).
Space Complexity: The space complexity of the BFS algorithm is given by the memory size of the
frontier, which is O(b^d).
Completeness: BFS is complete, which means if the shallowest goal node is at some finite depth, then
BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
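As a concrete illustration, the following is a minimal Python sketch of BFS on an adjacency-list
graph. The graph literal is an assumption reconstructed from the S-to-K example above; the node
names and edges are illustrative, not part of the original notes.

from collections import deque

def bfs(graph, start, goal):
    # FIFO queue of paths; the shallowest path is always expanded first
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path              # first goal found is the shallowest one
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None                      # no solution exists

# Edges assumed from the example figure (levels: S; A, B; C, D, G, H; E, F, I, K)
graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'],
         'C': ['E'], 'D': ['F'], 'G': ['I'], 'H': ['K']}
print(bfs(graph, 'S', 'K'))          # ['S', 'B', 'H', 'K']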
2. Depth-first Search:
o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
o It is called the depth-first search because it starts from the root node and follows each path to
its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.
Note: Backtracking is an algorithm technique for finding all possible solutions using recursion.
Advantage:
o DFS requires much less memory, as it only needs to store the stack of nodes on the path from
the root node to the current node.
o It takes less time than BFS to reach the goal node (if it traverses along the right path).
Disadvantage:
o There is the possibility that many states keep re-occurring, and there is no guarantee of
finding a solution.
o The DFS algorithm searches deep down a path, and it may sometimes enter an infinite loop.
Example:
In the search tree referenced below (figure not included), we show the flow of depth-first search,
which follows this order:
It starts searching from root node S and traverses A, then B, then D and E. After traversing E, it
backtracks, as E has no other successor and the goal node has not yet been found. After
backtracking, it traverses node C and then G, where it terminates, having found the goal node.
Completeness: DFS search algorithm is complete within finite state space as it will expand every
node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the
algorithm. It is given by T(n) = 1 + b + b^2 + ... + b^m = O(b^m), where b is the branching factor
and m is the maximum depth of any node; m can be much larger than d (the depth of the shallowest
solution).
Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence the
space complexity of DFS is equivalent to the size of the fringe set, which is O(bm) (branching factor
times maximum depth).
Optimal: The DFS search algorithm is non-optimal, as it may take a large number of steps or incur a
high cost to reach the goal node.
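Here is a matching Python sketch of recursive DFS; as with the BFS example, the graph literal
encodes the S-to-G example tree above and is an illustrative assumption.

def dfs(graph, node, goal, visited=None):
    # Follows one path to its greatest depth before backtracking
    if visited is None:
        visited = set()
    visited.add(node)
    if node == goal:
        return [node]
    for successor in graph.get(node, []):
        if successor not in visited:
            sub_path = dfs(graph, successor, goal, visited)
            if sub_path:
                return [node] + sub_path
    return None   # dead end: the caller backtracks to try its next successor

graph = {'S': ['A'], 'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G']}
print(dfs(graph, 'S', 'G'))   # ['S', 'A', 'C', 'G']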
3. Depth-limited Search:
A depth-limited search algorithm is similar to depth-first search with a predetermined limit.
Depth-limited search can solve the drawback of the infinite path in depth-first search. In this
algorithm, the node at the depth limit is treated as if it has no further successor nodes.
Advantages:
o Depth-limited search is memory efficient.
Disadvantages:
o Depth-limited search also has the disadvantage of incompleteness, and it may not be optimal
if the problem has more than one solution.
Example:
Completeness: DLS search algorithm is complete if the solution is above the depth-limit.
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not optimal even
if ℓ>d.
5. Uniform-cost Search:
Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal of
uniform-cost search is to find a path to the goal node which has the lowest cumulative cost.
Uniform-cost search expands nodes according to their path costs from the root node. It can be used
to solve any graph/tree where the optimal cost is in demand. The uniform-cost search algorithm is
implemented using a priority queue, which gives maximum priority to the lowest cumulative cost.
Uniform-cost search is equivalent to the BFS algorithm when the path cost of all edges is the same.
Advantages:
o Uniform cost search is optimal because at every state the path with the least cost is chosen.
Disadvantages:
o It does not care about the number of steps involved in the search and is only concerned with
path cost, due to which the algorithm may get stuck in an infinite loop.
Example:
Completeness:
Uniform-cost search is complete: if there is a solution, UCS will find it.
Time Complexity:
Let C* be the cost of the optimal solution, and let ε be the cost of each step toward the goal. Then the
number of steps is C*/ε + 1 (we take +1 because we start from state 0 and end at C*/ε). Hence, the
worst-case time complexity of uniform-cost search is O(b^(1 + C*/ε)).
Space Complexity:
By the same logic, the worst-case space complexity of uniform-cost search is O(b^(1 + C*/ε)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
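A minimal Python sketch of uniform-cost search using a priority queue follows; the weighted graph
literal is a hypothetical example, not taken from the notes.

import heapq

def uniform_cost_search(graph, start, goal):
    # Priority queue ordered by cumulative path cost g(n)
    frontier = [(0, start, [start])]
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # cheapest frontier entry
        if node == goal:
            return cost, path
        if node in explored:
            continue                                  # already expanded more cheaply
        explored.add(node)
        for successor, step_cost in graph.get(node, []):
            heapq.heappush(frontier, (cost + step_cost, successor, path + [successor]))
    return None

# Hypothetical weighted graph: lists of (neighbor, edge cost) pairs
graph = {'S': [('A', 1), ('B', 4)], 'A': [('G', 5)], 'B': [('G', 1)]}
print(uniform_cost_search(graph, 'S', 'G'))   # (5, ['S', 'B', 'G'])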
4. Iterative deepening depth-first search:
The iterative deepening algorithm is a combination of the DFS and BFS algorithms. This search
algorithm finds the best depth limit by gradually increasing the limit until a goal is found.
This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing the
depth limit after each iteration until the goal node is found.
This Search algorithm combines the benefits of Breadth-first search's fast search and depth-first
search's memory efficiency.
The iterative deepening algorithm is a useful uninformed search technique when the search space is
large and the depth of the goal node is unknown.
Advantages:
o It combines the benefits of BFS and DFS search algorithm in terms of fast search and memory
efficiency.
Disadvantages:
o The main drawback of IDDFS is that it repeats all the work of the previous phase.
Example:
The tree structure referenced below (figure not included) illustrates iterative deepening depth-first
search. The IDDFS algorithm performs successive iterations until it finds the goal node. The
iterations performed by the algorithm are given as:
1st Iteration -----> A
2nd Iteration ----> A, B, C
3rd Iteration ------> A, B, D, E, C, F, G
4th Iteration ------> A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let b be the branching factor and d the depth of the shallowest goal; then the worst-case time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd) (branching factor times depth).
Optimal:
IDDFS algorithm is optimal if path cost is a non- decreasing function of the depth of the node.
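The following Python sketch builds iterative deepening on top of a depth-limited DFS; the graph
literal encodes the A-to-K example tree above and is an assumption.

def depth_limited_search(graph, node, goal, limit, path):
    # A node at the depth limit is treated as having no successors
    if node == goal:
        return path
    if limit == 0:
        return None
    for successor in graph.get(node, []):
        result = depth_limited_search(graph, successor, goal, limit - 1, path + [successor])
        if result:
            return result
    return None

def iddfs(graph, start, goal, max_depth=20):
    # Repeat DFS with depth limits 0, 1, 2, ... until the goal is found
    for limit in range(max_depth):
        result = depth_limited_search(graph, start, goal, limit, [start])
        if result:
            return result
    return None

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'],
         'D': ['H', 'I'], 'F': ['K']}
print(iddfs(graph, 'A', 'K'))   # ['A', 'C', 'F', 'K'], found at depth limit 3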
6. Bidirectional Search Algorithm:
The bidirectional search algorithm runs two simultaneous searches, one from the initial state, called
forward search, and the other from the goal node, called backward search, in order to find the goal
node. Bidirectional search replaces one single search graph with two small subgraphs, in which one
starts the search from the initial vertex and the other starts from the goal vertex. The search stops
when these two graphs intersect each other.
Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.
Advantages:
o Bidirectional search is fast.
o Bidirectional search requires less memory.
Disadvantages:
o The implementation of the bidirectional search tree is difficult.
o In bidirectional search, one should know the goal state in advance.
Example:
In the search tree referenced below (figure not included), the bidirectional search algorithm is
applied. This algorithm divides one graph/tree into two sub-graphs. It starts traversing from node 1
in the forward direction and from goal node 16 in the backward direction.
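A minimal Python sketch of bidirectional search using two BFS frontiers follows. The graph is
assumed to be undirected, and its edges are a hypothetical reconstruction around the nodes 1 and 16
mentioned above.

from collections import deque

def bidirectional_search(graph, start, goal):
    # Two simultaneous BFS searches; stop as soon as the frontiers meet
    fwd_paths = {start: [start]}           # forward search from the initial state
    bwd_paths = {goal: [goal]}             # backward search from the goal
    fwd_queue, bwd_queue = deque([start]), deque([goal])
    while fwd_queue and bwd_queue:
        node = fwd_queue.popleft()         # expand one forward node
        for nbr in graph.get(node, []):
            if nbr in bwd_paths:           # the two searches intersect here
                return fwd_paths[node] + bwd_paths[nbr][::-1]
            if nbr not in fwd_paths:
                fwd_paths[nbr] = fwd_paths[node] + [nbr]
                fwd_queue.append(nbr)
        node = bwd_queue.popleft()         # expand one backward node
        for nbr in graph.get(node, []):
            if nbr in fwd_paths:
                return fwd_paths[nbr] + bwd_paths[node][::-1]
            if nbr not in bwd_paths:
                bwd_paths[nbr] = bwd_paths[node] + [nbr]
                bwd_queue.append(nbr)
    return None

graph = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 8], 5: [3, 8],
         8: [4, 5, 9], 9: [8, 16], 16: [9]}
print(bidirectional_search(graph, 1, 16))   # [1, 2, 4, 8, 9, 16]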
To solve large problems with large number of possible states, problem-specific knowledge needs to
be added to increase the efficiency of search algorithms.
The uninformed search algorithms above looked through the search space for all possible solutions
to the problem without having any additional knowledge about the search space. An informed search
algorithm, by contrast, uses knowledge such as how far we are from the goal, the path cost, how to
reach the goal node, etc. This knowledge helps agents to explore less of the search space and find the
goal node more efficiently.
Informed search algorithms are most useful for large search spaces. Because informed search uses
the idea of a heuristic, it is also called heuristic search.
Heuristic function: A heuristic is a function which is used in informed search to find the most
promising path. It takes the current state of the agent as its input and produces an estimate of how
close the agent is to the goal. The heuristic method might not always give the best solution, but it
is guaranteed to find a good solution in reasonable time. A heuristic function estimates how close a
state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path between the
pair of states. The value of the heuristic function is always positive.
In the informed search we will discuss three main algorithms which are given below:
1. Greedy Best-first Search:
The greedy best-first search algorithm always selects the path which appears best at that moment. It
is a combination of the depth-first search and breadth-first search algorithms. It uses the heuristic
function to guide the search. Best-first search allows us to take advantage of both algorithms: with
its help, at each step, we can choose the most promising node. In the best-first search algorithm, we
expand the node which is closest to the goal node, where the closeness is estimated by the heuristic
function, i.e.,
f(n) = h(n).
Advantages:
o Best-first search can switch between BFS and DFS, gaining the advantages of both
algorithms.
o This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
o It can behave as an unguided depth-first search in the worst case scenario.
o It can get stuck in a loop as DFS.
o This algorithm is not optimal.
Example:
Consider the search problem referenced below (figure not included), which we traverse using greedy
best-first search. At each iteration, each node is expanded using the evaluation function f(n) = h(n),
whose values are given in the accompanying table.
In this search example, we use two lists: the OPEN and CLOSED lists. Following are the
iterations for traversing the example.
Expand the nodes of S and put in the CLOSED list
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m
is the maximum depth of the search space.
Complete: Greedy best-first search is also incomplete, even if the given state space is finite.
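The sketch below implements greedy best-first search with an OPEN priority queue keyed on h(n)
and a CLOSED set; the graph and the heuristic table h are hypothetical stand-ins for the omitted figure.

import heapq

def greedy_best_first(graph, h, start, goal):
    # OPEN list: priority queue ordered by the heuristic value f(n) = h(n)
    frontier = [(h[start], start, [start])]
    closed = set()                                 # CLOSED list of expanded nodes
    while frontier:
        _, node, path = heapq.heappop(frontier)    # most promising node
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for successor in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(frontier, (h[successor], successor, path + [successor]))
    return None

# Hypothetical graph and heuristic values h(n)
graph = {'S': ['A', 'B'], 'A': ['C'], 'B': ['D', 'E'], 'E': ['G']}
h = {'S': 7, 'A': 6, 'B': 2, 'C': 5, 'D': 3, 'E': 1, 'G': 0}
print(greedy_best_first(graph, h, 'S', 'G'))       # ['S', 'B', 'E', 'G']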
2. A* Search:
A* search is the most commonly known form of best-first search. It uses the heuristic function h(n)
together with g(n), the cost to reach node n from the start state. It combines the features of UCS and
greedy best-first search, by which it solves problems efficiently. The A* search algorithm finds the
shortest path through the search space using the heuristic function. This search algorithm expands
fewer nodes of the search tree and provides optimal results faster. The A* algorithm is similar to
UCS, except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence,
we can combine both costs as follows, and this sum is called the fitness number:
f(n) = g(n) + h(n).
At each point in the search space, only the node with the lowest value of f(n) is expanded,
and the algorithm terminates when the goal node is found.
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not; if the list is empty, then return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function
(g + h). If node n is the goal node, then return success and stop; otherwise, continue.
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list, if not then compute evaluation
function for n' and place into Open list.
Step 5: Else, if node n' is already in OPEN or CLOSED, then it should be attached to the back pointer
which reflects the lowest g(n') value.
Step 6: Return to Step 2.
Advantages:
o The A* search algorithm performs better than other search algorithms.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.
Disadvantages:
o It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
Example:
In this example, we traverse the given graph using the A* algorithm. The heuristic value of all
states is given in the table below (not included), so we calculate f(n) for each state using the formula
f(n) = g(n) + h(n), where g(n) is the cost to reach a node from the start state.
Here we will use OPEN and CLOSED list.
Solution:
Points to remember:
o A* algorithm returns the path which occurred first, and it does not search for all remaining
paths.
o The efficiency of A* algorithm depends on the quality of heuristic.
o The A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is the cost
of the optimal solution.
o Admissible: The first condition required for optimality is that h(n) should be an admissible
heuristic for A* tree search. An admissible heuristic is optimistic in nature.
o Consistency: The second required condition, consistency, applies only to A* graph search.
If the heuristic function is admissible, then A* tree search will always find the least cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function,
and the number of nodes expanded is exponential in the depth of the solution d. So the time complexity
is O(b^d), where b is the branching factor.
Space Complexity: The space complexity of the A* search algorithm is O(b^d), as it keeps all
generated nodes in memory.
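The following Python sketch implements A* with f(n) = g(n) + h(n); the weighted graph and the
admissible heuristic table are hypothetical, chosen only to exercise the algorithm.

import heapq

def a_star(graph, h, start, goal):
    # Priority queue (OPEN list) ordered by f(n) = g(n) + h(n)
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}                     # cheapest known g(n) per node
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for successor, step_cost in graph.get(node, []):
            new_g = g + step_cost
            # Keep a successor only if this path improves its best-known cost
            if new_g < best_g.get(successor, float('inf')):
                best_g[successor] = new_g
                heapq.heappush(frontier,
                               (new_g + h[successor], new_g, successor, path + [successor]))
    return None

# Hypothetical weighted graph with an admissible heuristic h(n)
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('C', 5)],
         'B': [('C', 1)], 'C': [('G', 3)]}
h = {'S': 7, 'A': 6, 'B': 4, 'C': 2, 'G': 0}
print(a_star(graph, h, 'S', 'G'))   # (7, ['S', 'A', 'B', 'C', 'G'])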
3. AO* Search:
Our real-life situations cannot be exactly decomposed into either an AND tree or an OR tree alone
but are always a combination of both. So, we need the AO* algorithm, where O stands for 'ordered'.
The AO* algorithm represents the part of the search graph that has been explicitly generated so far.
AO* algorithm is given as follows:
Hill Climbing Algorithm
Hill climbing has the following features:
o Generate and Test variant: Hill climbing is a variant of the generate-and-test method. The
generate-and-test method produces feedback which helps to decide which direction to move in
the search space.
o Greedy approach: Hill-climbing algorithm search moves in the direction which optimizes the
cost.
o No backtracking: It does not backtrack the search space, as it does not remember the previous
states.
The state-space landscape is a graphical representation of the hill-climbing algorithm, showing a
graph of the various states of the algorithm against the objective function/cost.
On the Y-axis we take the function, which can be an objective function or a cost function, and on
the X-axis we take the state space. If the function on the Y-axis is cost, then the goal of the search
is to find the global minimum or a local minimum. If the function on the Y-axis is an objective
function, then the goal of the search is to find the global maximum or a local maximum.
Different regions in the state space landscape:
Local Maximum: Local maximum is a state which is better than its neighbor states, but there is also
another state which is higher than it.
Global Maximum: Global maximum is the best possible state of state space landscape. It has the
highest value of objective function.
Flat local maximum: It is a flat space in the landscape where all the neighbor states of current states
have the same value.
Simple hill climbing is the simplest way to implement a hill-climbing algorithm. It evaluates one
neighbor node state at a time and selects the first one that improves the current cost, setting it as
the current state. It checks only one successor state at a time; if that successor is better than the
current state, it moves there, otherwise it stays in the same state. This algorithm has the following features:
The steepest-ascent algorithm is a variation of the simple hill-climbing algorithm. This algorithm
examines all the neighboring nodes of the current state and selects the neighbor node which is closest
to the goal state. It consumes more time, as it searches multiple neighbors.
Stochastic hill climbing does not examine all of its neighbors before moving. Rather, this search
algorithm selects one neighbor node at random and decides whether to make it the current state or to
examine another state.
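A minimal Python sketch of the simple hill-climbing variant on a one-dimensional landscape follows;
the objective function and the neighbor relation are illustrative assumptions.

import random

def simple_hill_climbing(neighbors, value, start):
    # Move to the first neighbor that improves the objective; stop when none does
    current = start
    while True:
        improved = False
        for candidate in neighbors(current):
            if value(candidate) > value(current):   # first better successor wins
                current, improved = candidate, True
                break
        if not improved:
            return current   # no better neighbor: a (possibly local) maximum

# Illustrative landscape: maximize value(x) = -(x - 3)^2 over the integers
value = lambda x: -(x - 3) ** 2
neighbors = lambda x: [x - 1, x + 1]
print(simple_hill_climbing(neighbors, value, start=random.randint(-10, 10)))   # 3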
1. Local Maximum: A local maximum is a peak state in the landscape which is better than each of its
neighboring states, but there is another state also present which is higher than the local maximum.
Solution: The backtracking technique can be a solution to the local maximum problem in the
state-space landscape. Create a list of promising paths so that the algorithm can backtrack the search
space and explore other paths as well.
2. Plateau: A plateau is a flat area of the search space in which all the neighbor states of the current
state contain the same value; because of this, the algorithm cannot find the best direction in which to
move. A hill-climbing search might get lost in the plateau area.
Solution: The solution for a plateau is to take big steps, or very small steps, while searching.
Randomly select a state which is far away from the current state, so that the algorithm may find a
non-plateau region.
3. Ridges: A ridge is a special form of the local maximum. It has an area which is higher than its
surrounding areas, but itself has a slope, and cannot be reached in a single move.
Solution: With the use of bidirectional search, or by moving in different directions, we can improve
this problem.
Simulated Annealing:
A hill-climbing algorithm which never makes a move toward a lower value is guaranteed to be
incomplete, because it can get stuck at a local maximum. And if the algorithm applies a random walk
by moving to a random successor, it may be complete but not efficient. Simulated annealing is an
algorithm which yields both efficiency and completeness.
In mechanical terms, annealing is the process of heating a metal or glass to a high temperature and
then cooling it gradually, which allows the material to reach a low-energy crystalline state. The same
idea is used in simulated annealing, in which the algorithm picks a random move instead of picking
the best move. If the random move improves the state, then it follows that path. Otherwise, the
algorithm follows the path with a probability of less than 1, i.e., it may move downhill and choose
another path.
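A compact Python sketch of simulated annealing follows; the landscape list, temperature schedule,
and cooling rate are arbitrary illustrative choices, not prescribed by the notes.

import math, random

def simulated_annealing(value, neighbors, start, temp=100.0, cooling=0.95, min_temp=1e-3):
    current = start
    while temp > min_temp:
        candidate = random.choice(neighbors(current))    # pick a random move
        delta = value(candidate) - value(current)
        # Always accept improvements; accept worse moves with probability e^(delta/T)
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = candidate
        temp *= cooling                                   # cool gradually
    return current

# Illustrative landscape: local maxima at indices 2 and 6, global maximum at 9
landscape = [1, 2, 3, 2, 1, 4, 7, 4, 2, 9, 5]
value = lambda i: landscape[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
print(simulated_annealing(value, neighbors, start=0))     # usually 9, escaping index 6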
Constraint Satisfaction Problems
In a constraint satisfaction problem, the states and the goal test conform to a standard, structured,
and simple representation, which allows the use of general-purpose heuristics.
A constraint satisfaction problem (or CSP) is defined by a set of variables, X1, X2, . . . , Xn, and a set
of constraints, C1, C2, . . . , Cm. Each variable Xi has a nonempty domain Di of possible values.
Each constraint Ci involves some subset of the variables and specifies the allowable combinations of
values for that subset. A state of the problem is defined by an assignment of values to some or all of
the variables, {Xi = vi, Xj = vj, . . .}. An assignment that does not violate any constraints is called a
consistent or legal assignment. A complete assignment is one in which every variable is mentioned,
and a solution to a CSP is a complete assignment that satisfies all the constraints. Some CSPs also
require a solution that maximizes an objective function.
Solution:
Each state in a CSP is defined by an assignment of values to some or all of the variables
An assignment that does not violate any constraints is called a consistent or legal assignment
A complete assignment is one in which every variable is assigned
A solution to a CSP is a consistent and complete assignment
Allows useful general-purpose algorithms with more power than standard search algorithms
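As a sketch of such a general-purpose algorithm, here is simple backtracking search for a small,
hypothetical map-coloring CSP; the three variables, their color domains, and the inequality
constraints are illustrative assumptions.

def backtracking_csp(variables, domains, constraints, assignment=None):
    # Returns a complete, consistent assignment, or None if no solution exists
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment                          # complete and consistent
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        candidate = {**assignment, var: value}
        # Keep the value only if every constraint remains satisfied so far
        if all(constraint(candidate) for constraint in constraints):
            result = backtracking_csp(variables, domains, constraints, candidate)
            if result:
                return result
    return None                                    # no value works: backtrack

# Hypothetical map coloring: adjacent regions must receive different colors
def different(a, b):
    return lambda asg: a not in asg or b not in asg or asg[a] != asg[b]

variables = ['WA', 'NT', 'SA']
domains = {v: ['red', 'green', 'blue'] for v in variables}
constraints = [different('WA', 'NT'), different('WA', 'SA'), different('NT', 'SA')]
print(backtracking_csp(variables, domains, constraints))
# e.g. {'WA': 'red', 'NT': 'green', 'SA': 'blue'}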
1. Propositional Logic:
Propositional logic (PL) is the simplest form of logic where all the statements are made by
propositions. A proposition is a declarative statement which is either true or false. It is a
technique of knowledge representation in logical and mathematical form.
Example:
a) It is Sunday.
b) The Sun rises in the West (false proposition).
c) 3 + 3 = 7 (false proposition).
d) 5 is a prime number.
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:
a) Atomic Propositions
b) Compound propositions
Example of an atomic proposition: 2 + 2 = 4 (a single proposition symbol).
Example of a compound proposition: "It is raining today, and the street is wet" (simpler propositions joined by a connective).
Truth Table:
In propositional logic, we need to know the truth values of propositions in all possible scenarios. We
can combine all the possible truth-value combinations with logical connectives, and the representation
of these combinations in a tabular format is called a truth table. Following are the truth tables for all
logical connectives:
Truth table with three propositions:
We can build a proposition composing three propositions P, Q, and R. This truth table is made
up of 2^3 = 8 rows, as we have taken three proposition symbols.
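A small Python sketch generating such a table follows; the compound column P ∧ (Q ∨ R) is an
arbitrary example formula.

from itertools import product

# Enumerate all 2^3 = 8 truth assignments for P, Q, R
print('P', 'Q', 'R', 'P∧(Q∨R)', sep='\t')
for P, Q, R in product([True, False], repeat=3):
    print(P, Q, R, P and (Q or R), sep='\t')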
Precedence of connectives:
Just like arithmetic operators, there is a precedence order for propositional connectors or logical
operators. This order should be followed while evaluating a propositional problem. Following is
the list of the precedence order for operators:
Precedence: Operators
First precedence: Parentheses
Second precedence: Negation
Third precedence: Conjunction (AND)
Fourth precedence: Disjunction (OR)
Fifth precedence: Implication
Sixth precedence: Biconditional
Logical equivalence:
Logical equivalence is one of the features of propositional logic. Two propositions are said to be
logically equivalent if and only if the columns in the truth table are identical to each other.
Let's take two propositions A and B; for logical equivalence, we can write A ⇔ B. In the truth
table (not shown here), the columns for ¬A ∨ B and A → B are identical; hence ¬A ∨ B is
equivalent to A → B.
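The missing table can be regenerated programmatically; in this sketch the truth function of → is
written out row by row, so the comparison against ¬A ∨ B is a genuine check rather than a tautology.

from itertools import product

# Truth table of A → B (only the row True → False is False)
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

print('A', 'B', '¬A∨B', 'A→B', sep='\t')
for A, B in product([True, False], repeat=2):
    print(A, B, (not A) or B, IMPLIES[(A, B)], sep='\t')
    assert ((not A) or B) == IMPLIES[(A, B)]   # identical columns in every row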
Properties of Operators:
o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o DE Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.
o We cannot represent relations like ALL, some, or none with propositional logic.
Example:
a. All the girls are intelligent.
b. Some apples are sweet.
o Propositional logic has limited expressive power.
o In propositional logic, we cannot describe statements in terms of their properties or
logical relationships.
First-Order logic:
o First-order logic is another way of knowledge representation in artificial intelligence. It is
an extension to propositional logic.
o FOL is sufficiently expressive to represent the natural language statements in a concise
way.
o First-order logic is also known as Predicate logic or First-order predicate logic. First-order
logic is a powerful language that expresses information about objects in an easier
way and can also express the relationships between those objects.
o First-order logic (like natural language) does not only assume that the world contains
facts, as propositional logic does, but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus,
......
o Relations: These can be unary relations such as red, round, or is adjacent, or n-ary
relations such as the sister of, brother of, has color, comes between
o Function: Father of, best friend, third inning of, end of, ......
o As a natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics
The syntax of FOL determines which collection of symbols is a logical expression in first-order
logic. The basic syntactic elements of first-order logic are symbols. We write statements in short-
hand notation in FOL.
Variables: x, y, z, a, b, ...
Connectives: ∧, ∨, ¬, ⇒, ⇔
Equality: ==
Quantifiers: ∀, ∃
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are
formed from a predicate symbol followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the subject
of the statement, and the second part, "is an integer," is known as the predicate.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement
within its range is true for everything or every instance of a particular thing.
o For all x
o For each x
o For every x.
Example: All men drink coffee.
Let the variable x refer to a man, so all x can be represented as: ∀x man(x) → drink(x, coffee).
It will be read as: For all x, where x is a man, x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its
scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a
predicate variable, it is called an existential quantifier.
If x is a variable, then the existential quantifier will be ∃x or ∃(x), and it will be read as:
o There exists an x
o For some x
o For at least one x
Example: Some boys are intelligent: ∃x boys(x) ∧ intelligent(x).
It will be read as: There is some x such that x is a boy and x is intelligent.
Points to remember:
o The main connective for universal quantifier ∀ is implication →.
o The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers:
o In universal quantifier, ∀x∀y is similar to ∀y∀x.
o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.
2. Every man respects his parent. In this question, the predicate is "respect(x, y)," where
x = man and y = parent. Since it concerns every man, we will use ∀, and it will be represented as
follows:
∀x man(x) → respects(x, parent).
3. Some boys play cricket. In this question, the predicate is "play(x, y)," where x = boys and y =
game. Since there are some boys, we will use ∃, and it will be represented as:
∃x boys(x) ∧ play(x, cricket).
4. Not all students like both Mathematics and Science. In this question, the predicate is
"like(x, y)," where x = student and y = subject. Since not all students are included, we will
use ∀ with negation, giving the following representation:
¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].
5. Only one student failed in Mathematics. In this question, the predicate is "failed(x, y),"
where x = student and y = subject. Since there is only one student who failed in Mathematics,
we will use the following representation:
∃(x) [ student(x) ∧ failed(x, Mathematics) ∧ ∀(y) [¬(x==y) ∧ student(y) → ¬failed(y, Mathematics)] ].
The quantifiers interact with variables which appear in a suitable way. There are two types of
variables in First-order logic which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope
of the quantifier.
Example: ∀x ∃(y)[P (x, y, z)], where z is a free variable.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the
scope of the quantifier.
Situation Calculus:
The idea behind situation calculus is that (reachable) states are definable in terms of the actions
required to reach them. These reachable states are called situations. What is true in a situation
can be defined in terms of relations with the situation as an argument. Situation calculus can be
seen as a relational version of the feature-based representation of actions.
Here we only consider single agents, a fully observable environment, and deterministic actions.
Situation calculus is defined in terms of situations. A situation is either
init, the initial situation, or
do(A,S), the situation resulting from doing action A in situation S, if it is possible to do
action A in situation S.
Example 14.1: Consider the domain of Figure 3.1. Suppose in the initial situation, init, the robot,
Rob, is at location o109 and there is a key k1 at the mail room and a package at storage.
do(move(rob,o109,o103), init)
is the situation resulting from Rob moving from position o109 in situation init to position o103.
In this situation, Rob is at o103, the key k1 is still at mail, and the package is at storage.
The situation
do(move(rob,o103,mail),
do(move(rob,o109,o103),
init))
is one in which the robot has moved from position o109 to o103 to mail and is currently at mail.
Suppose Rob then picks up the key, k1. The resulting situation is
do(pickup(rob,k1),
do(move(rob,o103,mail),
do(move(rob,o109,o103),
init))).
In this situation, Rob is at position mail carrying the key k1.
A situation can be associated with a state. There are two main differences between situations and
states:
Multiple situations may refer to the same state if multiple sequences of actions lead to the
same state. That is, equality between situations is not the same as equality between states.
Not all states have corresponding situations. A state is reachable if a sequence of actions
exists that can reach that state from the initial state. States that are not reachable do not
have a corresponding situation.
Some do(A,S) terms do not correspond to any state. However, sometimes an agent must reason
about such a (potential) situation without knowing if A is possible in state S, or if S is possible.
Example 14.2: The term do(unlock(rob,door1),init) does not denote a state at all, because it is
not possible for Rob to unlock the door when Rob is not at the door and does not have the key.
A static relation is a relation for which the truth value does not depend on the situation; that is,
its truth value is unchanging through time. A dynamic relation is a relation for which the truth
value depends on the situation. To represent what is true in a situation, predicate symbols
denoting dynamic relations have a situation argument so that the truth can depend on the
situation. A predicate symbol with a situation argument is called a fluent.
Example 14.3: The relation at(O,L,S) is true when object O is at location L in situation S.
Thus, at is a fluent.
The atom
at(rob,o109,init)
is true if the robot rob is at position o109 in the initial situation. The atom
at(rob,o103,do(move(rob,o109,o103), init))
is true if robot rob is at position o103 in the situation resulting from rob moving from
position o109 to position o103 from the initial situation. The atom
at(k1,mail,do(move(rob,o109,o103), init))
is true if k1 is at position mail in the situation resulting from rob moving from position o109 to
position o103 from the initial situation.
A dynamic relation is axiomatized by specifying the situations in which it is true. Typically, this
is done inductively in terms of the structure of situations.
Axioms with init as the situation parameter are used to specify what is true in the initial
situation.
A primitive relation is defined by specifying when it is true in situations of the
form do(A,S) in terms of what is true in situation S. That is, primitive relations are defined
in terms of what is true at the previous situation.
A derived relation is defined using clauses with a variable in the situation argument. The
truth of a derived relation in a situation depends on what else is true in the same situation.
Static relations are defined without reference to the situation.
Example 14.4: Suppose the delivery robot, Rob, is in the domain depicted in Figure 3.1. Rob is
at location o109, the parcel is in the storage room, and the key is in the mail room. The following
axioms describe this initial situation:
at(rob,o109,init).
at(parcel,storage,init).
at(k1,mail,init).
The clauses defining which rooms are adjacent (omitted here) have a free S variable; they are true
for all situations. We cannot omit the S, because which rooms are adjacent depends on whether a
door is unlocked, and this can change from situation to situation.
The between relation is static and does not require a situation variable:
between(door1,o103,lab2).
We also distinguish whether or not an object is being carried. If an object is not being carried, we
say that the object is sitting at its location. We distinguish this case because an object being
carried moves with the object carrying it. An object is at a location if it is sitting at that location
or is being carried by an object at that location. Thus, at is a derived relation:
at(Ob,P,S)←
sitting_at(Ob,P,S).
at(Ob,P,S)←
carrying(Ob1,Ob,S)∧
at(Ob1,P,S).
Note that this definition allows for Rob to be carrying a bag, which, in turn, is carrying a book.
The precondition of an action specifies when it is possible to carry out the action. The
relation poss(A,S) is true when action A is possible in situation S. This is typically a derived
relation.
Example 14.5: An agent can always put down an object it is carrying:
poss(putdown(Ag,Obj),S) ←
carrying(Ag,Obj,S).
For the move action, an autonomous agent can move from its current position to an adjacent
position:
poss(move(Ag,P1,P2),S) ←
autonomous(Ag) ∧
adjacent(P1,P2,S)∧
sitting_at(Ag,P1,S) .
The precondition for the unlock action is more complicated. The agent must be at the correct side
of the door and carrying the appropriate key:
poss(unlock(Ag,Door),S)←
autonomous(Ag)∧
between(Door,P1,P2)∧
at(Ag,P1,S)∧
opens(Key,Door)∧
carrying(Ag,Key,S).
We do not assume that the between relation is symmetric. Some doors can only open one way.
We define what is true in each situation recursively in terms of the previous situation and of what
action occurred between the situations. As in the feature-based representation of actions, causal
rules specify when a relation becomes true and frame rules specify when a relation remains
true.
Example 14.6: The primitive unlocked relation can be defined by specifying how different
actions can affect its being true. The door is unlocked in the situation resulting from an unlock
action, as long as the unlock action was possible. This is represented using the following causal
rule:
unlocked(Door,do(unlock(Ag,Door),S)) ←
poss(unlock(Ag,Door),S).
Suppose the only action to make the door locked is to lock the door. Thus, unlocked is true in a
situation following an action if it was true before, if the action was not to lock the door, and if the
action was possible:
unlocked(Door,do(A,S))←
unlocked(Door,S)∧
A≠lock(Door) ∧
poss(A,S).
The only action that undoes the carrying predicate is the putdown action. Thus, carrying is true
after an action if it was true before the action, and the action was not to put down the object. This
is represented in the frame rule:
carrying(Ag,Obj,do(A,S)) ←
carrying(Ag,Obj,S)∧
poss(A,S)∧
A ≠putdown(Ag,Obj).
The other action that makes sitting_at true is the putdown action. An object is sitting at the
location where the agent who put it down was located:
sitting_at(Obj,Pos,do(putdown(Ag,Obj),S)) ←
poss(putdown(Ag,Obj),S)∧
at(Ag,Pos,S).
The only other time that sitting_at is true in a (non-initial) situation is when it was true in the
previous situation and it was not undone by an action. The only actions that undo sitting_at are
a move action or a pickup action. This can be specified by the following frame axiom:
sitting_at(Obj,Pos,do(A,S) ) ←
poss(A,S) ∧
sitting_at(Obj,Pos,S) ∧
∀Pos1 A≠move(Obj,Pos,Pos1) ∧
∀Ag A≠pickup(Ag,Obj) .
Note that the quantification in the body is not the standard quantification for rules. This can be
represented using negation as failure:
sitting_at(Obj,Pos,do(A,S) ) ←
poss(A,S) ∧
sitting_at(Obj,Pos,S) ∧
∼move_action(A,Obj,Pos) ∧
∼pickup_action(A,Obj) .
move_action(move(Obj,Pos,Pos1),Obj,Pos).
pickup_action(pickup(Ag,Obj),Obj).
These clauses are designed not to have a free variable in the scope of the negation.
Example 14.9: Situation calculus can represent more complicated actions than can be
represented with simple addition and deletion of propositions in the state description.
Consider the drop_everything action in which an agent drops everything it is carrying. In
situation calculus, the following axiom can be added to the definition of sitting_at to say that
everything the agent was carrying is now on the ground:
sitting_at(Obj,Pos,do(drop_everything(Ag) ,S) ) ←
poss(drop_everything(Ag),S) ∧
at(Ag,Pos,S) ∧
carrying(Ag,Obj,S) .
A frame axiom for carrying specifies that an agent is not carrying an object after
a drop_everything action.
carrying(Ag,Obj,do(A ,S) ) ←
poss(A,S) ∧
carrying(Ag,Obj,S)∧
A ≠drop_everything(Ag)∧
A ≠putdown(Ag,Obj).
Situation calculus is used for planning by asking for a situation in which a goal is true. Answer
extraction is used to find a situation in which the goal is true. This situation can be interpreted as
a sequence of actions for the agent to perform.
Example 14.10: Suppose the goal is for the robot to have the key k1. The following query asks
for a situation where this is true:
? carrying(rob,k1,S).
This query has the following answer:
S=do(pickup(rob,k1),
do(move(rob,o103,mail),
do(move(rob,o109,o103),
init))).
The preceding answer can be interpreted as a way for Rob to get the key: it moves
from o109 to o103, then to mail, where it picks up the key.
The goal of delivering the parcel (which is, initially, in the lounge, lng) to o111 can be asked
with the query
? at(parcel,o111,S).
do(pickup(rob, parcel),
Using the top-down proof procedure on the situation calculus definitions is very inefficient,
because a frame axiom is almost always applicable. A complete proof procedure, such as
iterative deepening, searches through all permutations of actions even if they are not relevant to
the goal.
Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e.,
proofs by contradiction. It was invented by the mathematician John Alan Robinson in 1965.
Resolution is used when various statements are given and we need to prove a conclusion
from those statements. Unification is a key concept in proofs by resolution. Resolution is a single
inference rule which can efficiently operate on the conjunctive normal form or clausal form.
Clause: A disjunction of literals (atomic sentences) is called a clause. A clause containing a single
literal is known as a unit clause.
The resolution rule for first-order logic is simply a lifted version of the propositional rule.
Resolution can resolve two clauses if they contain complementary literals, which are assumed to
be standardized apart so that they share no variables.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
Example:
where the two complementary literals are Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = [a/f(x), b/x], and the rule will generate a resolvent
clause:
Example:
In the first step, we convert all the given statements into first-order logic.
In first-order logic resolution, it is required to convert the FOL into CNF, as the CNF form makes
resolution proofs easier.
In this step, we apply negation to the conclusion statement, which will be written as
¬likes(John, Peanuts)
Now in this step, we will solve the problem by resolution tree using substitution. For the above
problem, it will be given as follows:
Hence the negation of the conclusion has been proved as a complete contradiction with the given
set of statements.
Explanation of Resolution graph:
o In the first step of the resolution graph, ¬likes(John, Peanuts) and likes(John, x) get
resolved (canceled) by the substitution {Peanuts/x}, and we are left with ¬food(Peanuts).
o In the second step of the resolution graph, ¬food(Peanuts) and food(z) get resolved
(canceled) by the substitution {Peanuts/z}, and we are left with ¬eats(y, Peanuts) V
killed(y).
o In the third step of the resolution graph, ¬eats(y, Peanuts) and eats(Anil, Peanuts) get
resolved by the substitution {Anil/y}, and we are left with killed(Anil).
o In the fourth step of the resolution graph, killed(Anil) and ¬killed(k) get resolved by the
substitution {Anil/k}, and we are left with ¬alive(Anil).
o In the last step of the resolution graph, ¬alive(Anil) and alive(Anil) get resolved.
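The cancellation of complementary literals at each step can be sketched in Python. The version
below is purely propositional (real first-order resolution also needs unification, as in the
{Peanuts/x} substitutions above), and the clause literals are illustrative strings.

def resolve(clause1, clause2):
    # Binary resolution on clauses represented as frozensets of string literals,
    # with '~' marking negation; returns every possible resolvent
    resolvents = []
    for lit in clause1:
        complement = lit[1:] if lit.startswith('~') else '~' + lit
        if complement in clause2:
            resolvents.append((clause1 - {lit}) | (clause2 - {complement}))
    return resolvents

# Mirrors the third step above, with the substitution already applied
c1 = frozenset({'~eats(Anil, Peanuts)', 'killed(Anil)'})
c2 = frozenset({'eats(Anil, Peanuts)'})
print(resolve(c1, c2))   # [frozenset({'killed(Anil)'})]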
Planning is the task of coming up with a sequence of actions that will achieve a goal. It can be
approached with a search-based problem-solving agent or a logical planning agent, and the challenge
lies in complex, large-scale problems. For this discussion, we consider classical planning
environments that are fully observable, deterministic, finite, static, and discrete (in time, actions,
objects, and effects).
The forward and regression planners enforce a total ordering on actions at all stages of the
planning process. The CSP planner commits to the particular time that the action will be carried
out. This means that those planners have to commit to an ordering of actions that cannot occur
concurrently when adding them to a partial plan, even if there is no particular reason to put one
action before another.
The idea of a partial-order planner is to have a partial ordering between actions and only
commit to an ordering between actions when forced. This is sometimes also called a non-linear
planner, which is a misnomer because such planners often produce a linear plan.
An action, other than start or finish, will be in a partial-order plan to achieve a precondition of an
action in the plan. Each precondition of an action in the plan is either true in the initial state, and
so achieved by start, or there will be an action in the plan that achieves it.
We must ensure that the actions achieve the conditions they were assigned to achieve. Each
precondition P of an action act1 in a plan will have an action act0 associated with it such
that act0 achieves precondition P for act1. The triple ⟨act0,P,act1⟩ is a causal link. The partial
order specifies that action act0 occurs before action act1, which is written as act0 < act1. Any
other action A that makes P false must either be before act0 or after act1.
Informally, a partial-order planner works as follows: Begin with the actions start and finish and
the partial order start < finish. The planner maintains an agenda that is a set of ⟨P,A⟩ pairs,
where A is an action in the plan and P is an atom that is a precondition of A that must be
achieved. Initially the agenda contains pairs ⟨G,finish⟩, where G is an atom that must be true in
the goal state.
At each stage in the planning process, a pair ⟨P,act1⟩ is selected from the agenda, where P is a precondition for action act1. Then an action, act0, is chosen to achieve P. That action is either
already in the plan - it could be the start action, for example - or it is a new action that is added
to the plan. Action act0 must happen before act1 in the partial order. A causal link is added to
record that act0 achieves P for action act1. Any action in the plan that deletes P must happen
either before act0 or after act1. If act0 is a new action, its preconditions are added to the agenda,
and the process continues until the agenda is empty.
This is a non-deterministic procedure. The "choose" and the "either ...or ..." form choices that
must be searched over. There are two choices that require search:
which action is selected to achieve P, and
whether an action that deletes P happens before act0 or after act1.
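To make the loop concrete, here is a heavily simplified Python sketch of the agenda-driven procedure just described. The toy actions and the "prefer an achiever already in the plan, otherwise take the first new achiever" rule are our own; a real partial-order planner must search over these choices and protect causal links against threats.

# A simplified sketch of the partial-order planning agenda loop.
from collections import namedtuple

Action = namedtuple('Action', ['name', 'preconds', 'effects'])

start  = Action('start',  (), ('at_home',))           # encodes the initial state
finish = Action('finish', ('have_milk',), ())         # encodes the goal
buy    = Action('buy_milk', ('at_store',), ('have_milk',))
go     = Action('go_store', ('at_home',), ('at_store',))
actions = [buy, go]

plan = [start, finish]
order = {(start, finish)}                 # partial order: start < finish
links = set()                             # causal links (act0, P, act1)
agenda = [(p, finish) for p in finish.preconds]

while agenda:
    P, act1 = agenda.pop()
    # choose an action that achieves P: prefer one already in the plan
    act0 = next((a for a in plan if P in a.effects), None)
    if act0 is None:
        act0 = next(a for a in actions if P in a.effects)
        plan.append(act0)
        agenda.extend((q, act0) for q in act0.preconds)
    links.add((act0.name, P, act1.name))
    order.add((act0, act1))               # act0 must come before act1

print(sorted(links))
# [('buy_milk', 'have_milk', 'finish'), ('go_store', 'at_store', 'buy_milk'),
#  ('start', 'at_home', 'go_store')]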
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and propositional
logic with certainty, which means we were sure about the predicates. With this knowledge
representation we might write A→B, which means if A is true then B is true. But consider a
situation where we are not sure whether A is true or not; then we cannot express this statement.
This situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
Following are some leading causes of uncertainty in the real world:
o Information obtained from unreliable sources
o Experimental errors
o Equipment faults
o Temperature variation
o Climate change
Probabilistic reasoning:
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that results from laziness and ignorance.
In the real world, there are many scenarios where the certainty of something is not confirmed, such as "It will rain today", the behaviour of someone in some situation, or a match between two teams or two players. These are probable sentences for which we can assume that something will happen but cannot be sure about it, so here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian statistics
As probabilistic reasoning uses probability and related terms, let us first understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of probability always lies between 0 and 1, where 0 indicates total uncertainty in an event and 1 indicates total certainty.
We can find the probability of an uncertain event by using the formula:
P(A) = (number of desired outcomes) / (total number of outcomes)
Sample space: The collection of all possible events is called the sample space.
Random variables: Random variables are used to represent events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring when another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
If, instead, the probability of B given A is needed, it is given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using a Venn diagram: since B has occurred, the sample space is reduced to the set B, and we can calculate event A given that B has occurred by dividing P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?
Solution:
Let A be the event that a student likes Mathematics and B the event that a student likes English. Then
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57 (approximately)
Hence, 57% of the students who like English also like Mathematics.
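A quick check of this arithmetic in plain Python, using just the formula above:

# P(Mathematics | English) = P(English and Mathematics) / P(English)
p_english, p_english_and_math = 0.70, 0.40
print(p_english_and_math / p_english)    # 0.571..., i.e. about 57%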
Bayesian Networks
A Bayesian belief network is a key technology for dealing with probabilistic events and solving problems that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic because they are built from a probability distribution and use probability theory for prediction and anomaly detection.
Real-world applications are probabilistic in nature, and to represent the relationships between multiple events we need a Bayesian network. It can be used in various tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts: a directed acyclic graph and a table of conditional probabilities.
The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where each node represents a random variable and each arc represents a direct dependency between variables. The network has mainly two components:
o The causal component (the graph structure)
o The actual numbers (the conditional probabilities)
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which quantifies the effect of the parents on that node.
A Bayesian network is based on the joint probability distribution and conditional probability. So let's first understand the joint probability distribution:
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
P[x1, x2, x3, ..., xn] can be written as follows in terms of the joint probability distribution (the chain rule):
P[x1, x2, ..., xn] = P[x1 | x2, ..., xn] · P[x2 | x3, ..., xn] · ... · P[xn-1 | xn] · P[xn]
In general, for each variable Xi, we can write the equation as:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary, but it also responds to minor earthquakes. Harry has two neighbours, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then as well. On the other hand, Sophia likes to listen to loud music, so she sometimes misses the alarm. Here we would like to compute the probability of the burglar alarm going off.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia have called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that burglary and earthquake are the parent nodes of the alarm and directly affect the probability of the alarm going off, while David's and Sophia's calls depend on the alarm's probability.
o The network represents that our agents do not directly perceive the burglary, do not notice the minor earthquake, and do not confer before calling.
o The conditional distributions for each node are given as a conditional probability table, or CPT.
o Each row in a CPT must sum to 1 because the entries in the row represent an exhaustive set of cases for the variable.
o In a CPT, a boolean variable with k boolean parents has 2^k rows of probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.
List of all events occurring in this network:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of probability as P[D, S, A, B, E], and we can rewrite this probability statement using the joint probability distribution as:
P[D, S, A, B, E] = P[D | A] · P[S | A] · P[A | B, E] · P[B] · P[E]
Let's take the observed probabilities for the burglary and earthquake components:
P(B = True) = 0.002, the probability of a burglary.
P(B = False) = 0.998, the probability of no burglary.
P(E = True) = 0.001, the probability of a minor earthquake.
P(E = False) = 0.999, the probability that no earthquake occurred.
We can provide the conditional probabilities as per the following tables (the standard values for this example, which reproduce the result below):
Conditional probability of the Alarm given Burglary and Earthquake:
B = True,  E = True:  P(A = True | B, E) = 0.94
B = True,  E = False: P(A = True | B, E) = 0.95
B = False, E = True:  P(A = True | B, E) = 0.31
B = False, E = False: P(A = True | B, E) = 0.001
The conditional probability that David will call depends on the probability of the alarm:
P(D = True | A = True) = 0.91, P(D = True | A = False) = 0.05
The conditional probability that Sophia calls depends on its parent node "Alarm":
P(S = True | A = True) = 0.75, P(S = True | A = False) = 0.02
From the formula of the joint distribution, we can write the problem statement in the form of a probability distribution:
P(S, D, A, ¬B, ¬E) = P(S | A) · P(D | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint
distribution.
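As an illustration, the whole calculation can be reproduced in a few lines of Python. The dictionaries below hold the CPT values used above, and the joint function implements the factorisation P(D,S,A,B,E) = P(D|A) · P(S|A) · P(A|B,E) · P(B) · P(E):

# A minimal sketch of evaluating the joint distribution of the burglary network.
p_b = {True: 0.002, False: 0.998}          # P(B)
p_e = {True: 0.001, False: 0.999}          # P(E)
p_a = {(True, True): 0.94, (True, False): 0.95,    # P(A=True | B, E)
       (False, True): 0.31, (False, False): 0.001}
p_d = {True: 0.91, False: 0.05}            # P(D=True | A)
p_s = {True: 0.75, False: 0.02}            # P(S=True | A)

def joint(d, s, a, b, e):
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pd = p_d[a] if d else 1 - p_d[a]
    ps = p_s[a] if s else 1 - p_s[a]
    return pd * ps * pa * p_b[b] * p_e[e]

# Alarm sounded, no burglary, no earthquake, both neighbours called:
print(joint(d=True, s=True, a=True, b=False, e=False))   # ~0.00068045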
There are two ways to understand the semantics of a Bayesian network:
o as a representation of the joint probability distribution, and
o as an encoding of a collection of conditional independence statements.
Natural Language Processing (NLP)
Natural Language Processing refers to the AI methods of communicating with an intelligent system using a natural language. The input and output of an NLP system can be:
Speech
Written Text
Components of NLP:
Difficulties in NLU:
NLP Terminology:
Steps in NLP:
There are a number of algorithms researchers have developed for syntactic analysis, but we consider
only the following simple methods −
Context-Free Grammar
Top-Down Parser
Let us see them in detail –
Context-Free Grammar
It is the grammar that consists of rules with a single symbol on the left-hand side of the rewrite rules.
Let us create a grammar to parse the sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns (N) − bird | birds | grain | grains
Adjectives (ADJ) − beautiful | small | chirping
Verbs (V) − pecks | pecking | pecked
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun = DET N | DET ADJ N
Verb Phrase (VP) − Verb + Noun Phrase = V NP
The parse tree breaks down the sentence into structured parts so that the computer can easily
understand and process it. For the parsing algorithm to construct this parse tree, a set of rewrite
rules, which describe what tree structures are legal, needs to be constructed.
These rules say that a certain symbol may be expanded in the tree into a sequence of other symbols.
For example, the first rule below says that a Noun Phrase (NP) followed by a Verb Phrase (VP)
forms a sentence. The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree for “The bird pecks the grains” can then be created as −
(S (NP (DET the) (N bird)) (VP (V pecks) (NP (DET the) (N grains))))
Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks", sentences
such as "The bird peck the grains" are wrongly permitted; i.e., a subject-verb agreement error is
approved as correct.
Merit − It is the simplest style of grammar, and therefore the most widely used one.
Demerits −
They are not highly precise. For example, “The grains peck the bird” is syntactically correct
according to the parser, but even though it makes no sense, the parser takes it as a correct sentence.
To bring out high precision, multiple sets of grammar need to be prepared. It may require a
completely different set of rules for parsing singular and plural variations, passive sentences,
etc., which can lead to the creation of a huge, unmanageable set of rules.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal
symbols that matches the classes of the words in the input sentence, until it consists entirely of
terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process starts over
again with a different set of rules. This is repeated until a specific rule is found which describes the
structure of the sentence.
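A minimal recursive-descent sketch of such a top-down parser, written in Python for the toy grammar above (the dictionary representation and function names are our own):

# A minimal top-down (recursive-descent) parser: try each right-hand side of a
# rule in turn, in the "rewrite S until it matches the input" style described above.
GRAMMAR = {
    'S':   [['NP', 'VP']],
    'NP':  [['DET', 'N'], ['DET', 'ADJ', 'N']],
    'VP':  [['V', 'NP']],
    'DET': [['a'], ['the']],
    'ADJ': [['beautiful'], ['perching']],
    'N':   [['bird'], ['birds'], ['grain'], ['grains']],
    'V':   [['peck'], ['pecks'], ['pecking']],
}

def parse(symbol, words, i):
    """Yield every position j such that words[i:j] derives from `symbol`."""
    if symbol not in GRAMMAR:                 # terminal: must match the word
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:               # try each rewrite rule in turn
        positions = [i]
        for sym in rhs:
            positions = [j2 for j in positions for j2 in parse(sym, words, j)]
        yield from positions

for s in ['the bird pecks the grains', 'the bird peck the grains']:
    words = s.split()
    ok = any(j == len(words) for j in parse('S', words, 0))
    print(s, '->', ok)   # both accepted: no subject-verb agreement check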
Merit − It is simple to implement.
Demerits −
Expert Systems
Expert systems (ES) are one of the prominent research domains of AI. They were introduced by
researchers at the Stanford University Computer Science Department.
Expert systems are computer applications developed to solve complex problems in a particular
domain, at a level of extraordinary human intelligence and expertise.
Characteristics of Expert Systems:
High performance
Understandable
Reliable
Highly responsive
Capabilities of Expert Systems − They are capable of:
Advising
Instructing and assisting human in decision making
Demonstrating
Deriving a solution
Diagnosing
Explaining
Interpreting input
Predicting results
Justifying the conclusion
Suggesting alternative options to a problem
They are incapable of −
Substituting human decision makers
Possessing human capabilities
Producing accurate output from an inadequate knowledge base
Refining their own knowledge
Components of Expert Systems − The components of an ES include:
Knowledge Base
Inference Engine
User Interface
Let us see them one by one briefly –
Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit intelligence, and the success of any ES depends majorly on the collection of highly accurate and precise knowledge.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a correct,
flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the knowledge from
the knowledge base to arrive at a particular solution.
In case of rule based ES, it −
Applies rules repeatedly to the facts, which are obtained from earlier rule application.
Adds new knowledge into the knowledge base if required.
Resolves rules conflict when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies −
Forward Chaining
Backward Chaining
Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
Here, the Inference Engine follows the chain of conditions and derivations and finally deduces the
outcome. It considers all the facts and rules, and sorts them before concluding to a solution.
This strategy is followed for working on conclusion, result, or effect. For example, prediction of share
market status as an effect of changes in interest rates.
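A tiny Python sketch of forward chaining in the spirit of this example (the rules and facts below are invented for illustration only):

# Forward chaining: apply rules repeatedly to the known facts,
# adding new facts until nothing changes.
rules = [
    ({'interest_rates_fall'}, 'stock_market_rises'),
    ({'stock_market_rises', 'company_is_listed'}, 'share_price_rises'),
]
facts = {'interest_rates_fall', 'company_is_listed'}

changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)      # fire the rule, add new knowledge
            changed = True

print(facts)   # share_price_rises is deduced by chaining the two rules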
Backward Chaining
With this strategy, an expert system finds out the answer to the question, “Why this happened?”
On the basis of what has already happened, the Inference Engine tries to find out which conditions
could have happened in the past for this result. This strategy is followed for finding out cause or
reason. For example, diagnosis of blood cancer in humans.
User Interface
The user interface provides interaction between the user of the ES and the ES itself. It generally uses
natural language processing, so that it can be used by a user who is well-versed in the task domain. The
user of the ES need not necessarily be an expert in Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may appear in
the following forms −
Natural language displayed on screen
Verbal narrations in natural language
Listing of rule numbers displayed on the screen
No technology can offer an easy and complete solution. Large systems are costly and require
significant development time and computer resources. ESs have their limitations, which include −
Limitations of the technology
Difficult knowledge acquisition
ES are difficult to maintain
High development costs
Applications of Expert Systems − ES are applied in domains such as design (e.g., camera lens and automobile design), medicine (diagnosis systems), monitoring and process control, knowledge publishing, and finance and commerce (e.g., fraud detection and loan advising).
There are several levels of ES technologies available. Expert systems technologies include −
Expert System Development Environment − The ES development environment includes
hardware and tools. They are −
o Workstations, minicomputers, mainframes.
o High level Symbolic Programming Languages such as LISt Programming (LISP)
and PROgrammation en LOGique (PROLOG).
o Large databases.
Tools − They reduce the effort and cost involved in developing an expert system to a large
extent.
o Powerful editors and debugging tools with multi-windows.
o They provide rapid prototyping
o Have Inbuilt definitions of model, knowledge representation, and inference design.
Shells − A shell is nothing but an expert system without a knowledge base. A shell provides the
developers with knowledge acquisition, inference engine, user interface, and explanation
facility. For example, few shells are given below −
o Java Expert System Shell (JESS) that provides fully developed Java API for creating
an expert system.
o Vidwan, a shell developed at the National Centre for Software Technology, Mumbai in
1993. It enables knowledge encoding in the form of IF-THEN rules.
Robotics
Robotics is a domain in artificial intelligence that deals with the study of creating intelligent and
efficient robots.
What is Robotics?
Robots are artificial agents acting in a real-world environment. They aim to manipulate objects by perceiving, picking, moving, and modifying the physical properties of those objects.
Aspects of Robotics:
The robots have mechanical construction, form, or shape designed to accomplish a particular
task.
They have electrical components which power and control the machinery.
They contain some level of computer program that determines what, when and how a robot
does something.
AI Programs vs. Robots:
o Input − The input to an AI program is symbols and rules; the input to a robot is analog signals in the form of speech waveforms or images.
o Hardware − AI programs need general-purpose computers to operate on; robots need special hardware with sensors and effectors.
Robot Locomotion:
Locomotion is the mechanism that makes a robot capable of moving in its environment. There are
various types of locomotions −
Legged
Wheeled
Combination of Legged and Wheeled Locomotion
Tracked slip/skid
Legged Locomotion
This type of locomotion consumes more power while demonstrating walking, jumping, trotting,
hopping, climbing up or down, etc.
It requires a greater number of motors to accomplish a movement. It is suited to rough as well
as smooth terrain, where an irregular or too-smooth surface would make wheeled locomotion
consume more power. It is a little difficult to implement because of stability issues.
Legged robots come in varieties with one, two, four, or six legs. If a robot has multiple legs, then
leg coordination is necessary for locomotion.
The total number of possible gaits (a periodic sequence of lift and release events for each of the
legs) a robot can use depends upon the number of its legs.
If a robot has k legs, then the number of possible gait events is N = (2k − 1)!.
In the case of a two-legged robot (k = 2), the number of possible events is N = (2·2 − 1)! = 3! = 6.
Hence there are six possible different events:
o Lifting the left leg
o Releasing the left leg
o Lifting the right leg
o Releasing the right leg
o Lifting both legs together
o Releasing both legs together
In the case of k = 6 legs, there are (2·6 − 1)! = 11! = 39,916,800 possible events. Hence the
complexity of a robot is directly proportional to the number of its legs.
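These counts can be checked directly with Python's standard library:

# N = (2k - 1)! possible gait events for a robot with k legs
from math import factorial

for k in (2, 6):
    print(k, 'legs ->', factorial(2 * k - 1), 'events')
# 2 legs -> 6 events;  6 legs -> 39916800 events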
Wheeled Locomotion
It requires a fewer number of motors to accomplish a movement. It is a little easier to implement, as
there are fewer stability issues when there are more wheels. It is power-efficient compared to legged
locomotion.
Standard wheel − Rotates around the wheel axle and around the contact point.
Castor wheel − Rotates around the wheel axle and the offset steering joint.
Swedish 45° and Swedish 90° wheels − Omni-wheels; rotate around the contact point, around
the wheel axle, and around the rollers.
Ball or spherical wheel − Omnidirectional wheel, technically difficult to implement.
Slip/Skid Locomotion
In this type, the vehicles use tracks, as in a tank. The robot is steered by moving the tracks at
different speeds in the same or opposite directions. It offers stability because of the large contact
area between the tracks and the ground.
Components of a Robot:
Robots are constructed with a power supply, actuators (such as electric motors, pneumatic air muscles, and muscle wires), sensors, and a controller.
Computer Vision:
This is a technology of AI with which robots can see. Computer vision plays a vital role in the
domains of safety, security, health, access, and entertainment.
Computer vision automatically extracts, analyzes, and comprehends useful information from a single
image or an array of images. This process involves development of algorithms to accomplish
automatic visual comprehension.
Hardware of Computer Vision System
This involves −
Power supply
Image acquisition device, such as a camera
A processor
Software
A display device for monitoring the system
Accessories, such as camera stands, cables, and connectors
Tasks of Computer Vision:
OCR − Optical Character Recognition: software to convert scanned documents into editable
text, which usually accompanies a scanner.
Face Detection − Many state-of-the-art cameras come with this feature, which enables them to
read a face and take a picture of that perfect expression. It is also used to let a user access
software on a correct match.
Object Recognition − Object recognition systems are installed in supermarkets, cameras, and
high-end cars such as BMW, GM, and Volvo.
Estimating Position − Estimating the position of an object with respect to the camera, as in the
position of a tumor in a human body.
Application Domains of Computer Vision:
Agriculture
Autonomous vehicles
Biometrics
Character recognition
Forensics, security, and surveillance
Industrial quality inspection
Face recognition
Gesture analysis
Geoscience
Medical imagery
Pollution monitoring
Process control
Remote sensing
Robotics
Transport
Applications of Robotics:
Robotics has been instrumental in various domains, such as industries (handling material, cutting, and welding), the military (autonomous robots reaching inaccessible and hazardous zones), medicine (carrying out complex tests and surgeries), exploration (space rovers and underwater robots), and entertainment.
Game Playing
Both players try to win the game, so both of them try to make the best possible move at each
turn. Searching techniques like BFS (Breadth-First Search) are not suitable for this, as the
branching factor is very high and searching would take a lot of time. So we need search
procedures that generate and evaluate only promising moves.
The most common search technique in game playing is the Minimax search procedure. It is a
depth-first, depth-limited search procedure, used for games like chess and tic-tac-toe.
Games have always been an important application area for heuristic algorithms. In playing
games whose state space may be exhaustively delineated, the primary difficulty is accounting
for the actions of the opponent. This can be handled by assuming that the opponent uses the
same knowledge of the state space as us and applies that knowledge in a consistent effort to
win the game. Minimax implements game search for two players, referred to as MIN and MAX.
The minimax search procedure is a depth-first, depth-limited search procedure. The idea is to
start at the current position and use the plausible-move generator to generate the set of possible
successor positions. To decide one move, it explores the possibilities of winning by looking
ahead more than one step. Each level of this look-ahead is called a ply; thus, in a two-ply search,
the game tree is explored two levels further to decide the current move.
Figure
In the figure, ply 2 (the second player's move level) is maximizing, so the maximum value of all
children of a node is propagated back to that node. Thus, the nodes B, C, and D get the values 4, 5,
and 6 respectively. Since ply 1 is minimizing, the minimum of these values, i.e., 4, is propagated
to A. Then from A the move will be taken to B.
MINIMAX is a straightforward recursive procedure that relies on two auxiliary
procedures that are specific to the game being played.
1. MOVEGEN (position, player): the plausible-move generator, which returns a list of nodes
representing the moves that can be made by player in position. We may have two players, namely
PLAYER-ONE and PLAYER-TWO, in a chess problem.
2. STATIC (position, player): the static evaluation function, which returns a number representing
the goodness of position from the standpoint of player.
We assume that MINIMAX returns a structure containing both results and that we have two
functions, VALUE and PATH, that extract the separate components. A function LAST-PLY is
assumed to evaluate all of the factors and to return TRUE if the search should be stopped at the
current level and FALSE otherwise.
MINIMAX takes three parameters: a board position, the current depth of the search, and the
player to move. So the initial call to compute the best move from the position CURRENT should be
MINIMAX(CURRENT, 0, PLAYER-ONE)
or
MINIMAX(CURRENT, 0, PLAYER-TWO)
When the initial call to MINIMAX returns, the best move from CURRENT is the first
element in the PATH.
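A compact Python sketch of the minimax idea on the small tree discussed above (the game interface is our own stand-in: moves plays the role of MOVEGEN, static_eval of STATIC, and the depth limit of LAST-PLY):

def minimax(position, depth, maximizing, moves, static_eval):
    """Return (value, path) for the best line of play from `position`."""
    successors = moves(position)
    if depth == 0 or not successors:          # LAST-PLY: stop the search
        return static_eval(position), [position]
    best_value, best_path = None, None
    for succ in successors:
        value, path = minimax(succ, depth - 1, not maximizing,
                              moves, static_eval)
        better = (best_value is None or
                  (value > best_value if maximizing else value < best_value))
        if better:
            best_value, best_path = value, path
    return best_value, [position] + best_path

# Toy game tree: A branches to B, C, D, whose leaves give them values 4, 5, 6.
tree = {'A': ['B', 'C', 'D'], 'B': [1, 4], 'C': [5, 2], 'D': [6, 3]}
value, path = minimax('A', 2, False,                  # ply 1 is minimizing
                      moves=lambda p: tree.get(p, []),
                      static_eval=lambda p: p)
print(value, path)   # 4 ['A', 'B', 4] -> from A the move taken is to B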
Alpha-Beta (α-β) Pruning
When the number of states of a game increases and they cannot all be examined, we can use
pruning. Pruning is a method used to reduce the number of states explored in a game. Alpha-beta
is one such pruning technique. The problem with minimax search is that the number of game
states it has to examine is exponential in the number of moves. Unfortunately we cannot
eliminate the exponent, but we can effectively cut it in half. Alpha-beta pruning is one of the
solutions to this problem of the minimax search tree. When α-β pruning is applied to a standard
minimax tree, it returns the same move as minimax would, but prunes away branches that
cannot possibly influence the final decision.
The idea of alpha-beta pruning is very simple. Alpha-beta search proceeds in a depth-first fashion
rather than searching the entire space. Two values, called alpha and beta, are maintained
during the search. The alpha value is associated with MAX nodes and the beta value with MIN
nodes. The value of alpha can never decrease; on the other hand, the value of beta never
increases. Suppose the alpha value of a MAX node is 5. The MAX node then need not
consider any transmitted value less than or equal to 5 from any MIN node below it. Alpha is
the worst that MAX can score, given that MIN will also do its best. Similarly, if a MIN node has
a beta value of 5, it need not further consider any MAX node below it that has a value of 6 or more.
The general principle is this: consider a node η somewhere in the search tree, such that the player
has a choice of moving to that node. If the player has a better choice available, either at the parent
node of η or at any choice point further up, then η will never be reached in actual play. So once we
have found out enough about η (by examining some of its descendants) to reach this conclusion,
we can prune it.
We can also say that α is the value of the best choice we have found so far at any choice point
along the path for MAX, and β is the value of the best choice we have found so far at any choice
point along the path for MIN. Consider the following example:
Figure
Here at the MIN ply, the best values from the three nodes are 4, 5, and 0. These are propagated
back towards the root, and the maximizing move 5 will be taken. Now node E returns the value 8,
which is more than the value already accepted, and E is at a minimizing ply; so node E need not
be explored further. When more plies are considered, the whole subtree below E will be pruned.
Similarly, if α = 0 and β = 7, all the nodes and related subtrees having values less than 0 at a
maximizing ply and more than 7 at a minimizing ply will be pruned.
Alpha-beta search updates the values of α and β as it goes along, and prunes the remaining
branches at a node as soon as the value of the current node is known to be worse than the current
α or β value for MAX or MIN respectively. The effectiveness of alpha-beta pruning is highly
dependent on the order in which the successors are examined. If the branching factor of a search
tree is x and the depth is d, then with good move ordering α-β search needs to examine only
about x^(d/2) nodes to pick the best move, instead of the x^d examined by MINIMAX.
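A Python sketch of alpha-beta pruning on the same kind of toy tree used in the minimax sketch above (the representation is ours); it returns the same value minimax would, while skipping branches that cannot influence the decision:

def alphabeta(position, depth, alpha, beta, maximizing, moves, static_eval):
    successors = moves(position)
    if depth == 0 or not successors:
        return static_eval(position)
    if maximizing:
        value = float('-inf')
        for succ in successors:
            value = max(value, alphabeta(succ, depth - 1, alpha, beta,
                                         False, moves, static_eval))
            alpha = max(alpha, value)        # alpha never decreases
            if alpha >= beta:
                break                        # beta cut-off: prune the rest
        return value
    else:
        value = float('inf')
        for succ in successors:
            value = min(value, alphabeta(succ, depth - 1, alpha, beta,
                                         True, moves, static_eval))
            beta = min(beta, value)          # beta never increases
            if alpha >= beta:
                break                        # alpha cut-off: prune the rest
        return value

evaluated = []
def leaf_eval(p):
    evaluated.append(p)       # record which leaves actually get examined
    return p

tree = {'A': ['B', 'C'], 'B': [3, 5], 'C': [2, 9]}
print(alphabeta('A', 2, float('-inf'), float('inf'), True,
                lambda p: tree.get(p, []), leaf_eval))   # 3
print(evaluated)   # [3, 5, 2] -- the leaf 9 was pruned, never evaluated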
Chess Problem
Figure
The above figure shows a 3×3 chessboard with each square labeled with an integer from 1 to 9.
Because of the reduced size of the problem, we simply enumerate the alternative moves rather
than developing a general move operator. Using a predicate called move in predicate calculus,
whose parameters are the starting and ending squares, we can describe the legal moves on the
board. For example, move (1, 8) takes the knight from the upper left-hand corner to the middle of
the bottom row. In chess, a knight moves two squares either horizontally or vertically, followed
by one square in an orthogonal direction, as long as it does not move off the board.
The above move predicates form the knowledge base for this problem. A unification algorithm
is used to access the knowledge base. Suppose we need to find the positions to which the knight
can move from a particular location, say square 2. The goal move (2, x) unifies with two different
predicates in the knowledge base, with the substitutions {7/x} and {9/x}. Given the goal
move (2, 3), the response is failure, because no move (2, 3) exists in the knowledge base.
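A small Python sketch of this knowledge base and the two queries (the move set encodes the legal knight moves on the 3×3 board; square 5 has none):

# The chess-knight knowledge base: legal knight moves on the 3x3 board.
MOVES = {(1, 6), (1, 8), (2, 7), (2, 9), (3, 4), (3, 8),
         (4, 3), (4, 9), (6, 1), (6, 7), (7, 2), (7, 6),
         (8, 1), (8, 3), (9, 2), (9, 4)}

def query(frm, to=None):
    """move(frm, x): return every x the goal unifies with;
    move(frm, to): succeed or fail by matching against the knowledge base."""
    if to is None:
        return sorted(x for f, x in MOVES if f == frm)
    return (frm, to) in MOVES

print(query(2))       # [7, 9] -- the substitutions {7/x} and {9/x}
print(query(2, 3))    # False  -- no move(2, 3) in the knowledge base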
Comments:
o In this game, a lot of production rules are applied for each move of a square on the chessboard.
o A lot of searching is required in this game.
o Implementation of the algorithm over the knowledge base is very important.
Puzzles (Tiles) Problem
Definition:
“It consists of a 3×3 board having 9 block spaces, out of which 8 blocks hold tiles bearing the
numbers 1 to 8. One space is left blank. A tile adjacent to the blank space can move into it.
We have to arrange the tiles in a sequence to reach the goal state.”
Procedure:
The 8-puzzle problem belongs to the category of “sliding block puzzle” problems. The 8-puzzle
is a square tray in which eight square tiles are placed. The remaining ninth square is uncovered.
Each tile in the tray has a number on it. A tile that is adjacent to the blank space can be slid into
that space. The game consists of a starting position and a specified goal position. The goal is to
transform the starting position into the goal position by sliding the tiles around. The control
mechanism for an 8-puzzle solver must keep track of the order in which operations are
performed, so that the operations can be undone one at a time if necessary. The objective of the
puzzle is to find a sequence of tile movements that leads from a starting configuration to a goal
configuration, such as the two situations given below.
A state of the 8-puzzle is a permutation of the tiles within the frame. The operations are the
permissible moves: up, down, left, right. At each step of the problem, a function f(x) is defined
as the combination of g(x) and h(x):
f(x) = g(x) + h(x)
where g(x) is the number of steps already taken, i.e., the cost of reaching the current state from
the initial state, and h(x) is the heuristic estimator: it compares the current state with the goal
state and counts how many tiles are displaced from their goal positions. After calculating the
f(x) value of every candidate at each step, take the smallest f(x) value and choose that state as
the next current state, continuing until the goal state is reached.
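The stepwise procedure that follows can be automated. Here is a compact Python sketch of best-first search with f(x) = g(x) + h(x) using the misplaced-tiles heuristic (the start configuration is our own illustration; 0 marks the blank space):

import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def h(state):
    """Misplaced-tiles heuristic: count tiles not in their goal position."""
    return sum(1 for tile, goal in zip(state, GOAL) if tile and tile != goal)

def neighbours(state):
    """Slide a tile adjacent to the blank into the blank space."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

def solve(start):
    frontier = [(h(start), 0, start, [start])]    # entries are (f, g, state, path)
    seen = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == GOAL:
            return path
        if state in seen:
            continue
        seen.add(state)
        for nxt in neighbours(state):
            if nxt not in seen:
                heapq.heappush(frontier,
                               (g + 1 + h(nxt), g + 1, nxt, [*path, nxt]))

start = (1, 2, 3, 4, 5, 6, 0, 7, 8)    # two slides away from the goal
print(len(solve(start)) - 1, 'moves')  # 2 moves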
Step 2:
In this step, from tray C three states can be drawn: the empty position can be filled with either 5,
3, or 6. So for the three different values, three different states are obtained. Then calculate f(x)
for each of them and pick the smallest.
Step 3:
Tray F can lead to 4 different states, as the empty position can be filled with 4 values, i.e., 2, 4,
5, 8.
Step 4:
In step 3, tray I has the smallest f(x) value. Tray I can be expanded into 3 different states,
because the empty position can be filled by the tiles 7, 8, or 6.
Hence, we reach the goal state after a few movements of tiles across the trays.
Comments:
o This problem requires a lot of space for saving the different trays.
o Its time complexity is greater than that of many other problems.
o The user has to be very careful about the shifting of tiles in the trays.
o Very complex puzzle games can be solved by this technique.