ML unit-5
UNIT-V
Reinforcement Learning; Overview of reinforcement learning, Getting Lost Example.
Markov Chain Monte Carlo Methods: Sampling, Proposal Distribution, Markov Chain Monte
Carlo.
Graphical Models: Bayesian Networks, Markov Random Fields, Hidden Markov Models,
Tracking Methods.
Reinforcement Learning:
Reinforcement learning (RL) is a type of machine learning where an agent learns to
make decisions by interacting with an environment.
The agent receives feedback in the form of rewards or penalties based on the actions it
takes, and its goal is to maximize the cumulative reward over time.
Key Components of Reinforcement Learning:
1. Agent:
o The learner or decision-maker that interacts with the environment.
2. Environment:
o The external system the agent interacts with. It provides feedback based on the
agent's actions.
3. State:
o A representation of the current situation of the environment. The agent
perceives the environment through states.
4. Action:
o The set of all possible moves the agent can make in the environment.
5. Reward:
o Feedback from the environment based on the agent's actions. Positive rewards
incentivize desirable actions, while negative rewards (or penalties) discourage
undesirable actions.
6. Policy:
o A strategy used by the agent to determine the next action based on the current
state. It can be deterministic or stochastic.
7. Value Function:
o A function that estimates the expected cumulative reward of states or state-
action pairs, helping the agent to make decisions that maximize long-term
rewards.
Exploration vs. Exploitation:
1. Exploration:
o The agent tries out different actions to discover their effects and gather
information about the environment.
2. Exploitation:
o The agent uses its knowledge to choose actions that it believes will maximize
the reward.
3. Balance:
o Effective RL requires balancing exploration and exploitation to ensure the
agent learns the optimal policy.
Common RL Algorithms:
1. Q-Learning:
o A model-free algorithm where the agent learns a value function Q(s,a), which
represents the expected utility of taking action a in state s and following the
optimal policy thereafter.
2. SARSA (State-Action-Reward-State-Action):
o Similar to Q-Learning, but updates the Q-value based on the action actually
taken, considering the policy followed by the agent.
Getting Lost Example (Robot Navigating a Maze):
1. Environment:
o The maze consists of a grid with walls, open spaces, and an exit.
o The robot starts at a random position and must find the exit.
2. State:
o The current position of the robot in the maze, represented by coordinates (x,
y).
3. Actions:
o The robot can move up, down, left, or right.
4. Rewards:
o Positive reward for reaching the exit.
o Negative reward for hitting a wall.
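To make the Getting Lost example concrete, the sketch below shows tabular Q-learning on a small grid maze. The maze layout, reward values, and hyperparameters (learning rate, discount factor, exploration rate) are illustrative assumptions, not values prescribed by these notes.

```python
import random

# Illustrative 4x4 maze: 0 = open cell, 1 = wall; the exit is at (3, 3).
MAZE = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
EXIT = (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nx, ny = state[0] + action[0], state[1] + action[1]
    if not (0 <= nx < 4 and 0 <= ny < 4) or MAZE[nx][ny] == 1:
        return state, -1.0, False          # hit a wall or the boundary
    if (nx, ny) == EXIT:
        return (nx, ny), 10.0, True        # positive reward for reaching the exit
    return (nx, ny), -0.1, False           # small step cost encourages short paths

Q = {}                                     # Q[(state, action_index)] -> estimated return
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # assumed learning rate, discount, exploration rate

for episode in range(500):
    state, done = (0, 0), False
    for _ in range(200):                   # cap episode length
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((state, i), 0.0))
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(Q.get((next_state, i), 0.0) for i in range(len(ACTIONS)))
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
        if done:
            break
```

After training, repeatedly choosing the greedy action with the highest Q-value in each cell traces a path from the start position to the exit.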
Reinforcement learning is a powerful approach to building intelligent systems that can adapt
and improve through experience, opening up possibilities across a wide range of applications.
Markov Chain Monte Carlo (MCMC) Methods:
MCMC methods draw samples from complex probability distributions by constructing a Markov chain whose stationary distribution is the target distribution. The name combines two ideas:
1. Markov Chain:
A sequence of random variables where the next state depends only on the current state
(the Markov property).
The chain has a stationary distribution that it converges to over time.
2. Monte Carlo:
A method that relies on repeated random sampling to estimate numerical quantities, such as expectations and integrals, that are difficult to compute exactly.
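A short sketch, using an assumed 3-state transition matrix, illustrating the claim that a Markov chain converges to a stationary distribution:

```python
import numpy as np

# A small 3-state Markov chain. Row i gives the probabilities of moving
# from state i to each state, so each row sums to 1. (Illustrative values.)
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Start from an arbitrary distribution and repeatedly apply the transition
# matrix; the distribution converges to the chain's stationary distribution.
dist = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    dist = dist @ P

print("Distribution after 100 steps:", dist)
# The stationary distribution pi satisfies pi = pi P; check the fixed point.
print("One more step changes it by:", np.abs(dist @ P - dist).max())
```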
How MCMC Works:
1. Initialization:
o Start with an initial state (or set of states) from the target distribution.
2. Iteration:
o Propose a new state based on a proposal distribution.
o Accept or reject the new state based on an acceptance criterion (e.g., the Metropolis-Hastings acceptance rule).
3. Convergence:
o After many iterations, the distribution of the states will approximate the target
distribution.
Common Algorithms
1. Metropolis-Hastings Algorithm:
o Proposes new states and accepts or rejects them based on the acceptance ratio.
o Widely used for its simplicity and flexibility.
2. Gibbs Sampling:
o Samples each variable in turn, conditional on the current values of the other variables.
o Useful when the conditional distributions are easier to sample from.
Applications of MCMC
1. Bayesian Inference:
o Estimating posterior distributions of parameters when the likelihood and prior
are known.
o Useful for hierarchical models and complex data structures.
2. Statistical Physics:
o Simulating systems with many interacting components, such as spin models (e.g., the Ising model).
3. Machine Learning:
o Approximate inference in probabilistic models, such as topic models and Bayesian neural networks, where exact computation is intractable.
Sampling:
Sampling is a technique used to select a subset of data from a larger population, allowing for
the analysis and inference of population characteristics without examining the entire dataset.
Types of Sampling
1. Probability Sampling:
o Description: Every member of the population has a known, non-zero chance
of being selected.
o Examples:
Simple Random Sampling: Every member of the population has an
equal chance of being selected.
Systematic Sampling: Selects every k-th member from a list after a
random start.
Stratified Sampling: Divides the population into strata (groups) and
samples from each stratum.
Cluster Sampling: Divides the population into clusters and randomly
selects entire clusters.
2. Non-Probability Sampling:
o Description: Not every member of the population has a known or equal
chance of being selected.
o Examples:
Convenience Sampling: Samples are selected based on their
availability or ease of access.
Judgmental (Purposive) Sampling: Samples are selected based on
the researcher’s judgment.
Quota Sampling: Ensures representation by selecting samples to meet
certain quotas.
Snowball Sampling: Current subjects recruit future subjects from their
acquaintances.
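The sketch below illustrates simple random, systematic, and stratified sampling on a hypothetical population; the column names, strata, and sample sizes are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical population: 1,000 people in three strata (regions).
population = pd.DataFrame({
    "id": np.arange(1000),
    "region": rng.choice(["north", "south", "west"], size=1000),
})

# Simple random sampling: every member has an equal chance of selection.
simple = population.sample(n=50, random_state=0)

# Systematic sampling: every k-th member after a random start.
k = len(population) // 50
start = rng.integers(0, k)
systematic = population.iloc[start::k]

# Stratified sampling: sample a fixed fraction from each region (stratum).
stratified = population.groupby("region", group_keys=False).apply(
    lambda g: g.sample(frac=0.05, random_state=0)
)

print(len(simple), len(systematic), len(stratified))
```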
Proposal Distribution:
A proposal distribution is a fundamental component in Markov Chain Monte Carlo
(MCMC) methods.
It is used to generate new candidate samples from a target probability distribution,
especially when direct sampling is not feasible.
A proposal distribution, denoted as q(x′∣x), is a probability distribution used to propose
new candidate states x' given the current state x.
The new candidate state is then accepted or rejected based on a criterion designed to
ensure that the sequence of samples converges to the target distribution π(x).
Markov Chain Monte Carlo Algorithms:
Metropolis-Hastings Algorithm:
Description: Proposes a candidate state x′ from a proposal distribution q(x′∣x) and accepts it with probability min(1, π(x′)q(x∣x′) / (π(x)q(x′∣x))); if the candidate is rejected, the chain stays at the current state.
Process:
1. Initialize the chain at some state x.
2. Propose a candidate x′ from q(x′∣x).
3. Accept x′ with the acceptance probability above; otherwise keep x.
4. Repeat until enough samples have been collected.
Use Case: Widely applicable and flexible for various target distributions.
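A minimal sketch of the Metropolis-Hastings algorithm with a symmetric Gaussian random-walk proposal; the target density (a standard normal) and the proposal scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):
    """Unnormalized target density pi(x): a standard normal here (illustrative)."""
    return np.exp(-0.5 * x**2)

def metropolis_hastings(n_samples=10000, proposal_scale=1.0):
    samples = np.empty(n_samples)
    x = 0.0                                   # initial state
    for i in range(n_samples):
        # Symmetric random-walk proposal q(x'|x) = Normal(x, proposal_scale^2),
        # so the q terms cancel in the acceptance ratio.
        x_new = x + proposal_scale * rng.standard_normal()
        accept_prob = min(1.0, target_pdf(x_new) / target_pdf(x))
        if rng.random() < accept_prob:
            x = x_new                         # accept the proposed state
        samples[i] = x                        # rejected proposals repeat the current state
    return samples

samples = metropolis_hastings()
print("sample mean ~ 0:", samples.mean(), "sample std ~ 1:", samples.std())
```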
Gibbs Sampling:
Description: Samples each variable in turn from its conditional distribution given the
current values of the other variables.
Process:
1. Initialize all variables.
2. Sample each variable xi from p(xi∣other variables).
3. Repeat until convergence.
Use Case: Effective when conditional distributions are easier to sample from.
Example: Ideal for Bayesian networks and hierarchical models.
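A minimal Gibbs sampling sketch for an assumed bivariate normal target with correlation ρ, where each full conditional is itself a normal distribution that is easy to sample from:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8          # correlation of an illustrative bivariate normal target
n_samples = 5000

x, y = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    # For a standard bivariate normal with correlation rho, each conditional
    # is itself normal: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[i] = (x, y)

print("empirical correlation ~ 0.8:", np.corrcoef(samples.T)[0, 1])
```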
Graphical Models:
Graphical models are a powerful framework for representing complex dependencies
among variables in a visual and mathematical way.
Bayesian Networks:
Bayesian Networks (BNs) are a type of probabilistic graphical model that uses directed
acyclic graphs (DAGs) to represent a set of variables and their conditional
dependencies.
They are particularly powerful for modeling complex systems where understanding the
relationships between variables is crucial.
Joint Probability:
Joint probability is a probability of two or more events happening together. For
example, the joint probability of two events A and B is the probability that both events
occur, P(A∩B).
P(A ∩ B) = P(A) · P(B) (only when A and B are independent)
P(A ∩ B) = P(A | B) · P(B) (in general)
Conditional Probability:
Conditional probability defines the probability that event B will occur, given that event A has already occurred: P(B | A) = P(A ∩ B) / P(A), provided P(A) > 0.
Example:
A house alarm can be triggered by a burglary or by a fire, and two people, P1 and P2, may report hearing the alarm. The network has the following nodes:
Burglary ‘B’ – whether a burglary occurs (a parent of the alarm node).
Fire ‘F’ – whether a fire breaks out (a parent of the alarm node).
Alarm ‘A’ – whether the alarm rings; its probability depends on both B and F, as given in the following table.
B F P (A=T) P (A=F)
T T 0.95 0.05
T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999
Person ‘P1’ – whether P1 reports hearing the alarm; depends only on A.
A P (P1=T) P (P1=F)
T 0.95 0.05
F 0.05 0.95
Person ‘P2’ – whether P2 reports hearing the alarm; depends only on A.
A P (P2=T) P (P2=F)
T 0.80 0.20
F 0.01 0.99
Using the network's factorization of the joint distribution, the probability that both P1 and P2 report the alarm, the alarm rings, and neither a burglary nor a fire has occurred is
P(P1, P2, A, ¬B, ¬F) = P(P1 | A) · P(P2 | A) · P(A | ¬B, ¬F) · P(¬B) · P(¬F) = 0.00075
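The factorization above can be evaluated mechanically from the conditional probability tables. In the sketch below, the tables for A, P1, and P2 are taken from this example, while the prior probabilities of B and F are placeholder values (they are not reproduced in these notes), so the printed number is illustrative rather than the 0.00075 quoted above.

```python
# Conditional probability tables from the example above.
# P(A=T | B, F)
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
p_p1_given_a = {True: 0.95, False: 0.05}   # P(P1=T | A)
p_p2_given_a = {True: 0.80, False: 0.01}   # P(P2=T | A)

# Prior probabilities of burglary and fire are NOT given in these notes;
# the values below are placeholders used only for illustration.
p_b, p_f = 0.001, 0.002

def joint(p1, p2, a, b, f):
    """P(P1=p1, P2=p2, A=a, B=b, F=f) via the network's factorization."""
    pb = p_b if b else 1 - p_b
    pf = p_f if f else 1 - p_f
    pa = p_alarm[(b, f)] if a else 1 - p_alarm[(b, f)]
    pp1 = p_p1_given_a[a] if p1 else 1 - p_p1_given_a[a]
    pp2 = p_p2_given_a[a] if p2 else 1 - p_p2_given_a[a]
    return pb * pf * pa * pp1 * pp2

# Probability both people report, the alarm rings, and there is no burglary or fire.
print(joint(p1=True, p2=True, a=True, b=False, f=False))
```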
Applications
1. Medical diagnosis and decision support under uncertainty.
2. Fault diagnosis and risk assessment.
3. Spam filtering and document classification.
Markov Random Fields:
Markov Random Fields (MRFs) are probabilistic graphical models that use undirected graphs to represent dependencies among random variables.
1. Nodes (Vertices):
o Each node represents a random variable.
o Nodes can represent observed data, hidden variables, or any entities in the
model.
2. Edges (Links):
o Undirected edges between nodes indicate direct dependencies.
o Unlike Bayesian Networks, MRFs use undirected edges to capture the
symmetrical nature of relationships.
3. Clique Potentials (Factors):
o Potential functions are associated with cliques (fully connected subgraphs) of
the graph.
o They represent the local dependencies among the variables in a clique.
o These potential functions are often denoted as ψ_C(x_C), where C is a clique and x_C are the variables in that clique.
Applications
1. Image segmentation and denoising, where neighbouring pixels are modeled as directly dependent.
2. Texture modeling and other spatial statistics problems.
Hidden Markov Models:
A Hidden Markov Model (HMM) is a statistical model for sequential data in which the system is assumed to be a Markov process whose states cannot be observed directly.
The hidden states are the underlying variables that generate the observed data, but
they are not directly observable.
The observations are the variables that are measured and observed.
The Hidden Markov Model (HMM) describes the relationship between the hidden states and the observations using two sets of probabilities: the transition probabilities and the emission probabilities.
The transition probabilities describe the probability of transitioning from one hidden
state to another.
The state space is the set of all possible hidden states, and the observation space is the set of
all possible observations.
Transition matrix: the probabilities of transitioning from one hidden state to another, collected into a matrix.
Emission matrix: the probabilities of generating each observation from each hidden state, collected into a matrix.
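As a concrete illustration, the sketch below defines a tiny HMM with two hypothetical hidden weather states and two possible observations; all probability values are assumed for illustration.

```python
import numpy as np

states = ["Rainy", "Sunny"]                  # hidden states (illustrative)
observations = ["umbrella", "no umbrella"]   # observation space (illustrative)

# Transition matrix A: A[i, j] = P(next state = j | current state = i).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Emission matrix B: B[i, k] = P(observation = k | state = i).
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Initial state distribution.
pi = np.array([0.5, 0.5])

# Rows of A and B are probability distributions, so each row sums to 1.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```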
The transition probabilities and the observation (emission) likelihoods are estimated using the Baum-Welch algorithm, an expectation-maximization procedure built on the forward-backward algorithm. The parameters are updated iteratively until convergence.
Given the observed data, the Viterbi algorithm is used to compute the most likely sequence
of hidden states. This can be used to predict future observations, classify sequences, or detect
patterns in sequential data.
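A minimal sketch of the Viterbi algorithm, reusing the illustrative transition matrix, emission matrix, and initial distribution from the HMM sketch above:

```python
import numpy as np

def viterbi(obs_seq, A, B, pi):
    """Return the most likely hidden-state sequence for obs_seq (observation indices)."""
    n_states, T = A.shape[0], len(obs_seq)
    # delta[t, i]: highest probability of any state path ending in state i at time t.
    delta = np.zeros((T, n_states))
    backptr = np.zeros((T, n_states), dtype=int)
    delta[0] = pi * B[:, obs_seq[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            backptr[t, j] = np.argmax(scores)
            delta[t, j] = scores[backptr[t, j]] * B[j, obs_seq[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(backptr[t, path[0]]))
    return path

# Illustrative matrices (same as the HMM sketch above); decode three observations
# (0 = "umbrella", 1 = "no umbrella").
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi([0, 0, 1], A, B, pi))   # -> [0, 0, 1], i.e. Rainy, Rainy, Sunny
```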
The performance of the HMM can be evaluated using various metrics, such as accuracy,
precision, recall, or F1 score.
Tracking Methods:
Tracking methods in machine learning, often referred to as object tracking, involve
techniques used to locate and follow an object's position over time in a sequence of
frames or images.
These methods have applications in various fields, including computer vision, robotics,
surveillance, and augmented reality.
Kalman Filter:
The Kalman filter is an optimal estimator for linear systems with Gaussian noise.
It provides a recursive solution to the linear quadratic estimation problem, efficiently
processing noisy measurements to produce an estimate of the system's state.
Components:
1. State estimate: the current estimate of the system's state (e.g., position and velocity).
2. Error covariance: the uncertainty associated with the state estimate.
3. State transition and measurement models: linear models describing how the state evolves and how measurements relate to the state.
4. Process and measurement noise covariances: the Gaussian noise assumed in the dynamics and in the measurements.
Algorithm:
1. Prediction:
o Predict the next state
o Predict the error covariance
2. Update:
o Compute the Kalman gain
o Update the state estimate
o Update the error covariance
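A minimal sketch of the prediction and update steps for a one-dimensional constant-velocity Kalman filter; the models, noise covariances, and measurements are illustrative assumptions.

```python
import numpy as np

# One-dimensional constant-velocity model (illustrative values throughout).
dt = 1.0
F = np.array([[1, dt], [0, 1]])     # state transition: state = [position, velocity]
H = np.array([[1, 0]])              # we measure position only
Q = 0.01 * np.eye(2)                # process noise covariance
R = np.array([[1.0]])               # measurement noise covariance

x = np.array([[0.0], [1.0]])        # initial state estimate
P = np.eye(2)                       # initial error covariance

def kalman_step(x, P, z):
    # Prediction: project the state and error covariance forward.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: compute the Kalman gain and correct with the measurement z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Noisy position measurements of an object moving at roughly 1 unit per step.
rng = np.random.default_rng(0)
for t in range(1, 11):
    z = np.array([[t * 1.0 + rng.normal(0, 1.0)]])
    x, P = kalman_step(x, P, z)
print("estimated position and velocity:", x.ravel())
```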
Applications:
o Object tracking, navigation, and GPS/inertial sensor fusion.
o Smoothing and estimation from noisy time-series measurements.
Particle Filter:
The particle filter, or Sequential Monte Carlo (SMC) method, is used for non-linear,
non-Gaussian systems.
It represents the posterior distribution of the state using a set of random samples
(particles) and weights.
Components
1. Particles:
o A set of samples representing possible states.
2. Weights:
o Importance weights for each particle, representing the likelihood given the
observations.
Algorithm:
1. Initialization:
o Generate an initial set of particles from the prior distribution.
o Initialize weights
2. Prediction:
o Propagate particles according to the state transition model
3. Update:
o Update weights based on the measurement likelihood
o Normalize weights
4. Resampling:
o Resample particles based on their weights to avoid degeneracy.
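A minimal bootstrap particle filter sketch for an assumed one-dimensional random-walk state observed with Gaussian noise, following the four steps above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 1000

# Illustrative model: the state follows a random walk and is observed with noise.
process_std, obs_std = 0.5, 1.0

# 1. Initialization: draw particles from the prior and give them equal weights.
particles = rng.normal(0.0, 1.0, size=n_particles)
weights = np.full(n_particles, 1.0 / n_particles)

def particle_filter_step(particles, weights, z):
    # 2. Prediction: propagate each particle through the state transition model.
    particles = particles + rng.normal(0.0, process_std, size=particles.shape)
    # 3. Update: weight each particle by the likelihood of the measurement z.
    likelihood = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()                       # normalize weights
    # 4. Resampling: resample particles in proportion to their weights to avoid
    #    degeneracy (most weight concentrating on a few particles).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Track a state drifting toward 5.0 from noisy observations.
for z in [1.0, 2.0, 3.0, 4.0, 5.0]:
    particles, weights = particle_filter_step(particles, weights, z)
print("state estimate:", np.average(particles, weights=weights))
```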
Applications:
o Visual object tracking and robot localization (Monte Carlo localization).
o Tracking problems with non-linear dynamics or non-Gaussian noise, where the Kalman filter's assumptions do not hold.
Comparison:
Kalman Filter:
o Assumes linear dynamics and Gaussian noise.
o Computationally efficient.
o Optimal for linear systems.
Particle Filter:
o Handles non-linear and non-Gaussian systems.
o More computationally intensive.
o Provides a flexible framework for complex systems.
*****