ML unit-5
UNIT-V
Reinforcement Learning; Overview of reinforcement learning, Getting Lost Example.
Markov Chain Monte Carlo Methods: Sampling, Proposal Distribution, Markov Chain Monte
Carlo.
Graphical Models: Bayesian Networks, Markov Random Fields, Hidden Markov Models,
Tracking Methods.
Reinforcement Learning:
Reinforcement learning (RL) is a type of machine learning where an agent learns to
make decisions by interacting with an environment.
The agent receives feedback in the form of rewards or penalties based on the actions it
takes, and its goal is to maximize the cumulative reward over time.
Key Components of Reinforcement Learning:
1. Agent:
o The learner or decision-maker that interacts with the environment.
2. Environment:
o The external system the agent interacts with. It provides feedback based on the
agent's actions.
3. State:
o A representation of the current situation of the environment. The agent
perceives the environment through states.
4. Action:
o The set of all possible moves the agent can make in the environment.
5. Reward:
o Feedback from the environment based on the agent's actions. Positive rewards
incentivize desirable actions, while negative rewards (or penalties) discourage
undesirable actions.
6. Policy:
o A strategy used by the agent to determine the next action based on the current
state. It can be deterministic or stochastic.
7. Value Function:
o A function that estimates the expected cumulative reward of states or state-
action pairs, helping the agent to make decisions that maximize long-term
rewards.
Exploration vs. Exploitation:
1. Exploration:
o The agent tries out different actions to discover their effects and gather
information about the environment.
2. Exploitation:
o The agent uses its knowledge to choose actions that it believes will maximize
the reward.
3. Balance:
o Effective RL requires balancing exploration and exploitation to ensure the
agent learns the optimal policy.
Common RL Algorithms:
1. Q-Learning:
o A model-free algorithm where the agent learns a value function Q(s,a), which
represents the expected utility of taking action a in state s and following the
optimal policy thereafter.
2. SARSA (State-Action-Reward-State-Action):
o Similar to Q-Learning, but updates the Q-value based on the action actually
taken, considering the policy followed by the agent.
Getting Lost Example (Robot Navigating a Maze):
1. Environment:
o The maze consists of a grid with walls, open spaces, and an exit.
o The robot starts at a random position and must find the exit.
2. State:
o The current position of the robot in the maze, represented by coordinates (x,
y).
3. Actions:
o The robot can move up, down, left, or right.
4. Rewards:
o Positive reward for reaching the exit.
o Negative reward for hitting a wall.
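To make the Getting Lost example concrete, the sketch below shows tabular Q-learning on a small grid maze. The maze layout, reward values, and hyperparameters (learning rate, discount factor, exploration rate) are illustrative assumptions, not values prescribed by these notes.

```python
import random

# Illustrative 4x4 maze: 0 = open cell, 1 = wall; the exit is at (3, 3).
MAZE = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
EXIT = (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nx, ny = state[0] + action[0], state[1] + action[1]
    if not (0 <= nx < 4 and 0 <= ny < 4) or MAZE[nx][ny] == 1:
        return state, -1.0, False          # hit a wall or the boundary
    if (nx, ny) == EXIT:
        return (nx, ny), 10.0, True        # positive reward for reaching the exit
    return (nx, ny), -0.1, False           # small step cost encourages short paths

Q = {}                                     # Q[(state, action_index)] -> estimated return
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # assumed learning rate, discount, exploration rate

for episode in range(500):
    state, done = (0, 0), False
    for _ in range(200):                   # cap episode length
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((state, i), 0.0))
        next_state, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(Q.get((next_state, i), 0.0) for i in range(len(ACTIONS)))
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
        if done:
            break
```

After training, repeatedly choosing the greedy action with the highest Q-value in each cell traces a path from the start position to the exit.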
Reinforcement learning is a powerful approach to building intelligent systems that can adapt
and improve through experience, opening up possibilities across a wide range of applications.
Markov Chain Monte Carlo (MCMC) Methods:
MCMC methods draw samples from complex probability distributions by constructing a Markov chain whose stationary distribution is the target distribution. The name combines two ideas:
1. Markov Chain:
A sequence of random variables where the next state depends only on the current state
(the Markov property).
The chain has a stationary distribution that it converges to over time.
2. Monte Carlo:
A method that relies on repeated random sampling to estimate numerical quantities, such as expectations and integrals, that are difficult to compute exactly.
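A short sketch, using an assumed 3-state transition matrix, illustrating the claim that a Markov chain converges to a stationary distribution:

```python
import numpy as np

# A small 3-state Markov chain. Row i gives the probabilities of moving
# from state i to each state, so each row sums to 1. (Illustrative values.)
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Start from an arbitrary distribution and repeatedly apply the transition
# matrix; the distribution converges to the chain's stationary distribution.
dist = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    dist = dist @ P

print("Distribution after 100 steps:", dist)
# The stationary distribution pi satisfies pi = pi P; check the fixed point.
print("One more step changes it by:", np.abs(dist @ P - dist).max())
```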
How MCMC Works:
1. Initialization:
o Start with an initial state (or set of states) from the target distribution.
2. Iteration:
o Propose a new state based on a proposal distribution.
o Accept or reject the new state based on an acceptance criterion (e.g., the Metropolis-Hastings acceptance rule).
3. Convergence:
o After many iterations, the distribution of the states will approximate the target
distribution.
Common Algorithms
1. Metropolis-Hastings Algorithm:
o Proposes new states and accepts or rejects them based on the acceptance ratio.
o Widely used for its simplicity and flexibility.
2. Gibbs Sampling:
o Samples each variable in turn, conditional on the current values of the other variables.
o Useful when the conditional distributions are easier to sample from.
Applications of MCMC
1. Bayesian Inference:
o Estimating posterior distributions of parameters when the likelihood and prior
are known.
o Useful for hierarchical models and complex data structures.
2. Statistical Physics:
o Simulating systems with many interacting components, such as spin models (e.g., the Ising model).
3. Machine Learning:
o Approximate inference in probabilistic models, such as topic models and Bayesian neural networks, where exact computation is intractable.
Sampling:
Sampling is a technique used to select a subset of data from a larger population, allowing for
the analysis and inference of population characteristics without examining the entire dataset.
Types of Sampling
1. Probability Sampling:
o Description: Every member of the population has a known, non-zero chance
of being selected.
o Examples:
Simple Random Sampling: Every member of the population has an
equal chance of being selected.
Systematic Sampling: Selects every k-th member from a list after a
random start.
Stratified Sampling: Divides the population into strata (groups) and
samples from each stratum.
Cluster Sampling: Divides the population into clusters and randomly
selects entire clusters.
2. Non-Probability Sampling:
o Description: Not every member of the population has a known or equal
chance of being selected.
o Examples:
Convenience Sampling: Samples are selected based on their
availability or ease of access.
Judgmental (Purposive) Sampling: Samples are selected based on
the researcher’s judgment.
Quota Sampling: Ensures representation by selecting samples to meet
certain quotas.
Snowball Sampling: Current subjects recruit future subjects from their
acquaintances.
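The sketch below illustrates simple random, systematic, and stratified sampling on a hypothetical population; the column names, strata, and sample sizes are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical population: 1,000 people in three strata (regions).
population = pd.DataFrame({
    "id": np.arange(1000),
    "region": rng.choice(["north", "south", "west"], size=1000),
})

# Simple random sampling: every member has an equal chance of selection.
simple = population.sample(n=50, random_state=0)

# Systematic sampling: every k-th member after a random start.
k = len(population) // 50
start = rng.integers(0, k)
systematic = population.iloc[start::k]

# Stratified sampling: sample a fixed fraction from each region (stratum).
stratified = population.groupby("region", group_keys=False).apply(
    lambda g: g.sample(frac=0.05, random_state=0)
)

print(len(simple), len(systematic), len(stratified))
```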
Proposal Distribution:
A proposal distribution is a fundamental component in Markov Chain Monte Carlo
(MCMC) methods.
It is used to generate new candidate samples from a target probability distribution,
especially when direct sampling is not feasible.
A proposal distribution, denoted as q(x′∣x), is a probability distribution used to propose
new candidate states x' given the current state x.
The new candidate state is then accepted or rejected based on a criterion designed to
ensure that the sequence of samples converges to the target distribution π(x).
Markov Chain Monte Carlo Algorithms:
Metropolis-Hastings Algorithm:
Description: Proposes a candidate state x′ from a proposal distribution q(x′∣x) and accepts it with probability min(1, π(x′)q(x∣x′) / (π(x)q(x′∣x))); if the candidate is rejected, the chain stays at the current state.
Process:
1. Initialize the chain at some state x.
2. Propose a candidate x′ from q(x′∣x).
3. Accept x′ with the acceptance probability above; otherwise keep x.
4. Repeat until enough samples have been collected.
Use Case: Widely applicable and flexible for various target distributions.
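A minimal sketch of the Metropolis-Hastings algorithm with a symmetric Gaussian random-walk proposal; the target density (a standard normal) and the proposal scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):
    """Unnormalized target density pi(x): a standard normal here (illustrative)."""
    return np.exp(-0.5 * x**2)

def metropolis_hastings(n_samples=10000, proposal_scale=1.0):
    samples = np.empty(n_samples)
    x = 0.0                                   # initial state
    for i in range(n_samples):
        # Symmetric random-walk proposal q(x'|x) = Normal(x, proposal_scale^2),
        # so the q terms cancel in the acceptance ratio.
        x_new = x + proposal_scale * rng.standard_normal()
        accept_prob = min(1.0, target_pdf(x_new) / target_pdf(x))
        if rng.random() < accept_prob:
            x = x_new                         # accept the proposed state
        samples[i] = x                        # rejected proposals repeat the current state
    return samples

samples = metropolis_hastings()
print("sample mean ~ 0:", samples.mean(), "sample std ~ 1:", samples.std())
```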
Gibbs Sampling:
Description: Samples each variable in turn from its conditional distribution given the
current values of the other variables.
Process:
1. Initialize all variables.
2. Sample each variable xi from p(xi∣other variables).
3. Repeat until convergence.
Use Case: Effective when conditional distributions are easier to sample from.
Example: Ideal for Bayesian networks and hierarchical models.
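A minimal Gibbs sampling sketch for an assumed bivariate normal target with correlation ρ, where each full conditional is itself a normal distribution that is easy to sample from:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8          # correlation of an illustrative bivariate normal target
n_samples = 5000

x, y = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    # For a standard bivariate normal with correlation rho, each conditional
    # is itself normal: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[i] = (x, y)

print("empirical correlation ~ 0.8:", np.corrcoef(samples.T)[0, 1])
```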
Graphical Models:
Graphical models are a powerful framework for representing complex dependencies
among variables in a visual and mathematical way.
Bayesian Networks:
Bayesian Networks (BNs) are a type of probabilistic graphical model that uses directed
acyclic graphs (DAGs) to represent a set of variables and their conditional
dependencies.
They are particularly powerful for modeling complex systems where understanding the
relationships between variables is crucial.
Joint Probability:
Joint probability is a probability of two or more events happening together. For
example, the joint probability of two events A and B is the probability that both events
occur, P(A∩B).
P(A ∩ B) = P(A) · P(B) (only when A and B are independent)
P(A ∩ B) = P(A | B) · P(B) (in general)
Conditional Probability:
Conditional probability defines the probability that event B will occur, given that event A has already occurred: P(B | A) = P(A ∩ B) / P(A), provided P(A) > 0.
Example:
A house alarm can be triggered by a burglary or by a fire, and two people, P1 and P2, may report hearing the alarm. The network has the following nodes:
Burglary ‘B’ – whether a burglary occurs (a parent of the alarm node).
Fire ‘F’ – whether a fire breaks out (a parent of the alarm node).
Alarm ‘A’ – whether the alarm rings; its probability depends on both B and F, as given in the following table.
B F P (A=T) P (A=F)
T T 0.95 0.05
T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999
Person ‘P1’ – whether P1 reports hearing the alarm; depends only on A.
A P (P1=T) P (P1=F)
T 0.95 0.05
F 0.05 0.95
Person ‘P2’ – whether P2 reports hearing the alarm; depends only on A.
A P (P2=T) P (P2=F)
T 0.80 0.20
F 0.01 0.99
Using the network's factorization of the joint distribution, the probability that both P1 and P2 report the alarm, the alarm rings, and neither a burglary nor a fire has occurred is
P(P1, P2, A, ¬B, ¬F) = P(P1 | A) · P(P2 | A) · P(A | ¬B, ¬F) · P(¬B) · P(¬F) = 0.00075
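The factorization above can be evaluated mechanically from the conditional probability tables. In the sketch below, the tables for A, P1, and P2 are taken from this example, while the prior probabilities of B and F are placeholder values (they are not reproduced in these notes), so the printed number is illustrative rather than the 0.00075 quoted above.

```python
# Conditional probability tables from the example above.
# P(A=T | B, F)
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
p_p1_given_a = {True: 0.95, False: 0.05}   # P(P1=T | A)
p_p2_given_a = {True: 0.80, False: 0.01}   # P(P2=T | A)

# Prior probabilities of burglary and fire are NOT given in these notes;
# the values below are placeholders used only for illustration.
p_b, p_f = 0.001, 0.002

def joint(p1, p2, a, b, f):
    """P(P1=p1, P2=p2, A=a, B=b, F=f) via the network's factorization."""
    pb = p_b if b else 1 - p_b
    pf = p_f if f else 1 - p_f
    pa = p_alarm[(b, f)] if a else 1 - p_alarm[(b, f)]
    pp1 = p_p1_given_a[a] if p1 else 1 - p_p1_given_a[a]
    pp2 = p_p2_given_a[a] if p2 else 1 - p_p2_given_a[a]
    return pb * pf * pa * pp1 * pp2

# Probability both people report, the alarm rings, and there is no burglary or fire.
print(joint(p1=True, p2=True, a=True, b=False, f=False))
```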
Applications
1. Medical diagnosis and decision support under uncertainty.
2. Fault diagnosis and risk assessment.
3. Spam filtering and document classification.
Markov Random Fields:
Markov Random Fields (MRFs) are probabilistic graphical models that use undirected graphs to represent dependencies among random variables.
1. Nodes (Vertices):
o Each node represents a random variable.
o Nodes can represent observed data, hidden variables, or any entities in the
model.
2. Edges (Links):
o Undirected edges between nodes indicate direct dependencies.
o Unlike Bayesian Networks, MRFs use undirected edges to capture the
symmetrical nature of relationships.
3. Clique Potentials (Factors):
o Potential functions are associated with cliques (fully connected subgraphs) of
the graph.
o They represent the local dependencies among the variables in a clique.
o These potential functions are often denoted as ψ_C(x_C), where C is a clique and x_C are the variables in that clique.
Applications
1. Image segmentation and denoising, where neighbouring pixels are modeled as directly dependent.
2. Texture modeling and other spatial statistics problems.
Hidden Markov Models:
A Hidden Markov Model (HMM) is a statistical model for sequential data in which the system is assumed to be a Markov process whose states cannot be observed directly.
The hidden states are the underlying variables that generate the observed data, but
they are not directly observable.
The observations are the variables that are measured and observed.
The Hidden Markov Model (HMM) describes the relationship between the hidden states and the observations using two sets of probabilities: the transition probabilities and the emission probabilities.
The transition probabilities describe the probability of transitioning from one hidden
state to another.
The state space is the set of all possible hidden states, and the observation space is the set of
all possible observations.
Transition matrix: the probabilities of transitioning from one hidden state to another, collected into a matrix.
Emission matrix: the probabilities of generating each observation from each hidden state, collected into a matrix.
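As a concrete illustration, the sketch below defines a tiny HMM with two hypothetical hidden weather states and two possible observations; all probability values are assumed for illustration.

```python
import numpy as np

states = ["Rainy", "Sunny"]                  # hidden states (illustrative)
observations = ["umbrella", "no umbrella"]   # observation space (illustrative)

# Transition matrix A: A[i, j] = P(next state = j | current state = i).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Emission matrix B: B[i, k] = P(observation = k | state = i).
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Initial state distribution.
pi = np.array([0.5, 0.5])

# Rows of A and B are probability distributions, so each row sums to 1.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```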
The transition probabilities and the observation (emission) likelihoods are estimated using the Baum-Welch algorithm, an expectation-maximization procedure built on the forward-backward algorithm. The parameters are updated iteratively until convergence.
Given the observed data, the Viterbi algorithm is used to compute the most likely sequence
of hidden states. This can be used to predict future observations, classify sequences, or detect
patterns in sequential data.
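A minimal sketch of the Viterbi algorithm, reusing the illustrative transition matrix, emission matrix, and initial distribution from the HMM sketch above:

```python
import numpy as np

def viterbi(obs_seq, A, B, pi):
    """Return the most likely hidden-state sequence for obs_seq (observation indices)."""
    n_states, T = A.shape[0], len(obs_seq)
    # delta[t, i]: highest probability of any state path ending in state i at time t.
    delta = np.zeros((T, n_states))
    backptr = np.zeros((T, n_states), dtype=int)
    delta[0] = pi * B[:, obs_seq[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            backptr[t, j] = np.argmax(scores)
            delta[t, j] = scores[backptr[t, j]] * B[j, obs_seq[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(backptr[t, path[0]]))
    return path

# Illustrative matrices (same as the HMM sketch above); decode three observations
# (0 = "umbrella", 1 = "no umbrella").
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi([0, 0, 1], A, B, pi))   # -> [0, 0, 1], i.e. Rainy, Rainy, Sunny
```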
The performance of the HMM can be evaluated using various metrics, such as accuracy,
precision, recall, or F1 score.
Tracking Methods:
Tracking methods in machine learning, often referred to as object tracking, involve
techniques used to locate and follow an object's position over time in a sequence of
frames or images.
These methods have applications in various fields, including computer vision, robotics,
surveillance, and augmented reality.
Kalman Filter:
The Kalman filter is an optimal estimator for linear systems with Gaussian noise.
It provides a recursive solution to the linear quadratic estimation problem, efficiently
processing noisy measurements to produce an estimate of the system's state.
Components:
1. State estimate: the current estimate of the system's state (e.g., position and velocity).
2. Error covariance: the uncertainty associated with the state estimate.
3. State transition and measurement models: linear models describing how the state evolves and how measurements relate to the state.
4. Process and measurement noise covariances: the Gaussian noise assumed in the dynamics and in the measurements.
Algorithm:
1. Prediction:
o Predict the next state
o Predict the error covariance
2. Update:
o Compute the Kalman gain
o Update the state estimate
o Update the error covariance
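A minimal sketch of the prediction and update steps for a one-dimensional constant-velocity Kalman filter; the models, noise covariances, and measurements are illustrative assumptions.

```python
import numpy as np

# One-dimensional constant-velocity model (illustrative values throughout).
dt = 1.0
F = np.array([[1, dt], [0, 1]])     # state transition: state = [position, velocity]
H = np.array([[1, 0]])              # we measure position only
Q = 0.01 * np.eye(2)                # process noise covariance
R = np.array([[1.0]])               # measurement noise covariance

x = np.array([[0.0], [1.0]])        # initial state estimate
P = np.eye(2)                       # initial error covariance

def kalman_step(x, P, z):
    # Prediction: project the state and error covariance forward.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: compute the Kalman gain and correct with the measurement z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Noisy position measurements of an object moving at roughly 1 unit per step.
rng = np.random.default_rng(0)
for t in range(1, 11):
    z = np.array([[t * 1.0 + rng.normal(0, 1.0)]])
    x, P = kalman_step(x, P, z)
print("estimated position and velocity:", x.ravel())
```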
Applications:
o Object tracking, navigation, and GPS/inertial sensor fusion.
o Smoothing and estimation from noisy time-series measurements.
Particle Filter:
The particle filter, or Sequential Monte Carlo (SMC) method, is used for non-linear,
non-Gaussian systems.
It represents the posterior distribution of the state using a set of random samples
(particles) and weights.
Components
1. Particles:
o A set of samples representing possible states.
2. Weights:
o Importance weights for each particle, representing the likelihood given the
observations.
Algorithm:
1. Initialization:
o Generate an initial set of particles from the prior distribution.
o Initialize weights
2. Prediction:
o Propagate particles according to the state transition model
3. Update:
o Update weights based on the measurement likelihood
o Normalize weights
4. Resampling:
o Resample particles based on their weights to avoid degeneracy.
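A minimal bootstrap particle filter sketch for an assumed one-dimensional random-walk state observed with Gaussian noise, following the four steps above:

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 1000

# Illustrative model: the state follows a random walk and is observed with noise.
process_std, obs_std = 0.5, 1.0

# 1. Initialization: draw particles from the prior and give them equal weights.
particles = rng.normal(0.0, 1.0, size=n_particles)
weights = np.full(n_particles, 1.0 / n_particles)

def particle_filter_step(particles, weights, z):
    # 2. Prediction: propagate each particle through the state transition model.
    particles = particles + rng.normal(0.0, process_std, size=particles.shape)
    # 3. Update: weight each particle by the likelihood of the measurement z.
    likelihood = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()                       # normalize weights
    # 4. Resampling: resample particles in proportion to their weights to avoid
    #    degeneracy (most weight concentrating on a few particles).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Track a state drifting toward 5.0 from noisy observations.
for z in [1.0, 2.0, 3.0, 4.0, 5.0]:
    particles, weights = particle_filter_step(particles, weights, z)
print("state estimate:", np.average(particles, weights=weights))
```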
Applications:
o Visual object tracking and robot localization (Monte Carlo localization).
o Tracking problems with non-linear dynamics or non-Gaussian noise, where the Kalman filter's assumptions do not hold.
Comparison:
Kalman Filter:
o Assumes linear dynamics and Gaussian noise.
o Computationally efficient.
o Optimal for linear systems.
Particle Filter:
o Handles non-linear and non-Gaussian systems.
o More computationally intensive.
o Provides a flexible framework for complex systems.
*****