Unit 1 Introduction to ML
by
Prof. Aishwarya D S
Course Outcomes
Machine learning (ML) is a type of artificial intelligence (AI) that allows software
applications to “self-learn” from training data and improve over time, to
become more accurate at predicting outcomes without being explicitly programmed
to do so.
Machine learning algorithms use historical data as input to predict new output values.
Machine learning algorithms are able to detect patterns in data and learn
from them, in order to make their own predictions
For example, if you search for an item on Amazon, then the next time, without your asking, items matching your choice will be listed.
Deep Learning
◻ It is a subset of ML that mimics the human brain.
Machine learning is a branch of AI that uses historical data to learn the hidden patterns that already exist in the data and to generate insights useful for solving business problems. The beauty of machine learning lies in the fact that it can learn these patterns without being explicitly programmed, and it keeps improving with experience.
The machine learning algorithm is fed training samples to achieve that goal. The algorithm has an objective function that it wants to optimize. The algorithm is trained repeatedly until it satisfies the objective function to a certain level. After the training process ends, the model is tested against unseen data, called test data, to generate insights.
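As a sketch of this train-then-test workflow (all data and the simple threshold "model" here are invented for illustration):

```python
# All data and the threshold "model" below are invented for illustration.
def train_threshold(samples):
    """'Train' a one-feature classifier: place the decision threshold midway
    between the mean feature values of the two labelled classes."""
    class0 = [x for x, y in samples if y == 0]
    class1 = [x for x, y in samples if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0

# Labelled training samples: (feature value, class label).
data = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1),
        (1.2, 0), (6.8, 1)]
train, test = data[:6], data[6:]        # hold out unseen test data

t = train_threshold(train)              # training phase
accuracy = sum(predict(t, x) == y for x, y in test) / len(test)
print(t, accuracy)                      # 4.0 1.0
```

The held-out test set plays the role of the "unseen data" mentioned above: the model never sees it during training, so test accuracy estimates real-world performance.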
The Role of Objective Functions in AI
In AI, objective functions are instrumental in driving the learning process of machine learning and deep learning models. They provide the framework for assessing and optimizing model performance, thereby enabling the models to converge towards desired outcomes during training.
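A minimal illustration of an objective function driving training (mean squared error minimized by gradient descent; the data, learning rate, and iteration count are invented):

```python
# Toy objective: mean squared error for a one-parameter model y = w * x.
# Data, learning rate and iteration count are invented for illustration.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                    # generated by the "true" w = 2

def mse(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, lr = 0.0, 0.05
for _ in range(200):                    # train repeatedly until the objective is low
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad                      # step downhill on the objective

print(round(w, 3))                      # converges to 2.0
```

The objective (here, MSE) is what "converging towards desired outcomes" means concretely: each update moves the parameter in the direction that lowers the objective.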
Types of Machine Learning:
◻ Supervised Learning
◻ Unsupervised Learning
◻ Semi-supervised Learning
◻ Reinforcement Learning
• Classification is the task of assigning a class label to an input pattern. The
class label indicates one of a given set of classes. The classification is carried
out with the help of a model obtained using a learning procedure. According
to the type of learning used, there are two categories of classification:
supervised learning and unsupervised learning.
• Supervised learning makes use of a set of examples which already have the
class labels assigned to them.
• Unsupervised learning attempts to find inherent structures in the data.
• Semi-supervised learning makes use of a small number of labeled data and a
large number of unlabeled data to learn the classifier.
1. Supervised Learning
Below are some popular Regression algorithms which come under supervised
learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
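For instance, simple linear regression from the list above can be fitted in closed form (a sketch on invented data):

```python
# Ordinary least-squares simple linear regression, fitted in closed form.
# The data points are invented (they lie exactly on y = 2x + 1).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
# slope = covariance(x, y) / variance(x); intercept from the means
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
print(slope, intercept)                 # 2.0 1.0
```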
Classification
• Grouped based on the behavior of the data.
• Used when we want to group our dataset on the basis of inherent similarities in the data.
Supervised Learning vs Unsupervised Learning:
• Supervised learning algorithms are trained using labeled data; unsupervised learning algorithms are trained using unlabeled data.
• A supervised learning model takes direct feedback to check whether it is predicting the correct output; an unsupervised learning model does not take any feedback.
• A supervised learning model predicts the output; an unsupervised learning model finds the hidden patterns in data.
• In supervised learning, input data is provided to the model along with the output; in unsupervised learning, only input data is provided to the model.
• The goal of supervised learning is to train the model so that it can predict the output when it is given new data; the goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
• Supervised learning needs supervision to train the model; unsupervised learning does not need any supervision.
Stock Market Prediction: financial news and charts as input; predicted market ups and downs as output.
Image Processing Example
• Sorting Fish: incoming fish are sorted
according to species using optical sensing
(sea bass or salmon?)
• Problem Analysis:
▪ set up a camera and take some sample images to extract features
▪ Consider features such as length, lightness,
width, number and shape of fins, position of
mouth, etc.
Preprocessing
Examples:
• Noise removal
• Image enhancement
• Separate touching
or occluding fish
• Extract boundary of each
fish
Feature Extraction
Histogram of “length”, with decision threshold l*
• Even though sea bass is longer than salmon on average, there are many examples of fish for which this observation does not hold.
Add Another Feature
• If the feature space cannot be perfectly separated by a
straight line, a more complex boundary might be used.
(non-linear)
• Alternatively, a simple decision boundary such as a straight line might be used even if it does not perfectly separate the classes, provided that the error rates are acceptably low.
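A straight-line decision boundary of this kind can be sketched as follows (the weights, offset, and class assignments below are invented for illustration):

```python
# Sketch: a straight-line decision boundary w1*x1 + w2*x2 + b = 0 over two
# features (say lightness and width); the weights and labels are invented.
def classify(x1, x2, w=(1.0, -1.0), b=0.5):
    score = w[0] * x1 + w[1] * x2 + b
    return "salmon" if score > 0 else "sea bass"

print(classify(2.0, 1.0))   # one side of the line -> 'salmon'
print(classify(1.0, 3.0))   # other side of the line -> 'sea bass'
```

Replacing the linear score with a non-linear function of the features gives the more complex (non-linear) boundaries mentioned above.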
Hyperplanes and Hypersurfaces
License Plate Recognition
Biometric Recognition
Face Detection/Recognition
Detection
Matching
Recognition
Fingerprint Classification
Autonomous Systems
Medical Applications
Land Cover Classification
(using aerial or satellite images)
Statistical Decision Theory
◻ Decision theory, in statistics, is a set of quantitative methods for reaching optimal decisions.
Example for Statistical Decision Theory
• For example, in the 9th game the home team, on average, scored 10.8 fewer points in previous games than the visiting team, and the home team lost.
• When the teams have about the same average points per game (apg), the outcome is less certain. For example, in the 10th game the home team, on average, scored 0.4 fewer points than the visiting team, but the home team won the match.
• Similarly, in the 12th game, the home team had an apg 1.1 less than the visiting team, and the team lost.
Histogram of dapg (won vs. lost games)
Prediction
• Each sample has a corresponding feature vector (dapg, dwp), which determines its position in the plot.
• Note that the feature space can be divided into two decision regions by a straight line, called a linear decision boundary (refer to the line equation). Logistic regression is one way to learn such a line.
• If the sample lies above the decision boundary, the home team is classified as the winner; if it lies below the decision boundary, it is classified as the loser.
Decision region and Decision Boundary
• Since the point (dapg, dwp) = (-4.6,-36.7) lies below the decision
boundary, we predict that the home team will lose the game.
◻ If the feature space cannot be perfectly separated by a straight line,
a more complex boundary might be used. (non-linear)
We are unable to predict the outcome of a single roll, but in the long run one can determine that each outcome will occur 1/6 of the time.
Use symmetry: each side is the same, so one side should not occur more frequently than another in the long run. If the die is not balanced, this may not be true.
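The long-run claim above can be illustrated with a quick simulation (a sketch; the seed and number of rolls are arbitrary choices):

```python
import random

# Sketch: simulate many rolls of a fair die; each face should come up
# roughly 1/6 of the time in the long run. Seed and roll count are arbitrary.
rng = random.Random(0)                      # fixed seed for reproducibility
rolls = [rng.randint(1, 6) for _ in range(60_000)]
for face in range(1, 7):
    freq = rolls.count(face) / len(rolls)
    print(face, round(freq, 3))             # each frequency is close to 0.167
```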
Example
The number of arrangements in which 4 women can sit in 4 places = 4P4 = 4!/(4 – 4)! =
4!/0! = 24/1 = 24
That means the number of ways they can be seated = 5P5 = 5!/(5 – 5)! = 5!/0! = 120/1 =
120
The order of the choice is not important!
Solution:
Combinations of a four-member team with at least one boy are:
{(BGGG), (BBGG), (BBBG), (BBBB)}
Number of ways one boy and three girls can be selected = 6C1 × 4C3 = 6 × 4 = 24
Number of ways two boys and two girls can be selected = 6C2 × 4C2 = 15 × 6 = 90
Number of ways three boys and one girl can be selected = 6C3 × 4C1 = 20 × 4 = 80
Number of ways four boys can be selected = 6C4 × 4C0 = 15 × 1 = 15
Total number of ways = 24 + 90 + 80 + 15 = 209
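These counts can be verified with Python's math.comb (assuming, as the nCr values above imply, 6 boys and 4 girls):

```python
from math import comb

# Verify the counts above (6 boys, 4 girls, team of 4, at least one boy).
cases = {(b, 4 - b): comb(6, b) * comb(4, 4 - b) for b in range(1, 5)}
print(cases)                 # {(1, 3): 24, (2, 2): 90, (3, 1): 80, (4, 0): 15}
print(sum(cases.values()))   # 209
```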
Special Events
Union (A ∪ B)
The event A ∪ B occurs if the event A occurs, or the event B occurs, or both occur.
Intersection (A ∩ B)
The event A ∩ B occurs if the event A occurs and the event B occurs.
Complement (A′)
The event A′ occurs if the event A does not occur.
Mutually Exclusive
If two events A and B are mutually exclusive, then A ∩ B = φ and P(A ∪ B) = P(A) + P(B).
RULES OF PROBABILITY
ADDITIVE RULE (general case): P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B) = P(A) + P(B) if A ∩ B = φ (A and B mutually exclusive)
RULE FOR COMPLEMENTS: P(A′) = 1 − P(A)
Example:
Bangalore and Mohali are two of the cities competing for the National
university games. (There are also many others).
There is a 35% chance that Mohali will be amongst the final 5 and
an 8% chance that both Bangalore and Mohali will be amongst the final 5.
What is the probability that Bangalore or Mohali will be amongst the final 5?
Solution:
Let A = the event that Bangalore is amongst the final 5.
Let B = the event that Mohali is amongst the final 5.
Then P(B) = 0.35 and P(A ∩ B) = 0.08, and by the additive rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Also, there is one card that is both a spade and an ace, so the probability of a single card being both a spade and an ace = 1/52. Hence P(Spade or Ace) = 13/52 + 4/52 − 1/52 = 16/52 = 4/13.
Complement
Let A be any event; then the complement of A (denoted by A′) is defined by:
The event A′ occurs if the event A does not occur.
Logic: P(A) + P(A′) = 1, so P(A′) = 1 − P(A).
What Is Conditional Probability?
If we are told that event B has occurred, then the sample space is restricted to B. The event A can now only occur if the outcome is in A ∩ B. Hence the new probability of A is:
P(A | B) = P(A ∩ B) / P(B)
An Example
We can obtain the probability of rain given high pressure, directly from the
data.
P(L) =P(R and T and L)+P(R and Tc and L) + P(Rc and T and L) +
P(Rc and Tc and L)
=1/12+1/24+1/24+1/16
=11/48.
c. We can find P(R|L) using
P(R|L)=P(R∩L)/P(L)
We have already found P(L)=11/48 and we can find P(R∩L) similarly
by adding the probabilities of the outcomes that belong to R∩L.
In particular,
P(R∩L) =P(R,T,L)+P(R,Tc,L)
=1/12+1/24
=1/8
Thus we obtain
P(R|L) =P(R∩L)/P(L)
=(1/8)/(11/48)
=6/11.
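The same arithmetic can be checked with exact fractions:

```python
from fractions import Fraction as F

# Re-derive the worked example above with exact fractions.
P_L = F(1, 12) + F(1, 24) + F(1, 24) + F(1, 16)   # P(L)
P_RL = F(1, 12) + F(1, 24)                        # P(R ∩ L)
print(P_L, P_RL, P_RL / P_L)                      # 11/48 1/8 6/11
```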
Random Variables
Example:
• The distribution function for the number of heads from two flips of a coin.
• The random variable k is defined to be the total number of heads that occur when a fair coin is flipped two times.
• This random variable can take only 3 values (0, 1, 2), so it is discrete.
• The sample space is {(T, T), (T, H), (H, T), (H, H)}, giving the distribution:
k P(k)
0 1/4
1 2/4
2 1/4
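The table above can be reproduced by enumerating the sample space:

```python
from itertools import product
from collections import Counter

# Enumerate the sample space of two coin flips to reproduce the pmf table.
outcomes = list(product("HT", repeat=2))      # (H,H), (H,T), (T,H), (T,T)
counts = Counter(o.count("H") for o in outcomes)
pmf = {k: counts[k] / len(outcomes) for k in sorted(counts)}
print(pmf)                                    # {0: 0.25, 1: 0.5, 2: 0.25}
```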
• Two types of random variables
–Discrete random variables (countable set of possible
outcomes)
–Continuous random variable (unbroken chain of
possible outcomes)
• Discrete random variables are understood in terms of their probability mass function (pmf).
• pmf ≡ a mathematical function that assigns probabilities to all possible outcomes for a discrete random variable.
Continuous random variables
◻ If the random variable's values lie between two certain fixed numbers, then it is called a continuous random variable. The range can be finite or infinite.
◻ If X is the random variable and its values lie between a and b, then P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx, where f(x) is its probability density function.
Assumptions:
● A random experiment is performed repeatedly with a fixed and finite number of trials. The number is denoted by ‘n’.
● There are two mutually exclusive possible outcomes on each trial, known as “Success” and “Failure”.
● The probability of success is denoted by ‘p’ and of failure by ‘q’, and p + q = 1, i.e. q = 1 − p.
● The outcome of any given trial does not affect the outcomes of the subsequent trials. That means all trials are independent.
● The probabilities of success and failure (p and q) remain constant for all trials. If they do not remain constant, it is not a binomial distribution.
● Example: tossing a fair coin n times and counting the number of heads.
Binomial Probability Distribution
P(x) = nCx · pˣ · (1 − p)ⁿ⁻ˣ
where
n = number of trials
x = number of successes out of n trials
p = probability of success
1 − p = probability of failure
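As a quick check, the binomial formula can be evaluated directly (a sketch using Python's math.comb; the fair-coin numbers are illustrative):

```python
from math import comb

# Evaluate the binomial formula P(x) = nCx * p**x * (1-p)**(n-x).
def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# e.g. probability of exactly 2 successes (heads) in 4 fair-coin tosses:
print(binom_pmf(2, 4, 0.5))                          # 0.375
# the pmf sums to 1 over x = 0 .. n:
print(sum(binom_pmf(x, 4, 0.5) for x in range(5)))   # 1.0
```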
◻ When the random variable of interest can take any value in an interval, it is called
continuous random variable.
? Every continuous random variable has an infinite, uncountable number of possible
values (i.e., any value in an interval).
• Examples Temperature on a given day, Length, height, intensity of light falling on a given
region.
◻ The length of time it takes a truck driver to go from New York City to Miami.
◻ The depth of drilling to find oil.
◻ The weight of a truck in a truck-weighing station.
◻ The amount of water in a 12-ounce bottle.
For each of these, if the variable is X, then x>0 and less than some maximum value possible,
but it can take on any value within this range.
Difference between discrete and continuous values?
Continuous Uniform Distribution
◻ For Uniform distribution, f(x) is constant over the possible value
of x.
◻ Area looks like a rectangle.
◻ For the area in continuous distribution we need to do integration
of the function.
◻ However in this case it is the area of rectangle.
◻ Example: the time taken to wash clothes in a washing machine (under standard conditions).
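A sketch of this rectangle-area shortcut (the interval values below are invented, e.g. a wash time uniformly distributed between 20 and 30 minutes):

```python
# Uniform density on [a, b]: f(x) = 1/(b - a), so P(c <= X <= d) is just
# the rectangle area (d - c)/(b - a). The interval values are invented
# (e.g. wash time uniformly distributed between 20 and 30 minutes).
a, b = 20.0, 30.0
f = 1 / (b - a)              # constant density 0.1
c, d = 22.0, 25.0
prob = (d - c) * f
print(round(prob, 2))        # 0.3
```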
NORMAL DISTRIBUTION
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
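This density can be sketched in code and checked by numerically integrating it to approximately 1 (the step size and integration range are arbitrary choices):

```python
import math

# The normal density f(x) = exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)),
# checked by numerically integrating the standard normal to roughly 1.
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

step = 0.001                 # crude Riemann sum over [-8, 8]
area = sum(normal_pdf(-8 + i * step) * step for i in range(16_000))
print(round(area, 4))        # 1.0
```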
• If x and y are continuous, then the joint probability density function p(x, y) is used over the region R in which x and y apply.
• It is given by: P((x, y) ∈ R) = ∬R p(x, y) dx dy
• where the integral is taken over the region R. This integral represents a volume in the xyp-space.
Probability distributions can be used to describe the population, just as we described samples.
– Shape: Symmetric, skewed, mound-shaped…
– Outliers: unusual or unlikely measurements
– Center and spread: mean and standard deviation. A population mean is
called μ and a population standard deviation is called σ.
Let x be a discrete random variable with probability distribution p(x). Then the mean, variance and standard deviation of x are given as:
μ = Σ x·p(x),   σ² = Σ (x − μ)²·p(x),   σ = √(σ²)
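These formulas can be applied to the two-coin-flip pmf seen earlier:

```python
import math

# Apply the formulas to the two-coin-flip pmf p(0)=1/4, p(1)=1/2, p(2)=1/4.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * p for x, p in pmf.items())              # mean
var = sum((x - mu) ** 2 * p for x, p in pmf.items()) # variance
sigma = math.sqrt(var)                               # standard deviation
print(mu, var, round(sigma, 4))                      # 1.0 0.5 0.7071
```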
Moments of Random Variables
In positive skewness, mean > median and median > mode; it is the reverse in case of negative skewness.
Moment 4: to know the kurtosis.
Normal Distribution
◻ Consider an example of x values:
◻ 4,5,5,6,6,6,7,7,8
◻ Mode, Median and mean all will be equal
◻ = Mode is 6
◻ = Median is 6
◻ = Mean is also 6
Positive Skew
◻ Consider an example of x values:
◻ 5,5,5,6,6,7,8,9,10
◻ = Mode is 5
◻ = Median is 6
◻ = Mean is 6.8
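Both examples can be checked with Python's statistics module:

```python
import statistics

# Symmetric data vs positively skewed data from the two examples above.
sym = [4, 5, 5, 6, 6, 6, 7, 7, 8]
skew = [5, 5, 5, 6, 6, 7, 8, 9, 10]

print(statistics.mode(sym), statistics.median(sym), statistics.mean(sym))
# symmetric: mode = median = mean = 6
print(statistics.mode(skew), statistics.median(skew), round(statistics.mean(skew), 1))
# positive skew: mode (5) < median (6) < mean (6.8)
```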
Let X be a discrete random variable having support Rx = {1, 2} and the pmf is