
20CS610 Machine Learning

by
Prof. Aishwarya D S
Course Outcomes

◻ After completing this course, students should be able to:

◻ CO1: Understand the basic concepts of Machine Learning.
◻ CO2: Formulate machine learning problems corresponding to different applications.
◻ CO3: Understand unsupervised learning techniques.
◻ CO4: Understand the pre-processing activities done for Machine Learning algorithms.
◻ CO5: Understand neural networks implemented for various applications.
Text Books

◻ Pattern Recognition and Image Analysis, Earl Gose, Richard Johnsonbaugh, Steve Jost, Pearson, 2015.
◻ Pattern Classification, Richard O. Duda, Peter E. Hart, David G. Stork, Second Edition, Wiley.
◻ Introduction to Machine Learning, Ethem Alpaydin, Third Edition, MIT Press, 2014. The textbook website is https://www.cmpe.boun.edu.tr/~ethem/i2ml3e/
Reference Books

◻ 1. Machine Learning, Tom M. Mitchell, McGraw-Hill Publishers, 1997.
◻ 2. Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer Publishers, 2011.
◻ 3. Machine Learning: A Probabilistic Perspective, Kevin Murphy, MIT Press, 2012.
◻ 4. Understanding Machine Learning, Shai Shalev-Shwartz and Shai Ben-David, Cambridge University Press, 2017.
UNIT 1: Introduction & Bayesian Decision Theory

◻ Introduction: What Is Machine Learning? Applications of Machine Learning, Types of Machine Learning, Statistical Decision Theory and Analysis.
◻ Probability: Introduction, Basics of Probability, Combination, Permutation, Union, Intersection, Complement, Conditional Probability, Random Variables, Binomial Distribution, Normal Distribution, Joint Distributions and Densities, Moments of Random Variables.
MACHINE LEARNING – UNIT 1
Introduction
◻ Artificial Intelligence (AI)
◻ Machine Learning (ML)
◻ Deep Learning (DL)
◻ Data Science
Artificial Intelligence
◻ Artificial intelligence is intelligence demonstrated
by machines, as opposed to natural intelligence
displayed by animals including humans.
Machine Learning
◻ Machine Learning is a statistical tool to explore data.

Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to "self-learn" from training data and improve over time, becoming more accurate at predicting outcomes without being explicitly programmed to do so.

Machine learning algorithms use historical data as input to predict new output values.

Machine learning algorithms are able to detect patterns in data and learn from them, in order to make their own predictions.

For example, if you search for an item on Amazon, related items will be suggested on your next visit without an explicit request.
Deep Learning
◻ Deep Learning is the subset of ML which mimics the human brain.

◻ Three popular Deep Learning techniques are:
? ANN – Artificial Neural Network
? CNN – Convolutional Neural Network
? RNN – Recurrent Neural Network
Summary:
What is Machine Learning?
◻ Learning is any process by which a system improves performance from experience.
◻ Example: we speak of an experienced doctor, an experienced teacher, an experienced driver.

◻ Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Defining the Learning Task
Improve on task T, with respect to performance metric P, based on experience E.
◻ T: Recognizing hand-written words
◻ P: Percentage of words correctly classified
◻ E: Database of human-labeled images of handwritten words

◻ T: Driving on four-lane highways using vision sensors
◻ P: Average distance traveled before a human-judged error
◻ E: A sequence of images and steering commands recorded while observing a human driver
◻ A Machine Learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it.
◻ The accuracy of the predicted output depends upon the amount of data, since a large amount of data helps to build a better model which predicts the output more accurately.
Machine learning is a branch of AI which uses historical data to learn the hidden patterns that already exist in the data and generate insights useful for solving a business problem. The beauty of machine learning lies in the fact that it can learn these patterns without being explicitly programmed, and it keeps improving with experience.
The machine learning algorithm is fed with training samples to achieve that goal. The algorithm has an objective function that it wants to optimize, and it is trained repeatedly until it achieves the objective function up to a certain level. After the training process ends, the model is tested against unseen data, called test data, to generate insights.
The Role of Objective Functions in AI: In AI, objective functions are instrumental in driving the learning process of machine learning and deep learning models. They provide the framework for assessing and optimizing model performance, thereby enabling the models to converge towards desired outcomes during training.
Types of Machine Learning:
? Supervised Learning
? Unsupervised Learning
? Semi supervised Learning
? Reinforcement Learning
• Classification is the task of assigning a class label to an input pattern. The class label indicates one of a given set of classes. The classification is carried out with the help of a model obtained using a learning procedure. According to the type of learning used, there are two categories of classification: supervised learning and unsupervised learning.
• Supervised learning makes use of a set of examples which already have class labels assigned to them.
• Unsupervised learning attempts to find inherent structures in the data.
• Semi-supervised learning makes use of a small number of labeled data and a large number of unlabeled data to learn the classifier.
1. Supervised Learning

• Supervised learning: classification is seen as supervised learning from examples.
– Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes, as if a "teacher" gives the classes (supervision).
– It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
– Test data are classified into these classes too.
– Example: classify emails as spam or non-spam based on predefined parameters.
Supervised Learning

◻ The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
◻ In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.
◻ Types of supervised Machine Learning algorithms:
Supervised learning can be further divided into two types of problems:
Regression

◻ Regression algorithms are used if there is a relationship between the input variable and the output variable. Regression is used for the prediction of continuous variables, such as in weather forecasting, market trends, etc.

Below are some popular regression algorithms which come under supervised learning (a small sketch follows the list):
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
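A minimal sketch of supervised regression, assuming scikit-learn is available; the house-size/price numbers below are made up purely for illustration.

```python
# Fit a line y = f(x) to a continuous target, then predict an unseen input.
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature: house size in sq. ft; target: price (hypothetical values).
X = np.array([[600], [800], [1000], [1200], [1500]])
y = np.array([60, 80, 98, 122, 150])

model = LinearRegression()
model.fit(X, y)                 # learn the input-to-output mapping
print(model.predict([[1100]]))  # predict the price for an unseen size
```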
Classification

◻ Classification algorithms are used when the output variable is categorical, meaning it takes one of a set of discrete classes such as Yes/No, Male/Female, True/False, etc.

Some of the classification algorithms which come under supervised learning are (a small sketch follows the list):
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
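A minimal sketch of supervised classification with logistic regression, assuming scikit-learn; the tiny spam-like dataset is invented for illustration.

```python
# Learn a binary class label (spam / non-spam) from labeled examples.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two hypothetical features per email: (number of links, ALL-CAPS word count).
X = np.array([[0, 0], [1, 0], [8, 5], [6, 7], [0, 1], [9, 9]])
y = np.array([0, 0, 1, 1, 0, 1])   # 0 = non-spam, 1 = spam

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[7, 4]]))       # classify a new, unseen email
```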
Regression vs classification
◻ Regression models are used to predict a continuous value.

◻ Predicting the price of a house given features such as its size is one of the common examples of regression. It is a supervised technique.

◻ Classification is applied to discrete values.

◻ Examples of continuous (regression) targets: height as a function of age, the temperature of a city, house prices.
Regression vs classification
2. Unsupervised learning

• Unsupervised learning (clustering)
– Class labels of the data are unknown.
– Given a set of data, the task is to establish the existence of classes or clusters in the data.

◻ Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.
Types of unsupervised Algorithm
• Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group (a small sketch follows).

Objects are grouped based on the behavior of the data.

Clustering is used when we want to group our dataset on the basis of inherent similarities in the data.
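A minimal sketch of clustering with k-means, assuming scikit-learn; the 2-D points are synthetic and chosen to form two obvious groups.

```python
# Group unlabeled points into clusters based on similarity (distance).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [1, 0.5],    # one natural group
              [8, 8], [8.5, 9], [9, 8]])     # another natural group

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)        # note: no class labels are supplied
print(labels)                     # e.g., [0 0 0 1 1 1]
print(km.cluster_centers_)        # the discovered group centers
```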
• Association: An association rule is an unsupervised learning method which is used for finding relationships between variables in a large database.
• It determines the sets of items that occur together in the dataset.
• It is used when we are trying to explain the dependence relationship between data that are usually seen together.
• In simple words, we may want to explain the dependence of how Y happens after X happens, i.e., X -> Y.
• A typical example of association rules is Market Basket Analysis (a small sketch follows).
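A small pure-Python sketch of the support/confidence idea behind an association rule X -> Y; the baskets are made up for illustration.

```python
# Market-basket style data: each basket is a set of purchased items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset):
    # Fraction of baskets containing every item in the itemset.
    return sum(itemset <= b for b in baskets) / len(baskets)

# confidence(X -> Y) = support(X and Y) / support(X)
X, Y = {"bread"}, {"milk"}
conf = support(X | Y) / support(X)
print(f"support = {support(X | Y):.2f}, confidence = {conf:.2f}")  # 0.60, 0.75
```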
Unsupervised Learning algorithms:

◻ Below is a list of some popular unsupervised learning algorithms:
• K-means clustering
• KNN (k-nearest neighbors)
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular Value Decomposition
Supervised Learning vs. Unsupervised Learning

• Supervised learning algorithms are trained using labeled data; unsupervised learning algorithms are trained using unlabeled data.
• A supervised learning model takes direct feedback to check if it is predicting the correct output or not; an unsupervised learning model does not take any feedback.
• A supervised learning model predicts the output; an unsupervised learning model finds the hidden patterns in data.
• In supervised learning, input data is provided to the model along with the output; in unsupervised learning, only input data is provided to the model.
• The goal of supervised learning is to train the model so that it can predict the output when given new data; the goal of unsupervised learning is to find the hidden patterns and useful insights in an unknown dataset.
• Supervised learning needs supervision to train the model; unsupervised learning does not need any supervision.
• Supervised learning can be categorized into Classification and Regression problems; unsupervised learning can be classified into Clustering and Association problems.
Supervised vs. unsupervised learning
3. Semi-supervised learning

It makes use of a small number of labeled data and a large number of unlabeled data to learn.

The model uses the labeled data as an input to make inferences about the unlabeled data.
4. Reinforcement Learning

◻ Reinforcement learning is a machine learning training method based on rewarding desired behaviors.
◻ In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions and learn through trial and error.
◻ Since there is no training data, machines learn from their own mistakes and choose the actions that lead to the best solution or maximum reward (a toy sketch follows).
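A toy sketch of reinforcement learning via tabular Q-learning on a 5-state corridor with a reward only at the right end; every detail here (environment, hyperparameters) is an illustrative assumption, not from the slides.

```python
# The agent learns, by trial and error, that moving right earns the reward.
import random

n_states, actions = 5, [+1, -1]          # actions: move right or left
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:             # episode ends at the goal state
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0   # reward for desired behavior
        # Q-learning update: move Q(s, a) toward reward + discounted future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

# State values learned from experience (should increase toward the goal).
print([round(max(Q[(s, a)] for a in actions), 2) for s in range(n_states)])
```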
Reinforcement – Learning from the environment
Life cycle of Machine learning
Learning

• The classifier to be designed is built using input samples which are a mixture of all the classes.
• The classifier learns how to discriminate between samples of different classes.
• If the learning is offline, i.e., a supervised method, then the classifier is first given a set of training samples, the optimal decision boundary is found, and then classification is done.
• If the learning is online, then there is no teacher and no training samples (unsupervised). The input samples are the test samples themselves. The classifier learns and classifies at the same time.
Training and Testing data

◻ There are two types of data set in a supervised classifier.
? Training set: 70 to 80% of the available data will be used for training the system.
? In supervised classification, training data is the data you use to train an algorithm or machine learning model to predict the outcome you design your model to predict.

? Testing set: around 20-30% will be used for testing the system. Test data is used to measure the performance, such as accuracy or efficiency, of the algorithm you are using to train the machine.
? Testing is the measure of the quality of your algorithm.
? Often, even after training on 80% of the data, failures can be seen during testing, the reason being that the training set is not a good representation of the test data.
◻ An unsupervised classifier does not use training data (a small train/test split sketch follows).
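A minimal sketch of a 70/30 train-test split, assuming scikit-learn; the data and model choice here are synthetic and illustrative.

```python
# Hold out 30% of the data to measure quality on unseen samples.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((100, 3))                  # 100 samples, 3 features (synthetic)
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # 70% train, 30% test

clf = LogisticRegression().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))  # quality on unseen data
```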
Model Selection?
◻ It is the process of selecting the optimal model from a set of candidate models, for given data.
◻ Data used in learning:
? Training set (usually 60%)
? Validation set – cross validation (20%): different models are validated on the same training set.
? A validation set is a set of data used during training with the goal of finding and optimizing the best model to solve a given problem.
? Test data (20%): unseen data.
◻ Model selection is finding the optimal model which minimizes both bias and variance.
◻ Bias is the error during training and variance is the error during testing (a small cross-validation sketch follows).
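A minimal sketch of model selection via cross-validation, assuming scikit-learn; the candidate models and synthetic data are illustrative assumptions.

```python
# Score each candidate model on held-out folds and keep the best one.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X.sum(axis=1) > 2).astype(int)       # synthetic labels

candidates = {"logistic": LogisticRegression(),
              "tree": DecisionTreeClassifier(max_depth=3)}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # validate on 5 held-out folds
    print(name, round(scores.mean(), 3))         # pick the best-scoring model
```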
Features and classes
◻ Properties or attributes used to classify the objects are called features.
◻ A collection of "similar" (not necessarily identical) objects are grouped together as one "class".
◻ For example, many differently written or printed shapes of the letter T are all classified as the character T.
◻ Classes are identified by a label.
◻ Most pattern recognition tasks are first done by humans and automated later.
Samples or patterns
◻ The individual items, objects or situations to be classified will be referred to as samples, patterns or data.
◻ The set of data is called a "Data Set".
• A pattern is anything which has a regular sequence of occurrence. A pattern can be observed through visualization or derived mathematically by applying algorithms.
• Pattern recognition is the automated recognition of patterns and regularities in DATA.
• Patterns include repeated trends in various forms of data.

• Examples: speech patterns, prints on clothes, designs of outfits, jewelry patterns, sound waves, tree species, fingerprints, faces, barcodes, QR codes, handwriting or character images, etc.
• Pattern recognition is the process of recognizing patterns by using a machine learning algorithm. Pattern recognition can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representation.
• One of the important aspects of pattern recognition is its application potential.
• Examples: speech recognition, speaker identification, multimedia document recognition (MDR), automatic medical diagnosis.
The data inputs for pattern recognition can be words or texts, images, or audio files.
Hence, pattern recognition is broader than computer vision, which focuses on image recognition.
In a typical pattern recognition application, the raw data is processed and converted into a form that is convenient for a machine to use.
Pattern recognition involves the classification and clustering of patterns.
Definition of Pattern Recognition
• Pattern recognition is defined as the study of how machines can observe
the environment, learn to distinguish various patterns of interest from their
background, and make logical decisions about the categories of the patterns.
During recognition, the given objects are assigned to a specific category.
• In general, pattern recognition can be described as an information reduction,
information mapping, or information labeling process.
• In computer science, pattern recognition refers to the process of matching
information already stored in a database with incoming data based on
their attributes or features.
Input Data and Output Response for Various Applications

Task of Classification | Input Data | Output Response
Character Recognition | Optical signals or strokes | Name of the character
Speech Recognition | Acoustic waveforms | Name of the word
Speaker Recognition | Voice | Name of the speaker
Weather Prediction | Weather maps | Weather forecast
Medical Diagnosis | Symptoms | Disease
Stock Market Prediction | Financial news and charts | Predicted market ups and downs
Image Processing Example
• Sorting fish: incoming fish are sorted according to species using optical sensing (sea bass or salmon?)

• Problem analysis:
▪ Set up a camera and take some sample images to extract features.
▪ Consider features such as length, lightness, width, number and shape of fins, position of mouth, etc.
Preprocessing

A critical step for reliable feature extraction!

Examples:
• Noise removal
• Image enhancement
• Separating touching or occluding fish
• Extracting the boundary of each fish
Feature Extraction

• How do we choose a good set of features?
– Discriminative features
– Invariant features (e.g., invariant to geometric transformations such as translation, rotation and scale)
• Are there ways to automatically learn which features are best?
Feature Extraction (cont'd)

(Figure: histogram of "length" for the two classes, with decision threshold l*.)

• Even though sea bass are longer than salmon on average, there are many examples of fish where this observation does not hold.
Add Another Feature

Lightness is a better feature than length because it reduces the misclassification error.
Multiple Features

• To improve recognition accuracy, we might need to use more than one feature.
– Single features might not yield the best performance.
– Using combinations of features might yield better performance.

• If the feature space cannot be perfectly separated by a straight line, a more complex (non-linear) boundary might be used.
• Alternatively, a simple decision boundary such as a straight line might be used even if it does not perfectly separate the classes, provided that the error rates are acceptably low.
Hyperplanes and Hypersurfaces

• For the two-category case, a positive value of the discriminant function decides class 1 and a negative value decides the other class.
• If the number of dimensions is three, then the decision boundary will be a plane or a 3-D surface. The decision regions become semi-infinite volumes.
• If the number of dimensions increases to more than three, then the decision boundary becomes a hyperplane or a hypersurface. The decision regions become semi-infinite hyperspaces.
Decision Theory
• Can we do better than a linear classifier?
EXAMPLES OF MACHINE LEARNING APPLICATIONS

Handwriting Recognition
License Plate Recognition
Biometric Recognition
Face Detection/Recognition: detection, matching, recognition
Fingerprint Classification: an important step for speeding up identification
Autonomous Systems: obstacle detection and avoidance, object recognition
Medical Applications: skin cancer detection
Land Cover Classification (using aerial or satellite images): many applications including "precision" agriculture
Statistical Decision Theory
◻ Decision theory, in statistics, is a set of quantitative methods for reaching optimal decisions.
Example for Statistical Decision Theory

• Consider a hypothetical basketball association.
• The prediction could be based on the difference between the home team's average number of points per game (apg) and the visiting team's apg in previous games.
• The training set consists of scores of previously played games, with each home team classified as a winner or loser.
• Now the prediction problem is: given a game to be played, predict the home team to be a winner or loser using the feature 'dapg',
• where dapg = home team apg – visiting team apg.
• The figure shown in the previous slide lists 30 games, gives the value of dapg for each game and tells whether the home team won or lost.
• Notice that in this data set the team with the higher apg usually wins.

• For example, in the 9th game the home team, on average, scored 10.8 fewer points in previous games than the visiting team, and the home team lost.

• When the teams have about the same apg, the outcome is less certain. For example, in the 10th game the home team on average scored 0.4 fewer points than the visiting team, but the home team won the match.

• Similarly, in the 12th game, the home team had an apg 1.1 less than the visiting team on average, and the team lost.
Histogram of dapg

• A histogram is a convenient way to describe the data.
• To form a histogram, the data from a single class are grouped into intervals.
• Over each interval a rectangle is drawn, with height proportional to the number of data points falling in that interval. In the example, each interval is chosen to have a width of two units.
• The general observation is that the prediction is not accurate with the single feature 'dapg'.
(Figure: histograms of dapg for the Lost and Won classes.)
Prediction

• To predict, normally a threshold value T is used:
• dapg > T is considered a win,
• dapg < T is considered a loss.

• T is called the decision boundary or threshold.

• If T = -1, four samples in the original data are misclassified: 3 winners are called losers and one loser is called a winner.
• If T = 0.8, no samples from the loser class are misclassified as winners, but 5 samples from the winner class would be misclassified as losers.
• If T = -6.5, no samples from the winner class are misclassified as losers, but 7 samples from the loser class would be misclassified as winners.
• By inspection, we see that when a decision boundary is used to classify the samples, the minimum number of samples that are misclassified is four, obtained at T = -1 (a small threshold-search sketch follows).
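A sketch of choosing the decision threshold T on the single feature 'dapg' by minimizing training misclassifications; the eight (dapg, outcome) pairs below are invented, not the 30-game table from the slides.

```python
# Brute-force search for the threshold with the fewest misclassified games.
dapg = [-10.8, -0.4, 1.1, 5.2, 7.9, -3.0, 2.4, 9.1]   # hypothetical values
won  = [False, True, False, True, True, False, True, True]

def errors(T):
    # Predict "win" when dapg > T, then count misclassified games.
    return sum((d > T) != w for d, w in zip(dapg, won))

# Candidate thresholds: the observed feature values themselves.
best_T = min(dapg, key=errors)
print("best threshold:", best_T, "errors:", errors(best_T))
```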
Using an Additional Feature, dwp

• To make the prediction more accurate, let us consider two features.
• Additional features often increase the accuracy of classification.
• Along with 'dapg', another feature 'dwp' is considered:

• wp = winning percentage of a team in previous games
• dwp = difference in winning percentage between the teams
• dwp = home team wp – visiting team wp
• Now observe the results on a scatterplot.

• Each sample has a corresponding feature vector (dapg, dwp), which determines its position in the plot.
• Note that the feature space can be divided into two decision regions by a straight line, called a linear decision boundary. Such a linear boundary can be learned, for example, by logistic regression.
• If a sample lies above the decision boundary, the home team is classified as the winner; if it lies below the decision boundary, it is classified as the loser.
Decision region and Decision Boundary

• Our goal in machine learning is to reach an optimal decision rule to categorize incoming data into their respective categories.
• The decision boundary separates points belonging to one class from points of the other.
• The decision boundary partitions the feature space into decision regions.
• The nature of the decision boundary is decided by the discriminant function which is used for the decision. It is a function of the feature vector.
Prediction with two parameters

• Consider the following: Springfield (home team)
• dapg = home team apg – visiting team apg = 98.3 – 102.9 = -4.6
• dwp = home team wp – visiting team wp = 21.4 – 58.1 = -36.7

• Since the point (dapg, dwp) = (-4.6, -36.7) lies below the decision boundary, we predict that the home team will lose the game.
◻ If the feature space cannot be perfectly separated by a straight line, a more complex (non-linear) boundary might be used.

◻ Alternatively, a simple decision boundary such as a straight line might be used even if it does not perfectly separate the classes, provided that the error rates are acceptably low.
◻ Having the model shown in the previous slide, we can use it for any type of recognition and classification.
◻ It can be
? speaker recognition
? speech recognition
? image classification
? video recognition and so on...
◻ It is now very important to learn:
? different techniques to extract the features;
? then, in the second stage, different methods to recognize the pattern and classify it:
■ some use a statistical approach,
■ a few use probabilistic models based on mean, variance, etc.,
■ other methods are neural networks and deep neural networks,
■ and mixtures of some of the above.
When do we use Machine Learning?

◻ ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can't explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Probabilistic Models in Machine Learning

◻ A probabilistic model in machine learning is a mathematical representation of a real-world process that incorporates uncertain or random variables.
◻ The goal of probabilistic modelling is to estimate the probabilities of the possible outcomes of a system based on data or prior knowledge.

◻ These models make predictions based on probability distributions, rather than absolute values, allowing for a more nuanced and accurate understanding of complex systems.
◻ Probabilistic models are an essential component of machine learning, which aims to learn patterns from data and make predictions on new, unseen data.
PROBABILITY:
INTRODUCTION TO
PROBABILITY
PROBABILITIES OF EVENTS
What is covered?
◻ Basics of Probability
◻ Combination
◻ Permutation
◻ Examples for the above
◻ Union
◻ Intersection
◻ Complement
What is a probability?
◻ Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur.

◻ The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates that the event is not going to happen and 1 indicates the event happens all the time.
Experiment

◻ The term experiment is used in probability theory to describe a process for which the outcome is not known with certainty.

Examples of experiments are:
• Rolling a fair six-sided die.
• Randomly choosing 5 apples from a lot of 100 apples.
Event
◻ An event is an outcome of an experiment. It is denoted by a capital letter, say E1, E2, ... or A, B, ... and so on.

◻ For example, when tossing a coin, H and T are two events.
◻ The set consisting of all possible outcomes of a statistical experiment is called the "Sample Space", e.g., {E1, E2, ...}.
Examples:
Sample space of tossing a coin = {H, T}
Tossing 2 coins = {HH, HT, TH, TT}
Random phenomena
– We are unable to predict individual outcomes, but in the long run the outcomes exhibit statistical regularity.
Examples
1. Tossing a coin – outcomes S = {Head, Tail}

We are unable to predict on each toss whether it will be Head or Tail, but in the long run we can predict that 50% of the time heads will occur and 50% of the time tails will occur.
2. Rolling a die – outcomes S = {1, 2, 3, 4, 5, 6}

We are unable to predict the outcome, but in the long run one can determine that each outcome will occur 1/6 of the time.
Use symmetry: each side is the same, so one side should not occur more frequently than another in the long run. If the die is not balanced, this may not be true.
Example

◻ The die toss:
◻ Simple events: rolling 1 is event E1, rolling 2 is E2, ..., rolling 6 is E6.
◻ Sample space: S = {E1, E2, E3, E4, E5, E6}.
Frequency of an event

◻ After an experiment has been repeated n times, the frequency of occurrence of an event A is measured by the relative frequency (number of times A occurred) / n.
◻ If we let n get infinitely large, this relative frequency approaches the probability P(A).
The Probability of an Event

◻ The probability of an event A measures "how often" A will occur. We write P(A).

◻ P(A) must be between 0 and 1.
? If event A can never occur, P(A) = 0. If event A always occurs when the experiment is performed, P(A) = 1.

? Then P(A) + P(not A) = 1,
? so P(not A) = 1 - P(A).

◻ The sum of the probabilities of all simple events in S equals 1.
Example 1

Toss a fair coin twice. What is the probability of observing at least one head?

The four equally likely outcomes (1st coin, 2nd coin) are:
HH with P = 1/4
HT with P = 1/4
TH with P = 1/4
TT with P = 1/4

P(at least 1 head) = P(HH) + P(HT) + P(TH) = 1/4 + 1/4 + 1/4 = 3/4
Example 2
A bowl contains three M&Ms, one red (R), one blue (B) and one green (G). A child selects two M&Ms at random. What is the probability that at least one is red?

The six equally likely ordered selections (1st M&M, 2nd M&M) are RB, RG, BR, BG, GB, GR, each with probability 1/6.

P(at least 1 red) = P(RB) + P(BR) + P(RG) + P(GR) = 4/6 = 2/3
Example 3
The sample space of throwing a pair of dice consists of 36 equally likely (red, green) outcomes.

Event: Dice add to 3 – simple events (1,2), (2,1) – probability 2/36
Event: Dice add to 6 – simple events (1,5), (2,4), (3,3), (4,2), (5,1) – probability 5/36
Event: Red die shows 1 – simple events (1,1), (1,2), (1,3), (1,4), (1,5), (1,6) – probability 6/36
Event: Green die shows 1 – simple events (1,1), (2,1), (3,1), (4,1), (5,1), (6,1) – probability 6/36
Permutations

◻ The number of ways you can arrange n distinct objects, taking them r at a time, is
nPr = n! / (n - r)!

Example: How many 3-digit lock combinations can we make from the numbers 1, 2, 3, and 4?

The order of the choice is important!
4P3 = 4!/1! = 24
Examples

Example: A lock consists of five parts and can be assembled in any order. A quality control engineer wants to test each order for efficiency of assembly. How many orders are there?

The order of the choice is important!
5P5 = 5! = 120
It is required to seat 5 men and 4 women in a row so that the women occupy the even places. How many such arrangements are possible?

Solution:

Given: 5 men and 4 women, so the total number of people = 9.
The women occupy the even places, which means they will be sitting in the 2nd, 4th, 6th and 8th places, whereas the men will be sitting in the 1st, 3rd, 5th, 7th and 9th places.

The number of arrangements in which 4 women can sit in 4 places = 4P4 = 4!/(4 - 4)! = 4!/0! = 24/1 = 24.

5 men can occupy 5 seats in 5P5 ways, i.e., the number of ways they can be seated = 5P5 = 5!/(5 - 5)! = 5!/0! = 120/1 = 120.

Therefore, the total number of possible seating arrangements = 24 × 120 = 2880.


Combinations

◻ The number of distinct combinations of n distinct objects that can be formed, taking them r at a time, is
nCr = n! / (r! (n - r)!)

Example: Three members of a 5-person committee must be chosen to form a subcommittee. How many different subcommittees could be formed?

The order of the choice is not important!
5C3 = 5!/(3! 2!) = 10
Example

• A box contains six M&Ms, four red and two green. A child selects two M&Ms at random. What is the probability that exactly one is red?

The order of the choice is not important!
The total number of ways to choose 2 of the 6 M&Ms is 6C2 = 15.
There are 4 × 2 = 8 ways to choose 1 red and 1 green M&M.
P(exactly one red) = 8/15
◻ A team of four has to be selected from 6 boys and 4 girls. How many different ways can a team be selected if at least one boy must be in the team?

Solution:
The compositions of a four-member team with at least one boy are:
{(BGGG), (BBGG), (BBBG), (BBBB)}

Number of ways one boy and three girls can be selected = 6C1 × 4C3 = 6 × 4 = 24
Number of ways two boys and two girls can be selected = 6C2 × 4C2 = 15 × 6 = 90
Number of ways three boys and one girl can be selected = 6C3 × 4C1 = 20 × 4 = 80
Number of ways four boys can be selected = 6C4 = 15

Total number of ways to form such a team = 24 + 90 + 80 + 15 = 209.


◻ So the formula for Permutations (order is relevant) is: nPr = n!/(n - r)!

◻ The formula for Combinations (order is not relevant) is: nCr = n!/(r!(n - r)!)

A quick computational check of these formulas follows.
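A short sketch verifying the worked examples above with Python's standard library (math.perm and math.comb, available in Python 3.8+).

```python
import math

print(math.perm(4, 3))   # 3-digit lock codes from digits 1-4 -> 24
print(math.perm(5, 5))   # orderings of 5 lock parts -> 120
print(math.comb(5, 3))   # 3-person subcommittees from 5 -> 10
print(math.comb(6, 2) * math.comb(4, 2))  # 2 boys and 2 girls -> 90
```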
EVENT RELATIONS
An Event, E
The event E is any subset of the sample space S, i.e., any set of outcomes (not necessarily all outcomes) of the random phenomenon.
(Venn diagram: the event E drawn as a region inside the sample space S.)
The event E is said to have occurred if, after the outcome has been observed, the outcome lies in E.
Examples

1. Rolling a die – outcomes
S = {1, 2, 3, 4, 5, 6}

E = the event that an even number is rolled
= {2, 4, 6}
Special Events

The Null Event, also called the empty event, is represented by φ:
φ = { } = the event that contains no outcomes.

The Entire Event is the sample space S:
S = the event that contains all outcomes.
3 Basic Event relations

1. Union: if you see the word or,
2. Intersection: if you see the word and,
3. Complement: if you see the word not.
Union

Let A and B be two events; then the union of A and B is the event (denoted by A ∪ B) defined by:
A ∪ B = {e | e belongs to A or e belongs to B}

The event A ∪ B occurs if the event A occurs, or the event B occurs, or both occur.
(Venn diagram: the shaded region covering both A and B.)
Intersection

Let A and B be two events; then the intersection of A and B is the event (denoted by A ∩ B) defined by:
A ∩ B = {e | e belongs to A and e belongs to B}

The event A ∩ B occurs if the event A occurs and the event B occurs.
(Venn diagram: the overlap region of A and B.)
Complement

Let A be any event; then the complement of A (denoted by Ā) is defined by:
Ā = {e | e does not belong to A}

The event Ā occurs if the event A does not occur.
Mutually Exclusive

Two events A and B are called mutually exclusive if:
A ∩ B = φ

If two events A and B are mutually exclusive then:
1. They have no outcomes in common.
2. They can't occur at the same time: the outcome of the random experiment cannot belong to both A and B.
RULES OF PROBABILITY

ADDITIVE RULE
RULE FOR COMPLEMENTS
Additive rule (General case)

When calculating the probability that either one of two events occurs, it is as simple as adding the probability of each event and then subtracting the probability of both events occurring:

P[A ∪ B] = P[A] + P[B] – P[A ∩ B]
or
P[A or B] = P[A] + P[B] – P[A and B]

The additive rule (mutually exclusive events): if A ∩ B = φ,
P[A ∪ B] = P[A] + P[B]
i.e.,
P[A or B] = P[A] + P[B]
(A and B mutually exclusive)
Example:
Bangalore and Mohali are two of the cities competing for the National University Games (there are also many others).

The organizers are narrowing the competition to the final 5 cities.
There is a 20% chance that Bangalore will be amongst the final 5,
a 35% chance that Mohali will be amongst the final 5, and
an 8% chance that both Bangalore and Mohali will be amongst the final 5.

What is the probability that Bangalore or Mohali will be amongst the final 5?
Solution:
Let A = the event that Bangalore is amongst the final 5.
Let B = the event that Mohali is amongst the final 5.

Given P[A] = 0.20, P[B] = 0.35, and P[A ∩ B] = 0.08,
P[A ∪ B] = P[A] + P[B] – P[A ∩ B] = 0.20 + 0.35 – 0.08 = 0.47.

Note: "and" ≡ ∩, "or" ≡ ∪.
Find the probability of drawing an ace or a spade from a deck of cards.
There are 52 cards in a deck; 13 are spades, 4 are aces.

The probability of a single card being a spade is 13/52 = 1/4.
The probability of drawing an ace is 4/52 = 1/13.

Also, there is one card that is both a spade and an ace, so the probability of a single card being both a spade and an ace is 1/52.

Let A = the event of drawing a spade.
Let B = the event of drawing an ace.

Given P[A] = 1/4, P[B] = 1/13, and P[A ∩ B] = 1/52,
P[A ∪ B] = 1/4 + 1/13 – 1/52 = 13/52 + 4/52 – 1/52 = 16/52 = 4/13.


Rule for complements

P[Ā] = 1 – P[A], or equivalently P[A] = 1 – P[Ā].

Complement
Let A be any event; then the complement of A (denoted by Ā) is defined by:
Ā = {e | e does not belong to A}
The event Ā occurs if the event A does not occur.
Logic: A and Ā are mutually exclusive and together cover the whole sample space, so P[A] + P[Ā] = 1.
What Is Conditional Probability?

◻ Conditional probability is defined as the likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome.

◻ A joint probability can be calculated by multiplying the probability of the preceding event by the updated (conditional) probability of the succeeding event.

◻ Bayes' theorem is a mathematical formula used in calculating conditional probability.
Definition

Suppose that we are interested in computing the probability of event A and we have been told event B has occurred.
Then the conditional probability of A given B is defined to be:
P[A|B] = P[A ∩ B] / P[B]

Similarly, P[B|A] = P[A ∩ B] / P[A].

• From the previous two expressions,
P[A ∩ B] = P[B] · P[A|B]
and P[A ∩ B] = P[A] · P[B|A]
can also be used to calculate P[A ∩ B].
The Multiplication Rule

• In many cases, P(A) may not depend on whether B has occurred. We say that the event A is independent of B if P(A) = P(A|B).
• An important consequence of the definition of independence is the multiplication rule, which is obtained by substituting P(A) for P(A|B) in the above expressions:
• P[A ∩ B] = P[A] · P[B] whenever A is independent of B.
Rationale:

If we're told that event B has occurred, then the sample space is restricted to B. The event A can now only occur if the outcome is in A ∩ B. Hence the new probability of A is:
P[A|B] = P[A ∩ B] / P[B]
An Example

The academy awards show is soon to be aired.
For a specific married couple the probability that the husband watches the show is 80%, the probability that his wife watches the show is 65%, while the probability that they both watch the show is 60%.
If the husband is watching the show, what is the probability that his wife is also watching the show?
Solution:
Let B = the event that the husband watches the show: P[B] = 0.80.
Let A = the event that his wife watches the show: P[A] = 0.65.
P[A ∩ B] = 0.60 = the probability that they both watch the show.
P[A|B] = P[A ∩ B] / P[B] = 0.60 / 0.80 = 0.75.
Another example
◻ There are 100 students in a class.
◻ 40 students like apples.
? Consider this event as A, so the probability of A is 40/100 = 0.4.
◻ 30 students like oranges.
? Consider this event as B, so the probability of B is 30/100 = 0.3.

◻ The remaining students like neither apples nor oranges.
◻ 20 students like both apples and oranges, so the probability of both A and B occurring is P(A ∩ B) = 20/100 = 0.2.

◻ What is the probability of A given B, i.e., the probability that A occurs given that B has occurred?

This is the chance that a student who likes oranges also likes apples:
P(A|B) = P(A ∩ B)/P(B) = 0.2/0.3 = 0.67.

P(A|B) indicates A occurring within the sample space of B: here we are not considering the entire sample space of 100 students, but only the 30 students who like oranges.
(Venn diagram: circle A of 40 students and circle B of 30 students, overlapping in 20.)
Example: Calculating the conditional probability of rain given that the barometric pressure is high.
Weather records show that high barometric pressure (defined as being over 760 mm of mercury) occurred on 160 of the 200 days in a data set, and it rained on 20 of the 160 days with high barometric pressure. Let R denote the event "rain occurred" and H the event "high barometric pressure occurred", and use the frequentist approach to define probabilities.

P(H) = 160/200 = 0.8
and P(R and H) = 20/200 = 0.10.

We can obtain the probability of rain given high pressure directly from the data:
P(R|H) = 20/160 = 0.125.

Using conditional probability:
P(R|H) = P(R and H)/P(H) = 0.10/0.8 = 0.125.

This is the conditional probability that rain occurs given high pressure.


Example: In my town, it's rainy one third of the days. Given that it is rainy, there will be heavy traffic with probability 1/2, and given that it is not rainy, there will be heavy traffic with probability 1/4. If it's rainy and there is heavy traffic, I arrive late for work with probability 1/2. On the other hand, the probability of being late is reduced to 1/8 if it is not rainy and there is no heavy traffic. In other situations (rainy and no traffic, not rainy and traffic) the probability of being late is 0.25. You pick a random day.
• What is the probability that it's not raining and there is heavy traffic and I am not late?
• What is the probability that I am late?
• Given that I arrived late at work, what is the probability that it rained that day?
Let R be the event that it's rainy, T be the event that there is heavy traffic, and L be the event that I am late for work. As seen from the problem statement, we are given conditional probabilities in a chain format. Thus, it is useful to draw a tree diagram for this problem. Each leaf in the tree corresponds to a single outcome in the sample space, and we can calculate the probability of each outcome by multiplying the probabilities on the edges of the tree that lead to it.
a. The probability that it's not raining and there is heavy traffic and I am not late can be found using the tree diagram, which is in fact applying the chain rule:
P(Rc ∩ T ∩ Lc) = P(Rc) P(T|Rc) P(Lc|Rc ∩ T)
= 2/3 · 1/4 · 3/4
= 1/8.
b. The probability that I am late can be found from the tree. All we need to do is sum the probabilities of the outcomes that correspond to me being late. In fact, we are using the law of total probability here.

P(L) = P(R ∩ T ∩ L) + P(R ∩ Tc ∩ L) + P(Rc ∩ T ∩ L) + P(Rc ∩ Tc ∩ L)
= 1/12 + 1/24 + 1/24 + 1/16
= 11/48.
c. We can find P(R|L) using
P(R|L) = P(R ∩ L)/P(L).
We have already found P(L) = 11/48, and we can find P(R ∩ L) similarly by adding the probabilities of the outcomes that belong to R ∩ L. In particular,
P(R ∩ L) = P(R, T, L) + P(R, Tc, L)
= 1/12 + 1/24
= 1/8.

Thus we obtain
P(R|L) = P(R ∩ L)/P(L)
= (1/8)/(11/48)
= 6/11.
(A short enumeration sketch verifying these numbers follows.)
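A small sketch verifying the answers above by enumerating the eight leaves of the probability tree; exact fractions avoid rounding error.

```python
from fractions import Fraction as F
from itertools import product

P_R = F(1, 3)                                          # P(rain)
P_T_given = {True: F(1, 2), False: F(1, 4)}            # P(T | R), P(T | not R)
P_L_given = {(True, True): F(1, 2), (False, False): F(1, 8),
             (True, False): F(1, 4), (False, True): F(1, 4)}

P_L = F(0)
P_R_and_L = F(0)
for r, t, l in product([True, False], repeat=3):
    p_r = P_R if r else 1 - P_R
    p_t = P_T_given[r] if t else 1 - P_T_given[r]
    p_l = P_L_given[(r, t)] if l else 1 - P_L_given[(r, t)]
    p = p_r * p_t * p_l                                # chain rule along a branch
    if l:
        P_L += p                                       # law of total probability
        if r:
            P_R_and_L += p

print(P_L)              # 11/48
print(P_R_and_L / P_L)  # P(R | L) = 6/11
```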
Random Variables

A random variable takes a random value, which is real and can be finite or infinite, and is generated by a random experiment.
In probability, a real-valued function defined over the sample space of a random experiment is called a random variable.

The random value is generated by a function.

Example:
• The distribution for the number of heads from two flips of a coin.
• The random variable k is defined to be the total number of heads that occur when a fair coin is flipped two times.
• This random variable can take only the 3 values 0, 1, 2, so it is discrete.
• The sample space is (T, T), (T, H), (H, T), (H, H), giving the distribution:

k = 0 with P(k) = 1/4
k = 1 with P(k) = 2/4
k = 2 with P(k) = 1/4
• There are two types of random variables:
– Discrete random variables (countable set of possible outcomes)
– Continuous random variables (unbroken chain of possible outcomes)

• Discrete random variables are understood in terms of their probability mass function (pmf).
• A pmf is a mathematical function that assigns probabilities to all possible outcomes of a discrete random variable.
◻ Two types of random variables:
? Discrete random variables
? Continuous random variables
Discrete random variables

◻ If the variable's set of values is finite, or infinite but countable, then it is called a discrete random variable.

◻ Tossing two coins and counting the number of heads is an example of a discrete random variable.

◻ The sample space of real values is fixed.


Example: Random variable
• Example: Tossing two coins, or the ice cream purchases in the following slides, are examples of discrete random variables.
Probability that customers purchase more than 3 ice creams:

• Consider P(X), where X > 3.
• Then it is the sum P(X=4) + P(X=5) + P(X=6) = 0.04 + 0.04 + 0.02 = 0.10.
• 10 percent of the customers will purchase more than 3 ice creams.

• In general, the pmf must satisfy Σᵢ P(Xᵢ) = 1.


Continuous Random Variable

◻ If the random variable's values lie between two certain fixed numbers, then it is called a continuous random variable. The range can be finite or infinite.

◻ The sample space of real values is not a fixed set of points, but a range.

◻ If X is the random variable and its values lie between a and b, then this is represented by: a <= X <= b.

Examples: temperature, age, weight, height, etc., each ranging over a specific interval; measuring the amount of rainfall in a city over a year; the average height of a random group of 25 people.
Probability distribution

◻ A frequency distribution is a listing of the observed frequencies of all the outputs of an experiment that actually occurred when the experiment was done.

◻ Whereas a probability distribution is a listing of the probabilities of all possible outcomes that could result if the experiment were done (a distribution of expectations).
Broad classification of probability distributions

◻ Discrete probability distributions
? Binomial distribution
? Poisson distribution

◻ Continuous probability distributions
? Normal distribution
Discrete Probability Distribution:
Binomial Distribution

◻ A binomial distribution can be thought of as simply the probability of a SUCCESS or FAILURE outcome in an experiment or survey that is repeated multiple times (when we have only two possible outcomes).
◻ For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes, pass or fail.
The occurrence of a head denotes success, and the occurrence of a tail denotes failure.
◻ Probability of getting a head = 0.5 = probability of getting a tail, since there are only two possible outcomes.
Assumptions of the Binomial distribution
(each trial is a Bernoulli trial)

Assumptions:
● The random experiment is performed repeatedly with a fixed and finite number of trials, denoted by 'n'.
● There are two mutually exclusive possible outcomes on each trial, known as "success" and "failure".
● Success is denoted by 'p' and failure by 'q', with p + q = 1, i.e., q = 1 - p.
● The outcome of any given trial does not affect the outcomes of subsequent trials; all trials are independent.
● The probabilities of success and failure (p and q) remain constant for all trials. If they do not remain constant, it is not a binomial distribution.
● Example: ?
● For example, tossing a coin and counting heads, or drawing a red ball from a pool of colored balls where every time the ball is taken out it is replaced in the pool.
● With these assumptions, let us see the formula.
Formula for the Binomial Distribution

P(X = x) = nCx · p^x · q^(n-x), where p is the probability of success and q = 1 - p is the probability of failure.

Binomial distribution, generally
Note the general pattern emerging: if you have only two possible outcomes (call them 1/0, or yes/no, or success/failure) in n independent trials, the total number of successes X obtained in the n trials is called a binomial random variable, and the probability of exactly X "successes" is:

P(X) = nCX · p^X · (1-p)^(n-X)

where n = number of trials, X = number of successes out of n trials, p = probability of success, and 1 - p = probability of failure.
Binomial Probability Distribution

■ A fixed number of observations (trials), n
■ e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
■ A binary outcome
■ e.g., head or tail in each toss of a coin; disease or no disease
■ Generally called "success" and "failure"
■ Probability of success is p, probability of failure is 1 – p
■ Constant probability for each observation
■ e.g., the probability of getting a tail is the same each time we toss the coin
◻ Consider a pen manufacturing company.
◻ 10% of the pens are defective.

◻ (i) Find the probability that at least 2 pens are defective in a box of 12.
◻ So n = 12.
Binomial Distribution: Illustration with example

◻ Consider the pen manufacturing company where 10% of the pens are defective:
◻ p = 10% = 10/100 = 1/10
◻ q = (1 - p) = 90/100 = 9/10
◻ X >= 2
◻ P(X >= 2) = 1 - P(X < 2)
◻ = 1 - [P(X=0) + P(X=1)]
◻ = 1 - [(0.9)^12 + 12(0.1)(0.9)^11] ≈ 1 - [0.2824 + 0.3766] ≈ 0.3410

◻ (ii) Find the probability that exactly 2 pens are defective in a box of 12.
◻ So n = 12,
◻ p = 10% = 10/100 = 1/10
◻ q = (1 - p) = 90/100 = 9/10
◻ X = 2
◻ P(X = 2) = 12C2 (0.1)^2 (0.9)^10 = 66 × 0.01 × 0.3487 ≈ 0.2301
(A small computational sketch follows.)
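A short sketch of the binomial pmf using math.comb, checking the defective-pen numbers above (n = 12, p = 0.1).

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_exactly_2 = binom_pmf(2, 12, 0.1)
p_at_least_2 = 1 - binom_pmf(0, 12, 0.1) - binom_pmf(1, 12, 0.1)
print(round(p_exactly_2, 4))   # ~0.2301
print(round(p_at_least_2, 4))  # ~0.3410
```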
Binomial distribution: Another example

◻ If I toss a coin 20 times, what's the probability of getting exactly 10 heads?
◻ P(X = 10) = 20C10 (0.5)^20 = 184756/1048576 ≈ 0.176
Binomial distribution: example

• If I toss a coin 20 times, what's the probability of getting 2 or fewer heads?
• P(X <= 2) = [20C0 + 20C1 + 20C2] (0.5)^20 = (1 + 20 + 190)/1048576 = 211/1048576 ≈ 0.0002
The Binomial Distribution: another example

◻ Say 40% of the class is female.
◻ What is the probability that 6 of the first 10 students walking in will be female?
◻ P(X = 6) = 10C6 (0.4)^6 (0.6)^4 = 210 × 0.004096 × 0.1296 ≈ 0.1115
Continuous Random Variable

• If the random variable's values lie between two certain fixed numbers, then it is called a continuous random variable. The range can be finite or infinite.
• If X is the random variable and its values lie between a and b, then this is represented by: a <= X <= b.

Examples: temperature, age, weight, height, etc., each ranging over a specific interval.
Continuous Probability Distributions

◻ When the random variable of interest can take any value in an interval, it is called a continuous random variable.
? Every continuous random variable has an infinite, uncountable number of possible values (i.e., any value in an interval).

• Examples: temperature on a given day; length, height, intensity of light falling on a given region.
◻ The length of time it takes a truck driver to go from New York City to Miami.
◻ The depth of drilling to find oil.
◻ The weight of a truck in a truck-weighing station.
◻ The amount of water in a 12-ounce bottle.
For each of these, if the variable is X, then x > 0 and less than some maximum possible value, but X can take on any value within this range.
Difference between discrete and continuous values?

Continuous Uniform Distribution
◻ For the uniform distribution, f(x) is constant over the possible values of x: f(x) = 1/(b - a) for a <= x <= b.
◻ The area under the density looks like a rectangle.
◻ For areas under a general continuous distribution we need to integrate the density function; in this case, however, it is simply the area of a rectangle.
◻ Example: the time taken to wash clothes in a washing machine (under standard conditions).
NORMAL DISTRIBUTION

◻ The most often used continuous probability distribution is the normal distribution; it is also known as the Gaussian distribution.
◻ Its graph, called the normal curve, is the bell-shaped curve.
◻ Such a curve approximately describes many phenomena that occur in nature, industry and research.
◻ Physical measurements in areas such as meteorological experiments, rainfall studies and measurement of manufactured parts are often more than adequately explained with a normal distribution.
NORMAL DISTRIBUTION

The normal (or Gaussian) distribution is a very commonly occurring function in the field of probability theory, and has wide applications in the fields of:
- Pattern Recognition;
- Machine Learning;
- Artificial Neural Networks and Soft Computing;
- Digital Signal (image, sound, video, etc.) Processing;
- Vibrations, Graphics, etc.

◻ In a normal distribution, the mean (average), median (midpoint), and mode (most frequent observation) are equal.
◻ These values represent the peak or highest point. The distribution then falls symmetrically around the mean, the width of which is defined by the standard deviation.
◻ This indicates that values near the mean occur more frequently than values that are farther away from the mean.
The probability distribution of the normal variable depends upon the two parameters 𝜇 and 𝜎:
– The parameter μ is called the mean or expectation of the distribution.
– The parameter σ is the standard deviation; the variance is thus σ^2.
– A few terms:
• Mode: the most repeated value
• Median: the middle data point (if there are 9 data points, the 5th one is the median)
• Mean: the average of all the data points
• SD (standard deviation): indicates how much the data deviate from the mean; it is a measure of the amount of variation or dispersion of a set of values.
– A low standard deviation indicates that the values tend to be close to the mean (expected value) of the set.
– A high standard deviation indicates that the values are spread out over a wider range.
• The sample standard deviation is given by S = sqrt( Σ(xᵢ - x̄)² / (n - 1) ).
• The density of the normal variable 𝑥 with mean 𝜇 and variance 𝜎² is

f(x) = (1 / (σ√(2π))) · e^(-(x-μ)² / (2σ²)),  -∞ < x < ∞,

where 𝜋 = 3.14159... and 𝑒 = 2.71828..., the Naperian constant.

(Figure: a plot of the normal distribution, or bell-shaped curve, where each band has a width of 1 standard deviation. See also the 68-95-99.7 rule.)

Standard Normal Distribution: In the above equation the density is evaluated at a particular value of x; to obtain the probability of a range of values, the density has to be integrated over that range.
For the standard normal distribution (μ = 0, σ = 1), the area under the curve between z1 and z2 gives P(z1 <= Z <= z2), read from standard normal tables after converting x to z = (x - μ)/σ.
Problem: Normal distribution
• Consider an electrical circuit in which the voltage is normally distributed with mean 120 and standard deviation 3. What is the probability that the next reading will be between 119 and 121 volts? (A short computational sketch follows.)
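A sketch of the voltage problem using scipy.stats.norm, which evaluates the normal cumulative distribution function directly.

```python
# P(119 < X < 121) for X ~ Normal(mean=120, sd=3).
from scipy.stats import norm

p = norm.cdf(121, loc=120, scale=3) - norm.cdf(119, loc=120, scale=3)
print(round(p, 4))   # ~0.2611
```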
Difference between PDF and PMF
Joint Distributions and Densities
• The joint random variable (x, y) signifies that, simultaneously, the first feature has the value x and the second feature has the value y.

• If the random variables x and y are discrete, the joint distribution function of the joint random variable (x, y) is the probability P(x, y) that both x and y occur.
Joint distribution for continuous random variables

• If x and y are continuous, the joint probability density function p(x, y) is used over the region R of interest:
P[(x, y) ∈ R] = ∬_R p(x, y) dx dy

• Here the integral is taken over the region R. This integral represents a volume in the xyp-space.
Probability distributions can be used to describe the population, just as we described samples.
– Shape: symmetric, skewed, mound-shaped...
– Outliers: unusual or unlikely measurements
– Center and spread: mean and standard deviation. A population mean is called μ and a population standard deviation is called σ.
Let x be a discrete random variable with probability distribution p(x). Then the mean, variance and standard deviation of x are given by
μ = Σ x·p(x),  σ² = Σ (x - μ)²·p(x),  σ = √σ².
Moments of Random Variables

Moments are very useful in statistics because they tell us much about our data.
• In mathematics, the moments of a function are quantitative measures related to the shape of the function's graph.
• The "moments" of a random variable (or of its distribution) are expected values of powers or related functions of the random variable.
• If the function represents mass, then the first moment is the center of the mass, and the second moment is the rotational inertia. The mathematical concept is closely related to the concept of moment in physics.

• If the function is a probability distribution, then there are four commonly used moments in statistics:
• The first moment is the expected value – a measure of the center of the data.
• The second central moment is the variance – the spread of the data about the mean.
• The third standardized moment is the skewness – the shape of the distribution.
• The fourth standardized moment is the kurtosis – a measure of the peakedness or flatness of the distribution.
◻ Skewness refers to the degree of symmetry, or more precisely, the degree of lack of symmetry. Distributions, or data sets, are said to be symmetric if they appear the same on both sides of a central point.
◻ Kurtosis refers to the proportion of data that is heavy-tailed or light-tailed in comparison with a normal distribution.
Moment 3: To know the Skewness

In positive skewness, mean > median and median > mode.
It is the reverse in the case of negative skewness.
Moment 4: To know the Kurtosis
Normal Distribution
◻ Consider an example of x values:
◻ 4, 5, 5, 6, 6, 6, 7, 7, 8
◻ Mode, median and mean will all be equal:
◻ Mode is 6
◻ Median is 6
◻ Mean is also 6
Positive Skew
◻ Consider an example of x values:
◻ 5, 5, 5, 6, 6, 7, 8, 9, 10
◻ (This is an example of positive skew: mean > median > mode.)
◻ Mode is 5
◻ Median is 6
◻ Mean is about 6.8
Let X be a discrete random variable having support Rx = {1, 2} and pmf p(1) = 3/4, p(2) = 1/4; we use this to compute its moments below.

A central moment is a moment of the probability distribution of a random variable about the random variable's mean (expected value).

Formula for computing the kth central moment of a random variable:
The kth central moment of X is
μk = E[(X - E[X])^k].


Expected Values of Discrete Random Variables

• The variance of a discrete random variable x is σ² = Σ (x - μ)²·p(x).
• The standard deviation of a discrete random variable x is σ = √σ².
Let X be a discrete random variable having support {1, 2} and pmf p(1) = 3/4, p(2) = 1/4. Using this, compute the mean (first-order moment).

The first-order moment is the mean.

Solution: E[X] = 1 · (3/4) + 2 · (1/4) = 5/4.
• Example: Let X be a discrete random variable having support Rx = {1, 2, 3} and a pmf as listed on the slide; find the 3rd moment.

• The third moment is computed as E[X³] = Σ x³·p(x).

Example computation of the 3rd-order central moment

◻ The third central moment can be computed as follows:
◻ Here X takes the values 1 and 2 with probabilities 3/4 and 1/4 respectively, and the mean is 5/4:
◻ μ₃ = (1 - 5/4)³ · (3/4) + (2 - 5/4)³ · (1/4) = -3/256 + 27/256 = 24/256 = 3/32.
(A small sketch computing these moments follows.)
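A short sketch computing the mean and third central moment of the pmf p(1) = 3/4, p(2) = 1/4 from the slides, using exact fractions.

```python
from fractions import Fraction as F

pmf = {1: F(3, 4), 2: F(1, 4)}
mean = sum(x * p for x, p in pmf.items())                    # first moment
third_central = sum((x - mean)**3 * p for x, p in pmf.items())
print(mean)           # 5/4
print(third_central)  # 3/32
```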
END OF UNIT 1
