Lecture 2 ML_Maths
Why worry about the math?
However, to get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.
Notation
x, y, z, u, v    vector (bold, lower case)
A, B, X          matrix (bold, upper case)
y = f( x )       function (map): assigns a unique value in the range of y to each value in the domain of x
dy / dx          derivative of y with respect to the single variable x
y = f( x ), x a vector    function of multiple variables, i.e. a vector of variables; a function in n-space
∂y / ∂xi         partial derivative of y with respect to element i of vector x
The concept of probability
Intuition:
In some process, several outcomes are possible. When the process is repeated a large number of times, each outcome occurs with a characteristic relative frequency, or probability. If a particular outcome happens more often than another outcome, we say it is more probable.
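To make the relative-frequency view concrete, here is a minimal Python sketch (the fair-die setup and the 100,000-roll count are illustrative choices, not taken from the slides) showing each face's relative frequency settling near 1/6:

```python
import random
from collections import Counter

# Roll a fair six-sided die many times and compare each face's observed
# relative frequency with the theoretical probability 1/6.
random.seed(0)
n_rolls = 100_000
counts = Counter(random.randint(1, 6) for _ in range(n_rolls))

for face in range(1, 7):
    rel_freq = counts[face] / n_rolls
    print(f"face {face}: relative frequency {rel_freq:.4f} (theoretical {1/6:.4f})")
```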
Axioms of probability
1. Non-negativity: for any event E ∈ F, p( E ) ≥ 0
2. Normalization: p( Ω ) = 1, where Ω is the sample space
3. Additivity: for mutually exclusive events E1, E2, …, p( E1 ∪ E2 ∪ … ) = p( E1 ) + p( E2 ) + …
Types of probability spaces
Example of discrete probability space
Example of continuous probability space
[Figure: probability density p(o) over the continuous outcome "height"]
Probability distributions
– Discrete example: sum of two fair dice
– Continuous example: waiting time between eruptions of Old Faithful (minutes)
Probability Distribution functions
Example: a single fair die, where p(x) = 1/6 for each outcome x = 1, 2, …, 6.

Probability Mass Function (pmf)
x    p(x)
1    p(x=1) = 1/6
2    p(x=2) = 1/6
3    p(x=3) = 1/6
4    p(x=4) = 1/6
5    p(x=5) = 1/6
6    p(x=6) = 1/6
Total: 1.0

For any pmf, the probabilities sum to one: Σ over all x of p(x) = 1.
Cumulative distribution function (CDF)
[Figure: step-function plot of F(x) for a fair die, rising by 1/6 at each of x = 1, …, 6 up to F(6) = 1.0]

x    F(x) = P(X ≤ x)
1    P(X ≤ 1) = 1/6
2    P(X ≤ 2) = 2/6
3    P(X ≤ 3) = 3/6
4    P(X ≤ 4) = 4/6
5    P(X ≤ 5) = 5/6
6    P(X ≤ 6) = 6/6
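A small Python sketch that rebuilds the pmf and CDF tables above for a fair die, using exact fractions:

```python
from fractions import Fraction

# pmf of a single fair die: p(x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# CDF: F(x) = P(X <= x) = sum of p(k) for all k <= x
cdf = {}
running = Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

for x in sorted(pmf):
    print(f"x = {x}: pmf = {pmf[x]}, CDF = {cdf[x]}")  # e.g. x = 3: pmf 1/6, CDF 1/2
```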
Examples
(a)
x     f(x)
9     0.25
10    0.25
11    0.25
12    0.25
Sum:  1.0
Yes, this is a probability function!
Answer (b)
x    f(x)
1    (3−1)/2 = 1.0
2    (3−2)/2 = 0.5
3    (3−3)/2 = 0
4    (3−4)/2 = −0.5
Though this sums to 1, you can't have a negative probability; therefore, it's not a probability function.
Answer (c)
x    f(x)
0    1/25
1    3/25
2    7/25
3    13/25
Sum: 24/25
Doesn't sum to 1. Thus, it's not a probability function.
Random variables
Multivariate probability distributions
Scenario
– Several random processes occur (it doesn't matter whether in parallel or in sequence)
– We want to know the probability of each possible combination of outcomes
This can be described as a joint probability of several random variables.
Multivariate probability distributions
Marginal probability
– Probability distribution of a single variable in a
joint distribution
– Example: two random variables X and Y:
p( X = x ) = Σb p( X = x, Y = b ), summing over all values b of Y
Conditional probability
– Probability distribution of one variable given
that another variable takes a certain value
– Example: two random variables X and Y:
p( X = x | Y = y ) = p( X = x, Y = y ) / p( Y = y )
Example of marginal probability
Example of conditional probability
conditional probability: p( Y = European | X = minivan ) =
0.1481 / ( 0.0741 + 0.1111 + 0.1481 ) ≈ 0.4443
[Figure: 3D bar chart of the joint probability of X = model type (sport, SUV, minivan, sedan) and Y = manufacturer (American, Asian, European), with bars ranging from roughly 0.05 to 0.2]
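A minimal sketch of the marginal and conditional formulas in Python, using only the minivan column of joint probabilities quoted above (the rest of the joint table is not reproduced here):

```python
# Joint probabilities p(X = minivan, Y = y) for each manufacturer y, as given above.
joint_minivan = {"American": 0.0741, "Asian": 0.1111, "European": 0.1481}

# Marginal: p(X = minivan) = sum over all manufacturers y of p(X = minivan, Y = y)
p_minivan = sum(joint_minivan.values())

# Conditional: p(Y = European | X = minivan) = p(minivan, European) / p(minivan)
p_european_given_minivan = joint_minivan["European"] / p_minivan

print(f"p(X = minivan) = {p_minivan:.4f}")                                # 0.3333
print(f"p(Y = European | X = minivan) = {p_european_given_minivan:.4f}")  # 0.4443
```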
Continuous multivariate distribution
Expected value
Q1. Suppose an individual thinks that if they quit their job and work for themselves, there is a 60% chance they could earn $20,000 in their first year, a 30% chance they could earn $60,000, and a 10% chance they would earn $0. Calculate the expected value of their income in the first year of entrepreneurship.
Solution: Expected value = 0.6 × $20,000 + 0.3 × $60,000 + 0.1 × $0 = $30,000
The mean is typically used when we want to calculate the average value of a given sample.
Variance of a discrete random variable
The variance of a random variable X is defined by
Var( X ) = E[ ( X − μ )² ] = Σi ( xi − μ )² · p( xi ), where μ = E[ X ].
It is the average value of the squared deviation of X = xi from the mean μ, taking into account the probability of the various xi.
– Most common measure of "spread" of a distribution
– σ is the standard deviation: σ = √Var( X )
– Compare to the formula for the variance of an actual sample
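A short Python check of these definitions, applied to the fair-die pmf from the earlier slides:

```python
import math

# pmf of a fair die
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                # E[X]
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2]
sigma = math.sqrt(var)                                 # standard deviation

print(f"E[X] = {mu:.4f}, Var(X) = {var:.4f}, sigma = {sigma:.4f}")
# E[X] = 3.5000, Var(X) = 2.9167, sigma = 1.7078
```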
Common forms of expected value (3)
Covariance: Cov( X, Y ) = E[ ( X − μX )( Y − μY ) ]
[Figure: scatter plot illustrating high (positive) covariance]
Compare to the formula for the covariance of actual samples
Correlation
[Figure: scatter plots of example correlation values — linear dependence without noise (top row) and various nonlinear dependencies (bottom row)]
Types of Relationship
[Figure: four scatter plots (1)–(4) of Y against X showing different types of relationship]
Types of Relationship…
[Figure: additional scatter plots of Y against X showing further types of relationship]
Types of Relationship…
[Figure: scatter plot of Y against X showing no relationship]
Example
Correlation( X, Y ) = [ (1 / (N − 1)) · Σi=1..N ( xi − μx )( yi − μy ) ] / ( σx · σy )
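A minimal Python sketch of this sample correlation formula; the x and y values below are illustrative only, not data from the slides:

```python
import math

def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient, following the formula above."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Sample covariance: (1 / (N - 1)) * sum of (x_i - mu_x)(y_i - mu_y)
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations, also with the N - 1 denominator
    sd_x = math.sqrt(sum((x - mu_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mu_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

print(correlation([2, 4, 6, 8], [1, 3, 2, 5]))  # ≈ 0.83
```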
Practice question
Q1. Calculate and analyze the correlation coefficient between the number
of study hours and the number of sleeping hours of different students using
the following data.
Complement rule
p( not A ) = 1 - p( A )
Product rule
p( A, B ) = p( A | B ) p( B )
(same expression given previously to define conditional probability)
Example of product rule
Rule of total probability
p( A ) = p( A, B ) + p( A, not B )
(same expression given previously to define marginal probability)
Independence
p( A | B ) = p( A ) or p( A, B ) = p( A ) p( B )
[Figure: Venn diagram of events A and B, splitting A into the regions ( A, not B ) and ( A, B )]
Examples of independence / dependence
Independence:
– Outcomes on multiple rolls of a die
– Outcomes on multiple flips of a coin
– Height of two unrelated individuals
– Probability of getting a king on successive draws from
a deck, if card from each draw is replaced
Dependence:
– Height of two related individuals
– Duration of successive eruptions of Old Faithful
– Probability of getting a king on successive draws from a deck, if the card from each draw is not replaced (see the sketch below)
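The sketch referenced above: the two card-drawing cases computed with exact fractions (4 kings in a 52-card deck):

```python
from fractions import Fraction

kings, deck = 4, 52

# With replacement: the two draws are independent.
p_with = Fraction(kings, deck) * Fraction(kings, deck)

# Without replacement: the second draw depends on the first.
p_without = Fraction(kings, deck) * Fraction(kings - 1, deck - 1)

print(p_with)     # 1/169
print(p_without)  # 1/221
```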
Example of independence vs. dependence
Bayes rule
Bayes rule
p( B | A ) = p( A | B ) p( B ) / p( A )
Example of Bayes rule
Example of Bayes rule, cont’d.
Probabilities: when to add, when to multiply
Practice questions
Here A, B and C hit the target independently, with P(A) = 1/6, P(B) = 1/4 and P(C) = 1/3.
(i) The probability that exactly one of them hits the target (i.e. only one hits the target while the other two do not):
P(A)P(B’)P(C’) + P(A’)P(B)P(C’) + P(A’)P(B’)P(C)
= 1/6 × 3/4 × 2/3 + 5/6 × 1/4 × 2/3 + 5/6 × 3/4 × 1/3 = 31/72
(ii) The probability that at least one of them hits the target:
1 − P(A’)P(B’)P(C’) = 1 − [ (5/6) × (3/4) × (2/3) ] = 1 − 30/72 = 42/72 = 7/12
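A quick check of (i) and (ii) with exact fractions, assuming (as read off from the solution above) that A, B and C hit the target independently with probabilities 1/6, 1/4 and 1/3:

```python
from fractions import Fraction

pA, pB, pC = Fraction(1, 6), Fraction(1, 4), Fraction(1, 3)   # hit probabilities
qA, qB, qC = 1 - pA, 1 - pB, 1 - pC                           # miss probabilities

exactly_one = pA * qB * qC + qA * pB * qC + qA * qB * pC
at_least_one = 1 - qA * qB * qC

print(exactly_one)   # 31/72
print(at_least_one)  # 7/12 (= 42/72)
```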
Practice questions
Q2. A covid test is 99% accurate (both ways). Only 0.3% of the population
is covid+. What is the probability that a random person is covid+ given
that the person tests+?
Using the rule of total probability:
P(person tests +) = P(person tests + | person is covid+) P(person is covid+) + P(person tests + | person is not covid+) P(person is not covid+)
Solution:
P(person is covid+ | person tests +) = (0.3% × 99%) / (0.3% × 99% + 99.7% × 1%) ≈ 22.95%
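The same calculation in Python, combining the rule of total probability with Bayes rule:

```python
p_covid = 0.003             # prior: 0.3% of the population is covid+
p_pos_given_covid = 0.99    # test is 99% accurate on covid+ people
p_pos_given_healthy = 0.01  # false-positive rate on covid- people

# Total probability of testing positive
p_pos = p_pos_given_covid * p_covid + p_pos_given_healthy * (1 - p_covid)

# Bayes rule: P(covid+ | tests+) = P(tests+ | covid+) P(covid+) / P(tests+)
p_covid_given_pos = p_pos_given_covid * p_covid / p_pos
print(f"{p_covid_given_pos:.4f}")  # 0.2295, i.e. about 22.95%
```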
Q3. Two dice are rolled. Consider the events A = {sum of two dice equals
3}, B = {sum of two dice equals 7 }, and C = {at least one of the dice shows a
1}.
(a) What is P (A | C)?
(b) What is P (B | C)?
(c) Are A and C independent? What about B and C?
Solution:
Note that the sample space is S = {(i, j) | i, j = 1, 2, 3, 4, 5, 6} with each
outcome equally likely.
Then, A = {(1, 2),(2, 1)}
B = {(1, 6),(2, 5),(3, 4),(4, 3),(5, 2),(6, 1)}
C = {(1, 1),(1, 2),(1, 3),(1, 4),(1, 5),(1, 6),(2, 1),(3, 1),(4, 1),(5, 1),(6, 1)}
Hence, P(A | C) = P(A ∩ C) / P(C) = (2/36) / (11/36) = 2/11.
P(B | C) = P(B ∩ C) / P(C) = (2/36) / (11/36) = 2/11.
Note that P(A) = 2/36 ≠ P (A | C), so they are not independent. Similarly, P (B)
= 6/36 ≠ P(B | C), so they are not independent.
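A brute-force check of Q3 by enumerating the 36 equally likely outcomes:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # all 36 ordered pairs (i, j)

A = {o for o in outcomes if sum(o) == 3}   # sum of the two dice equals 3
B = {o for o in outcomes if sum(o) == 7}   # sum of the two dice equals 7
C = {o for o in outcomes if 1 in o}        # at least one die shows a 1

n = len(outcomes)
print(Fraction(len(A & C), len(C)))              # P(A | C) = 2/11
print(Fraction(len(B & C), len(C)))              # P(B | C) = 2/11
print(Fraction(len(A), n), Fraction(len(B), n))  # P(A) = 2/36, P(B) = 6/36 -- both differ, so not independent
```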
Linear algebra applications
Why vectors and matrices?
Vectors
Vector arithmetic
– Operations such as vector addition and scalar multiplication: the result is a vector (see the sketch below)
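The sketch referenced above: basic vector arithmetic with NumPy (the particular vectors u and v are illustrative):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u + v)         # element-wise sum        -> [5. 7. 9.]
print(u - v)         # element-wise difference -> [-3. -3. -3.]
print(2 * u)         # scalar multiplication   -> [2. 4. 6.]
print(np.dot(u, v))  # dot (inner) product     -> 32.0 (a scalar, not a vector)
```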
Matrices
Matrix arithmetic
Matrix-matrix multiplication
– vector-matrix multiplication is just a special case
Multiplication is associative: A ( B C ) = ( A B ) C
Multiplication is not commutative: A B ≠ B A (generally)
Transposition rule: ( A B )ᵀ = Bᵀ Aᵀ
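A quick NumPy check of these rules on small random matrices (the 3×3 size and random entries are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))

print(np.allclose(A @ (B @ C), (A @ B) @ C))   # associativity: True
print(np.allclose(A @ B, B @ A))               # commutativity: False in general
print(np.allclose((A @ B).T, B.T @ A.T))       # transposition rule: True
```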
Vector projection
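A minimal sketch of vector projection using the standard formula proj_b(a) = ((a·b) / (b·b)) b; the vectors a and b below are illustrative:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Projection of a onto b: the component of a that points along b
proj = (np.dot(a, b) / np.dot(b, b)) * b
print(proj)  # [3. 0.]
```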
Vector and Matrix Operation
Consider a vector [x, y]ᵀ, representing the point (x, y) in the plane.
E.g. scaling the x-coordinate by 2 maps the point (2, 3) to (4, 3):
[ 2  0 ]   [ 2 ]   [ 4 ]
[ 0  1 ] ∗ [ 3 ] = [ 3 ]
Vector and Matrix Operation…
Consider a vector [x, y]ᵀ.
E.g. uniform scaling by 2 maps the point (2, 3) to (4, 6):
[ 2  0 ]   [ 2 ]   [ 4 ]
[ 0  2 ] ∗ [ 3 ] = [ 6 ]
Matrix Transformation
• Translation: [x′, y′]ᵀ = [x, y]ᵀ + [tx, ty]ᵀ
• Rotation: [x′, y′]ᵀ = [ cos(θ)  −sin(θ) ; sin(θ)  cos(θ) ] ∗ [x, y]ᵀ
• Scaling: [x′, y′]ᵀ = [ Sx  0 ; 0  Sy ] ∗ [x, y]ᵀ
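Applying the three transformations above to the point (2, 3) with NumPy; the translation offset and the rotation angle are illustrative choices:

```python
import numpy as np

p = np.array([2.0, 3.0])
theta = np.pi / 2  # rotate by 90 degrees

translation = p + np.array([1.0, -1.0])               # shift by (tx, ty) = (1, -1)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]]) @ p
scaling = np.array([[2.0, 0.0],
                    [0.0, 2.0]]) @ p                   # Sx = Sy = 2

print(translation)  # [ 3.  2.]
print(rotation)     # [-3.  2.] (up to floating-point rounding)
print(scaling)      # [4. 6.]
```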
Transformation of an Image
• An image of a fern-like fractal (Barnsley's fern) that exhibits affine self-similarity.
• Each of the leaves of the fern is related to each other leaf by an affine transformation.
• For instance, the red leaf can be transformed into both the dark blue leaf and any of the light blue leaves by a combination of reflection, rotation, scaling, and translation.
https://en.wikipedia.org/wiki/Affine_transformation
Optimization theory topics
Maximum likelihood
Expectation maximization
Gradient descent