APS1070 Lecture (5) Slides Annotated

APS1070
Foundations of Data Analytics and Machine Learning
Fall 2022

Lecture 5:
• Linear Algebra
• Analytical Geometry
• Data Augmentation

Samin Aref
Mid-term assessment logistics
• Midterm Assessment: Oct 21 at 9:20 at EX 200
• Absence from the assessment or not having a student ID = a mark of 0. No excuses.
• Everything from lectures 1-5 (inclusive) is included (the slides and codes for weeks 1-5,
plus the same topics from the main textbook, all Piazza posts, Project 1, Tutorials 0-2, and
Reading Assignments 1-4).
• You may practice with sample midterm questions (the solution is available now on
Quercus) and examples with solutions in the textbook and lecture slides.

I expect all students who have learned the materials to do well on the midterm assessment.
Good luck!
2
Mid-term assessment logistics
• Make sure to ask your questions on Piazza or in Q/A sessions. You may also answer
questions from your classmates on Piazza:
• The top 10 students, ranked by the number of endorsed answers on Piazza, will receive 2
bonus points added to their final course grade. A minimum of 3 endorsed answers is
required to qualify for the bonus.

3
Slide Attribution
These slides contain materials from various sources. Special
thanks to the following authors:

• Marc Deisenroth
• Mark Schmidt
• Jason Riordon
Agenda
➢ Linear Algebra
➢ Scalars, Vectors, Matrices
➢ Solving Systems of Linear Equations
➢ Linear Independence
➢ Linear Mappings
➢ Analytic Geometry
➢ Norms, Inner Products, Lengths, etc.
➢ Angles and Orthonormal Basis
➢ Data Augmentation

Today’s Theme: Data Processing
5
Part 1
Linear Algebra
Readings:
• Chapter 2.1-5 MML Textbook
Systems of Linear Equations
➢ The solution space of a system of two linear equations with two variables can be
geometrically interpreted as the intersection of two lines
➢ With three variables, the solution can be interpreted as the intersection of planes

[Figure: the lines 4x1 + 4x2 = 5 and 2x1 − 4x2 = 1 intersecting in the (x1, x2) plane.
A system in three variables: the solution is at the intersection of planes.
A system with 2 equations and three variables: the solution is typically a line.]
7
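As a quick sanity check (not from the slides), the small system drawn above can also be solved numerically; the following is a minimal NumPy sketch.

import numpy as np

# Coefficient matrix and right-hand side for:
#   4*x1 + 4*x2 = 5
#   2*x1 - 4*x2 = 1
A = np.array([[4.0, 4.0],
              [2.0, -4.0]])
b = np.array([5.0, 1.0])

x = np.linalg.solve(A, b)   # unique intersection point of the two lines
print(x)                    # expected: [1.   0.25]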
Matrix Representation
➢ Used to solve systems of linear equations more systematically
➢ Compact notation collects coefficients into vectors, and vectors
into matrices:

$$x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ \vdots \\ a_{m2} \end{bmatrix} + \dots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}
\iff
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$$
8
Matrix Notation
➢ A matrix has m x n elements (with 𝑚, 𝑛 ∈ ℕ, and aij, i=1,…,m; j=1,…,n)
which are ordered according to a rectangular scheme consisting of m rows
and n columns:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad a_{ij} \in \mathbb{R}$$

➢ By convention, (1 by n)-matrices are called rows and (m by 1)-matrices are
called columns. These special matrices are also called row/column vectors.
➢ A (1 by 1)-matrix is referred to as a scalar
9
Addition and Scalar Multiplication
➢ Vector addition:

$$a + b = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \end{bmatrix}$$

➢ Scalar multiplication:

$$\alpha b = \alpha \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} \alpha b_1 \\ \alpha b_2 \end{bmatrix}$$

10
Addition and Scalar Multiplication
➢ Matrix addition: The sum of two matrices 𝐴 ∈ ℝ𝑚×𝑛 , 𝐵 ∈ ℝ𝑚×𝑛 is
defined as the element-wise sum:

$$A + B = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}$$

➢ Scalar multiplication of a matrix 𝐴 ∈ ℝ𝑚×𝑛 is defined as:

$$\alpha A = \begin{bmatrix} \alpha a_{11} & \cdots & \alpha a_{1n} \\ \vdots & & \vdots \\ \alpha a_{m1} & \cdots & \alpha a_{mn} \end{bmatrix}$$
11
Example: Matrix Multiplication
➢ For two matrices:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \in \mathbb{R}^{2\times 3}, \quad B = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \in \mathbb{R}^{3\times 2},$$

➢ we obtain:

$$AB = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 2 & 5 \end{bmatrix} \in \mathbb{R}^{2\times 2},$$

$$BA = \begin{bmatrix} 0 & 2 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 2 \\ -2 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} \in \mathbb{R}^{3\times 3}$$

Not commutative! 𝐴𝐵 ≠ 𝐵𝐴
12
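The same example can be checked in NumPy (a minimal sketch, not part of the slides); the @ operator performs matrix multiplication.

import numpy as np

A = np.array([[1, 2, 3],
              [3, 2, 1]])
B = np.array([[0, 2],
              [1, -1],
              [0, 1]])

print(A @ B)   # [[2 3] [2 5]]                 -- a 2x2 matrix
print(B @ A)   # [[ 6 4 2] [-2 0 2] [ 3 2 1]]  -- a 3x3 matrix
# A @ B and B @ A do not even have the same shape: matrix multiplication is not commutative.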
Basic Properties
➢ A few properties:

➢ Associativity:
∀𝐴 ∈ ℝ𝑚×𝑛 , 𝐵 ∈ ℝ𝑛×𝑝 , 𝐶 ∈ ℝ𝑝×𝑞 : (𝐴𝐵)𝐶 = 𝐴(𝐵𝐶)

➢ Distributivity:
∀𝐴, 𝐵 ∈ ℝ𝑚×𝑛 , 𝐶, 𝐷 ∈ ℝ𝑛×𝑝 : (𝐴 + 𝐵)𝐶 = 𝐴𝐶 + 𝐵𝐶
𝐴(𝐶 + 𝐷) = 𝐴𝐶 + 𝐴𝐷

13
Transpose
➢ Transpose definition: For 𝐴 ∈ ℝ𝑚×𝑛 the matrix 𝐵 ∈ ℝ𝑛×𝑚 with 𝑏𝑖𝑗 =
𝑎𝑗𝑖 is called the transpose of A. We write 𝐵 = 𝐴𝑇 .
➢ Symmetric Matrix: A matrix 𝐴 ∈ ℝ𝑛×𝑛 is symmetric if 𝐴 = 𝐴𝑇 .
➢ Some useful identities:
𝐴𝐴−1 = 𝐼 = 𝐴−1 𝐴
(𝐴𝐵)−1 = 𝐵 −1 𝐴−1
(𝐴 + 𝐵)−1 ≠ 𝐴−1 + 𝐵 −1
(𝐴𝑇 )𝑇 = 𝐴
(𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵𝑇
(𝐴𝐵)𝑇 = 𝐵𝑇 𝐴𝑇
14
Inner Product and Outer Product
➢ The inner product between vectors of the same length is:

$$a^T b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \dots + a_n b_n = \gamma$$

The inner product is a scalar.

➢ The outer product between vectors of the same length is:

$$a b^T = \begin{bmatrix} a_1 b_1 & a_1 b_2 & \cdots & a_1 b_n \\ a_2 b_1 & a_2 b_2 & \cdots & a_2 b_n \\ \vdots & \vdots & & \vdots \\ a_n b_1 & a_n b_2 & \cdots & a_n b_n \end{bmatrix}$$

The outer product is a matrix.

15
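A short NumPy illustration (not from the slides) of the two products, for two example vectors chosen here purely for illustration:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner = a @ b          # scalar: 1*4 + 2*5 + 3*6 = 32
outer = np.outer(a, b) # 3x3 matrix with entries a_i * b_j

print(inner)           # 32
print(outer.shape)     # (3, 3)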
Identity Matrix
➢ We define the identity matrix as shown:

$$I_n := \begin{bmatrix} 1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 1 \end{bmatrix} \in \mathbb{R}^{n\times n}$$

➢ Any matrix multiplied by the identity will not change the matrix:

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1+0+0 & 0+2+0 & 0+0+3 \\ 4+0+0 & 0+5+0 & 0+0+6 \\ 7+0+0 & 0+8+0 & 0+0+9 \end{bmatrix}$$

16
Inverse
➢ If square matrices 𝐴 ∈ ℝ𝑛×𝑛 and 𝐵 ∈ ℝ𝑛×𝑛 have the property that
𝐴𝐵 = 𝐼𝑛 = 𝐵𝐴, then B is called the inverse of A and is denoted by A⁻¹.

➢ Example: these matrices are inverses of each other:

$$A = \begin{bmatrix} 1 & 2 & 1 \\ 4 & 4 & 5 \\ 6 & 7 & 7 \end{bmatrix}, \quad B = \begin{bmatrix} -7 & -7 & 6 \\ 2 & 1 & -1 \\ 4 & 5 & -4 \end{bmatrix}$$

➢ We’ll look at how to calculate the inverse later


17
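A quick NumPy check (illustrative, not from the slides) that the two matrices above are indeed inverses of each other:

import numpy as np

A = np.array([[1, 2, 1],
              [4, 4, 5],
              [6, 7, 7]])
B = np.array([[-7, -7, 6],
              [2, 1, -1],
              [4, 5, -4]])

print(A @ B)                          # the identity matrix I_3
print(np.allclose(A @ B, np.eye(3)))  # True
print(np.allclose(B @ A, np.eye(3)))  # True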
Solving Systems of Linear Equations
➢ Given A and b, we want to solve for x:
$$Ax = b: \quad \begin{bmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix}$$
➢ Key to solving a system of linear equations are elementary transformations
that keep the solution set the same but transform the equation system into
a simpler form.
1. Exchange of two equations (rows in the matrix)
2. Multiplication of an equation (row) with a constant
3. Addition of two equations (rows)

➢ This is known as Gaussian Elimination (aka row reduction)


18
Triangular Linear Systems
➢ Consider a square linear system with an upper triangular matrix (non-zero diagonals):

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

➢ We can solve this system bottom to top using substitution:

$$a_{33} x_3 = b_3 \;\Rightarrow\; x_3 = \frac{b_3}{a_{33}}$$
$$a_{22} x_2 + a_{23} x_3 = b_2 \;\Rightarrow\; x_2 = \frac{b_2 - a_{23} x_3}{a_{22}}$$
$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1 \;\Rightarrow\; x_1 = \frac{b_1 - a_{13} x_3 - a_{12} x_2}{a_{11}}$$
19
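A minimal Python sketch of this bottom-to-top substitution for an upper triangular system (illustrative only; the function and variable names are my own):

import numpy as np

def back_substitution(U, b):
    """Solve U x = b where U is upper triangular with non-zero diagonal."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                  # start from the bottom row
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0, 1.0],
              [0.0, -8.0, -2.0],
              [0.0, 0.0, 1.0]])
b = np.array([5.0, -12.0, 2.0])
print(back_substitution(U, b))   # [1. 1. 2.], matching the worked example on the next slide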
Example: Gaussian Elimination
➢ Gaussian elimination uses elementary row operations to transform a
linear system into a triangular system:
$$\begin{aligned} 2x_1 + x_2 + x_3 &= 5 \\ 4x_1 - 6x_2 &= -2 \\ -2x_1 + 7x_2 + 2x_3 &= 9 \end{aligned}
\qquad
\left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 4 & -6 & 0 & -2 \\ -2 & 7 & 2 & 9 \end{array}\right]$$

➢ Add -2 times first row to second
➢ Add 1 times first row to third

$$\begin{aligned} 2x_1 + x_2 + x_3 &= 5 \\ -8x_2 - 2x_3 &= -12 \\ 8x_2 + 3x_3 &= 14 \end{aligned}
\qquad
\left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 8 & 3 & 14 \end{array}\right]$$

➢ Add 1 times second row to third

$$\begin{aligned} 2x_1 + x_2 + x_3 &= 5 \\ -8x_2 - 2x_3 &= -12 \\ x_3 &= 2 \end{aligned}
\qquad
\left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right] \quad \text{Row Echelon form}$$
20
Row Echelon Form (REF)

• The first non-zero coefficient from the left (the “leading


coefficient”) is always to the right of the first non-zero
coefficient in the row above.
• Rows consisting of all zero coefficients are at the bottom of
the matrix.

$$\left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right] \quad \text{Row Echelon form}$$
21
Example: Reduced Row Echelon Form
➢ We can simplify this even further:
$$\begin{aligned} 2x_1 + x_2 + x_3 &= 5 \\ -8x_2 - 2x_3 &= -12 \\ x_3 &= 2 \end{aligned}
\qquad
\left[\begin{array}{ccc|c} 2 & 1 & 1 & 5 \\ 0 & -8 & -2 & -12 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

➢ Divide first row by 2
➢ Divide 2nd row by -8

$$\begin{aligned} x_1 + 0.5x_2 + 0.5x_3 &= 2.5 \\ x_2 + 0.25x_3 &= 1.5 \\ x_3 &= 2 \end{aligned}
\qquad
\left[\begin{array}{ccc|c} 1 & 0.5 & 0.5 & 2.5 \\ 0 & 1 & 0.25 & 1.5 \\ 0 & 0 & 1 & 2 \end{array}\right]$$

➢ Add -0.25 times third row to second row
➢ Add -0.5 times third row to first row
➢ Add -0.5 times second row to first row

$$\begin{aligned} x_1 &= 1 \\ x_2 &= 1 \\ x_3 &= 2 \end{aligned}
\qquad
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{array}\right] \quad \text{Reduced Row Echelon form}$$
22
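For reference (not from the slides), SymPy can reduce the same augmented matrix to reduced row echelon form automatically; Matrix.rref() returns the RREF together with the indices of the pivot columns.

from sympy import Matrix

aug = Matrix([[2, 1, 1, 5],
              [4, -6, 0, -2],
              [-2, 7, 2, 9]])

rref_matrix, pivot_cols = aug.rref()
print(rref_matrix)   # rows [1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 2]
print(pivot_cols)    # (0, 1, 2)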
The coefficient matrix could be non-square
Example 2.6 from the MML book (reading assignment 4):
Four equations and five unknowns


Alternative Method: Inverse Matrix
➢ We can also solve linear systems of equations (of square
matrices) by applying the inverse.
➢ The solution to 𝐴𝑥 = 𝑏 can be obtained by multiplying by
𝐴−1 to isolate for x.

$$Ax = b \;\Rightarrow\; A^{-1}Ax = A^{-1}b \;\Rightarrow\; I_n x = A^{-1}b \;\Rightarrow\; x = A^{-1}b$$

Note that 𝐴⁻¹ will cancel out 𝐴 only if multiplied from the left-hand side; otherwise we have 𝐴𝑥𝐴⁻¹.
29
Calculating an Inverse Matrix
➢ To determine the inverse of a matrix A:

$$A = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

➢ Write down the augmented matrix with the identity on the right-hand side:

$$\left[\begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{array}\right]$$

➢ Apply Gaussian elimination to bring it into reduced row-echelon form. The desired inverse is given as its right-hand side:

$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & -1 & 2 & -2 & 2 \\ 0 & 1 & 0 & 0 & 1 & -1 & 2 & -2 \\ 0 & 0 & 1 & 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 0 & 1 & -1 & 0 & -1 & 2 \end{array}\right]$$

➢ We can verify that this is indeed the inverse by performing the multiplication 𝐴𝐴⁻¹ and observing that we recover 𝐼𝑛:

$$A^{-1} = \begin{bmatrix} -1 & 2 & -2 & 2 \\ 1 & -1 & 2 & -2 \\ 1 & -1 & 1 & -1 \\ -1 & 0 & -1 & 2 \end{bmatrix}$$
30
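A brief NumPy check (illustrative, not part of the slides) of the example above:

import numpy as np

A = np.array([[1, 0, 2, 0],
              [1, 1, 0, 0],
              [1, 2, 0, 1],
              [1, 1, 1, 1]])

A_inv = np.linalg.inv(A)
print(np.round(A_inv))                    # matches the matrix derived by Gaussian elimination
print(np.allclose(A @ A_inv, np.eye(4)))  # True: A @ A_inv recovers I_4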
What can go wrong?
➢ Applying Gaussian Elimination (row reduction) does not always lead
to a solution.
➢ Singular Case: When we have a 0 in a pivot column. This is an
example of a matrix that is not invertible.
➢ For example:

$$\left[\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 0 & 0 & 3 & -2 \\ 0 & 0 & 4 & 2 \end{array}\right] \;\text{(singular)} \qquad \left[\begin{array}{ccc|c} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 0 \end{array}\right] \;\text{(singular)}$$

31
What can go wrong?
➢ To understand this better, it helps to consider matrices from a geometric perspective.
32
Several Interpretations
➢ Given A and b, we want to solve for x:
$$\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}: \quad \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$$
➢ This can be given several interpretations:
➢ By rows: X is the intersection of hyper-planes:
2𝑥 − 𝑦 = 1
𝑥+𝑦 =5
➢ By columns: X is the linear combination that gives b:
$$x \begin{bmatrix} 2 \\ 1 \end{bmatrix} + y \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$$

➢ Transformation: X is the vector transformed to b:
𝑇(𝑥) = 𝑏
33
Geometry of Linear Equations
➢ By Rows: find the intersection of hyperplanes
➢ By Columns: find the linear combination of columns:

$$x \begin{bmatrix} 2 \\ 1 \end{bmatrix} + y \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$$

[Figure: left, the hyperplanes from row 1 and row 2 intersect at the solution x; right, b is obtained as a linear combination of Column 1 and Column 2, with the coefficients giving x.]

34
What can go wrong?
➢ By rows:

[Figure: left, the hyperplanes from row 1 and row 2 are parallel (no intersection); right, they coincide (infinite intersection).]


35
No solution
➢ By columns:

[Figure: the columns of the matrix span a column-space; b lies outside the column space, so there is no solution.]

36
Infinite solution
➢ By columns:

[Figure: the columns of the matrix span a column-space; b lies in the column space, giving infinitely many solutions.]

37
Solutions to Ax=b
➢ Q: In general, when does Ax=b have a unique solution?

➢ A: When b is in the column-space of A, and the columns of A are linearly independent.

[Figure: a vector outside the column space (no solution) and a vector inside the column space (infinite solutions).]

➢ Q: What does it mean to be independent?

38
Linear Dependence
➢ A vector is linearly dependent on a set of vectors if it can be written as a
linear combination of them:
$$c = \alpha_1 b_1 + \alpha_2 b_2 + \dots + \alpha_n b_n$$

➢ We say that c is “linearly dependent” on {b1, b2, …, bn}, and that the set
{c,b1, b2, …, bn} is “linearly dependent”

39
Linear Independence
➢ A set of vectors is either linearly dependent or linearly independent.
➢ If the vectors are independent, then there is no way to represent any of
the vectors as a combination of the others.

40
Linear Dependence vs Independence
➢ Independence in R2:

[Figure: four pairs of vectors in ℝ², each pair labeled Dependent or Independent.]
41
Linear Independence
➢ Consider a set of three vectors {𝑥1 , 𝑥2 , 𝑥3 } ∈ ℝ4:

$$x_1 = \begin{bmatrix} 1 \\ 2 \\ -3 \\ 4 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 2 \end{bmatrix}, \quad x_3 = \begin{bmatrix} -1 \\ -2 \\ 1 \\ 1 \end{bmatrix}$$

➢ To check whether they are linearly dependent, we write the vectors 𝑥𝑖 , 𝑖 = 1, 2, 3, as the columns of a matrix and apply elementary row operations until we identify the pivot columns:

$$\begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & -2 \\ -3 & 0 & 1 \\ 4 & 2 & 1 \end{bmatrix} \;\rightsquigarrow\; \begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

➢ All column vectors are linearly independent if and only if all columns are pivot columns.
➢ If there is at least one non-pivot column, the vectors are linearly dependent.
42
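One way (not from the slides) to check this numerically is via the matrix rank: the three vectors are independent exactly when the rank of the column matrix equals 3.

import numpy as np

X = np.array([[1, 1, -1],
              [2, 1, -2],
              [-3, 0, 1],
              [4, 2, 1]])   # columns are x1, x2, x3

print(np.linalg.matrix_rank(X))   # 3 -> all columns are pivot columns, so the vectors are independent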
Subspace
➢ Subspaces generated in R2:

The set of all linear combinations of vectors in a set 𝒜 = {𝑥1 , … , 𝑥𝑘 } ⊆ 𝒱 is called the span of 𝒜.
If 𝒜 spans the vector space V, we write 𝑉 = span[𝒜] or 𝑉 = span[𝑥1 , … , 𝑥𝑘 ].

[Figure: four panels of vectors in ℝ² and their spans – span({x1,x2}) = line, span({x1,x2}) = ℝ², span({x1,x2}) = ℝ², span({x1,x2,x3}) = ℝ².]
43
Basis
➢ The vectors that span a subspace are not unique
➢ However, the minimum number of vectors needed to span a subspace is
unique
➢ This number is called the dimension or rank of the subspace
➢ A minimal set of vectors that span a subspace is called a basis for the
space
➢ The vectors in a basis must be linearly independent, otherwise we
could remove one and still span the space

44
Basis
➢ Basis in vector space V ∈ R2:

Every linearly independent set of vectors that spans V is called a basis of V.

[Figure: four sets of vectors in ℝ², labeled Basis, Not a basis, Basis, Not a basis.]


45
Example Bases
➢ In ℝ3, the canonical/standard basis is:

$$\mathcal{B} = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$$

➢ Two different bases of ℝ3 are:

$$\mathcal{B}_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}, \quad
\mathcal{B}_2 = \left\{ \begin{bmatrix} 0.5 \\ 0.8 \\ 0.4 \end{bmatrix}, \begin{bmatrix} 1.8 \\ 0.3 \\ 0.3 \end{bmatrix}, \begin{bmatrix} -2.2 \\ -1.3 \\ 3.5 \end{bmatrix} \right\}$$
46
Linear Mapping/Transformation
➢ A vector has different coordinate representations depending on which coordinate system or basis is chosen.

➢ Example: two different coordinate systems defined by two sets of basis vectors:

$$x = 2e_1 + 3e_2 \qquad\text{and}\qquad x = -\tfrac{1}{2} b_1 + \tfrac{5}{2} b_2$$

[Figure: the same vector x drawn with respect to two different bases, (e1, e2) and (b1, b2).]
47
Source: Eli Bendersky

Example: Change of Basis Matrix

[Figure: two bases (v1, v2) and (u1, u2) and a vector f drawn in the plane.]

$$[u_1]_v = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad [u_2]_v = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \quad [f]_v = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \quad [f]_u = \;?$$
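A possible way to finish this example (my sketch, not the slide's worked solution): the u-coordinates of f are the coefficients c1, c2 with c1·[u1]v + c2·[u2]v = [f]v, i.e. the solution of a small linear system.

import numpy as np

M = np.array([[2.0, 4.0],   # columns are [u1]_v and [u2]_v
              [3.0, 5.0]])
f_v = np.array([2.0, 4.0])

f_u = np.linalg.solve(M, f_v)
print(f_u)   # [ 3. -1.]  ->  f = 3*u1 - 1*u2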
Examples of Transforms
➢ Scale
➢ Rotation
➢ Horizontal Mirror
➢ Vertical Mirror
➢ Combination of Transformations

Source: mathisfun.com 49
Part 2
Analytical Geometry

Readings:
• Chapter 3.1-5,8,9 MML Textbook
Norms
➢ A norm is a scalar measure of a vector’s length.
➢ The most important norm is the Euclidean norm, which for 𝑥 ∈ ℝ𝑛 is defined as:

$$\lVert x \rVert = \lVert x \rVert_2 := \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}$$

➢ It computes the Euclidean distance of x from the origin.

Euclidean norm is also known as the L2 norm


51
Norms
➢ For different norms, the red lines indicate the set of vectors with norm 1.

[Figure: the unit balls ‖x‖₁ = 1 (a diamond, Manhattan norm) and ‖x‖₂ = 1 (a circle, Euclidean distance).]

52
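A small NumPy illustration (not from the slides) of the two norms for an example vector:

import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, ord=1))   # 7.0  -> Manhattan (L1) norm: |3| + |-4|
print(np.linalg.norm(x, ord=2))   # 5.0  -> Euclidean (L2) norm: sqrt(3^2 + 4^2)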
Dot product
➢ Dot product:

$$x^T y = \sum_{i=1}^{n} x_i y_i$$

$$a_1 \cdot b_1 = \begin{bmatrix} 1 \\ 7 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 5 \end{bmatrix} = 1 \cdot 3 + 7 \cdot 5 = 38$$

➢ Commonly, the dot product between two vectors a, b is denoted by 𝑎ᵀ𝑏 or ⟨𝑎, 𝑏⟩.

53
Lengths and Distances
➢ Consider an inner product space.

➢ Then
$$d(x, y) := \lVert x - y \rVert = \sqrt{\langle x - y, x - y \rangle}$$

is called the distance between x and y for 𝑥, 𝑦 ∈ 𝑉.

➢ If we use the Euclidean norm, then the distance is called Euclidean


distance.

54
Angles
➢ The angle 𝜽 between two vectors 𝒙, 𝒚 is computed using the inner product.

➢ For example: let us compute the angle between 𝑥 = [1,1]ᵀ ∈ ℝ2 and 𝑦 = [1,2]ᵀ ∈ ℝ2.

➢ Using the dot product as the inner product we get:

$$\cos\theta = \frac{\langle x, y \rangle}{\sqrt{\langle x, x \rangle \langle y, y \rangle}} = \frac{x^T y}{\sqrt{x^T x \, y^T y}} = \frac{3}{\sqrt{10}}$$

➢ Then the angle between the two vectors is $\cos^{-1}\!\left(\frac{3}{\sqrt{10}}\right) \approx 0.32$ rad, which corresponds to approximately 18°.

[Figure: the vectors x and y in the plane with the angle θ between them.]
55
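The same computation in NumPy (a minimal sketch, not from the slides):

import numpy as np

x = np.array([1.0, 1.0])
y = np.array([1.0, 2.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)

print(theta)              # ~0.3217 rad
print(np.degrees(theta))  # ~18.4 degrees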
Orthogonality
➢ Orthonormal = Orthogonal and unit vectors

➢ Orthogonal Matrix: A square matrix 𝐴 ∈ ℝ𝑛×𝑛 is an orthogonal


matrix if and only if its columns are orthonormal so that
𝐴𝐴𝑇 = 𝐼 = 𝐴𝑇 𝐴,
➢ which implies that
𝐴−1 = 𝐴𝑇 ,
i.e., the inverse is obtained by simply transposing the matrix.

56
Orthonormal Basis
➢ In n-dimensional space we need n linearly independent basis vectors. If these vectors are additionally orthogonal to each other and each has length 1, we get a special case: an orthonormal basis.
➢ Consider an n-dimensional vector space V and a basis 𝑏1 , … , 𝑏𝑛 of V. If

$$\langle b_i, b_j \rangle = 0 \;\;\text{for } i \neq j, \qquad \langle b_i, b_i \rangle = 1$$

for all 𝑖, 𝑗 = 1, … , 𝑛 then the basis is called an orthonormal basis (ONB). Note that ⟨𝑏𝑖 , 𝑏𝑖⟩ = 1 implies that every basis vector has length/norm 1.

➢ If only ⟨𝑏𝑖 , 𝑏𝑗⟩ = 0 for 𝑖 ≠ 𝑗 is satisfied, then the basis is called an orthogonal basis.
57
Orthonormal Basis
➢ The canonical/standard basis for a Euclidean vector space ℝ𝒏 is
an orthonormal basis, where the inner product is the dot product
of vectors.

➢ Example: In ℝ2, the vectors

$$b_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad b_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

➢ form an orthonormal basis since $b_1^T b_2 = 0$ and $\lVert b_1 \rVert = 1 = \lVert b_2 \rVert$.

58
Orthogonal Projections

➢ Projections are linear transformations that project onto a lower-dimensional feature space


59
Orthogonal Projections

➢ The projection is defined as:

$$\pi_U(\boldsymbol{x}) = \lambda\boldsymbol{b} = \frac{\boldsymbol{b}^T\boldsymbol{x}}{\lVert \boldsymbol{b} \rVert^2}\,\boldsymbol{b} = \frac{\boldsymbol{b}\boldsymbol{b}^T}{\lVert \boldsymbol{b} \rVert^2}\,\boldsymbol{x}$$

60
Example: Orthogonal Projections

➢ Compute the projection of 𝑥 = [1,2]ᵀ ∈ ℝ2 onto 𝑏 = [1,1]ᵀ ∈ ℝ2:

$$\pi_U(\boldsymbol{x}) = \frac{\boldsymbol{b}\boldsymbol{b}^T}{\lVert \boldsymbol{b} \rVert^2}\, \boldsymbol{x}$$

[Figure: x and b drawn in the plane; the projection of x lands on the line spanned by b.]

61
Projection Matrix
➢ We can also use a projection matrix, which allows us to project any vector x onto the subspace defined by 𝜋:

$$\pi_U(\boldsymbol{x}) = \frac{\boldsymbol{b}\boldsymbol{b}^T}{\lVert \boldsymbol{b} \rVert^2}\,\boldsymbol{x}, \qquad P_\pi = \frac{\boldsymbol{b}\boldsymbol{b}^T}{\lVert \boldsymbol{b} \rVert^2}$$

➢ Note that 𝒃𝒃ᵀ will be a symmetric matrix

62
Example: Applying Projection Matrix

➢ Compute the projection matrix for 𝑏 = [1,1]ᵀ ∈ ℝ2:

$$P_\pi = \frac{\boldsymbol{b}\boldsymbol{b}^T}{\lVert \boldsymbol{b} \rVert^2}$$

[Figure: b drawn in the plane, defining the line onto which vectors are projected.]

63
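A short NumPy sketch (my own, not the slides' worked solution) that carries out both projection examples above for b = [1, 1]ᵀ and x = [1, 2]ᵀ:

import numpy as np

b = np.array([1.0, 1.0])
x = np.array([1.0, 2.0])

# Projection matrix P = b b^T / ||b||^2
P = np.outer(b, b) / (b @ b)
print(P)          # [[0.5 0.5] [0.5 0.5]]

# Projection of x onto the line spanned by b
print(P @ x)      # [1.5 1.5]

# P is symmetric and idempotent (projecting twice changes nothing)
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True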
Part 3
Data Augmentation
What do these datasets have in common?

➢ How can we improve these datasets?
65

ML Performance Benchmarks
➢ C10+ and C100+ highlight the error rates after data augmentation
➢ Data augmentation was found to consistently lower the error rates!

66
Non-Representative Data
➢ Everything our algorithms learn comes from the data used to train them.

➢ If the data is of poor quality, unbalanced, or not representative of the task we want to solve, then how are our algorithms going to learn to generalize?

67
Capacity and Training
➢ Deep learning algorithms have the capacity to classify real images in various orientations and scales.

➢ If you train your algorithms on perfectly processed samples, then they won’t know how to predict anything but perfectly cropped images.

68
Data Augmentation
➢ Use linear algebra to perform common transformations to supplement datasets (a small sketch follows this slide):
➢ Translation, Scaling, Rotation, Reflection
➢ Noise, Light and Colour Intensity
➢ Many more…

Source: Viridian Martinez

➢ Advanced:
➢ Generative models (i.e., deep learning) to create new images with similar characteristics [Figure: GAN "fake celebrities"]
Source: kaggle.com 69
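A minimal NumPy sketch (my own illustration, not the course's code) of a few such transformations applied to an image stored as an array; the rotation uses a 2x2 rotation matrix applied to pixel coordinates, which is where the linear algebra shows up.

import numpy as np

def augment(image, angle_deg=15.0, noise_std=0.05, rng=None):
    """Return a list of simple augmented copies of a (H, W) grayscale image."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape

    # 1) Reflection (horizontal mirror)
    flipped = image[:, ::-1]

    # 2) Additive Gaussian noise
    noisy = image + rng.normal(0.0, noise_std, size=image.shape)

    # 3) Rotation about the image centre using a rotation matrix (nearest-neighbour sampling)
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    ys, xs = np.indices((h, w))
    coords = np.stack([ys.ravel() - h / 2, xs.ravel() - w / 2])   # centre the coordinates
    src = R.T @ coords                                            # inverse-rotate to find source pixels
    src_y = np.clip(np.round(src[0] + h / 2).astype(int), 0, h - 1)
    src_x = np.clip(np.round(src[1] + w / 2).astype(int), 0, w - 1)
    rotated = image[src_y, src_x].reshape(h, w)

    return [flipped, noisy, rotated]

# Example usage on a random "image"
img = np.random.default_rng(1).random((28, 28))
for aug in augment(img):
    print(aug.shape)   # each augmented copy keeps the original shape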
Test Time Data Augmentation
➢ You can also apply data augmentation to better
evaluate your performance on test examples.
➢ Great way to assess limitations of your model to
images of different rotations, scales, noise, etc.

70
Next Time
➢ Week 6 labs on Wednesday and Thursday: Q&A sessions before the midterm
➢ Week 6 sessions on Friday: midterm, no lectures

➢ Project 2 is due on Oct 27th

➢ Week 7: Lecture 7 – Dimensionality Reduction
➢ Curse of Dimensionality
➢ Eigendecomposition
➢ Principal Component Analysis

71
Google Colab
