Programming Basics and AI Lecture
Lectures on YouTube:
https://www.youtube.com/@mathtalent
Seongjai Kim
• mathematical analysis,
• generating computational algorithms,
• profiling algorithms’ accuracy and cost, and
• the implementation of algorithms in selected programming languages
(commonly referred to as coding).
The source code of a program can be written in one or more programming languages.
The manuscript is conceived as an introduction to the thriving field of information engineering, particularly for early-year college students who are interested in mathematics, engineering, and other sciences, without an already strong background in computational methods. It will also be suitable for talented high school students. All examples to be treated in this manuscript are implemented in Matlab and Python, and occasionally in Maple.
Contents
Title ii
Prologue iii
1 Programming Basics 1
1.1. What is Programming or Coding? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1. Programming: Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2. Simple form of programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.3. Functions: generalization and reusability . . . . . . . . . . . . . . . . . . . . . . 6
1.1.4. Becoming a good programmer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2. Matlab: A Powerful Computer Language . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1. Introduction to Matlab/Octave . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2. Graphics with Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2.3. Repetition: iteration loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.4. Loop control statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.2.5. Anonymous function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.2.6. Open source alternatives to Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Exercises for Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6 Fundamentals of AI 131
6.1. What is Artificial Intelligence (AI)? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.2. Constituents of AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3. Designing Artificial Brains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.4. Future of AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
P Projects 251
P.1. Edge Detection, using Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
P.2. Number Plate Detection, using Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Bibliography 255
Index 257
Chapter 1. Programming Basics
Contents of Chapter 1
1.1. What is Programming or Coding? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. Matlab: A Powerful Computer Language . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Exercises for Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Let Q = 5.
1. initialization: p
2. for i = 1, 2, · · · , itmax
p ← (p + Q/p)/2;
3. end for
squareroot_Q.m
1 Q=5;
2
3 p = 1;
4 for i=1:8
5 p = (p+Q/p)/2;
6 fprintf("%3d %.20f\n",i,p)
7 end
Output
1 1 3.00000000000000000000
2 2 2.33333333333333348136
3 3 2.23809523809523813753
4 4 2.23606889564336341891
5 5 2.23606797749997809888
6 6 2.23606797749978980505
7 7 2.23606797749978980505
8 8 2.23606797749978980505
squaresum.m
1 function sqsum = squaresum(m,n)
2 % function sqsum = squaresum(m,n)
3 %    Evaluates the square sum of consecutive integers: m to n.
4 %    input: m,n
5 %    output: sqsum
6
7 sqsum = 0;
8 for i=m:n
9 sqsum = sqsum + i^2;
10 end
• Lines 2–5 of squaresum.m, beginning with the percent sign (%), serve as
a convenient user interface. The built-in function help can be used
whenever we want to see the comments the programmer has provided for the
function.
• For example,
help
1 >> help squaresum
2 function sqsum = squaresum(m,n)
3 Evaluates the square sum of consecutive integers: m to n.
4 input: m,n
5 output: sqsum
• The last four lines of squaresum.m include the required operations for
the given task.
• On the command window, the function is called for various m and n.
For example,
1 >> squaresum(1,10)
2 ans = 385
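For readers following along in Python, here is a minimal sketch of the same function; the docstring plays the role of the Matlab help comments (this Python version is an illustration, not part of the original squaresum.m):

import builtins

def squaresum(m, n):
    """Evaluates the square sum of consecutive integers: m to n.
    input:  m, n
    output: sqsum
    """
    sqsum = 0
    for i in range(m, n + 1):
        sqsum += i**2
    return sqsum

print(squaresum(1, 10))     # 385
builtins.help(squaresum)    # prints the docstring, like Matlab's help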
SortArray.m
1 % User parameter
2 n=10;
3
5 %% initial setting
6 S = R;
7
11 %% begin sorting
12 for j=n:-1:2 %index for the largest among remained
13 for i=1:j-1
14 if S(i) > S(i+1)
15 tmp = S(i);
16 S(i) = S(i+1);
17 S(i+1) = tmp;
18 end
19 end
20 end
Output
1 >> SortArray
2 R =
3 33 88 75 17 91 94 79 36 2 72
4 S =
5 2 17 33 36 72 75 79 88 91 94
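A minimal Python sketch of the same bubble-sort idea is given below (an illustration only; the random array R stands in for Matlab's randi):

import random

n = 10
R = [random.randint(1, 100) for _ in range(n)]   # random array, like randi
S = R[:]                                         # sort a copy of R

# begin sorting: push the largest of the remaining entries to the right
for j in range(n - 1, 0, -1):
    for i in range(j):
        if S[i] > S[i + 1]:
            S[i], S[i + 1] = S[i + 1], S[i]

print('R =', R)
print('S =', S)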
For example,
Vectors and Matrices
1 >> v = [1; 2; 3] % column vector
2 v =
3 1
4 2
5 3
6 >> w = [5, 6, 7, 8] % row vector
7 w =
8 5 6 7 8
9 >> A = [2 1; 1 2] % matrix
10 A =
11 2 1
12 1 2
13 >> B = [2, 1; 1, 2]
14 B =
15 2 1
16 1 2
• The symbols (,) and (;) can be used to combine more than one command
in the same line.
• If we use semicolon (;), Matlab sets the variable but does not print the
output.
For example,
1 >> c1=[1; 2]; c2=[3; 4];
2 >> M=[c1,c2]
3 M =
4 1 3
5 2 4
6 >> c3=[5; 6];
7 >> M=[M,c3]
8 M =
9 1 3 5
10 2 4 6
11 >> c4=c1; r3=[2 -1 5 0];
12 >> N=[M, c4; r3]
13 N =
14 1 3 5 1
15 2 4 6 2
16 2 -1 5 0
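The same constructions carry over to Python with NumPy; the following is a small sketch (not part of the original Matlab session):

import numpy as np

v = np.array([[1], [2], [3]])          # column vector
w = np.array([5, 6, 7, 8])             # row vector
A = np.array([[2, 1], [1, 2]])         # matrix

c1 = np.array([[1], [2]]); c2 = np.array([[3], [4]])
M = np.hstack([c1, c2])                          # M = [c1, c2]
M = np.hstack([M, np.array([[5], [6]])])         # append a column: [M, c3]
r3 = np.array([[2, -1, 5, 0]])
N = np.vstack([np.hstack([M, c1]), r3])          # N = [M, c4; r3]
print(N)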
For example,
1 >> M=[1 2 3 4; 5 6 7 8; 9 10 11 12], v=[1;-2;2;1];
2 M =
3 1 2 3 4
4 5 6 7 8
5 9 10 11 12
6 >> M(2,3)
7 ans =
8 7
9 >> M(3,[2 4])
10 ans =
11 10 12
12 >> M(:,2)
13 ans =
14 2
15 6
16 10
17 >> 3*v
18 ans =
19 3
20 -6
21 6
22 3
23 >> M*v
24 ans =
25 7
26 15
27 23
For example,
fig_plot.m
1 close all
2
3 %% a curve
4 X1=linspace(0,2*pi,11); % n=11
5 Y1=cos(X1);
6
7 %% another curve
8 X2=linspace(0,2*pi,51);
9 Y2=sin(X2);
10
11 %% plot together
12 plot(X1,Y1,'-or', X2,Y2,'-b','linewidth',2)
13 legend({'y=cos(x)','y=sin(x)'})
14 axis tight
15 print -dpng 'fig_cos_sin.png'
The command doc opens the Help browser. If the Help browser is already
open, but not visible, then doc brings it to the foreground and opens a
new tab. Try doc surf, followed by doc contour.
While loop
The while loop repeatedly executes statements while a specified condi-
tion is true. The syntax of a while loop in Matlab is as follows.
while <expression>
<statements>
end
An expression is true when the result is nonempty and contains all
nonzero elements, logical or real numeric; otherwise the expression is
false.
%% while loop
a=10; b=15;
fprintf('while loop execution: a=%d, b=%d\n',a,b);
while a<=b
fprintf(' The value of a=%d\n',a);
a = a+1;
end
When the code above is executed, the result will be:
while loop execution: a=10, b=15
The value of a=10
The value of a=11
The value of a=12
The value of a=13
The value of a=14
The value of a=15
For loop
A for loop is a repetition control structure that allows you to efficiently write a loop that needs to execute a specific number of times. The syntax of a for loop in Matlab is as follows:
for index = values
<program statements>
end
Here values can be any list of numbers. For example:
• initval:endval – increments the index variable from initval to
endval by 1, and repeats execution of program statements while in-
dex is not greater than endval.
• initval:step:endval – increments index by the value step on each
iteration, or decrements when step is negative.
Example 1.17. The code in Example 1.16 can be rewritten as a for loop.
%% for loop
a=10; b=15;
fprintf('for loop execution: a=%d, b=%d\n',a,b);
for i=a:b
fprintf(' The value of i=%d\n',i);
end
When the code above is executed, the result will be:
for loop execution: a=10, b=15
The value of i=10
The value of i=11
The value of i=12
The value of i=13
The value of i=14
The value of i=15
Nested loops
Matlab also allows you to use one loop inside another loop. The syntax for a nested loop in Matlab is as follows:
for n = n0:n1
for m = m0:m1
<statements>;
end
end
The syntax for a nested while loop statement in Matlab is as follows:
while <expression1>
while <expression2>
<statements>;
end
end
For nested loops, you can combine
• a for loop and a while loop
• more than two loops
Note: Loop control statements change execution from its normal sequence.
• When execution leaves a scope, all automatic objects that were created in that scope are destroyed.
• A scope defines where variables are valid in Matlab; typically, the scope associated with a loop body extends from the beginning to the end of the conditional code. It tells Matlab what to do when the conditional code fails in the loop.
• Matlab supports both the break statement and the continue statement.
Break Statement
The break statement terminates execution of for or while loops.
• Statements in the loop that appear after the break statement are
not executed.
• In nested loops, break exits only from the loop in which it occurs.
• Control passes to the statement following the end of that loop.
Example 1.18. Let’s modify the code in Example 1.16 to involve a break
statement.
%% "break" statement with while loop
a=10; b=15; c=12.5;
fprintf('while loop execution: a=%d, b=%d, c=%g\n',a,b,c);
while a<=b
fprintf(' The value of a=%d\n',a);
if a>c, break; end
a = a+1;
end
When the code above is executed, the result is:
while loop execution: a=10, b=15, c=12.5
The value of a=10
The value of a=11
The value of a=12
The value of a=13
When the condition a>c is satisfied, break is invoked, which terminates the while loop.
Continue Statement
continue passes control to the next iteration of a for or while loop.
• It skips any remaining statements in the body of the loop for the
current iteration; the program continues execution from the next
iteration.
• continue applies only to the body of the loop where it is called.
• In nested loops, continue skips remaining statements only in the
body of the loop in which it occurs.
%% "continue" statement with for loop
a=10; b=15;
fprintf('for loop execution: a=%d, b=%d\n',a,b);
for i=a:b
if mod(i,2), continue; end % even integers, only
disp([' The value of i=' num2str(i)]);
end
When the code above is executed, the result is:
for loop execution: a=10, b=15
The value of i=10
The value of i=12
The value of i=14
9 %% Calculus
10 q = integral(f,1,3)
Output
1 >> anonymous_function
2 f1 =
3 -2
4 fX =
5 -2 4 22 58 118 208
6 q =
7 12
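The listing above is truncated; from the printed output, the anonymous function appears to be f(x) = x^3 − x − 2 (so that f(1) = −2 and the integral over [1, 3] equals 12). A short Python sketch reproducing the same values, with a lambda playing the role of the Matlab anonymous function:

import numpy as np
from scipy.integrate import quad

f = lambda x: x**3 - x - 2          # inferred from the printed output

f1 = f(1)                           # -2
fX = f(np.arange(1, 7))             # [ -2   4  22  58 118 208]
q, _ = quad(f, 1, 3)                # 12.0

print(f1, fX, q)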
• 1:20
• 1:1:20
• 1:2:20
• 1:3:20;
• isprime(12)
• isprime(13)
• for i=3:3:30, fprintf('[i,i^2]=[%d, %d]\n',i,i^2), end
for i=3:3:30
1.2. Compose a code, written as a function, that computes the sum of the prime numbers not larger than a positive integer n.
1.3. Modify the function you made in Exercise 1.2 to also count the number of prime numbers and return the count along with the sum. For multiple outputs, the function may start with
function [sum, number] = <function_name>(inputs)
and

    T_n = Σ_{k=1}^{n} S_k.
Tabulate n, f_n, r_n, and t_n for n ≤ K = 20.
You may start with
Fibonacci_sequence.m
1 K = 20;
2 F = zeros(K);
3 F(1)=1; F(2)=F(1);
4
5 for n=3:K
6 F(n) = F(n-1)+F(n-2);
7 rn = F(n)/F(n-1);
8 fprintf("n =%3d; F = %7d; rn = %.12f\n",n,F(n),rn);
9 end
(b) Find n such that rn has 12-digit decimal accuracy to the golden ratio φ.
Ans: (b) n = 32
Chapter 2. Simple Programming Examples

Contents of Chapter 2
2.1. Area Estimation of A Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2. Visualization of Complex-Valued Solutions . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3. Inverse Functions and Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
b · (d − c) − a · (d − c) = b · (d − c) + a · (c − d)
where the sum is carried out over line segments L_i and x_i^* denotes the mid value of x on L_i.
    2·3 − 2.5·2 + 3.5·2 − 4·3 + 6·6 − 3.5·2 + 2.5·2
      = 6 − 5 + 7 − 12 + 36 − 7 + 5 = 30
• Let L_i be the i-th line segment connecting (x_{i−1}, y_{i−1}) and (x_i, y_i), i = 1, 2, · · · , n. Then the area of R can be computed using the formula

    Area(R) = Σ_{i=1}^{n} x_i^* · Δy_i,    (2.4)

where

    x_i^* = (x_{i−1} + x_i)/2,   Δy_i = y_i − y_{i−1}.
Note: The formula (2.4) is a result of Green’s Theorem for the line
integral and numerical approximation.
(a) Generate a dataset that represents the circle of radius 1 centered at the origin. For example, for i = 0, 1, 2, · · · , n,

    (x_i, y_i) = (cos θ_i, sin θ_i),   θ_i = i · (2π/n).    (2.5)

Note that (x_n, y_n) = (x_0, y_0).
(b) Analyze accuracy improvement of the area as n grows. The larger n you
choose, the more accurately the data would represent the region.
Solution.
circle.m
1 n = 10;
2 %%---- Data generation -----------------
3 theta = linspace(0,2*pi,n+1)'; % a column vector
4 data = [cos(theta),sin(theta)];
5
20 %%======================================
21 %%---- Read the data -------------------
22 %%======================================
23 DATA = load(filename);
24 X = DATA(:,1);
25 Y = DATA(:,2);
26
27 figure,
28 plot(X,Y,'b--','linewidth',2);
29 daspect([1 1 1]); axis tight
30 xlim([-1 1]), ylim([-1 1]);
31 title(['Circle: n=' int2str(n)])
32 yticks(-1:0.5:1)
33 saveas(gcf,'circle-dashed.png');
34
Accuracy Improvement
1 n = 10; area = 2.938926261462, misfit = 0.202666392127
2 n = 20; area = 3.090169943749, misfit = 0.051422709840
3 n = 40; area = 3.128689300805, misfit = 0.012903352785
4 n = 80; area = 3.138363829114, misfit = 0.003228824476
5 n = 160; area = 3.140785260725, misfit = 0.000807392864
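A minimal Python sketch of the same experiment, applying formula (2.4) directly to the circle data (2.5), is shown below; it is an illustration, not the author's circle.m:

import numpy as np

def area_closed_curve(X, Y):
    """Area of the region bounded by the closed curve (X, Y), using (2.4)."""
    xs = 0.5*(X[:-1] + X[1:])        # x_i^* = (x_{i-1} + x_i)/2
    dy = Y[1:] - Y[:-1]              # Delta y_i
    return np.sum(xs*dy)

for n in [10, 20, 40, 80, 160]:
    theta = np.linspace(0, 2*np.pi, n + 1)
    X, Y = np.cos(theta), np.sin(theta)
    area = area_closed_curve(X, Y)
    print('n = %4d; area = %.12f; misfit = %.12f' % (n, area, np.pi - area))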
    f(x) = x^2 − x + 1 = 0,    (2.6)

we can easily find that the equation has no real solutions. However, by using the quadratic formula, the complex-valued solutions are

    x = (1 ± √3 i)/2.
Here the question is:
What does the complex-valued solution mean? Can we visualize
the solutions?
    C = {x + yi | x, y ∈ R},

where i = √(−1), called the imaginary unit.
• Seeking a real-valued solution of f (x) = 0 is the same as finding
a solution of f (z) = 0, z = x + yi, restricting on the x-axis (y = 0).
• If
f (z) = A(x, y) + B(x, y) i, (2.7)
then the complex-valued solutions are the points x + yi such that
A(x, y) = B(x, y) = 0.
Ans: f(z) = (x^2 − x − y^2 + 1) + (2x − 1)y i
4 syms x y real
5
6 %% z^2 -z +1 = 0
7 A = @(x,y) x.^2-x-y.^2+1;
8 B = @(x,y) (2*x-1).*y;
9 T = 'z^2-z+1=0';
10
24 figure,
25 np=101; X=linspace(-5,5,np); Y=linspace(-5,5,np);
26 contour(X,Y,A(X,Y'), [0 0],'r','linewidth',2), hold on
27 contour(X,Y,B(X,Y'), [0 0],'b--','linewidth',2)
28 plot(double(xs),double(ys),'r.','MarkerSize',30) % the solutions
29 grid on
30 %ax=gca; ax.GridAlpha=0.5; ax.FontSize=13;
31 legend("A=0","B=0")
32 xlabel('x'), ylabel('yi'), title(['Complex solutions of ' T])
33 hold off
34 print -dpng 'complex-solutions-A-B=0.png'
Figure 2.3: Two solutions are 1/2 − 3^(1/2)/2 i and 1/2 + 3^(1/2)/2 i.
Remark 2.10. You can easily find the real part and the imaginary
part of polynomials of z = x + iy as follows.
Real and Imaginary Parts
1 syms x y real
2 z = x + 1i*y;
3
4 g = z^2 -z +1;
5 simplify(real(g))
6 simplify(imag(g))
Here “1i” (the number 1 followed by the letter i), which appears in Line 2, means the imaginary unit i = √(−1).
Output
1 ans = x^2 - x - y^2 + 1
2 ans = y*(2*x - 1)
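A Python counterpart of this symbolic computation, using SymPy, might look as follows (a sketch mirroring the Matlab commands above):

from sympy import symbols, I, re, im, expand, simplify, factor

x, y = symbols('x y', real=True)
z = x + I*y

g = z**2 - z + 1
print(simplify(re(expand(g))))   # x**2 - x - y**2 + 1
print(factor(im(expand(g))))     # y*(2*x - 1)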
y = f (x) = 2x + 1. (2.8)
1. f(x) = x^2
2. g(x) = x^2, x ≥ 0
3. h(x) = x^3
Solution.
Remark 2.17.
The definition says that if f maps x into
y, then f −1 maps y back into x. From
(2.12), we can obtain the cancellation
equations
Example 2.18. For example, if f(x) = x^3, then f^{−1}(x) = x^{1/3}, so that the cancellation equations read
Solution.
Example 2.22. Find the inverse of the function h(x) = (6 − 3x)/(5x + 7).
Solution.
Solution. Write y = x^3 + 2.
Observation 2.24.
• The graph of f^{−1} is obtained by reflecting the graph of f about the line y = x.
• (Domain of f^{−1}) = (Range of f)
Example 2.26. Sketch the graph of the function f (x) = 3 − 2x and deter-
mine its domain and range.
Solution.
Example 2.27. Table 2.1 shows data for the population of the world in
the 20th century. Figure 2.5 shows the corresponding scatter plot.
• The pattern of the data points in Figure 2.5 suggests an exponential
growth.
• Use an exponential regression algorithm to find a model of the
form
    P(t) = a · b^t,    (2.16)
where t = 0 corresponds to 1900.
Table 2.1
t (years since 1900)    Population P (millions)
  0      1650
 10      1750
 20      1860
 30      2070
 40      2300
 50      2560
 60      3040
 70      3710
 80      4450
 90      5280
100      6080
110      6870

Figure 2.5: Scatter plot for world population growth.
12
13 plot(Data(:,1),Data(:,2),'k.','MarkerSize',20)
14 xlabel('Years since 1900');
15 ylabel('Millions'); hold on
16 print -dpng 'population-data.png'
17 t = Data(:,1);
18 plot(t,a*b.^t,'r-','LineWidth',2)
19 print -dpng 'population-regression.png'
20 hold off
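Since the computation of a and b is not shown above, here is a hedged Python sketch of one standard way to obtain them: take logarithms, ln P = ln a + t ln b, and fit a straight line by least squares (the data are those of Table 2.1; the resulting values are not quoted from the text):

import numpy as np

t = np.arange(0, 120, 10)                        # years since 1900
P = np.array([1650, 1750, 1860, 2070, 2300, 2560,
              3040, 3710, 4450, 5280, 6080, 6870])   # millions

# log-transform: ln P = ln a + (ln b) t, then ordinary least squares
c1, c0 = np.polyfit(t, np.log(P), 1)             # slope, intercept
a, b = np.exp(c0), np.exp(c1)
print('P(t) ~= %.2f * %.6f^t' % (a, b))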
Laws of Exponents
If a > 0 and b > 0, the following rules hold for all real numbers x and y.
1. a^x · a^y = a^{x+y}
2. a^x / a^y = a^{x−y}
3. (a^x)^y = (a^y)^x = a^{xy}
4. a^x · b^x = (ab)^x
5. a^x / b^x = (a/b)^x
The Number e
Of all possible bases for an exponential function, there is one that is
most convenient for the purposes of calculus. The choice of a base a is
influenced by the way the graph of y = ax crosses the y-axis.
• Some of the formulas of calculus will be greatly simplified, if we
choose the base a so that the slope of the tangent line to y = ax
at x = 0 is exactly 1.
• In fact, there is such a number and it is denoted by the letter e.
(This notation was chosen by the Swiss mathematician Leonhard
Euler in 1727, probably standing for exponential.)
• It turns out that the number e lies between 2 and 3:
e ≈ 2.718282 (2.18)
1+x 1 − ex
2
Solution.
Example 2.30. Graph the function y = (1/2) e^{−x} + 1 and state the domain and range.
Solution.
    log_a x = y  ⟺  a^y = x.    (2.19)
Solution.
1. Solve y = 2^x for x:  x = log_2 y
2. Exchange x and y:  y = log_2 x
Note:
• Equation (2.19) represents the action of “solving for x”
• The domain of y = log_a x must be the range of y = a^x, which is (0, ∞).
• The logarithm with base e is called the natural logarithm and has a special notation:

    log_e x = ln x    (2.20)
Remark 2.34.
• From your calculator, you can see buttons of LN and LOG , which
represent ln = loge and log = log10 , respectively.
• When you implement a code on computers, the functions ln and
log can be called by “log” and “log10”, respectively.
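For instance, a quick check in Python (a small sketch consistent with the remark above):

import numpy as np

print(np.log(np.e))        # 1.0   (natural log, ln)
print(np.log10(1000.0))    # 3.0   (base-10 log)
print(np.log2(8.0))        # 3.0   (base-2 log)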
Properties of Logarithms
Example 2.35. Use the laws of logs to expand  ln( x^2 √(x^2 + 3) / (3x + 1) ).
Solution.
Solution.
(a) e^{5−3x} = 3
(b) log_3 x + log_3(x − 2) = 1
(c) ln(ln x) = 0
Solution.
Claim 2.38.
(a) Every exponential function is a power of the natural exponential
function.
    a^x = e^{x ln a}.    (2.24)
Hint : You may use the following. You should finish the function area_closed_curve.
Note that the index in Matlab arrays begins with 1, not 0.
heart.m
1 DATA = load('heart-data.txt');
2
3 X = DATA(:,1); Y = DATA(:,2);
4 figure, plot(X,Y,'r-','linewidth',2);
5
6 [m,n] = size(DATA);
7 area = area_closed_curve(DATA);
8
area_closed_curve.m
1 function area = area_closed_curve(data)
2 % compute the area of a region of closed curve
3
4 [m,n] = size(data);
5 area = 0;
6
7 for i=2:m
8 %FILL HERE APPROPRIATELY
9 end
    P_n = P_0 · (1 + r)^n,    (2.26)

where n is the elapsed year and r denotes the growth rate per year.
Hint: Applying the natural log to (2.26) gives log(P_n/P_0) = n log(1 + r). Dividing it by n and applying the natural exponential function gives 1 + r = exp(log(P_n/P_0)/n), where P_n = 25495, P_0 = 2689, and n = 120.
Ans: (a) r = 0.018921(= 1.8921%). (c) 2056.
Chapter 3. Programming with Calculus

Contents of Chapter 3
3.1. Derivative: The Slope of the Tangent Line . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2. Basis Functions and Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.3. Newton’s Method for Zero-Finding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.4. Zeros of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.5. Multi-Variable Functions and the Gradient Vector . . . . . . . . . . . . . . . . . . . . . 81
Exercises for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Note: In the late 16th century, Galileo discovered that a solid object
dropped from rest (initially not moving) near the surface of the earth
and allowed to fall freely will fall a distance proportional to the square
of the time it has been falling.
• This type of motion is called free fall.
• It assumes negligible air resistance to slow the object down, and that
gravity is the only force acting on the falling object.
• If y denotes the distance fallen in feet after t seconds, then Galileo's law of free-fall is

    y = 16 t^2.

Galileo's law of free-fall states that, in the absence of air resistance, all bodies fall with the same acceleration, independent of their mass.
Average and Instantaneous Speed
Average Speed. When f(t) measures the distance traveled at time t,

    Average speed over [t_0, t_0 + h] = (distance traveled)/(elapsed time)
                                      = (f(t_0 + h) − f(t_0)) / ((t_0 + h) − t_0).    (3.2)

Instantaneous Speed. For h very small,

    Instantaneous speed at t_0 ≈ (f(t_0 + h) − f(t_0)) / h.    (3.3)
Solution.
free_fall.m
1 syms f(t) Q(h) %also, views t and h as symbols
2
27
Difference Quotient at t0 = 1
1 h= -0.10000; Q(h) = 30.40000000
2 h= -0.01000; Q(h) = 31.84000000
3 h= -0.00100; Q(h) = 31.98400000
4 h= -0.00010; Q(h) = 31.99840000
5 h= -0.00001; Q(h) = 31.99984000
6 h= 0.10000; Q(h) = 33.60000000
7 h= 0.01000; Q(h) = 32.16000000
8 h= 0.00100; Q(h) = 32.01600000
9 h= 0.00010; Q(h) = 32.00160000
10 h= 0.00001; Q(h) = 32.00016000
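A short Python sketch reproducing the table above, assuming Galileo's free-fall law y = f(t) = 16 t^2 (consistent with the limiting value 32 at t_0 = 1):

def f(t):
    return 16*t**2        # free-fall distance in feet (assumed from Galileo's law)

t0 = 1.0
for h in [-0.1, -0.01, -0.001, -0.0001, -0.00001,
           0.1,  0.01,  0.001,  0.0001,  0.00001]:
    Q = (f(t0 + h) - f(t0))/h        # difference quotient Q(h)
    print('h=%10.5f;  Q(h) = %.8f' % (h, Q))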
Solution. Let’s first try to find the slope, as the limit of the difference
quotient.
Definition 3.4.
The slope of the curve y = f(x) at the point P(x_0, f(x_0)) is the number

    lim_{h→0} (f(x_0 + h) − f(x_0)) / h    (provided the limit exists).    (3.5)

The tangent line to the curve at P is the line through P with this slope.
secant_lines_abs_x2_minus_1.m
1 syms f(x) Q(h) %also, views x and h as symbols
2
3 f(x)=abs(x.^2-1); x0=1;
4 figure, fplot(f(x),[x0-3,x0+1.5], 'k-','LineWidth',3)
5 hold on
6
7 Q(h) = (f(x0+h)-f(x0))/h;
8 S(x,h) = Q(h)*(x-x0)+f(x0); % Secant line
9 %%---- Secant Lines, with Various h ------
10 for h0 = [-0.5 -0.25 -0.1 0.1 0.25 0.5]
11 fplot(S(x,h0),[x0-1,x0+1], 'b--','LineWidth',2)
12 plot([x0+h0],[f(x0+h0)],'b.','markersize',25)
13 end
14 plot([x0],[f(x0)],'r.','markersize',35)
15 daspect([1 2 1])
16 axis tight, grid on
17 ax=gca; ax.FontSize=15; ax.GridAlpha=0.5;
18 hold off
19 print -dpng 'secant-y=abs-x2-1.png'
Example 3.6. Find and simplify the difference quotients (f(x + h) − f(x))/h for the functions, and then apply lim_{h→0}.
Solution.
    f(x) = x    ⇒  f′(x) = 1
    f(x) = x^2  ⇒  f′(x) = 2x
    f(x) = x^3  ⇒  f′(x) = 3x^2
      ...
    f(x) = x^n  ⇒  f′(x) = n x^{n−1}
Solution.
Example 3.11. Use the product rule (3.8) to find the derivative of the function f(x) = x^6 = x^2 · x^4.
Solution.
Example 3.12. Does the curve y = x^4 − 2x^2 + 2 have any horizontal tangent line? Use the information you just found, to sketch the graph.
Solution.
Rules of Derivative
Example 3.13. Consider a computer program.
derivative_rules.m
1 syms n a b real
2 syms u(x) v(x)
3 syms f(x) Q(h)
4
Ans: (b) (3x − 1)^6 (6x + 5) / x^6
    {x^n | n = 0, 1, 2, · · · }    (3.13)

Example 3.20. Taking all the coefficients to be 1 in (3.14) gives the geometric series

    Σ_{n=0}^{∞} x^n = 1 + x + x^2 + · · · + x^n + · · · ,

which converges to 1/(1 − x) for |x| < 1. That is,

    1/(1 − x) = 1 + x + x^2 + · · · + x^n + · · · ,   |x| < 1.    (3.16)
Theorem 3.22. (The Ratio Test): Let Σ a_n be any series and suppose that

    lim_{n→∞} |a_{n+1} / a_n| = ρ.    (3.17)

(a) If ρ < 1, then the series converges absolutely (Σ |a_n| converges).
(b) If ρ > 1, then the series diverges.
(c) If ρ = 1, then the test is inconclusive.

Example 3.23. For what values of x do the following power series converge?

(a) Σ_{n=1}^{∞} (−1)^{n−1} x^n / n = x − x^2/2 + x^3/3 − · · ·
(b) Σ_{n=0}^{∞} x^n / n! = 1 + x + x^2/2! + x^3/3! + · · ·
Solution.
    f(x) = Σ_{n=0}^{∞} c_n (x − a)^n   on |x − a| < R.    (3.18)

This function f has derivatives of all orders inside the interval, and we obtain the derivatives by differentiating the original series term by term:

    f′(x)  = Σ_{n=1}^{∞} n c_n (x − a)^{n−1},
    f″(x) = Σ_{n=2}^{∞} n(n − 1) c_n (x − a)^{n−2},    (3.19)

and so on. Each of these derived series converges at every point of the interval a − R < x < a + R.
Series Representations
• Thus, when x = a, the coefficients are c_n = f^{(n)}(a)/n!.
Taylor Polynomials
Definition 3.28. Let f be a function with derivatives of order k = 1, 2, · · · , N in some interval containing a as an interior point. Then for any integer n from 0 through N, the Taylor polynomial of order n generated by f at x = a is the polynomial

    P_n(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)^2 + · · · + (f^{(n)}(a)/n!)(x − a)^n.
Example 3.29. Find the Taylor series and Taylor polynomials generated
by f (x) = cos x at x = 0.
Solution. The cosine and its derivatives are

    f(x) = cos x,        f′(x) = − sin x,
    f″(x) = − cos x,     f^{(3)}(x) = sin x,
        ...
    f^{(2n)}(x) = (−1)^n cos x,   f^{(2n+1)}(x) = (−1)^{n+1} sin x.

At x = 0, the cosines are 1 and the sines are 0, so

    f^{(2n)}(0) = (−1)^n,   f^{(2n+1)}(0) = 0.    (3.27)

The Taylor series generated by cos x at x = 0 is

    1 + 0·x − (1/2!) x^2 + (0/3!) x^3 + (1/4!) x^4 + · · · = 1 − x^2/2! + x^4/4! − x^6/6! + · · ·    (3.28)
Note: The interval of convergence can be verified using e.g. the ratio
test, presented in Theorem 3.22, p. 66.
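The expansion can also be generated symbolically; a minimal SymPy sketch (an illustration, analogous to the Matlab taylor command):

from sympy import symbols, cos, series

x = symbols('x')
print(series(cos(x), x, 0, 8))   # 1 - x**2/2 + x**4/24 - x**6/720 + O(x**8)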
Self-study 3.30. Plot the sinc function f(x) = sin x / x and its Taylor polynomials of order 4, 6, and 8, about x = 0.
Solution. Hint : Use e.g., syms x; T4 = taylor(sin(x)/x,x,0,’Order’,5). Here “Or-
der” means the leading order of truncated terms.
    f(p) = 0.    (3.30)

    0 = f(p) = f(p_0 + h) = f(p_0) + (p − p_0) f′(p_0) + ((p − p_0)^2 / 2) f″(ξ),    (3.31)

where ξ lies between p and p_0.
• If |p − p_0| is small, it is reasonable to ignore the last term of (3.31) and solve for h = p − p_0:

    h = p − p_0 ≈ − f(p_0) / f′(p_0).    (3.32)

• Define

    p_1 = p_0 − f(p_0) / f′(p_0);    (3.33)

then p_1 may be a better approximation of p than p_0.
• The above can be repeated.

Graphical interpretation
• Let p_0 be the initial approximation close to p. Then, the tangent line at (p_0, f(p_0)) reads

    L(x) = f′(p_0)(x − p_0) + f(p_0).    (3.35)
Remark 3.33.
• The Newton’s method may diverge, unless the initialization is accu-
rate.
• It cannot be continued if f 0 (pn−1 ) = 0 for some n. As a matter of fact,
the Newton’s method is most effective when f 0 (x) is bounded away
from zero near p.
Since p = 0, e_n = p_n and

    |e_n| ≤ 0.67 |e_{n−1}|^3,    (3.41)

which is an occasional super-convergence.
Theorem 3.36. (Newton's Method for a Convex Function): Let f ∈ C^2(R) be increasing, convex, and have a zero. Then, the zero is unique and the Newton iteration will converge to it from any starting point.
Example 3.37. Use Newton's method to find the square root of a positive number Q.

Solution. Let x = √Q. Then x is a root of x^2 − Q = 0. Define f(x) = x^2 − Q; then f′(x) = 2x. Newton's method reads

    p_n = p_{n−1} − f(p_{n−1})/f′(p_{n−1}) = p_{n−1} − (p_{n−1}^2 − Q)/(2 p_{n−1}) = (1/2)(p_{n−1} + Q/p_{n−1}).    (3.42)
mysqrt.m
1 function x = mysqrt(q)
2 %function x = mysqrt(q)
3
4 x = (q+1)/2;
5 for n=1:10
6 x = (x+q/x)/2;
7 fprintf('x_%02d = %.16f\n',n,x);
8 end
Results
 1 >> mysqrt(16);                     >> mysqrt(0.1);
 2 x_01 = 5.1911764705882355          x_01 = 0.3659090909090910
 3 x_02 = 4.1366647225462421          x_02 = 0.3196005081874647
 4 x_03 = 4.0022575247985221          x_03 = 0.3162455622803890
 5 x_04 = 4.0000006366929393          x_04 = 0.3162277665175675
 6 x_05 = 4.0000000000000506          x_05 = 0.3162277660168379
 7 x_06 = 4.0000000000000000          x_06 = 0.3162277660168379
 8 x_07 = 4.0000000000000000          x_07 = 0.3162277660168379
 9 x_08 = 4.0000000000000000          x_08 = 0.3162277660168379
10 x_09 = 4.0000000000000000          x_09 = 0.3162277660168379
11 x_10 = 4.0000000000000000          x_10 = 0.3162277660168379
• Substituting the above into (3.46), utilizing (3.43), and setting equal the coefficients of like powers of x on the two sides of the resulting equation, we have

    b_n = a_n
    b_{n−1} = a_{n−1} + x_0 b_n
       ...                                (3.48)
    b_1 = a_1 + x_0 b_2
    P(x_0) = a_0 + x_0 b_1
• Introducing b_0 = P(x_0), the above can be rewritten as

    b_k = a_k + x_0 b_{k+1},   k = n−1, n−2, · · · , 0,   with b_n = a_n.

• The derivative P′(x) can be evaluated with the same efficiency (Strategy 3.43). Indeed, differentiating (3.46),

    P(x) = (x − x_0) Q(x) + P(x_0),

reads

    P′(x) = Q(x) + (x − x_0) Q′(x).    (3.52)

Thus

    P′(x_0) = Q(x_0).    (3.53)

That is, the evaluation of Q at x_0 becomes the desired quantity P′(x_0).
Example 3.44. Evaluate P 0 (3) for P (x) considered in Example 3.41, the
previous example.
Solution. As in the previous example, we arrange the calculation and carry
out the synthetic division one more time:
horner.m
1 function [p,d] = horner(A,x0)
2 % input: A = [a_0,a_1,...,a_n]
3 % output: p=P(x0), d=P'(x0)
4
5 n = size(A(:),1);
6 p = A(n); d=0;
7
8 for i = n-1:-1:1
9 d = p + x0*d;
10 p = A(i) +x0*p;
11 end
Call_horner.m
1 a = [-2 -5 7 -4 1];
2 x0=3;
3 [p,d] = horner(a,x0);
4 fprintf(" P(%g)=%g; P'(%g)=%g\n",x0,p,x0,d)
5 Result: P(3)=19; P'(3)=37
newton_horner.m
1 function [x,it] = newton_horner(A,x0,tol,itmax)
2 % input: A = [a_0,a_1,...,a_n]; x0: initial for P(x)=0
3 % output: x: P(x)=0
4
5 x = x0;
6 for it=1:itmax
7 [p,d] = horner(A,x);
8 h = -p/d;
9 x = x + h;
10 if(abs(h)<tol), break; end
11 end
Call_newton_horner.m
1 a = [-2 -5 7 -4 1];
2 x0=3;
3 tol = 10^-12; itmax=1000;
4 [x,it] = newton_horner(a,x0,tol,itmax);
5 fprintf(" newton_horner: x0=%g; x=%g, in %d iterations\n",x0,x,it)
6 Result: newton_horner: x0=3; x=2, in 7 iterations
Figure 3.5: Polynomial P(x) = x^4 − 4x^3 + 7x^2 − 5x − 2. Its two real zeros are −0.275682 and 2.
Ans: f(3, 2) = √6 / 2;  D = {(x, y) : x + y + 1 ≥ 0, x ≠ 1}

Problem 3.50. Find the domain and the range of f(x, y) = √(9 − x^2 − y^2).
Solution.
Figure 3.6: Ordinary derivative f′(a) and partial derivatives f_x(a, b) and f_y(a, b).
Let f be a function of two variables (x, y). Suppose we let only x vary while keeping y fixed, say y = b. Then g(x) := f(x, b) is a function of a single variable. If g is differentiable at a, then we call g′(a) the partial derivative of f with respect to x at (a, b), denoted by f_x(a, b):

    g′(a) = lim_{h→0} (g(a + h) − g(a)) / h
          = lim_{h→0} (f(a + h, b) − f(a, b)) / h =: f_x(a, b).    (3.54)
Similarly, with G(y) := f(a, y),

    G′(b) = lim_{h→0} (G(b + h) − G(b)) / h
          = lim_{h→0} (f(a, b + h) − f(a, b)) / h =: f_y(a, b).    (3.55)
Problem 3.51. Find f_x(0, 0), when f(x, y) = (x^3 + y^3)^{1/3}.
Solution. Using the definition,

    f_x(0, 0) = lim_{h→0} (f(h, 0) − f(0, 0)) / h

Ans: 1
Definition 3.52. If f is a function of two variables, its partial derivatives are the functions f_x = ∂f/∂x and f_y = ∂f/∂y defined by

    f_x(x, y) = (∂f/∂x)(x, y) = lim_{h→0} (f(x + h, y) − f(x, y)) / h   and
    f_y(x, y) = (∂f/∂y)(x, y) = lim_{h→0} (f(x, y + h) − f(x, y)) / h.    (3.56)
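Partial derivatives are easy to compute symbolically; a minimal SymPy sketch, using the function of Problem 3.57 below (an illustration only):

from sympy import symbols, sin, exp, diff

x, y = symbols('x y')
f = sin(x) + exp(x*y)

fx = diff(f, x)      # cos(x) + y*exp(x*y)
fy = diff(f, y)      # x*exp(x*y)
print(fx, fy)
print(fx.subs({x: 0, y: 1}), fy.subs({x: 0, y: 1}))   # gradient at (0,1): 2, 0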
Problem 3.57. If f(x, y) = sin(x) + e^{xy}, find ∇f(x, y) and ∇f(0, 1).
Solution.
Ans: ⟨2, 0⟩
Claim 3.58. The gradient direction is the direction where the function
changes fastest, more precisely, increases fastest!
Solution.
3.1. In Example 3.5, we considered the curve y = |x^2 − 1|. Find the left-hand and right-hand limits of the difference quotient at x_0 = 1.
Ans: −2 and 2.
3.2. The number e is determined so that the slope of the graph of y = e^x at x = 0 is exactly 1. Let h be a point near 0. Then

    Q(h) := (e^h − e^0)/(h − 0) = (e^h − 1)/h

represents the average slope of the graph between the two points (0, 1) and (h, e^h). Evaluate Q(h), for h = 0.1, 0.01, 0.001, 0.0001. What can you say about the results?
Ans: For example, Q(0.01) = 1.0050.
3.3. Recall the Taylor series for e^x, cos x and sin x in (3.29). Let x = iθ, where i = √(−1). Then

    e^{iθ} = 1 + iθ + (iθ)^2/2! + (iθ)^3/3! + (iθ)^4/4! + (iθ)^5/5! + (iθ)^6/6! + · · ·    (3.58)

(a) Prove that e^{iθ} = cos θ + i sin θ, which is called Euler's identity.
(b) Prove that e^{iπ} + 1 = 0.
• Use fimplicit
• Visualize, with ylim([-2*pi 4*pi]), yticks(-pi:pi:3*pi)
3.5. Using your calculator (or pencil-and-paper), run two iterations of Newton’s method to
find x2 for given f and x0 .
(a) f(x) = x^4 − 2,  x_0 = 1
(b) f(x) = x e^x − 1,  x_0 = 0.5
Ans: (b) x_2 = 0.56715557
3.6. The graphs of y = x^2(x + 1) and y = 1/x (x > 0) intersect at one point x = r. Use Newton's method to estimate the value of r to eight decimal places.
3.7. Consider the level curve f(x, y) = −x^2 + y = k as in Example 3.59. For k = 1:
Chapter 4. Programming with Linear Algebra

    x = A^{−1} b.    (4.2)
Contents of Chapter 4
4.1. Solutions of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2. Invertible Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.3. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.4. Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Exercises for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
    a_1 x_1 + a_2 x_2 + · · · + a_n x_n = b,    (4.3)
Solving (4.4):
R1 ↔ R2 (interchange):
    x_1 + 2x_2 = 4          [  1  2 |  4 ]
    −2x_1 + 3x_2 = −1       [ −2  3 | −1 ]
R2 ← R2 + 2·R1 (replacement):
    x_1 + 2x_2 = 4          [  1  2 |  4 ]
    7x_2 = 7                [  0  7 |  7 ]
R2 ← R2/7 (scaling):
    x_1 + 2x_2 = 4          [  1  2 |  4 ]
    x_2 = 1                 [  0  1 |  1 ]
R1 ← R1 − 2·R2 (replacement):
    x_1 = 2                 [  1  0 |  2 ]
    x_2 = 1                 [  0  1 |  1 ]
Example 4.8. x = [x1 , x2 ]T = [−3, 2]T is the solution of the linear system.
which, in turn, has the same solution set as the system with augmented
matrix
[a1 a2 · · · an : b]. (4.10)
Example 4.10. Determine the values of h such that the given system is a
consistent linear system
x + h y = −5
2x − 8y = 6
Solution.
Ans: h ≠ −4
x2 − 2x3 = 0
x1 − 2x2 + 2x3 = 3
4x1 − 8x2 + 6x3 = 14
Solution.
1 A = [0 1 -2; 1 -2 2; 4 -8 6];
2 b = [0; 3; 14];
3
4 AA = [A b];
5 rref(AA)
Result
1 ans =
2 1 0 0 1
3 0 1 0 -2
4 0 0 1 -1
Example 4.12. Find the general solution of the system whose augmented matrix is

            [ 1  0  0  1 |  7 ]
    [A|b] = [ 0  1  3  0 | −1 ]
            [ 2 −1 −3  2 | 15 ]
            [ 1  0 −1  0 |  4 ]
Solution.
linear_equations_rref.m
1 Ab = [1 0 0 1 7; 0 1 3 0 -1; 2 -1 -3 2 15; 1 0 -1 0 4];
2 rref(Ab)
Result
1 ans =
2 1 0 0 1 7
3 0 1 0 -3 -10
4 0 0 1 1 3
5 0 0 0 0 0
Remark 4.13. Using rref, one can find the general solution which
may consist of infinitely many solutions.
Self-study 4.19. Use pencil-and-paper to find the inverse of

        [ 0  1  0 ]
    A = [ 1  0  3 ],
        [ 4 −3  8 ]

if it exists.
Solution.
When it is implemented:
inverse_matrix.m
1 A = [0 1 0
2 1 0 3
3 4 -3 8];
4 I = eye(3);
5
6 AI = [A I];
7 rref(AI)
Result
1 ans =
2 1.0000 0 0 2.2500 -2.0000 0.7500
3 0 1.0000 0 1.0000 0 0
4 0 0 1.0000 -0.7500 1.0000 -0.2500
Theorem 4.20.
a. (Inverse of a 2 × 2 matrix) Let A = [ a  b ; c  d ]. If ad − bc ≠ 0, then A is invertible and

    A^{−1} = (1/(ad − bc)) [  d  −b ]
                           [ −c   a ]    (4.11)
True-or-False 4.21.
a. In order for a matrix B to be the inverse of A, both equations AB = I_n and BA = I_n must be true.
b. If A = [ a  b ; c  d ] and ad = bc, then A is not invertible.
c. If A is invertible, then elementary row operations that reduce A to the identity I_n also reduce A^{−1} to I_n.
Solution.
Ans: T,T,F
4.3. Determinants
Definition 4.22. Let A be an n × n square matrix. Then determinant
is a scalar value denoted by det A or |A|.
1) Let A = [a] ∈ R^{1×1}. Then det A = a.
2) Let A = [ a  b ; c  d ] ∈ R^{2×2}. Then det A = ad − bc.

Example 4.23. Let A = [ 2  1 ; 0  3 ]. Consider a linear transformation T : R^2 → R^2 defined by T(x) = Ax.
Ans: 3) 12
Note: The determinant can be viewed as a volume scaling factor.
Ans: −2
determinant.m
1 A = [1 -2 5 2; 0 -6 -7 5; 0 0 3 0; 0 0 0 4];
2 det(A)
Result
1 ans =
2 -72
Properties of Determinants
Example 4.29. Compute det A, where

        [  1 −4  2 ]
    A = [ −2  8 −9 ],
        [ −1  7  0 ]

after applying a couple of steps of replacement operations.
Solution.
Ans: 15
c) If A is invertible, then det A^{−1} = 1 / det A.   (∵ det I_n = 1.)
Ans: −30
" #
1 6
Example 4.35. Let A = . Show that 7 is an eigenvalue of matrix A,
5 2
and find the corresponding eigenvectors.
Solution. Hint : Start with Ax = 7x. Then (A − 7I)x = 0.
1 syms x
2 A = [1 1 0; 6 0 5; 0 0 2];
3
4 polyA = charpoly(A,x)
5 eigenA = solve(polyA)
6 [P,D] = eig(A) % A*P = P*D
7 P*D*inv(P)
Results
1 polyA =
2 12 - 4*x - 3*x^2 + x^3
3
4 eigenA =
5 -2
6 2
7 3
8
9 P =
10 0.4472 -0.3162 -0.6155
11 0.8944 0.9487 -0.6155
12 0 0 0.4924
13 D =
14 3 0 0
15 0 -2 0
16 0 0 2
17
18 ans =
19 1.0000 1.0000 -0.0000
20 6.0000 0.0000 5.0000
21 0 0 2.0000
4.4.2. Similarity
Definition 4.42. Let A and B be n × n matrices. Then, A is similar to
B, if there is an invertible matrix P such that
A = P BP −1 , or equivalently, P −1 AP = B.
The next theorem illustrates one use of the characteristic polynomial, and
it provides the foundation for several iterative methods that approximate
eigenvalues.
Theorem 4.43. If n × n matrices A and B are similar, then they
have the same characteristic polynomial and hence the same eigenvalues
(with the same multiplicities).
Proof. B = P^{−1} A P. Then,

    B − λI = P^{−1} A P − λI = P^{−1} A P − λ P^{−1} P = P^{−1}(A − λI) P,
4.4.3. Diagonalization
Definition 4.44. An n × n matrix A is said to be diagonalizable if
there exists an invertible matrix P and a diagonal matrix D such that
    A = P D P^{−1}   (or P^{−1} A P = D)    (4.16)

    A^2 = (P D P^{−1})(P D P^{−1}) = P D^2 P^{−1}
    A^k = P D^k P^{−1}                                   (4.17)
    A^{−1} = P D^{−1} P^{−1}   (when A is invertible)
    det A = det D
Ans: A^k = [ 2·5^k − 3^k        5^k − 3^k   ]
           [ 2·3^k − 2·5^k     2·3^k − 5^k  ]
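The identities in (4.17) are easy to verify numerically; a minimal NumPy sketch with a small example matrix (the 2 × 2 matrix of Example 4.35, not the one in the exercise above):

import numpy as np

A = np.array([[1., 6.], [5., 2.]])          # example matrix (eigenvalues 7 and -4)
lam, P = np.linalg.eig(A)                   # columns of P are eigenvectors
D = np.diag(lam)

k = 5
Ak_direct = np.linalg.matrix_power(A, k)
Ak_diag   = P @ np.linalg.matrix_power(D, k) @ np.linalg.inv(P)   # A^k = P D^k P^{-1}
print(np.allclose(Ak_direct, Ak_diag))      # True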
    P = [v_1 v_2 · · · v_n],   D = diag(λ_1, λ_2, · · · , λ_n),    (4.18)

where A v_k = λ_k v_k, k = 1, 2, · · · , n.

while

    P D = [v_1 v_2 · · · v_n] diag(λ_1, λ_2, · · · , λ_n) = [λ_1 v_1  λ_2 v_2  · · ·  λ_n v_n].    (4.20)

(⇒) Now suppose A is diagonalizable and A = P D P^{−1}. Then we have AP = PD; it follows from (4.19) and (4.20) that

    A v_k = λ_k v_k,   k = 1, 2, · · · , n.    (4.21)
Solution.
1. Find the eigenvalues of A.
2. Find three linearly independent eigenvectors of A.
3. Construct P from the vectors in step 2.
4. Construct D from the corresponding eigenvalues.
Check: AP = P D?
−1 −1
1
Ans: λ = 1, −2, −2. v1 = −1 , v2 =
1 , v3 =
0
1 0 1
diagonalization.m
1 A = [1 3 3; -3 -5 -3; 3 3 1];
2 [P,D] = eig(A) % A*P = P*D
3 P*D*inv(P)
Results
1 P =
2 -0.5774 -0.7876 0.4206
3 0.5774 0.2074 -0.8164
4 -0.5774 0.5802 0.3957
5 D =
6 1.0000 0 0
7 0 -2.0000 0
8 0 0 -2.0000
9
10 ans =
11 1.0000 3.0000 3.0000
12 -3.0000 -5.0000 -3.0000
13 3.0000 3.0000 1.0000
4.1. An important concern in the study of heat transfer is to determine the steady-state temperature distribution of a thin plate when the temperature around the boundary is known. Assume the plate shown in the figure represents a cross section of a metal beam, with negligible heat flow in the direction perpendicular to the plate. Let T1, T2, · · · , T4 denote the temperatures at the four interior nodes of the mesh in the figure. The temperature at a node is approximately equal to the average of the four nearest nodes. For example, T1 = (10 + 20 + T2 + T4)/4, or 4T1 = 10 + 20 + T2 + T4.
     Write a system of four equations whose solution gives estimates for the temperatures T1, T2, · · · , T4, and solve it.
Figure 4.1
4.2. Find the inverses of the matrices, if they exist:

    A = [ 3 −4 ]          [  1 −2  1 ]
        [ 7 −8 ]      B = [  4 −7  3 ]
                          [ −2  6 −4 ]

Ans: B is not invertible.
4.3. Let A = [ 3  1 ; 4  2 ]. Write 5A. Is det(5A) = 5 det A?
1 1 −3
Chapter 5. Regression Analysis
Contents of Chapter 5
5.1. Least-Squares Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.2. Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.3. Scene Analysis with Noisy Data: RANSAC . . . . . . . . . . . . . . . . . . . . . . . . . 124
Exercises for Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
113
114 Chapter 5. Regression Analysis
where x̂ is called a least-squares solution of Ax = b.
Normal Equations

    A^T A x = A^T b.    (5.2)

Method of Calculus
Let J(x) = ‖Ax − b‖^2 = (Ax − b)^T (Ax − b) and x̂ a minimizer of J(x).
• Then we must have

    ∇_x J(x̂) = ∂J(x)/∂x |_{x = x̂} = 0.    (5.3)

    x̂ = (A^T A)^{−1} A^T b.    (5.5)
least_squares.m
1 A = [1 1 0; 0 1 0; 0 0 1; 1 0 1];
2 b = [1; 3; 8; 2];
3 x = (A'*A)\(A'*b)
    y = β_0 + β_1 x    (5.6)

that is as close as possible to the given points. This line is called the least-squares line; it is also called the regression line of y on x, and β_0, β_1 are called regression coefficients.
where

    X = [ 1  x_1 ;  1  x_2 ;  · · · ;  1  x_m ],   β = [ β_0 ; β_1 ],   y = [ y_1 ; y_2 ; · · · ; y_m ].

Here we call X the design matrix, β the parameter vector, and y the observation vector.
• Thus the LS solution can be determined by solving the normal equations:

    X^T X β = X^T y,    (5.9)

provided that X^T X is invertible.
• The normal equations for the regression line read

    [ m     Σx_i   ]       [ Σy_i     ]
    [ Σx_i  Σx_i^2 ] β  =  [ Σx_i y_i ].    (5.10)
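A minimal NumPy sketch of the regression line computed via the normal equations (5.9) (an illustration only, with made-up data):

import numpy as np

# sample data (hypothetical)
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

X = np.column_stack([np.ones_like(x), x])        # design matrix: rows [1, x_i]
beta = np.linalg.solve(X.T @ X, X.T @ y)         # solve X^T X beta = X^T y
print('beta0 = %.4f, beta1 = %.4f' % (beta[0], beta[1]))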
    y = β_0 + β_1 x + β_2 x^2,

where

    X = [ 1  x_1  x_1^2 ;  1  x_2  x_2^2 ;  · · · ;  1  x_m  x_m^2 ],   β = [ β_0 ; β_1 ; β_2 ],   y = [ y_1 ; y_2 ; · · · ; y_m ].

Now, it can be solved through normal equations:

              [ Σ1      Σx_i    Σx_i^2 ]       [ Σy_i       ]
    X^T X β = [ Σx_i    Σx_i^2  Σx_i^3 ] β  =  [ Σx_i y_i   ]  =  X^T y    (5.14)
              [ Σx_i^2  Σx_i^3  Σx_i^4 ]       [ Σx_i^2 y_i ]

Ans: y = 1 + 0.5 x^2
Example 5.14. Find the best fitting curve of the form y = c e^{dx} for the data
0.1 1.9940
0.2 2.0087
0.3 1.8770
0.4 3.5783
0.5 3.9203
0.6 4.7617
0.7 6.7246
0.8 7.1491
0.9 9.5777
1.0 11.5625
    ln y = ln c + d x.    (5.16)

    Y = ln y,  a_0 = ln c,  a_1 = d,  X = x,
8 # The linear LS
9 L := CurveFitting[LeastSquares](xlny, x, curve = b*x + a);
10 0.295704647799999 + 2.1530740654363654 x
11
    W = diag(w_1, w_2, · · · , w_m).    (5.18)

    X β = y.    (5.19)

    X^T W X β = X^T W y.    (5.21)
Example 5.17. Given data, find the LS line with and without a weight.
When a weight is applied, weigh the first and the last data point by 1/4.
" #T
1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
xy :=
5.89 1.92 2.59 4.41 4.49 6.22 7.74 7.07 9.05 5.7
Solution.
Weighted-LS
1 LS := CurveFitting[LeastSquares](xy, x);
2 2.7639999999999967 + 0.49890909090909125 x
3 WLS := CurveFitting[LeastSquares](xy, x,
4 weight = [1/4,1,1,1,1,1,1,1,1,1/4]);
5 1.0466694879390623 + 0.8019424460431653 x
3. Consensus set C:

    C = { (x_i, y_i) ∈ X  |  d = |a + b x_i − y_i| / √(b^2 + 1) ≤ τ_e }    (5.22)
Note: In practice:
• Step 2: A hypothesis p is the set of model parameters, rather than
the model itself.
• Step 3: The consensus set can be represented more conveniently by
considering C as an index array. That is,
    C(i) = 1  if x_i ∈ C;   C(i) = 0  if x_i ∉ C.    (5.23)
Remark 5.21.
• The above basic RANSAC algorithm is an iterative search method
for a set of inliers which may produce presumably accurate model
parameters.
• It is simple to implement and efficient. However, it is problematic
and often erroneous.
• The main disadvantage of RANSAC is that RANSAC is unrepeat-
able; it may yield different results in each run so that none of the
results can be optimal.
Table 5.1: The RANSAC: model fitting y = a_0 + a_1 x. The algorithm runs 1000 times for each dataset to find the standard deviation of the error: σ(a_0 − â_0) and σ(a_1 − â_1).

Data    σ(a_0 − â_0)    σ(a_1 − â_1)    E-time (sec)
  1        0.1156          0.0421          0.0156
  2        0.1101          0.0391          0.0147
(a) Implement the method of normal equations for the least-squares regression to
find the best-fitting line.
(b) The RANSAC, Algorithm 5.18 is implemented for you below. Use the code to
analyze the performance of the RANSAC.
• Set τe = 1, γ = η|X| = 8, and N = 100.
• Run ransac2 100 times to get the minimum, maximum, and average number
of iterations for the RANSAC to find an acceptable hypothesis consensus set.
(c) Plot the best-fitting lines found from (a) and (b), superposed along the data.
ransac2.m
1 function [p,C,iter] = ransac2(X,tau_e,gamma,N)
2 % Input: X = {(x_i,y_i)}
3 % tau_e: the error tolerance
4 % gamma = eta*|X|
5 % N: the maximum number of iterations
6 % Output: p = [a,b], where y= a+b*x
7
8 %%-----------
9 [m,n] = size(X);
10 if n>m, X=X'; [m,n] = size(X); end
11
22 if sum(C)>=gamma
23 p = get_hypothesis_WLS(X,C);
24 break;
25 end
26 end
get_hypothesis_WLS.m
1 function p = get_hypothesis_WLS(X,C)
2 % Get hypothesis p, with C being used as weights
3 % Output: p = [a,b], where y= a+b*x
4
5 m = size(X,1);
6
7 A = [ones(m,1) X(:,1)];
8 A = A.*C; %A = bsxfun(@times,A,C);
9 r = X(:,2).*C;
10
11 p = ((A'*A)\(A'*r))';
inlier.m
1 function C = inlier(X,p,tau_e)
2 % Input: p=[a,b] s.t. a+b*x-y=0
3
4 m = size(X,1);
5 C = zeros(m,1);
6
7 a = p(1); b=p(2);
8 factor = 1./sqrt(b^2+1);
9 for i=1:m
10 xi = X(i,1); yi = X(i,2);
11 dist = abs(a+b*xi-yi)*factor; %distance from point to line
12 if dist<=tau_e, C(i)=1; end
13 end
Chapter 6. Fundamentals of AI
Contents of Chapter 6
6.1. What is Artificial Intelligence (AI)? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.2. Constituents of AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3. Designing Artificial Brains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.4. Future of AI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.2. Constituents of AI
6.4. Future of AI
6.1.
Chapter 7. Python Basics
Contents of Chapter 7
7.1. Why Python? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.2. Python in an Hour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.3. Zeros of Polynomials in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
7.4. Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.5. A Machine Learning Modelcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Exercises for Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Advantages of Python
Python has the following characteristics.
• Easy to learn and use
• Flexible and reliable
• Extensively used in Data Science
• Handy for Web Development purposes
• Having Vast Libraries support
• Among the fastest-growing programming languages in the tech
industry
Disadvantage of Python
Python is an interpreted and dynamically-typed language. The line-by-line execution of code, combined with high flexibility, most likely leads to slow execution. Python is slower than Matlab, which in turn is slower than C.
• You yourself may create and import your own C-module into Python.
If you extend Python with pieces of compiled C-code, then the re-
sulting code is easily 100× faster than Python. Best choice!
• Cython: It is designed as a C-extension for Python, which is developed for users not familiar with C. For a Cython implementation, see e.g. https://www.youtube.com/watch?v=JKMkhARcwdU, one of Simon Funke's YouTube videos.
∼/.python_startup.py
1 #.bashrc: export PYTHONSTARTUP=~/.python_startup.py
2 #.cshrc: setenv PYTHONSTARTUP ~/.python_startup.py
3 #---------------------------------------------------
4 print("\t^[[1;33m~/.python_startup.py")
5
13 import random
14 from sympy import *
15 x,y,z,t = symbols('x,y,z,t');
16 print("\tfrom sympy import *; x,y,z,t = symbols('x,y,z,t')")
17
Programming Features
• Python has no support for pointers.
• Python codes are stored with .py extension.
• Indentation: Python uses indentation to define a block of code.
– A code block (body of a function, loop, etc.) starts with indenta-
tion and ends with the first unindented line.
– The amount of indentation is up to the user, but it must be consis-
tent throughout that block.
• Comments:
– The hash (#) symbol is used to start writing a comment.
– Multi-line comments: Python uses triple quotes, either ”’ or """.
• Reference semantics
>>> a = [1, 2, 3]
>>> b = a
>>> a.append(4)
>>> b
[1, 2, 3, 4]
Be aware with copying lists and numpy arrays!
• numpy, range, and iteration
>>> list(range(8))
[0, 1, 2, 3, 4, 5, 6, 7]
>>> import numpy as np
>>> for k in range(np.size(li)):
...     li[k]
... <Enter>
’abc’
14
4.34
23
• numpy array and deepcopy
>>> from copy import deepcopy
>>> A = np.array([1,2,3])
>>> B = A
>>> C = deepcopy(A)
>>> A *= 4
>>> B
array([ 4, 8, 12])
>>> C
array([1, 2, 3])
12 ## Docstrings in Python
13 def double(num):
14 """Function to double the value"""
15 return 2*num
16 print(double.__doc__)
17 # Output: Function to double the value
18
35
36 ## Python Dictionary
37 d = {'key1':'value1', 'Seth':22, 'Alex':21}
38 print(d['key1'],d['Alex'],d['Seth'])
39 # Output: value1 21 22
40
41 ## Output Formatting
42 x = 5.1; y = 10
43 print('x = %d and y = %d' %(x,y))
44 print('x = %f and y = %d' %(x,y))
45 print('x = {} and y = {}'.format(x,y))
46 print('x = {1} and y = {0}'.format(x,y))
47 # Output: x = 5 and y = 10
48 # x = 5.100000 and y = 10
49 # x = 5.1 and y = 10
50 # x = 10 and y = 5.1
51
52 print("x=",x,"y=",y, sep="#",end="&\n")
53 # Output: x=#5.1#y=#10&
54
55 ## Python Input
56 C = input('Enter any: ')
57 print(C)
58 # Output: Enter any: Starkville
59 # Starkville
8 if __name__ == '__main__':
9 num = input('Enter a natural number: ')
10 cubes = get_cubes(int(num))
11 print(cubes)
3 cubes = get_cubes(8)
4 print(cubes)
Execusion
1 [Fri Jul.22] python call_get_cubes.py
2 [1, 8, 27, 64, 125, 216, 343, 512]
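The module itself is not shown in full above; a minimal sketch of what the missing get_cubes function may look like (an assumed implementation, consistent with the printed output):

def get_cubes(n):
    """Return the cubes 1**3, 2**3, ..., n**3 as a list."""
    return [i**3 for i in range(1, n + 1)]

print(get_cubes(8))   # [1, 8, 27, 64, 125, 216, 343, 512]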
Recall: Let's begin by recalling how to find zeros of polynomials, presented in §3.4.
• Remark 3.42: When Newton's method is applied for finding an approximate zero of P(x), the iteration reads

    x_n = x_{n−1} − P(x_{n−1}) / P′(x_{n−1}).    (7.1)

Thus both P(x) and P′(x) must be evaluated in each iteration.
• Strategy 3.43: The derivative P′(x) can be evaluated by using Horner's method with the same efficiency. Indeed, differentiating (3.46),

    P(x) = (x − x_0) Q(x) + P(x_0),

reads

    P′(x) = Q(x) + (x − x_0) Q′(x).    (7.2)

Thus

    P′(x_0) = Q(x_0).    (7.3)

That is, the evaluation of Q at x_0 becomes the desired quantity P′(x_0).
1 import numpy as np
2
3 P = np.poly1d([1, -4, 7, -5, -2])
4 Pder = np.polyder(P)
5
6
7 print(P)
8 print(Pder)
9 print(np.roots(P))
10 print(P(3), Pder(3))
Output
1 4 3 2
2 1 x - 4 x + 7 x - 5 x - 2
3 3 2
4 4 x - 12 x + 14 x - 5
5 [ 2. +0.j 1.1378411+1.52731225j 1.1378411-1.52731225j -0.2756822+0.j ]
6 19 37
Zeros-Polynomials-Newton-Horner.py
1 def horner(A,x0):
2 """ input: A = [a_n,...,a_1,a_0]
3 output: p=P(x0), d=P'(x0) """
4
5 n = len(A)
6 p = A[0]; d = 0
7 for i in range(1,n):
8 d = p + x0*d
9 p = A[i] +x0*p
10 return p,d
11
12 def newton_horner(A,x0,tol,itmax):
13 """ input: A = [a_n,...,a_1,a_0]
14 output: x: P(x)=0 """
15 x=x0
16 for it in range(1,itmax+1):
17 p,d = horner(A,x)
18 h = -p/d;
19 x = x + h;
20 if(abs(h)<tol): break
21 return x,it
22
23 if __name__ == '__main__':
24 coeff = [1, -4, 7, -5, -2]; x0 = 3
25 tol = 10**(-12); itmax = 1000
26 x,it =newton_horner(coeff,x0,tol,itmax)
27 print("newton_horner: x0=%g; x=%g, in %d iterations" %(x0,x,it))
Execution
1 [Sat Jul.23] python Zeros-Polynomials-Newton-Horner.py
2 newton_horner: x0=3; x=2, in 7 iterations
Note: The above Python code must be compared with the Matlab code
in §3.4.
newton_horner.m
1 function [x,it] = newton_horner(A,x0,tol,itmax)
2 % input: A = [a_0,a_1,...,a_n]; x0: initial for P(x)=0
3 % output: x: P(x)=0
4
5 x = x0;
6 for it=1:itmax
7 [p,d] = horner(A,x);
8 h = -p/d;
9 x = x + h;
10 if(abs(h)<tol), break; end
11 end
horner.m
1 function [p,d] = horner(A,x0)
2 % input: A = [a_0,a_1,...,a_n]
3 % output: p=P(x0), d=P'(x0)
4
5 n = size(A(:),1);
6 p = A(n); d=0;
7
8 for i = n-1:-1:1
9 d = p + x0*d;
10 p = A(i) +x0*p;
11 end
Call_newton_horner.m
1 a = [-2 -5 7 -4 1];
2 x0=3;
3 tol = 10^-12; itmax=1000;
4 [x,it] = newton_horner(a,x0,tol,itmax);
5 fprintf(" newton_horner: x0=%g; x=%g, in %d iterations\n",x0,x,it)
6 Result: newton_horner: x0=3; x=2, in 7 iterations
Observation 7.5.
Python programming is as easy and simple as Matlab programming.
• In particular, numpy is developed for Matlab-like implementation, with enhanced convenience.
• Python uses classes for object-oriented programming.
• Furthermore, Python is an open-source (free) programming language, which explains why Python is among the fastest-growing languages in use.
7.4. Classes
Remark 7.6. Classes are a key concept in the so-called object-
oriented programming (OOP). Classes provide a means of
bundling data and functionality together.
• A class is a user-defined template or prototype from which real-
world objects are created.
• A class tells us what data an object should have, what are the ini-
tial/default values of the data, and what methods are associated
with the object to take actions on the objects using their data.
• An object is an instance of a class, and creating an object from a
class is called instantiation.
In the following, we will build a simple class, as Dr. Xu did in [11, Appendix B.5]; you will learn how to initiate, refine, and use classes.
Polynomial_01.py
1 class Polynomial():
2 """A class of polynomials"""
3
4 def __init__(self,coefficient):
5 """Initialize coefficient attribute of a polynomial."""
6 self.coeff = coefficient
7
8 def degree(self):
9 """Find the degree of a polynomial"""
10 return len(self.coeff)-1
11
12 if __name__ == '__main__':
13 p2 = Polynomial([1,2,3])
14 print(p2.coeff) # a variable; output: [1, 2, 3]
15 print(p2.degree()) # a method; output: 2
Polynomial_02.py
1 class Polynomial():
2 """A class of polynomials"""
3
4 count = 0
5
6 def __init__(self):
7 """Initialize coefficient attribute of a polynomial."""
8 self.coeff = [1]
9 Polynomial.count += 1
10
11 def __del__(self):
12 """Delete a polynomial object"""
13 Polynomial.count -= 1
14
15 def degree(self):
16 """Find the degree of a polynomial"""
17 return len(self.coeff)-1
18
19 def evaluate(self,x):
20 """Evaluate a polynomial."""
21
22 n = self.degree()
23 eval = []
24 for xi in x:
25 p = self.coeff[0] #Horner's method
26 for k in range(1,n+1):
27 p = self.coeff[k]+ xi*p
28 eval.append(p)
29 return eval
30
31 if __name__ == '__main__':
32 poly1 = Polynomial()
33 print('poly1, default coefficients:', poly1.coeff)
34 poly1.coeff = [1,2,-3]
35 print('poly1, coefficients after reset:', poly1.coeff)
36 print('poly1, degree:', poly1.degree())
37
38 poly2 = Polynomial()
39 poly2.coeff = [1,2,3,4,-5]
40 print('poly2, coefficients after reset:', poly2.coeff)
41 print('poly2, degree:', poly2.degree())
42
43 print('number of created polynomials:', Polynomial.count)
44
45 del poly1
46 print('number of polynomials after a deletion:', Polynomial.count)
47 print('poly2.evaluate([-1,0,1,2]):',poly2.evaluate([-1,0,1,2]))
Execution
1 [Sat Jul.23] python Polynomial_02.py
2 poly1, default coefficients: [1]
3 poly1, coefficients after reset: [1, 2, -3]
4 poly1, degree: 2
5 poly2, coefficients after reset: [1, 2, 3, 4, -5]
6 poly2, degree: 4
7 number of created polynomials: 2
8 number of polynomials after a deletion: 1
9 poly2.evaluate([-1,0,1,2]): [-7, -5, 5, 47]
Inheritance
Note: If we want to write a class that is just a specialized version of
another class, we do not need to write the class from scratch.
• We call the specialized class a child class and the other general
class a parent class.
• The child class can inherit all the attributes and methods from the parent class; it can also define its own special attributes and methods, or even override methods of the parent class.
Classes.py
 1 class Polynomial():
 2     """A class of polynomials"""
 3
 4     def __init__(self,coefficient):
 5         """Initialize coefficient attribute of a polynomial."""
 6         self.coeff = coefficient
 7
 8     def degree(self):
 9         """Find the degree of a polynomial"""
10         return len(self.coeff)-1
11
12 class Quadratic(Polynomial):
13     """A class of quadratic polynomial"""
14
15     def __init__(self,coefficient):
16         """Initialize the coefficient attributes."""
17         super().__init__(coefficient)
18         self.power_decrease = 1
19
20     def roots(self):
21         a,b,c = self.coeff
22         if self.power_decrease != 1:
23             a,c = c,a
24         discriminant = b**2-4*a*c
25         r1 = (-b+discriminant**0.5)/(2*a)
26         r2 = (-b-discriminant**0.5)/(2*a)
27         return [r1,r2]
28
29     def degree(self):
30         return 2
• Line 12: We must include the name of the parent class in the parentheses of the definition of the child class (to indicate the parent–child relation for inheritance).
• Line 17: The super() function gives a child object all the attributes defined in the parent class.
• Line 18: An additional child-class attribute, self.power_decrease, is initialized.
• Lines 20-27: define a new method called roots.
• Lines 29-30: The method degree() overrides the parent's method.
call_Quadratic.py
1 from Classes import *
2
3 quad1 = Quadratic([2,-3,1])
4 print('quad1, roots:',quad1.roots())
5 quad1.power_decrease = 0
6 print('roots when power_decrease = 0:',quad1.roots())
7 # Output: quad1, roots: [1.0, 0.5]
8 # roots when power_decrease = 0: [2.0, 1.0]
7.5. A Machine Learning Modelcode
8 #=====================================================================
9 # DATA: Read & Preprocessing
10 # load_iris, load_wine, load_breast_cancer, ...
11 #=====================================================================
12 data_read = datasets.load_iris(); #print(data_read.keys())
13
14 X = data_read.data
15 y = data_read.target
16 dataname = data_read.filename
17 targets = data_read.target_names
18 features = data_read.feature_names
19
20 print('X.shape=',X.shape, 'y.shape=',y.shape)
21 #---------------------------------------------------------------------
22 # SETTING
23 #---------------------------------------------------------------------
24 N,d = X.shape; labelset=set(y)
25 nclass=len(labelset);
26 print('N,d,nclass=',N,d,nclass)
27
31 #=====================================================================
32 # CLASSIFICATION
33 #=====================================================================
34 btime = time.time()
35 Acc = np.zeros([run,1])
36 ##from sklearn.neighbors import KNeighborsClassifier
37 ##clf = KNeighborsClassifier(5)
38 from myCLF import myCLF ## My classifier
39
40 for it in range(run):
41 Xtrain, Xtest, ytrain, ytest = train_test_split(
42 X, y, test_size=rtest, random_state=it, stratify = y)
43 ##clf.fit(Xtrain, ytrain);
44 clf = myCLF(Xtrain,ytrain); clf.fit(); ## My classifier
45 Acc[it] = clf.score(Xtest, ytest)
46
47 #-----------------------------------------------
48 # Print: Accuracy && E-time
49 #-----------------------------------------------
50 etime = time.time()-btime
51 print(' %s: Acc.(mean,std) = (%.2f,%.2f)%%; Average E-time= %.5f'
52 %(dataname,np.mean(Acc)*100,np.std(Acc)*100,etime/run))
53
54 #=====================================================================
55 # Scikit-learn Classifiers, for Comparisons
56 #=====================================================================
57 exec(open("sklearn_classifiers.py").read())
sklearn_classifiers.py
1 #=====================================================================
2 # Required: X, y, [dataname, run]
3 print('========= Scikit-learn Classifiers, for Comparisons =========')
4 #=====================================================================
5 from sklearn.preprocessing import StandardScaler
6 from sklearn.datasets import make_moons, make_circles, make_classification
7 from sklearn.neural_network import MLPClassifier
8 from sklearn.neighbors import KNeighborsClassifier
9 from sklearn.linear_model import LogisticRegression
10 from sklearn.svm import SVC
11 from sklearn.gaussian_process import GaussianProcessClassifier
12 from sklearn.gaussian_process.kernels import RBF
13 from sklearn.tree import DecisionTreeClassifier
14 from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
15 from sklearn.naive_bayes import GaussianNB
16 from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
17 from sklearn.inspection import DecisionBoundaryDisplay
18
19 #-----------------------------------------------
20 classifiers = [
21 LogisticRegression(max_iter = 1000),
22 KNeighborsClassifier(5),
23 SVC(kernel="linear", C=0.5),
24 SVC(gamma=2, C=1),
25 RandomForestClassifier(max_depth=5, n_estimators=50, max_features=1),
26 MLPClassifier(alpha=1, max_iter=1000),
27 AdaBoostClassifier(),
28 GaussianNB(),
29 QuadraticDiscriminantAnalysis(),
30 GaussianProcessClassifier(),
31 ]
32 names = [
33 "Logistic Regr",
34 "KNeighbors-5 ",
35 "Linear SVM ",
36 "RBF SVM ",
37 "Random Forest",
38 "Deep-NN ",
39 "AdaBoost ",
40 "Naive Bayes ",
41 "QDA ",
42 "Gaussian Proc",
43 ]
44 #-----------------------------------------------
45 if dataname is None: dataname = 'No-dataname';
46 if run is None: run = 100;
47
48 #===============================================
49 acc_max=0
50 for name, clf in zip(names, classifiers):
51 Acc = np.zeros([run,1])
52 btime = time.time()
53
54 for it in range(run):
55 Xtrain, Xtest, ytrain, ytest = train_test_split(
56 X, y, test_size=rtest, random_state=it, stratify = y)
57
58 clf.fit(Xtrain, ytrain);
59 Acc[it] = clf.score(Xtest, ytest)
60
61 etime = time.time()-btime
62 accmean = np.mean(Acc)*100
63 print('%s: %s: Acc.(mean,std) = (%.2f,%.2f)%%; E-time= %.5f'
64 %(dataname,name,accmean,np.std(Acc)*100,etime/run))
65 if accmean>acc_max:
66 acc_max= accmean; algname = name
67 print('sklearn classifiers max: %s= %.2f' %(algname,acc_max))
Note: A while loop has not been considered in the lecture. However, you can figure it out
easily by yourself.
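For reference, here is a minimal sketch of a while loop (the values are illustrative only):

    # Sum the integers 1, 2, ..., 10 with a while loop.
    total, i = 0, 1
    while i <= 10:      # repeat the body as long as the condition holds
        total += i
        i += 1
    print(total)        # 55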
7.3. Write a function that takes as input a list of values and returns the largest value. Do
this without using the Python max() function; you should combine a for loop and an
if statement.
7.4. Let P_4(x) = 2x^4 − 5x^3 − 11x^2 + 20x + 10. Solve the following.
Hint : For plotting, you may import: “import matplotlib.pyplot as plt” then use
plt.plot(). You will see the Python plotting is quite similar to Matlab plotting.
8
CHAPTER
Mathematical Optimization
xk+1 = xk + γk pk , k = 0, 1, · · · , (8.2)
Contents of Chapter 8
8.1. Gradient Descent (GD) Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.2. Newton’s Method for Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Exercises for Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
f(x_{k+1}) = f(x_k + γ_k p_k)
           = f(x_k) + γ_k f'(x_k) · p_k + (γ_k²/2) p_k · f''(ξ) p_k.     (8.3)
• Assume that f'' is bounded. If we choose
      p_k = −f'(x_k),                                                    (8.6)
  then
      f'(x_k) · p_k = −||f'(x_k)||² < 0,                                 (8.7)
  which satisfies (8.5) and therefore (8.4).
• Summary: In the GD method, the search direction is the negative
gradient, the steepest descent direction.
Picking the step length γ: Assume that the step length is chosen to be independent of k, although one can play with other choices as well. The question is how to select γ in order to make the best gain from the method. To turn the right-hand side of (8.9) into a more manageable form, we invoke Taylor's Theorem¹:
      f(x + t) = f(x) + t f'(x) + ∫_x^{x+t} (x + t − s) f''(s) ds.       (8.10)
Assuming that |f''(s)| ≤ L, we have
      f(x + t) ≤ f(x) + t f'(x) + (t²/2) L.
2
Now, letting x = xk and t = −γ f 0 (xk ) reads
f (xk+1 ) = f (xk − γ f 0 (xk ))
1
≤ f (xk ) − γ f 0 (xk ) f 0 (xk ) + L [γ f 0 (xk )]2 (8.11)
2
L
= f (xk ) − [f 0 (xk )]2 γ − γ 2 .
2
The gain (learning) from the method occurs when
      γ − (L/2) γ² > 0   ⇒   0 < γ < 2/L,                                (8.12)
and it is best when γ − (L/2) γ² is maximal. This happens at the point
      γ = 1/L.                                                           (8.13)
¹ Taylor's Theorem with integral remainder: Suppose f ∈ C^{n+1}[a, b] and x₀ ∈ [a, b]. Then, for every x ∈ [a, b],
f(x) = Σ_{k=0}^{n} [f^{(k)}(x₀)/k!] (x − x₀)^k + R_n(x), where R_n(x) = (1/n!) ∫_{x₀}^{x} (x − s)^n f^{(n+1)}(s) ds.
      f'(x̂) = lim_{k→∞} f'(x_k) = 0,                                     (8.18)
Use the GD method to find the minimizer, starting with x0 = (−1, 2).
rosenbrock_2D_GD.py
1 import numpy as np; import time
2
 6 def rosen(x):
 7     return (1.-x[0])**2+100*(x[1]-x[0]**2)**2
 8
 9 def rosen_grad(x):
10     h = 1.e-5;
11     g1 = ( rosen([x[0]+h,x[1]]) - rosen([x[0]-h,x[1]]) )/(2*h)
12     g2 = ( rosen([x[0],x[1]+h]) - rosen([x[0],x[1]-h]) )/(2*h)
13     return np.array([g1,g2])
14
² The Rosenbrock function in 3D is given as f(x, y, z) = [(1 − x)² + 100(y − x²)²] + [(1 − y)² + 100(z − y²)²], which has exactly one minimum at (1, 1, 1). Similarly, one can define the Rosenbrock function in general N-dimensional spaces, for N ≥ 4, by adding one more component for each enlarged dimension. That is, f(x) = Σ_{i=1}^{N−1} [(1 − x_i)² + 100(x_{i+1} − x_i²)²], where x = [x_1, x_2, · · · , x_N] ∈ R^N. See Wikipedia (https://en.wikipedia.org/wiki/Rosenbrock_function) for details.
Output
1 GD Method: it = 7687; E-time = 0.0521
2 [0.99994416 0.99988809]
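The gradient-descent loop itself is not reproduced in the listing above. A minimal sketch, reusing rosen_grad and the imports from the listing (the step length, tolerance, and iteration cap below are assumptions, so the iteration count will differ from the output shown), is:

    x = np.array([-1., 2.]); gamma = 1.e-3; tol = 1.e-7; itmax = 100000
    btime = time.time()
    for it in range(1, itmax+1):
        g = rosen_grad(x)
        if np.linalg.norm(g) < tol: break
        x = x - gamma*g               # x_{k+1} = x_k - gamma*grad f(x_k)
    print('GD Method: it = %d; E-time = %.4f' % (it, time.time()-btime))
    print(x)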
The gradient descent algorithm with backtracking line search then becomes
Algorithm 8.6. (The Gradient Descent Algorithm, with Back-
tracking Line Search).
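The algorithm is not printed here; the following is a hedged sketch of a standard Armijo-type backtracking line search (the parameters rho = 0.5 and c = 1e-4 are common defaults, assumed rather than taken from the text):

    import numpy as np

    def backtracking(f, grad, x, p, gamma0=1.0, rho=0.5, c=1.e-4):
        # Shrink gamma until the sufficient-decrease (Armijo) condition holds.
        gamma, g = gamma0, grad(x)
        while f(x + gamma*p) > f(x) + c*gamma*np.dot(g, p):
            gamma *= rho
        return gamma

    # Inside the GD loop one would take p = -rosen_grad(x),
    # gamma = backtracking(rosen, rosen_grad, x, p), and then x = x + gamma*p.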
Note: The gradient descent method with partial updates is called the
stochastic gradient descent (SGD) method.
rosenbrock_opt_Newton.py
1 import numpy as np; import time
2 from scipy import optimize as opt
3
4 x0 = np.array([-1., 2.])
5
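The call to the optimizer is omitted in the listing above; one possible continuation that is consistent with the output below uses SciPy's built-in Rosenbrock helpers together with opt.minimize (an assumed reconstruction, not necessarily the author's exact code):

    from scipy.optimize import rosen, rosen_der, rosen_hess

    btime = time.time()
    res = opt.minimize(rosen, x0, method='Newton-CG', jac=rosen_der, hess=rosen_hess)
    print('Method = Newton-CG; E-time = %.4f' % (time.time()-btime))
    print(res)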
Output
1 Method = Newton-CG; E-time = 0.0244
2 fun: 9.003798065813694e-20
3 jac: array([ 9.97569418e-08, -5.00786967e-08])
4 message: 'Optimization terminated successfully.'
5 nfev: 169
6 nhev: 148
7 nit: 148
8 njev: 169
9 status: 0
10 success: True
11 x: array([1., 1.])
Exercises for Chapter 8
8.1.
9
CHAPTER
Vector Spaces and Orthogonality
Contents of Chapter 9
9.1. Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Exercises for Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.1.
10
CHAPTER
Principal Component Analysis
Contents of Chapter 10
10.1.Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
10.2.Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
10.3.Application of the SVD for LS Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Exercises for Chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
util_Covariance.py
 3 # Generate data
 4 def generate_data(n):
 5     # Normally distributed around the origin
 6     x = np.random.normal(0,1, n)
 7     y = np.random.normal(0,1, n)
 8     S = np.vstack((x, y)).T
 9     # Transform
10     sx, sy = 1, 3;
11     Scale = np.array([[sx, 0], [0, sy]])
12     theta = 0.25*np.pi; c,s = np.cos(theta), np.sin(theta)
13     Rot = np.array([[c, -s], [s, c]]).T #T, due to right multiplication
14
17 # Covariance
18 def cov(x, y):
19     xbar, ybar = x.mean(), y.mean()
20     return np.sum((x - xbar)*(y - ybar))/len(x)
21
22 # Covariance matrix
23 def cov_matrix(X):
24     return np.array([[cov(X[:,0], X[:,0]), cov(X[:,0], X[:,1])], \
25                      [cov(X[:,1], X[:,0]), cov(X[:,1], X[:,1])]])
Covariance.py
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from util_Covariance import *
4
5 # Generate data
6 n = 200
7 X = generate_data(n)
8 print('Generated data: X.shape =', X.shape)
9
10 # Covariance matrix
11 C = cov_matrix(X)
12 print('C:\n',C)
13
14 # Principal directions
15 eVal, eVec = np.linalg.eig(C)
16 xbar,ybar = np.mean(X,0)
17 print('eVal:\n',eVal); print('eVec:\n',eVec)
18 print('np.mean(X, 0) =',xbar,ybar)
19
20 # Plotting
21 plt.style.use('ggplot')
22 plt.scatter(X[:, 0],X[:, 1],c='#00a0c0',s=10)
23 plt.axis('equal');
24 plt.title('Generated Data')
25 plt.savefig('py-data-generated.png')
26
Output
1 Generated data: X.shape = (200, 2)
2 C:
3 [[ 5.10038723 -4.15289232]
4 [-4.15289232 4.986776 ]]
5 eVal:
6 [9.19686242 0.89030081]
7 eVec:
8 [[ 0.71192601 0.70225448]
9 [-0.70225448 0.71192601]]
10 np.mean(X, 0) = 4.986291809096116 2.1696690114181947
C = U DU −1 , (10.5)
Z = X W, (10.6)
and then (2) finding the weight vector which extracts the maximum variance from this new data matrix:
      w_k = arg max_{||w||=1} ||X̂_k w||²,                               (10.10)
where
U : n × d orthogonal (the left singular vectors of X.)
Σ : d × d diagonal (the singular values of X.)
V : d × d orthogonal (the right singular vectors of X.)
where σ1 ≥ σ2 ≥ · · · ≥ σd ≥ 0.
• In terms of this factorization, the matrix X^T X reads
      X^T X = (UΣV^T)^T UΣV^T = VΣU^T UΣV^T = VΣ²V^T.                    (10.13)
      X = UΣV^T.                                                         (10.14)
2. Set
      W = V.                                                             (10.15)
Then the score matrix, the set of principal components, is
      Z = XW = XV = UΣV^T V = UΣ = [σ_1 u_1 | σ_2 u_2 | · · · | σ_d u_d]. (10.16)
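A small NumPy sketch (with synthetic data; all names are illustrative) confirming (10.13) and (10.16), i.e., that the squared singular values of a centered X are the eigenvalues of X^T X and that the scores XV equal UΣ:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X = X - X.mean(axis=0)                      # center the data

    U, s, VT = np.linalg.svd(X, full_matrices=False)
    evals, evecs = np.linalg.eigh(X.T @ X)      # eigenvalues in ascending order

    print(np.allclose(np.sort(s**2), np.sort(evals)))   # True: (10.13)
    print(np.allclose(X @ VT.T, U*s))                   # True: (10.16)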
      ||X − X_k||_2 = ||UΣV^T − UΣ_k V^T||_2
                    = ||U(Σ − Σ_k)V^T||_2                                (10.20)
                    = ||Σ − Σ_k||_2 = σ_{k+1},
Image Compression
• Dyadic Decomposition: The data matrix X ∈ R^{m×n} is expressed as a sum of rank-1 matrices:
      X = UΣV^T = Σ_{i=1}^{n} σ_i u_i v_i^T,                             (10.21)
  where
      V = [v_1, · · · , v_n],  U = [u_1, · · · , u_n].
[Figure: rank-k reconstructions of the image for k = 20, k = 50, and k = 100.]
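A hedged sketch of rank-k compression via (10.21); the image file name is a placeholder and must be replaced by an actual image file:

    import numpy as np
    import matplotlib.pyplot as plt

    img = plt.imread('some_image.png')               # placeholder file name
    if img.ndim == 3:
        img = img[..., :3].mean(axis=2)              # convert to grayscale

    U, s, VT = np.linalg.svd(img, full_matrices=False)
    for k in (20, 50, 100):
        Xk = (U[:, :k]*s[:k]) @ VT[:k, :]            # sum of first k rank-1 terms
        plt.figure(); plt.imshow(Xk, cmap='gray'); plt.title('k = %d' % k)
    plt.show()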
σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.
where
U : m × n orthogonal (the left singular vectors of A.)
Σ : n × n diagonal (the singular values of A.)
V : n × n orthogonal (the right singular vectors of A.)
Proof. (of Theorem 10.14) Use induction on m and n: we assume that the
SV D exists for (m − 1) × (n − 1) matrices, and prove it for m × n. We assume
A 6= 0; otherwise we can take Σ = 0 and let U and V be arbitrary orthogonal
matrices.
• Let v be a unit vector with ||Av||_2 = ||A||_2, and let u = Av/||Av||_2, which is a unit vector. Choose Ũ, Ṽ such that U = [u Ũ] and V = [v Ṽ] are orthogonal.
• Now, we write
      U^T A V = [u^T; Ũ^T] · A · [v Ṽ] = [u^T Av   u^T A Ṽ; Ũ^T Av   Ũ^T A Ṽ].
Since
      u^T Av = (Av)^T (Av)/||Av||_2 = ||Av||_2²/||Av||_2 = ||Av||_2 = ||A||_2 ≡ σ,
      Ũ^T Av = Ũ^T u ||Av||_2 = 0,
and u^T A Ṽ = 0 (otherwise ||A||_2 would exceed σ), the induction hypothesis gives an SVD Ũ^T A Ṽ = U_1 Σ_1 V_1^T, so that
      U^T A V = [σ 0; 0 U_1 Σ_1 V_1^T] = [1 0; 0 U_1] [σ 0; 0 Σ_1] [1 0; 0 V_1]^T,
or equivalently
      A = (U [1 0; 0 U_1]) [σ 0; 0 Σ_1] (V [1 0; 0 V_1])^T.              (10.27)
Write
      U = [u_1 u_2 · · · u_n],  Σ = diag(σ_1, σ_2, · · · , σ_n),  V = [v_1 v_2 · · · v_n].
Since
      A = UΣV^T  ⟺  AV = UΣV^T V = UΣ,
we have
      AV = A[v_1 v_2 · · · v_n] = [Av_1 Av_2 · · · Av_n]
         = [u_1 · · · u_r · · · u_n] diag(σ_1, · · · , σ_r, 0, · · · , 0)  (10.28)
         = [σ_1 u_1 · · · σ_r u_r 0 · · · 0].
Therefore,
      A = UΣV^T  ⟺  Av_j = σ_j u_j, j = 1, 2, · · · , r, and Av_j = 0, j = r + 1, · · · , n.   (10.29)
• Equation (10.31) gives how to find the singular values {σj } and the
right singular vectors V , while (10.29) shows a way to compute the
left singular vectors U .
• (Dyadic decomposition) The matrix A ∈ R^{m×n} can be expressed as
      A = Σ_{j=1}^{n} σ_j u_j v_j^T.                                     (10.33)
  When rank(A) = r ≤ n,
      A = Σ_{j=1}^{r} σ_j u_j v_j^T.                                     (10.34)
This property has been utilized for various approximations and ap-
plications, e.g., by dropping singular vectors corresponding to small
singular values.
AT A = V ΛV T ,
Lemma 10.17. Let A ∈ Rn×n be symmetric. Then (a) all the eigenvalues
of A are real and (b) eigenvectors corresponding to distinct eigenvalues
are orthogonal.
Example 10.18. Find the SVD for A = [1 2; −2 1; 3 2].
Solution.
1. A^T A = [14 6; 6 9].
2. The eigenvalues of A^T A are λ_1 = 18 and λ_2 = 5, with corresponding unit eigenvectors
      v_1 = [3/√13, 2/√13]^T,   v_2 = [−2/√13, 3/√13]^T.
3. σ_1 = √λ_1 = √18 = 3√2, σ_2 = √λ_2 = √5. So
      Σ = [√18 0; 0 √5].
4. u_1 = (1/σ_1) A v_1 = (1/√18)(1/√13) [7, −4, 13]^T = [7/√234, −4/√234, 13/√234]^T,
   u_2 = (1/σ_2) A v_2 = (1/√5)(1/√13) [4, 7, 0]^T = [4/√65, 7/√65, 0]^T.
5. A = UΣV^T, where
      U = [7/√234 4/√65; −4/√234 7/√65; 13/√234 0],  Σ = [√18 0; 0 √5],
      V^T = [3/√13 2/√13; −2/√13 3/√13].
where x̂ is called a least-squares solution of Ax = b.
      x̂ = (A^T A)^{−1} A^T b.                                            (10.36)
Let A = UΣV^T be the SVD of A, and write the solution as
      x̂ = V z.                                                           (10.41)
Then
      z = Σ_k^+ c = Σ_k^+ U^T b,                                          (10.44)
where
      Σ_k^+ = diag(1/σ_1, 1/σ_2, · · · , 1/σ_k, 0, · · · , 0).             (10.45)
Thus the corresponding LS solution reads
      x̂ = V z = V Σ_k^+ U^T b.                                            (10.46)
Note that x̂ involves no components of the null space of A; x̂ is unique in this sense.
Remark 10.22.
• When rank(A) = k = n: It is easy to see that
      V Σ_k^+ U^T = V Σ^{−1} U^T.                                         (10.47)
• In general, the matrix
      A_k^+ := V Σ_k^+ U^T                                                (10.48)
  plays the role of the pseudoinverse of A. Thus we will call it the k-th pseudoinverse of A.
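A minimal sketch of the solution (10.46) via the k-th pseudoinverse, on synthetic data (names and sizes are illustrative); for k = n it agrees with numpy.linalg.lstsq, consistent with (10.47):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(50, 4)); b = rng.normal(size=50)

    U, s, VT = np.linalg.svd(A, full_matrices=False)
    k = 4                                    # k = n: full-rank case
    z = np.zeros_like(s)
    z[:k] = (U[:, :k].T @ b)/s[:k]           # z = Sigma_k^+ U^T b
    x_hat = VT.T @ z                         # x_hat = V z

    print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True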
pca_regression.m
4 %% Standardization
5 %%---------------------------------------------
6 S_mean = mean(A); S_std = std(A);
7 if S_std(1)==0, S_std(1)=1/S_mean(1); S_mean(1)=0; end
8 AS = (A-S_mean)./S_std;
9
18 sol_PCA = V*C*U'*b;
19 end
Regression_Analysis.m
1 clear all; close all;
2
3 %%-----------------------------------------------------
4 %% Setting
5 %%-----------------------------------------------------
6 regen_data = 0; %==1, regenerate the synthetic data
7 poly_n = 9;
8 npt=300; bx=5.0; sigma=0.50; %for synthetic data
9 datafile = 'synthetic-data.txt';
10
11 %%-----------------------------------------------------
12 %% Data: Generation and Read
13 %%-----------------------------------------------------
14 if regen_data || ~isfile(datafile)
15 DATA = util.get_data(npt,bx,sigma);
16 writematrix(DATA, datafile);
17 fprintf('%s: re-generated.\n',datafile)
18 end
19 DATA = readmatrix(datafile,"Delimiter",",");
20
21 %%-----------------------------------------------------
22 %% The system: A x = b
23 %%-----------------------------------------------------
24 A = util.get_A(DATA(:,1),poly_n+1);
25 b = DATA(:,2);
26
27 %%-----------------------------------------------------
28 %% Method of Normal Equations
29 %%-----------------------------------------------------
30 sol_NE = (A'*A)\(A'*b);
31 figure,
32 plot(DATA(:,1),DATA(:,2),'k.','MarkerSize',8);
33 axis tight; hold on
34 yticks(1:5); ax = gca; ax.FontSize=13; %ax.GridAlpha=0.25
35 title(sprintf('Synthetic Data: npt = %d',npt),'fontsize',13)
36 util.mysave(gcf,'data-synthetic.png');
37 x=linspace(min(DATA(:,1)),max(DATA(:,1)),51);
38 plot(x,util.predict_Y(x,sol_NE),'r-','linewidth',2);
39 Pn = ['P_',int2str(poly_n)];
40 legend('data',Pn, 'location','best','fontsize',13)
41 TITLE0=sprintf('Method of NE: npt = %d',npt);
42 title(TITLE0,'fontsize',13)
43 hold off
44 util.mysave(gcf,'data-synthetic-sol-NE.png');
45
46 %%-----------------------------------------------------
47 %% PCA Regression
48 %%-----------------------------------------------------
49 for npc=1:size(A,2);
50 [sol_PCA,S_mean,S_std] = pca_regression(A,b,npc);
51 figure,
52 plot(DATA(:,1),DATA(:,2),'k.','MarkerSize',8);
53 axis tight; hold on
54 yticks(1:5); ax = gca; ax.FontSize=13; %ax.GridAlpha=0.25
55 x=linspace(min(DATA(:,1)),max(DATA(:,1)),51);
56 plot(x,util.predict_Y(x,sol_PCA,S_mean,S_std),'r-','linewidth',2);
57 Pn = ['P_',int2str(poly_n)];
58 legend('data',Pn, 'location','best','fontsize',13)
59 TITLE0=sprintf('Method of PC: npc = %d',npc);
60 title(TITLE0,'fontsize',13)
61 hold off
62 savefile = sprintf('data-sol-PCA-npc-%02d.png',npc);
63 util.mysave(gcf,savefile);
64 end
Figure 10.3: The synthetic data and the LS solution P9 (x), overfitted.
Figure 10.4: PCA regression of the data, with various numbers of principal components.
The best regression is achieved when npc = 3.
(a) Add lines to the code given, to verify (10.20), p.184. For example, set k = 5.
Wine_data.py
1 import numpy as np
2 from numpy import diag,dot
3 from scipy.linalg import svd,norm
4 import matplotlib.pyplot as plt
5
9 #-----------------------------------------------
10 # Standardization
11 #-----------------------------------------------
12 X_mean, X_std = np.mean(X,axis=0), np.std(X,axis=0)
13 XS = (X - X_mean)/X_std
14
15 #-----------------------------------------------
16 # SVD
17 #-----------------------------------------------
18 U, s, VT = svd(XS)
19 if U.shape[0]==U.shape[1]:
20     U = U[:,:len(s)] # cut the unnecessary columns
21 Sigma = diag(s) # transform to a matrix
22 print('U:',U.shape, 'Sigma:',Sigma.shape, 'VT:',VT.shape)
Note:
• Line 12: np.mean and np.std are applied, with the option axis=0, to get
the quantities column-by-column vertically. Thus X_mean and X_std are row
vectors.
• Line 18: In Python, svd produces [U, s, VT], where VT = V T . If you would
like to get V , then V = VT.T.
Clue: The major reason a class is used in the Matlab code in Example 10.23 is to combine multiple functions into a single file. In Python, you do not have to use a class to save multiple functions in a file. You may start with the following.
util.py
 1 import numpy as np
 2 import matplotlib.pyplot as plt
 3
 4 def get_data(npt,bx,sigma):
 5     data = np.zeros([npt,2]);
 6     data[:,0] = np.random.uniform(0,1,npt)*bx;
 7     data[:,1] = np.maximum(bx/3,2*data[:,0]-bx);
 8     r = np.random.normal(0,1,npt)*sigma;
 9     theta = np.random.normal(0,1,npt)*np.pi;
10     noise = np.column_stack((r*np.cos(theta),r*np.sin(theta)));
11     data += noise;
12     return data
13
14 def mysave(filename):
15     plt.savefig(filename,bbox_inches='tight')
16     print('saved:',filename)
17
Regression_Analysis.py
1 import numpy as np
2 import numpy.linalg as la
3 import matplotlib.pyplot as plt
4 from os.path import exists
5 import util
6
7 ##-----------------------------------------------------
8 ## Setting
9 ##-----------------------------------------------------
10 regen_data = 1; #==1, regenerate the synthetic data
11 poly_n = 9;
12 npt=300; bx=5.0; sigma=0.50; #for synthetic data
13 datafile = 'synthetic-data.txt';
14 plt.style.use('ggplot')
15
16 ##-----------------------------------------------------
17 ## Data: Generation and Read
18 ##-----------------------------------------------------
19 if regen_data or not exists(datafile):
20 DATA = util.get_data(npt,bx,sigma);
21 np.savetxt(datafile,DATA,delimiter=',');
32 ##-----------------------------------------------------
33 ## The system: A x = b
34 ##-----------------------------------------------------
Note: The semicolons (;) are neither necessary nor harmful in Python; they remain from copying and pasting the Matlab lines. The ggplot style emulates "ggplot", a popular plotting package for R. When Regression_Analysis.py is executed, you will obtain a saved image of the data and the fitted curve.
Machine Learning
Contents of Chapter 11
11.1.What is Machine Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.2.Binary Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
11.3.Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.4.Multi-Column Least-Squares Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
Exercises for Chapter 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
4. Interpretability:
Although ML has come very far, researchers still do not know exactly how some algorithms (deep nets) work.
• If we do not know how trained nets actually work, how do we make any real progress?
5. One-Shot Learning:
We still have not been able to achieve one-shot learning. Traditional gradient-based networks need a huge amount of data and are trained through extensive iterative procedures.
• Instead, we should find a way to enable neural networks to learn using just a few examples.
Definition 11.4. Let {(x^(i), y^(i))} be labeled data, with x^(i) ∈ R^d and y^(i) ∈ {0, 1}. A binary classifier finds a hyperplane in R^d that separates the data points X = {x^(i)} into two classes; see Figure 11.2, p. 207.
Activation functions:
      Perceptron:           φ(z) = 1 if z ≥ θ, and 0 otherwise,
      Adaline:              φ(z) = z,                                    (11.4)
      Logistic Regression:  φ(z) = σ(z) := 1/(1 + e^{−z}),
where "Adaline" stands for ADAptive LInear NEuron. The activation function σ(z) is called the standard logistic sigmoid function or simply the sigmoid function.
Figure 11.5: Popular activation functions: (left) The standard logistic sigmoid function
and (right) the rectifier and softplus function.
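A small NumPy sketch of the three activation functions in (11.4) (the threshold θ = 0 is an assumed default):

    import numpy as np

    def perceptron_phi(z, theta=0.0):
        return np.where(z >= theta, 1, 0)

    def adaline_phi(z):
        return z                              # identity activation

    def sigmoid(z):
        return 1.0/(1.0 + np.exp(-z))

    z = np.array([-2., 0., 2.])
    print(perceptron_phi(z), adaline_phi(z), sigmoid(z))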
11.2.1. Adaline
Algorithm 11.7. Adaline Learning:
From data {(x(i) , y (i) )}, learn the weights w and bias b, with
• Activation function: φ(z) = z (i.e., identity activation)
• Cost function: the SSE
      J(w, b) = (1/2) Σ_i (y^(i) − φ(z^(i)))².                           (11.10)
The dominant algorithm for the minimization of the cost function is the Gradient Descent Method.
Algorithm 11.8. The Gradient Descent Method uses −∇J for the
search direction (update direction):
Thus, with φ = I,
      Δw = −η ∇_w J(w, b) = η Σ_i (y^(i) − φ(z^(i))) x^(i),
      Δb = −η ∇_b J(w, b) = η Σ_i (y^(i) − φ(z^(i))).                    (11.13)
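A minimal NumPy sketch of one full-batch update (11.13) (the data, learning rate, and number of epochs are illustrative; replacing phi by the sigmoid turns this into the Logistic Regression rule discussed below):

    import numpy as np

    def adaline_epoch(X, y, w, b, eta=0.01):
        # One gradient-descent step of (11.13) with phi(z) = z.
        z = X @ w + b
        err = y - z                    # y^(i) - phi(z^(i))
        w = w + eta*(X.T @ err)        # Delta w = eta * sum_i err_i * x^(i)
        b = b + eta*err.sum()          # Delta b = eta * sum_i err_i
        return w, b

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    w = np.zeros(2); b = 0.0
    for epoch in range(50):
        w, b = adaline_epoch(X, y, w, b)
    print(w, b)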
Hyperparameters
Definition 11.9. In ML, a hyperparameter is a parameter whose
value is set before the learning process begins. Thus it is an algorithmic
parameter. Examples are
• The learning rate (η)
• The number of maximum epochs/iterations (n_iter)
Note: There are effective searching schemes to set the learning rate η
automatically.
and therefore
      ∇_w J(w) = −Σ_i [y^(i) − φ(z^(i))] x^(i).                           (11.16)
Similarly, one can get
      ∇_b J(w) = −Σ_i [y^(i) − φ(z^(i))].                                 (11.17)
      Δw = −η ∇_w J(w, b) = η Σ_i [y^(i) − φ(z^(i))] x^(i),
      Δb = −η ∇_b J(w, b) = η Σ_i [y^(i) − φ(z^(i))].                    (11.19)
Note: The above gradient descent rule for Logistic Regression is of the same form as that of Adaline; see (11.13) on p. 215. The only difference is the activation function φ.
• − vs {◦, +} ⇒ weights w−
• + vs {◦, −} ⇒ weights w+
• ◦ vs {+, −} ⇒ weights w◦
Figure 11.11: Segmentation.
It can do this by heavily weighting input pixels which overlap with the
image, and only lightly weighting the other inputs.
• Similarly, let’s suppose that the second, third, and fourth neurons
in the hidden layer detect whether or not the following images are
present
• As you may have guessed, these four images together make up the 0
image that we saw in the line of digits shown in Figure 11.11:
• So if all four of these hidden neurons are firing, then we can conclude
that the digit is a 0.
where W denotes the collection of all weights in the network, B all the
biases, and a(x(i) ) is the vector of outputs from the network when x(i)
is input.
• Gradient descent method:
      [W; B] ← [W; B] + [ΔW; ΔB],                                        (11.21)
• For stochastic (mini-batch) gradient descent, the update is computed from a small set of randomly chosen training samples x̃^(1), x̃^(2), · · · , x̃^(m),
• For classification of hand-written digits for the MNIST data set, you
may choose: batch_size = 10.
13 class Network(object):
14 def __init__(self, sizes):
15 """The list ``sizes`` contains the number of neurons in the
16 respective layers of the network. For example, if the list
17 was [2, 3, 1] then it would be a three-layer network, with the
18 first layer containing 2 neurons, the second layer 3 neurons,
19 and the third layer 1 neuron. """
20
21 self.num_layers = len(sizes)
22 self.sizes = sizes
23 self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
24 self.weights = [np.random.randn(y, x)
25 for x, y in zip(sizes[:-1], sizes[1:])]
26
94 z = zs[-l]
95 sp = sigmoid_prime(z)
96 delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
97 nabla_b[-l] = delta
98 nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
99 return (nabla_b, nabla_w)
100
4 import network
5 n_neurons = 20
6 net = network.Network([784 , n_neurons, 10])
7
Validation Accuracy
1 Epoch 0: 9006 / 10000
2 Epoch 1: 9128 / 10000
3 Epoch 2: 9202 / 10000
4 Epoch 3: 9188 / 10000
5 Epoch 4: 9249 / 10000
6 ...
7 Epoch 25: 9356 / 10000
8 Epoch 26: 9388 / 10000
9 Epoch 27: 9407 / 10000
10 Epoch 28: 9410 / 10000
11 Epoch 29: 9428 / 10000
Accuracy Comparisons
• scikit-learn’s SVM classifier using the default settings: 9435/10000
• A well-tuned SVM: ≈98.5%
• Well-designed Convolutional NN (CNN):
9979/10000 (only 21 missed!)
• Let the superscript in ( ) denote the class. A point in the c-th class is expressed as
      x^(c) = [x_1^(c), x_2^(c)] = [x_1, x_2, c],  c = 0, 1, 2.
  – Define the weight matrix W with three columns, where the j-th column weights heavily the points in the j-th class.
  – Define the source matrix
      B = [δ_{c_i,j}] ∈ R^{N×3}.                                         (11.28)
    For example, if the i-th point is in class 0, then the i-th row of B is [1, 0, 0].
• Then the multi-column least-squares problem reads
      Ŵ = arg min_W ||AW − B||²,                                         (11.29)
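Assuming A has full column rank, (11.29) decouples into one ordinary least-squares problem per column of B, so the solution can be written with the normal equations (the standard LS fact, restated here for reference):
      Ŵ = (A^T A)^{−1} A^T B.
This is exactly what the function ls_solve in util_MC_LS.py below computes when npc == 0.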
Prediction
• Let [x_1, x_2] be a new point.
• Compute
      [1, x_1, x_2] Ŵ = [p_0, p_1, p_2],  Ŵ ∈ R^{3×3}.                   (11.31)
  Ideally, if the point [x_1, x_2] is in class j, then p_j is near 1, while the others are near 0. Thus p_j is the largest.
• Decide the class c:
      c = arg max_j p_j.
GLOBAL_VARIABLES.py
4 N_D1 = 100
5 FORMAT = '%.3f','%.3f','%d'
6
7 SCALE = [[1,1],[1,2],[1.5,1]]
8 THETA = [0,-0.25*np.pi, 0]
9 TRANS = [[0,0],[6,0],[3,4]]
10 COLOR = ['r','b','c']
11 MARKER = ['.','s','+','*']
12 LINESTYLE = [['r--','r-'],['b--','b-'],['c--','c-']]
13
14 N_CLASS = len(SCALE)
15
16 DAT_FILENAME = 'synthetic.data'
17 FIG_FILENAME = 'synthetic-data.png'
18 FIG_INTERPRET = 'synthetic-data-interpret.png'
19
20 def myfigsave(figname):
21 plt.savefig(figname,bbox_inches='tight')
22 print(' saved: %s' %(figname))
synthetic_data.py
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from GLOBAL_VARIABLES import *
4
5 def generate_data(n,scale,theta):
6 # Normally distributed around the origin
7 x = np.random.normal(0,1, n); y = np.random.normal(0,1, n)
8 P = np.vstack((x, y)).T
9 # Transform
10 sx,sy = scale
11 S = np.array([[sx,0],[0,sy]])
12 c,s = np.cos(theta), np.sin(theta)
13 R = np.array([[c,-s],[s,c]]).T #T, due to right multiplication
14 return P.dot(S).dot(R)
15
16 def synthetic_data():
17 N=0
18 plt.figure()
19 for i in range(N_CLASS):
20 scale = SCALE[i]; theta = THETA[i]; N+=N_D1
21 D1 = generate_data(N_D1,scale,theta) +TRANS[i]
22 D1 = np.column_stack((D1,i*np.ones([N_D1,1])))
23 if i==0: DATA = D1
24 else: DATA = np.row_stack((DATA,D1))
25 plt.scatter(D1[:,0],D1[:,1],s=15,c=COLOR[i],marker=MARKER[i])
26
27 np.savetxt(DAT_FILENAME,DATA,delimiter=',',fmt=FORMAT)
28 print(' saved: %s' %(DAT_FILENAME))
29
39 if __name__ == '__main__':
40 synthetic_data()
util_MC_LS.py
6 def set_MC_LS(X,y):
7 N,d = X.shape; nclass = len(set(y))
8 A = np.column_stack((np.ones([N,]),X))
9 b = np.zeros([N,nclass])
10 for i,v in enumerate(y): # one-hot encoding
11 b[i,int(v)] = 1
12 return A,b
13
14 def ls_solve(A,b,npc):
15 if npc==0:
16 return la.solve((A.T).dot(A),(A.T).dot(b))
17 else:
18 U, s, VT = svd(A)
19 V = VT.T
20 U = U[:,:npc]
21 C = diag(1/s[:npc])
22 V = V[:,:npc]
23 return V.dot(C.dot((U.T).dot(b)))
24
25 def prediction(A,sol):
26 forward = A.dot(sol)
27 return np.argmax(forward,axis=1)
28
29 def count_diff(u,v):
30 count =0
31 for i in range(len(u)):
32 if u[i] != v[i]: count+=1
33 return count
Multi_Column_LS.py
1 import numpy as np
2 import matplotlib.pyplot as plt
3 import time
4 from util_MC_LS import *; from GLOBAL_VARIABLES import *
5
6 #-----------------------------------------------
7 # Add Data: append(['name','delimiter',clabel])
8 #-----------------------------------------------
9 DLIST =[]
10 DLIST.append(['synthetic.data', ',', -1])
11 DLIST.append(['wine.data', ',', 0])
12 DLIST.append(['seeds_dataset.txt','\t',-1])
13
14 #-----------------------------------------------
15 # User Setting
16 #-----------------------------------------------
17 idata = 0
18 refigure = 1
19 rtrain = 0.7; run = 1000
20
21 #-----------------------------------------------
22 # DATA: Read & Preprocessing
23 #-----------------------------------------------
24 DATA =np.loadtxt(DLIST[idata][0], delimiter=DLIST[idata][1]);
25 clabel =int(DLIST[idata][2])
26
32 l0=int(min(labelset));
33 if l0: DATA[:,clabel]-=l0 # label begins with 0
34
35 #-----------------------------------------------
36 # Machine Learning: Multi-Column Least-Squares
37 #-----------------------------------------------
38 ntrain = int(N*rtrain)
39 Acc = np.zeros([run,1])
40 print(' Multi-Column Least-Squares: (rtrain,run) =(%.2f,%d)' %(rtrain,run))
41
42 btime = time.time()
43 for i in range(run):
54 # Multi-Column Least-Squares:
55 #-----------------------------------------------
56 A,b = set_MC_LS(Xtrain,ytrain)
57 sol = ls_solve(A,b,0)
58 if i==0 and DLIST[idata][0]=='synthetic.data': param = sol
59
60 # Prediction
61 #-----------------------------------------------
62 A1,b1 = set_MC_LS(Xtest,ytest)
63 predicted = prediction(A1,sol)
64 Acc[i] = 1-count_diff(predicted,ytest)/len(ytest)
65
66 etime = time.time()-btime
67 print(' Accuracy.(mean,std) = (%.2f,%.2f)%%'\
68 %(np.mean(Acc)*100,np.std(Acc)*100))
69 print(' Average Total Etime = %.5f' %(etime/run))
70
71 #-----------------------------------------------
72 # Figuring
73 #-----------------------------------------------
74 if DLIST[idata][0]=='synthetic.data':
75 if refigure:
76 plt.figure()
77 DATA = np.loadtxt(DAT_FILENAME,delimiter=',')
78 for i in range(nclass):
79 D1 = DATA[(i*N_D1):((i+1)*N_D1),:]
80 plt.scatter(D1[:,0],D1[:,1],s=15,c=COLOR[i],marker=MARKER[i])
81
• For wine.data, the best known algorithms predict with an accuracy of about 95%, while the MC-LS predicts with an accuracy of about 98.5%.
• For seeds_dataset.txt, the best known algorithms predict with an accuracy of about 92%, while the new algorithm achieves about 97% accuracy.
11.2. Now, modify synthetic_data.py to produce new synthetic datasets having 3-4 classes.
Set rtrain = 0.7.
• Generate a synthetic dataset of three classes where the centers of classes are in
a straight line.
• Generate a synthetic dataset of four classes, with class centers not on a straight
line.
(a) Modify Multi_Column_LS.py, if necessary, to process the new datasets and pro-
duce figures as in Figure 11.15, p. 235.
(b) How about accuracy? Is the MC-LS similarly good for the new datasets?
12
CHAPTER
Scikit-Learn: A Popular Machine Learning Library
Contents of Chapter 12
12.1.Scikit-Learn Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
12.2.Scikit-Learn – Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
12.3.Scikit-Learn – Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Exercises for Chapter 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
In practice:
• Each algorithm has its own quirks/characteristics and is based on cer-
tain assumptions.
• It is always recommended that you compare the performance of at
least a handful of different learning algorithms to select the best model
for the particular problem.
• No Free Lunch Theorem: No single classifier works best across all
possible scenarios.
Why Scikit-Learn?
• Nice documentation and usability
• Covers most machine-learning tasks
• Scikit-learn scales to most data problems
⇒ Easy-to-use, convenient, and powerful enough
An Example Code
iris_sklearn.py
1 #------------------------------------------------------
2 # Load Data
3 #------------------------------------------------------
4 from sklearn import datasets
5 # dir(datasets); load_iris, load_digits, load_breast_cancer, load_wine, ...
6
7 iris = datasets.load_iris()
8
9 feature_names = iris.feature_names
10 target_names = iris.target_names
11 print("## feature names:", feature_names)
12 print("## target names :", target_names)
13 print("## set(iris.target):", set(iris.target))
14
15 #------------------------------------------------------
16 # Create "model instances"
17 #------------------------------------------------------
18 from sklearn.linear_model import LogisticRegression
19 from sklearn.neighbors import KNeighborsClassifier
20 LR = LogisticRegression(max_iter = 1000)
21 KNN = KNeighborsClassifier(n_neighbors = 3)
22
23 #------------------------------------------------------
24 # Split, train, and fit
25 #------------------------------------------------------
26 import numpy as np
27 from sklearn.model_selection import train_test_split
28
29 X = iris.data; y = iris.target
30 iter = 20; Acc = np.zeros([iter,2])
31
32 for i in range(iter):
33 X_train, X_test, y_train, y_test = train_test_split(
34 X, y, test_size=0.3, random_state=i, stratify=y)
35 LR.fit(X_train, y_train); Acc[i,0] = LR.score(X_test, y_test)
36 KNN.fit(X_train, y_train); Acc[i,1] = KNN.score(X_test, y_test)
37
38 acc_mean = np.mean(Acc,axis=0)
39 acc_std = np.std(Acc,axis=0)
40 print('## iris.Accuracy.LR : %.4f +- %.4f' %(acc_mean[0],acc_std[0]))
41 print('## iris.Accuracy.KNN: %.4f +- %.4f' %(acc_mean[1],acc_std[1]))
42
43 #------------------------------------------------------
44 # New Sample
45 #------------------------------------------------------
46 sample = [[5, 3, 2, 4],[4, 3, 3, 6]];
47 print('## New sample =',sample)
48 predL = LR.predict(sample); predK = KNN.predict(sample)
49 print(" ## sample.LR.predict :",target_names[predL])
50 print(" ## sample.KNN.predict:",target_names[predK])
Output
1 ## feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
2 'petal width (cm)']
3 ## target names : ['setosa' 'versicolor' 'virginica']
4 ## set(iris.target): {0, 1, 2}
5 ## iris.Accuracy.LR : 0.9667 +- 0.0294
6 ## iris.Accuracy.KNN: 0.9678 +- 0.0248
7 ## New sample = [[5, 3, 2, 4], [4, 3, 3, 6]]
8 ## sample.LR.predict : ['setosa' 'virginica']
9 ## sample.KNN.predict: ['versicolor' 'virginica']
3 iris = sbn.load_dataset('iris')
4 print(iris.head())
5 sbn.pairplot(iris, hue='species',height=3)
6 plt.savefig('seaborn-pairplot-iris.png',bbox_inches='tight')
7 #plt.show()
Output
1 sepal_length sepal_width petal_length petal_width species
2 0 5.1 3.5 1.4 0.2 setosa
3 1 4.9 3.0 1.4 0.2 setosa
4 2 4.7 3.2 1.3 0.2 setosa
5 3 4.6 3.1 1.5 0.2 setosa
6 4 5.0 3.6 1.4 0.2 setosa
5 #------------------------------------------------------
6 # Load Data
7 #------------------------------------------------------
8 iris = datasets.load_iris()
9 nclass = len(set(iris.target))
10 print('## iris: nclass =', nclass)
11
12 #------------------------------------------------------
13 # Use pandas, for data analysis
14 #------------------------------------------------------
15 data = pd.DataFrame(iris.data)
16 target = pd.DataFrame(iris.target)
17
26 #------------------------------------------------------
27 print('## Visualization: use seaborn + matplotlib.pyplot ##')
28 #------------------------------------------------------
29 sbn.heatmap(data.corr(), annot = True, cmap='Greys');
30 plt.title('iris.data.corr()');
31 plt.savefig('iris_data_corr.png',bbox_inches='tight')
32 #plt.show()
33 #------------------------------------------------------
34 print('## df = pd.concat([data, target], axis = 1) ##')
35 #------------------------------------------------------
36 df = pd.concat([data, target], axis = 1)
37 print('## df.head(3):\n', df.head(3))
38
39 #------------------------------------------------------
40 print('## Check for Missing Values')
41 #------------------------------------------------------
42 print('## df.isnull().sum():\n',df.isnull().sum())
43 print('## df.describe():\n', df.describe())
44
45 #------------------------------------------------------
46 print("## Data Sepatation: C0 = df.loc[df['target']==0]")
47 #------------------------------------------------------
48 C0 = df.loc[df['target']==0]
49 print('## C0.describe():\n', C0.describe())
50 print('## C0.count()[0] , C0.mean()[0] =',C0.count()[0],',',C0.mean()[0])
51
52 y0 = C0.pop('target')
53 plt.figure() # new figure
54 sbn.heatmap(C0.corr(), annot = True, cmap='Greys');
55 plt.title('iris.C0.corr()');
56 plt.savefig('iris_C0_corr.png',bbox_inches='tight')
Output
1 ## iris: nclass = 3
2 ## data.head(3):
3 0 1 2 3
4 0 5.1 3.5 1.4 0.2
5 1 4.9 3.0 1.4 0.2
6 2 4.7 3.2 1.3 0.2
7 ## Re-assign data.columns and target[0] ##
8 ## data.head(3):
9 sepal_length sepal_width petal_length petal_width
10 0 5.1 3.5 1.4 0.2
11 1 4.9 3.0 1.4 0.2
12 2 4.7 3.2 1.3 0.2
13 ## target.head(3):
14 target
15 0 0
16 1 0
17 2 0
18 ## Visualization: use seaborn + matplotlib.pyplot ##
19 ## df = pd.concat([data, target], axis = 1) ##
20 ## df.head(3):
21 sepal_length sepal_width petal_length petal_width target
Exercises for Chapter 12
12.1.
P
APPENDIX
Projects
Contents of Chapter P
P.1. Edge Detection, using Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
P.2. Number Plate Detection, using Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
• with pre-smoothing
• without pre-smoothing
[3] M. Fischler and R. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, 24 (1981), pp. 381-395.
[4] B. Grosser and B. Lang, An O(n²) algorithm for the bidiagonal SVD, Lin. Alg. Appl., 358 (2003), pp. 45-70.
[7] M. Nielsen, Neural networks and deep learning. (The online book can be found at http://neuralnetworksanddeeplearning.com), 2013.
[8] F. Rosenblatt, The Perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, (1958), pp. 65-386.
[9] P. H. Torr and A. Zisserman, MLESAC: A new robust estimator with application to estimating image geometry, Computer Vision and Image Understanding, 78 (2000), pp. 138-156.
[10] P. R. Willems, B. Lang, and C. Vömel, Computing the bidiagonal SVD using multiple relatively robust representations, SIAM Journal on Matrix Analysis and Applications, 28 (2006), pp. 907-926.
Index
:, Python slicing, 141 class, 149, 150
:, in Matlab, 14 class attribute, 153
_ _init_ _() constructor, 151 Classes.py, 154
classification problem, 124
activation function, 212 clustering, 210
activation function, why?, 212 CNN, 227
activation functions, popular, 214 code block, 140
Adaline, 213, 214 coding, iii, 2
adaptive step size, 166 coding vs. programming, 5
algorithmic design, 4 coefficient matrix, 91
algorithmic parameter, 215 coefficients, 76
anonymous function, 25 cofactor, 101
anonymous_function.m, 25 cofactor expansion, 101
approximation, 114 common logarithm, 48
area_closed_curve.m, 33, 51 complex number system, 35
artificial neurons, 211 computer programming, iii, 2, 8
attributes, 151 consistent system, 91
augmented matrix, 91 constraint set, 161
average slope, 86 continue, 24
average speed, 54 contour, 36, 87
contour, in Matlab, 18
backbone of programming, 8 convergence of Newton’s method, 73
backtracking line search, 166 converges absolutely, 66
basis function, 64 correction term, 71
binary classifier, 211, 212 cost function, 212
break, 23 covariance, 177
covariance matrix, 176–178, 180, 191
call_get_cubes.py, 145 Covariance.py, 178
cancellation equations, 40 critical point, 164
chain rule, 62 csvwrite, 32
change of basis, 176 curse of dimensionality, 208
change of variables, 64, 122 cython, 138
change-of-base formula, 50
characteristic equation, 106 daspect, 32
characteristic polynomial, 106 data matrix, 181
charpoly, 107 data preparation, 241
child class, 154 data preprocessing, 241
circle.m, 32 data visualization, 241