Nitin Sir Notes

NITIN SIR NOTES

Uploaded by

Tik4Tech

Module 2.1: Biological Neurons

• The most fundamental unit of a deep neural network is called an artificial neuron
• Why is it called a neuron? Where does the inspiration come from?
• The inspiration comes from biology (more specifically, from the brain)
• biological neurons = neural cells = neural processing units
• We will first see what a biological neuron looks like ...

[Figure: Artificial Neuron with inputs x1, x2, x3, weights w1, w2, w3, activation σ and output y]

• dendrite: receives signals from other neurons
• synapse: point of connection to other neurons
• soma: processes the information
• axon: transmits the output of this neuron

[Figure: Biological Neurons∗]

∗ Image adapted from https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg

• Let us see a very cartoonish illustration of how a neuron works
• Our sense organs interact with the outside world
• They relay information to the neurons
• The neurons (may) get activated and produce a response (laughter in this case)

• Of course, in reality, it is not just a single neuron which does all this
• There is a massively parallel interconnected network of neurons
• The sense organs relay information to the lowest layer of neurons
• Some of these neurons may fire (in red) in response to this information and in turn relay information to other neurons they are connected to
• These neurons may also fire (again, in red) and the process continues, eventually resulting in a response (laughter in this case)
• An average human brain has around $10^{11}$ (100 billion) neurons!

• This massively parallel network also ensures that there is division of work
• Each neuron may perform a certain role or respond to a certain stimulus

[Figure: A simplified illustration]
• The neurons in the brain are arranged in a hierarchy
• We illustrate this with the help of the visual cortex (part of the brain) which deals with processing visual information
• Starting from the retina, the information is relayed to several layers (follow the arrows)
• We observe that the layers V1, V2 to AIT form a hierarchy (from identifying simple visual forms to high level objects)

[Figure: Sample illustration of hierarchical processing∗]

∗ Idea borrowed from Hugo Larochelle’s lecture slides
Disclaimer
• I understand very little about how the brain works!
• What you saw so far is an overly simplified explanation of how the brain works!
• But this explanation suffices for the purpose of this course!

Module 2.2: McCulloch Pitts Neuron

• McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943)
• g aggregates the inputs and the function f takes a decision based on this aggregation
• The inputs can be excitatory or inhibitory
• y = 0 if any inhibitory input $x_i$ is 1, else

$$g(x_1, x_2, \dots, x_n) = g(x) = \sum_{i=1}^{n} x_i$$

$$y = f(g(x)) = \begin{cases} 1 & \text{if } g(x) \geq \theta \\ 0 & \text{if } g(x) < \theta \end{cases}$$

• θ is called the thresholding parameter
• This is called Thresholding Logic

[Figure: McCulloch Pitts neuron with inputs x1, x2, ..., xn ∈ {0, 1}, aggregation g, decision f and output y ∈ {0, 1}]
Let us implement some boolean functions using this McCulloch Pitts (MP) neuron ...

[Figure: six McCulloch Pitts units, each with output y ∈ {0, 1}]

• A McCulloch Pitts unit: inputs x1, x2, x3, threshold θ
• AND function: inputs x1, x2, x3, θ = 3
• OR function: inputs x1, x2, x3, θ = 1
• x1 AND !x2∗: inputs x1, x2 (x2 inhibitory), θ = 1
• NOR function: inputs x1, x2 (both inhibitory), θ = 0
• NOT function: input x1 (inhibitory), θ = 0

∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0
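The units listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `mp_neuron` and the choice to pass inhibitory inputs as a separate argument are assumptions made here, not part of the slides.

```python
def mp_neuron(excitatory, theta, inhibitory=()):
    """McCulloch Pitts unit with binary inputs and a hand-set threshold theta."""
    # Any active inhibitory input forces the output to 0.
    if any(inhibitory):
        return 0
    # g sums the excitatory inputs; f compares the sum with theta.
    return 1 if sum(excitatory) >= theta else 0


def AND3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], theta=3)


def OR3(x1, x2, x3):
    return mp_neuron([x1, x2, x3], theta=1)


def x1_and_not_x2(x1, x2):
    return mp_neuron([x1], theta=1, inhibitory=[x2])


def NOR(x1, x2):
    return mp_neuron([], theta=0, inhibitory=[x1, x2])


def NOT(x1):
    return mp_neuron([], theta=0, inhibitory=[x1])


print(AND3(1, 1, 1), OR3(0, 0, 0), x1_and_not_x2(1, 0), NOR(0, 0), NOT(1))
```

Note that NOR and NOT have no excitatory inputs, so the empty sum 0 meets the threshold θ = 0 unless an inhibitory input switches the unit off.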
• Can any boolean function be represented using a McCulloch Pitts unit ?
• Before answering this question let us first see the geometric interpretation of a MP unit
...

• A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves
• Points lying on or above the line $\sum_{i=1}^{n} x_i - \theta = 0$ and points lying below this line
• In other words, all inputs which produce an output 0 will be on one side ($\sum_{i=1}^{n} x_i < \theta$) of the line and all inputs which produce an output 1 will lie on the other side ($\sum_{i=1}^{n} x_i \geq \theta$) of this line
• Let us convince ourselves about this with a few more examples (if it is not already clear from the math)

[Figure: OR function with θ = 1, i.e. $x_1 + x_2 = \sum_{i=1}^{2} x_i \geq 1$; the line $x_1 + x_2 = \theta = 1$ drawn over the points (0, 0), (1, 0), (0, 1), (1, 1) in the (x1, x2) plane]
[Figure: AND function with θ = 2, i.e. $x_1 + x_2 = \sum_{i=1}^{2} x_i \geq 2$; the line $x_1 + x_2 = \theta = 2$ drawn over the points (0, 0), (1, 0), (0, 1), (1, 1)]

[Figure: Tautology (always ON) with θ = 0; the line $x_1 + x_2 = \theta = 0$ drawn over the same four points]
• What if we have more than 2 inputs?
• Well, instead of a line we will have a plane
• For the OR function, we want a plane such that the point (0,0,0) lies on one side and the remaining 7 points lie on the other side of the plane

[Figure: OR function of x1, x2, x3 with θ = 1; the plane $x_1 + x_2 + x_3 = \theta = 1$ separating (0, 0, 0) from the other 7 corners of the unit cube]
The story so far ...
• A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable
• Linear separability (for boolean functions): There exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane)

Module 2.3: Perceptron

The story ahead ...
• What about non-boolean (say, real) inputs ?
• Do we always need to hand code the threshold ?
• Are all inputs equal ? What if we want to assign more weight (importance) to some
inputs ?
• What about functions which are not linearly separable ?

• Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
• A more general computational model than McCulloch-Pitts neurons
• Main differences: Introduction of numerical weights for inputs and a mechanism for learning these weights
• Inputs are no longer limited to boolean values
• Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here

[Figure: Perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn and output y]
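$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \geq \theta \\ 0 & \text{if } \sum_{i=1}^{n} w_i x_i < \theta \end{cases}$$

Rewriting the above,

$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i - \theta \geq 0 \\ 0 & \text{if } \sum_{i=1}^{n} w_i x_i - \theta < 0 \end{cases}$$

A more accepted convention,

$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i \geq 0 \\ 0 & \text{if } \sum_{i=0}^{n} w_i x_i < 0 \end{cases}$$

where, $x_0 = 1$ and $w_0 = -\theta$

[Figure: Perceptron with inputs x0 = 1, x1, x2, ..., xn, weights w0 = −θ, w1, w2, ..., wn and output y]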
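A small check that the threshold form and the bias form (x0 = 1, w0 = −θ) give the same decision. This is a sketch under stated assumptions: the function names and the example weights, threshold and inputs are hypothetical values picked for illustration.

```python
def perceptron_threshold(w, x, theta):
    """y = 1 if sum_{i=1..n} w_i * x_i >= theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def perceptron_bias(w_aug, x):
    """Same rule with w_aug = [w0, w1, ..., wn], w0 = -theta, and x0 = 1 prepended."""
    x_aug = [1.0] + list(x)
    return 1 if sum(wi * xi for wi, xi in zip(w_aug, x_aug)) >= 0 else 0


# Hypothetical example values, not taken from the slides.
w, theta = [0.6, -0.4, 1.0], 0.5
inputs = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1), (0.2, 0.9, 0.7)]

same = all(
    perceptron_threshold(w, x, theta) == perceptron_bias([-theta] + w, x)
    for x in inputs
)
print(same)
```

Folding θ into the weight vector means a learning rule only has to update one vector, which is why the i = 0 convention is the one used from here on.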
We will now try to answer the following questions:

• Why are we trying to implement boolean functions?

• Why do we need weights ?
• Why is w0 = −θ called the bias ?

• w0 is called the bias as it represents the prior (prejudice)
• A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, director [θ = 0]
• On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3]
• The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case)

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan

[Figure: Perceptron with inputs x0 = 1, x1, x2, x3, weights w0 = −θ, w1, w2, w3 and output y]
What kind of functions can be implemented using the perceptron? Any difference from
McCulloch Pitts neurons?

McCulloch Pitts Neuron (assuming no inhibitory inputs)

$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} x_i \geq 0 \\ 0 & \text{if } \sum_{i=0}^{n} x_i < 0 \end{cases}$$

Perceptron

$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i \geq 0 \\ 0 & \text{if } \sum_{i=0}^{n} w_i x_i < 0 \end{cases}$$

• From the equations it should be clear that even a perceptron separates the input space into two halves
• All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side
• In other words, a single perceptron can only be used to implement linearly separable functions
• Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued
• We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights)
x1  x2  OR  condition
0   0   0   $w_0 + \sum_{i=1}^{2} w_i x_i < 0$
1   0   1   $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
0   1   1   $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
1   1   1   $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$

w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 ≥ 0 =⇒ w1 + w2 ≥ −w0

• One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible)
• Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (Try it!)

[Figure: the line −1 + 1.1x1 + 1.1x2 = 0 separating (0, 0) from (1, 0), (0, 1), (1, 1) in the (x1, x2) plane]
Module 2.4: Errors and Error Surfaces

• Let us fix the threshold (−w0 = 1) and try different values of w1, w2
• Say, w1 = −1, w2 = −1
• What is wrong with this line? We make an error on 3 out of the 4 inputs
• Let's try some more values of w1, w2 and note how many errors we make

w1     w2     errors
-1     -1     3
1.5    0      1
0.45   0.45   3

• We are interested in those values of w0, w1, w2 which result in 0 error
• Let us plot the error surface corresponding to different values of w0, w1, w2

[Figure: the lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0 and −1 + (0.45)x1 + (0.45)x2 = 0 drawn over the points (0, 0), (1, 0), (0, 1), (1, 1)]

• For ease of analysis, we will keep w0 fixed (-1) and plot the error for different values of w1, w2
• For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make
• For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0 or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0
• We are interested in finding an algorithm which finds the values of w1, w2 which minimize this error
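The error counts for the OR function can be reproduced with a short sketch. The function names and the grid step of 0.5 are assumptions made for illustration; w0 is held at −1 as in the text.

```python
from itertools import product


def predict(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0


def count_errors(w1, w2, w0=-1.0):
    """Number of the 4 boolean inputs on which the perceptron disagrees with OR."""
    return sum(
        predict(w0, w1, w2, x1, x2) != (x1 | x2)
        for x1, x2 in product([0, 1], repeat=2)
    )


# The candidate weights discussed in the text, plus the known solution (1.1, 1.1).
for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45), (1.1, 1.1)]:
    print(w1, w2, count_errors(w1, w2))

# A coarse error surface over w1, w2 in [-2, 4] (step 0.5 is an arbitrary choice).
grid = [i * 0.5 for i in range(-4, 9)]
surface = {(w1, w2): count_errors(w1, w2) for w1 in grid for w2 in grid}
print(min(surface.values()))
```

Plotting `surface` as a heat map or 3D surface gives the error surface described above; the zero-error region is where both w1 and w2 are at least 1.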
Module 2.5: Perceptron Learning Algorithm

• We will now see a more principled approach for learning these weights and threshold
but before that let us answer this question...
• Apart from implementing boolean functions (which does not look very interesting)
what can a perceptron be used for ?
• Our interest lies in the use of perceptron as a binary classifier. Let us see what this
means...

• Let us reconsider our problem of deciding whether to watch a movie or not
• Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: binary decision
• Further, suppose we represent each movie with n features (some boolean, some real valued)
• We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision
• In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, .., wn)

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan
x4 = imdbRating (scaled to 0 to 1)
... ...
xn = criticsRating (scaled to 0 to 1)

[Figure: Perceptron with inputs x0 = 1, x1, x2, ..., xn, weights w0 = −θ, w1, w2, ..., wn and output y]
Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and $\sum_{i=0}^{n} w_i x_i < 0$ then
        w = w + x;
    end
    if x ∈ N and $\sum_{i=0}^{n} w_i x_i \geq 0$ then
        w = w − x;
    end
end
// the algorithm converges when all the inputs are classified correctly

• Why would this work?
• To understand why this works we will have to get into a bit of Linear Algebra and a bit of geometry...
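A minimal Python sketch of the algorithm above, run on the OR function. The function names, the fixed random seed and the `max_steps` safety cap are assumptions added here; the slide's loop has no cap because it assumes linearly separable data.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(P, N, max_steps=10_000, seed=0):
    """P, N: inputs with label 1 and 0; each input already has x0 = 1 prepended."""
    rng = random.Random(seed)
    data = [(x, 1) for x in P] + [(x, 0) for x in N]
    w = [rng.uniform(-1, 1) for _ in data[0][0]]   # initialize w randomly

    def converged():
        # Converged when every input is on the correct side of w.x = 0.
        return all((dot(w, x) >= 0) == (label == 1) for x, label in data)

    steps = 0
    while not converged() and steps < max_steps:   # max_steps is not in the slide
        x, label = rng.choice(data)                # pick random x from P ∪ N
        if label == 1 and dot(w, x) < 0:
            w = [wi + xi for wi, xi in zip(w, x)]  # w = w + x
        if label == 0 and dot(w, x) >= 0:
            w = [wi - xi for wi, xi in zip(w, x)]  # w = w - x
        steps += 1
    return w, converged()


# OR function, inputs written as [x0 = 1, x1, x2].
P = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]
N = [[1, 0, 0]]
w, ok = train_perceptron(P, N)
print(w, ok)
```

The learned `w` depends on the seed, but for linearly separable data such as OR any run should end with all four inputs classified correctly.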
• Consider two vectors w and x

$$w = [w_0, w_1, w_2, \dots, w_n]$$
$$x = [1, x_1, x_2, \dots, x_n]$$
$$w \cdot x = w^T x = \sum_{i=0}^{n} w_i x_i$$

• We can thus rewrite the perceptron rule as

$$y = \begin{cases} 1 & \text{if } w^T x \geq 0 \\ 0 & \text{if } w^T x < 0 \end{cases}$$

• We are interested in finding the line $w^T x = 0$ which divides the input space into two halves
• Every point (x) on this line satisfies the equation $w^T x = 0$
• What can you tell about the angle (α) between w and any point (x) which lies on this line?
• The angle is 90° (∵ $\cos\alpha = \frac{w^T x}{\|w\|\|x\|} = 0$)
• Since the vector w is perpendicular to every point on the line it is actually perpendicular to the line itself
• Consider some points (vectors) which lie in the positive half space of this line (i.e., $w^T x \geq 0$)
• What will be the angle between any such vector and w? Obviously, less than 90°
• What about points (vectors) which lie in the negative half space of this line (i.e., $w^T x < 0$)
• What will be the angle between any such vector and w? Obviously, greater than 90°
• Of course, this also follows from the formula ($\cos\alpha = \frac{w^T x}{\|w\|\|x\|}$)
• Keeping this picture in mind let us revisit the algorithm

[Figure: the line $w^T x = 0$ in the (x1, x2) plane with its normal vector w; positive points p1, p2, p3 lie on the side of w and negative points n1, n2, n3 on the other side]
Recall that $\cos\alpha = \frac{w^T x}{\|w\|\|x\|}$.

• For x ∈ P if w.x < 0 then it means that the angle (α) between this x and the current w is greater than 90° (but we want α to be less than 90°)
• What happens to the new angle ($\alpha_{new}$) when $w_{new} = w + x$

$$\begin{aligned}
\cos(\alpha_{new}) &\propto w_{new}^T x \\
&\propto (w + x)^T x \\
&\propto w^T x + x^T x \\
&\propto \cos\alpha + x^T x
\end{aligned}$$

$$\cos(\alpha_{new}) > \cos\alpha$$

• Thus $\alpha_{new}$ will be less than α and this is exactly what we want

• For x ∈ N if w.x ≥ 0 then it means that the angle (α) between this x and the current w is less than 90° (but we want α to be greater than 90°)
• What happens to the new angle ($\alpha_{new}$) when $w_{new} = w - x$

$$\begin{aligned}
\cos(\alpha_{new}) &\propto w_{new}^T x \\
&\propto (w - x)^T x \\
&\propto w^T x - x^T x \\
&\propto \cos\alpha - x^T x
\end{aligned}$$

$$\cos(\alpha_{new}) < \cos\alpha$$

• Thus $\alpha_{new}$ will be greater than α and this is exactly what we want
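The effect of one update on the angle can be checked numerically. This is a sketch with hypothetical vectors chosen so that each case triggers its update; `cos_angle` is a helper name introduced here.

```python
import math


def cos_angle(w, x):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    return dot / (norm_w * norm_x)


x = [1.0, 1.0, 0.0]

# Case 1: x is in P but w.x < 0, so the update is w + x.
w = [1.0, -2.0, 0.5]
before_p = cos_angle(w, x)
after_p = cos_angle([wi + xi for wi, xi in zip(w, x)], x)

# Case 2: x is in N but w.x >= 0, so the update is w - x.
w = [1.0, 2.0, 0.5]
before_n = cos_angle(w, x)
after_n = cos_angle([wi - xi for wi, xi in zip(w, x)], x)

print(before_p, after_p)   # cosine increases, so the angle shrinks
print(before_n, after_n)   # cosine decreases, so the angle grows
```

In the second case the cosine drops but stays positive, so a single update moves w in the right direction without necessarily fixing the point; the algorithm may need to revisit it.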
• We will now see this algorithm in action for a toy dataset

• We initialized w to a random value
• We observe that currently, w · x < 0 (∵ angle > 90°) for all the positive points and w · x ≥ 0 (∵ angle < 90°) for all the negative points (the situation is exactly the opposite of what we actually want it to be)
• We now run the algorithm by randomly going over the points
• The algorithm has converged

[Figure: positive points p1, p2, p3 and negative points n1, n2, n3 in the (x1, x2) plane]
Module 2.6: Proof of Convergence

• Now that we have some faith and intuition about why the algorithm works, we will see
a more formal proof of convergence ...

Theorem
Definition: Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers $w_0, w_1, \dots, w_n$ exist such that every point $(x_1, x_2, \dots, x_n) \in P$ satisfies $\sum_{i=1}^{n} w_i x_i > w_0$ and every point $(x_1, x_2, \dots, x_n) \in N$ satisfies $\sum_{i=1}^{n} w_i x_i < w_0$.

Proposition: If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector $w_t$ a finite number of times. In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector $w_t$ is found after a finite number of steps t which can separate the two sets.

Proof: Given in the setup and derivation that follow
Setup:
• If x ∈ N then -x ∈ P (∵ $w^T x < 0 \implies w^T(-x) \geq 0$)
• We can thus consider a single set $P' = P \cup N^-$ and for every element $p \in P'$ ensure that $w^T p \geq 0$
• Further we will normalize all the p's so that ||p|| = 1 (notice that this does not affect the solution ∵ if $w^T \frac{p}{\|p\|} \geq 0$ then $w^T p \geq 0$)
• Let w∗ be the normalized solution vector (we know one exists as the data is linearly separable)

Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
N⁻ contains negations of all points in N;
P′ ← P ∪ N⁻;
Initialize w randomly;
while !convergence do
    Pick random p ∈ P′;
    p ← p / ||p|| (so now, ||p|| = 1);
    if w.p < 0 then
        w = w + p;
    end
end
// the algorithm converges when all the inputs are classified correctly
// notice that we do not need the other if condition because by construction we want all points in P′ to lie in the positive half space (w.p ≥ 0)
Observations:
• w∗ is some optimal solution which exists but we don’t know what it is
• We do not make a correction at every time-step
• We make a correction only if $w^T \cdot p_i \leq 0$ at that time step
• So at time-step t we would have made only k (≤ t) corrections
• Every time we make a correction a quantity δ gets added to the numerator
• So by time-step t, a quantity kδ gets added to the numerator

Proof:
• Now suppose at time step t we inspected the point $p_i$ and found that $w^T \cdot p_i \leq 0$
• We make a correction $w_{t+1} = w_t + p_i$
• Let β be the angle between w∗ and $w_{t+1}$

$$\cos\beta = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|}$$

$$\begin{aligned}
\text{Numerator} &= w^* \cdot w_{t+1} = w^* \cdot (w_t + p_i) \\
&= w^* \cdot w_t + w^* \cdot p_i \\
&\geq w^* \cdot w_t + \delta \quad (\delta = \min\{w^* \cdot p_i \mid \forall i\}) \\
&\geq w^* \cdot (w_{t-1} + p_j) + \delta \\
&\geq w^* \cdot w_{t-1} + w^* \cdot p_j + \delta \\
&\geq w^* \cdot w_{t-1} + 2\delta \\
&\geq w^* \cdot w_0 + k\delta \quad \text{(by induction)}
\end{aligned}$$
Proof (continued):

So far we have, $w^T \cdot p_i \leq 0$ (and hence we made the correction)

$$\cos\beta = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|} \quad \text{(by definition)}$$

$$\text{Numerator} \geq w^* \cdot w_0 + k\delta \quad \text{(proved by induction)}$$

$$\begin{aligned}
\text{Denominator}^2 &= \|w_{t+1}\|^2 \\
&= (w_t + p_i) \cdot (w_t + p_i) \\
&= \|w_t\|^2 + 2 w_t \cdot p_i + \|p_i\|^2 \\
&\leq \|w_t\|^2 + \|p_i\|^2 \quad (\because w_t \cdot p_i \leq 0) \\
&\leq \|w_t\|^2 + 1 \quad (\because \|p_i\|^2 = 1) \\
&\leq (\|w_{t-1}\|^2 + 1) + 1 \\
&\leq \|w_{t-1}\|^2 + 2 \\
&\leq \|w_0\|^2 + k \quad \text{(by the same observation that we made about } \delta \text{)}
\end{aligned}$$

Combining the two bounds,

$$\cos\beta \geq \frac{w^* \cdot w_0 + k\delta}{\sqrt{\|w_0\|^2 + k}}$$

• cosβ thus grows proportional to $\sqrt{k}$
• As k (number of corrections) increases cosβ can become arbitrarily large
• But since cosβ ≤ 1, k must be bounded by a maximum number
• Thus, there can only be a finite number of corrections (k) to w and the algorithm will converge!
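The bound can be made concrete in a special case. If we start from $w_0 = 0$, the inequality above reads $\cos\beta \geq \sqrt{k}\,\delta$, and $\cos\beta \leq 1$ then gives $k \leq 1/\delta^2$. The sketch below checks this on the OR function; starting from zero instead of a random w, the particular `w_star`, and the helper names are assumptions made for illustration.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# P' for OR: the three positive points, plus the negated negative point [1, 0, 0].
P_prime = [normalize(p) for p in ([1, 1, 0], [1, 0, 1], [1, 1, 1], [-1, 0, 0])]

# One known separating vector, normalized (an assumption of this sketch).
w_star = normalize([-1, 2, 2])
delta = min(dot(w_star, p) for p in P_prime)   # delta = min_i w* . p_i

rng = random.Random(0)
w = [0.0, 0.0, 0.0]    # w_0 = 0, so the bound simplifies to k <= 1 / delta^2
k = 0                  # number of corrections
while any(dot(w, p) <= 0 for p in P_prime):
    p = rng.choice(P_prime)
    if dot(w, p) <= 0:
        w = [wi + pi for wi, pi in zip(w, p)]
        k += 1

bound = 1 / delta ** 2
print(k, bound)
```

The bound depends only on δ, not on the order in which points are visited, so a different seed changes k but not the ceiling it has to stay under.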
Coming back to our questions ...
• What about non-boolean (say, real) inputs? Real valued inputs are allowed in a perceptron
• Do we always need to hand code the threshold? No, we can learn the threshold
• Are all inputs equal? What if we want to assign more weight (importance) to some inputs? A perceptron allows weights to be assigned to inputs
• What about functions which are not linearly separable? Not possible with a single perceptron but we will see how to handle this ..

Module 2.7: Linearly Separable Boolean Functions

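x1  x2  XOR  condition
0   0   0    $w_0 + \sum_{i=1}^{2} w_i x_i < 0$
1   0   1    $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
0   1   1    $w_0 + \sum_{i=1}^{2} w_i x_i \geq 0$
1   1   0    $w_0 + \sum_{i=1}^{2} w_i x_i < 0$

w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0

• The fourth condition contradicts conditions 2 and 3: adding them gives w1 + w2 ≥ −2w0, and since w0 < 0 (condition 1) we have −2w0 > −w0, so w1 + w2 cannot be less than −w0
• Hence we cannot have a solution to this set of inequalities
• And indeed you can see that it is impossible to draw a line which separates the red points from the blue points

[Figure: the points (0, 0), (1, 0), (0, 1), (1, 1) in the (x1, x2) plane, coloured red or blue by their XOR output]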
• Most real world data is not linearly separable and will always contain some outliers
• In fact, sometimes there may not be any outliers but still the data may not be linearly separable
• We need computational units (models) which can deal with such data
• While a single perceptron cannot deal with such data, we will show that a network of perceptrons can indeed deal with such data

[Figure: points of class + clustered in the centre, surrounded on all sides by points of class o]

• Before seeing how a network of perceptrons can deal with linearly inseparable data, we will discuss boolean functions in some more detail ...

• How many boolean functions can you design from 2 inputs ?
• Let us begin with some easy ones which you already know ..

x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16

0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

• Of these, how many are linearly separable? (turns out all except XOR and !XOR - feel free to verify)
• In general, how many boolean functions can you have for n inputs? $2^{2^n}$
• How many of these $2^{2^n}$ functions are not linearly separable? For the time being, it suffices to know that at least some of these may not be linearly separable (I encourage you to figure out the exact answer :-) )
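The claim for 2 inputs can be verified by brute force. The sketch below enumerates all 16 truth tables and searches a small grid of integer weights for each; the grid {−2, ..., 2} is an assumption that happens to be enough for n = 2, and the function name is introduced here.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]   # same row order as the table above
GRID = range(-2, 3)                          # candidate values for w0, w1, w2


def is_linearly_separable(outputs):
    """True if some rule w0 + w1*x1 + w2*x2 >= 0 with weights in GRID gives these outputs."""
    return any(
        all(
            (w0 + w1 * x1 + w2 * x2 >= 0) == bool(y)
            for (x1, x2), y in zip(INPUTS, outputs)
        )
        for w0, w1, w2 in product(GRID, repeat=3)
    )


functions = list(product([0, 1], repeat=4))  # all 2^(2^2) = 16 truth tables
not_separable = [f for f in functions if not is_linearly_separable(f)]
print(len(functions), not_separable)
```

Each tuple lists the outputs for inputs (0,0), (0,1), (1,0), (1,1), so the two tuples left over are XOR and !XOR. The same search becomes impractical quickly for larger n, since there are $2^{2^n}$ functions to test.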
Module 2.8: Representation Power of a Network of Perceptrons

• We will now see how to implement any boolean function using a network of perceptrons ...

• For this discussion, we will assume True = +1 and False = -1
• We consider 2 inputs and 4 perceptrons
• Each input is connected to all the 4 perceptrons with specific weights
• The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its input is ≥ 2)
• Each of these perceptrons is connected to an output perceptron by weights (which need to be learned)
• The output of this perceptron (y) is the output of this network

[Figure: network with inputs x1, x2, four hidden perceptrons each with bias = -2, and an output perceptron y connected through weights w1, w2, w3, w4; red edge indicates w = -1, blue edge indicates w = +1]
Terminology:
• This network contains 3 layers
• The layer containing the inputs (x1, x2) is called the input layer
• The middle layer containing the 4 perceptrons is called the hidden layer
• The final layer containing one output neuron is called the output layer
• The outputs of the 4 perceptrons in the hidden layer are denoted by h1, h2, h3, h4
• The red and blue edges are called layer 1 weights
• w1, w2, w3, w4 are called layer 2 weights
• We claim that this network can be used to implement any boolean function (linearly separable or not)!
• In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network
• Astonishing claim! Well, not really, if you understand what is going on
• Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input)
• Let us see why this network works by taking an example of the XOR function

[Figure: the same network, with the layer 1 weights of h1, h2, h3, h4 labelled (-1,-1), (-1,1), (1,-1), (1,1)]
• Let w0 be the bias of the output neuron (i.e., it will fire if $\sum_{i=1}^{4} w_i h_i \geq w_0$)

x1  x2  XOR  h1  h2  h3  h4  $\sum_{i=1}^{4} w_i h_i$
0   0   0    1   0   0   0   w1
0   1   1    0   1   0   0   w2
1   0   1    0   0   1   0   w3
1   1   0    0   0   0   1   w4

(In the table, 0 stands for False, which is fed to the network as −1.)

• This results in the following four conditions to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0
• Unlike before, there are no contradictions now and the system of inequalities can be satisfied
• Essentially each wi is now responsible for one of the 4 possible inputs and can be adjusted to get the desired output for that input
• It should be clear that the same network can be used to represent the remaining 15 boolean functions also
• Each boolean function will result in a different set of non-contradicting inequalities which can be satisfied by appropriately setting w1, w2, w3, w4
• Try it!
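A sketch of this 2-input network implementing XOR, with True = +1 and False = −1 as inputs. The layer 1 weights and the hidden bias of −2 come from the figure; the particular layer 2 weights and output threshold are one assumed choice satisfying w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0.

```python
from itertools import product

HIDDEN_WEIGHTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # layer 1 weights of h1..h4
HIDDEN_BIAS = -2


def step(z):
    return 1 if z >= 0 else 0


def network(x1, x2, w, w0):
    """x1, x2 in {-1, +1}; w = (w1, w2, w3, w4); output fires if sum_i w_i * h_i >= w0."""
    h = [step(a * x1 + b * x2 + HIDDEN_BIAS) for a, b in HIDDEN_WEIGHTS]
    y = step(sum(wi * hi for wi, hi in zip(w, h)) - w0)
    return y, h


# Assumed values: w1 = 0 < w0, w2 = 1 >= w0, w3 = 1 >= w0, w4 = 0 < w0, with w0 = 1.
w_xor, w0 = (0, 1, 1, 0), 1

results = {}
for x1, x2 in product([-1, 1], repeat=2):
    y, h = network(x1, x2, w_xor, w0)
    results[(x1, x2)] = (y, h)
    print((x1, x2), h, y)
```

Exactly one hidden unit fires per input, so the output sum reduces to a single wi; changing `w_xor` to another 0/1 pattern gives any of the other 15 functions.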
• What if we have more than 2 inputs ?

• Again each of the 8 perceptrons will fire only for one of the 8 inputs
• Each of the 8 weights in the second layer is responsible for one of the 8 inputs and can be adjusted to produce the desired output for that input

[Figure: network with inputs x1, x2, x3, eight hidden perceptrons each with bias = -3, and an output perceptron y connected through weights w1, w2, ..., w8]
• What if we have n inputs ?

Theorem
Any boolean function of n inputs can be represented exactly by a network of perceptrons containing 1 hidden layer with $2^n$ perceptrons and one output layer containing 1 perceptron

Proof (informal): We just saw how to construct such a network

Note: A network of $2^n + 1$ perceptrons is not necessary but sufficient. For example, we already saw how to represent AND function with just 1 perceptron

Catch: As n increases the number of perceptrons in the hidden layers obviously increases exponentially
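The construction behind the theorem can be sketched for general n. The function name `build_network`, the 0/1 choice of layer 2 weights and the output threshold of 1 are assumptions; the hidden bias of −n generalizes the −2 and −3 used for 2 and 3 inputs. Parity is used as the test function because it is not linearly separable.

```python
from itertools import product


def build_network(truth_table, n):
    """truth_table maps each tuple in {-1, +1}^n to 0 or 1; returns (predict, hidden size)."""
    patterns = list(product([-1, 1], repeat=n))     # one hidden perceptron per input pattern
    layer2 = [truth_table[p] for p in patterns]     # w_i = 1 where the function is 1, else 0
    w0 = 1                                          # output threshold (assumed value)

    def predict(x):
        # A hidden unit with weights p and bias -n fires only when x equals p.
        h = [
            1 if sum(pi * xi for pi, xi in zip(p, x)) - n >= 0 else 0
            for p in patterns
        ]
        return 1 if sum(wi * hi for wi, hi in zip(layer2, h)) >= w0 else 0

    return predict, len(patterns)


n = 3
# 3-input parity: 1 when an odd number of inputs are True (+1).
parity = {x: sum(v == 1 for v in x) % 2 for x in product([-1, 1], repeat=n)}
predict, hidden_units = build_network(parity, n)
print(hidden_units, all(predict(x) == parity[x] for x in parity))
```

The hidden layer has $2^n$ units, which is the exponential cost the Catch above refers to.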
• Again, why do we care about boolean functions ?
• How does this help us with our original problem: which was to predict whether we like
a movie or not? Let us see!

The story so far ...
• Networks of the form that we just saw (containing an input layer, an output layer and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short)
• More appropriate terminology would be “Multilayered Network of Perceptrons” but MLP is the more commonly used name
• The theorem that we just saw gives us the representation power of an MLP with a single hidden layer
• Specifically, it tells us that an MLP with a single hidden layer can represent any boolean function
