Learning XOR - Gradient Based Learning - Hidden Units
Perceptrons

[Figure: a perceptron. Inputs x_1, ..., x_D and a bias input fixed at 1 feed into the unit through weights w_1, ..., w_D and bias weight b. Output: z = h(b + wᵀx).]
• A perceptron is a function that maps D-dimensional vectors x = (x_1, x_2, x_3, ..., x_D)ᵀ to real numbers.
• For notational convenience, we add an extra input, called the bias input. The bias input is always equal to 1.
• b is called the bias weight. It is optimized during training.
• w_1, ..., w_D are also weights that are optimized during training.
Perceptrons

[Figure: the same perceptron diagram. Output: z = h(b + wᵀx).]
• A perceptron computes its output in two steps:
Step 1: a = b + wᵀx = b + w_1 x_1 + ... + w_D x_D
Step 2: z = h(a)
• h is called an activation function.
• For example, h could be the sigmoid function h(a) = 1 / (1 + e^(-a)).
Perceptrons

[Figure: the same perceptron diagram. Output: z = h(b + wᵀx).]
• A perceptron computes its output in two steps:
Step 1: a = b + wᵀx
Step 2: z = h(a)
• In a single formula: z = h(b + wᵀx).
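To make the two-step computation concrete, here is a minimal Python sketch (the names sigmoid and perceptron_output are illustrative, not from the slides):

import math

def sigmoid(a):
    # Sigmoid activation function: h(a) = 1 / (1 + e^(-a)).
    return 1.0 / (1.0 + math.exp(-a))

def perceptron_output(b, w, x, h=sigmoid):
    # Step 1: compute the weighted sum a = b + w^T x.
    a = b + sum(w_d * x_d for w_d, x_d in zip(w, x))
    # Step 2: pass the weighted sum through the activation function.
    return h(a)

For example, perceptron_output(-1.5, [1, 1], [1, 1]) returns h(0.5), which is about 0.62 with the sigmoid.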
Notation for Bias Weight

[Figure: the same perceptron, with the bias weight drawn as w_0 on the bias input 1. Output: z = h(wᵀx).]
• There is an alternative representation that we will not use, where b is denoted as w_0, the bias input is denoted as x_0, and the weight vector becomes w = (w_0, w_1, ..., w_D)ᵀ.
• Then, instead of writing h(b + wᵀx) we can simply write h(wᵀx).
• In our slides, we will denote the bias weight as b and treat it separately from the other weights. That will make life easier later.
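A small sketch of why the two notations are equivalent (the function names are hypothetical): prepending the bias input x_0 = 1 to x and the bias weight w_0 = b to w leaves the weighted sum unchanged.

def weighted_sum_separate(b, w, x):
    # Our convention: a = b + w^T x, with the bias weight b kept separate.
    return b + sum(w_d * x_d for w_d, x_d in zip(w, x))

def weighted_sum_absorbed(w, x):
    # Alternative convention: a = w^T x, where w[0] = b and x is
    # extended with the bias input x_0 = 1.
    return sum(w_d * x_d for w_d, x_d in zip(w, [1] + x))

# Both calls return the same value, -1.5 + 1 + 1 = 0.5:
# weighted_sum_separate(-1.5, [1, 1], [1, 1])
# weighted_sum_absorbed([-1.5, 1, 1], [1, 1])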
Perceptrons and Neurons

[Figure: the same perceptron diagram. Output: z = h(b + wᵀx).]
• Perceptrons are inspired by neurons.
– Neurons are the cells that form the nervous system and the brain.
– Neurons somehow sum up their inputs, and if the sum exceeds a threshold, they "fire".
• Since brains are "intelligent", computer scientists have been hoping that perceptron-based systems can be used to model intelligence.
Activation Functions
• A perceptron produces output z = h(b + wᵀx).
• One choice for the activation function h: the step function, which outputs 0 for negative inputs and 1 otherwise.
• The step function is useful for providing some intuitive examples.
• It is not useful for actual real-world systems.
– Reason: it is not differentiable, so it does not allow optimization via gradient descent.
Activation Functions
• A perceptron produces output z = h(b + wᵀx).
• Another choice for the activation function h: the sigmoid function h(a) = 1 / (1 + e^(-a)).
• The sigmoid is often used in real-world systems.
• It is a differentiable function, so it allows the use of gradient descent.
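The two activation functions seen so far can be sketched as follows (a sketch; the derivative formula is what gradient descent relies on):

import math

def step(a):
    # Step function: 0 for negative inputs, 1 otherwise.
    # Not differentiable at 0, and has zero gradient everywhere else.
    return 1.0 if a >= 0 else 0.0

def sigmoid(a):
    # Sigmoid: smooth and differentiable everywhere.
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_derivative(a):
    # h'(a) = h(a) * (1 - h(a)), used when optimizing with gradient descent.
    s = sigmoid(a)
    return s * (1.0 - s)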
Example: The AND Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean AND function:

false AND false = false
false AND true = false
true AND false = false
true AND true = true

[Figure: a perceptron with bias weight b = -1.5 and weights w_1 = 1, w_2 = 1. Output: z = h(b + wᵀx).]
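The verification on the next few slides can also be done with a short script (a sketch, reusing the step function defined earlier):

def step(a):
    return 1.0 if a >= 0 else 0.0

# AND perceptron: b = -1.5, w_1 = 1, w_2 = 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "AND", x2, "=", step(-1.5 + 1 * x1 + 1 * x2))
# Prints outputs 0, 0, 0, 1, matching the truth table.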
Example: The AND Perceptron
• Verification: If x_1 = 0 and x_2 = 0:
– z = h(-1.5 + 1·0 + 1·0) = h(-1.5) = 0.
• Corresponds to case false AND false = false.
Example: The AND Perceptron
• Verification: If x_1 = 0 and x_2 = 1:
– z = h(-1.5 + 1·0 + 1·1) = h(-0.5) = 0.
• Corresponds to case false AND true = false.
Example: The AND Perceptron
• Verification: If x_1 = 1 and x_2 = 0:
– z = h(-1.5 + 1·1 + 1·0) = h(-0.5) = 0.
• Corresponds to case true AND false = false.
Example: The AND Perceptron
• Verification: If x_1 = 1 and x_2 = 1:
– z = h(-1.5 + 1·1 + 1·1) = h(0.5) = 1.
• Corresponds to case true AND true = true.
Example: The OR Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean OR function:

false OR false = false
false OR true = true
true OR false = true
true OR true = true

[Figure: a perceptron with bias weight b = -0.5 and weights w_1 = 1, w_2 = 1. Output: z = h(b + wᵀx).]
Example: The OR Perceptron
• Verification: If x_1 = 0 and x_2 = 0:
– z = h(-0.5 + 1·0 + 1·0) = h(-0.5) = 0.
• Corresponds to case false OR false = false.
Example: The OR Perceptron
• Verification: If x_1 = 0 and x_2 = 1:
– z = h(-0.5 + 1·0 + 1·1) = h(0.5) = 1.
• Corresponds to case false OR true = true.
Example: The OR Perceptron
• Verification: If x_1 = 1 and x_2 = 0:
– z = h(-0.5 + 1·1 + 1·0) = h(0.5) = 1.
• Corresponds to case true OR false = true.
Example: The OR Perceptron
• Verification: If x_1 = 1 and x_2 = 1:
– z = h(-0.5 + 1·1 + 1·1) = h(1.5) = 1.
• Corresponds to case true OR true = true.
Example: The NOT Perceptron
• Suppose we use the step function for activation.
• Suppose boolean value false is represented as number 0.
• Suppose boolean value true is represented as number 1.
• Then, the perceptron below computes the boolean NOT function:

NOT(false) = true
NOT(true) = false

[Figure: a perceptron with a single input x_1, bias weight b = 0.5, and weight w_1 = -1. Output: z = h(b + wᵀx).]
Example: The NOT Perceptron
• Verification: If x_1 = 0:
– z = h(0.5 + (-1)·0) = h(0.5) = 1.
• Corresponds to case NOT(false) = true.
Example: The NOT Perceptron
• Verification: If x_1 = 1:
– z = h(0.5 + (-1)·1) = h(-0.5) = 0.
• Corresponds to case NOT(true) = false.
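The OR and NOT perceptrons can be checked the same way as the AND perceptron (a sketch; only the bias and weights change):

def step(a):
    return 1.0 if a >= 0 else 0.0

# OR perceptron: b = -0.5, w_1 = 1, w_2 = 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "OR", x2, "=", step(-0.5 + x1 + x2))   # 0, 1, 1, 1

# NOT perceptron: b = 0.5, w_1 = -1.
for x1 in (0, 1):
    print("NOT", x1, "=", step(0.5 - x1))                # 1, 0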
The XOR Function
false XOR false = false
false XOR true = true
true XOR false = true
true XOR true = false

• As before, we represent
false with 0 and true with 1.
• The figure shows the four input points of the XOR function.
– red corresponds to output value true.
– green corresponds to output value false.
• The two classes (true and false) are not linearly separable.
• Therefore, no perceptron can compute the XOR function.
Our First Neural Network: XOR
• A neural network is built using perceptrons as building blocks.
• The inputs to some perceptrons are outputs of other perceptrons.
• Here is an example neural network computing the XOR function.

[Figure: the XOR network. Input units 1,1 and 1,2 feed hidden units 2,1 and 2,2, whose outputs feed output unit 3,1. The weights are: b_{2,1} = -0.5, w_{2,1,1} = 1, w_{2,1,2} = 1; b_{2,2} = -1.5, w_{2,2,1} = 1, w_{2,2,2} = 1; b_{3,1} = -0.5, w_{3,1,1} = 1, w_{3,1,2} = -1.]
Our First Neural Network: XOR
• Terminology: inputs and perceptrons are all called “units”.
• Units are grouped in layers: layer 1 (input), layer 2, layer 3 (output).
• The input layer just represents the inputs to the network.
– There are two inputs: x_1 and x_2.

[Figure: the same XOR network.]
Our First Neural Network: XOR
• Such networks are called layered networks, more details later.
• Each unit is indexed by two numbers (layer index, unit index).
• Each bias weight is indexed by the same two numbers as its unit.
• Each weight is indexed by three numbers (layer, unit, weight).
[Figure: the same XOR network.]
Our First Neural Network: XOR
• Note: every weight is associated with two units: it connects the
output of a unit with an input of another unit.
– Which of the two units do we use to index the weight?

[Figure: the same XOR network.]
Our First Neural Network: XOR
• To index a weight w, we use the layer number and unit number of the unit for which w is an incoming weight.
• Weights incoming to unit (l, n) are indexed as w_{l,n,i}, where i ranges from 1 to the number of incoming weights for unit (l, n).
[Figure: the same XOR network.]
Our First Neural Network: XOR
• Weights incoming to unit (l, n) are indexed as w_{l,n,i}, where i ranges from 1 to the number of incoming weights for unit (l, n).
• Since the input layer (which is layer 1) has no incoming weights, there are no weights indexed as w_{1,n,i}.
[Figure: the same XOR network.]
Our First Neural Network: XOR
• The XOR network shows how individual perceptrons can be
combined to perform more complicated functions.

[Figure: the same XOR network, with units labeled by the function they compute: unit 2,1 is logical OR, unit 2,2 is logical AND, and output unit 3,1 is A AND (NOT B).]
Computing the Output: An Example
• Suppose that x_1 = 0 and x_2 = 1 (corresponding to false XOR true).
• For Unit 2,1, which performs a logical OR:
– The output is h(-0.5 + 1·0 + 1·1) = h(0.5).
– Assuming that h is the step function, h(0.5) = 1, so Unit 2,1 outputs 1.
[Figure: the same XOR network, showing the inputs x_1 = 0, x_2 = 1 being propagated through it.]
Computing the Output: An Example
• Suppose that x_1 = 0 and x_2 = 1 (corresponding to false XOR true).
• For Unit 2,2, which performs a logical AND:
– The output is h(-1.5 + 1·0 + 1·1) = h(-0.5).
– Since h is the step function, h(-0.5) = 0, so Unit 2,2 outputs 0.
[Figure: the same XOR network, showing the inputs x_1 = 0, x_2 = 1 being propagated through it.]
Computing the Output: An Example
• Suppose that x_1 = 0 and x_2 = 1 (corresponding to false XOR true).
• Unit 3,1 is the output unit, computing the A AND (NOT B) function:
– One input is the output of the OR unit, which is 1.
– The other input is the output of the AND unit, which is 0.
[Figure: the same XOR network, showing the inputs x_1 = 0, x_2 = 1 being propagated through it.]
Computing the Output: An Example
• Suppose that x_1 = 0 and x_2 = 1 (corresponding to false XOR true).
• For the output unit (computing the A AND (NOT B) function):
– The output is h(-0.5 + 1·1 + (-1)·0) = h(0.5).
– Since h is the step function, h(0.5) = 1, so Unit 3,1 outputs 1.
[Figure: the same XOR network, showing the full computation for x_1 = 0, x_2 = 1. The final output is 1.]
Verifying the XOR Network
• We can follow the same process to compute the output of this
network for the other three cases.
– Here we consider the case where x_1 = 0 and x_2 = 0 (corresponding to false XOR false).
– The output is 0, as it should be.

[Figure: the same XOR network, showing the full computation for x_1 = 0, x_2 = 0. The final output is 0.]
Verifying the XOR Network
• We can follow the same process to compute the output of this
network for the other three cases.
– Here we consider the case where x_1 = 1 and x_2 = 0 (corresponding to true XOR false).
– The output is 1, as it should be.

[Figure: the same XOR network, showing the full computation for x_1 = 1, x_2 = 0. The final output is 1.]
Verifying the XOR Network
• We can follow the same process to compute the output of this
network for the other three cases.
– Here we consider the case where x_1 = 1 and x_2 = 1 (corresponding to true XOR true).
– The output is 0, as it should be.

[Figure: the same XOR network, showing the full computation for x_1 = 1, x_2 = 1. The final output is 0.]
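All four cases can be checked at once by chaining the three perceptrons, using the weights from the figure (a sketch; xor_network is an illustrative name):

def step(a):
    return 1.0 if a >= 0 else 0.0

def xor_network(x1, x2):
    z21 = step(-0.5 + 1 * x1 + 1 * x2)     # Unit 2,1: logical OR
    z22 = step(-1.5 + 1 * x1 + 1 * x2)     # Unit 2,2: logical AND
    return step(-0.5 + 1 * z21 - 1 * z22)  # Unit 3,1: A AND (NOT B)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, "XOR", x2, "=", xor_network(x1, x2))   # 0, 1, 1, 0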
Neural Networks
• Our XOR neural network consists of five units:
– Two input units that just represent the two inputs to the network.
– Three perceptrons.

[Figure: the same XOR network.]
Neural Network Layers
• Oftentimes, as in the XOR example, neural networks are organized into layers.
• The input layer is the initial layer of input units (units 1,1 and 1,2 in our
example).
• The output layer is at the end (unit 3,1 in our example).
• Zero, one or more hidden layers can be between the input and output layers.

[Figure: the same XOR network.]
Neural Network Layers
• There is only one hidden layer in our example, containing units 2,1 and 2,2.
• Each hidden layer's inputs are outputs from the previous layer.
• Each hidden layer's outputs are inputs to the next layer.
• The first hidden layer's inputs come from the input layer.
• The last hidden layer's outputs are inputs to the output layer.
[Figure: the same XOR network.]
Feedforward Networks
• Feedforward networks are networks where there are no directed loops.
• If there are no loops, the output of a unit cannot (directly or indirectly)
influence its input.
• While there are varieties of neural networks that are not feedforward or
layered, our main focus will be layered feedforward networks.

[Figure: the same XOR network.]
Computing the Output
• Notation: L is the number of layers. Layer 1 is the input layer, layer L is the output layer.
• The outputs of the units of layer 1 are simply the inputs to the network.
• For l = 2, ..., L:
– Compute the outputs of layer l, given the outputs of layer l - 1.

[Figure: the same XOR network.]
Computing the Output
• To compute the outputs of layer l (where l > 1), we simply need to compute the output of each perceptron belonging to layer l.
– For each such perceptron, its inputs come from the outputs of units at layer l - 1, which we have already computed.
– Remember, we compute layer outputs in increasing order of l.
[Figure: the same XOR network.]
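This layer-by-layer procedure can be sketched in a few lines (an illustrative representation, not from the slides: each layer is a list of (bias, weights) pairs, one pair per unit):

def step(a):
    return 1.0 if a >= 0 else 0.0

def network_output(layers, x, h=step):
    # The outputs of layer 1 are simply the inputs to the network.
    outputs = list(x)
    # Compute each later layer's outputs from the previous layer's outputs.
    for layer in layers:
        outputs = [h(b + sum(w_i * o_i for w_i, o_i in zip(w, outputs)))
                   for (b, w) in layer]
    return outputs

# The XOR network in this representation:
xor_layers = [
    [(-0.5, [1, 1]), (-1.5, [1, 1])],   # hidden layer: OR unit, AND unit
    [(-0.5, [1, -1])],                  # output layer: A AND (NOT B)
]
# network_output(xor_layers, [0, 1]) returns [1.0], i.e. false XOR true = true.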
What Neural Networks Can Compute
• An individual perceptron is a linear classifier.
– The weights of the perceptron define a linear boundary between two classes.
• Layered feedforward neural networks with one hidden layer can approximate any continuous function to arbitrary accuracy, given enough hidden units.
• Layered feedforward neural networks with two hidden layers can approximate any mathematical function.
• This has been known for decades, and is one reason scientists have
been optimistic about the potential of neural networks to model
intelligent systems.
• Another reason is the analogy between neural networks and
biological brains, which have been a standard of intelligence we are
still trying to achieve.
• There is only one catch: How do we find the right weights?
Finding the Right Weights
• The goal of training a neural network is to figure out
good values for the weights of the units in the
network.

