AI17-Neural Networks
Neural Networks
• Analogy to biological neural systems
• Attempt to understand natural biological systems through computational modeling
• Intelligent behavior as an “emergent” property of a large number of simple units rather
than of explicitly encoded symbolic rules and algorithms
• A neural network is just a collection of units connected together; the properties of the
network are determined by its topology and the properties of the “neurons”
• Researchers in AI and statistics became interested in the more abstract properties of
neural networks, such as their ability to perform distributed computation, to tolerate
noisy inputs, and to learn
• Hence they aimed to create artificial neural networks. (Other names include
connectionism, parallel distributed processing, and neural computation.)
Real Neuron
[figure: biological neuron]
Perceptron
• A perceptron defines a hyperplane in the input space, so it returns 1 if and only if the
input lies on one side of that hyperplane
• depending on the type of activation function used, the training process uses either the
perceptron learning rule (hard-threshold activation) or the gradient-descent rule for
logistic regression (logistic/sigmoid activation)
• linearly separable functions constitute just a small fraction of all Boolean
functions
• Each cycle through the examples is called an epoch.
• Epochs are repeated until some stopping criterion is reached; typically, that the
weight changes have become very small.
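To make the above concrete, here is a minimal Python sketch of threshold-perceptron training (illustrative only; the AND data, learning rate alpha, and epoch limit are assumptions, not from the slides):

import numpy as np

def perceptron_train(X, y, alpha=0.1, max_epochs=100):
    # Perceptron learning rule: w <- w + alpha * (y - h_w(x)) * x,
    # where h_w(x) = 1 iff w . x >= 0 (the hyperplane test).
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend a bias input x0 = 1
    w = np.zeros(X.shape[1])
    for epoch in range(max_epochs):            # one epoch = one cycle through the examples
        changed = False
        for x_i, y_i in zip(X, y):
            h = 1 if w @ x_i >= 0 else 0
            if h != y_i:                       # update weights only on mistakes
                w += alpha * (y_i - h) * x_i
                changed = True
        if not changed:                        # stopping criterion: weights stopped changing
            break
    return w

# Example: the linearly separable AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(perceptron_train(X, y))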
Re-visiting weight updates
• perceptron learning rule (restated in the equations after this list)
• the weight-update rule for the weights between the inputs and the hidden layer
is essentially identical to the update rule for the output layer (both are given
after this list)
• The back-propagation process can be summarized as follows:
• The gradient of the loss with respect to the weights connecting the hidden layer to the
output layer will be zero except for the weights w_{j,k} that connect to the k-th output
unit. For those weights, we have the first gradient expression after this list
• To obtain the gradient with respect to the weights w_{i,j} connecting the input layer
to the hidden layer, we have to expand out the activations a_j and reapply the
chain rule (second gradient expression after this list)
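The equations these bullets refer to did not survive extraction; the following is a reconstruction of the standard forms (an assumption based on the usual textbook presentation, taking a squared loss, activation function g, weighted input in_k, and activation a_j):

% perceptron learning rule
w_i \leftarrow w_i + \alpha\,(y - h_{\mathbf{w}}(\mathbf{x}))\,x_i

% hidden-to-output weights: nonzero only for the w_{j,k} into output unit k
\Delta_k = (y_k - a_k)\,g'(in_k), \qquad \frac{\partial\,\mathrm{Loss}}{\partial w_{j,k}} = -2\,a_j\,\Delta_k

% input-to-hidden weights: expand a_j and reapply the chain rule
\Delta_j = g'(in_j)\sum_k w_{j,k}\,\Delta_k, \qquad \frac{\partial\,\mathrm{Loss}}{\partial w_{i,j}} = -2\,a_i\,\Delta_j

Both layers therefore share the same form of gradient-descent update, w \leftarrow w + \alpha\,a\,\Delta, which is the sense in which the hidden-layer rule is “essentially identical” to the output-layer rule.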
Learning neural network structures
• How to find the best network structure?
• neural networks are subject to overfitting when there are too many parameters in the model
• For fully connected networks, the only choices to be made concern the number of hidden
layers and their sizes
• try several and keep the best
• cross-validation techniques are needed
• If the network is not fully connected, then we need to find some effective search method
through the very large space of possible connection topologies
• The optimal brain damage algorithm begins with a fully connected network and removes
connections from it (a schematic sketch follows this list)
• After the network is trained for the first time, an information-theoretic approach identifies an
optimal selection of connections that can be dropped
• The network is then retrained, and if its performance has not decreased, the process is repeated
• It is also possible to remove units that are not contributing much to the result
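A schematic Python sketch of this prune-and-retrain loop (illustrative assumptions throughout: train, evaluate, prune_lowest_saliency, and restore are hypothetical helpers, and the saliency ranking stands in for the information-theoretic criterion):

def optimal_brain_damage(network, data, prune_fraction=0.05):
    # train(), evaluate(), prune_lowest_saliency(), restore() are hypothetical helpers.
    train(network, data)                        # initial training of the full network
    best = evaluate(network, data)
    while True:
        # Drop the connections judged least important to the loss.
        removed = prune_lowest_saliency(network, prune_fraction)
        train(network, data)                    # retrain the smaller network
        score = evaluate(network, data)
        if score < best:                        # performance decreased: undo and stop
            restore(network, removed)
            return network
        best = score                            # performance held up, so keep pruning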
Learning neural network structures …
• Several algorithms have been proposed for growing a larger network from a
smaller one.
• Tiling algorithm (a schematic sketch follows this list)
• The idea is to start with a single unit that does its best to produce the correct output on as
many of the training examples as possible.
• Subsequent units are added to take care of the examples that the first unit got wrong.
• The algorithm adds only as many units as are needed to cover all the examples.
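A minimal sketch of this growing strategy (train_unit and classifies_correctly are hypothetical helpers; the real tiling algorithm also maintains a faithful internal representation of the examples, which this sketch omits):

def grow_units(examples):
    # Start with a single unit trained to get as many examples right as possible.
    units = [train_unit(examples)]
    errors = [e for e in examples if not classifies_correctly(units, e)]
    while errors:                     # add units only for the examples still wrong
        units.append(train_unit(errors))
        errors = [e for e in examples if not classifies_correctly(units, e)]
    return units                      # stops once every example is covered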