Learning Algorithms in AI Explained

The document discusses artificial intelligence and machine learning algorithms. It begins by defining a learning algorithm as one that predicts future data behavior based on past performance without being explicitly programmed with the data trends. It then provides examples of real-world applications of learning algorithms, including speech recognition, protein structure prediction, and autonomous vehicle navigation. The document goes on to explain several common machine learning algorithms in more detail, including Bayesian networks, hidden Markov models, genetic algorithms, and neural networks.


Artificial Intelligence and

Learning Algorithms

Presented By Brian M. Frezza 12/1/05


Game Plan
• What’s a Learning Algorithm?
• Why should I care?
– Biological parallels
• Real World Examples
• Getting our hands dirty with the algorithms
– Bayesian Networks
– Hidden Markov Models
– Genetic Algorithms
– Neural Networks
• Artificial Neural Networks Vs Neuron Biology
– “Fraser’s Rules”
• Frontiers in AI
What’s a Learning Algorithm?
• “An algorithm which predicts data’s future
behavior based on its past performance.”
– Programmer can be ignorant of the data’s
trends.
• Not rationally designed!
– Training Data
– Test Data
Why do I care?
• Use In Informatics
– Predict trends in “fuzzy” data
• Subtle patterns in data
• Complex patterns in data
• Noisy data
– Network inference
– Classification inference
• Analogies To Chemical Biology
– Evolution
– Immunological Response
– Neurology
• Fundamental Theories of Intelligence
– That’s heavy dude
Street Smarts
• CMU’s Navlab-5 (No Hands Across America)
– 1995 Neural Network Driven Car
– Pittsburgh to San Diego: 2,797 miles, 98.2% of it driven autonomously
– Single hidden layer backpropagation network!
• Subcellular location through fluorescence
– M. V. Boland and R. F. Murphy, "A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells," Bioinformatics (2001) 17(12), 1213-1223
• Protein secondary structure prediction
• Intron/Exon predictions
• Protein/Gene network inference
• Speech recognition
• Face recognition
The Algorithms

• Bayesian Networks
• Hidden Markov Models
• Genetic Algorithms
• Neural Networks
Bayesian Networks: Basics
• Requires models of how data behaves
– Set of Hypothesis: {H}
• Keeps track of likelihood of each model
being accurate as data becomes available
– P(H)
• Predicts as a weighted average over the hypotheses
– P(E) = Σ_H P(E|H) * P(H)
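To make the bookkeeping concrete, here is a minimal Python sketch (mine, not from the slides): predict is the weighted average above, and update assumes the standard Bayes re-weighting P(H|E) ∝ P(E|H)*P(H), since the slides do not spell the update rule out.

def predict(event_prob, beliefs):
    # P(E) = sum over hypotheses H of P(E|H) * P(H)
    return sum(event_prob[h] * beliefs[h] for h in beliefs)

def update(event_prob, beliefs, event_happened):
    # Standard Bayes re-weighting: P(H|E) is proportional to P(E|H) * P(H)
    likelihood = {h: event_prob[h] if event_happened else 1 - event_prob[h]
                  for h in beliefs}
    unnormalized = {h: likelihood[h] * beliefs[h] for h in beliefs}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}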
Bayesian Network Example
• What color hair will Paul Schaffer's kids have if he marries a redhead?
– Hypotheses
• Ha(rr) rr x rr: 100% Redhead
• Hb(Rr) rr x Rr: 50% Redhead, 50% Not
• Hc(RR) rr x RR: 100% Not
• Initially clueless:
– So P(Ha) = P(Hb) = P(Hc) = 1/3
Bayesian Network: Trace
• History: Redhead 0, Not 0
• Hypotheses: Ha: 100% Redhead | Hb: 50% Redhead, 50% Not | Hc: 100% Not
• Likelihoods: P(Ha) = 1/3, P(Hb) = 1/3, P(Hc) = 1/3
• Prediction: Will their next kid be a Redhead?
= P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
= (1)*(1/3) + (1/2)*(1/3) + (0)*(1/3)
= 1/2
Bayesian Network: Trace
• History: Redhead 1, Not 0
• Hypotheses: Ha: 100% Redhead | Hb: 50% Redhead, 50% Not | Hc: 100% Not
• Likelihoods: P(Ha) = 1/2, P(Hb) = 1/2, P(Hc) = 0
• Prediction: Will their next kid be a Redhead?
= P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
= (1)*(1/2) + (1/2)*(1/2) + (0)*(0)
= 3/4
Bayesian Network: Trace
• History: Redhead 2, Not 0
• Hypotheses: Ha: 100% Redhead | Hb: 50% Redhead, 50% Not | Hc: 100% Not
• Likelihoods: P(Ha) = 3/4, P(Hb) = 1/4, P(Hc) = 0
• Prediction: Will their next kid be a Redhead?
= P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
= (1)*(3/4) + (1/2)*(1/4) + (0)*(0)
= 7/8
Bayesian Network: Trace
• History: Redhead 3, Not 0
• Hypotheses: Ha: 100% Redhead | Hb: 50% Redhead, 50% Not | Hc: 100% Not
• Likelihoods: P(Ha) = 7/8, P(Hb) = 1/8, P(Hc) = 0
• Prediction: Will their next kid be a Redhead?
= P(red|Ha)*P(Ha) + P(red|Hb)*P(Hb) + P(red|Hc)*P(Hc)
= (1)*(7/8) + (1/2)*(1/8) + (0)*(0)
= 15/16
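The prediction column of this trace can be reproduced with a few lines of Python (my own sketch, not from the slides), plugging the likelihoods stated on each slide into the weighted-average formula:

p_red = {"Ha": 1.0, "Hb": 0.5, "Hc": 0.0}       # P(red | hypothesis)
trace = [                                        # P(Ha), P(Hb), P(Hc) after 0..3 redheads
    {"Ha": 1/3, "Hb": 1/3, "Hc": 1/3},
    {"Ha": 1/2, "Hb": 1/2, "Hc": 0.0},
    {"Ha": 3/4, "Hb": 1/4, "Hc": 0.0},
    {"Ha": 7/8, "Hb": 1/8, "Hc": 0.0},
]
for beliefs in trace:
    prediction = sum(p_red[h] * beliefs[h] for h in beliefs)
    print(prediction)    # 0.5, 0.75, 0.875, 0.9375 = 1/2, 3/4, 7/8, 15/16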
Bayesian Networks Notes
• Never reject hypothesis unless directly
disproved
• Learns based on rational models of
behavior
– Models can be extracted!
• Programmer needs to form hypothesis
beforehand.
The Algorithms

• Bayesian Networks
• Hidden Markov Models
• Genetic Algorithms
• Neural Networks
Hidden Markov Models (HMMs)
• Discrete learning algorithm
– Programmer must be able to categorize predictions
• HMMs also assume a model of the world
working behind the data
• Models are also extractable
• Common Uses
– Speech Recognition
– Secondary structure prediction
– Intron/Exon predictions
– Categorization of data
Hidden Markov Models: Take a Step Back
• 1st order Markov Models:
– Q{States}
– Pr{Transition}
– Sum of all P(T) out of a state = 1
[State diagram: states Q1-Q4 connected by transition probabilities P1, P2, 1-P1-P2, P3, 1-P3, P4, 1-P4; Q2's single outgoing edge has probability 1]
1st order Markov Model Setup
• Pick initial state: Q1
• Pick transition probabilities:
  P1    P2    P3    P4
  0.6   0.2   0.9   0.4
• For each time step
– Pick a random number 0.0-1.0
[Same state diagram as above]
1st order Markov Model Trace
• Current State: Q1, Time Step = 1
• Transition probabilities: P1 = 0.6, P2 = 0.2, P3 = 0.9, P4 = 0.4
• Random Number: 0.22341
• So Next State: 0.22341 < P1, take P1 → Q2
1st order Markov Model Trace
• Current State: Q2, Time Step = 2
• Random Number: 0.64357
• So Next State: no choice, P = 1 → Q3
1st order Markov Model Trace
• Current State: Q3, Time Step = 3
• Random Number: 0.97412
• So Next State: 0.97412 > 0.9, take 1-P3 → Q4
1st order Markov Model Trace
• Current State: Q4, Time Step = 4
• I'm going to stop here.
• Markov Chain: Q1, Q2, Q3, Q4
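The slide's state diagram did not survive the export cleanly, so the sketch below (mine, not from the slides) assumes a topology consistent with the trace: Q1→Q2 with P1, Q2→Q3 with probability 1, Q3→Q4 with 1-P3; the remaining edge targets are marked as assumptions in the comments. It walks the chain with the same pick-a-random-number rule used above.

import random

# Transition structure: state -> list of (probability, next state).
# Edges marked "assumed" are not fully legible in the original slide diagram.
P1, P2, P3, P4 = 0.6, 0.2, 0.9, 0.4
transitions = {
    "Q1": [(P1, "Q2"), (P2, "Q3"), (1 - P1 - P2, "Q4")],   # Q1 -> Q4 target assumed
    "Q2": [(1.0, "Q3")],                                    # no choice, P = 1
    "Q3": [(P3, "Q2"), (1 - P3, "Q4")],                     # Q3 -> Q2 target assumed
    "Q4": [(P4, "Q1"), (1 - P4, "Q3")],                     # both Q4 targets assumed
}

def step(state):
    r = random.random()                 # pick a random number 0.0-1.0
    cumulative = 0.0
    for prob, nxt in transitions[state]:
        cumulative += prob
        if r < cumulative:
            return nxt
    return nxt                          # guard against floating-point round-off

state, chain = "Q1", ["Q1"]
for _ in range(3):
    state = step(state)
    chain.append(state)
print(chain)    # e.g. ['Q1', 'Q2', 'Q3', 'Q4'], as in the trace when r = 0.22, 0.64, 0.97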
What else can Markov do?
• Higher Order Models
– Kth order
• Metropolis-Hastings
– Determining thermodynamic equilibrium
• Continuous Markov Models
– Time step varies according to continuous
distribution
• Hidden Markov Models
– Discrete model learning
Hidden Markov Models (HMMs)
• A Markov Model drives the world, but it is hidden from direct observation; its state must be inferred from a set of observables.
– Voice recognition
• Observable: Sound waves
• Hidden states: Words
– Intron/Exon prediction
• Observable: nucleotide sequence
• Hidden State: Exon, Intron, Non-coding
– Secondary structure prediction for protein
• Observable: Amino acid sequence
• Hidden State: Alpha helix, Beta Sheet, Unstructured
Hidden Markov Models: Example
• Secondary Structure Prediction
– Observable states: the amino acid sequence (His, Asp, Arg, Phe, Ala, Cys, Ser, Gln, Glu, Lys, Gly, Leu, Met, Asn, Tyr, Thr, Ile, Trp, Pro, Val)
– Hidden states: Alpha Helix, Beta Sheet, Unstructured
Hidden Markov Models: Smaller Example
• Exon/Intron Mapping
– Observable states: the nucleotides A, T, G, C, each emitted with probability P(A|Ex), P(T|Ex), ... depending on the hidden state
– Hidden states: Exon, Intron, and Intergenic, connected by transition probabilities P(Ex|Ex), P(It|Ex), P(It|It), P(Ig|It), P(Ex|Ig), P(It|Ig), P(Ig|Ig), P(Ex|It), ...
Hidden Markov Models: Smaller Example
• Exon/Intron Mapping

Hidden State Transition Probabilities (from row, to column)
        Ex     Ig     It
  Ex    0.7    0.1    0.2
  Ig    0.49   0.5    0.01
  It    0.18   0.02   0.8

Observable State Probabilities (hidden state row, observable column)
        A      T      G      C
  Ex    0.33   0.42   0.11   0.14
  Ig    0.25   0.25   0.25   0.25
  It    0.14   0.16   0.5    0.2

Starting Distribution
  Ex     Ig     It
  0.1    0.89   0.01
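To make the tables concrete, here is a hypothetical Python sketch (not from the slides) that stores them as dictionaries and samples a hidden path plus the observable sequence it emits; sample_hmm and the other names are my own.

import random

start = {"Ex": 0.1, "Ig": 0.89, "It": 0.01}
trans = {"Ex": {"Ex": 0.7, "Ig": 0.1, "It": 0.2},
         "Ig": {"Ex": 0.49, "Ig": 0.5, "It": 0.01},
         "It": {"Ex": 0.18, "Ig": 0.02, "It": 0.8}}
emit = {"Ex": {"A": 0.33, "T": 0.42, "G": 0.11, "C": 0.14},
        "Ig": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25},
        "It": {"A": 0.14, "T": 0.16, "G": 0.5, "C": 0.2}}

def draw(dist):
    # Sample one key from a {key: probability} dictionary
    r, cumulative = random.random(), 0.0
    for key, p in dist.items():
        cumulative += p
        if r < cumulative:
            return key
    return key    # guard against floating-point round-off

def sample_hmm(length):
    # Generate a hidden path and the observable sequence it emits
    hidden = [draw(start)]
    observed = [draw(emit[hidden[-1]])]
    for _ in range(length - 1):
        hidden.append(draw(trans[hidden[-1]]))
        observed.append(draw(emit[hidden[-1]]))
    return hidden, observed

print(sample_hmm(13))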
Hidden Markov Models
• How to predict outcomes from an HMM
• Brute force:
– Try every possible Markov chain
• Which chain has the greatest probability of generating the observed data?
– Viterbi algorithm
• Dynamic programming approach
Viterbi Algorithm: Trace
Example Sequence: ATAATGGCGAGTG
(Hidden state transition, observable state, and starting distribution tables as above)

Initialization on the first observable (A):
  Exon       = P(A|Ex) * Start Ex = 0.33 * 0.1  = 3.3*10^-2
  Intergenic = P(A|Ig) * Start Ig = 0.25 * 0.89 = 2.2*10^-1
  Intron     = P(A|It) * Start It = 0.14 * 0.01 = 1.4*10^-3

        Exon       Intergenic  Intron
  A     3.3*10^-2  2.2*10^-1   1.4*10^-3
Viterbi Algorithm: Trace
Example Sequence: ATAATGGCGAGTG

Recursion for each later observable (shown here for the T at position 2):
  Exon       = Max( P(Ex|Ex)*Pn-1(Ex), P(Ex|Ig)*Pn-1(Ig), P(Ex|It)*Pn-1(It) ) * P(T|Ex) = 4.6*10^-2
  Intergenic = Max( P(Ig|Ex)*Pn-1(Ex), P(Ig|Ig)*Pn-1(Ig), P(Ig|It)*Pn-1(It) ) * P(T|Ig) = 2.8*10^-2
  Intron     = Max( P(It|Ex)*Pn-1(Ex), P(It|Ig)*Pn-1(Ig), P(It|It)*Pn-1(It) ) * P(T|It) = 1.1*10^-3

        Exon       Intergenic  Intron
  A     3.3*10^-2  2.2*10^-1   1.4*10^-3
  T     4.6*10^-2  2.8*10^-2   1.1*10^-3
Viterbi Algorithm: Trace
Example Sequence: ATAATGGCGAGTG
Applying the same recursion at every remaining position fills in the full table:

        Exon        Intergenic   Intron
  A     3.3*10^-2   2.2*10^-1    1.4*10^-3
  T     4.6*10^-2   2.8*10^-2    1.1*10^-3
  A     1.1*10^-2   3.5*10^-3    1.3*10^-3
  A     2.4*10^-3   4.3*10^-4    2.9*10^-4
  T     7.2*10^-4   6.1*10^-5    7.8*10^-5
  G     5.5*10^-5   1.8*10^-5    7.2*10^-5
  G     4.3*10^-6   2.2*10^-6    2.9*10^-5
  C     7.2*10^-7   2.8*10^-7    4.6*10^-6
  G     9.1*10^-8   3.5*10^-8    1.8*10^-6
  A     1.1*10^-7   9.1*10^-9    2.0*10^-7
  G     8.4*10^-9   2.7*10^-9    8.2*10^-8
  A     4.9*10^-9   4.1*10^-10   9.2*10^-9
  T     1.4*10^-9   1.2*10^-10   1.2*10^-9
  G     1.1*10^-10  3.6*10^-11   4.7*10^-10
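A self-contained Python sketch of this recursion (my own code, not from the slides), using the transition, emission, and starting tables above; it keeps the maximum probability for each hidden state at each position, as the trace does, plus back-pointers to recover the most probable hidden path. The printed rows can be compared against the trace table.

start = {"Ex": 0.1, "Ig": 0.89, "It": 0.01}
trans = {"Ex": {"Ex": 0.7, "Ig": 0.1, "It": 0.2},      # P(to | from)
         "Ig": {"Ex": 0.49, "Ig": 0.5, "It": 0.01},
         "It": {"Ex": 0.18, "Ig": 0.02, "It": 0.8}}
emit = {"Ex": {"A": 0.33, "T": 0.42, "G": 0.11, "C": 0.14},
        "Ig": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25},
        "It": {"A": 0.14, "T": 0.16, "G": 0.5, "C": 0.2}}
states = list(start)

def viterbi(sequence):
    # Initialization: P(state) = P(first observable | state) * starting probability
    table = [{s: emit[s][sequence[0]] * start[s] for s in states}]
    backpointers = []
    for obs in sequence[1:]:
        prev, row, back = table[-1], {}, {}
        for s in states:
            # Max over previous states of P(s | previous) * P_{n-1}(previous), times emission
            best = max(states, key=lambda p: trans[p][s] * prev[p])
            row[s] = trans[best][s] * prev[best] * emit[s][obs]
            back[s] = best
        table.append(row)
        backpointers.append(back)
    # Trace back the most probable hidden path from the best final state
    path = [max(table[-1], key=table[-1].get)]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path)), table

path, table = viterbi("ATAATGGCGAGTG")
print(path)
for obs, row in zip("ATAATGGCGAGTG", table):
    print(obs, {s: "%.1e" % p for s, p in row.items()})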
Hidden Markov Models
• How to Train an HMM
– The forward-backward algorithm
• Ugly probability theory math:

CENSORED

• Starts with an initial guess of the parameters
• Refines the parameters by attempting to reduce the errors the model makes when fitted to the data
– The normalized "forward" probability of arriving at each state given the observables, cross-multiplied by the "backward" probability of generating the remaining observables from that state, weights the re-estimate of each parameter
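The quantity described in that last bullet can be sketched directly (my own illustration, reusing the exon/intron tables above, not code from the slides): a forward pass accumulates the probability of arriving at each state, a backward pass the probability of generating the rest of the sequence from it, and their normalized product is the per-position state posterior that the re-estimation step weights its counts by.

start = {"Ex": 0.1, "Ig": 0.89, "It": 0.01}
trans = {"Ex": {"Ex": 0.7, "Ig": 0.1, "It": 0.2},
         "Ig": {"Ex": 0.49, "Ig": 0.5, "It": 0.01},
         "It": {"Ex": 0.18, "Ig": 0.02, "It": 0.8}}
emit = {"Ex": {"A": 0.33, "T": 0.42, "G": 0.11, "C": 0.14},
        "Ig": {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25},
        "It": {"A": 0.14, "T": 0.16, "G": 0.5, "C": 0.2}}
states = list(start)

def state_posteriors(seq):
    n = len(seq)
    # Forward pass: alpha[t][s] = P(observables up to t, hidden state s at t)
    alpha = [{s: start[s] * emit[s][seq[0]] for s in states}]
    for obs in seq[1:]:
        prev = alpha[-1]
        alpha.append({s: emit[s][obs] * sum(prev[p] * trans[p][s] for p in states)
                      for s in states})
    # Backward pass: beta[t][s] = P(observables after t | hidden state s at t)
    beta = [{s: 1.0 for s in states} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {s: sum(trans[s][q] * emit[q][seq[t + 1]] * beta[t + 1][q]
                          for q in states) for s in states}
    # Normalized cross-product: posterior probability of each hidden state at each position
    posteriors = []
    for t in range(n):
        weighted = {s: alpha[t][s] * beta[t][s] for s in states}
        total = sum(weighted.values())
        posteriors.append({s: w / total for s, w in weighted.items()})
    return posteriors

for obs, post in zip("ATAATGGCGAGTG", state_posteriors("ATAATGGCGAGTG")):
    print(obs, {s: round(p, 3) for s, p in post.items()})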
The Algorithms

• Bayesian Networks
• Hidden Markov Models
• Genetic Algorithms
• Neural Networks
Genetic Algorithms
• Individuals are strings of bits which represent candidate solutions
– Functions
– Structures
– Images
– Code
• Based on Darwinian evolution
– Individuals mate, mutate, and are selected based on a Fitness Function
Genetic Algorithms
• Encoding Rules
– “Gray” bit encoding
• Bit distance proportional to value distance
• Selection Rules
– Digital / Analog Threshold
– Linear Amplification Vs Weighted Amplification
• Mating Rules
– Mutation parameters
– Recombination parameters
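A small hypothetical sketch (my own, not from the slides) tying these rules together in the basic GA loop: bit-string individuals, a fitness function, fitness-weighted ("weighted amplification") selection, single-point recombination, and a per-bit mutation parameter. The fitness function here (count of 1-bits) is only a stand-in.

import random

BITS, POP_SIZE, GENERATIONS = 16, 30, 40
MUTATION_RATE = 0.02                 # mating rule: per-bit mutation parameter

def fitness(individual):
    # Stand-in fitness function: number of 1-bits (swap in a real objective)
    return sum(individual)

def select(population):
    # Weighted amplification: parents drawn in proportion to fitness
    weights = [fitness(ind) + 1e-9 for ind in population]
    return random.choices(population, weights=weights, k=2)

def mate(mom, dad):
    # Recombination at a single crossover point, then per-bit mutation
    cut = random.randrange(1, BITS)
    child = mom[:cut] + dad[cut:]
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in child]

population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    population = [mate(*select(population)) for _ in range(POP_SIZE)]
best = max(population, key=fitness)
print(best, fitness(best))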
Genetic Algorithms
• When are they useful?
– Movements in sequence space are funnel-shaped with respect to the fitness function
• Systems where evolution actually applies!
• Examples
– Medicinal chemistry
– Protein folding
– Amino acid substitutions
– Membrane trafficking modeling
– Ecological simulations
– Linear Programming
– Traveling salesman
The Algorithms

• Bayesian Networks
• Hidden Markov Models
• Genetic Algorithms
• Neural Networks
Neural Networks
• 1943: McCulloch and Pitts model of how neurons process information
– The field immediately splits
• Studying the brain
– Neurology
• Studying artificial intelligence
– Neural Networks
Neural Networks: A Neuron, Node, or Unit
[Diagram: inputs a and b arrive on edges weighted Wa,c and Wb,c; the unit computes Σ(W) - W0,c (the bias), passes it through an activation function, and sends the output z onward on an edge weighted Wc,n]
Neural Networks: Activation Functions
• Sigmoid Function (logistic function) and Threshold Function
[Plots: output vs. input for each, rising to +1; the zero point is set by the bias]


Threshold Functions can make Logic Gates with Neurons!
• Logical And: Wa,c = 1, Wb,c = 1, W0,c (bias) = 1.5
• If ( Σ(w) - W0,c > 0 ) Then FIRE, Else Don't
• Truth table (A ∩ B):
        B=1  B=0
  A=1    1    0
  A=0    0    0
And Gate: Trace
• A Off, B Off: Σ(w) - W0,c = 0 - 1.5 = -1.5 < 0 → Off
• A On,  B Off: 1 - 1.5 = -0.5 < 0 → Off
• A Off, B On:  1 - 1.5 = -0.5 < 0 → Off
• A On,  B On:  2 - 1.5 = 0.5 > 0 → On
Threshold Functions can make Logic Gates with Neurons!
• Logical Or: Wa,c = 1, Wb,c = 1, W0,c (bias) = 0.5
• If ( Σ(w) - W0,c > 0 ) Then FIRE, Else Don't
• Truth table (A ∪ B):
        B=1  B=0
  A=1    1    1
  A=0    1    0
Or Gate: Trace
• A Off, B Off: Σ(w) - W0,c = 0 - 0.5 = -0.5 < 0 → Off
• A On,  B Off: 1 - 0.5 = 0.5 > 0 → On
• A Off, B On:  1 - 0.5 = 0.5 > 0 → On
• A On,  B On:  2 - 0.5 = 1.5 > 0 → On
Threshold Functions can make Logic Gates with Neurons!
• Logical Not: Wa,c = -1, W0,c (bias) = -0.5
• If ( Σ(w) - W0,c > 0 ) Then FIRE, Else Don't
• Truth table: 1 → 0, 0 → 1
Not Gate: Trace
• Input Off: 0 - (-0.5) = 0.5 > 0 → On
• Input On: -1 - (-0.5) = -0.5 < 0 → Off
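A small Python sketch (mine, not from the slides) of the threshold unit and the three gates traced above, using exactly the weights and biases from the slides: Wa,c = Wb,c = 1 with bias 1.5 for AND, bias 0.5 for OR, and Wa,c = -1 with bias -0.5 for NOT.

def threshold_unit(inputs, weights, bias):
    # FIRE (1) if the weighted sum minus the bias is greater than zero
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total - bias > 0 else 0

def and_gate(a, b):
    return threshold_unit([a, b], [1, 1], 1.5)

def or_gate(a, b):
    return threshold_unit([a, b], [1, 1], 0.5)

def not_gate(a):
    return threshold_unit([a], [-1], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
    print(a, "NOT:", not_gate(a))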
Feed-Forward Vs. Recurrent Networks
• Feed-Forward
– No cyclic connections
– A function of its current inputs
– No internal state other than the weights of its connections
– "Out of time"
• Recurrent
– Cyclic connections
– Dynamic behavior: stable, oscillatory, or chaotic
– Response depends on the current state
– "In time": short-term memory!
Feed-Forward Networks
• "Knowledge" is represented by the weights on edges
– Modeless!
• "Learning" consists of adjusting weights
• Customary Arrangements
– One Boolean output for each value
– Arranged in Layers
• Layer 1 = inputs
• Layers 2 to (n-1) = hidden
• Layer n = outputs
– "Perceptron": a 2-layer feed-forward network
Layers
[Diagram: input layer → hidden layer → output layer]
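A minimal sketch (my own; the layer sizes and weights are made up) of a forward pass through such a layered network: the knowledge lives entirely in the weight and bias values, and the output is a function of the current inputs only.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each unit: activation( sum of weighted inputs - its bias )
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) - b)
            for ws, b in zip(weights, biases)]

def feed_forward(inputs, network):
    # network = list of (weights, biases) pairs: input -> hidden layer(s) -> output
    for weights, biases in network:
        inputs = layer(inputs, weights, biases)
    return inputs

# Toy 2-input -> 2-hidden -> 1-output network with made-up weights
network = [([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),   # hidden layer
           ([[1.2, -0.7]], [0.05])]                    # output layer
print(feed_forward([1.0, 0.0], network))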
Perceptron Learning
• Gradient Descent is used to reduce the error

CENSORED

• Essentially:
– New Weight = Old Weight + adjustment
– Adjustment = α × error × input × d(activation function)
• α = Learning Rate
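A minimal sketch of that update rule (my own code, not the slides'). Since the step/threshold unit has no usable derivative, this sketch uses a sigmoid unit, folds the bias in as a weight on a constant input of 1, and trains on the AND truth table as a stand-in task.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data: the AND truth table; the leading 1 is a constant input for the bias weight
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
alpha = 0.5    # learning rate

for _ in range(5000):
    for inputs, target in data:
        output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
        error = target - output
        d_activation = output * (1 - output)    # derivative of the sigmoid at this output
        # New Weight = Old Weight + alpha * error * input * d(activation function)
        weights = [w + alpha * error * x * d_activation
                   for w, x in zip(weights, inputs)]

for inputs, target in data:
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    print(inputs[1:], target, round(output, 2))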
Hidden Network Learning
• Back-Propagation

CENSORED

• Essentially:
– Start with Gradient Descent from the output
– Assign "blame" to the inputting neurons in proportion to their weights
– Adjust the weights at the previous level using Gradient Descent based on that "blame"
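A compact numpy sketch (mine, not from the slides) of that blame-passing for one hidden layer, trained on XOR as a stand-in problem: the output error is pushed back through the outgoing weights to assign each hidden unit its share of blame, and both layers are then adjusted by gradient descent.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)      # input -> hidden weights
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)      # hidden -> output weights
alpha = 0.5                                          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)
    # Gradient descent at the output layer
    out_delta = (out - y) * out * (1 - out)
    # "Blame" assigned to hidden units in proportion to their outgoing weights
    hidden_delta = (out_delta @ W2.T) * hidden * (1 - hidden)
    # Adjust the weights at each level
    W2 -= alpha * hidden.T @ out_delta; b2 -= alpha * out_delta.sum(axis=0)
    W1 -= alpha * X.T @ hidden_delta;   b1 -= alpha * hidden_delta.sum(axis=0)

print(np.round(out, 2))    # should approach [[0], [1], [1], [0]]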
They don’t get it either:
Issues that aren’t well understood
• α (Learning Rate)
• Depth of network (number of layers)
• Size of hidden layers
– Overfitting
– Cross-validation
• Minimum connectivity
– Optimal Brain Damage Algorithm
• No extractable model!
How Are Neural Nets Different
From My Brain?
1. Neural nets are feed forward
– Brains can be recurrent with feedback loops
2. Neural nets do not distinguish between + or –
connections
– In brains excitatory and inhibitory neurons have different
properties
– "Fraser's Rules": inhibitory neurons are short-distance
3. Neural nets exist “Out of time”
– Our brains clearly do exist “in time”
4. Neural nets learn VERY differently
– We have very little idea how our brains are learning

"In theory one can, of course, implement biologically realistic neural networks, but this is a mammoth task. All kinds of details have to be gotten right, or you end up with a network that completely decays to unconnectedness, or one that ramps up its connections until it basically has a seizure."
Frontiers in AI
• Applications of current algorithms
• New algorithms for determining
parameters from training data
– Forward-Backward
– Backpropagation
• Better classification of the mysteries of
neural networks
• Pathology modeling in neural networks
• Evolutionary modeling
