Machine Learning and Data Mining
8. Genetic Algorithms
Luc De Raedt
Virtually all slides taken from Tom Mitchell,
some material also from Melanie Mitchell's book
Contents
Evolutionary computation
Prototypical GA
An example: GABIL
Schema theorem
Genetic Programming
Evolutionary Computation
Computational procedures patterned
after biological evolution
Search procedure that probabilistically
applies search operators to set of points
in the search space
Biological Evolution
Lamarck and others:
Species "transmute" over time
Darwin and Wallace:
Consistent, heritable variation among individuals
in population
Natural selection of the fittest
Mendel and genetics:
A mechanism for inheriting traits
genotype to phenotype mapping
Appeal of Evolution
Search through vast search spaces
Parallelism
Simple rules of random variation
(mutation, recombination and others)
and natural selection responsible for
extraordinary variety and complexity
Fitness Landscapes:
Smooth Hills
Fitness Landscapes:
Unimodal Hill, Fine Local Texture
Fitness Landscapes:
Coarse Local Structure
GA Operators
Selection:
chooses chromosomes in population for reproduction
Crossover:
randomly chooses locus; exchanges subsequences
before and after locus
Mutation:
randomly flips some of the bits in a chromosome
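A minimal sketch of the crossover and mutation operators on bitstring chromosomes. The function names, the string representation and the default mutation rate are illustrative assumptions, not part of the slides.

```python
import random

def single_point_crossover(a, b, rng=random):
    """Choose one random locus and exchange the subsequences after it."""
    assert len(a) == len(b)
    locus = rng.randint(1, len(a) - 1)  # cut point strictly inside the string
    return a[:locus] + b[locus:], b[:locus] + a[locus:]

def mutate(chromosome, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate` (assumed default)."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[bit] if rng.random() < rate else bit
                   for bit in chromosome)

# Example: crossing an all-0 parent with an all-1 parent
child1, child2 = single_point_crossover("00000000", "11111111")
```

Each child starts with the head of one parent and ends with the tail of the other, so the two children are always bitwise complements in this example.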
A Simple Genetic Algorithm
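A hedged sketch of a generational GA, loosely following the prototypical GA in Tom Mitchell's textbook (population size p, crossover fraction r, mutation rate m). The parameter defaults, the list-of-bits representation, the generation limit and the small constant added to the selection weights are assumptions made for illustration.

```python
import random

def simple_ga(fitness, length, threshold, p=50, r=0.6, m=0.05,
              max_generations=500, rng=random):
    """p: population size, r: fraction replaced by crossover,
    m: fraction of the new population that gets one random bit flipped.
    Assumes fitness values are non-negative."""
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(p)]
    for _ in range(max_generations):
        scores = [fitness(h) for h in pop]
        if max(scores) >= threshold:
            break
        weights = [s + 1e-9 for s in scores]  # avoid an all-zero wheel
        n_pairs = int(r * p / 2)
        # Select: the remaining members survive, chosen fitness-proportionately.
        new_pop = [h[:] for h in
                   rng.choices(pop, weights=weights, k=p - 2 * n_pairs)]
        # Crossover: each selected pair produces two offspring.
        for _ in range(n_pairs):
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randint(1, length - 1)
            new_pop += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Mutate: flip one random bit in a fraction m of the new population.
        for h in rng.sample(new_pop, int(m * p)):
            i = rng.randrange(length)
            h[i] = 1 - h[i]
        pop = new_pop
    return max(pop, key=fitness)

# Hypothetical usage: maximize the number of 1 bits ("OneMax")
best = simple_ga(sum, length=12, threshold=12)
```

The loop stops early once some hypothesis reaches the threshold; otherwise it returns the best member of the last generation.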
Representing hypotheses
Operators for Genetic Algorithms
Selecting Most Fit Hypotheses
Fitness proportionate selection:
... can lead to crowding
Tournament selection:
Pick h1 and h2 at random with uniform prob.
With probability p select the more fit.
Rank selection:
Sort all hypotheses by fitness
Prob of selection is proportional to rank
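The three selection schemes can be sketched as follows. Function names, the default p = 0.8 and the convention that the worst hypothesis gets rank 1 are assumptions; fitness-proportionate selection here also assumes non-negative fitness values that are not all zero.

```python
import random

def fitness_proportionate(population, fitness, rng=random):
    """Roulette wheel: selection probability proportional to fitness."""
    weights = [fitness(h) for h in population]
    return rng.choices(population, weights=weights, k=1)[0]

def tournament(population, fitness, p=0.8, rng=random):
    """Pick h1 and h2 uniformly; return the more fit one with probability p."""
    h1, h2 = rng.choice(population), rng.choice(population)
    better, worse = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return better if rng.random() < p else worse

def rank_selection(population, fitness, rng=random):
    """Sort by fitness; selection probability proportional to rank."""
    ranked = sorted(population, key=fitness)       # worst first, rank 1
    ranks = list(range(1, len(ranked) + 1))
    return rng.choices(ranked, weights=ranks, k=1)[0]
```

Rank selection depends only on the ordering of the hypotheses, so a single very fit hypothesis cannot dominate the wheel the way it can under fitness-proportionate selection.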
GABIL (De Jong et al. 93)
Learn disjunctive set of propositional
rules, competitive with C4.5
Genetic operators
want variable length rule sets
want only well-formed bitstring
hypotheses
Crossover with Variable-Length
Bitstrings
GABIL Extensions
Add new genetic operators, also applied probabilistically:
AddAlternative: generalize constraint on ai by changing a 0 to 1
DropCondition: generalize constraint on ai by changing every 0
to 1
And, add new field to bitstring to determine whether to
allow these
So now the learning strategy also evolves!
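A small sketch of the two operators, assuming the usual GABIL-style encoding in which each attribute ai owns a contiguous slice of the rule bitstring with one bit per attribute value (1 = value allowed). The function names and the (start, stop) slice argument are hypothetical.

```python
import random

def add_alternative(rule, attr_slice, rng=random):
    """Generalize the constraint on one attribute: change a single 0 to 1."""
    start, stop = attr_slice
    zeros = [i for i in range(start, stop) if rule[i] == "0"]
    if not zeros:                     # constraint is already fully general
        return rule
    i = rng.choice(zeros)
    return rule[:i] + "1" + rule[i + 1:]

def drop_condition(rule, attr_slice):
    """Generalize the constraint on one attribute: change every 0 to 1."""
    start, stop = attr_slice
    return rule[:start] + "1" * (stop - start) + rule[stop:]

# Hypothetical rule: a 2-valued attribute, a 3-valued attribute, 1 class bit
rule = "10" + "011" + "1"
print(drop_condition(rule, (0, 2)))   # first attribute no longer constrains
```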
GABIL Results
Performance of GABIL comparable to
symbolic rule/tree learning methods C4.5,
ID5R, AQ14
Average performance on a set of 12 synthetic
problems:
GABIL without AA and DC operators: 92.1%
accuracy
GABIL with AA and DC operators: 95.2% accuracy
symbolic learning methods ranged from 91.2 to
96.6
Another Example
The Traveling Salesman problem
Given
N cities and their distances
Find
the shortest tour that visits every city exactly once
How to represent hypotheses ?
a sequence of cities ?
ABCD ?
Genetic operators ?
Cross-over
135 | 26478
876 | 54321
yields 135 | 54321
876 | 26478
Better ?
135 | 26478
876 | 54321
yields 135 | 42876
876 | 24135
Mutation
13526478
gives 13726458
There is nothing sacred about Gen.
Operators
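The plain one-point crossover above repeats cities (135 | 54321 visits 1, 3 and 5 twice). The "better" variant and the mutation can be read as the following operators; this is a sketch that reproduces the slide's examples, with function names and the explicit cut and position arguments chosen for illustration.

```python
def order_crossover(p1, p2, cut):
    """Keep p1's head; append the missing cities in the order they occur
    in p2, reading p2 from the cut point and wrapping around."""
    head = p1[:cut]
    rotated = p2[cut:] + p2[:cut]
    tail = "".join(city for city in rotated if city not in head)
    return head + tail

def swap_mutation(tour, i, j):
    """Exchange the cities at positions i and j (0-based)."""
    cities = list(tour)
    cities[i], cities[j] = cities[j], cities[i]
    return "".join(cities)

print(order_crossover("13526478", "87654321", 3))  # 13542876
print(order_crossover("87654321", "13526478", 3))  # 87624135
print(swap_mutation("13526478", 2, 6))             # 13726458
```

Both operators map permutations to permutations, so every offspring is again a valid tour.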
n-Queens
Place 4 queens on a 4 x 4 chessboard so that none can
take another.
Four variables Q1, Q2, Q3, Q4 representing the row of the queen in each column.
Domain of each variable is {1,2,3,4}
One solution: [figure: 4 x 4 board, columns labelled Q1 to Q4]
The 8 Queens Problem ?
Similar Encoding and Operators
Order matters
Fitness ?
Of a single queen: -(number of queens it attacks)
Of a configuration: sum of the fitness of its queens
Selection
Roulette-Wheel
Schema Theorem
How to characterize evolution of population in
GA?
Schema = string containing 0, 1, * ("don't care")
Typical schema: 10**0*
Instances of above schema: 101101, 100000, ...
Characterize population by number of
instances representing each possible schema
m(s,t) = number of instances of schema s in pop
at time t
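A minimal sketch of schema matching and of m(s,t) for a given population. The function names and the example population are illustrative assumptions.

```python
def is_instance(schema, string):
    """True if `string` agrees with `schema` at every non-* position."""
    return len(schema) == len(string) and all(
        s == "*" or s == bit for s, bit in zip(schema, string))

def m(schema, population):
    """Number of instances of `schema` in the population at one time step."""
    return sum(is_instance(schema, h) for h in population)

population = ["101101", "100000", "110000", "101111"]  # hypothetical
print(m("10**0*", population))  # 2
```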
Consider only Selection
Schema Theorem
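For reference, the form in which the theorem is usually stated (for example in Tom Mitchell's textbook) is reproduced below; it is not recovered from the slide itself. The symbols beyond m(s,t) are assumptions of this note: \hat{u}(s,t) is the average fitness of the instances of s at time t, \bar{f}(t) the average fitness of the population, p_c and p_m the crossover and mutation probabilities, l the string length, d(s) the defining length of s (distance between its outermost defined bits) and o(s) its order (number of defined bits).

```latex
% Selection only (fitness-proportionate):
E[m(s,t+1)] = \frac{\hat{u}(s,t)}{\bar{f}(t)}\, m(s,t)

% With single-point crossover and mutation:
E[m(s,t+1)] \ge \frac{\hat{u}(s,t)}{\bar{f}(t)}\, m(s,t)
  \left(1 - p_c \frac{d(s)}{l-1}\right) (1 - p_m)^{o(s)}
```

Short, low-order schemas whose instances are fitter than average are therefore expected to gain instances from one generation to the next.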
Genetic Programming (Koza, 1992):
Evolving Lisp Programs
GAs to produce computer programs
Example:
program computing orbital period of planet
(Kepler's third law: P^2 = c A^3)
The square of the sidereal period of an orbiting planet
is directly proportional to the cube of the orbit's semimajor axis.
Parse Tree for Expression
Koza's Algorithm
1. Choose set of functions and terminals
2. Generate initial population of random
trees
3. Calculate fitness by running programs
on fitness cases
4. Apply selection, cross-over, mutation
to form a new population
5. Go to step 3
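Steps 1 to 3 can be sketched for the orbital-period example as follows. Programs are nested tuples rather than Lisp expressions; the function set, the protected square root, the synthetic fitness cases (generated from P^2 = c A^3 with c = 1) and all names are assumptions made for illustration.

```python
import math
import random

# Step 1 (assumed choice): functions with their arity, and terminals.
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "*": (2, lambda x, y: x * y),
    "sqrt": (1, lambda x: math.sqrt(abs(x))),  # protected against negatives
}
TERMINALS = ["A"]

def random_tree(depth, rng=random):
    """Step 2: grow a random program tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return (name, *(random_tree(depth - 1, rng) for _ in range(arity)))

def evaluate(tree, env):
    """A program is a terminal name or a tuple (function, arg1, ...)."""
    if isinstance(tree, str):
        return env[tree]
    name, *args = tree
    _, fn = FUNCTIONS[name]
    return fn(*(evaluate(arg, env) for arg in args))

def fitness(tree, cases):
    """Step 3: run the program on each fitness case; 0 is a perfect score."""
    return -sum(abs(evaluate(tree, {"A": a}) - p) for a, p in cases)

CASES = [(a, math.sqrt(a ** 3)) for a in (0.5, 1.0, 2.0, 5.0)]  # synthetic
kepler = ("sqrt", ("*", "A", ("*", "A", "A")))
print(fitness(kepler, CASES), fitness("A", CASES))
```

Selection, crossover (swapping random subtrees) and mutation in step 4 would then operate on these tuples; they are omitted here to keep the sketch short.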
Block-Stacking Problem
Terminals:
CS (current stack)
TB (top correct block)
NN (next needed)
Functions:
MS(x) (move to stack)
MT(x) (move to table)
DU(expression1, expression2) (do until)
NOT(expression)
EQ(expression1, expression2)
Block-Stacking Problem
Generation 1:
(EQ (MS NN) (EQ (MS NN) (MS NN)))
Generation 5:
(DU (MS NN) (NOT NN))
Generation 10:
(EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
Genetic Programming:
Discussion
Block-stacking: simple domain, high-level terminals
and functions
Additional feature: encapsulation (chunking) of useful
subtrees
Density of solutions or useful building blocks in Lisp
expressions?
Comparison with other methods?
Scalability?
Generalization performance?
Classifier Systems
Mixing ideas from
Genetic Algorithms
Rule based Systems
An early form of reinforcement learning
Now connected to Artificial Life
Source:
The Hitch-Hiker's Guide to Evolutionary Computation, Q.1.4
Better: John Holland, Escaping Brittleness, in Machine Learning:
An AI Approach, 1986
Agent Architecture
Control
Rules = Classifier
IF condition THEN action
Conditions and Actions are strings over {0,1,#}
Message passing:
inputs to system are messages (bitstrings)
actions are messages (bitstrings)
conditions match current message
Production system
quite powerful, cf. Post Correspondence Problem from
Theoretical Computer Science
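A minimal sketch of how conditions over {0,1,#} match bitstring messages. The function names and the two example classifiers are hypothetical.

```python
def matches(condition, message):
    """A condition matches a message if every non-# position agrees;
    '#' is the don't-care symbol."""
    return len(condition) == len(message) and all(
        c == "#" or c == bit for c, bit in zip(condition, message))

def fire(classifiers, messages):
    """Actions of all classifiers whose condition matches some message."""
    return [action for condition, action in classifiers
            if any(matches(condition, msg) for msg in messages)]

classifiers = [("01##", "1111"), ("00#0", "0000")]  # (condition, action)
print(fire(classifiers, ["0110"]))  # ['1111']
```

The returned actions are themselves bitstrings, so they can be placed on the message list and matched by other classifiers in the next cycle.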
Kermit
IF small, flying object to the left THEN send @
IF small, flying object to the right THEN send %
IF small, flying object centered THEN send $
IF large, looming object THEN send !
IF no large, looming object THEN send *
IF * and @ THEN move head 15 degrees left
IF * and % THEN move head 15 degrees right
IF * and $ THEN move in direction head pointing
IF ! THEN move rapidly away from direction head pointing
Kermit in CS
IF                 THEN
0000,  00 00       00 00
0000,  00 01       00 01
0000,  00 10       00 10
1111,  01 ##       11 11
~1111, 01 ##       10 00
1000,  00 00       01 00
1000,  00 01       01 01
1000,  00 10       01 10
1111,  ## ##       01 11
conditions
0000 small object
1111 large
00 flying object
01 looming object
00 left
01 right
10 center
~ not
# don't care
Kermit the Classifier System
IF                 THEN
0000,  00 00       00 00
0000,  00 01       00 01
0000,  00 10       00 10
1111,  01 ##       11 11
~1111, 01 ##       10 00
1000,  00 00       01 00
1000,  00 01       01 01
1000,  00 10       01 10
1111,  ## ##       01 11
actions
0000 @
0001 %
0010 $
1111 ! (danger)
1000 * (safe)
0100 (move left)
0101 (move right)
0110 (move ahead)
0111 (move away)
Classifier System Algorithm
(no learning)
t := 0;
initMessageList ML (t);
initClassifierPopulation P (t);      (random)
while not done (test for fitness) do
    t := t + 1;
    ML := readDetectors (t);
    ML' := matchClassifiers ML,P (t);
    ML := sendEffectors ML' (t);
Learning CFS
t := 0;
initMessageList ML (t);
initClassifierPopulation P (t);
while not done do
    t := t + 1;
    ML := readDetectors (t);
    ML' := matchClassifiers ML,P (t);
    ML' := selectMatchingClassifiers ML',P (t);
    ML' := taxPostingClassifiers ML',P (t);
    ML := sendEffectors ML' (t);
    C := receivePayoff (t);
    P' := distributeCredit C,P (t);
    At some points in time:
        P := generateNewRules P' (t);
Idea of Reinforcement Learning
Bucket-Brigade
Classifiers have strength (fitness)
Complex Bidding System
before classifier is allowed to post message
bidding
highest bidder wins but has to pay with fitness
reinforcement received from environment is
divided across successful bidders
fitness goes down with time
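A heavily simplified sketch of one bidding round, under stated assumptions: each matching classifier bids a fixed fraction of its strength, the k highest bidders win, pay their bid, and share the environment's reward equally, and all strengths then decay. The names and the parameters k, bid_ratio and decay are hypothetical. In Holland's full bucket brigade the bid is passed on to the classifiers that supplied the matched messages; that step is omitted here.

```python
def bidding_round(strengths, matching, reward, k=1,
                  bid_ratio=0.1, decay=0.01):
    """strengths: classifier name -> strength (fitness).
    matching: names of classifiers whose conditions matched."""
    bids = {name: bid_ratio * strengths[name] for name in matching}
    winners = sorted(bids, key=bids.get, reverse=True)[:k]
    new = dict(strengths)
    for name in winners:
        # the winner pays its bid and receives its share of the payoff
        new[name] += reward / len(winners) - bids[name]
    for name in new:
        new[name] *= 1.0 - decay      # strength goes down with time
    return winners, new

winners, strengths = bidding_round(
    {"a": 10.0, "b": 20.0, "c": 5.0}, ["a", "b"], reward=4.0)
```

A classifier that keeps winning without ever earning reward loses strength steadily, which makes it a likely candidate for replacement by the genetic algorithm.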
Genetic Algorithm
Crossover among rules
the higher strength/fitness the more likely it is of
being selected
the lower the strength/fitness the more likely the
classifier is replaced
Many variants and ideas exist;
we cover only the very basics
Evolutionary Programming
Conduct randomized, parallel, hillclimbing search through H
Approach learning as optimization
problem (optimize fitness)
Nice feature: evaluation of Fitness can
be very indirect