Introduction to Optimization Concepts
Session -
AIM OF THE SESSION
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
• During the early period of World War II, the British military faced the problem of allocating very scarce and limited resources (such as fighter airplanes, radars, and submarines) to several activities (deployment to numerous targets and destinations).
• Because there were no systematic methods available to
solve resource allocation problems, the military called upon
a team of mathematicians to develop methods for solving
the problem in a scientific manner.
• The methods developed by the team were instrumental in Britain winning the Battle of Britain. These methods, such as linear programming, which were developed as a result of research on (military) operations, subsequently became known as the methods of operations research.
METHODS OF OPERATIONS RESEARCH
APPLICATIONS
• Knapsack problem.
• Travelling salesman problem.
• Job assignment problem.
• Weapon target assignment problem.
• Vehicle routing problem
OPTIMIZATION PROBLEM
DESIGN VECTOR
• Constraint Surface
  – The constraint surface divides the design space into two regions:
      g_j(X) > 0  → infeasible or unacceptable
      g_j(X) ≤ 0  → feasible or acceptable
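As a small illustration of the two regions (a sketch of ours, not from the slides, assuming the convention that g_j(X) ≤ 0 marks the feasible side), a design point can be classified by evaluating every constraint function; g1 and g2 below are hypothetical constraints:

```python
# Classify a design point against inequality constraints:
# g_j(X) <= 0  -> feasible side of the constraint surface
# g_j(X) >  0  -> infeasible side

def is_feasible(x, constraints):
    """True if the point x satisfies g_j(x) <= 0 for every constraint."""
    return all(g(x) <= 0 for g in constraints)

# Hypothetical two-dimensional design space with two constraints.
g1 = lambda x: x[0] + x[1] - 10   # feasible when x1 + x2 <= 10
g2 = lambda x: -x[0]              # feasible when x1 >= 0

print(is_feasible((3, 4), [g1, g2]))   # True: inside both constraint surfaces
print(is_feasible((8, 7), [g1, g2]))   # False: violates g1
```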
CONSTRAINT SURFACES IN A HYPOTHETICAL TWO-
DIMENSIONAL DESIGN SPACE.
DESIGN CONSTRAINTS
Team – BDO
Department of CSE H
COURSE NAME: BIG DATA OPTIMIZATION
COURSE CODE: 21CS3276R
TOPIC: EVOLUTIONARY & GENETIC ALGORITHMS
Session -
AIM OF THE SESSION
To familiarize students with the basic concept of EVOLUTIONARY & GENETIC ALGORITHMS
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
3. Simulation problem
WHAT IS AN EVOLUTIONARY
ALGORITHM?
• There are many different variants of evolutionary algorithms.
The common underlying idea behind all these techniques is the
same:
1. Given a population of individuals
2. The environmental pressure causes natural selection
(survival of the fittest), which causes a rise in the fitness of
the population.
PROPERTIES OF EVOLUTIONARY
ALGORITHM
• EAs are population based, i.e., they process a
whole collection of candidate solutions simultaneously.
3. Population
• Mutation
• Recombination
MUTATION
Item A B C D
Weight(wi) 40 10 20 10
Ratio (pi/wi) 7 10 6 5
0-1 KNAPSACK
After sorting, the itemsare as shown
in the following table.
Item B A C D
Weight(wi) 10 40 20 10
Ratio (pi/wi) 10 7 6 5
x     f(x) = x²
28    784
25    625
27    729
20    400
Sum:      2538
Average:  634.5
Maximum:  784
REPRESENTATION OF INDIVIDUALS
1. Binary Representations
2. Integer Representations
3. Floating-Point Representations
4. Permutation Representations
Mutation
1. Mutation for Binary Representations
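Bit-flip mutation, the standard operator for binary strings, flips each gene independently with a small probability pm. The sketch below is ours, not from the slides (the name `bitflip_mutation` is illustrative):

```python
import random

def bitflip_mutation(genome, pm):
    """Flip each bit independently with probability pm."""
    return [1 - b if random.random() < pm else b for b in genome]

random.seed(1)
parent = [0, 1, 1, 0, 1]
child = bitflip_mutation(parent, pm=0.2)
```

On average pm · L bits change per application, so pm ≈ 1/L (one expected flip per genome) is a common choice.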
MUTATION OPERATORS FOR INTEGER
REPRESENTATIONS
2. Mutation Operators for Integer Representations
Random Resetting
•With probability Pm a new value is chosen at random from the set of permissible values in each position.
Creep Mutation
•Designed for ordinal attributes and works by adding a small (positive or negative) value to each gene with probability
p.
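Both integer operators can be sketched in a few lines (an illustration of ours; the gene range 0–9 and step size 1 are assumptions made here):

```python
import random

def random_resetting(genome, allowed, pm):
    """With probability pm, replace each gene with a random permissible value."""
    return [random.choice(allowed) if random.random() < pm else g for g in genome]

def creep_mutation(genome, pm, step=1, lo=0, hi=9):
    """Add a small +/- step to each gene with probability pm (ordinal genes),
    clamping the result to the permissible range [lo, hi]."""
    out = []
    for g in genome:
        if random.random() < pm:
            g = min(hi, max(lo, g + random.choice([-step, step])))
        out.append(g)
    return out

random.seed(0)
genes = [3, 5, 7, 2]
reset = random_resetting(genes, allowed=list(range(10)), pm=0.5)
crept = creep_mutation(genes, pm=0.5)
```

Note the difference: random resetting can jump to any permissible value, while creep mutation only nudges each gene by at most one step, which suits ordinal attributes.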
MUTATION OPERATORS FOR FLOATING-POINT
REPRESENTATIONS
3. Mutation Operators for Floating-Point Representations
Uniform Mutation
• The values of x′i are drawn uniformly at random from [Li, Ui].
• Analogous to bit-flipping for binary encodings and the random resetting sketched for integer encodings.
• Applied with a position-wise mutation probability.
MUTATION OPERATORS FOR PERMUTATION REPRESENTATIONS
4. Mutation Operators for Permutation Representations
• Insert Mutation
• Scramble Mutation
• Inversion Mutation
Inversion mutation example: 1 2 3 4 5 6 7 8 9 → 1 5 4 3 2 6 7 8 9 (the segment from position 2 to position 5 is reversed)
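The three permutation operators can be sketched as follows (our illustration, not the slides' code; the segment positions i and j are passed explicitly rather than drawn at random, so the inversion call reproduces the example above):

```python
import random

def inversion_mutation(perm, i, j):
    """Reverse the segment between positions i and j (inclusive, 0-indexed)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]

def scramble_mutation(perm, i, j):
    """Randomly shuffle the segment between positions i and j (inclusive)."""
    segment = perm[i:j + 1]
    random.shuffle(segment)
    return perm[:i] + segment + perm[j + 1:]

def insert_mutation(perm, i, j):
    """Move the element at position j so that it follows the element at position i."""
    out = perm[:]
    value = out.pop(j)
    out.insert(i + 1, value)
    return out

p = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(inversion_mutation(p, 1, 4))  # → [1, 5, 4, 3, 2, 6, 7, 8, 9], as in the example
```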
RECOMBINATION
• Recombination, the process whereby a new individual solution
is created from the information contained within two (or
more) parent solutions.
• One-Point Crossover
RECOMBINATION OPERATORS FOR BINARY
REPRESENTATIONS
• N-Point Crossover
RECOMBINATION OPERATORS FOR BINARY
REPRESENTATIONS
• Uniform Crossover
• In each position, if a uniformly drawn random value is below a parameter p (usually 0.5), the gene is inherited from the first parent; otherwise from the second. The second offspring is created using the inverse mapping.
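One-point and uniform crossover for binary strings can be sketched like this (our illustration; the crossover point is passed explicitly so the first call is deterministic):

```python
import random

def one_point_crossover(p1, p2, point):
    """Swap the tails of the two parents after the crossover point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2, p=0.5):
    """Per position: child 1 inherits from parent 1 if a uniform draw is
    below p, else from parent 2; child 2 uses the inverse mapping."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if random.random() < p:
            c1.append(a)
            c2.append(b)
        else:
            c1.append(b)
            c2.append(a)
    return c1, c2

a = [0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 1, 1]
print(one_point_crossover(a, b, 2))  # → ([0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 0, 0])
```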
RECOMBINATION OPERATORS FOR INTEGER
REPRESENTATIONS
RECOMBINATION OPERATORS FOR PERMUTATION
REPRESENTATIONS
• Order Crossover [designed by Davis for order-based permutation problems]
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Order Crossover
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Cycle Crossover
• The operator works by dividing the elements into cycles.
• A cycle is a subset of elements that has the property that each
element always occurs paired with another element of the
same cycle when the two parents are aligned.
• Having divided the permutation into cycles,
• The offspring are created by selecting alternate cycles from
each parent.
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Cycle Crossover
• The procedure for constructing cycles is as follows:
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Cycle Crossover
Parent 1:  1 2 3 4 5 6 7 8 9   →   Offspring 1:  1 3 7 4 2 6 5 8 9
Parent 2:  9 3 7 8 2 6 5 1 4   →   Offspring 2:  9 2 3 8 5 6 7 1 4
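A short cycle-crossover sketch (the implementation below is ours, not from the slides) reproduces the two offspring of this example:

```python
def cycle_crossover(p1, p2):
    """Partition positions into cycles, then build the offspring by taking
    alternate cycles from each parent."""
    n = len(p1)
    cycle_of = [None] * n
    cycle = 0
    for start in range(n):
        if cycle_of[start] is None:
            pos = start
            while cycle_of[pos] is None:
                cycle_of[pos] = cycle
                pos = p1.index(p2[pos])  # follow p2's element back into p1
            cycle += 1
    # Offspring 1 takes even-numbered cycles from p1, odd ones from p2;
    # offspring 2 uses the inverse assignment.
    c1 = [p1[i] if cycle_of[i] % 2 == 0 else p2[i] for i in range(n)]
    c2 = [p2[i] if cycle_of[i] % 2 == 0 else p1[i] for i in range(n)]
    return c1, c2

p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(cycle_crossover(p1, p2))
# → ([1, 3, 7, 4, 2, 6, 5, 8, 9], [9, 2, 3, 8, 5, 6, 7, 1, 4]), matching the example
```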
POPULATION MODEL
• Generational model
• In each generation we begin with a population of size μ, from which a mating pool of parents is selected.
• Next, offspring are created from the mating pool by
the application of variation operators, and evaluated.
• After each generation, the whole population is replaced by its
offspring, which is called the "next generation" .
POPULATION MODEL
• Steady-state model
• In the steady-state model, the entire population is not changed at once; only a part of it is replaced.
• Survivors are selected from the combined pool of (λ + μ) individuals.
• Generation gap: with μ parents and λ offspring, the generation gap = λ/μ.
PARENT SELECTION
• Fitness Proportional Selection
• When fitness values are all very close together, there is almost no selection pressure.
• Conversely, a few highly fit individuals can dominate selection early on, which leads to premature convergence.
PARENT SELECTION
• Ranking Selection
– It preserves a constant selection pressure by sorting the population on
the basis of fitness, and then allocating selection probabilities to
individuals according to their rank, rather than according to their actual
fitness values.
PARENT SELECTION
• Tournament Selection
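Tournament selection picks k individuals uniformly at random and returns the fittest of them; selection pressure grows with the tournament size k. A minimal sketch of ours (maximization assumed):

```python
import random

def tournament_select(population, fitness, k=2):
    """Pick k distinct individuals at random and return the fittest one."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)

random.seed(3)
pop = [4, 9, 1, 7, 2]
winner = tournament_select(pop, fitness=lambda x: x, k=3)
```

With k = len(population) the winner is always the global best; with k = 1 selection is uniform random, so k tunes the selection pressure.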
SURVIVOR SELECTION
• Age-Based Replacement
• The oldest individuals are replaced, regardless of their fitness.
• Fitness-Based Replacement
• fitness proportionate and tournament selection
• Replace Worst (GENITOR)
• Elitism
Credit: Jason Lohn
TOPIC: DIFFERENTIAL EVOLUTION ALGORITHM
Session -
AIM OF THE SESSION
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
DIFFERENTIAL EVOLUTION
• Inspired by the real-world problem of tuning digital filter coefficients.
• In signal processing, a digital filter is a system that performs mathematical operations on a sampled, discrete-time signal to reduce or enhance certain aspects of that signal.
• The weighted error between the expected and actual response is E = w(Expected − Actual).
DIFFERENTIAL EVOLUTION
• DE optimizes a problem by maintaining a population of
candidate solutions and creating new candidate solutions by
combining existing ones according to its simple formulae, and
then keeping whichever candidate solution has the best score
or fitness on the optimization problem at hand.
Global Minimum: xi ∈ [-32.768, 32.768]
A BASIC DIFFERENTIAL EVOLUTION ALGORITHM
• This algorithm is often referred to as classic DE.
• It is also called DE/rand/1/bin because the base vector, xr1, is randomly chosen; one vector difference (that is, F(xr2 − xr3)) is added to xr1; and the number of mutant-vector elements contributed to the trial vector closely follows a binomial distribution.
• In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success with probability p or failure with probability q = 1 − p.
• The number of trial-vector elements taken from the mutant would exactly follow a binomial distribution if not for the "j = Jr" test, which forces at least one element to come from the mutant vector.
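A minimal sketch of DE/rand/1/bin trial-vector generation (our illustration, assuming a population of real-valued lists; the function name and the defaults F = 0.8, CR = 0.9 are choices made here):

```python
import random

def de_rand_1_bin(pop, i, F=0.8, CR=0.9):
    """DE/rand/1/bin: mutant v = x_r1 + F * (x_r2 - x_r3) with distinct random
    indices r1, r2, r3 != i, then binomial crossover with the target x_i;
    position jr is always taken from the mutant."""
    n = len(pop[0])
    r1, r2, r3 = random.sample([j for j in range(len(pop)) if j != i], 3)
    v = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(n)]
    jr = random.randrange(n)  # the "j = Jr" test: at least one mutant gene survives
    return [v[d] if (random.random() < CR or d == jr) else pop[i][d]
            for d in range(n)]

random.seed(7)
pop = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(6)]
trial = de_rand_1_bin(pop, i=0)
```

In a full DE loop, the trial vector would replace the target pop[i] only if it scores at least as well on the objective function.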
DIFFERENTIAL EVOLUTION VARIATIONS
• Trial Vectors
  – DE/rand/1/L works by generating a random integer L ∈ [1, n], copying L consecutive features from the mutant vector vi to the trial vector ui, and then copying the remaining features from the target vector xi to ui.
DIFFERENTIAL EVOLUTION VARIATIONS
• For example, suppose that we have a seven-dimensional problem (n = 7). The DE/rand/1/L algorithm works by first generating a random integer L ∈ [1, n]; suppose that L = 3. We then generate a random starting point s ∈ [1, n]; suppose that s = 6.
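The wrap-around positions copied from the mutant in this example can be computed with a small helper (ours, 1-indexed): starting at s = 6 and copying L = 3 consecutive features wraps past n = 7 back to position 1.

```python
def exponential_crossover_indices(s, L, n):
    """1-indexed positions copied from the mutant vector: L consecutive
    features starting at s, wrapping around after position n."""
    return [((s - 1 + k) % n) + 1 for k in range(L)]

print(exponential_crossover_indices(s=6, L=3, n=7))  # → [6, 7, 1]
```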
MUTANT VECTORS
• DE/rand/2/bin or DE/best/2/bin
• DE/rand/2/L or DE/best/2/L
• DE/target/1/bin, DE/target/2/bin,
• DE/target/1/L, or DE/target/2/L
• either-or algorithm
• We could combine various methods by randomly
deciding how to generate the mutant vector.
SCALE FACTOR ADJUSTMENT
• DE's scale factor F determines the effect that difference vectors
have on the mutant vector.
• So far we have assumed that F is a constant.
• We can vary the DE scale factor in two different ways:
• Dither: allow F to remain a scalar and randomly change it each time through the "for each individual" loop.
• Jitter: change F to an n-element vector and randomly change each element of F in the "for each individual" loop, so that each element of the mutant vector v is modified by a uniquely scaled component of the difference vector.
MIXED-INTEGER DIFFERENTIAL EVOLUTION
• One obvious approach to ensure that vi ∈ D is to simply project it onto D.
• For example, if D is the set of n-dimensional integer vectors, then each component of vi can be rounded to the nearest integer.
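For the integer-lattice case, the projection amounts to component-wise rounding; a minimal sketch of ours:

```python
def project_to_integers(v):
    """Project a real-valued mutant vector onto the integer lattice D by
    rounding each component to the nearest integer."""
    return [round(x) for x in v]

print(project_to_integers([1.2, -0.7, 3.5001, 2.0]))  # → [1, -1, 4, 2]
```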
DISCRETE DIFFERENTIAL EVOLUTION
• Another way to modify DE for discrete problems is to change
the mutant vector generation method so that it directly
creates mutant vectors that lie in the discrete domain D.
DIFFERENTIAL EVOLUTION AND GENETIC ALGORITHMS
Team – BDO
DEPARTMENT OF CSE H
Session - 16
CREATED BY K. VICTOR BABU
Swarm intelligence in nature: honeybees, fish schooling, ant colonies.
AIM OF THE SESSION
To familiarize students with Particle Swarm Algorithm and its applications
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
As usual, the big fish is difficult to catch, hidden in the deepest part of the pond. At each time step, each fisherman tells the other how deep the pond is at his place. At the very beginning, as the depths are quite similar, they both follow their own ways. Now, Fisherman 2 seems to be in a better place, so Fisherman 1 tends to move towards him quite rapidly. Now, the decision is a bit more difficult to make. On the one hand, Fisherman 2 is still in a better place, but on the other hand, Fisherman 1's position is worse than before. So Fisherman 1 comes to a compromise: he still moves towards Fisherman 2, but more slowly than before. As we can see, by doing that, he escapes from the local minimum.
Of course, this example is a caricature, but it presents the main features of a particle in basic PSO:
a position,
a velocity (or, more precisely an operator which can be applied to a position in order to modify it),
the ability to exchange information with its neighbours,
the ability to memorize a previous position, and
the ability to use information to make a decision.
Remember: all of that has to remain simple.
THE BASIC IDEA
The particles in the swarm co-operate. They exchange information about what
they’ve discovered in the places they have visited
The co-operation is very simple. In basic PSO it is like this:
• A particle has a neighbourhood associated with it.
• A particle knows the fitnesses of those in its neighbourhood, and
• uses the position of the one with the best fitness.
• This position is simply used to adjust the particle’s velocity
COOPERATION
THE BASIC INITIALIZATION: POSITIONS AND VELOCITIES
[Figure: initial values, boundary values, movements]
[Figure: "geographical" vs. "social" neighbourhoods; the global neighbourhood]
NEIGHBOURHOODS
Now, for each particle, we define what is called a neighbourhood. Although some variants use a “geographical”
neighbourhood, that is to say compute distances and take the nearest particles, the most widely used
neighbourhood is a “social” one: just a list of neighbours, regardless where they are.
So, you do not need to define a distance and that is a great advantage, for in some cases, particularly for discrete
spaces, such a definition would be quite arbitrary.
Note: It can be proved (and it is intuitively quite obvious) that if the process converges any social neighbourhood
tends to be also a geographical one.
Usually, in practice, social neighbourhoods are defined just once, at the very beginning, which is consistent with the
principle “simple rules for simple agents”.
Now, the size of the neighbourhood could be a problem. Fortunately, PSO is not very sensitive to this parameter, and most users just take a value of 3 or 5 with good results.
Unlike for the swarm size, there is no mathematical formula, but like for the swarm size, there are some adaptive
variants.
NEIGHBOURHOODS
• The most commonly used neighbourhood is the circular one. The picture is almost self explanatory. Each
particle is numbered, put on a virtual circle according to its number and the neighbourhood of a given particle
is built by taking its neighbours on this circle.
• An important point for rule simplicity is that each particle belongs to its neighbourhood. For example if a rule
says “I have to check all my neighbours”, there is no need to add “and I have to check myself”. We will see
that more precisely later.
THE CIRCULAR NEIGHBOURHOOD
[Figure: eight particles numbered 1–8 on a virtual circle; particle 1's 3-neighbourhood consists of itself and its two neighbours on the circle]
PSYCHOSOCIAL COMPROMISE
What PSO formalizes is how to combine these tendencies in order to be globally efficient.
PSYCHOSOCIAL COMPROMISE
Particles adjust their positions according to a "psychosocial compromise" between what an individual is comfortable with and what society reckons.
[Figure: a particle at position x moves with velocity v, attracted by its own best performance pi and the best performance pg of its neighbours]
PSO ALGORITHM
• Each individual in the particle swarm is composed of three D-dimensional vectors, where D is the dimensionality of the
search space. These are the current position xi, the previous best position pi, and the velocity vi.
• The current position xi can be considered as a set of coordinates describing a point in space. On each iteration of the
algorithm, the current position is evaluated as a problem solution. If that position is better than any that has been
found so far, then the coordinates are stored in the second vector, pi. The value of the best function result so far is
stored in a variable that can be called pbesti (for “previous best”), for comparison on later iterations. The objective, of
course, is to keep finding better positions and updating pi and pbesti. New points are chosen by adding vi coordinates
to xi, and the algorithm operates by adjusting vi, which can effectively be seen as a step size.
Note: A particle by itself has almost no power to solve any problem; progress occurs only when the particles interact.
PSO ALGORITHM
The topology typically consists of bidirectional edges connecting pairs of particles, so that if j is in i’s
neighborhood, i is also in j’s. Each particle communicates with some other particles and is affected by the best
point found by any member of its topological neighborhood. This is just the vector pi for that best neighbor,
which we will denote with pg. The potential kinds of population “social networks” are hugely varied, but in
practice certain types have been used more frequently.
In the particle swarm optimization process,
• the velocity of each particle is iteratively adjusted so that the particle stochastically oscillates around pi and pg
locations.
Mathematical Model
• Each particle in particle swarm optimization has an associated position, velocity, fitness value.
• Each particle keeps track of the particle_bestFitness_value and particle_bestFitness_position.
• A record of global_bestFitness_position and global_bestFitness_value is maintained.
PSO ALGORITHM DATA STRUCTURES
[Figure: data structures storing the swarm population and the ith particle of the swarm]
PSO ALGORITHM FLOWCHART
https://www.baeldung.com/cs/pso
PSEUDOCODE
Choose the particle with the best fitness value of all as gBest Intertia
For each particle
Calculate particle velocity according equation (a) Personal Influence Social Influence
Update particle position according equation (b)
End
Step1: Randomly initialize Swarm population of N particles Xi d. Update new best of this particle and new best of Swarm
( i=1, 2, …, n) if swaInsensitive to scaling of design variables.rm[i].fitness <
Step 2: Select hyperparameter values swarm[i].bestFitness:
w, c1 and c2 swarm[i].bestFitness = swarm[i].fitness
Step 3: For Iter in range(max_iter): # loop max_iter times swarm[i].bestPos = swarm[i].position
For i in range(N): # for each particle:
a. Compute new velocity of ith particle if swarm[i].fitness < best_fitness_swarm
swarm[i].velocity = best_fitness_swarm = swarm[i].fitness
w*swarm[i].velocity + best_pos_swarm = swarm[i].position
r1*c1*(swarm[i].bestPos - swarm[i].position) + End-for
r2*c2*( best_pos_swarm - swarm[i].position) End -for
b. Compute new position of ith particle using its new Step 4: Return best particle of Swarm
velocity
swarm[i].position += swarm[i].velocity
c. If position is not in range [minx, maxx] then clip it
if swarm[i].position < minx: Particle Swarm Optimization (PSO) - An Overview - Geeksfor
Geeks
swarm[i].position = minx
elif swarm[i].position > maxx:
swarm[i].position = maxx
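The pseudocode can be turned into a short runnable sketch. The Python implementation below is an illustration of ours, not the slides' code; the function name `pso` and the default parameter values (w = 0.7, c1 = c2 = 1.5) are choices made here. It minimizes f over a bounded box:

```python
import random

def pso(f, n_particles=20, dim=1, minx=-10.0, maxx=10.0,
        w=0.7, c1=1.5, c2=1.5, max_iter=100, seed=42):
    """Minimize f over [minx, maxx]^dim with a global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(minx, maxx) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_fit = [f(p) for p in pos]             # personal best fitnesses
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, clipping the position to [minx, maxx].
                pos[i][d] = min(maxx, max(minx, pos[i][d] + vel[i][d]))
            fit = f(pos[i])
            if fit < pbest_fit[i]:              # update personal best
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:             # update swarm best
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# f(x) = x^2 - 5x + 10 has its minimum at x = 2.5 with f(2.5) = 3.75.
best, best_fit = pso(lambda x: x[0] ** 2 - 5 * x[0] + 10)
```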
PARAMETERS OF PSO ALGORITHM
Parameters of problem:
• Number of dimensions (d)
• Lower bound (minx)
• Upper bound (maxx)
Hyperparameters of the algorithm:
• Number of particles (N)
• Maximum number of iterations (max_iter)
• Inertia (w)
• Cognition of particle (C1)
• Social influence of swarm (C2)
ADVANTAGES AND
DISADVANTAGES
Advantages of PSO:
1. Insensitive to scaling of design variables.
2. Derivative free.
3. Very few algorithm parameters.
4. Very efficient global search algorithm.
5. Easily parallelized for concurrent processing.
6. It is easy to implement, so it can be applied both in scientific research and in engineering problems.
7. It has a limited number of parameters and the impact of parameters to the solutions is small compared to other
optimization techniques.
8. The calculation in PSO algorithm is very simple.
9. Some techniques ensure convergence, and the optimum value of the problem can be calculated easily within a short time.
10. PSO is less dependent on the set of initial points than other optimization techniques.
11. It is conceptually very simple.
Disadvantages of PSO:
1. Slow convergence in the refined search stage (weak local search ability).
2. The PSO algorithm suffers from partial optimism, which degrades the regulation of its speed and direction.
APPLICATIONS
Applications of PSO
• Detection and diagnosis of faults and recovery
from them
• Design or optimization of engineers and electrical
motors
• Applications in metallurgy
• Security and military applications
• Vehicle routing problems
• Signature verification
• Fuzzy neural networks
EXAMPLE
Example: Find the minimum of the function f(x) = x² - 5x + 10 with -10 ≤ x ≤ 10 using the PSO algorithm (one step).
Use 9 particles with the initial positions
x1=-9, x2=-6, x3=-4, x4=-1, x5=0.6, x6=3, x7=3.8, x8=7, x9=10
Step 1: Choose the number of particles. The initial population (i.e., iteration number t=0) can be represented as xi0, i=1, 2, …, 9:
x10=-9, x20=-6, x30=-4, x40=-1, x50=0.6, x60=3, x70=3.8, x80=7, x90=10
Evaluate the objective function values:
f10 = (-9)² - 5(-9) + 10 = 136, f20 = (-6)² - 5(-6) + 10 = 76, f30 = (-4)² - 5(-4) + 10 = 46, f40 = (-1)² - 5(-1) + 10 = 16,
f50 = (0.6)² - 5(0.6) + 10 = 7.36, f60 = (3)² - 5(3) + 10 = 4, f70 = (3.8)² - 5(3.8) + 10 = 5.44, f80 = (7)² - 5(7) + 10 = 24,
f90 = (10)² - 5(10) + 10 = 60
Let c1 = c2 = 1. Set the initial velocities of each particle to zero.
Step 2: Set the iteration number to t = 0 + 1 = 1 and go to Step 3.
Step 3: Find the personal best (Pbest) of each particle; since no other points have been visited yet, Pbest,i = xi0 for i = 1 to 9.
Step 4: Find the global best: Gbest = the Pbest,i with minimum fitness. Since the minimum personal-best fitness is f60 = 4, Gbest = x6 = 3.
EXAMPLE (Cont.)
Step 5: Taking the random numbers in the range (0,1) as r1 = 0.213 and r2 = 0.876, find the velocities of the particles by
vi1 = 0 + r1·c1·(Pbest,i − xi0) + r2·c2·(Gbest − xi0); since Pbest,i = xi0, this reduces to vi1 = 0.876(3 − xi0):
V1 = 0.876(3−(−9)) = 10.512, V2 = 0.876(3−(−6)) = 7.884, V3 = 0.876(3−(−4)) = 6.132,
V4 = 0.876(3−(−1)) = 3.504, V5 = 0.876(3−0.6) = 2.1024, V6 = 0.876(3−3) = 0,
V7 = 0.876(3−3.8) = −0.7008, V8 = 0.876(3−7) = −3.504, V9 = 0.876(3−10) = −6.132
Step 6: Find the new positions xi1 = xi0 + vi1:
x11 = −9+10.512 = 1.512, x21 = −6+7.884 = 1.884, x31 = −4+6.132 = 2.132, x41 = −1+3.504 = 2.504,
x51 = 0.6+2.1024 = 2.7024, x61 = 3+0 = 3, x71 = 3.8−0.7008 = 3.0992, x81 = 7−3.504 = 3.496,
x91 = 10−6.132 = 3.868
Step 7: Check the stopping criterion; note that after one step all particles have moved towards the minimizer x = 2.5.
SUMMARY
(a) Uses a population-based approach, while other algorithms use an individual-based approach.
(b) Uses an individual-based approach, while other algorithms use a population-based approach.
(c) Uses a deterministic approach, while other algorithms use a stochastic approach.
(d) Uses a stochastic approach, while other algorithms use a deterministic approach.
TERMINAL QUESTIONS
Q1. Find the maximum of the function f(x) = 2x² - 3x + 5 with -5 ≤ x ≤ 5 using the PSO algorithm. Explain up to 2 iterations considering five initial random points, x1= -2, x2= 0.3, x3= -1, x4= 3, x5= 0.1.
Q2. Explain in detail the PSO algorithm with flow chart and its mathematical components.
Q4. Explain in detail the parameters and hyper parameters used in the PSO algorithm.
Q5. Maximize the function f(x) = sin(x) with -1 ≤ x ≤ 1 using the PSO algorithm. Explain up to 2 iterations considering five initial random points, x1= -0.2, x2= 0.3, x3= -0.5, x4= 0.2, x5= 0.1.
REFERENCES
Reference Books:
1. Chapter 5, Modern Optimization with R, Paulo Cortez, Second Edition, Springer, 2021
2. Nature-Inspired Algorithms and Applications, Anupriya Jain, Dinesh Goyal, S. Balamurugan, Sachin Sharma,
Seema Sharma, Sonia Duggal, Second Edition, Wiley
3. OPTIMIZATION Algorithms and Applications, Rajesh Kumar Arora, Second Edition, Taylor & Francis Group,
LLC
4. https://www.intechopen.com/chapters/69586
5. https://towardsdatascience.com/swarm-intelligence-coding-and-visualising-particle-swarm-optimisation-in-python-253e1bd00772
6. http://www.swarmintelligence.org/tutorials.php
7. https://www.geeksforgeeks.org/introduction-to-swarm-intelligence/
THANK YOU
Session - 17
AIM OF THE SESSION
To familiarize students with Estimation of Distribution Algorithm and its applications
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
Estimation of Distribution Algorithms (EDAs) constitute a powerful class of evolutionary algorithms for solving continuous and combinatorial optimization problems.
Based on machine learning techniques, at each generation, EDAs estimate a joint probability distribution
associated with the set of most promising individuals, trying to explicitly express the interrelations between
the different variables of the problem.
Based on this general framework, EDAs have proved to be very competitive for solving combinatorial and
continuous optimization problems.
EDA
• Estimation of distribution algorithms (EDA) are optimization methods that combine ideas from evolutionary
computation, machine learning, and statistics.
• Estimation of distribution algorithms are stochastic optimization algorithms that explore the space of candidate
solutions by sampling an explicit probabilistic model constructed from promising solutions found so far.
• EDAs typically work with a population of candidate solutions to the problem, starting with the population
generated according to the uniform distribution over all admissible solutions.
• The population is then scored using a fitness function.
• This fitness function gives a numerical ranking for each string, with the higher the number the better the string.
• From this ranked population, a subset of the most promising solutions are selected by the selection operator.
• An example selection operator is truncation selection with threshold t = 50%, which selects the 50% best
solutions.
EDA
• The algorithm then constructs a probabilistic model which attempts to estimate the probability
distribution of the selected solutions.
• Once the model is constructed, new solutions are generated by sampling the distribution encoded by this
model.
• These new solutions are then incorporated back into the old population, possibly replacing it entirely.
• The process is repeated until some termination criterion is met (usually when a solution of sufficient quality is reached or when the number of iterations reaches some threshold); each iteration of this procedure is usually referred to as one generation of the EDA.
EDA FLOWCHART
FUNCTIONS USED IN THE ALGORITHM
• The selection function's goal is to choose the most interesting solutions. For instance, truncation selection chooses a percentage of the best solutions from the current population (P).
• The essential steps of EDA are the estimation and simulation of the search distribution, which is implemented by the
learn and sample functions.
• The learning estimates the structure and parameters of the probabilistic model (M) and the sampling is used to
generate new solutions (P’) from the probabilistic model.
• Finally, the replacement function defines the next population.
ALGORITHM
Algorithm Generic EDA pseudo-code implemented in copulaedas package, adapted from (Gonzalez-Fernandez and Soto, 2014)
1: Inputs: f, C ⊲ f is the fitness function, C includes control parameters (e.g., NP )
2: P ← initialization(C) ⊲ set initial population (seeding method)
3: if required then P ← local_optimization(P, f, C) ⊲ apply local optimization to P
4: end if
5: B ← best(P, f ) ⊲ best solution of the population
6: i ← 0 ⊲ i is the number of iterations of the method
7: while not termination_criteria(P, f, C) do
8: P ′ ← selection(P, f, C) ⊲ selected population P ′
9: M ← learn(P ′) ⊲ set probabilistic model M using a learning method
10: P ′ ← sample(M) ⊲ set sampled population from M using a sampling method
11: if required then P ′ ← local_optimization(P ′, f, C) ⊲ apply local optimization to P ′
12: end if
13: B ← best(B, P ′, f ) ⊲ update best solution (if needed)
14: P ← replacement(P, P ′, f, C) ⊲ create new population using a replacement method
15: i←i+1
16: end while
17: Output: B ⊲ best solution
EDA ALGORITHM SCHEME
TYPES OF EDA’S
Different types of EDAs have been proposed based on the complexity of their probabilistic
models. They can be categorized into univariate, bivariate, and multivariate factorization EDAs.
• Univariate factorization: EDAs, such as Univariate Marginal Distribution Algorithm (UMDA),
assume that decision variables are independent and rely on univariate statistics and marginal
probability distributions.
• Bivariate factorization: EDAs, such as Mutual Information Maximizing Input Clustering (MIMIC) and the Bivariate Marginal Distribution Algorithm (BMDA), allow for pairwise dependencies between variables and model these dependencies through bivariate distributions.
• Multivariate factorization: EDAs, such as the Extended Compact Genetic Algorithm (ECGA) and the Bayesian Optimization Algorithm (BOA), can model higher-order dependencies between variables and factorize the joint probability distribution into multiple components.
ADVANTAGES OF EDA
1. Global optimization: EDA can effectively explore large solution spaces and search for global optima in complex
optimization problems.
2. Model-based approach: EDA builds and updates probabilistic models of the solution space, allowing it to capture the
underlying structure of the problem, which can lead to more efficient search and better solutions.
3. Adaptive search: EDA can adapt the search strategy by continuously updating the probability models as new
information is gathered during the optimization process.
4. Parameter tuning: EDA algorithms often have fewer user-defined parameters compared to other optimization
techniques, making them more accessible to users who may not have extensive domain knowledge.
5. Versatility: EDA can be applied to a wide range of optimization problems, including combinatorial, continuous, and
mixed-integer problems, making it a versatile approach.
6. Robustness: EDA is less sensitive to issues like local optima, and it can explore multiple promising regions of the
solution space simultaneously.
7. Parallelism: EDA can be parallelized effectively, which can speed up the optimization process and handle larger
problem instances.
DISADVANTAGES OF EDA
1. Computational complexity: Building and maintaining probabilistic models can be computationally expensive,
especially in high-dimensional spaces, which may slow down the optimization process.
2. Convergence issues: EDA algorithms may not converge to the optimal solution in some cases, and it can be
challenging to determine when to stop the search.
3. Model estimation errors: The quality of the probabilistic models depends on the quality and quantity of the data
used for estimation. Errors in model estimation can lead to suboptimal solutions.
4. Limited applicability: EDA may not be the best choice for simple, well-structured problems where other
optimization methods, such as gradient-based methods, may be more efficient.
5. High memory requirements: EDA algorithms often require substantial memory to store and update probabilistic
models, which can be a limitation for resource-constrained environments.
6. Lack of domain-specific knowledge: EDA may not be suitable for problems where domain-specific knowledge is
crucial, as it primarily relies on data-driven modeling.
7. Sensitivity to parameter settings: While EDA typically has fewer user-defined parameters than some other
algorithms, the choice of parameters, such as population size and model selection, can still impact performance and
may require some tuning.
EXAMPLE: SOLVING ONEMAX WITH A SIMPLE EDA
Let us illustrate the basic EDA procedure with an example of a simple EDA solving the onemax problem.
In onemax, candidate solutions are represented as binary strings of fixed length n > 0. The objective is to
maximize onemax, which is defined as the sum of the bits in the input binary string.
…….
• The probability vector provides a fast and efficient model for solving the onemax problem and many other
optimization problems, mainly because it assumes that all problem variables are independent.
• To learn a probability vector, the probability pi of a 1 in each position i is set to the proportion of selected
solutions containing a 1 in this position.
• To generate a new binary string from the probability vector, for each position i, a 1 is generated in this position
with probability pi. For example, if p3 = 0.6, we generate a 1 in the third position of a new candidate solution with
the probability of 60%.
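The learn-and-sample steps above can be sketched in Python. This is a minimal illustration; the function names and the small four-string parent set are made up for this example:

```python
import random

def onemax(bits):
    # Objective: the number of 1s in the binary string.
    return sum(bits)

def learn_vector(selected):
    # p[i] = proportion of selected solutions with a 1 in position i.
    n = len(selected[0])
    return [sum(s[i] for s in selected) / len(selected) for i in range(n)]

def sample(p):
    # Generate a 1 in position i with probability p[i].
    return [1 if random.random() < p[i] else 0 for i in range(len(p))]

# Four hypothetical selected parents of length n = 5.
selected = [[1, 1, 0, 1, 0], [1, 0, 1, 1, 1], [0, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
p = learn_vector(selected)
print(p)           # [0.75, 0.75, 0.75, 0.75, 0.5]
child = sample(p)  # a new candidate biased towards positions with high p[i]
```

Note how the learned vector already reflects the bias of the parents: positions where most parents carry a 1 get a high probability of generating a 1 in the offspring.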
SOLVING ONEMAX WITH A SIMPLE EDA
THE BASIC IDEA
• It is clear from the first generation that the procedure has a positive effect. The offspring
population already contains significantly more 1s than the original population and also includes
several copies of the global optimum 11111.
• In addition, the probability of a 1 in any particular position has increased; consequently, the
probability of generating the global optimum has increased.
• The second generation leads to a probability vector that is even more strongly biased towards the
global optimum, and if the simulation were continued for one more generation, the probability
vector would generate only the global optimum.
• The learning and sampling of the probabilistic model provides a mechanism for both
1. improving the quality of new candidate solutions (under certain assumptions), and
2. facilitating exploration of the set of admissible solutions.
PROBABILITY VECTOR ON ONEMAX
ADVANTAGES AND DISADVANTAGES
Advantages:
1. Global Search: EDAs are capable of conducting global search in complex solution spaces, making them suitable for a
wide range of optimization problems.
2. Adaptation: They can adapt to the problem's characteristics over time, allowing them to efficiently explore and
exploit the search space.
3. No Need for Gradients: Unlike gradient-based optimization methods, EDAs do not require gradients of the objective
function, making them applicable to non-differentiable or noisy functions.
4. Versatility: EDAs can be used for both continuous and discrete optimization problems, making them versatile in
various domains.
5. Scalability: They can be parallelized and applied to high-dimensional problems, which is crucial for real-world
applications in engineering, finance, and other fields.
6. Exploration and Exploitation: EDAs maintain a balance between exploration (searching for new solutions) and
exploitation (improving promising solutions), leading to efficient convergence.
ADVANTAGES AND DISADVANTAGES
Disadvantages:
1. Computational Cost: EDAs can be computationally expensive, especially when dealing with large-scale
optimization problems due to the need for sampling and modeling the distribution.
2. Complexity: Implementing EDAs can be more complex and time-consuming compared to simpler optimization
algorithms like gradient descent.
3. Parameter Tuning: EDAs often require the tuning of various parameters, such as population size and selection
criteria, which can be challenging and time-consuming.
4. Convergence: In some cases, EDAs might have slower convergence rates compared to more specialized
optimization algorithms tailored to a specific problem.
5. Limited Success for Certain Problems: While versatile, EDAs may not always outperform other optimization
methods for certain types of problems, especially if the problem structure is well-understood and can be
exploited effectively by specialized algorithms.
6. Sensitivity to Initialization: The performance of EDAs can be sensitive to the initial population and parameter
settings, requiring careful setup.
VARIANTS OF EDA
1. Univariate Marginal Distribution Algorithm (UMDA): UMDA is one of the simplest EDAs. It models the probability
distribution of each variable independently. It creates a probabilistic model for each variable by calculating the
marginal distribution.
2. Multivariate Normal Distribution Algorithm (MNDA): MNDA assumes that the probability distribution of the
solution space follows a multivariate normal distribution. It estimates the mean and covariance matrix to
represent the distribution and generate new solutions accordingly.
3. Bayesian Optimization Algorithm (BOA): BOA employs Bayesian networks to model the dependencies among
variables in the problem. It learns the structure and parameters of the Bayesian network to represent the
probabilistic model and guide the search.
4. Compact Genetic Algorithm (cGA): cGA is a simple EDA that represents the entire population by a single
probability vector over the binary variables, updating the vector after comparing pairs of sampled solutions. It
mimics the behaviour of a simple genetic algorithm with uniform crossover while using far less memory.
5. Improved Estimation of Distribution Algorithm (IEDA): IEDA is a variation of EDAs that incorporates techniques
like niching, local search, or other enhancements to improve the overall performance of the algorithm in various
optimization tasks.
6. Iterated Local Search with EDA (ILS-EDA): ILS-EDA combines Estimation of Distribution Algorithms with Iterated
Local Search techniques to create a hybrid algorithm that leverages the strengths of both approaches.
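As a contrast to the binary probability-vector model, variant 2 above (MNDA) can be sketched for a continuous problem. This is an illustrative Gaussian EDA minimizing the sphere function; the starting point, population size, and all other parameter values are assumptions made for the example:

```python
import numpy as np

def gaussian_eda(f, dim=2, pop_size=50, n_parents=15, generations=30, seed=0):
    # MNDA-style EDA: fit a multivariate normal to the best parents,
    # then sample the next population from that distribution.
    rng = np.random.default_rng(seed)
    mean = np.full(dim, 5.0)                      # illustrative starting point
    cov = np.eye(dim) * 4.0                       # broad initial distribution
    for _ in range(generations):
        pop = rng.multivariate_normal(mean, cov, size=pop_size)
        fitness = np.array([f(x) for x in pop])
        parents = pop[np.argsort(fitness)[:n_parents]]   # minimization
        mean = parents.mean(axis=0)                       # re-estimate model
        cov = np.cov(parents.T) + 1e-6 * np.eye(dim)      # regularize covariance
    return mean

sphere = lambda x: float(np.sum(x ** 2))
best = gaussian_eda(sphere)   # the learned mean should move towards the origin
```

Note that this plain maximum-likelihood update is known to shrink the covariance quickly, which connects to the convergence concerns listed in the disadvantages above; practical Gaussian EDAs add variance-preserving mechanisms.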
APPLICATIONS
Q2. In EDA, what does the term "marginal distribution" refer to?
CREATED BY K.
TERMINAL QUESTIONS
Q1. (Subset Sum) Given a set of integers and a weight, find a subset of the set so that the sum of its
elements equals the weight.
e.g. given {1,3,5,6,8,10}, W=14
solutions: {1,3,10},{3,5,6},{6,8},{1,5,8}
Q2. Explain in detail the EDA algorithm with flow chart and its components.
Q3. (Optimization) Maximize the function f(x) = x² with 0 ≤ x ≤ 10 using the EDA algorithm.
a) Find the population of the solution.
b) Explain the result for 2 iterations, considering three initial random parents: (0,1,1,0), (1,1,0,0),
(1,0,0,1).
REFERENCES
Reference Books:
1. Paulo Cortez, "Modern Optimization with R" (Chapter 5), Second Edition, Springer, 2021.
2. Anupriya Jain, Dinesh Goyal, S. Balamurugan, Sachin Sharma, Seema Sharma, Sonia Duggal, "Nature-Inspired
Algorithms and Applications", Second Edition, Wiley.
3. Rajesh Kumar Arora, "Optimization: Algorithms and Applications", Second Edition, Taylor & Francis Group, LLC.
4. https://www.sidshakya.com/tutorials/
5. https://en.wikipedia.org/wiki/Estimation_of_distribution_algorithm
6. https://www.semanticscholar.org/paper/Estimation-of-Distribution-Algorithms%3A-A-New-for-Bengoetxea-Larra%C3%B1aga/9322d11835043d21ce5f812d03da09a9800f8918
THANK YOU