Introduction to Optimization Concepts

The document provides an introduction to optimization, focusing on its definition, methods, and applications in operations research. It discusses various optimization problems, including linear and nonlinear programming, and highlights the significance of evolutionary and genetic algorithms inspired by biological processes. The session aims to familiarize students with these concepts and their practical implications in decision-making and problem-solving.


Department of

COURSE NAME: BIGCSE


DATA OPTIMIZATION
COURSE CODE: 21CS3276R
TOPIC :
INTRODUCTION TO OPTIMIZATION

Session -
AIM OF THE SESSION

To familiarize students with the basic concept of optimization

INSTRUCTIONAL OBJECTIVES

This Session is designed to:


1. Introduce the concept of optimization
2. Describe optimization with an example

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define optimization
2. Describe optimization with an example
SESSION INTRODUCTION

• Optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function.
OPERATIONS RESEARCH

• A branch of mathematics concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions.

• The optimum-seeking methods are also known as mathematical programming techniques and are generally studied as a part of operations research.
OPERATIONS RESEARCH

• Operations research dates from the early period of World War II, when the British military faced the problem of allocating very scarce and limited resources (such as fighter airplanes, radars, and submarines) to several activities (deployment to numerous targets and destinations).
• Because there were no systematic methods available to
solve resource allocation problems, the military called upon
a team of mathematicians to develop methods for solving
the problem in a scientific manner.
• The methods developed by the team were instrumental in Britain winning the air battle. These methods, such
as linear programming, which were developed as a result of
research on (military) operations, subsequently became
known as the methods of operations research.
METHODS OF OPERATIONS RESEARCH
APPLICATIONS

• Knapsack problem
• Traveling salesman problem
• Job assignment problem
• Weapon–target assignment problem
• Vehicle routing problem
OPTIMIZATION PROBLEM
DESIGN VECTOR

• Any engineering system or component is defined by a set of quantities, some of which are viewed as variables during the design process.
• In general, certain quantities are fixed at the outset; these are called preassigned parameters.
• All the other quantities are treated as variables in the design process and are called design or decision variables.
DESIGN CONSTRAINTS

• The restrictions that must be satisfied to produce an acceptable design are collectively called design constraints.
• Constraint Surface
– Consider an optimization problem with only inequality constraints, gj(X) ≤ 0.
– The set of values of X that satisfy the equation gj(X) = 0 forms a hyper-surface in the design space and is called a constraint surface.
CONSTRAINT SURFACES IN A HYPOTHETICAL
TWO-DIMENSIONAL DESIGN SPACE.
DESIGN CONSTRAINTS

• Constraint Surface
– The constraint surface divides the design space into two regions:
gj(X) > 0: infeasible or unacceptable
gj(X) < 0: feasible or acceptable
DESIGN CONSTRAINTS

• Composite Constraint Surface
– The collection of all the constraint surfaces gj(X) = 0, j = 1, 2, . . . , m, which separates the acceptable region, is called the composite constraint surface.
DESIGN CONSTRAINTS

• Bound Point and Active Constraint
– A design point that lies on one or more constraint surfaces is called a bound point, and the associated constraint is called an active constraint.
• Free Points
– Design points that do not lie on any constraint surface are known as free points.
OBJECTIVE FUNCTION

• The criterion with respect to which the design is optimized, when expressed as a function of the design variables, is known as the criterion, merit, or objective function.
• The choice of objective function is governed by the nature of the problem.
OBJECTIVE FUNCTION

• An optimization problem involving multiple objective functions is known as a multi-objective programming problem.
• With multiple objectives there arises a
possibility of conflict, and one simple way to
handle the problem is to construct an overall
objective function as a linear combination of
the conflicting multiple objective functions.
OBJECTIVE FUNCTION SURFACES

• The locus of all points satisfying f(X) = C = constant forms a hyper-surface in the design space, and each value of C corresponds to a different member of a family of surfaces. These surfaces are called objective function surfaces.
OBJECTIVE FUNCTION SURFACES
TRAVELING SALESMAN PROBLEMS
• Traveling salesman problems
– Given a set of cities and a cost to travel from one city to
another, seeks to identify the tour that will allow a salesman
to visit each city only once, starting and ending in the
same city, at the minimum cost.

– Each city must be arrived at from exactly one other city.
– From each city there must be a departure to exactly one other city.
– There must be only a single tour covering all the cities.
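The tour definition above can be made concrete with a brute-force search, which is exact but only feasible for a handful of cities (n! tours); the distance matrix below is hypothetical, for illustration only.

```python
# Brute-force TSP: try every tour that starts and ends at city 0.
from itertools import permutations

def tsp_brute_force(dist):
    n = len(dist)
    best_tour, best_cost = None, float("inf")
    # Fix city 0 as the start/end; permute the remaining cities.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

# Hypothetical symmetric distance matrix for 4 cities.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]
tour, cost = tsp_brute_force(dist)
```

For this instance the minimum-cost tour visits every city exactly once, starting and ending at city 0, which is exactly the constraint set described above.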
TRAVELING SALESMAN PROBLEMS
MULTI-OBJECTIVE PROGRAMMING
PROBLEM
CONES PROBLEM
SESSION INTRODUCTION
LINEAR PROGRAMMING PROBLEM
• Linear programming (LP, also called linear optimization) is a
method to achieve the best outcome (such as maximum profit or
lowest cost) in a mathematical model whose requirements are
represented by linear relationships.
LINEAR PROGRAMMING PROBLEM

• Suppose that a farmer has a piece of farm land, say L km², to be planted with either wheat or barley or some combination of the two. The farmer has a limited amount of
fertilizer, F kilograms, and pesticide, P kilograms. Every
square kilometer of wheat requires F1 kilograms of fertilizer
and P1 kilograms of pesticide, while every square kilometer
of barley requires F2 kilograms of fertilizer and P2
kilograms of pesticide. Let S1 be the selling price of wheat
per square kilometer, and S2 be the selling price of barley.
If we denote the area of land planted with wheat and barley by
x1 and x2 respectively, then profit can be maximized by
choosing optimal values for x1 and x2.
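With hypothetical numbers (L = 10, F = 80, P = 60, F1 = 10, F2 = 5, P1 = 5, P2 = 10, S1 = 7, S2 = 6 — all chosen for illustration only), the farmer's LP can be solved by enumerating the vertices of the feasible region, since an LP optimum always lies at a vertex of the feasible polygon:

```python
# Two-variable LP by vertex enumeration: intersect every pair of constraint
# lines (Cramer's rule) and keep the best feasible intersection point.
from itertools import combinations

# Constraints in the form a1*x1 + a2*x2 <= b
constraints = [
    (1, 1, 10),    # land:       x1 + x2 <= L = 10
    (10, 5, 80),   # fertilizer: F1*x1 + F2*x2 <= F = 80
    (5, 10, 60),   # pesticide:  P1*x1 + P2*x2 <= P = 60
    (-1, 0, 0),    # x1 >= 0
    (0, -1, 0),    # x2 >= 0
]
S1, S2 = 7, 6      # selling prices per km^2 of wheat and barley

def feasible(x1, x2, eps=1e-9):
    return all(a1 * x1 + a2 * x2 <= b + eps for a1, a2, b in constraints)

best = None
for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
    det = a1 * c2 - a2 * c1
    if abs(det) < 1e-12:          # parallel lines: no intersection vertex
        continue
    x1 = (b * c2 - a2 * d) / det  # Cramer's rule
    x2 = (a1 * d - b * c1) / det
    if feasible(x1, x2):
        profit = S1 * x1 + S2 * x2
        if best is None or profit > best[0]:
            best = (profit, x1, x2)

profit, x1, x2 = best
```

For these numbers the optimum plants x1 = 20/3 km² of wheat and x2 = 8/3 km² of barley; the binding constraints are fertilizer and pesticide, while some land is left unused.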
LINEAR PROGRAMMING PROBLEM
NONLINEAR PROGRAMMING PROBLEM

• If any of the functions among the objective and constraint functions is nonlinear, the problem is called a nonlinear programming (NLP) problem.

• This is the most general programming problem, and all other problems can be considered as special cases of the NLP problem.
NONLINEAR PROGRAMMING PROBLEM
• Nonlinear System: Change of the output is
not proportional to the change of the input.
• The behavior of a nonlinear system is described in
mathematics by a nonlinear system of equations.
• Nonlinear system of equations: the unknowns appear as variables of a polynomial of degree higher than one.
• In a nonlinear system of equations, the equation(s) to
be solved cannot be written as a linear combination
of the unknown variables or functions that appear in
them.
NONLINEAR PROGRAMMING PROBLEM
• A simple problem can be defined by the constraints
NONLINEAR PROGRAMMING PROBLEM

• Another simple problem can be defined by


the constraints
CLASSICAL
OPTIMIZATION
TECHNIQUES
CLASSICAL OPTIMIZATION TECHNIQUES

• The classical methods of optimization are useful in finding the optimum solution of continuous and differentiable functions.
• Since some of the practical problems involve
objective functions that are not continuous and/or
differentiable, the classical optimization techniques
have limited scope in practical applications.
SINGLE-VARIABLE OPTIMIZATION
SELF-ASSESSMENT QUESTIONS

1.Data mining is an integral part of _________

(a) KDD…
(b) OLTP
(c) Data Cube
(d) All of the above

2. The data is stored, retrieved & updated in ____________

(a) OLTP
(b)OLAP
(c) KDD
(d) None of the above
TERMINAL QUESTIONS

1. Describe Datawarehouse ?

2. List out the differences between Data warehouse, Data mart and Data Lake ?

3. Analyze the Data warehouse 3 tier Architecture ?

4. Summarize the OLAP Server architecture ?


REFERENCES FOR FURTHER LEARNING OF THE SESSION

Reference Books:
1. Han J & Kamber M, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier,

2011.
2. Anahory, Murray, “Data Warehousing in the Real World”, Pearson Education, 2008.
3. M.Humphires, M.Hawkins, M.Dy,“Data Warehousing: Architecture and
Implementation”, Pearson Education, 2009.

Sites and Web links:


1. https://www.youtube.com/watch?v=G4NYQox4n2g&ab_channel=nptelhrd
2.https://www.youtube.com/watch?v=maKj5ovDfg&list=PL97D13C16B8A3C304
&ab_channel=nptelhrd
THANK YOU

Team – BDO
Department of
COURSE NAME: BIGCSE
DATA OPTIMIZATION
COURSE CODE: 21CS3276R
TOPIC :
EVOLUTIONARY & GENETIC
ALGORITHMS
Session -
AIM OF THE SESSION

To familiarize students with the basic concepts of evolutionary and genetic algorithms

INSTRUCTIONAL OBJECTIVES

This Session is designed to:


1. Introduce evolutionary and genetic algorithms
2. Describe the algorithms with an example

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define evolutionary and genetic algorithms
2. Describe the algorithms with an example
EVOLUTIONARY
ALGORITHMS
THE INSPIRATION FROM
BIOLOGY
• Darwinian Evolution
• Given an environment that can host only a limited
number of individuals, and the basic instinct of
individuals to reproduce, selection becomes inevitable
if the population size is not to grow exponentially.
• Natural selection favors those individuals that compete
for the given resources most effectively, in other
words, those that are adapted or fit to the
environmental conditions best.
• This phenomenon is also known as survival of the
fittest.
EVOLUTIONARY COMPUTING:
WHY?
• Developing automated problem solvers (that is,
algorithms) is one of the central themes of
mathematics and computer science.
• Nature's solutions have always been a source of inspiration for copying "natural problem solvers".
• For the most powerful natural problem solver, there are two rather straightforward candidates:
– The human brain (neurocomputing)
– The evolutionary process (evolutionary computing)
EVOLUTIONARY COMPUTING:
WHY?
1. Optimization problems

2. Modeling or system identification problems

3. Simulation problems
WHAT IS AN EVOLUTIONARY
ALGORITHM?
• There are many different variants of evolutionary algorithms.
The common underlying idea behind all these techniques is the
same:
1. Given a population of individuals
2. The environmental pressure causes natural selection
(survival of the fittest), which causes a rise in the fitness of
the population.
WHAT IS AN EVOLUTIONARY
ALGORITHM?
WHAT IS AN EVOLUTIONARY
ALGORITHM?
PROPERTIES OF EVOLUTIONARY
ALGORITHM
• EAs are population based, i.e., they process a
whole collection of candidate solutions simultaneously.

• EAs mostly use recombination to mix information from two or more candidate solutions into a new one.

• EAs are stochastic.


– Having a random probability distribution or
pattern that may be analyzed statistically but may
not be predicted precisely.
COMPONENTS OF EVOLUTIONARY ALGORITHMS
1. Representation (definition of individuals)

2. Evaluation function (or fitness function)

3. Population

4. Parent selection mechanism

5. Variation operators, recombination and mutation

6. Survivor selection mechanism (replacement)


REPRESENTATION (DEFINITION OF
INDIVIDUALS)
• The first step in defining an EA is to link the "real world" to the
"EA world“.
– Phenotypes: objects forming possible solutions within the original problem context.
– Genotypes: objects encoding the phenotypes, that is, the individuals within the EA.
• Representation
– Specifying a mapping from the phenotypes onto a set of genotypes that are said to represent these phenotypes.
– In the case of a set of integers, 18 would be seen as a phenotype, and 10010 as a genotype.
EVALUATION FUNCTION (FITNESS FUNCTION)
• It is a function or procedure that assigns a quality
measure to genotypes.

• To maximize square(x), the fitness of the genotype 10010 could be defined as the square of its corresponding phenotype: square(18) = 324.

• Also called objective function.


POPULATION
• The role of the population is to hold (the
representation of) possible solutions.
• A population is a multiset of genotypes.
• Defining a population can be as simple as specifying
how many individuals are in it, that is, setting the
population size.
• Best individual of the given population is chosen to seed
the next generation, or the worst individual of the
given population is chosen to be replaced by a new one.
• The diversity of a population is a measure of the
number of different solutions present.
PARENT SELECTION MECHANISM
• The role of parent selection or mating selection is to
distinguish among individuals based on their quality, in
particular, to allow the better individuals to become
parents of the next generation.
• An individual is a parent if it has been selected to
undergo variation in order to create offspring.
• High-quality individuals get a higher chance to
become parents than those with low quality.
• Nevertheless, low-quality individuals are often given
a small, but positive chance; otherwise the whole
search could become too greedy and get stuck in a
local optimum.
VARIATION
OPERATORS
• The role of variation operators is to create
new individuals from old ones.

• Mutation

• Recombination
MUTATION

• A unary variation operator is commonly called mutation.
• It is applied to one genotype and delivers a (slightly) modified mutant, the child or offspring of it.
RECOMBINATION

• A binary variation operator is called recombination or crossover.
• As the names indicate, such an operator merges information from two parent genotypes into one or two offspring genotypes.
SURVIVOR SELECTION MECHANISM
(REPLACEMENT)
• The role of survivor selection or environmental
selection is to distinguish among individuals based
on their quality.
• Survivor selection is also often called replacement
or replacement strategy.
INITIALIZATION

• Initialization is kept simple in most EA applications: the first population is seeded by randomly generated individuals.
• In principle, problem specific heuristics can be used
in this step aiming at an initial population with
higher fitness.
TERMINATION CONDITION

• If the problem has a known optimal fitness level, probably coming from a known optimum of the given objective function, then reaching this level (perhaps only with a given precision ε > 0) should be used as the stopping condition.

• The maximally allowed CPU time elapses.


• The total number of fitness evaluations reaches a given
limit.
• For a given period of time (i.e, for a number of
generations or fitness evaluations), the fitness
improvement remains under a threshold value.
• The population diversity drops under a given
threshold.
THE EIGHT-QUEENS PROBLEM

• Our candidate solutions are complete, rather than partial, board configurations where all eight queens are placed.
• The quality q(p) of any phenotype can be simply quantified by the number of checking queen pairs.
• q(p) = 0 indicates a good solution.
• As for mutation we can use an operator that selects two
positions in a given chromosome randomly and swaps the
values standing on those positions.
THE EIGHT-QUEENS PROBLEM
• We select two parents delivering two children, and the new
population of size n will contain the best n of the resulting n +
2 individuals.
• Parent selection will be done by choosing five
individuals randomly from the population and taking the
best two as parents that undergo crossover.
THE EIGHT-QUEENS PROBLEM
• The strategy we will use merges the population and offspring,
then ranks them according to fitness, and deletes the worst
two.
• Terminate the search if we find a solution or 10,000 fitness
evaluations have elapsed.
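A sketch of the fitness measure and swap mutation described above, using the usual permutation representation (position i holds the row of the queen in column i, so rows and columns can never clash and q(p) only counts diagonal conflicts):

```python
import random

def checking_pairs(p):
    # q(p): number of pairs of queens that attack each other (diagonals only,
    # since a permutation already rules out row and column clashes).
    n = len(p)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(p[i] - p[j]) == j - i   # same diagonal
    )

def swap_mutation(p, rng=random):
    # Pick two positions at random and swap them; the child is still a
    # permutation, so no repair step is needed.
    child = list(p)
    i, j = rng.sample(range(len(p)), 2)
    child[i], child[j] = child[j], child[i]
    return child
```

On the all-diagonal board [0, 1, ..., 7] every pair checks (28 pairs), while a valid solution such as [0, 4, 7, 5, 2, 6, 1, 3] scores q(p) = 0.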
THE EIGHT-QUEENS PROBLEM
0-1 KNAPSACK
Let us consider that the capacity of the knapsack is W = 60 and the list of provided items is shown in the following table.

Item           A    B    C    D
Profit (pi)    280  100  120  50
Weight (wi)    40   10   20   10
Ratio (pi/wi)  7    10   6    5
0-1 KNAPSACK
After sorting by ratio, the items are as shown in the following table.

Item           B    A    C    D
Profit (pi)    100  280  120  50
Weight (wi)    10   40   20   10
Ratio (pi/wi)  10   7    6    5

The total weight of the selected items is 10 + 40 + 10 = 60,
and the total profit is 100 + 280 + 50 = 430.
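A minimal sketch of the greedy ratio procedure from the tables above (note that the greedy heuristic is not optimal for 0-1 knapsack in general, though it reproduces the result for this instance):

```python
def greedy_knapsack(items, capacity):
    # items: list of (name, profit, weight).
    # Sort by profit/weight ratio, then take each whole item that still fits.
    chosen, total_w, total_p = [], 0, 0
    ranked = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    for name, profit, weight in ranked:
        if total_w + weight <= capacity:
            chosen.append(name)
            total_w += weight
            total_p += profit
    return chosen, total_w, total_p

items = [("A", 280, 40), ("B", 100, 10), ("C", 120, 20), ("D", 50, 10)]
chosen, weight, profit = greedy_knapsack(items, 60)
```

Item C (ratio 6) no longer fits after B and A, so the procedure skips it and takes D, ending with weight 60 and profit 430 as in the table.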
THE KNAPSACK
PROBLEM
GENETIC ALGORITHMS
GENETIC ALGORITHMS
• Genetic Algorithms (GAs) were developed by Prof. John
Holland and his students at the University of Michigan during
the 1960s and 1970s.
GENETIC ALGORITHMS
• Maximizing the value of x² for x in the range 0–31.
GENETIC ALGORITHMS
GENETIC ALGORITHMS

x    f(x) = x²
28   784
25   625
27   729
20   400

Sum: 2538   Average: 634.5   Max: 784
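The fitness table above can be reproduced in a few lines; the bit strings are the standard 5-bit encodings of 28, 25, 27 and 20:

```python
# Initial population for the classic example f(x) = x^2, x in [0, 31],
# with each individual encoded as a 5-bit string.
population = ["11100", "11001", "11011", "10100"]   # 28, 25, 27, 20

def fitness(chromosome: str) -> int:
    x = int(chromosome, 2)   # decode the 5-bit genotype to its phenotype
    return x * x

scores = [fitness(c) for c in population]
total = sum(scores)              # sum of fitnesses
average = total / len(scores)    # population average
best = max(scores)               # best individual
```

In a fitness-proportional GA, each individual would then be selected as a parent with probability score/total.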
REPRESENTATION OF INDIVIDUALS
1. Binary Representations

2. Integer Representations

3. Real-Valued or Floating-Point Representation

4. Permutation Representations
Mutation
1. Mutation for Binary Representations
MUTATION OPERATORS FOR INTEGER
REPRESENTATIONS
2. Mutation Operators for Integer Representations

Random Resetting

• "Bit-flipping" mutation of binary encodings is extended to "random resetting".
• With probability pm, a new value is chosen at random from the set of permissible values in each position.

Creep Mutation

• Tends to make small changes relative to the range of permissible values.
• Designed for ordinal attributes; works by adding a small (positive or negative) value to each gene with probability p.
MUTATION OPERATORS FOR FLOATING-POINT
REPRESENTATIONS
2. Mutation Operators for Floating-Point Representations

Change the allele value of each gene randomly within its domain, given by a lower bound Li and an upper bound Ui, resulting in the following transformation:

Uniform Mutation
• The new values x′ are drawn uniformly at random from [Li, Ui].
• Analogous to bit-flipping for binary encodings and the random resetting sketched for integer encodings.
• Applied with a position-wise mutation probability.
MUTATION OPERATORS FOR FLOATING-POINT
REPRESENTATIONS
2. Mutation Operators for Floating-Point Representations

Change the allele value of each gene randomly within its domain, given by a lower bound Li and an upper bound Ui, resulting in the following transformation:

Non-uniform Mutation with a Fixed Distribution

• Analogous to creep mutation.
• Adds to the current gene value an amount drawn randomly from a Gaussian distribution with mean zero and a user-specified standard deviation, then curtails the resulting value to the range [Li, Ui] if necessary.
MUTATION OPERATORS FOR PERMUTATION
REPRESENTATIONS
• Swap Mutation

• Insert Mutation

• Scramble Mutation

• Inversion Mutation
Example (inversion): [1 2 3 4 5 6 7 8 9] → [1 5 4 3 2 6 7 8 9]
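Two of the permutation mutations listed above can be sketched directly; the positions are passed in explicitly here so the [1..9] example is reproducible (a real EA would draw them at random):

```python
def swap_mutation(p, i, j):
    # Exchange the values at positions i and j.
    child = list(p)
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(p, i, j):
    # Reverse the segment between positions i and j (inclusive).
    child = list(p)
    child[i:j + 1] = reversed(child[i:j + 1])
    return child

parent = [1, 2, 3, 4, 5, 6, 7, 8, 9]
child = inversion_mutation(parent, 1, 4)
```

Inverting positions 1 through 4 of [1 2 3 4 5 6 7 8 9] yields [1 5 4 3 2 6 7 8 9], matching the example above; both operators always return a valid permutation.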
RECOMBINATION
• Recombination, the process whereby a new individual solution
is created from the information contained within two (or
more) parent solutions.

• Recombination Operators for Binary Representations

• One-Point Crossover
RECOMBINATION OPERATORS FOR BINARY
REPRESENTATIONS
• N-Point Crossover
RECOMBINATION OPERATORS FOR BINARY
REPRESENTATIONS
• Uniform Crossover
• In each position, if a random value is below a parameter p (usually 0.5), the gene is inherited from the first parent; otherwise from the second. The second offspring is created using the inverse mapping.
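One-point and uniform crossover for binary strings can be sketched as follows; the crossover point and RNG are supplied explicitly so the sketch is reproducible:

```python
import random

def one_point_crossover(p1, p2, point):
    # Split both parents at the same point and swap the tails.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform_crossover(p1, p2, p=0.5, rng=random):
    # Per position: inherit from the first parent with probability p,
    # otherwise from the second; the second child uses the inverse mapping.
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < p:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2

a, b = [0] * 8, [1] * 8
x, y = one_point_crossover(a, b, 3)   # x = [0,0,0,1,1,1,1,1]
```

Because the two children use inverse mappings, every gene of one child is paired with the other parent's gene in the other child.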
RECOMBINATION OPERATORS FOR INTEGER
REPRESENTATIONS

• Same set of operators as for binary representations.


RECOMBINATION OPERATORS FOR FLOATING-
POINT REPRESENTATIONS
• Arithmetic Recombination
– There are three types of arithmetic recombination:
• Simple Recombination
RECOMBINATION OPERATORS FOR FLOATING-POINT
REPRESENTATIONS
• Single Arithmetic Recombination
– Pick a random allele k. At that position, take the arithmetic average of the two parents.
RECOMBINATION OPERATORS FOR FLOATING-POINT
REPRESENTATIONS
• Whole Arithmetic Recombination
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Partially Mapped Crossover
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Edge Crossover
• Edge crossover is based on the idea that an offspring should be created as far as possible using only edges that are present in one or more of the parents.
• The most commonly used version is edge-3 crossover, after Whitley, which is designed to ensure that common edges are preserved.
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Edge Crossover
RECOMBINATION OPERATORS FOR PERMUTATION
REPRESENTATIONS
• Edge Crossover
1. Let K be the empty list. Let N be the first node of a random parent.
2. While Length(K) < Length(Parent):
   a. K := K, N (append N to K)
   b. Remove N from all neighbor lists.
   c. If N's neighbor list is non-empty, let N* be the neighbor of N with the fewest neighbors in its list (or a random one, should there be multiple).
   d. Else let N* be a randomly chosen node that is not in K.
   e. N := N*
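The neighbor lists that the loop above consumes come from an edge table: each city's set of neighbors in either parent, with tours treated as cyclic. A sketch of just that table-building step, using the CABDEF/ABCEFD parents from the slides:

```python
def edge_table(p1, p2):
    # Map each city to the set of its neighbors in either parent tour.
    table = {city: set() for city in p1}
    for parent in (p1, p2):
        n = len(parent)
        for i, city in enumerate(parent):
            table[city].add(parent[(i - 1) % n])  # predecessor (cyclic)
            table[city].add(parent[(i + 1) % n])  # successor (cyclic)
    return table

table = edge_table(list("CABDEF"), list("ABCEFD"))
```

For example, A's neighbors are C and B in the first parent and D and B in the second, so its entry is {B, C, D}; the shared edge A–B is the kind of common edge that edge-3 crossover tries to preserve.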
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Edge Crossover [1 2 3 4 5 6 7 8 9] and [9 3 7 8 2 6 5 1 4]
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS

• Edge Crossover: parents CABDEF and ABCEFD

Answer: ABDFCE
RECOMBINATION OPERATORS FOR PERMUTATION
REPRESENTATIONS
• Order Crossover [designed by Davis for order-based permutation problems]
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Order Crossover
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Cycle Crossover
• The operator works by dividing the elements into cycles.
• A cycle is a subset of elements that has the property that each
element always occurs paired with another element of the
same cycle when the two parents are aligned.
• Having divided the permutation into cycles,
• The offspring are created by selecting alternate cycles from
each parent.
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Cycle Crossover
• The procedure for constructing cycles is as follows:
RECOMBINATION OPERATORS FOR
PERMUTATION REPRESENTATIONS
• Cycle Crossover

Parents:   [1 2 3 4 5 6 7 8 9]  and  [9 3 7 8 2 6 5 1 4]
Offspring: [1 3 7 4 2 6 5 8 9]  and  [9 2 3 8 5 6 7 1 4]
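The cycle-construction procedure can be sketched as follows; offspring 1 takes the first cycle from parent 1, the second from parent 2, and so on alternately (the reverse for offspring 2), which reproduces the example above:

```python
def cycle_crossover(p1, p2):
    n = len(p1)
    c1, c2 = [None] * n, [None] * n
    cycle = 0
    for start in range(n):
        if c1[start] is not None:
            continue                      # position already belongs to a cycle
        i = start
        while c1[i] is None:              # trace one complete cycle
            if cycle % 2 == 0:            # alternate the source parent per cycle
                c1[i], c2[i] = p1[i], p2[i]
            else:
                c1[i], c2[i] = p2[i], p1[i]
            i = p1.index(p2[i])           # follow the element back into p1
        cycle += 1
    return c1, c2

o1, o2 = cycle_crossover([1, 2, 3, 4, 5, 6, 7, 8, 9],
                         [9, 3, 7, 8, 2, 6, 5, 1, 4])
```

Here the cycles cover positions {1, 4, 8, 9}, {2, 3, 5, 7} and {6} (1-indexed), and alternating them yields exactly the two offspring listed above.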
POPULATION MODEL

• Generational model
• In each generation we begin with a population of size μ, from which a mating pool of parents is selected.
• Next, offspring are created from the mating pool by the application of variation operators, and evaluated.
• After each generation, the whole population is replaced by its offspring, which is called the "next generation".
POPULATION MODEL
• Steady-state model
• In the steady state model, the entire population is not changed at once, but rather a part
of it.
• Survivors are selected from the combined (λ + μ) pool.
• Generational gap
– With μ parents and λ offspring, the generation gap = λ/μ.
PARENT SELECTION
• Fitness Proportional Selection

• The selection probability depends on the absolute fitness value of the individual compared to the absolute fitness values of the rest of the population.

• When fitness values are all very close together, there is almost no
selection pressure.
• This can lead to premature convergence.
PARENT
SELECTION
• Ranking Selection
– It preserves a constant selection pressure by sorting the population on
the basis of fitness, and then allocating selection probabilities to
individuals according to their rank, rather than according to their actual
fitness values.
PARENT SELECTION

• Tournament Selection
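Tournament selection can be sketched in a few lines: sample k individuals uniformly at random and return the fittest of the sample. Selection pressure grows with k; with k equal to the population size the tournament always returns the global best.

```python
import random

def tournament_select(population, fitness, k, rng=random):
    # Draw k distinct contestants at random and return the fittest one.
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)
```

Unlike fitness-proportional selection, this only uses fitness comparisons, so it keeps a constant selection pressure even when fitness values are all very close together.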
SURVIVOR SELECTION

• Age-Based Replacement
– The oldest individuals are replaced.
• Fitness-Based Replacement
– Fitness-proportionate and tournament selection
– Replace worst (GENITOR)
– Elitism
CREDIT: JASON LOHN
• The NASA ST5 mission had challenging requirements for the antennas of 3 small spacecraft.
• EA designs outperformed human expert ones and are nearly spacebound.
REFERENCE

• A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, corrected second printing, Springer, 2007.
THANK YOU

Team – BDO
Department of
COURSE NAME: BIGCSE
DATA OPTIMIZATION
COURSE CODE: 21CS3276R
TOPIC :
DIFFERENTIAL EVOLUTION ALGORITHM

Session -
AIM OF THE SESSION

To familiarize students with the basic concept of differential evolution algorithm

INSTRUCTIONAL OBJECTIVES

This Session is designed to:


1. Demonstrate the differential evolution algorithm
2. Describe the algorithm with an example

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define differential evolution algorithm
2. Describe the algorithm with an example
DIFFERENTIAL
EVOLUTION
DIFFERENTIAL
EVOLUTION
• Differential evolution (DE) was developed by Rainer Storn and
Kenneth V. Price around 1995.
• DE is a unique evolutionary algorithm because it is not biologically motivated.
• DE is used for multidimensional real-valued functions but does
not use the gradient of the problem being optimized, which
means DE does not require the optimization problem to be
differentiable.
• DE can therefore also be used on optimization problems that are
not even continuous, are noisy, change over time, etc.

DIFFERENTIAL EVOLUTION
• Inspired by the real-world problem of finding digital filter coefficients.
• In signal processing, a digital filter is a system that performs
mathematical operations on a sampled, discrete-time signal to
reduce or enhance certain aspects of that signal.

E=w(Exp-Actual)

DIFFERENTIAL EVOLUTION
• DE optimizes a problem by maintaining a population of
candidate solutions and creating new candidate solutions by
combining existing ones according to its simple formulae, and
then keeping whichever candidate solution has the best score
or fitness on the optimization problem at hand.

Global minimum: xi ∈ [−32.768, 32.768]
A BASIC DIFFERENTIAL EVOLUTION
ALGORITHM

• DE is based on the idea of taking the difference vector between two individuals, and adding a scaled version of the difference vector to a third individual to create a new candidate solution.
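That idea can be sketched as a minimal DE/rand/1/bin loop. The sphere function f(x) = x1² + x2² is used here as a stand-in test function (the slides use the Ackley function); because survivor selection only ever keeps the better of trial and target, the best cost in the population can never increase.

```python
import random

def sphere(x):
    return sum(v * v for v in x)

def de(f, n=2, pop_size=20, F=0.5, CR=0.9, gens=100, rng=random):
    # Random initial population in [-5, 5]^n.
    pop = [[rng.uniform(-5, 5) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            # Three distinct individuals, none equal to the target i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutant vector: base vector plus scaled difference vector.
            v = [pop[r1][k] + F * (pop[r2][k] - pop[r3][k]) for k in range(n)]
            # Binomial crossover; index jr guarantees one mutant gene survives.
            jr = rng.randrange(n)
            u = [v[k] if (rng.random() < CR or k == jr) else pop[i][k]
                 for k in range(n)]
            # Greedy one-to-one survivor selection.
            if f(u) <= f(pop[i]):
                pop[i] = u
    return min(pop, key=f)

random.seed(1)
best = de(sphere)
```

All names and parameter values here (pop_size, F, CR, gens, the search bounds) are illustrative choices, not prescribed by the slides.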

A BASIC DIFFERENTIAL EVOLUTION
ALGORITHM

A BASIC DIFFERENTIAL EVOLUTION
ALGORITHM
• This algorithm is often referred to as classic DE.
• It is also called DE/rand/1/bin because the base vector xr1 is randomly chosen; one vector difference, F(xr2 − xr3), is added to it; and the number of mutant vector elements that are contributed to the trial vector closely follows a binomial distribution.
• In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, each with its own Boolean-valued outcome: success with probability p or failure with probability q = 1 − p.
• It would exactly follow a binomial distribution if not for the "j = Jr" test.

DIFFERENTIAL EVOLUTION VARIATIONS
• Trial Vectors
– DE/rand/1/L works by generating a random integer L ∈ [1, n], copying L consecutive features from vi to ui, and then copying the remaining features from xi to ui.

DIFFERENTIAL EVOLUTION VARIATIONS
• For example, suppose that we have a seven-dimensional problem (n = 7). The DE/rand/1/L algorithm works by first generating a random integer L ∈ [1, n]; suppose that L = 3. We then generate a random starting point s ∈ [1, n]; suppose that s = 6.

DIFFERENTIAL EVOLUTION VARIATIONS

Under what conditions is the expected number of mutant vector elements copied to the trial vector equal for the bin and L options?

MUTANT VECTORS

• Instead of randomly choosing the base vector xr1, it may be beneficial to always use the best individual in the population as the base vector.
• That way the entire set of trial vectors Ui for i ∈ [1, N] is comprised of mutations of the best individual.
• This approach is called DE/best/1/bin.

• where xb is the best individual in the population.

MUTANT VECTORS

MUTANT VECTORS

• Another option is to use two difference vectors to create the mutant vector [Storn and Price, 1996].

• DE/rand/2/bin or DE/best/2/bin
• DE/rand/2/L or DE/best/2/L

MUTANT VECTORS

• DE can also be implemented by using the current xi as the base vector.

• DE/target/1/bin, DE/target/2/bin,
• DE/target/1/L, or DE/target/2/L

MUTANT VECTORS

• Yet another option is to create the difference vector by using the best individual in the population, xb.
• This tends to create mutant vectors that all move toward xb. The vector that is subtracted from xb could be a random individual or the base individual.
• If the last equation above is used to generate Vi, the algorithm is called DE/target-to-best/1/bin.
MUTANT VECTORS

• The either-or algorithm: we could combine various methods by randomly deciding how to generate the mutant vector.
• K = (F + 1)/2 gives good results on benchmark problems.

MUTANT VECTORS

DE performance on the 20-dimensional Ackley function.

MUTANT VECTORS

DE performance on the 20-dimensional Ackley function.

SCALE FACTOR
ADJUSTMENT
• DE's scale factor F determines the effect that difference vectors have on the mutant vector.
• So far we have assumed that F is a constant.
• We can vary the DE scale factor in two different ways: dither and jitter.
• Dither: F remains a scalar, but we randomly change it each time through the "for each individual" loop.
• Jitter: F becomes an n-element vector, and we randomly change each element of F in the "for each individual" loop, so that each element of the mutant vector v is modified by a uniquely-scaled component of the difference vector.
SCALE FACTOR ADJUSTMENT

Dither

Jitter

SCALE FACTOR
ADJUSTMENT

• DE performance on the 20-dimensional Ackley function with crossover rate c


= 0.9. The traces show the cost of the best individual at each generation,
averaged over 100 Monte Carlo simulations. The use of a constant scale factor
F performs slightly better than dithering or jittering.
DISCRETE OPTIMIZATION

• The only place that discrete domains cause a problem in DE is in the generation of the mutant vector.
• Since F ϵ [0,1], Vi might not belong to the problem domain D.

MIXED-INTEGER DIFFERENTIAL EVOLUTION
• One obvious approach to ensure that Vi ϵ D is to
simply project it onto D.
• For example, if D is the set of n-dimensional integer
vectors, then

• The round function operates element-by-element on a vector.
• A more general way to do this is

where P is a projection operator such that P(x) ϵ D for all x.


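The element-wise rounding projection can be sketched as follows; the function name is a hypothetical helper:

```python
def project_to_integers(v):
    # P(v): element-wise rounding projects a continuous mutant vector
    # onto the integer lattice D
    return [round(vi) for vi in v]
```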
MIXED-INTEGER DIFFERENTIAL EVOLUTION
• P could be more complicated

Projection of the continuous-valued vector x onto a discrete-valued vector a.

DISCRETE DIFFERENTIAL EVOLUTION
• Another way to modify DE for discrete problems is to change
the mutant vector generation method so that it directly
creates mutant vectors that lie in the discrete domain D.

DIFFERENTIAL EVOLUTION AND GENETIC ALGORITHMS

SELF-ASSESSMENT QUESTIONS

1._________

(a) KDD…
(b) OLTP
(c) Data Cube
(d) All of the above

2. The data is stored, retrieved & updated in ____________

(a) OLTP
(b)OLAP
(c) KDD
(d) None of the above
TERMINAL QUESTIONS

1.
REFERENCES FOR FURTHER LEARNING OF THE SESSION

Reference Books:
1. Han J & Kamber M, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2011.
2. Anahory, Murray, “Data Warehousing in the Real World”, Pearson Education, 2008.
3. M.Humphires, M.Hawkins, M.Dy,“Data Warehousing: Architecture and
Implementation”, Pearson Education, 2009.

Sites and Web links:


1. https://www.youtube.com/watch?v=G4NYQox4n2g&ab_channel=nptelhrd
2. https://www.youtube.com/watch?v=maKj5ovDfg&list=PL97D13C16B8A3C304&ab_channel=nptelhrd
THANK YOU

Team – BDO
DEPARTMENT OF CSE H

BIG DATA OPTIMIZATION


21CS3276R
Topic:
Particle Swarm Optimization (PSO) for Sphere

Session - 16

CREATED BY K. VICTOR
BABU
Swarm intelligence in nature: honeybees, fish schooling, ant colonies
AIM OF THE SESSION
To familiarize students with Particle Swarm Algorithm and its applications

INSTRUCTIONAL OBJECTIVES

This Session is designed to:


1. Demonstrate Particle Swarm Algorithm
2. Describe the applications of Particle Swarm Algorithm
3. List out the properties of Particle Swarm Algorithm
4. Solving optimization problems using Particle Swarm Algorithm

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define Particle Swarm Algorithm
2. Describe the properties and applications of Particle Swarm Algorithm
3. Summarize the concepts with their applications
OVERVIEW OF PARTICLE SWARM OPTIMIZATION
ALGORITHM
The initial ideas on particle swarms of Kennedy (a social psychologist) and Eberhart (an electrical engineer), introduced in 1995, were essentially aimed at producing computational intelligence by exploiting simple analogues of social interaction rather than purely individual cognitive abilities.
It involved analogues of bird flocks searching for corn.
In PSO a number of simple entities, called particles, are placed in the search space of some problem or function,
and each evaluates the objective function at its current location.
Each particle then determines its movement through the search space by combining some aspect of the history of
its own current and best (best-fitness) locations with those of one or more members of the swarm, with some
random perturbations.
The next iteration takes place after all particles have been moved. Eventually the swarm as a whole, like a flock of
birds collectively foraging for food, is likely to move close to an optimum of the fitness function.
Swarm: A large group of insects, like bees or locusts, flying or moving together
PARTICLE SWARM ALGORITHM

To illustrate what “cooperation” means in PSO, here is a simplistic example.

As usual, the big fish is difficult to catch, hidden in the deepest part of the pond. At each time step, each
fisherman tells the other how deep the pond is at his place. At the very beginning, as the depths are quite
similar, they both follow their own ways. Now, Fisherman 2 seems to be on a better place, so Fisherman 1
tends to go towards him quite rapidly. Now, the decision is a bit more difficult to make. On the one hand
Fisherman 2 is still on a better place, but on the other hand, Fisherman 1’s position is worse than before. So
Fisherman 1 comes to a compromise: he still goes towards Fisherman 2, but more slowly than before. As we
can see, doing that, he escapes from the local minimum.
Of course, this example is a caricature, but it presents the main features of a particle in basic PSO:
 a position,
 a velocity (or, more precisely an operator which can be applied to a position in order to modify it),
 the ability to exchange information with its neighbours,
 the ability to memorize a previous position, and
 the ability to use information to make a decision.
Remember: All that has to remain simple.
THE BASIC IDEA

Each particle is searching for the optimum


Each particle is moving and hence has a velocity.
Each particle remembers the position it was in where it had its best result so far
(its personal best)
But this would not be much good on its own; particles need help in figuring out
where to search.

The particles in the swarm co-operate. They exchange information about what
they’ve discovered in the places they have visited
The co-operation is very simple. In basic PSO it is like this:
• A particle has a neighbourhood associated with it.
• A particle knows the fitnesses of those in its neighbourhood, and
• uses the position of the one with best fitness.
• This position is simply used to adjust the particle’s velocity
COOPERATION
THE BASIC INITIALIZATION: POSITIONS AND VELOCITIES

Figure: initial values, boundary values, and movements.
THE BASIC IDEA

WHAT A PARTICLE DOES?

• In each timestep, a particle has to move to a new


position. It does this by adjusting its velocity.
• The adjustment is essentially this:
• The current velocity PLUS
• A weighted random portion in the direction of its personal best PLUS
• A weighted random portion in the direction of the neighbourhood best.
• Having worked out a new velocity, its position is simply its old
position plus the new velocity.
THE BASIC IDEA

Here you have another nice search space.


• First step: you put some particles on it. You can do it at random or on a regular way, or both. How
many? In practice, for most real problems with dimension between 2 and 100, a swarm size of 20
particles works quite well.
There are some mathematical ways to give an estimation, but a bit beyond the scope of this talk.
Also, as we will see some variants use an adaptive swarm size.
• Second step: you define a velocity for each particle, usually at random. You can set all initial velocities
to zero but, experimentally, it is usually not the best choice.
Remember: What we call “velocity” is in fact a move, just because time is discretized.
NEIGHBOURHOODS

Figure: geographical, social, and global neighbourhoods.
NEIGHBOURHOODS

Now, for each particle, we define what is called a neighbourhood. Although some variants use a “geographical”
neighbourhood, that is to say compute distances and take the nearest particles, the most widely used
neighbourhood is a “social” one: just a list of neighbours, regardless where they are.
So, you do not need to define a distance and that is a great advantage, for in some cases, particularly for discrete
spaces, such a definition would be quite arbitrary.
Note: It can be proved (and it is intuitively quite obvious) that if the process converges any social neighbourhood
tends to be also a geographical one.
Usually, in practice, social neighbourhoods are defined just once, at the very beginning, which is consistent with the
principle “simple rules for simple agents”.
Now, the size of the neighbourhood could be a problem. Fortunately, PSO is not very sensitive to this parameter and
most of users just take a value of 3 or 5 with good results.
Unlike for the swarm size, there is no mathematical formula, but like for the swarm size, there are some adaptive
variants.
NEIGHBOURHOODS

• The most commonly used neighbourhood is the circular one. The picture is almost self explanatory. Each
particle is numbered, put on a virtual circle according to its number and the neighbourhood of a given particle
is built by taking its neighbours on this circle.
• An important point for rule simplicity is that each particle belongs to its neighbourhood. For example if a rule
says “I have to check all my neighbours”, there is no need to add “and I have to check myself”. We will see
that more precisely later.
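A circular neighbourhood of size 3 (the particle itself plus one neighbour on each side of the virtual circle) can be sketched as below, assuming 0-based particle indices:

```python
def ring_neighbourhood(i, n_particles, size=3):
    """Indices of the circular (ring) neighbourhood centred on particle i.

    The particle itself is included, matching the rule that each particle
    belongs to its own neighbourhood.
    """
    half = size // 2
    return [(i + d) % n_particles for d in range(-half, half + 1)]
```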
THE CIRCULAR NEIGHBOURHOOD

Figure: particle 1’s 3-neighbourhood on a virtual circle of particles numbered 1 to 8.
PSYCHOSOCIAL COMPROMISE

Suppose, you are a particle.


• By the way, Jim Kennedy has designed a nice game in which you compete with such stupid particles. I have it here, and
if we have time, you will see it is almost impossible to beat it.
• You can compute how good your position is (compute the objective function at the place you are).
• You remember the best position you ever found (and the objective function value). You can ask your neighbours for
this information they also have memorized, and choose the best one.

Now, you have three tendencies,


- audacious, following your own way (just using your own velocity)
- conservative, going back more or less towards your best previous position
- sheeplike, going more or less towards your best neighbour

What PSO formalizes is how to combine these tendencies in order to be globally efficient.
PSYCHOSOCIAL COMPROMISE

Particles adjust their Positions according to a “Psychosocial Compromise’’ between what an Individual is
Comfortable with, and what Society Reckons

Figure: a particle at position x with velocity v, its own best performance pi (“My best perf.”), and the best performance pg of its neighbours.
PSO ALGORITHM

• Each individual in the particle swarm is composed of three D-dimensional vectors, where D is the dimensionality of the
search space. These are the current position xi, the previous best position pi, and the velocity vi.

• The current position xi can be considered as a set of coordinates describing a point in space. On each iteration of the
algorithm, the current position is evaluated as a problem solution. If that position is better than any that has been
found so far, then the coordinates are stored in the second vector, pi. The value of the best function result so far is
stored in a variable that can be called pbesti (for “previous best”), for comparison on later iterations. The objective, of
course, is to keep finding better positions and updating pi and pbesti. New points are chosen by adding vi coordinates
to xi, and the algorithm operates by adjusting vi, which can effectively be seen as a step size.

• The particle swarm is more than just a collection of particles.

Note: A particle by itself has almost no power to solve any problem; progress occurs only when the particles interact.
PSO ALGORITHM

The topology typically consists of bidirectional edges connecting pairs of particles, so that if j is in i’s
neighborhood, i is also in j’s. Each particle communicates with some other particles and is affected by the best
point found by any member of its topological neighborhood. This is just the vector pi for that best neighbor,
which we will denote with pg. The potential kinds of population “social networks” are hugely varied, but in
practice certain types have been used more frequently.
In the particle swarm optimization process,
• the velocity of each particle is iteratively adjusted so that the particle stochastically oscillates around pi and pg
locations.
Mathematical Model
• Each particle in particle swarm optimization has an associated position, velocity, fitness value.
• Each particle keeps track of the particle_bestFitness_value and particle_bestFitness_position.
• A record of global_bestFitness_position and global_bestFitness_value is maintained.
PSO ALGORITHM DATA STRUCTURE

Figures: data structure to store the Swarm population, and data structure to store the ith particle of the Swarm.
PSO ALGORITHM FLOWCHART

https://www.baeldung.com/cs/pso
PSEUDOCODE

For each particle
    Initialize particle
End

Do
    For each particle
        Calculate fitness value
        If the fitness value is better than its personal best
            set current value as the new pBest
    End

    Choose the particle with the best fitness value of all as gBest

    For each particle
        Calculate particle velocity according to equation (a)
        Update particle position according to equation (b)
    End
While maximum iterations or minimum error criteria is not attained

Equation (a):
v[i+1] = v[i]                                  (inertia)
       + c1 * rand() * (pbest[i] - present[i]) (personal influence)
       + c2 * rand() * (gbest[i] - present[i]) (social influence)

Equation (b):
present[i+1] = present[i] + v[i+1]

Particles' velocities on each dimension are clamped to a maximum velocity Vmax, a parameter specified by the user: if the sum of accelerations would cause the velocity on a dimension to exceed Vmax, the velocity on that dimension is limited to Vmax.
PSO ALGORITHM- DETAILS

Step 1: Randomly initialize a Swarm population of N particles Xi (i = 1, 2, …, N)
Step 2: Select hyperparameter values w, c1 and c2
Step 3: For iter in range(max_iter):  # loop max_iter times
    For i in range(N):  # for each particle:
        a. Compute new velocity of the ith particle:
           swarm[i].velocity = w*swarm[i].velocity +
                               r1*c1*(swarm[i].bestPos - swarm[i].position) +
                               r2*c2*(best_pos_swarm - swarm[i].position)
        b. Compute new position of the ith particle using its new velocity:
           swarm[i].position += swarm[i].velocity
        c. If the position is not in the range [minx, maxx], clip it:
           if swarm[i].position < minx:
               swarm[i].position = minx
           elif swarm[i].position > maxx:
               swarm[i].position = maxx
        d. Update the new best of this particle and the new best of the Swarm:
           if swarm[i].fitness < swarm[i].bestFitness:
               swarm[i].bestFitness = swarm[i].fitness
               swarm[i].bestPos = swarm[i].position
           if swarm[i].fitness < best_fitness_swarm:
               best_fitness_swarm = swarm[i].fitness
               best_pos_swarm = swarm[i].position
    End-for
End-for
Step 4: Return best particle of the Swarm

Particle Swarm Optimization (PSO) - An Overview - GeeksforGeeks
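The steps above can be collected into a runnable sketch that minimises the sphere function f(x) = sum of xi^2 from this session's topic; the hyperparameter values (w = 0.7, c1 = c2 = 1.5, N = 20 particles) are illustrative assumptions, not prescribed values:

```python
import random

def pso_sphere(dim=2, n_particles=20, max_iter=100, w=0.7, c1=1.5, c2=1.5,
               minx=-10.0, maxx=10.0, seed=42):
    """Minimise the sphere function f(x) = sum(x_i^2) with global-best PSO."""
    rng = random.Random(seed)
    f = lambda x: sum(xi * xi for xi in x)

    # Step 1: random positions, zero initial velocities
    pos = [[rng.uniform(minx, maxx) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # equation (a): inertia + personal influence + social influence
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # equation (b), clipped to [minx, maxx]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], minx), maxx)
            fi = f(pos[i])
            if fi < pbest_val[i]:            # update personal best
                pbest[i], pbest_val[i] = pos[i][:], fi
                if fi < gbest_val:           # update global best
                    gbest, gbest_val = pos[i][:], fi
    return gbest, gbest_val
```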
PARAMETERS OF PSO ALGORITHM

Parameters of problem:
• Number of dimensions (d)
• Lower bound (minx)
• Upper bound (maxx)
Hyperparameters of the algorithm:
• Number of particles (N)
• Maximum number of iterations (max_iter)
• Inertia (w)
• Cognition of particle (C1)
• Social influence of swarm (C2)
ADVANTAGES AND DISADVANTAGES

Advantages of PSO:
1. Insensitive to scaling of design variables.
2. Derivative free.
3. Very few algorithm parameters.
4. Very efficient global search algorithm.
5. Easily parallelized for concurrent processing.
6. It is easy to implement, so it can be applied both in scientific research and engineering problems.
7. It has a limited number of parameters and the impact of parameters to the solutions is small compared to other
optimization techniques.
8. The calculation in PSO algorithm is very simple.
9. Some techniques ensure convergence, and the optimum value of the problem can be calculated easily within a short time.
10. PSO is less dependent of a set of initial points than other optimization techniques.
11. It is conceptually very simple.

Disadvantages of PSO:
1. Slow convergence in the refined search stage (weak local search ability).
2. The PSO algorithm suffers from partial optimism, which degrades the regulation of its speed and direction.
APPLICATIONS

Applications of PSO
• Detection and diagnosis of faults and recovery
from them
• Design or optimization of engines and electrical motors
• Applications in metallurgy
• Security and military applications
• Vehicle routing problems
• Signature verification
• Fuzzy neural networks
EXAMPLE

Example: Find the minimum of the function f(x) = x^2 - 5x + 10 with -10 ≤ x ≤ 10 using the PSO algorithm (one step).
Use 9 particles with the initial position
x1=-9, x2=-6, x3=-4, x4=-1, x5=0.6, x6=3, x7=3.8, x8=7, x9=10
Step 1: Choose the number of particles x1=-9, x2=-6, x3=-4, x4=-1, x5=0.6, x6=3, x7=3.8, x8=7, x9=10
The initial population (i.e., iteration number t=0 can be represented as x i, i=1, 2, 3, 4, 5, 6, 7, 8, 9
x10=-9, x20=-6, x30=-4, x40=-1, x50=0.6, x60=3, x70=3.8, x80=7, x90=10
Evaluate the objective function values are
f10= (-9)2-5*(-9) +10 = 46, f20= (-6)2-5*(-6) +10 = 16, f30= (-4)2-5*(-4) +10 = 6, f40= (-1)2-5*(-1) +10 = 6,
f50= (0.6)2-5*(0.6) +10 = 12.4, f60= (3)2-5*(3) +10 = 4, f70= (3.8)2-5*(3.8) +10 = 5.44, f80= (7)2-5*(7) +10 = 24
f90= (10)2-5*(10) +10 = 60
Let c1=c2=1. Set the initial velocities of each particle to zero.
Step 2: Set the iteration number as t=0+1 and go to step 3
Step 3: Find the personal best (pbest) for each particle by,
P1best,1=-9, P1best,2=-6, P1best,3=-4, P1best,4=-1, P1best,5=0.6, P1best,6=3, P1best,7=3.8, P1best,8=7, P1best,9=10
Step 4: Find the global best (gbest) by Gbest = min {Pibest}, where i = 1 to 9
Since the minimum personal best is P1best,6 = 3, Gbest = 3
EXAMPLE (Cont.)

Step 5: Considering the Random Numbers in the range(0,1) as r11=0.213 and r21=0.876 and
find the velocities of the particles by,
V1=0+0.213(9-9) +0.876(3+9) =10.512, V2=0+0.213(6-6) +0.876(3+6) =7.884
V3=0+0.213(2.6-2.6) +0.876(3+2.6) =4.9056, V4=0+0.213(-1+1) +0.876(3-1) =1.752
V5=0+0.213(0.6-0.6) +0.876(3-0.6) =2.1024, V6=0+0.213(3-3) +0.876(3-3) =0
V7=0+0.213(3.8-3.8) +0.876(3-3.8) =0.7008, V8=0+0.213(7-7) +0.876(3-7) =-3.504
V9=0+0.213(10-10) +0.876(3-10) =-6.132
Step 6: Find the new values of xi1,i=1 to 9 by
xi1+1= x it+vit+1
x 11=9+10.512=19.512, x 21=6+7.884=12.884, x 31=2.6+4.9056=7.5056, x 41=-1+1.752=0.752
x 51=0.6+2.1024=2.7024, x 61=3+0=3, x 71=3.8+0.7008=4.5008, x 81=7-3.504=4.504
x 91=10-6.132=4.132
Step 7: Stopping criteria
SUMMARY

In this session, the concepts of PSO have been described:


1. Define PSO algorithm and its properties
2. Explain the PSO algorithm with flow chart.
3. List out the advantages and disadvantages of PSO algorithm.
4. List out the Applications of PSO algorithm.
SELF-ASSESSMENT QUESTIONS

Q1. What is the full form of PSO?

(a) Particle Swarm Optimization


(b) Particle System Optimization
(c) Particle System Operation
(d) Particle Swarm Operation
Q2. What is the difference between PSO and other optimization algorithms?

(a) Uses a population-based approach, while other algorithms use an individual-based approach.
(b) Uses an individual-based approach, while other algorithms use a population-based approach.
(c) Uses a deterministic approach, while other algorithms use a stochastic approach.
(d) Uses a stochastic approach, while other algorithms use a deterministic approach.

TERMINAL QUESTIONS

Q1. Find the maximum of the function f(x) = 2x^2 - 3x + 5 with -5 ≤ x ≤ 5 using the PSO algorithm. Explain
up to 2 iterations considering five initial random points, x1= -2, x2= 0.3, x3= -1, x4= 3, x5= 0.1.

Q2. Explain in detail the PSO algorithm with flow chart and its mathematical components.

Q3. Compare and contrast on Genetic Algorithm with PSO algorithm.

Q4. Explain in detail the parameters and hyper parameters used in the PSO algorithm.

Q5. Maximize the function f(x) = sin(x) with -1 ≤ x ≤ 1 using the PSO algorithm. Explain up to 2 iterations
considering five initial random points, x1= -0.2, x2= 0.3, x3= -0.5, x4= 0.2, x5= 0.1.

REFERENCES
Reference Books:
1. Chapter 5, Modern Optimization with R, Paulo Cortez, Second Edition, Springer, 2021
2. Nature-Inspired Algorithms and Applications, Anupriya Jain, Dinesh Goyal, S. Balamurugan, Sachin Sharma,
Seema Sharma, Sonia Duggal, Second Edition, Wiley
3. OPTIMIZATION Algorithms and Applications, Rajesh Kumar Arora, Second Edition, Taylor & Francis Group,
LLC

Sites and Web links:

4. https://www.intechopen.com/chapters/69586
5. https://towardsdatascience.com/swarm-intelligence-coding-and-visualising-particle-swarm-optimisation-i
n-python-253e1bd00772

6. http://www.swarmintelligence.org/tutorials.php
7. https://www.geeksforgeeks.org/introduction-to-swarm-intelligence/

THANK YOU

Team – BDO Even Semester 2023-24

DEPARTMENT OF CSE H

BIG DATA OPTIMIZATION


21CS3276R
Topic:
Estimation of Distribution Algorithm (EDA)

Session - 17

AIM OF THE SESSION
To familiarize students with Estimation of Distribution Algorithm and its applications

INSTRUCTIONAL OBJECTIVES

This Session is designed to:


1. Demonstrate Estimation of Distribution Algorithm
2. Describe the applications of Estimation of Distribution Algorithm
3. List out the properties of Estimation of Distribution Algorithm
4. Solving optimization problems using Estimation of Distribution Algorithm

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define Estimation of Distribution Algorithm
2. Describe the properties and applications of Estimation of Distribution Algorithm
3. Summarize the concepts with their applications
WHY EDA ALGORITHM

• To overcome the negative effects of the crossover and mutation approach of variation, a probabilistic approach of variation has been proposed.
• An algorithm using such an approach is known as an EDA (or PMBGA).
• PMBGA: Probabilistic Model-Building Genetic Algorithm

Simple GA framework: Initial Population → Evaluation → Selection → Crossover → Mutation
EDA framework: Initial Population → Evaluation → Selection → Probabilistic Model Building → Sampling Child Population
EDA

Estimation of Distribution Algorithms (EDAs) constitute a powerful class of evolutionary algorithms for solving
continuous and combinatorial optimization problems.
Based on machine learning techniques, at each generation, EDAs estimate a joint probability distribution
associated with the set of most promising individuals, trying to explicitly express the interrelations between
the different variables of the problem.
Based on this general framework, EDAs have proved to be very competitive for solving combinatorial and
continuous optimization problems.
EDA

• Estimation of distribution algorithms (EDA) are optimization methods that combine ideas from evolutionary
computation, machine learning, and statistics.
• Estimation of distribution algorithms are stochastic optimization algorithms that explore the space of candidate
solutions by sampling an explicit probabilistic model constructed from promising solutions found so far.
• EDAs typically work with a population of candidate solutions to the problem, starting with the population
generated according to the uniform distribution over all admissible solutions.
• The population is then scored using a fitness function.
• This fitness function gives a numerical ranking for each string, with the higher the number the better the string.
• From this ranked population, a subset of the most promising solutions are selected by the selection operator.
• An example selection operator is truncation selection with threshold t = 50%, which selects the 50% best
solutions.
EDA

• The algorithm then constructs a probabilistic model which attempts to estimate the probability
distribution of the selected solutions.
• Once the model is constructed, new solutions are generated by sampling the distribution encoded by this
model.
• These new solutions are then incorporated back into the old population, possibly replacing it entirely.
• The process is repeated until some termination criterion is met (usually when a solution of sufficient quality
is reached or when the number of iterations reaches some threshold), with each iteration of this procedure
usually referred to as one generation of the EDA.
EDA FLOWCHART
FUNCTIONS USED IN THE ALGORITHM

• The selection function's goal is to choose the most interesting solutions. For instance, truncation selection chooses a
percentage of the best solutions from the current population (P).
• The essential steps of EDA are the estimation and simulation of the search distribution, which is implemented by the
learn and sample functions.
• The learning estimates the structure and parameters of the probabilistic model (M) and the sampling is used to
generate new solutions (P’) from the probabilistic model.
• Finally, the replacement function defines the next population.
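Truncation selection, for example, can be sketched as follows (maximisation is assumed; the function name is a hypothetical helper):

```python
def truncation_selection(population, fitness, tau=0.5):
    # keep the best tau fraction of the current population (maximisation)
    k = max(1, int(len(population) * tau))
    return sorted(population, key=fitness, reverse=True)[:k]
```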
ALGORITHM

Algorithm Generic EDA pseudo-code implemented in copulaedas package, adapted from (Gonzalez-Fernandez and Soto, 2014)
1: Inputs: f, C ⊲ f is the fitness function, C includes control parameters (e.g., NP )
2: P ← initialization(C) ⊲ set initial population (seeding method)
3: if required then P ← local_optimization(P, f, C) ⊲ apply local optimization to P
4: end if
5: B ← best(P, f ) ⊲ best solution of the population
6: i ← 0 ⊲ i is the number of iterations of the method
7: while not termination_criteria(P, f, C) do
8: P ′ ← selection(P, f, C) ⊲ selected population P ′
9: M ← learn(P ′) ⊲ set probabilistic model M using a learning method
10: P ′ ← sample(M) ⊲ set sampled population from M using a sampling method
11: if required then P ′ ← local_optimization(P ′, f, C) ⊲ apply local optimization to P ′
12: end if
13: B ← best(B, P ′, f ) ⊲ update best solution (if needed)
14: P ← replacement(P, P ′, f, C) ⊲ create new population using a replacement method
15: i←i+1
16: end while
17: Output: B ⊲ best solution
EDA ALGORITHM SCHEME
TYPES OF EDAs

Different types of EDAs have been proposed based on the complexity of their probabilistic
models. They can be categorized into univariate, bivariate, and multivariate factorization EDAs.
• Univariate factorization: EDAs such as the Univariate Marginal Distribution Algorithm (UMDA)
assume that decision variables are independent and rely on univariate statistics and marginal
probability distributions.
• Bivariate factorization: EDAs such as Mutual Information Maximizing Input Clustering (MIMIC)
and the Bivariate Marginal Distribution Algorithm (BMDA) allow for pairwise dependencies between
variables and model these dependencies through bivariate distributions.
• Multivariate factorization: EDAs such as the Extended Compact Genetic Algorithm (ECGA) and
the Bayesian Optimization Algorithm (BOA) can model higher-order dependencies between
variables and factorize the joint probability distribution into multiple components.
ADVANTAGES OF EDA

1. Global optimization: EDA can effectively explore large solution spaces and search for global optima in complex
optimization problems.
2. Model-based approach: EDA builds and updates probabilistic models of the solution space, allowing it to capture the
underlying structure of the problem, which can lead to more efficient search and better solutions.
3. Adaptive search: EDA can adapt the search strategy by continuously updating the probability models as new
information is gathered during the optimization process.
4. Parameter tuning: EDA algorithms often have fewer user-defined parameters compared to other optimization
techniques, making them more accessible to users who may not have extensive domain knowledge.
5. Versatility: EDA can be applied to a wide range of optimization problems, including combinatorial, continuous, and
mixed-integer problems, making it a versatile approach.
6. Robustness: EDA is less sensitive to issues like local optima, and it can explore multiple promising regions of the
solution space simultaneously.
7. Parallelism: EDA can be parallelized effectively, which can speed up the optimization process and handle larger
problem instances.
DISADVANTAGES OF EDA

1. Computational complexity: Building and maintaining probabilistic models can be computationally expensive,
especially in high-dimensional spaces, which may slow down the optimization process.
2. Convergence issues: EDA algorithms may not converge to the optimal solution in some cases, and it can be
challenging to determine when to stop the search.
3. Model estimation errors: The quality of the probabilistic models depends on the quality and quantity of the data
used for estimation. Errors in model estimation can lead to suboptimal solutions.
4. Limited applicability: EDA may not be the best choice for simple, well-structured problems where other
optimization methods, such as gradient-based methods, may be more efficient.
5. High memory requirements: EDA algorithms often require substantial memory to store and update probabilistic
models, which can be a limitation for resource-constrained environments.
6. Lack of domain-specific knowledge: EDA may not be suitable for problems where domain-specific knowledge is
crucial, as it primarily relies on data-driven modeling.
7. Sensitivity to parameter settings: While EDA typically has fewer user-defined parameters than some other
algorithms, the choice of parameters, such as population size and model selection, can still impact performance and
may require some tuning.
EXAMPLE: SOLVING ONEMAX WITH A SIMPLE EDA

Let us illustrate the basic EDA procedure with an example of a simple EDA solving the onemax problem.
In onemax, candidate solutions are represented as binary strings of fixed length n > 0. The objective is to
maximize onemax, defined as the sum of the bits in the input binary string:

f_onemax(x1, x2, x3, …, xn) = x1 + x2 + … + xn

• The quality of a candidate solution improves with the number of 1’s in the input string and the optimum is string
of all 1’s.
• In this example our population size is set to N = 6, with n = 5 binary variables per solution.
• Truncation selection with threshold τ = 50% is used to select the subset of the most promising solutions
(the 50% best solutions are selected).
• To estimate the probability distribution of these promising solutions, a probability vector is used that stores
the probability of a 1 in each position of the solution strings.
SOLVING ONEMAX WITH A SIMPLE EDA

• The probability vector provides a fast and efficient model for solving the onemax problem and many other
optimization problems, mainly due to the fact that it is based on the assumption that all problem variables are
independent.
• To learn a probability vector, the probability pi of a 1 in each position i is set to the proportion of selected
solutions containing a 1 in this position.
• To generate a new binary string from the probability vector, for each position i, a 1 is generated in this position
with probability pi. For example, if p3 = 0.6, we generate a 1 in the third position of a new candidate solution with
the probability of 60%.
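The learn/sample loop described above can be collected into a short runnable sketch of a univariate, probability-vector EDA (UMDA-style) for onemax; the population size, truncation threshold, generation count and seed are illustrative assumptions:

```python
import random

def umda_onemax(n=5, pop_size=20, tau=0.5, generations=30, seed=1):
    """Simple univariate EDA (probability-vector model) maximising onemax."""
    rng = random.Random(seed)
    f = lambda s: sum(s)                       # onemax fitness
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=f)
    for _ in range(generations):
        pop.sort(key=f, reverse=True)
        selected = pop[: int(pop_size * tau)]  # truncation selection
        # learn: p_i = proportion of selected solutions with a 1 in position i
        p = [sum(s[i] for s in selected) / len(selected) for i in range(n)]
        # sample: generate a 1 in position i with probability p_i
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
        best = max(pop + [best], key=f)        # track best-ever solution
    return best
```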
SOLVING ONEMAX WITH A SIMPLE EDA
Two generations of the simple EDA on onemax (fitness values in parentheses):

Generation 1
• Initial population: 01011 (3), 11011 (4), 01010 (2), 10111 (4), 10000 (1), 10010 (2)
• Probability vector learned from the selected solutions: 0.6, 0.6, 0.3, 1.0, 1.0
• Offspring population sampled from it: 10011 (3), 11111 (5), 01011 (3), 10111 (4), 01011 (3), 11011 (4)

Generation 2
• Probability vector learned from the selected offspring: 1.0, 0.6, 0.6, 1.0, 1.0
• Offspring population sampled from it: 11111 (5), 10011 (3), 11111 (5), 10111 (4), …
SOLVING ONEMAX WITH A SIMPLE EDA

• It is clear from the first generation that the procedure is having a positive effect. The offspring
population already contains significantly more 1s than the original population and also includes
several copies of the global optimum 11111.
• In addition, the probability of a 1 in each position has increased; consequently, the
probability of generating the global optimum has increased.
• The second generation leads to a probability vector that is even more strongly biased towards the
global optimum; if the simulation were continued for one more generation, the probability
vector would generate only the global optimum.
• The learning and sampling of the probabilistic model provides a mechanism for both
1. improving the quality of new candidate solutions (under certain assumptions), and
2. facilitating exploration of the set of admissible solutions.
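The full loop described above (evaluate, select, learn the probability vector, sample) can be sketched as a minimal UMDA-style EDA in Python. This is an illustrative sketch, not a definitive implementation; the parameter values and names are our own:

```python
import random

def simple_eda(n=5, pop_size=6, tau=0.5, generations=20, seed=1):
    """Minimal EDA for onemax: truncation selection + probability vector."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the best tau fraction of the population.
        pop.sort(key=sum, reverse=True)
        selected = pop[:max(1, int(tau * pop_size))]
        # Learn the probability vector from the selected solutions.
        p = [sum(s[i] for s in selected) / len(selected) for i in range(n)]
        # Sample a new population from the model.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
    return max(pop, key=sum)

best = simple_eda()
print(best, sum(best))
```

On most runs the probability vector quickly biases towards all 1s, mirroring the hand-worked example above.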
PROBABILITY VECTOR ON ONEMAX
ADVANTAGES AND DISADVANTAGES

Advantages:
1. Global Search: EDAs are capable of conducting global search in complex solution spaces, making them suitable for a
wide range of optimization problems.
2. Adaptation: They can adapt to the problem's characteristics over time, allowing them to efficiently explore and
exploit the search space.
3. No Need for Gradients: Unlike gradient-based optimization methods, EDAs do not require gradients of the objective
function, making them applicable to non-differentiable or noisy functions.
4. Versatility: EDAs can be used for both continuous and discrete optimization problems, making them versatile in
various domains.
5. Scalability: They can be parallelized and applied to high-dimensional problems, which is crucial for real-world
applications in engineering, finance, and other fields.
6. Exploration and Exploitation: EDAs maintain a balance between exploration (searching for new solutions) and
exploitation (improving promising solutions), leading to efficient convergence.
ADVANTAGES AND DISADVANTAGES
Disadvantages:
1. Computational Cost: EDAs can be computationally expensive, especially when dealing with large-scale
optimization problems due to the need for sampling and modeling the distribution.
2. Complexity: Implementing EDAs can be more complex and time-consuming compared to simpler optimization
algorithms like gradient descent.
3. Parameter Tuning: EDAs often require the tuning of various parameters, such as population size and selection
criteria, which can be challenging and time-consuming.
4. Convergence: In some cases, EDAs might have slower convergence rates compared to more specialized
optimization algorithms tailored to a specific problem.
5. Limited Success for Certain Problems: While versatile, EDAs may not always outperform other optimization
methods for certain types of problems, especially if the problem structure is well-understood and can be
exploited effectively by specialized algorithms.
6. Sensitivity to Initialization: The performance of EDAs can be sensitive to the initial population and parameter
settings, requiring careful setup.
VARIANTS OF EDA

1. Univariate Marginal Distribution Algorithm (UMDA): UMDA is one of the simplest EDAs. It models the probability
distribution of each variable independently. It creates a probabilistic model for each variable by calculating the
marginal distribution.
2. Multivariate Normal Distribution Algorithm (MNDA): MNDA assumes that the probability distribution of the
solution space follows a multivariate normal distribution. It estimates the mean and covariance matrix to
represent the distribution and generate new solutions accordingly.
3. Bayesian Optimization Algorithm (BOA): BOA employs Bayesian networks to model the dependencies among
variables in the problem. It learns the structure and parameters of the Bayesian network to represent the
probabilistic model and guide the search.
4. Compact Genetic Algorithm (cGA): cGA is a type of EDA that represents the population compactly as a single
probability vector over binary variables. It updates this vector to mimic the behavior of a simple genetic algorithm
with uniform crossover, while using far less memory than an explicit population.
5. Improved Estimation of Distribution Algorithm (IEDA): IEDA is a variation of EDAs that incorporates techniques
like niching, local search, or other enhancements to improve the overall performance of the algorithm in various
optimization tasks.
6. Iterated Local Search with EDA (ILS-EDA): ILS-EDA combines Estimation of Distribution Algorithms with Iterated
Local Search techniques to create a hybrid algorithm that leverages the strengths of both approaches.
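As a concrete illustration of variant 4 above, the cGA update rule on onemax can be sketched in Python (a minimal sketch; parameter names and values are our own assumptions):

```python
import random

def cga(n=8, virtual_pop=50, steps=2000, seed=3):
    """Compact GA on onemax: a probability vector simulates a population."""
    rng = random.Random(seed)
    p = [0.5] * n
    sample = lambda: [1 if rng.random() < pi else 0 for pi in p]
    for _ in range(steps):
        # Generate two candidates from the model and let them compete.
        a, b = sample(), sample()
        winner, loser = (a, b) if sum(a) >= sum(b) else (b, a)
        for i in range(n):
            if winner[i] != loser[i]:
                # Shift p[i] toward the winner's bit by 1/virtual_pop.
                step = 1.0 / virtual_pop
                p[i] += step if winner[i] == 1 else -step
                p[i] = min(1.0, max(0.0, p[i]))
    return p

p = cga()  # on onemax, the vector drifts toward all 1s
```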
APPLICATIONS

1. Combinatorial Optimization: To solve combinatorial optimization problems like the Traveling
Salesman Problem (TSP), Job Scheduling, and Vehicle Routing.
2. Function Optimization: EDAs can be applied to optimize functions with continuous or discrete
variables. This is useful in mathematical modeling, engineering design, and machine learning, where
finding the best parameter settings is crucial.
3. Machine Learning: Used for feature selection, hyperparameter tuning, and model selection in
machine learning. They help automate the process of finding the most suitable machine learning model
and its configuration for a given problem.
4. Structural Design: In fields such as structural engineering and architecture, EDAs can assist in the
design of complex structures by optimizing material usage and structural integrity.
5. Image Segmentation: Can be applied to image processing tasks, such as image segmentation, by
optimizing the selection of image regions based on certain criteria.
6. Bioinformatics: Can be used for protein structure prediction, gene selection, and other
bioinformatics applications. They help in finding optimal solutions in complex biological data analysis.
7. Network Design: Can help in designing efficient network topologies and routing strategies in
communication networks.
8. Robotics: Can be employed for robot path planning, helping robots navigate in complex
environments and find optimal paths.
9. Game Playing: Used in game playing, particularly in game tree search algorithms, where EDAs can help
generate and evaluate potential moves efficiently.
10. Data Clustering: Can be applied to cluster data points into groups with similar characteristics,
making them useful in data mining and pattern recognition tasks.
11. Global Optimization: Used for global optimization problems, where the goal is to find the global
optimum of the objective function rather than a local one.
SUMMARY

In this session, the concepts of EDA have been described:


1. Define EDA algorithm and its properties
2. Explain the EDA algorithm with flow chart.
3. List out the advantages and disadvantages of EDA algorithm.
4. List out the Applications of EDA algorithm.
SELF-ASSESSMENT QUESTIONS

Q1. What is the primary goal of the Estimation of Distribution Algorithm (EDA)?

a) To find the optimal solution using a genetic algorithm


b) To estimate the probability distribution of promising solutions
c) To perform local search on a given solution space
d) To generate random solutions and evaluate them

Q2. In EDA, what does the term "marginal distribution" refer to?

a) The distribution of all solutions in the population


b) The distribution of the best solution in the population
c) The distribution of a single variable within a solution
d) The distribution of solutions after mutation and crossover

CREATED BY K.
TERMINAL QUESTIONS

Q1. (Subset Sum) Given a set of integers and a weight, find a subset of the set so that the sum of its
elements equals the weight.
e.g. given {1,3,5,6,8,10}, W=14
solutions: {1,3,10},{3,5,6},{6,8},{1,5,8}

Q2. Explain in detail the EDA algorithm with flow chart and its components.

Q3. Compare and contrast on Genetic Algorithm with EDA algorithm.

Q4. (Optimization) Maximize the function f(x) = x² with 0 ≤ x ≤ 10 using the EDA algorithm.
a) Find the population of the solution.
b) Explain the result for 2 iterations considering three initial random parents, (0,1,1,0), (1,1,0,0),
(1,0,0,1)

REFERENCES
Reference Books:
1. Chapter 5, Modern Optimization with R, Paulo Cortez, Second Edition, Springer, 2021
2. Nature-Inspired Algorithms and Applications, Anupriya Jain, Dinesh Goyal, S. Balamurugan, Sachin Sharma,
Seema Sharma, Sonia Duggal, Second Edition, Wiley
3. OPTIMIZATION Algorithms and Applications, Rajesh Kumar Arora, Second Edition, Taylor & Francis Group,
LLC

Sites and Web links:

4. https://www.sidshakya.com/tutorials/
5. https://en.wikipedia.org/wiki/Estimation_of_distribution_algorithm
6. https://www.semanticscholar.org/paper/Estimation-of-Distribution-Algorithms%3A-A-New-for-Bengoetxea-Larra%C3%B1aga/9322d11835043d21ce5f812d03da09a9800f8918

THANK YOU

Team – BDO Even Semester


2023- 24

