Calculus of Variations
3.1 Introduction
The calculus of variations deals with functionals, which are, simply put, functions of a function. For example, its methods can be used to find an unknown function that minimizes or maximizes a given functional. Many of these methods were developed over two hundred years ago by Euler (1707-1783), Lagrange (1736-1813), and others, and the subject continues to supply important techniques to many branches of engineering and physics.
3.2 Functionals
As we saw in the last section, a great variety of physical problems deal with functionals, which are functions of a function. We are familiar with the definition of a function: a function can be regarded as a rule that maps one number (or a set of numbers) to another value. For example,

\[ f(x) = x^2 + 2x \]

is a function; it maps x = 2 to f(2) = 8, x = 3 to f(3) = 15, and so on. On the other hand, a
functional is a mapping from a function (or a set of functions) to a value. That is, a functional
is a rule that assigns a real number to each function y(x) in a well-defined class. Like a function,
a functional is a rule, but its domain is some set of functions rather than a set of real numbers.
For a fixed value of x, we can consider F[y(x)] as a functional. For example,

\[ F[y(x)] = 3y^2 - y + 10, \quad \text{where } y(x) = e^x + \cos x - x \text{ and } x = \pi, \]
is a functional. Another class of functional has the form
\[ J[y] = \int_a^b y(x)\, dx \]
Here J gives the area under the curve y = y(x). Hence J is not a function of x and its value will
be a number. However, this number depends on the particular form of y(x) and hence J[y] is a
functional. For a = 0 and b = π, the value of the functional when y(x) = x is

\[ J[y] = \int_0^\pi x\, dx = \frac{\pi^2}{2} \approx 4.93 \]

and when y(x) = sin x,

\[ J[y] = \int_0^\pi \sin x\, dx = 2 \]
Therefore the given functional J[y] maps y(x) = x to π²/2 and maps y(x) = sin x to 2. Because
an integral maps a function to a number, a functional usually involves an integral. The following
form of functional often appears in the calculus of variations,
\[ J[y] = \int_a^b F(x, y, y')\, dx \tag{3.1} \]
The fundamental problem of the calculus of variations is to find the extremum (maximum or
minimum) of the functional (3.1).
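Before turning to that problem, the two functional values computed above can be reproduced numerically; a functional really is just a rule that eats a function and returns a number. The sketch below approximates J[y] = ∫_a^b y(x) dx with a midpoint rule; the function name `J` and the step count are my own choices, not from the text.

```python
from math import sin, pi

def J(y, a=0.0, b=pi, n=20_000):
    """Midpoint-rule approximation of the functional J[y] = integral of y(x) over [a, b]."""
    h = (b - a) / n
    return sum(y(a + (i + 0.5) * h) for i in range(n)) * h

# The functional maps each *function* to a *number*:
area_line = J(lambda x: x)       # y(x) = x      ->  pi^2 / 2, about 4.93
area_sine = J(lambda x: sin(x))  # y(x) = sin x  ->  2
```

Passing a different function to `J` changes the output number, which is exactly the dependence the text describes.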
This is illustrated in figure 3.1, where y(x) is shown in red and the varied curve Y(x) = y(x) + δy(x) is shown in blue. By definition, the total change in the functional is given by

\[ \Delta F[y] = F[y(x) + \delta y(x)] - F[y(x)] = F[Y(x)] - F[y(x)] \tag{3.6} \]

Let η(x) be an arbitrary differentiable function that vanishes at the boundaries of the domain, i.e., η(a) = η(b) = 0.

[Figure 3.1: The curve y(x), the varied curve Y(x), and the variation δy(x).]
Let us now define what is called the Gâteaux derivative or Gâteaux variation in the direction of
η (x). It is denoted by δ F[y; η ] and is defined as
\[ \delta F[y; \eta] = \lim_{\varepsilon \to 0} \frac{\Delta F}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{F[y + \varepsilon\eta] - F[y]}{\varepsilon} = \left. \frac{d}{d\varepsilon} F[y + \varepsilon\eta] \right|_{\varepsilon = 0} \tag{3.14} \]
Note that the first variation and the Gâteaux variation are related through the parameter ε, i.e., δF^(fv) = ε δF^(gv), where we denote the first variation by δF^(fv) and the Gâteaux variation by δF^(gv). Unfortunately, in the literature these two variations are denoted by the same symbol δF.
Let us look at the meaning of η and ε geometrically. Since y is the unknown function to
be found so as to extremize a functional, we want to see what happens to the functional F[y]
when we perturb this function slightly. For this, we take another function η and multiply it by
a small number ε . We add εη to y and look at the value of F[y + εη ]. That is, we look at
the perturbed value of the functional due to perturbation εη . This is the shaded area shown in
figure 3.2. Now as ε → 0, we consider the limit of the shaded area divided by ε . If this limit
exists, such a limit is called the Gâteaux variation of F[y] at y for an arbitrary but fixed function
η.
[Figure 3.2: The curve y(x), the varied curve y + εη, and the perturbation η(x) on the interval [a, b].]
Note that choosing a different η gives a different set of varied curves and hence a different
variation. Hence δ F[y; η ] depends on which function η is chosen to define the increment δ y
and this dependence is explicitly shown in the notation.
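The limiting process just described can be imitated numerically: pick a concrete functional, a curve y, and a perturbation η, then compare the difference quotient at small ε with the exact variation. A sketch under my own choice of functional, J[y] = ∫₀^π y² dx (all names below are mine, not from the text):

```python
from math import sin, pi

def integrate(f, a=0.0, b=pi, n=20_000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def J(y):
    # Illustrative functional: J[y] = integral of y(x)^2 over [0, pi]
    return integrate(lambda x: y(x) ** 2)

def gateaux(J, y, eta, eps=1e-6):
    """Central-difference estimate of dJ[y; eta] = d/de J[y + e*eta] at e = 0."""
    plus = J(lambda x: y(x) + eps * eta(x))
    minus = J(lambda x: y(x) - eps * eta(x))
    return (plus - minus) / (2 * eps)

y = lambda x: x          # the curve being perturbed
eta = lambda x: sin(x)   # vanishes at both ends of [0, pi]

numeric = gateaux(J, y, eta)
# For this J, the variation is dJ[y; eta] = integral of 2*y*eta dx = 2*pi here
exact = integrate(lambda x: 2 * x * sin(x))
```

Choosing a different η changes both `numeric` and `exact` together, which is the dependence on η made explicit in the notation δF[y; η].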
For a functional F[x, y, y′, y′′], the varied curve and its derivatives are

\[ Y = y + \varepsilon\eta, \qquad Y' = y' + \varepsilon\eta', \qquad Y'' = y'' + \varepsilon\eta'' \]
Formula (3.17) for δF has the same form as the above formula for dF. Thus the variation of F is given by the same formula as the differential of F, if x is considered to be fixed. It should be noted that the differential of a function is the first-order approximation to the change in that function along a particular curve, while the variation of a functional is the first-order approximation to the change in the functional from one curve to another.
We mention here that the sum of terms in ε and ε 2 is called the second variation of F and
the sum of terms in ε , ε 2 , and ε 3 is called the third variation of F. However, when the term
variation is used alone, the first variation is meant.
The variational operator δ follows the rules of the differential operator d of calculus. Let F₁ and F₂ be any continuous and differentiable functionals. Then we have the following results:
• δ(Fⁿ) = n F^{n−1} δF
• δ(F₁ + F₂) = δF₁ + δF₂
• δ(F₁ F₂) = F₁ δF₂ + F₂ δF₁
• δ(F₁/F₂) = (F₂ δF₁ − F₁ δF₂) / F₂²
It is easy to show that the operators d/dx and δ are commutative. The commutative property may be written mathematically as

\[ \frac{d}{dx}(\delta y) = \delta\left( \frac{dy}{dx} \right) \]

The proof is as follows:

\[ \frac{d}{dx}(\delta y) = \frac{d}{dx}(\varepsilon\eta) = \varepsilon \frac{d\eta}{dx} = \varepsilon\eta' = \delta y' = \delta\left( \frac{dy}{dx} \right) \]
That is, the differential of the variation of a function is identical to the variation of the differential
of the same function.
Another commutative property is the one that states that the variation of the integral of a
functional F is the same as the integral of the variation of the same functional, or mathematically
\[ \delta \int F\, dx = \int \delta F\, dx \]
Note that the two integrals must be evaluated between the same two limits.
First variation of the functional \( J[y] = \int_a^b F(x, y, y', y'')\, dx \)

where

\[ J[y + \varepsilon\eta] = \int_a^b F[x,\, y + \varepsilon\eta,\, y' + \varepsilon\eta',\, y'' + \varepsilon\eta'']\, dx \]
Therefore, the change in functional is given by
\[ \Delta J = \int_a^b F[x,\, y + \varepsilon\eta,\, y' + \varepsilon\eta',\, y'' + \varepsilon\eta'']\, dx - \int_a^b F(x, y, y', y'')\, dx \tag{3.19} \]
As previously defined, the Gâteaux derivative or Gâteaux variation in the direction of η (x) is
given by
\[ \delta J[y; \eta] = \lim_{\varepsilon \to 0} \frac{\Delta J}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{J[y + \varepsilon\eta] - J[y]}{\varepsilon} = \left. \frac{d}{d\varepsilon} J[y + \varepsilon\eta] \right|_{\varepsilon = 0} \tag{3.20} \]
Example 3.1
where F(x, y, y′, y′′) is some given function and A is an admissible class of functions. The integrand F is known as the Lagrangian for the variational problem. We assume that the Lagrangian is continuously differentiable in each of its four arguments x, y, y′, and y′′.
Very often, we encounter variational problems in which the integrand F takes the simple form
F(x, y, y′ ) and hence have the functional in the form
\[ J[y] = \int_a^b F(x, y, y')\, dx, \qquad y \in A \tag{3.22} \]
A function f has a local minimum at a point x = x₀ in (a, b) if f(x₀) < f(x) for all x near x₀ on both sides of x = x₀; in other words, f has a local minimum at x₀ if f(x₀) < f(x) for all x satisfying |x − x₀| < δ for some δ > 0. If f has a local minimum at x₀ in (a, b) and f is differentiable in (a, b), then it is well known that

\[ f'(x_0) = 0 \tag{3.23a} \]
Similar statements can be made if f has a local maximum at x0 . The aforementioned condition
(3.23a) is called a necessary condition for a local minimum; that is, if f has a local minimum
at x0 , then (3.23a) necessarily follows. Equation (3.23a) is not sufficient for a local minimum,
however; that is, if (3.23a) holds, it does not guarantee that x0 provides an actual minimum.
The following conditions are sufficient conditions for f to have a local minimum at x0
provided f ′′ exists. Again, similar conditions can be formulated for local maxima. If (3.23b)
holds, we say f is stationary at x0 and that x0 is an extreme point for f .
\[ \delta J[\hat{y}; \eta] = 0 \tag{3.24} \]
Example 3.2
Consider the functional \( J[y] = \int_0^1 \left[ 1 + y'(x)^2 \right] dx \) with \( \hat{y} = x \) and \( \eta = x(1 - x) \). Then

\[ J[\hat{y} + \varepsilon\eta] = \int_0^1 \left[ 1 + \left( \hat{y}'(x) + \varepsilon\eta'(x) \right)^2 \right] dx = \int_0^1 \left[ 1 + \left( 1 + \varepsilon(1 - 2x) \right)^2 \right] dx = 2 + \frac{\varepsilon^2}{3} \]

Then the derivative of the functional is

\[ \frac{d}{d\varepsilon} J[\hat{y} + \varepsilon\eta] = \frac{2\varepsilon}{3} \]

Evaluating this derivative at ε = 0 gives the Gâteaux derivative

\[ \delta J[\hat{y}; \eta] = \left. \frac{d}{d\varepsilon} J[\hat{y} + \varepsilon\eta] \right|_{\varepsilon = 0} = 0 \]
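The closed-form result J[ŷ + εη] = 2 + ε²/3 is easy to confirm numerically. A minimal sketch (function names mine) using the same Lagrangian F = 1 + y′², which only needs the derivative of the perturbed curve:

```python
def J(yprime, n=10_000):
    """Midpoint rule for J[y] = integral of (1 + y'(x)^2) over [0, 1]."""
    h = 1.0 / n
    return sum(1.0 + yprime((i + 0.5) * h) ** 2 for i in range(n)) * h

def perturbed(eps):
    # With y_hat = x and eta = x(1 - x):  y_hat' + eps*eta' = 1 + eps*(1 - 2x)
    return J(lambda x: 1.0 + eps * (1.0 - 2.0 * x))

# perturbed(eps) should agree with 2 + eps**2 / 3 for every eps, so the
# derivative with respect to eps vanishes at eps = 0 (the Gateaux variation).
```

The ε² dependence means ε = 0 is a stationary point of the scalar function ε ↦ J[ŷ + εη], exactly as the example concludes.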
Example 3.3
Now take \( \hat{y} = x \) and \( \eta = \sin x \) on the interval \( [0, 2\pi] \).
Therefore, from (3.27) the necessary condition for the functional J[y] to have an extremum at y
is given by
\[ \int_a^b \left( \frac{\partial F}{\partial y}\, \eta + \frac{\partial F}{\partial y'}\, \eta' \right) dx = 0 \tag{3.29} \]
for all η ∈ C2 [a, b] with η (a) = η (b) = 0.
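Condition (3.29) can be probed numerically: pick a Lagrangian, take the solution of its Euler–Lagrange equation, and check that the integral in (3.29) vanishes for an admissible η. A sketch under my own choice F = y² + y′², whose extremal with y(0) = 0, y(1) = 1 is y = sinh x / sinh 1 (the EL equation here is y″ = y); all names are mine:

```python
from math import sinh, cosh

def first_variation(y, yp, eta, etap, n=20_000):
    """Midpoint rule for the integral of (dF/dy * eta + dF/dy' * eta') over [0, 1]
    with F = y^2 + y'^2, i.e. dF/dy = 2y and dF/dy' = 2y'."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (2 * y(x) * eta(x) + 2 * yp(x) * etap(x)) * h
    return total

s = sinh(1.0)
y = lambda x: sinh(x) / s     # extremal of the chosen functional
yp = lambda x: cosh(x) / s
eta = lambda x: x * (1 - x)   # admissible: vanishes at both endpoints
etap = lambda x: 1 - 2 * x

residual = first_variation(y, yp, eta, etap)  # near zero on the extremal
off = first_variation(lambda x: x, lambda x: 1.0, eta, etap)  # nonzero off it
```

On the extremal the integral is (numerically) zero for this η, while the non-extremal curve y = x leaves a clearly nonzero residual.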
Solutions of the Euler–Lagrange equation are not necessarily local minima. It is a second-order ordinary differential equation whose solution is required to satisfy two conditions at the boundaries of the domain. Such boundary value problems may have no solution, a unique solution, or multiple solutions, depending on the situation. A case with multiple solutions implies that more than one path from point (a, α) to point (b, β) satisfies the Euler–Lagrange equation. However, not all of these paths will necessarily minimize the functional J[y]. A second important aspect of the Euler–Lagrange equation is related to our assumption that the curve y(x) ∈ C²[a, b]. Indeed, our considerations were restricted to such smooth functions. However, the actual path that extremizes the integral might be one with a corner or a kink. Such paths are not relevant for the use of the Euler–Lagrange equation in Newtonian mechanics, but they are often the true solutions in other problems of the calculus of variations, as we have seen in the case of the physics of soap films.
It may be worthwhile to note that if y is treated as the independent variable and x as the dependent variable, then the Euler–Lagrange equation (3.32a) takes the form

\[ \frac{\partial F}{\partial x} - \frac{d}{dy}\left( \frac{\partial F}{\partial x'} \right) = 0, \qquad y \in [\alpha, \beta] \tag{3.32b} \]
But we have

\[ \frac{d}{dx}\left( y' \frac{\partial F}{\partial y'} \right) = y'' \frac{\partial F}{\partial y'} + y' \frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) \tag{3.34} \]

Subtracting (3.34) from (3.33), we have

\[ \frac{dF}{dx} - \frac{d}{dx}\left( y' \frac{\partial F}{\partial y'} \right) = \frac{\partial F}{\partial x} + y' \frac{\partial F}{\partial y} - y' \frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) \]

Rewriting the above equation gives

\[ \frac{d}{dx}\left( F - y' \frac{\partial F}{\partial y'} \right) - \frac{\partial F}{\partial x} = y' \left( \frac{\partial F}{\partial y} - \frac{d}{dx} \frac{\partial F}{\partial y'} \right) \]

By the Euler–Lagrange equation (3.32a), the right-hand side of the above equation is zero. Thus,

\[ \frac{d}{dx}\left( F - y' \frac{\partial F}{\partial y'} \right) - \frac{\partial F}{\partial x} = 0 \tag{3.35} \]
Equation (3.35) is another useful form of the Euler–Lagrange equation.
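When F has no explicit x-dependence, (3.35) says that the quantity F − y′ ∂F/∂y′ is constant along any extremal. A quick numerical sanity check with F = y′² − y² (my own choice), whose Euler–Lagrange equation y″ = −y has the extremal y = sin x; along it F − y′ ∂F/∂y′ = −y′² − y² = −1 everywhere:

```python
from math import sin, cos

def beltrami(x):
    """F - y' * dF/dy' for F = y'^2 - y^2 evaluated along the extremal y = sin x."""
    y, yp = sin(x), cos(x)
    F = yp ** 2 - y ** 2
    dF_dyp = 2 * yp
    return F - yp * dF_dyp  # equals -(yp**2 + y**2) = -1 on this extremal

samples = [beltrami(0.1 * k) for k in range(30)]  # constant along the curve
```

The constancy of this first integral is what makes (3.35) so useful in practice: it replaces a second-order ODE by a first-order one.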
Case III. If F is independent of y′, then ∂F/∂y′ = 0 and the Euler–Lagrange equation (3.32a) reduces to

\[ \frac{\partial F}{\partial y} = 0 \]

This says that F does not depend on y either, so F = F(x), a function of x alone.
The variational problem is to find the plane curve whose length is shortest, i.e., to determine the function y(x) that minimizes the functional J[y]. The curve y(x) that minimizes J[y] is determined by solving the Euler–Lagrange equation (3.32a)

\[ \frac{\partial F}{\partial y} - \frac{d}{dx}\left( \frac{\partial F}{\partial y'} \right) = 0 \]

In the present problem

\[ F = \sqrt{1 + y'(x)^2} \]

which is a special case in which F is independent of x and y. Since ∂F/∂y = 0, the Euler–Lagrange equation reduces to d/dx(∂F/∂y′) = 0, i.e.,

\[ \frac{\partial F}{\partial y'} = k \]

where k is a constant.
Therefore,

\[ y' = k \sqrt{1 + y'^2} \]

Solving for y′, we obtain

\[ y' = \sqrt{\frac{k^2}{1 - k^2}} = m \]
Integrating, y = mx + c, where the constants m and c are to be found using the boundary conditions y(x₁) = y₁ and y(x₂) = y₂. Thus, the straight line joining the two points P(x₁, y₁) and Q(x₂, y₂),

\[ y = \frac{y_2 - y_1}{x_2 - x_1}\, x + \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1} \]

is the curve with the shortest length.
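It is instructive to check numerically that bending the extremal only increases the functional. The sketch below (setup mine) compares the length of the straight line from (0, 0) to (1, 1) with a perturbed curve y = x + 0.5·x(1 − x) that keeps the endpoints fixed:

```python
from math import sqrt

def arc_length(yprime, a=0.0, b=1.0, n=20_000):
    """Midpoint rule for L[y] = integral of sqrt(1 + y'(x)^2) over [a, b]."""
    h = (b - a) / n
    return sum(sqrt(1.0 + yprime(a + (i + 0.5) * h) ** 2) for i in range(n)) * h

straight = arc_length(lambda x: 1.0)                   # y = x, length sqrt(2)
bent = arc_length(lambda x: 1.0 + 0.5 * (1 - 2 * x))   # y = x + 0.5 x(1 - x)
```

Any such admissible perturbation leaves `straight` at √2 and pushes `bent` above it, consistent with the straight line being the minimizer.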
Let P(x1 , y1 ) and Q(x2 , y2 ) be two points on a vertical plane. Consider a curved path connecting
these points. We allow a particle, without friction, to slide down this path under the influence
of gravity. The question here is: what is the shape of the curve that allows the particle to complete the journey in the shortest possible time? Clearly, the shortest path from point P to point Q is the straight line that connects the two points. However, along the straight line the acceleration is constant and not necessarily optimal. Naive guesses for the path's optimal shape, including a straight line, a circular arc, a parabola, or a catenary, are wrong.
In order to calculate the optimal curve we set up a two-dimensional Cartesian coordinate
system on the vertical plane that contains the two points P and Q as shown in figure 3.5. Our
goal is to find the path that minimizes the time it takes for an object to move from point P to
point Q.
Figure 3.5: A particle sliding down a curved path.
From figure 3.5 we see that at any point c(x, y) on the curve y(x), the gravitational force vector F decomposes into a component Ft tangent and a component Fn normal to the curve at c. The component Fn does nothing to move the particle along the path; only the component Ft has any effect. The magnitude of F is the same at every point on the curve (F = mg, where m is the mass of the particle and g is the gravitational acceleration), but Fn and Ft depend on the steepness of the curve at c. The steeper the curve, the larger Ft is, and the faster the particle moves. So it would be better if the path close to point P were steeper, so that the velocity of the object increases rapidly, and then flattened out towards point Q. Such a curve is certainly longer than the straight line connecting the end points, but the extra speed that the particle develops just as it is released more than makes up for the extra distance it must travel, and it arrives at Q in less time than it would along a straight line. The curve along which the particle takes the least time to go from P to Q is called the brachistochrone (from the Greek words for shortest time). This famous problem, known as the Brachistochrone Problem, was posed by Johann Bernoulli (1667-1748) in 1696. The problem was solved by Johann Bernoulli, his older brother Jakob Bernoulli, Newton, and L'Hospital.
Let us begin our own study of the problem by deriving a formula relating the choice of the curve y to the time required for a particle to fall from P to Q. The instantaneous velocity of the particle along the curve is v = ds/dt, where s denotes the arc length. Therefore,

\[ dt = \frac{ds}{v} = \frac{\sqrt{dx^2 + dy^2}}{v} = \frac{1}{v} \sqrt{1 + y'(x)^2}\, dx \tag{3.44} \]
Let τ be the time of descent from P to Q along the curve y = y(x). Then,

\[ \tau = \int_0^\tau dt = \int_0^S \frac{ds}{v} \tag{3.45} \]

where S is the total arc length of the curve. If the origin of the coordinate system is taken as the starting point P, we have, using (3.44),

\[ \tau = \int_0^{x_2} \frac{\sqrt{1 + y'(x)^2}}{v}\, dx \tag{3.46} \]
To obtain an expression for v we use the fact that energy is conserved throughout the motion. Thus, the total energy at any time t must be the same as the total energy at time zero (corresponding to location P), which we may take to be zero; that is, since y is measured downward,

\[ \frac{1}{2} m v^2 + mg(-y) = 0 \]

Solving for v gives \( v = \sqrt{2gy} \). Therefore the time required for the particle to descend is

\[ \tau[y] = \frac{1}{\sqrt{2g}} \int_0^{x_2} \sqrt{\frac{1 + y'(x)^2}{y(x)}}\, dx \tag{3.47} \]
where we have explicitly noted that τ depends on the curve y(x). Equation (3.47) defines a
functional.
The Brachistochrone Problem can be stated as: find the function y(x) that minimizes the functional

\[ \tau = J[y] = \frac{1}{\sqrt{2g}} \int_0^{x_2} \sqrt{\frac{1 + y'(x)^2}{y(x)}}\, dx \tag{3.48} \]

subject to the conditions y(0) = 0 and y(x₂) = y₂ > 0. We could experiment with formula (3.48) to determine the shortest time, but clearly it would be tedious to choose candidate functions y(x) one after another and look for the shortest time.
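Such an experiment is nonetheless easy to carry out numerically if we evaluate (3.48) in parametric form, τ = (1/√(2g)) ∫ √(ẋ² + ẏ²)/√y dt. The sketch below times the straight chord against the cycloid x = a(φ − sin φ), y = a(1 − cos φ) that figure 3.6 identifies as the answer; the parametrizations and names are my own, and the chord is re-parametrized with t² so the 1/√y singularity at the release point drops out:

```python
from math import sin, cos, pi, sqrt

G = 9.81   # gravitational acceleration in m/s^2 (assumed value)
a = 1.0    # cycloid parameter; endpoints P = (0, 0), Q = (a*pi, 2*a)

def descent_time(path, n=20_000):
    """(1/sqrt(2g)) * integral over t in (0, 1) of sqrt(x'(t)^2 + y'(t)^2)/sqrt(y(t)),
    i.e. the time functional (3.48) for a parametrized curve, with y measured
    downward from the release point.  path(t) must return (x, y, dx/dt, dy/dt)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y, xd, yd = path(t)
        total += sqrt(xd * xd + yd * yd) / sqrt(y) * h
    return total / sqrt(2 * G)

def cycloid(t):
    # x = a(phi - sin phi), y = a(1 - cos phi), with phi = pi * t
    phi = pi * t
    return (a * (phi - sin(phi)), a * (1 - cos(phi)),
            a * pi * (1 - cos(phi)), a * pi * sin(phi))

def chord(t):
    # straight line from P to Q, re-parametrized via u = t^2 so that
    # sqrt(y) ~ t cancels and the integrand stays bounded near t = 0
    return (a * pi * t * t, 2 * a * t * t, 2 * a * pi * t, 4 * a * t)

t_cycloid = descent_time(cycloid)  # analytically pi * sqrt(a/g)
t_chord = descent_time(chord)      # analytically sqrt(a * (pi^2 + 4) / g)
```

With a = 1 and g = 9.81 the cycloid takes about 1.00 s against roughly 1.19 s for the chord, so the geometrically longer curve is indeed the faster one.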
First of all we note that

\[ F = \sqrt{\frac{1 + y'^2}{y}} \]

which is independent of x, and therefore we can apply the Beltrami identity (3.36)

\[ F - y' \frac{\partial F}{\partial y'} = B \]

where B is a constant. Now

\[ \frac{\partial F}{\partial y'} = \frac{1}{\sqrt{y}} \cdot \frac{y'}{\sqrt{1 + y'^2}} \]
Figure 3.6: The cycloid acts as a brachistochrone.
Another remarkable characteristic of the brachistochrone is that when two particles at rest are simultaneously released from two different points M and N of the curve, they will reach the terminal point of the curve at the same time, provided the terminal point is the lowest point on the path (see figure 3.7). Such a curve is called an isochrone or a tautochrone. This is also counterintuitive, since the particles clearly have different geometric distances to cover; however, since they are acting under gravity and the slope of the curve is different at the two locations, the particle starting from the higher location gathers a much greater speed than the particle starting at the lower location. Hence the brachistochrone problem may also be posed with a specified terminal point and a variable starting point, leading to the class of variational problems with open boundary.
[Figure 3.7: Two particles released from rest at points M and N of the tautochrone reach its lowest point at the same time.]