Differentiability 2
Alex Nita
Abstract
called the total derivative (or the Jacobi matrix), which satisfies the following limit
condition:
\[
\lim_{h \to 0} \frac{|f(a+h) - f(a) - Df(a)h|}{|h|} = 0 \tag{1.2}
\]
satisfies
\[
\lim_{h \to 0} \frac{|E(h)|}{|h|} = 0 \tag{1.4}
\]
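Remark 1.1 The expression $Df(a)h$ denotes matrix multiplication. Here, $h = (h_1, \dots, h_n)$
is a vector in $\mathbb{R}^n$ thought of as an $n \times 1$ column vector:
\[
Df(a)h =
\begin{pmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{m1} & \cdots & m_{mn} \end{pmatrix}
\begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}
=
\begin{pmatrix} m_{11}h_1 + \cdots + m_{1n}h_n \\ \vdots \\ m_{m1}h_1 + \cdots + m_{mn}h_n \end{pmatrix}
\]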
Remark 1.2 Just as in the single variable case, where $h = \Delta x = x - a$, so, too, here:
\[
h = \Delta x = x - a = \begin{pmatrix} x_1 - a_1 \\ \vdots \\ x_n - a_n \end{pmatrix}
\]
Thus, the idea is that going a distance of h away from a, to the point x, we may approximate
the value of f (x) = f (a + h) by the “tangent plane” Df (a)∆x. Referring to the notes on
Points, Vectors and Matrices, this approximation, Df (a)∆x, is literally the tangent plane
approximation in the real-valued case of m = 1, for then
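\[
\begin{aligned}
Df(a)\Delta x &= \begin{pmatrix} m_1 & \cdots & m_n \end{pmatrix} \begin{pmatrix} x_1 - a_1 \\ \vdots \\ x_n - a_n \end{pmatrix} \\
&= m_1(x_1 - a_1) + \cdots + m_n(x_n - a_n) \\
&= m_1x_1 + m_2x_2 + \cdots + m_nx_n + d
\end{aligned}
\]
where $d = -(m_1a_1 + \cdots + m_na_n)$.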
Remark 1.3 Thus to say that a function $f$ is differentiable at a point is equivalent to saying
that $f$ has a total derivative there. We shall see that $f$ may be partially differentiable, and
may even have directional derivatives in all directions, yet not be differentiable. We will
explain this further below.
2 The Directional Derivative
Remark 2.1 Notice that for each fixed nonzero $t$ the difference $f(a + tv) - f(a)$ in the
numerator is a vector in $\mathbb{R}^m$, while $1/t$ is a real number, so the quotient
$\frac{f(a+tv) - f(a)}{t}$ is actually scalar multiplication of the vector $f(a + tv) - f(a)$ by $1/t$.
Another thing to notice is that in the sum $a + tv$ we added a point to a vector. Since points
and vectors in $\mathbb{R}^n$ are algebraically indistinguishable, and we have emphasized blurring
the line between them, the sum makes sense.
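\[
\begin{aligned}
\left.\frac{\partial f}{\partial x_i}\right|_{x=a} \ \text{or} \ f_{x_i}(a) \ \text{or} \ D_i f(a)
&= \left.\frac{d}{dt}\right|_{t=0} f(a + te_i) \\
&= \lim_{t \to 0} \frac{f(a + te_i) - f(a)}{t} \\
&= \lim_{t \to 0} \frac{f(a_1, \dots, a_{i-1}, a_i + t, a_{i+1}, \dots, a_n) - f(a_1, \dots, a_i, \dots, a_n)}{t}
\end{aligned}
\tag{3.1}
\]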
Remark 3.1 The practical import of this definition will become clear in a minute. For now,
notice that the derivative $\left.\frac{d}{dt}\right|_{t=0} f(a + te_i)$ is an ordinary derivative from Calc 1. It's the
derivative of the real-valued function of a real variable $f \circ T : \mathbb{R} \to \mathbb{R}$, $(f \circ T)(t) = f(a + te_i)$,
where $T(t) = a + te_i$.
This means that all the other coordinates, which we normally treat as variables, since they
may vary, are treated here as constants. Thus, if we label the variables $x_1, \dots, x_i, \dots, x_n$,
all the other $x_j$ for $j \neq i$ are treated as constants in any expression for $f$. For example, if
$f(x, y, z) = xyz + x^2y + z^2$, in the partial derivative $\frac{\partial f}{\partial x}$ with respect to $x$ the 'variables' $y$
and $z$ are treated as constants, so we can do what we normally do when computing a Calc 1
derivative: pull the constants out. Here, for example, we'd have $\frac{\partial f}{\partial x} = yz + 2xy$.
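As a quick sanity check of the example in Remark 3.1, the following minimal Python sketch compares the closed form $yz + 2xy$ with a difference quotient in the $x$ direction. The helper name `partial_x`, the sample point and the step size are illustrative assumptions, not part of the notes, and a central difference is used in place of the one-sided quotient of (3.1) only because it is more accurate numerically.

```python
def f(x, y, z):
    # The example function from Remark 3.1.
    return x * y * z + x**2 * y + z**2


def partial_x(func, x, y, z, t=1e-6):
    """Central-difference estimate of d/dt func(x + t, y, z) at t = 0.

    y and z are held fixed, exactly as in the definition of a partial derivative.
    """
    return (func(x + t, y, z) - func(x - t, y, z)) / (2 * t)


a = (1.5, -2.0, 0.5)                      # arbitrary sample point (assumption)
numeric = partial_x(f, *a)
exact = a[1] * a[2] + 2 * a[0] * a[1]     # yz + 2xy evaluated at a

print(numeric, exact)
```

The two printed numbers should agree to many decimal places, since $f$ is only quadratic in $x$.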
4 The Relationship Between the Total and Directional
and Partial Derivatives
Nobody wants to compute an actual limit, though the limit idea is extremely important
theoretically. Luckily, we don’t have to here. The directional derivative, though defined in
terms of a limit, is in fact computable in terms of a matrix product!
The left-hand side is a limit, while the right-hand side is a matrix product, with v treated as
a column vector.
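then computing the directional derivative of $f$ at $a$ in the direction of $v$ would be easy, namely
\[
f_v(a) = Df(a)v =
\begin{pmatrix} 2 & 3 & 2 \\ 1 & 0 & 5 \end{pmatrix}
\begin{pmatrix} -1 \\ 4 \\ 2 \end{pmatrix}
=
\begin{pmatrix} 14 \\ 9 \end{pmatrix}
\]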
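The matrix product above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic: the helper `mat_vec` is a hypothetical name introduced here, and the matrix and vector are the ones displayed in the example.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


Df_a = [[2, 3, 2],
        [1, 0, 5]]      # the 2 x 3 total derivative from the example
v = [-1, 4, 2]          # the direction vector from the example

result = mat_vec(Df_a, v)
print(result)           # [14, 9]
```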
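In this case I claim that the total derivative $Df(a)$ is the matrix of partial derivatives
of the component functions $f_i$ of $f$,
\[
Df(a) =
\begin{pmatrix}
\left.\frac{\partial f_1}{\partial x_1}\right|_a & \cdots & \left.\frac{\partial f_1}{\partial x_n}\right|_a \\
\vdots & \ddots & \vdots \\
\left.\frac{\partial f_m}{\partial x_1}\right|_a & \cdots & \left.\frac{\partial f_m}{\partial x_n}\right|_a
\end{pmatrix}
\tag{4.2}
\]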
To see this, take a real-valued function first, $f : \mathbb{R}^n \to \mathbb{R}$ (for any vector-valued function
as above is made up of its $m$ real-valued component functions) and look at the directional
derivative in the $i$th coordinate direction $e_i = (0, \dots, 1, \dots, 0)$ (it has a 1 in the $i$th slot and
0 everywhere else). Letting
\[
Df(a) = \begin{pmatrix} m_1 & m_2 & \cdots & m_n \end{pmatrix}
\]
be the $1 \times n$ matrix defining the total derivative of $f$, and noting that the $i$th partial derivative
is the directional derivative in the $i$th coordinate direction, we have
\[
\begin{aligned}
\left.\frac{\partial f}{\partial x_i}\right|_a = D_{e_i} f(a) = Df(a)e_i
&= \begin{pmatrix} m_1 & m_2 & \cdots & m_n \end{pmatrix}
\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \\
&= m_1 \cdot 0 + m_2 \cdot 0 + \cdots + m_i \cdot 1 + \cdots + m_n \cdot 0 \\
&= m_i
\end{aligned}
\]
Now that we know how to compute $Df(a)$ if we know that $Df(a)$ exists, we have to answer the
question, "How do we determine the existence of $Df(a)$?" Well, we have seen that it
boils down to determining the existence of the $m$ separate total derivatives $Df_i(a)$ of the
component functions. The remaining question, therefore, is, "How do we determine the
existence of the $m$ separate total derivatives $Df_i(a)$ of the component functions
$f_i$?" The naïve answer, "Well, just compute the partials $\frac{\partial f_i}{\partial x_j}$ at $a$ of each $f_i$ and put them in
a matrix," unfortunately is not entirely correct. It would be if we knew that the partials were
also continuous on a neighborhood of the point $a$, but not otherwise. Here is an example
of why the existence of the partials $\frac{\partial f_i}{\partial x_j}$ at $a$ alone is not enough to conclude the
existence of $Df(a)$ (we must also have their continuity):
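\[
f(x, y) =
\begin{cases}
\dfrac{x^2y}{x^4 + y^2}, & \text{if } (x, y) \neq (0, 0) \\[1ex]
0, & \text{if } (x, y) = (0, 0)
\end{cases}
\]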
First, notice that all its directional derivatives exist at the origin, for if $v = (h, k)$ is any
vector in $\mathbb{R}^2$, then the directional derivative $D_v f(0)$ is computable directly:
\[
D_v f(0) = \lim_{t \to 0} \frac{f(0 + tv) - f(0)}{t}
= \lim_{t \to 0} \left( \frac{(th)^2(tk)}{(th)^4 + (tk)^2} - 0 \right) \cdot \frac{1}{t}
= \lim_{t \to 0} \frac{t^3h^2k}{t^3(t^2h^4 + k^2)}
= \begin{cases} \dfrac{h^2}{k} & \text{if } k \neq 0 \\[1ex] 0 & \text{if } k = 0 \end{cases}
\]
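In particular, choosing $v = e_1 = (1, 0)$ and $v = e_2 = (0, 1)$ shows that it has partial derivatives
$\left.\frac{\partial f}{\partial x}\right|_{(0,0)} = \left.\frac{\partial f}{\partial y}\right|_{(0,0)} = 0$ at the origin. Away from the origin it is easily seen to be partially
differentiable, so its partial derivatives exist everywhere on $\mathbb{R}^2$, and are given by
\[
\frac{\partial f}{\partial x} =
\begin{cases}
\dfrac{(x^4 + y^2)\,2xy - 4x^5y}{(x^4 + y^2)^2}, & \text{if } (x, y) \neq (0, 0) \\[1ex]
0, & \text{if } (x, y) = (0, 0)
\end{cases}
\]
\[
\frac{\partial f}{\partial y} =
\begin{cases}
\dfrac{(x^4 + y^2)\,x^2 - 2x^2y^2}{(x^4 + y^2)^2}, & \text{if } (x, y) \neq (0, 0) \\[1ex]
0, & \text{if } (x, y) = (0, 0)
\end{cases}
\]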
\[
f(h, h^2) = \frac{h^4}{2h^4} = \frac{1}{2}
\]
so that arbitrarily close to the origin there are points for which f (x, y) = 1/2, while f (0, 0) = 0.
On the other hand, along any straight line $y = mx$ the function satisfies
\[
f(x, mx) = \frac{mx^3}{x^2(x^2 + m^2)} = \frac{mx}{x^2 + m^2}
\]
so f approaches 0 along straight lines. By one of your homework problems, however, all
differentiable functions must be continuous, so we conclude that f is not differentiable at the
origin. (We prove that differentiability implies continuity below!)
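The problem here, of course, is that the partials $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are not continuous at the origin.
For example, $\frac{\partial f}{\partial x}$ is identically 0 along the parabola $y = x^2$, while along the
line $y = x$ it equals $\frac{2(1 - x^2)}{(1 + x^2)^2}$, which approaches 2 as $x \to 0$. (Check this!)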
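For readers who want to carry out the suggested check numerically, here is a minimal Python sketch. It assumes the example function is $f(x, y) = x^2y/(x^4 + y^2)$ away from the origin and $0$ at the origin, as the computations above indicate; the sample step sizes and the slope $m = 3$ are arbitrary choices made here.

```python
def f(x, y):
    # Example function: x^2 y / (x^4 + y^2) away from the origin, 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**4 + y**2)


def fx(x, y):
    """Closed-form partial derivative with respect to x, valid away from the origin."""
    return ((x**4 + y**2) * 2 * x * y - 4 * x**5 * y) / (x**4 + y**2) ** 2


steps = [10.0 ** (-k) for k in range(1, 6)]      # h = 0.1, 0.01, ..., 1e-5

on_parabola = [f(h, h**2) for h in steps]        # stays at 1/2
on_line = [f(h, 3 * h) for h in steps]           # tends to 0 along y = 3x
fx_parabola = [fx(h, h**2) for h in steps]       # partial in x along y = x^2
fx_line = [fx(h, h) for h in steps]              # partial in x along y = x

for h, p, l, dp, dl in zip(steps, on_parabola, on_line, fx_parabola, fx_line):
    print(f"h={h:.0e}  f(h,h^2)={p:.6f}  f(h,3h)={l:.2e}  "
          f"fx(h,h^2)={dp:.2e}  fx(h,h)={dl:.6f}")
```

The first column of values shows the failure of continuity of $f$ at the origin, and the last two columns show that $\frac{\partial f}{\partial x}$ has different limiting behaviour along the two curves.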
Remark 4.5 The problem point $(0, 0)$ isn't special. We could make any point a problem
point, for example $(1, 5)$, by translating the above example function by $(1, 5)$, i.e. by considering
$f(x, y) = \frac{(x-1)^2(y-5)}{(x-1)^4 + (y-5)^2}$ when $(x, y) \neq (1, 5)$ and $f(1, 5) = 0$.
OK, so now we know that the mere existence of the partials $\left.\frac{\partial f_i}{\partial x_j}\right|_a$ of $f = (f_1, \dots, f_m)$ isn't
enough to ensure the existence of $Df(a)$. What we need is the continuity of the partials $\frac{\partial f_i}{\partial x_j}$ on
a neighborhood of $a$.
Theorem 4.6 Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If all the partial derivatives $\left.\frac{\partial f_i}{\partial x_j}\right|_a$ of $f$ exist and are
continuous at $a$, then $f$ is differentiable at $a$.
Example 4.7 Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be given by $f(x, y, z) = (x^2 + y - z,\ e^{xy}\sin z + xz)$. Then
$f_1(x, y, z) = x^2 + y - z$ and $f_2(x, y, z) = e^{xy}\sin z + xz$ each clearly have continuous
partial derivatives (for example, $\frac{\partial f_1}{\partial x} = 2x$ is continuous on all of $\mathbb{R}^3$). Therefore, by
Theorem 4.6, $f$ is differentiable.
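To make Example 4.7 concrete, the sketch below writes out the matrix of partial derivatives of $f(x, y, z) = (x^2 + y - z,\ e^{xy}\sin z + xz)$, computed by hand here, and compares the matrix product $Df(a)v$ with a difference quotient of $f$ in the direction $v$. The point `a`, the direction `v`, the step size and the function names are assumptions chosen only for illustration.

```python
import math


def f(x, y, z):
    # The function from Example 4.7.
    return (x**2 + y - z, math.exp(x * y) * math.sin(z) + x * z)


def jacobian(x, y, z):
    """Matrix of partial derivatives of f, one row per component function."""
    e = math.exp(x * y)
    return [
        [2 * x, 1.0, -1.0],
        [y * e * math.sin(z) + z, x * e * math.sin(z), e * math.cos(z) + x],
    ]


def directional_quotient(a, v, t=1e-6):
    """Central-difference estimate of the directional derivative of f at a along v."""
    plus = f(*(ai + t * vi for ai, vi in zip(a, v)))
    minus = f(*(ai - t * vi for ai, vi in zip(a, v)))
    return [(p - m) / (2 * t) for p, m in zip(plus, minus)]


a = (1.0, 0.5, 2.0)      # sample point (assumption)
v = (-1.0, 4.0, 2.0)     # sample direction (assumption)

matrix_product = [sum(m * vj for m, vj in zip(row, v)) for row in jacobian(*a)]
quotient = directional_quotient(a, v)

print(matrix_product)
print(quotient)
```

Both printed vectors should agree to several decimal places, illustrating that the directional derivative can be computed as a matrix product once differentiability is known.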
5 Further Properties of the Total and Partial Derivative
The components of the matrix D(g ◦ f )(a) in (6.7) may explicitly be given by the formulas:
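\[
\left.\frac{\partial (g \circ f)_i}{\partial x_j}\right|_a
= \left.\frac{\partial g_i}{\partial y_1}\right|_b \left.\frac{\partial f_1}{\partial x_j}\right|_a
+ \cdots
+ \left.\frac{\partial g_i}{\partial y_m}\right|_b \left.\frac{\partial f_m}{\partial x_j}\right|_a
\tag{5.2}
\]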
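Formula (5.2) can be checked numerically on a small example. The sketch below assumes $b = f(a)$ and uses two made-up maps, $f(x, y) = (xy,\ x + y^2)$ and $g(u, v) = (u^2v,\ \sin u + v)$, which do not come from the notes; their partial derivatives are entered by hand and the sum in (5.2) is compared with a difference quotient of the composite $g \circ f$.

```python
import math


def f(x, y):
    return (x * y, x + y**2)                 # hypothetical inner map


def g(u, v):
    return (u**2 * v, math.sin(u) + v)       # hypothetical outer map


def Df(x, y):
    return [[y, x], [1.0, 2 * y]]            # partials of f, computed by hand


def Dg(u, v):
    return [[2 * u * v, u**2], [math.cos(u), 1.0]]   # partials of g, computed by hand


def chain_rule_entry(i, j, a):
    """Right-hand side of (5.2): sum over k of dg_i/dy_k at b times df_k/dx_j at a."""
    b = f(*a)                                # assumption: b = f(a)
    return sum(Dg(*b)[i][k] * Df(*a)[k][j] for k in range(2))


def numeric_entry(i, j, a, t=1e-6):
    """Central-difference estimate of the partial of (g o f)_i with respect to x_j at a."""
    plus, minus = list(a), list(a)
    plus[j] += t
    minus[j] -= t
    return (g(*f(*plus))[i] - g(*f(*minus))[i]) / (2 * t)


a = (0.7, -1.2)                              # sample point (assumption)
for i in range(2):
    for j in range(2):
        print(i, j, chain_rule_entry(i, j, a), numeric_entry(i, j, a))
```

Each printed pair should agree closely, entry by entry.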
for all $1 \le i, j \le n$. This is also frequently denoted $f_{x_i x_j}(a) = f_{x_j x_i}(a)$.
Remark 5.3 Failure of continuity at $a$ may lead to inequality of the mixed partials at $a$.
Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by
\[
f(x, y) =
\begin{cases}
\dfrac{x^3y - xy^3}{x^2 + y^2}, & \text{if } (x, y) \neq (0, 0) \\[1ex]
0, & \text{if } (x, y) = (0, 0)
\end{cases}
\]
Then, for $(x, y) \neq (0, 0)$,
\[
\begin{aligned}
\frac{\partial f}{\partial x}
&= \frac{(x^2 + y^2)(3x^2y - y^3) - 2x(x^3y - xy^3)}{(x^2 + y^2)^2} \\
&= \frac{3x^4y - x^2y^3 + 3x^2y^3 - y^5 - 2x^4y + 2x^2y^3}{(x^2 + y^2)^2} \\
&= \frac{x^4y + 4x^2y^3 - y^5}{(x^2 + y^2)^2}
\end{aligned}
\]
while $\frac{\partial f}{\partial x} = 0$ if $(x, y) = (0, 0)$,
and
for $(x, y) \neq (0, 0)$,
\[
\begin{aligned}
\frac{\partial f}{\partial y}
&= \frac{(x^2 + y^2)(x^3 - 3xy^2) - 2y(x^3y - xy^3)}{(x^2 + y^2)^2} \\
&= \frac{x^5 - 3x^3y^2 + x^3y^2 - 3xy^4 - 2x^3y^2 + 2xy^4}{(x^2 + y^2)^2} \\
&= \frac{x^5 - 4x^3y^2 - xy^4}{(x^2 + y^2)^2}
\end{aligned}
\]
while $\frac{\partial f}{\partial y} = 0$ if $(x, y) = (0, 0)$.
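\[
\left.\frac{\partial f}{\partial x}\right|_{(0,b)} = \frac{-b^5}{b^4} = -b,
\qquad
\left.\frac{\partial f}{\partial y}\right|_{(a,0)} = \frac{a^5}{a^4} = a
\]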
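and consequently
\[
\left.\frac{\partial^2 f}{\partial y \partial x}\right|_{(0,0)}
= \lim_{t \to 0} \frac{\left.\frac{\partial f}{\partial x}\right|_{(0,t)} - \left.\frac{\partial f}{\partial x}\right|_{(0,0)}}{t}
= \lim_{t \to 0} \frac{-t - 0}{t} = -1
\]
\[
\left.\frac{\partial^2 f}{\partial x \partial y}\right|_{(0,0)}
= \lim_{t \to 0} \frac{\left.\frac{\partial f}{\partial y}\right|_{(t,0)} - \left.\frac{\partial f}{\partial y}\right|_{(0,0)}}{t}
= \lim_{t \to 0} \frac{t - 0}{t} = 1
\]
and so $\left.\frac{\partial^2 f}{\partial y \partial x}\right|_{(0,0)} \neq \left.\frac{\partial^2 f}{\partial x \partial y}\right|_{(0,0)}$. The problem, of course, is the discontinuity of the second
derivatives at $(0, 0)$: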
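\[
\frac{\partial^2 f}{\partial x \partial y} =
\begin{cases}
\dfrac{(x^2 + y^2)^2(5x^4 - 12x^2y^2 - y^4) - 2(x^2 + y^2)\,2x\,(x^5 - 4x^3y^2 - xy^4)}{(x^2 + y^2)^4}, & \text{if } (x, y) \neq (0, 0) \\[1ex]
1, & \text{if } (x, y) = (0, 0)
\end{cases}
\]
\[
\frac{\partial^2 f}{\partial y \partial x} =
\begin{cases}
\dfrac{(x^2 + y^2)^2(x^4 + 12x^2y^2 - 5y^4) - 2(x^2 + y^2)\,2y\,(x^4y + 4x^2y^3 - y^5)}{(x^2 + y^2)^4}, & \text{if } (x, y) \neq (0, 0) \\[1ex]
-1, & \text{if } (x, y) = (0, 0)
\end{cases}
\]
For example, along the line $y = 0$ the first formula gives $\frac{\partial^2 f}{\partial x \partial y} = \frac{x^8}{x^8} = 1$ for $x \neq 0$,
while along the line $x = 0$ it gives $\frac{-y^8}{y^8} = -1$ for $y \neq 0$, so $\frac{\partial^2 f}{\partial x \partial y}$ has no limit at the origin.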
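The unequal mixed partials of Remark 5.3 can also be observed numerically. The following sketch approximates the first partials of $f(x, y) = (x^3y - xy^3)/(x^2 + y^2)$ by central differences and then forms the two difference quotients that define the mixed partials at the origin. The step sizes `s` and `t` are assumptions chosen so that the inner step is much smaller than the outer one.

```python
def f(x, y):
    # The function from Remark 5.3, with f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x**3 * y - x * y**3) / (x**2 + y**2)


def fx(x, y, s=1e-6):
    """Central-difference estimate of the partial derivative of f in x."""
    return (f(x + s, y) - f(x - s, y)) / (2 * s)


def fy(x, y, s=1e-6):
    """Central-difference estimate of the partial derivative of f in y."""
    return (f(x, y + s) - f(x, y - s)) / (2 * s)


t = 1e-3
# Difference quotient of df/dx along the y-axis: approximates d/dy (df/dx) at the origin.
d_yx = (fx(0.0, t) - fx(0.0, 0.0)) / t
# Difference quotient of df/dy along the x-axis: approximates d/dx (df/dy) at the origin.
d_xy = (fy(t, 0.0) - fy(0.0, 0.0)) / t

print(d_yx, d_xy)
```

The first value should be close to $-1$ and the second close to $1$, matching the limits computed above.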
6 Appendix: Proofs of the Theorems
The left-hand side is a limit, while the right-hand side is a matrix product, with v treated as
a column vector.
Proof: Since f is differentiable at a, fix v and consider h = tv for some sufficiently small
t ∈ R. Applying the linear approximation (1.3) and the linearity of the derivative Df (a) (i.e.
Df (a)(ax + by) = aDf (a)x + bDf (a)y) we get
i.e.
\[
D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t} = Df(x)(v)
\]
Proof: This follows from the inequalities
\[
|a_i| \le |a| \le \sqrt{n} \max_{1 \le i \le n} |a_i|
\]
for all $i$, since if $f$ is differentiable at $a$, then the limit (1.2) exists, so the first inequality above
implies that the limit of zero exists in each of the coordinates, and so for each of the coordinate
functions. Indeed, by that limit we must have that $Df_i(a)$ is the $i$th component function of
$Df(a)$. Conversely, if the component functions are differentiable at $a$, then multiplying the
limit (1.2) for $f_i$ by $\sqrt{n}$ and using the second inequality above we have that the limit (1.2)
for $f$ holds as well (just choose the $f_i$ with maximum absolute value), and moreover we must
have that the $Df_i(a)$ are the coordinate linear functionals of $Df(a)$ by the first inequality.
Theorem 6.3 Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If all the partial derivatives $\left.\frac{\partial f_i}{\partial x_j}\right|_a$ of $f$ exist and are
continuous at $a$, then $f$ is differentiable at $a$.
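Proof: By Proposition 4.3 it is enough to prove this for the component functions $f_i$ of
$f$. Indeed, let $f_i$ be a component function of $f$, and suppose its partial derivatives all
exist and are continuous in a neighborhood of $a$. By the definition of continuity of
$\frac{\partial f_i}{\partial x_j}$ at $a$, for any $\varepsilon > 0$ we choose there is a $\delta > 0$ such that if $|k| < \delta$
then
\[
\left| \left.\frac{\partial f_i}{\partial x_j}\right|_{a+k} - \left.\frac{\partial f_i}{\partial x_j}\right|_a \right| < \frac{\varepsilon}{n}
\]
for every $j = 1, \dots, n$ (take the smallest of the $n$ values of $\delta$, shrinking it if necessary so that
the ball of radius $\delta$ about $a$ lies in the neighborhood).
Let $h = (h_1, \dots, h_n)$ be a point in $\mathbb{R}^n$ with $0 < |h| < \delta$, so that $h = h_1e_1 + \cdots + h_ne_n$. Since
$\frac{\partial f_i}{\partial x_j}$ moves only in the $j$th coordinate direction, we go from $a$ to $a + h$ one coordinate
at a time: let $p_0 = a$ and $p_j = a + h_1e_1 + \cdots + h_je_j$, so that $p_n = a + h$. By the Mean
Value Theorem from Calc 1, the existence of the $j$th partial implies
the existence of a point $p_{j-1} + t_je_j$ between $p_{j-1}$ and $p_j$ such that
\[
f_i(p_j) - f_i(p_{j-1}) = \left.\frac{\partial f_i}{\partial x_j}\right|_{p_{j-1} + t_je_j} h_j \tag{6.4}
\]
(Note: in the $j$th coordinate, keeping all other coordinates fixed, $f_i$ is a real-valued function of
a single variable, so this works. Recall the MVT: If $f$ is continuous on $[a, b]$ and differentiable
on $(a, b)$ then there is a point $c$ between $a$ and $b$ such that $f(b) - f(a) = f'(c)(b - a)$!) Each
point $p_{j-1} + t_je_j$ lies within distance $|h| < \delta$ of $a$, so the continuity estimate above applies to it. As a
consequence, we have
\[
\begin{aligned}
\left| f_i(a + h) - f_i(a) - \begin{pmatrix} \left.\frac{\partial f_i}{\partial x_1}\right|_a & \cdots & \left.\frac{\partial f_i}{\partial x_n}\right|_a \end{pmatrix} h \right|
&= \left| f_i(a + h) - f_i(a) - \sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_a h_j \right| \\
&= \left| \sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_{p_{j-1} + t_je_j} h_j - \sum_{j=1}^n \left.\frac{\partial f_i}{\partial x_j}\right|_a h_j \right| \\
&\le \sum_{j=1}^n \left| \left.\frac{\partial f_i}{\partial x_j}\right|_{p_{j-1} + t_je_j} - \left.\frac{\partial f_i}{\partial x_j}\right|_a \right| |h_j| \\
&< \sum_{j=1}^n \frac{|h_j|\varepsilon}{n} \\
&\le |h|\varepsilon
\end{aligned}
\]
where the second equality comes from adding up (6.4) over $j$ (the left-hand sides telescope to
$f_i(a + h) - f_i(a)$), the first inequality is from using the triangle inequality and factoring out $|h_j|$, the
second is by the continuity estimate above for each $j$, and the third by observing that $|h_1| + \cdots + |h_n| \le
\sqrt{h_1^2 + \cdots + h_n^2} + \cdots + \sqrt{h_1^2 + \cdots + h_n^2} = n|h|$. Dividing the above inequality through by $|h|$
gives our desired inequality,
\[
\frac{\left| f_i(a + h) - f_i(a) - \begin{pmatrix} \left.\frac{\partial f_i}{\partial x_1}\right|_a & \cdots & \left.\frac{\partial f_i}{\partial x_n}\right|_a \end{pmatrix} h \right|}{|h|} < \varepsilon
\]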
Proposition 6.4 (Hadamard) Let $U \subset \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}^m$. Then, for any
$x_0 \in U$ the following are equivalent:
(1) $f$ is differentiable at $x_0$.
(2) There exists a map $\varphi_{x_0} : U \to L(\mathbb{R}^n, \mathbb{R}^m)$, continuous at $x_0$, such that for all $x \in U$ we
have
\[
f(x) = f(x_0) + \varphi_{x_0}(x)(x - x_0) \tag{6.5}
\]
so $\lim_{h \to 0} \frac{|\epsilon(h)h^T|}{|h|_2^2} = \lim_{h \to 0} \frac{|\epsilon(h)|_2}{|h|_2} = 0$, so that $\lim_{h \to 0} \varphi_{x_0}(x_0 + h) = Df(x_0)$.
(2) ⇒ (1): Conversely, suppose there is a $\varphi_{x_0} : U \to L(\mathbb{R}^n, \mathbb{R}^m)$, continuous at $x_0$, such that
for all $x \in U$ equation (6.5) holds. Then, by continuity we have that for all $\epsilon > 0$ there is a
$\delta > 0$ such that $|h|_2 = |x_0 + h - x_0|_2 < \delta$ implies $|\varphi_{x_0}(x_0 + h) - \varphi_{x_0}(x_0)| < \epsilon$. Since $\mathbb{R}^n$ and
$\mathbb{R}^m$ are finite-dimensional, it is an easy matter to show that any $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ is continuous,
and therefore bounded. Consequently we may use again that fact from linear algebra cited
above, and along with (6.5) we have
\[
\begin{aligned}
\frac{|f(x_0 + h) - f(x_0) - \varphi_{x_0}(x_0)(h)|_2}{|h|_2}
&\overset{(6.5)}{=} \frac{|\varphi_{x_0}(x_0 + h)(h) - \varphi_{x_0}(x_0)(h)|_2}{|h|_2} \\
&\le \frac{|\varphi_{x_0}(x_0 + h) - \varphi_{x_0}(x_0)|\,|h|_2}{|h|_2} \\
&< \epsilon
\end{aligned}
\]
i.e. $\lim_{h \to 0} \frac{|f(x_0 + h) - f(x_0) - \varphi_{x_0}(x_0)(h)|_2}{|h|_2} = 0$, and $f$ is differentiable at $x_0$.
The components of the matrix D(g ◦ f )(a) in (6.7) may explicitly be given by the formulas:
\[
g(y) - g(f(x_0)) = \psi_{y_0}(y)(y - f(x_0)), \quad \text{with } \lim_{y \to f(x_0)} \psi_{y_0}(y) = Dg(f(x_0)) \tag{6.11}
\]
By the second parts of (6.10) and (6.11) we have $\lim_{x \to x_0} \psi_{y_0}(f(x)) \circ \varphi_{x_0}(x) = Dg(f(x_0)) \circ
Df(x_0)$. The linearity of $Dg(f(x_0)) \circ Df(x_0)$ follows from that of $\psi_{y_0}(f(x_0))$ in (6.12),
so when we take the limit as $x \to x_0$ of (6.12) we get by Hadamard's lemma again that
$D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0)$.
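Theorem 6.6 (Clairaut: Equality of Mixed Partial Derivatives) If $f : \mathbb{R}^n \to \mathbb{R}^m$
has continuous second-order partial derivatives or, more generally, if for all $1 \le i, j \le n$ the
partial derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}$ and $\frac{\partial^2 f}{\partial x_j \partial x_i}$ exist on a neighborhood of a point $a$ and are continuous
at $a$, then
\[
\left.\frac{\partial^2 f}{\partial x_i \partial x_j}\right|_a = \left.\frac{\partial^2 f}{\partial x_j \partial x_i}\right|_a \tag{6.13}
\]
for all $1 \le i, j \le n$.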
Proof: It will simplify notation a little if we write $D_j$ instead of $\partial/\partial x_j$. In view of Proposition
4.3 it suffices to prove this for all component functions $f_k$ of $f$. Without loss of generality, we
may suppose that $i < j$. Let $r : \mathbb{R}^n \to \mathbb{R}$ be given by
\[
r(y) = \frac{f_k(y) - f_k(\dots, y_{i-1}, x_i, y_{i+1}, \dots) - f_k(\dots, y_{j-1}, x_j, y_{j+1}, \dots) + f_k(\dots, y_{i-1}, x_i, y_{i+1}, \dots, y_{j-1}, x_j, y_{j+1}, \dots)}{(y_i - x_i)(y_j - x_j)}
\]
where the dots stand for the unchanged coordinates of $y$, and define $g : \mathbb{R} \to \mathbb{R}$ by
\[
g(t) = f_k(y_1, \dots, y_{i-1}, t, y_{i+1}, \dots, y_n) - f_k(y_1, \dots, y_{i-1}, t, y_{i+1}, \dots, y_{j-1}, x_j, y_{j+1}, \dots, y_n)
\]
so that
\[
r(y) = \frac{g(y_i) - g(x_i)}{(y_i - x_i)(y_j - x_j)}
\]
We will show that both sides of (6.13) are equal to $\lim_{y \to x} r(y)$. The denominator of $r$ is the
(signed) area of the rectangle with vertices $(y_i, y_j)$, $(x_i, y_j)$, $(y_i, x_j)$ and $(x_i, x_j)$ in the $i$-$j$ plane, while
the numerator is the alternating sum of the values of $f_k$ at these vertices. Note that since the
partial derivatives of $f$ (up to order 2), and so those of each component function $f_k$ of $f$, exist
on a neighborhood $N \subseteq U$ of $x$, we have that $g$ is differentiable as long as $y$ stays in $N$. By the Mean Value
Theorem for $\mathbb{R}$, there is a $\xi_i$ between $x_i$ and $y_i$ such that
Reversing the roles of $x_i$ and $x_j$ above shows, with $\xi'$ probably different from $\xi$ above,
that
\[
\lim_{y \to x} r(y) = D_i D_j f(x)
\]