Differentiability 2

This document discusses the differentiability of vector-valued functions and introduces the concept of the total derivative, which approximates complex functions using linear functions. It explains directional and partial derivatives, emphasizing their relationships and the importance of the total derivative matrix. Theorems are presented to establish the conditions under which differentiability holds, and the document includes remarks and examples to clarify these concepts.

Uploaded by

Samuel Camacho

Differentiability

Alex Nita

Abstract

In this section we try to develop the basics of differentiability of vector-valued functions $f = (f_1, \dots, f_m) : \mathbb{R}^n \to \mathbb{R}^m$. It turns out that, as with continuity, it is enough to
know how to differentiate the component functions $f_i : \mathbb{R}^n \to \mathbb{R}$, i.e. how to differentiate
real-valued functions of several variables. Of course, with more dimensions come more
ways to differentiate. We can differentiate in different directions as well as in some overall
sense, and these are related in a specific way, namely by a matrix of partial derivatives.
The necessary hypotheses are spelled out in great detail, and you should study these
carefully. They are important. Also, the notation and use of matrices, though standard
in math, differ somewhat from the textbook. These notes are a supplement, of course,
but I believe they give a unified view of differentiation, and add some important features
that are missing from the book. Namely, we see that we may take directional derivatives
of vector-valued, and not just real-valued, functions, and additionally we give a slick
presentation of the chain rule in terms of total derivatives, which has the side benefit
of explaining why, componentwise, in terms of partial derivatives, the chain rule is a
sum: that is how matrix products work! The theorems are stated first, and their
proofs are postponed to the Appendix at the end, because they are technical and would
probably clutter the notes with unnecessary detail. For those curious to see why the
theorems are true, they are there at the back.

1 The Total Derivative

The thing we want to do now is to locally approximate a complicated function $f : \mathbb{R}^n \to \mathbb{R}^m$ by a much simpler linear function. This is the idea behind the total derivative of $f$ at a point $a$ in $\mathbb{R}^n$. Formally, we say that $f$ is differentiable at a point $a \in \mathbb{R}^n$ if there exists an $m \times n$ matrix ($m$ and $n$ here depend on the domain and range of $f$!)

\[
Df(a) = \begin{pmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{m1} & \cdots & m_{mn} \end{pmatrix} \quad \text{(a real $m \times n$ matrix)} \tag{1.1}
\]

called the total derivative (or the Jacobi matrix), which satisfies the following limit condition:

\[
\lim_{h \to 0} \frac{|f(a + h) - f(a) - Df(a)h|}{|h|} = 0 \tag{1.2}
\]

Equivalently, $f$ must locally be approximated by a linear function, that is

\[
f(a + h) \approx Df(a)h + f(a) \tag{1.3}
\]

where the error in the approximation

\[
E(h) = f(a + h) - \big(\text{linear approximation of the value of } f \text{ at } x = a + h\big) = f(a + h) - \big(Df(a)h + f(a)\big)
\]

satisfies

\[
\lim_{h \to 0} \frac{|E(h)|}{|h|} = 0 \tag{1.4}
\]

This last statement, (1.4), is the same statement as (1.2), since $E(h) = f(a + h) - f(a) - Df(a)h$.
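As a rough numerical illustration of the limit condition (1.2)/(1.4), the following sketch evaluates the ratio $|E(h)|/|h|$ for shrinking $h$. The sample map $f(x, y) = (x^2 y,\ x + \sin y)$, the point $a = (1, 2)$ and the direction of $h$ are assumptions chosen only for illustration; they do not come from the notes.

```python
import math

def f(x, y):
    # Hypothetical sample map R^2 -> R^2, chosen only for illustration.
    return (x * x * y, x + math.sin(y))

def Df(x, y):
    # Its 2 x 2 matrix of partial derivatives, computed by hand.
    return ((2 * x * y, x * x), (1.0, math.cos(y)))

def error_ratio(a, h):
    # |E(h)| / |h| with E(h) = f(a + h) - f(a) - Df(a) h
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    J = Df(*a)
    E = [fah[i] - fa[i] - (J[i][0] * h[0] + J[i][1] * h[1]) for i in range(2)]
    return math.hypot(E[0], E[1]) / math.hypot(h[0], h[1])

a = (1.0, 2.0)
ratios = [error_ratio(a, (s, -s)) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)  # the ratios shrink roughly in proportion to |h|
```

The ratio going to zero is exactly what distinguishes the total derivative from an arbitrary matrix: with any other matrix in place of `Df(a)` the ratio would generally not tend to 0.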

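Remark 1.1 The expression $Df(a)h$ denotes matrix multiplication. Here, $h = (h_1, \dots, h_n)$ is a vector in $\mathbb{R}^n$ thought of as an $n \times 1$ column vector:

\[
Df(a)h = \begin{pmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{m1} & \cdots & m_{mn} \end{pmatrix} \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} = \begin{pmatrix} m_{11} h_1 + \cdots + m_{1n} h_n \\ \vdots \\ m_{m1} h_1 + \cdots + m_{mn} h_n \end{pmatrix}
\]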

Remark 1.2 Just as in the single variable case, where $h = \Delta x = x - a$, so, too, here:

\[
h = \Delta x = x - a = \begin{pmatrix} x_1 - a_1 \\ \vdots \\ x_n - a_n \end{pmatrix}
\]

Thus, the idea is that going a distance of $h$ away from $a$, to the point $x$, we may approximate the value of $f(x) = f(a + h)$ by the "tangent plane" $f(a) + Df(a)\Delta x$. Referring to the notes on Points, Vectors and Matrices, the linear part of this approximation, $Df(a)\Delta x$, is literally the tangent plane expression in the real-valued case of $m = 1$, for then

\[
\begin{aligned}
Df(a)\Delta x &= \begin{pmatrix} m_1 & \cdots & m_n \end{pmatrix} \begin{pmatrix} x_1 - a_1 \\ \vdots \\ x_n - a_n \end{pmatrix} \\
&= m_1 (x_1 - a_1) + \cdots + m_n (x_n - a_n) \\
&= m_1 x_1 + m_2 x_2 + \cdots + m_n x_n + d
\end{aligned}
\]

where $d = -m_1 a_1 - \cdots - m_n a_n$, which is precisely the expression for an $n$-dimensional plane. For if this equals some number, $k$, then letting $d' = d - k$ we get the equation of the plane, $m_1 x_1 + m_2 x_2 + \cdots + m_n x_n + d' = 0$. In the general case, where $m$ may be greater than 1, we have some higher-dimensional analog of the plane.

Remark 1.3 Thus to say that a function $f$ is differentiable at a point is equivalent to saying that $f$ has a total derivative there. We shall see that $f$ may be partially differentiable, and may even have directional derivatives in all directions, yet not be differentiable. We will explain this further below.

2 The Directional Derivative

Suppose $v$ is a 'vector' in $\mathbb{R}^n$ and $a$ is a 'point' in $\mathbb{R}^n$, and let $T : \mathbb{R} \to \mathbb{R}^n$ be the translation-by-$tv$ function

\[
T(t) = a + tv
\]

We say that $f : \mathbb{R}^n \to \mathbb{R}^m$ has a directional derivative at $a$ in the direction of $v$ if the composition $f \circ T : \mathbb{R} \to \mathbb{R}^m$, $(f \circ T)(t) = f(a + tv)$, is "differentiable at 0", in the sense that the limit

\[
f_v(a) \ \text{ or } \ D_v f(a) = \frac{d}{dt}\bigg|_{t=0} f(a + tv) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} \tag{2.1}
\]

exists in $\mathbb{R}^m$.

Remark 2.1 Notice that for each fixed nonzero $t$ the difference $f(a + tv) - f(a)$ in the numerator is a vector in $\mathbb{R}^m$, while $1/t$ is a real number, so the quotient $\frac{f(a + tv) - f(a)}{t}$ is actually scalar multiplication of the vector $f(a + tv) - f(a)$ by $1/t$.
Another thing to notice is that in the sum $a + tv$ we added a point to a vector. Since we have emphasized blurring the lines between points and vectors in $\mathbb{R}^n$, because algebraically they are indistinguishable, at least in $\mathbb{R}^n$, the sum makes sense.

3 The Partial Derivative

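Now suppose $f$ is a real-valued function, $f : \mathbb{R}^n \to \mathbb{R}$. The $i$th partial derivative of $f$ at $a$ is the directional derivative of $f$ at $a$ in a coordinate direction, that is in the direction of a unit coordinate vector $e_i = (0, \dots, 1, \dots, 0)$ (with the 1 in the $i$th slot),

\[
\begin{aligned}
\frac{\partial f}{\partial x_i}\bigg|_{x=a} \ \text{ or } \ f_{x_i}(a) \ \text{ or } \ D_i f(a) &= \frac{d}{dt}\bigg|_{t=0} f(a + t e_i) \\
&= \lim_{t \to 0} \frac{f(a + t e_i) - f(a)}{t} \\
&= \lim_{t \to 0} \frac{f(a_1, \dots, a_{i-1}, a_i + t, a_{i+1}, \dots, a_n) - f(a_1, \dots, a_i, \dots, a_n)}{t}
\end{aligned}
\tag{3.1}
\]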

Remark 3.1 The practical import of this definition will become clear in a minute. For now, notice that the derivative $\frac{d}{dt}\big|_{t=0} f(a + t e_i)$ is an ordinary derivative from Calc 1. It's the derivative of the real-valued function of a real variable $f \circ T : \mathbb{R} \to \mathbb{R}$, $(f \circ T)(t) = f(a + t e_i)$. This means that all the other coordinates, which we normally treat as variables, since they may vary, are treated here as constants. Thus, if we label the variables $x_1, \dots, x_i, \dots, x_n$, all the other $x_j$ for $j \neq i$ are treated as constants in any expression for $f$. For example, if $f(x, y, z) = xyz + x^2 y + z^2$, in the partial derivative $\frac{\partial f}{\partial x}$ with respect to $x$ the 'variables' $y$ and $z$ are treated as constants, so we can do what we normally do when computing a Calc 1 derivative, pull the constants out. Here, for example, we'd have $\frac{\partial f}{\partial x} = yz + 2xy$.
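The claim $\frac{\partial f}{\partial x} = yz + 2xy$ for $f(x, y, z) = xyz + x^2 y + z^2$ can be spot-checked numerically. The sketch below is only an illustration: the helper name `partial_x`, the sample point and the step size are assumptions, and it uses a symmetric difference quotient rather than the one-sided quotient in definition (3.1).

```python
def f(x, y, z):
    # The example function from the remark.
    return x * y * z + x**2 * y + z**2

def partial_x(func, x, y, z, t=1e-6):
    # Symmetric difference quotient in the x-direction; y and z stay fixed,
    # which is exactly what "treating them as constants" means.
    return (func(x + t, y, z) - func(x - t, y, z)) / (2 * t)

x, y, z = 1.5, -2.0, 0.5          # arbitrary sample point
approx = partial_x(f, x, y, z)
exact = y * z + 2 * x * y         # the formula from the remark
print(approx, exact)
```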

4 The Relationship Between the Total and Directional
and Partial Derivatives

Nobody wants to compute an actual limit, though the limit idea is extremely important
theoretically. Luckily, we don’t have to here. The directional derivative, though defined in
terms of a limit, is in fact computable in terms of a matrix product!

Theorem 4.1 If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a$ in $\mathbb{R}^n$, then all of its directional derivatives at $a$ exist, and for any choice of vector $v$ in $\mathbb{R}^n$ we have

\[
D_v f(a) = Df(a)v \tag{4.1}
\]

The left-hand side is a limit, while the right-hand side is a matrix product, with $v$ treated as a column vector.

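Example 4.2 Suppose $f : \mathbb{R}^3 \to \mathbb{R}^2$ is differentiable at the point $a = (1, 1, 2)$ and $v = (-1, 4, 2)$ is a vector in $\mathbb{R}^3$. If we already knew the total derivative of $f$, say

\[
Df(a) = \begin{pmatrix} 2 & 3 & 2 \\ 1 & 0 & 5 \end{pmatrix}
\]

then computing the directional derivative of $f$ at $a$ in the direction of $v$ would be easy, namely

\[
f_v(a) = Df(a)v = \begin{pmatrix} 2 & 3 & 2 \\ 1 & 0 & 5 \end{pmatrix} \begin{pmatrix} -1 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 14 \\ 9 \end{pmatrix}
\]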
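The matrix product in Example 4.2 is small enough to check by hand, but a short sketch makes the row-by-column rule explicit. The helper name `matvec` is an assumption introduced here for illustration.

```python
def matvec(M, v):
    # Each entry of M v is the dot product of one row of M with v.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

Df_a = [[2, 3, 2],
        [1, 0, 5]]      # the total derivative assumed in Example 4.2
v = [-1, 4, 2]

result = matvec(Df_a, v)
print(result)  # [14, 9]
```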

Thus, if $f$ is differentiable at $a$, the task is to find a way to compute $Df(a)$. For then we can compute all directional derivatives by simple matrix multiplication. Luckily, we can do this, using the previous theorem, provided we know that the total derivative $Df(a)$ has rows consisting of the total derivatives of the component functions, that is

\[
\underbrace{D \begin{pmatrix} f_1(a) \\ \vdots \\ f_m(a) \end{pmatrix}}_{Df(a)} = \begin{pmatrix} Df_1(a) \\ \vdots \\ Df_m(a) \end{pmatrix}
\]

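In this case I claim that the total derivative $Df(a)$ is the matrix of partial derivatives of the component functions $f_i$ of $f$,

\[
Df(a) = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1}\bigg|_a & \cdots & \dfrac{\partial f_1}{\partial x_n}\bigg|_a \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1}\bigg|_a & \cdots & \dfrac{\partial f_m}{\partial x_n}\bigg|_a \end{pmatrix} \tag{4.2}
\]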

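To see this, take a real-valued function first, $f : \mathbb{R}^n \to \mathbb{R}$ (for any vector-valued function as above is made up of its $m$ real-valued component functions) and look at the directional derivative in the $i$th coordinate direction $e_i = (0, \dots, 1, \dots, 0)$ (it has a 1 in the $i$th slot and 0 everywhere else). Letting

\[
Df(a) = \begin{pmatrix} m_1 & m_2 & \cdots & m_n \end{pmatrix}
\]

be the $1 \times n$ matrix defining the total derivative of $f$, and noting that the $i$th partial derivative is the directional derivative in the $i$th coordinate direction, we have

\[
\begin{aligned}
\frac{\partial f}{\partial x_i}\bigg|_a = D_{e_i} f(a) = Df(a)e_i &= \begin{pmatrix} m_1 & m_2 & \cdots & m_n \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \\
&= m_1 \cdot 0 + m_2 \cdot 0 + \cdots + m_i \cdot 1 + \cdots + m_n \cdot 0 \\
&= m_i
\end{aligned}
\]

This is true for each $i = 1, \dots, n$, so

\[
Df(a) = \begin{pmatrix} m_1 & m_2 & \cdots & m_n \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f}{\partial x_1}\bigg|_a & \dfrac{\partial f}{\partial x_2}\bigg|_a & \cdots & \dfrac{\partial f}{\partial x_n}\bigg|_a \end{pmatrix}
\]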

Now consider a vector-valued function $f : \mathbb{R}^n \to \mathbb{R}^m$, $f = (f_1, \dots, f_m)$. Then, each of its component functions $f_i$ is a real-valued function, and the above result applies separately to each, from which we get our result,

\[
Df(a) = D \begin{pmatrix} f_1(a) \\ \vdots \\ f_m(a) \end{pmatrix} = \begin{pmatrix} Df_1(a) \\ \vdots \\ Df_m(a) \end{pmatrix} = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{pmatrix}
\]

with all partial derivatives evaluated at $a$. The only questionable thing about this is the legality of the second equality, where we "pulled the $D$ inside the column vector." It was, in fact, legal, and moreover our ability to do this gives us another useful way to decide the differentiability of a function. Let us state and prove this result carefully.

Theorem 4.3 A function $f : \mathbb{R}^n \to \mathbb{R}^m$, $f = (f_1, \dots, f_m)$, is differentiable at $a$ if and only if each of its component functions $f_i : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $a$. In that case, we have

\[
Df(a) = D \begin{pmatrix} f_1(a) \\ \vdots \\ f_m(a) \end{pmatrix} = \begin{pmatrix} Df_1(a) \\ \vdots \\ Df_m(a) \end{pmatrix} \tag{4.3}
\]

That is, to compute $Df(a)$ we can just compute the $1 \times n$ derivative matrices of the $f_i$ first, which we know are of the form $Df_i(a) = \begin{pmatrix} \frac{\partial f_i}{\partial x_1}\big|_a & \cdots & \frac{\partial f_i}{\partial x_n}\big|_a \end{pmatrix}$, and enter the result into the $i$th row of the larger matrix.

Now that we know how to compute $Df(a)$ if we know that $Df(a)$ exists, we have to answer the question, "How do we determine the existence of $Df(a)$?". Well, we have seen that it boils down to determining the existence of the $m$ separate total derivatives $Df_i(a)$ of the component functions. The remaining question, therefore, is, "How do we determine the existence of the $m$ separate total derivatives $Df_i(a)$ of the component functions $f_i$?" The naïve answer, "Well, just compute the partials $\frac{\partial f_i}{\partial x_j}$ at $a$ of each $f_i$ and put them in a matrix," unfortunately is not entirely correct. It would be if we knew that the partials were also continuous on a neighborhood of the point $a$, but not otherwise. Here is an example of why the existence of the partials $\frac{\partial f_i}{\partial x_j}$ at $a$ alone is not enough to conclude the existence of $Df(a)$ (we must also have their continuity):

Example 4.4 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

\[
f(x, y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2}, & \text{if } (x, y) \neq (0, 0) \\ 0, & \text{if } (x, y) = (0, 0) \end{cases}
\]

First, notice that all its directional derivatives exist at the origin, for if $v = (h, k)$ is any vector in $\mathbb{R}^2$, then the directional derivative $D_v f(0)$ is computable directly:

\[
D_v f(0) = \lim_{t \to 0} \frac{f(0 + tv) - f(0)}{t} = \lim_{t \to 0} \left( \frac{(th)^2 (tk)}{(th)^4 + (tk)^2} - 0 \right) \cdot \frac{1}{t} = \lim_{t \to 0} \frac{t^3 h^2 k}{t^3 (t^2 h^4 + k^2)} = \begin{cases} \dfrac{h^2}{k}, & \text{if } k \neq 0 \\ 0, & \text{if } k = 0 \end{cases}
\]

In particular, choosing $v = e_1 = (1, 0)$ and $v = e_2 = (0, 1)$ shows that it has partial derivatives $\frac{\partial f}{\partial x}\big|_{(0,0)} = \frac{\partial f}{\partial y}\big|_{(0,0)} = 0$ at the origin. Outside the origin it is easily seen to be partially differentiable, and its partial derivatives exist everywhere on $\mathbb{R}^2$, and are given by

\[
\frac{\partial f}{\partial x} = \begin{cases} \dfrac{(x^4 + y^2)\,2xy - 4x^5 y}{(x^4 + y^2)^2}, & \text{if } (x, y) \neq (0, 0) \\ 0, & \text{if } (x, y) = (0, 0) \end{cases}
\]

\[
\frac{\partial f}{\partial y} = \begin{cases} \dfrac{(x^4 + y^2)\,x^2 - 2x^2 y^2}{(x^4 + y^2)^2}, & \text{if } (x, y) \neq (0, 0) \\ 0, & \text{if } (x, y) = (0, 0) \end{cases}
\]

Thus, $f$ is partially differentiable everywhere in $\mathbb{R}^2$. However, $f$ is not differentiable at the origin, in fact it is not even continuous there. For notice that on the parabola $y = x^2$ the function is constant with value $1/2$:

\[
f(h, h^2) = \frac{h^4}{2h^4} = \frac{1}{2}
\]

so that arbitrarily close to the origin there are points for which $f(x, y) = 1/2$, while $f(0, 0) = 0$. On the other hand, along any straight line $y = mx$ the function satisfies

\[
f(x, mx) = \frac{m x^3}{x^2 (x^2 + m^2)} = \frac{mx}{x^2 + m^2}
\]

so $f$ approaches 0 along straight lines. By one of your homework problems, however, all differentiable functions must be continuous, so we conclude that $f$ is not differentiable at the origin. (We prove that differentiability implies continuity below!)
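The problem here, of course, is that the partials $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are not continuous at the origin. For example, away from the origin $\frac{\partial f}{\partial x} = \frac{2xy(y^2 - x^4)}{(x^4 + y^2)^2}$, which is identically 0 along the parabola $y = x^2$, while along the line $y = x$ it equals $\frac{2(1 - x^2)}{(1 + x^2)^2}$ and so approaches 2, not the value 0 it takes at the origin. (Check this!)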
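The behaviour of the function in Example 4.4 can be probed numerically. The sketch below is only an illustration under assumed sample values (the slope $m = 2$, the direction $v = (3, 2)$ and the step sizes are arbitrary choices): it evaluates $f$ along the parabola $y = x^2$, along a straight line, and computes one difference quotient at the origin.

```python
def f(x, y):
    # The function of Example 4.4, with f(0, 0) = 0 by definition.
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * x * y / (x**4 + y * y)

def quotient(v, t):
    # Difference quotient (f(0 + t v) - f(0)) / t at the origin.
    h, k = v
    return (f(t * h, t * k) - f(0.0, 0.0)) / t

on_parabola = [f(h, h * h) for h in (1e-1, 1e-3, 1e-5)]   # stays at 1/2
on_line = [f(x, 2 * x) for x in (1e-1, 1e-3, 1e-5)]       # tends to 0
dq = quotient((3.0, 2.0), 1e-6)                           # close to h^2/k = 9/2
print(on_parabola, on_line, dq)
```

Approaching the origin along straight lines therefore cannot detect the discontinuity; only a curved path such as the parabola does.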

Remark 4.5 The problem point $(0, 0)$ isn't special. We could make any point a problem point, for example $(1, 5)$, by translating the above example function by $(1, 5)$, i.e. by considering $f(x, y) = \frac{(x - 1)^2 (y - 5)}{(x - 1)^4 + (y - 5)^2}$ when $(x, y) \neq (1, 5)$ and $f(1, 5) = 0$.

OK, so now we know that the mere existence of the partials $\frac{\partial f_i}{\partial x_j}\big|_a$ of $f = (f_1, \dots, f_m)$ isn't enough to ensure the existence of $Df(a)$. What we need is the continuity of the partials $\frac{\partial f_i}{\partial x_j}$ on a neighborhood of $a$.

Theorem 4.6 Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If all the partial derivatives $\frac{\partial f_i}{\partial x_j}$ of $f$ exist on a neighborhood of $a$ and are continuous at $a$, then $f$ is differentiable at $a$.

Conclusion: If we know that the component functions of $f = (f_1, \dots, f_m)$ are each continuously differentiable on a neighborhood of our point $a$ in $\mathbb{R}^n$, then we know that the $f_i$, and therefore $f$ itself, are differentiable, and the total derivative $Df(a)$ is in fact the $m \times n$ matrix of partial derivatives $\left( \frac{\partial f_i}{\partial x_j}\big|_a \right)$!

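Example 4.7 Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be given by $f(x, y, z) = (x^2 + y - z,\ e^{xy} \sin z + xz)$. Then $f_1(x, y, z) = x^2 + y - z$ and $f_2(x, y, z) = e^{xy} \sin z + xz$ are each clearly continuously differentiable in each partial derivative (for example, $\frac{\partial f_1}{\partial x} = 2x$ is continuous on all of $\mathbb{R}^3$). Therefore, $f$ is differentiable and, say at $a = (1, 1, 2)$, we have

\[
\begin{aligned}
Df(1, 1, 2) &= \begin{pmatrix} \dfrac{\partial f_1}{\partial x}\bigg|_{(1,1,2)} & \dfrac{\partial f_1}{\partial y}\bigg|_{(1,1,2)} & \dfrac{\partial f_1}{\partial z}\bigg|_{(1,1,2)} \\ \dfrac{\partial f_2}{\partial x}\bigg|_{(1,1,2)} & \dfrac{\partial f_2}{\partial y}\bigg|_{(1,1,2)} & \dfrac{\partial f_2}{\partial z}\bigg|_{(1,1,2)} \end{pmatrix} \\
&= \begin{pmatrix} 2x\big|_{(1,1,2)} & 1\big|_{(1,1,2)} & -1\big|_{(1,1,2)} \\ y e^{xy} \sin z + z\big|_{(1,1,2)} & x e^{xy} \sin z\big|_{(1,1,2)} & e^{xy} \cos z + x\big|_{(1,1,2)} \end{pmatrix} \\
&= \begin{pmatrix} 2 & 1 & -1 \\ e \sin 2 + 2 & e \sin 2 & e \cos 2 + 1 \end{pmatrix}
\end{aligned}
\]

Moreover, if $v = (-1, 4, 2)$ is a vector in $\mathbb{R}^3$, then we can compute the directional derivative of $f$ at $(1, 1, 2)$ in the direction of $v$ by simple matrix multiplication:

\[
\begin{aligned}
D_{(-1,4,2)} f(1, 1, 2) &= Df(1, 1, 2) \begin{pmatrix} -1 \\ 4 \\ 2 \end{pmatrix} \\
&= \begin{pmatrix} 2 & 1 & -1 \\ e \sin 2 + 2 & e \sin 2 & e \cos 2 + 1 \end{pmatrix} \begin{pmatrix} -1 \\ 4 \\ 2 \end{pmatrix} \\
&= \begin{pmatrix} 0 \\ 3e \sin 2 + 2e \cos 2 \end{pmatrix}
\end{aligned}
\]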
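The Jacobian in Example 4.7 can be cross-checked against difference quotients. This is a minimal sketch, assuming a hypothetical helper `jacobian` that approximates each column by a symmetric difference quotient in a coordinate direction; the step size is an arbitrary choice.

```python
import math

def f(x, y, z):
    # The map of Example 4.7.
    return (x * x + y - z, math.exp(x * y) * math.sin(z) + x * z)

def jacobian(func, a, t=1e-6):
    # Column j holds the symmetric difference quotient in direction e_j.
    m = len(func(*a))
    J = [[0.0] * len(a) for _ in range(m)]
    for j in range(len(a)):
        up = list(a); up[j] += t
        dn = list(a); dn[j] -= t
        fu, fd = func(*up), func(*dn)
        for i in range(m):
            J[i][j] = (fu[i] - fd[i]) / (2 * t)
    return J

a = (1.0, 1.0, 2.0)
J = jacobian(f, a)

e, s, c = math.e, math.sin(2.0), math.cos(2.0)
exact = [[2.0, 1.0, -1.0], [e * s + 2, e * s, e * c + 1]]   # from the example

v = (-1.0, 4.0, 2.0)
Dv = [sum(J[i][j] * v[j] for j in range(3)) for i in range(2)]
print(J, Dv)
```

Up to rounding, `J` agrees with the matrix computed by hand, and `Dv` agrees with $(0,\ 3e \sin 2 + 2e \cos 2)$.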

5 Further Properties of the Total and Partial Derivative

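Theorem 5.1 (Chain Rule I) Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ be functions such that $g \circ f$ is defined (i.e. the image of $f$ is contained in the domain of $g$). If $f$ is differentiable at $a \in \mathbb{R}^n$ and $g$ is differentiable at $b = f(a)$, then their composite $g \circ f : \mathbb{R}^n \to \mathbb{R}^p$ is differentiable at $a$ and its derivative is a matrix product, namely the product of their two respective total derivatives,

\[
D(g \circ f)(a) = Dg\big(f(a)\big) \cdot Df(a) \tag{5.1}
\]

The components of the matrix $D(g \circ f)(a)$ in (5.1) may explicitly be given by the formulas:

\[
\frac{\partial (g \circ f)_i}{\partial x_j}\bigg|_a = \frac{\partial g_i}{\partial y_1}\bigg|_b \frac{\partial f_1}{\partial x_j}\bigg|_a + \cdots + \frac{\partial g_i}{\partial y_m}\bigg|_b \frac{\partial f_m}{\partial x_j}\bigg|_a \tag{5.2}
\]

or, if we let $z_i := g_i(y_1, \dots, y_m)$ and $y_k := f_k(x_1, \dots, x_n)$,

\[
\frac{\partial z_i}{\partial x_j} = \frac{\partial z_i}{\partial y_1} \frac{\partial y_1}{\partial x_j} + \cdots + \frac{\partial z_i}{\partial y_m} \frac{\partial y_m}{\partial x_j} \tag{5.3}
\]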
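To see the chain rule as a matrix product in action, the sketch below compares $Dg(f(a)) \cdot Df(a)$ with difference quotients of $g \circ f$. The maps `f` and `g`, the point `a` and the helper `matmul` are all hypothetical choices made for illustration; they are not taken from the notes.

```python
import math

def f(x, y):
    # Hypothetical inner map R^2 -> R^2.
    return (x * y, x + y * y)

def Df(x, y):
    # Its 2 x 2 matrix of partial derivatives, computed by hand.
    return [[y, x], [1.0, 2 * y]]

def g(u, w):
    # Hypothetical outer map R^2 -> R.
    return math.sin(u) + u * w

def Dg(u, w):
    # Its 1 x 2 matrix of partial derivatives, computed by hand.
    return [[math.cos(u) + w, u]]

def matmul(A, B):
    # Plain row-by-column matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = (0.5, -1.0)
chain = matmul(Dg(*f(*a)), Df(*a))[0]    # Dg(f(a)) . Df(a), a 1 x 2 matrix

def comp(x, y):
    return g(*f(x, y))

t = 1e-6
numeric = [(comp(a[0] + t, a[1]) - comp(a[0] - t, a[1])) / (2 * t),
           (comp(a[0], a[1] + t) - comp(a[0], a[1] - t)) / (2 * t)]
print(chain, numeric)
```

Each entry of `chain` is a sum over the intermediate variables, which is the componentwise formula (5.2) read off from the matrix product.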

Theorem 5.2 (Clairaut: Equality of Mixed Partial Derivatives) If $f : \mathbb{R}^n \to \mathbb{R}^m$ is twice continuously differentiable, or more generally if for all $1 \leq i, j \leq n$ the partial derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}$ and $\frac{\partial^2 f}{\partial x_j \partial x_i}$ exist on a neighborhood of a point $a$ and are continuous at $a$, then

\[
\frac{\partial^2 f}{\partial x_i \partial x_j}\bigg|_a = \frac{\partial^2 f}{\partial x_j \partial x_i}\bigg|_a \tag{5.4}
\]

for all $1 \leq i, j \leq n$. This is also frequently denoted $f_{x_i x_j}(a) = f_{x_j x_i}(a)$.

Remark 5.3 Failure of continuity at $a$ may lead to inequality of the mixed partials at $a$. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

\[
f(x, y) = \begin{cases} \dfrac{x^3 y - x y^3}{x^2 + y^2}, & \text{if } (x, y) \neq (0, 0) \\ 0, & \text{if } (x, y) = (0, 0) \end{cases}
\]

Then, for $(x, y) \neq (0, 0)$,

\[
\begin{aligned}
\frac{\partial f}{\partial x} &= \frac{(x^2 + y^2)(3x^2 y - y^3) - 2x(x^3 y - x y^3)}{(x^2 + y^2)^2} \\
&= \frac{3x^4 y - x^2 y^3 + 3x^2 y^3 - y^5 - 2x^4 y + 2x^2 y^3}{(x^2 + y^2)^2} \\
&= \frac{x^4 y + 4x^2 y^3 - y^5}{(x^2 + y^2)^2}
\end{aligned}
\]

and

\[
\begin{aligned}
\frac{\partial f}{\partial y} &= \frac{(x^2 + y^2)(x^3 - 3x y^2) - 2y(x^3 y - x y^3)}{(x^2 + y^2)^2} \\
&= \frac{x^5 - 3x^3 y^2 + x^3 y^2 - 3x y^4 - 2x^3 y^2 + 2x y^4}{(x^2 + y^2)^2} \\
&= \frac{x^5 - 4x^3 y^2 - x y^4}{(x^2 + y^2)^2}
\end{aligned}
\]

while $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$ at $(x, y) = (0, 0)$.

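Therefore, for $a, b \neq 0$ we have

\[
\frac{\partial f}{\partial x}\bigg|_{(0,b)} = \frac{-b^5}{b^4} = -b, \qquad \frac{\partial f}{\partial y}\bigg|_{(a,0)} = \frac{a^5}{a^4} = a
\]

and consequently

\[
\frac{\partial^2 f}{\partial y \partial x}\bigg|_{(0,0)} = \lim_{t \to 0} \frac{\frac{\partial f}{\partial x}\big|_{(0,t)} - \frac{\partial f}{\partial x}\big|_{(0,0)}}{t} = \lim_{t \to 0} \frac{-t - 0}{t} = -1
\]

\[
\frac{\partial^2 f}{\partial x \partial y}\bigg|_{(0,0)} = \lim_{t \to 0} \frac{\frac{\partial f}{\partial y}\big|_{(t,0)} - \frac{\partial f}{\partial y}\big|_{(0,0)}}{t} = \lim_{t \to 0} \frac{t - 0}{t} = 1
\]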

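and so $\frac{\partial^2 f}{\partial y \partial x}\big|_{(0,0)} \neq \frac{\partial^2 f}{\partial x \partial y}\big|_{(0,0)}$. The problem, of course, is the discontinuity of the second derivatives at $(0, 0)$:

\[
\frac{\partial^2 f}{\partial x \partial y} = \begin{cases} \dfrac{(x^2 + y^2)^2 (5x^4 - 12x^2 y^2 - y^4) - 2(x^2 + y^2)\,2x\,(x^5 - 4x^3 y^2 - x y^4)}{(x^2 + y^2)^4}, & \text{if } (x, y) \neq (0, 0) \\ 1, & \text{if } (x, y) = (0, 0) \end{cases}
\]

\[
\frac{\partial^2 f}{\partial y \partial x} = \begin{cases} \dfrac{(x^2 + y^2)^2 (x^4 + 12x^2 y^2 - 5y^4) - 2(x^2 + y^2)\,2y\,(x^4 y + 4x^2 y^3 - y^5)}{(x^2 + y^2)^4}, & \text{if } (x, y) \neq (0, 0) \\ -1, & \text{if } (x, y) = (0, 0) \end{cases}
\]

Away from the origin both expressions simplify to $\frac{x^6 + 9x^4 y^2 - 9x^2 y^4 - y^6}{(x^2 + y^2)^3}$. For example, along the line $x = y$ we have $\frac{\partial^2 f}{\partial x \partial y} = 0$, so it approaches a value of 0, while along the line $y = 0$ it stays constant at 1, the value at the origin noted above.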
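The two limits defining the mixed partials at the origin in Remark 5.3 can be reproduced with a few lines of arithmetic. The sketch below is an illustration only; it assumes the simplified first partials $\frac{x^4 y + 4x^2 y^3 - y^5}{(x^2 + y^2)^2}$ and $\frac{x^5 - 4x^3 y^2 - x y^4}{(x^2 + y^2)^2}$ (value 0 at the origin), and the step size `t` is an arbitrary choice.

```python
def fx(x, y):
    # df/dx from the remark, with value 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x**4 * y + 4 * x**2 * y**3 - y**5) / (x**2 + y**2) ** 2

def fy(x, y):
    # df/dy from the remark, with value 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x**5 - 4 * x**3 * y**2 - x * y**4) / (x**2 + y**2) ** 2

t = 1e-3
d_yx = (fx(0.0, t) - fx(0.0, 0.0)) / t   # d/dy of df/dx at the origin
d_xy = (fy(t, 0.0) - fy(0.0, 0.0)) / t   # d/dx of df/dy at the origin
print(d_yx, d_xy)  # approximately -1 and 1
```

Because $\frac{\partial f}{\partial x}(0, t) = -t$ and $\frac{\partial f}{\partial y}(t, 0) = t$ exactly, the quotients do not depend on `t` beyond rounding.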

6 Appendix: Proofs of the Theorems

Theorem 6.1 If $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a$ in $\mathbb{R}^n$, then all of its directional derivatives at $a$ exist, and for any choice of vector $v$ in $\mathbb{R}^n$ we have

\[
D_v f(a) = Df(a)v \tag{6.1}
\]

The left-hand side is a limit, while the right-hand side is a matrix product, with $v$ treated as a column vector.

Proof: Since $f$ is differentiable at $a$, fix $v \neq 0$ (for $v = 0$ both sides of (6.1) are 0) and consider $h = tv$ for some sufficiently small nonzero $t \in \mathbb{R}$. Applying the linear approximation (1.3) and the linearity of the derivative $Df(a)$ (i.e. $Df(a)(\alpha x + \beta y) = \alpha Df(a)x + \beta Df(a)y$) we get

\[
\begin{aligned}
f(a + tv) - f(a) - tDf(a)v &= f(a + tv) - f(a) - Df(a)(tv) \\
&= f(a + h) - f(a) - Df(a)h \\
&= E(h) \\
&= E(tv)
\end{aligned}
\tag{6.2}
\]

and applying the limit (1.4)

\[
\lim_{t \to 0} \frac{|E(tv)|}{|t|} = \lim_{t \to 0} \frac{|E(tv)|}{|t||v|} \cdot |v| = \lim_{t \to 0} \frac{|E(tv)|}{|tv|} \cdot |v| = \lim_{h \to 0} \frac{|E(h)|}{|h|} \cdot |v| = 0 \cdot |v| = 0
\]

By (6.2) this means

\[
\lim_{t \to 0} \frac{|f(a + tv) - f(a) - tDf(a)v|}{|t|} = 0
\]

and hence

\[
\begin{aligned}
\lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} - Df(a)v &= \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} - \lim_{t \to 0} \frac{tDf(a)v}{t} \\
&= \lim_{t \to 0} \frac{f(a + tv) - f(a) - tDf(a)v}{t} \\
&= 0
\end{aligned}
\]

i.e.

\[
D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} = Df(a)v
\]

Theorem 6.2 A function $f : \mathbb{R}^n \to \mathbb{R}^m$, $f = (f_1, \dots, f_m)$, is differentiable at $a$ if and only if each of its component functions $f_i : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $a$. In that case, we have

\[
Df(a) = D \begin{pmatrix} f_1(a) \\ \vdots \\ f_m(a) \end{pmatrix} = \begin{pmatrix} Df_1(a) \\ \vdots \\ Df_m(a) \end{pmatrix} \tag{6.3}
\]

That is, to compute $Df(a)$ we can just compute the $1 \times n$ derivative matrices of the $f_i$ first, which we know are of the form $Df_i(a) = \begin{pmatrix} \frac{\partial f_i}{\partial x_1}\big|_a & \cdots & \frac{\partial f_i}{\partial x_n}\big|_a \end{pmatrix}$, and enter the result into the $i$th row of the larger matrix.

Proof: This follows from the inequalities

\[
|u_i| \leq |u| \leq \sqrt{m} \max_{1 \leq i \leq m} |u_i|
\]

valid for every vector $u = (u_1, \dots, u_m)$ in $\mathbb{R}^m$ and all $i$, applied to $u = f(a + h) - f(a) - Df(a)h$. For if $f$ is differentiable at $a$, then the limit (1.2) holds, so the first inequality above implies that the limit of zero holds in each of the coordinates, and so for each of the coordinate functions. Indeed, by that limit we must have that $Df_i(a)$ is the $i$th row of $Df(a)$. Conversely, if the component functions are differentiable at $a$, then multiplying the limit (1.2) for $f_i$ by $\sqrt{m}$ and using the second inequality above we have that the limit (1.2) for $f$ holds as well (just choose the $f_i$ with maximum absolute value), and moreover we must have that the $Df_i(a)$ are the coordinate linear functionals of $Df(a)$ by the first inequality.

Theorem 6.3 Let $f : \mathbb{R}^n \to \mathbb{R}^m$. If all the partial derivatives $\frac{\partial f_i}{\partial x_j}$ of $f$ exist on a neighborhood of $a$ and are continuous at $a$, then $f$ is differentiable at $a$.

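Proof: By Theorem 4.3 it is enough to prove this for the component functions $f_i$ of $f$. Indeed, let $f_i$ be a component function of $f$, and suppose its partial derivatives all exist in a neighborhood of $a$ and are continuous at $a$. Since $\frac{\partial f_i}{\partial x_j}$ measures change only in the $j$th coordinate direction, we split a displacement $h = (h_1, \dots, h_n)$ in $\mathbb{R}^n$ into its coordinate pieces $h^j = (0, \dots, h_j, \dots, 0) = h_j e_j$, so that $h = h^1 + \cdots + h^n$. By the definition of continuity of $\frac{\partial f_i}{\partial x_j}$ at $a$, for any $\varepsilon > 0$ we choose there is a $\delta > 0$ such that if $|k| < \delta$ then, for every $j$,

\[
\left| \frac{\partial f_i}{\partial x_j}\bigg|_{a + k} - \frac{\partial f_i}{\partial x_j}\bigg|_a \right| < \frac{\varepsilon}{n}
\]

Now let $0 < |h| < \delta$ and set $p_0 = a$ and $p_j = a + h^1 + \cdots + h^j$, so that $p_n = a + h$. By the Mean Value Theorem from Calc 1, applied to the function $t \mapsto f_i(p_{j-1} + t e_j)$, there is a $t_j$ between 0 and $h_j$ such that

\[
f_i(p_j) - f_i(p_{j-1}) = \frac{\partial f_i}{\partial x_j}\bigg|_{p_{j-1} + t_j e_j} h_j \tag{6.4}
\]

(Note: in the $j$th coordinate, keeping all other coordinates fixed, $f_i$ is a real-valued function of a single variable, so this works. Recall the MVT: If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$ then there is a point $c$ between $a$ and $b$ such that $f(b) - f(a) = f'(c)(b - a)$!) Write $c_j = p_{j-1} + t_j e_j$ and note that $|c_j - a| \leq |h| < \delta$. As a consequence, we have

\[
\begin{aligned}
\left| f_i(a + h) - f_i(a) - \begin{pmatrix} \frac{\partial f_i}{\partial x_1}\big|_a & \cdots & \frac{\partial f_i}{\partial x_n}\big|_a \end{pmatrix} h \right| &= \left| f_i(a + h) - f_i(a) - \sum_{j=1}^n \frac{\partial f_i}{\partial x_j}\bigg|_a h_j \right| \\
&= \left| \sum_{j=1}^n \frac{\partial f_i}{\partial x_j}\bigg|_{c_j} h_j - \sum_{j=1}^n \frac{\partial f_i}{\partial x_j}\bigg|_a h_j \right| \\
&\leq \sum_{j=1}^n \left| \frac{\partial f_i}{\partial x_j}\bigg|_{c_j} - \frac{\partial f_i}{\partial x_j}\bigg|_a \right| |h_j| \\
&< \sum_{j=1}^n \frac{|h_j| \varepsilon}{n} \\
&\leq |h| \varepsilon
\end{aligned}
\]

where the second equality is by application of (6.4) for each $j$ (the sum $\sum_j \big(f_i(p_j) - f_i(p_{j-1})\big)$ telescopes to $f_i(a + h) - f_i(a)$), the first inequality is from factoring out $h_j$ and then using the triangle inequality, the second is by the continuity estimate above, and the third by observing that $|h_1| + \cdots + |h_n| \leq \sqrt{h_1^2 + \cdots + h_n^2} + \cdots + \sqrt{h_1^2 + \cdots + h_n^2} = n|h|$. Dividing the above inequality through by $|h|$ gives our desired inequality,

\[
\frac{\left| f_i(a + h) - f_i(a) - \begin{pmatrix} \frac{\partial f_i}{\partial x_1}\big|_a & \cdots & \frac{\partial f_i}{\partial x_n}\big|_a \end{pmatrix} h \right|}{|h|} < \varepsilon
\]

We have thus demonstrated the limit

\[
\lim_{h \to 0} \frac{\left| f_i(a + h) - f_i(a) - \begin{pmatrix} \frac{\partial f_i}{\partial x_1}\big|_a & \cdots & \frac{\partial f_i}{\partial x_n}\big|_a \end{pmatrix} h \right|}{|h|} = 0
\]

which is the definition of differentiability, and moreover, in the course of the proof, we have also shown that $Df_i(a) = \begin{pmatrix} \frac{\partial f_i}{\partial x_1}\big|_a & \cdots & \frac{\partial f_i}{\partial x_n}\big|_a \end{pmatrix}$ as well!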

Proposition 6.4 (Hadamard) Let $U \subset \mathbb{R}^n$ be open and let $f : U \to \mathbb{R}^m$. Then, for any $x_0 \in U$ the following are equivalent:

(1) $f$ is differentiable at $x_0$.
(2) There exists a map $\varphi_{x_0} : U \to L(\mathbb{R}^n, \mathbb{R}^m)$, continuous at $x_0$, such that for all $x \in U$ we have

\[
f(x) = f(x_0) + \varphi_{x_0}(x)(x - x_0) \tag{6.5}
\]

If any of these conditions holds, moreover, then

\[
Df(x_0) = \varphi_{x_0}(x_0) \tag{6.6}
\]

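Proof: (1) ⇒ (2): Suppose $f$ is differentiable at $x_0$. Then there is an $\varepsilon$-function $\varepsilon : \mathbb{R}^n \to \mathbb{R}^m$ satisfying $f(x_0 + h) = f(x_0) + Df(x_0)h + \varepsilon(h)$, where $\lim_{h \to 0} \frac{|\varepsilon(h)|_2}{|h|_2} = 0$, for all $h$ with $x_0 + h \in U$. Define $\varphi_{x_0}$ by

\[
\varphi_{x_0}(x) = \begin{cases} Df(x_0) + \dfrac{1}{|x - x_0|_2^2}\, \varepsilon(x - x_0) \cdot (x - x_0)^T, & \text{if } x \in U \setminus \{x_0\} \\ Df(x_0), & \text{if } x = x_0 \end{cases}
\]

where the product $\varepsilon(x - x_0) \cdot (x - x_0)^T$ is a matrix product of an $m \times 1$ column by a $1 \times n$ row, producing an $m \times n$ matrix, associated with a linear map in $L(\mathbb{R}^n, \mathbb{R}^m)$. Each value $\varphi_{x_0}(x)$ is therefore linear, being the sum of two linear functions. Substituting $h = x - x_0$ and using $(x - x_0)^T (x - x_0) = |x - x_0|_2^2$ we get that $f(x) = f(x_0) + \varphi_{x_0}(x)(x - x_0)$. To see that $\varphi_{x_0}$ is continuous at $x_0$, note a fact of linear algebra: the norm of the matrix $\varepsilon(h)h^T$ satisfies

\[
|\varepsilon(h)h^T| = \sqrt{\sum_{ij} \big(\varepsilon(h)h^T\big)_{ij}^2} = \sqrt{\sum_{ij} \varepsilon(h)_i^2 h_j^2} = |\varepsilon(h)|_2 |h|_2
\]

so $\lim_{h \to 0} \frac{|\varepsilon(h)h^T|}{|h|_2^2} = \lim_{h \to 0} \frac{|\varepsilon(h)|_2}{|h|_2} = 0$, so that $\lim_{h \to 0} \varphi_{x_0}(x_0 + h) = Df(x_0)$.

(2) ⇒ (1): Conversely, suppose there is a $\varphi_{x_0} : U \to L(\mathbb{R}^n, \mathbb{R}^m)$, continuous at $x_0$, such that for all $x \in U$ equation (6.5) holds. Then, by continuity we have that for all $\varepsilon > 0$ there is a $\delta > 0$ such that $|h|_2 = |x_0 + h - x_0|_2 < \delta$ implies $|\varphi_{x_0}(x_0 + h) - \varphi_{x_0}(x_0)| < \varepsilon$. Since $\mathbb{R}^n$ and $\mathbb{R}^m$ are finite-dimensional, it is an easy matter to show that any $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ is continuous, and therefore bounded. Consequently we may use again that fact from linear algebra cited above, and along with (6.5) we have, for $0 < |h|_2 < \delta$,

\[
\begin{aligned}
\frac{|f(x_0 + h) - f(x_0) - \varphi_{x_0}(x_0)(h)|_2}{|h|_2} &\overset{(6.5)}{=} \frac{|\varphi_{x_0}(x_0 + h)(h) - \varphi_{x_0}(x_0)(h)|_2}{|h|_2} \\
&\leq \frac{|\varphi_{x_0}(x_0 + h) - \varphi_{x_0}(x_0)|\,|h|_2}{|h|_2} \\
&< \varepsilon
\end{aligned}
\]

i.e. $\lim_{h \to 0} \frac{|f(x_0 + h) - f(x_0) - \varphi_{x_0}(x_0)(h)|_2}{|h|_2} = 0$, and $f$ is differentiable at $x_0$ with $Df(x_0) = \varphi_{x_0}(x_0)$.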

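Theorem 6.5 (Chain Rule I) Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ be functions such that $g \circ f$ is defined (i.e. the image of $f$ is contained in the domain of $g$). If $f$ is differentiable at $a \in \mathbb{R}^n$ and $g$ is differentiable at $b = f(a)$, then their composite $g \circ f : \mathbb{R}^n \to \mathbb{R}^p$ is differentiable at $a$ and its derivative is a matrix product, namely the product of their two respective total derivatives,

\[
D(g \circ f)(a) = Dg\big(f(a)\big) \cdot Df(a) \tag{6.7}
\]

The components of the matrix $D(g \circ f)(a)$ in (6.7) may explicitly be given by the formulas:

\[
\frac{\partial (g \circ f)_i}{\partial x_j}\bigg|_a = \frac{\partial g_i}{\partial y_1}\bigg|_b \frac{\partial f_1}{\partial x_j}\bigg|_a + \cdots + \frac{\partial g_i}{\partial y_m}\bigg|_b \frac{\partial f_m}{\partial x_j}\bigg|_a \tag{6.8}
\]

or, if we let $z_i := g_i(y_1, \dots, y_m)$ and $y_k := f_k(x_1, \dots, x_n)$,

\[
\frac{\partial z_i}{\partial x_j} = \frac{\partial z_i}{\partial y_1} \frac{\partial y_1}{\partial x_j} + \cdots + \frac{\partial z_i}{\partial y_m} \frac{\partial y_m}{\partial x_j} \tag{6.9}
\]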

Proof: Write $x_0$ for the point $a$ and $y_0 = f(x_0)$. If $f$ is differentiable at $x_0$, then by Hadamard's lemma there is an operator-valued function $\varphi_{x_0} : U \to L(\mathbb{R}^n, \mathbb{R}^m)$, continuous at $x_0$, such that

\[
f(x) - f(x_0) = \varphi_{x_0}(x)(x - x_0), \quad \text{with } \lim_{x \to x_0} \varphi_{x_0}(x) = Df(x_0) \tag{6.10}
\]

and similarly since $g$ is differentiable at $f(x_0)$ there is a $\psi_{y_0} : V \to L(\mathbb{R}^m, \mathbb{R}^p)$, continuous at $y_0$, such that

\[
g(y) - g(f(x_0)) = \psi_{y_0}(y)(y - f(x_0)), \quad \text{with } \lim_{y \to f(x_0)} \psi_{y_0}(y) = Dg(f(x_0)) \tag{6.11}
\]

Letting $y = f(x)$ and substituting into (6.11) we get, by (6.10),

\[
\begin{aligned}
(g \circ f)(x) - (g \circ f)(x_0) &= \psi_{y_0}(f(x))\big(f(x) - f(x_0)\big) \\
&= \psi_{y_0}(f(x)) \circ \varphi_{x_0}(x)(x - x_0)
\end{aligned}
\tag{6.12}
\]

By (6.10), $f(x) \to f(x_0)$ as $x \to x_0$, so by the second parts of (6.10) and (6.11) we have $\lim_{x \to x_0} \psi_{y_0}(f(x)) \circ \varphi_{x_0}(x) = Dg(f(x_0)) \circ Df(x_0)$. Each map $\psi_{y_0}(f(x)) \circ \varphi_{x_0}(x)$ is linear, being a composition of linear maps, so (6.12) has the form (6.5) for $g \circ f$, and we get by Hadamard's lemma again that $D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0)$.

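Theorem 6.6 (Clairaut: Equality of Mixed Partial Derivatives) If $f : \mathbb{R}^n \to \mathbb{R}^m$ is twice continuously differentiable, or more generally if for all $1 \leq i, j \leq n$ the partial derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}$ and $\frac{\partial^2 f}{\partial x_j \partial x_i}$ exist on a neighborhood of a point $a$ and are continuous at $a$, then

\[
\frac{\partial^2 f}{\partial x_i \partial x_j}\bigg|_a = \frac{\partial^2 f}{\partial x_j \partial x_i}\bigg|_a \tag{6.13}
\]

for all $1 \leq i, j \leq n$.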
Proof: It will simplify notation a little if we write $D_j$ instead of $\partial / \partial x_j$, and $x$ for the point $a$. In view of Theorem 4.3 it suffices to prove this for all component functions $f_k$ of $f$. Without loss of generality, we may suppose that $i < j$. For $y$ near $x$ with $y_i \neq x_i$ and $y_j \neq x_j$, let

\[
r(y) = \frac{f_k(y) - f_k(y_1, \dots, y_{i-1}, x_i, y_{i+1}, \dots, y_n) - f_k(y_1, \dots, y_{j-1}, x_j, y_{j+1}, \dots, y_n) + f_k(y_1, \dots, x_i, \dots, x_j, \dots, y_n)}{(y_i - x_i)(y_j - x_j)}
\]

and define $g : \mathbb{R} \to \mathbb{R}$ by

\[
g(t) = f_k(y_1, \dots, y_{i-1}, t, y_{i+1}, \dots, y_n) - f_k(y_1, \dots, y_{i-1}, t, y_{i+1}, \dots, y_{j-1}, x_j, y_{j+1}, \dots, y_n)
\]

so that

\[
r(y) = \frac{g(y_i) - g(x_i)}{(y_i - x_i)(y_j - x_j)}
\]

We will show that both sides of (6.13) are equal to $\lim_{y \to x} r(y)$. The denominator of $r$ is the (signed) area of the rectangle with vertices $(y_i, y_j)$, $(x_i, y_j)$, $(y_i, x_j)$ and $(x_i, x_j)$ in the $i$-$j$ plane, while the numerator is the alternating sum of the values of $f_k$ at these vertices. Note that since the partial derivatives of $f$ (up to order 2), and so those of each component function $f_k$ of $f$, exist on a neighborhood $N \subseteq U$ of $x$, we have that $g$ is differentiable between $x_i$ and $y_i$ once $y$ is close enough to $x$. By the Mean Value Theorem for $\mathbb{R}$, there is a $\xi_i$ between $x_i$ and $y_i$ such that

\[
\begin{aligned}
r(y) &= \frac{g(y_i) - g(x_i)}{(y_i - x_i)(y_j - x_j)} = \frac{g'(\xi_i)(y_i - x_i)}{(y_i - x_i)(y_j - x_j)} = \frac{g'(\xi_i)}{y_j - x_j} \\
&= \frac{D_i f_k(y_1, \dots, y_{i-1}, \xi_i, y_{i+1}, \dots, y_n) - D_i f_k(y_1, \dots, y_{i-1}, \xi_i, y_{i+1}, \dots, y_{j-1}, x_j, y_{j+1}, \dots, y_n)}{y_j - x_j}
\end{aligned}
\tag{6.14}
\]

Define $h : \mathbb{R} \to \mathbb{R}$ on a sufficiently small neighborhood of $x_j$ by

\[
h(t) = D_i f_k(y_1, \dots, y_{i-1}, \xi_i, y_{i+1}, \dots, y_{j-1}, t, y_{j+1}, \dots, y_n)
\]

Then, again, $h$ is differentiable and the Mean Value Theorem gives the existence of $\xi_j$ between $x_j$ and $y_j$ with $h(y_j) - h(x_j) = h'(\xi_j)(y_j - x_j)$. Consequently, from (6.14) we get

\[
\begin{aligned}
r(y) &= \frac{h(y_j) - h(x_j)}{y_j - x_j} = \frac{h'(\xi_j)(y_j - x_j)}{y_j - x_j} = h'(\xi_j) \\
&= D_j D_i f_k(y_1, \dots, y_{i-1}, \xi_i, y_{i+1}, \dots, y_{j-1}, \xi_j, y_{j+1}, \dots, y_n)
\end{aligned}
\]

Let $\xi = \xi(y) = (y_1, \dots, y_{i-1}, \xi_i, y_{i+1}, \dots, y_{j-1}, \xi_j, y_{j+1}, \dots, y_n)$, and note that $|\xi - x|_2 \leq |y - x|_2$, so the continuity of $D_j D_i f_k$ at $x$ implies

\[
\lim_{y \to x} r(y) = \lim_{\xi \to x} D_j D_i f_k(\xi) = D_j D_i f_k(x)
\]

Reversing the roles of $x_i$ and $x_j$ above shows, with $\xi'$ probably different from $\xi$ above, that

\[
\lim_{y \to x} r(y) = D_i D_j f_k(x)
\]

and hence $D_j D_i f_k(x) = D_i D_j f_k(x)$.

79 pages
Ade NPTL Notes
No ratings yet
Ade NPTL Notes
207 pages
Chapter 14
No ratings yet
Chapter 14
27 pages
q3 Week 4 Stem g11 Basic Calculus
No ratings yet
q3 Week 4 Stem g11 Basic Calculus
12 pages
Quora
No ratings yet
Quora
6 pages
Syllabus 3rd
No ratings yet
Syllabus 3rd
4 pages
Optimization For Machine Learning
No ratings yet
Optimization For Machine Learning
45 pages
SYLLABUS - Math 113 Differential Calculus
No ratings yet
SYLLABUS - Math 113 Differential Calculus
12 pages
CH 13 - Limits Derivatives PDF
No ratings yet
CH 13 - Limits Derivatives PDF
69 pages
Syllabus Sem-1
No ratings yet
Syllabus Sem-1
84 pages
Relation and Function Case Study 1
No ratings yet
Relation and Function Case Study 1
46 pages
MTH632 Mid Term Solved Subjective
No ratings yet
MTH632 Mid Term Solved Subjective
12 pages
Differential Geometry
100% (1)
Differential Geometry
161 pages
Introduction To Diff Erential Equations Jeff Rey R. Chasnov
No ratings yet
Introduction To Diff Erential Equations Jeff Rey R. Chasnov
120 pages
CHPT 3 - Development of Truss Equations
No ratings yet
CHPT 3 - Development of Truss Equations
69 pages
Integration Concepts/Formula.: Module in Integral Calculus
No ratings yet
Integration Concepts/Formula.: Module in Integral Calculus
41 pages
HBC2110 Management Maths I
No ratings yet
HBC2110 Management Maths I
88 pages
22mat - 11 Module 1&2
No ratings yet
22mat - 11 Module 1&2
2 pages
Lecture Notes On Multivariable Calculus
No ratings yet
Lecture Notes On Multivariable Calculus
36 pages
ENGR 233 Outline
No ratings yet
ENGR 233 Outline
5 pages
Integral (Almost Done)
No ratings yet
Integral (Almost Done)
39 pages
Basic Calculus Week 8
No ratings yet
Basic Calculus Week 8
7 pages
Application of Fluid Mechanics in Daily Life
No ratings yet
Application of Fluid Mechanics in Daily Life
141 pages
All Calculus I Milestones PDF
No ratings yet
All Calculus I Milestones PDF
54 pages
Advance Cal Unit 1 2
No ratings yet
Advance Cal Unit 1 2
34 pages
MAT 1320 DGD Workbook
No ratings yet
MAT 1320 DGD Workbook
120 pages
Lect3 UWA PDF
No ratings yet
Lect3 UWA PDF
73 pages
Differenciation Calculus
No ratings yet
Differenciation Calculus
18 pages
01.14.pyramidal Implementation of The Lucas Kanade Feature Tracker - Description of The Algorithm
No ratings yet
01.14.pyramidal Implementation of The Lucas Kanade Feature Tracker - Description of The Algorithm
9 pages

1 The Total Derivative

What we want to do now is to locally approximate a complicated function f : Rn → Rm

Equivalently, f must locally be approximated by a linear function, that is

f (a + h) ≈ Df (a)h + f (a) (1.3)

where the error in the approximation

This last statement, (1.4), is obviously the same statement as (1.2).
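
The linear approximation (1.3) can be checked numerically. The sketch below is only an illustration: the map `f`, its hand-computed Jacobian `Df`, the point `a`, and the helper names are all assumptions, not taken from these notes. It evaluates the relative error |f(a + h) − f(a) − Df(a)h| / |h| for shrinking h, which should tend to 0 when f is differentiable at a.

```python
import math

def f(x, y):
    # Hypothetical example map f : R^2 -> R^2 (an assumption for illustration).
    return (x * x * y, math.sin(x) + y)

def Df(x, y):
    # Jacobian of the hypothetical f, computed by hand; row i is the gradient of f_i.
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))

def matvec(A, h):
    # Matrix-vector product A h.
    return tuple(sum(a * b for a, b in zip(row, h)) for row in A)

def norm(v):
    # Euclidean norm.
    return math.sqrt(sum(c * c for c in v))

def relative_error(a, h):
    # |f(a + h) - f(a) - Df(a) h| / |h|
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lin = matvec(Df(*a), h)
    err = tuple(p - q - r for p, q, r in zip(fah, fa, lin))
    return norm(err) / norm(h)

a = (1.0, 2.0)
# Shrink h = (s, -s) toward 0 and record the relative error each time.
ratios = [relative_error(a, (s, -s)) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this particular f the ratios shrink roughly in proportion to |h|, which is the behaviour the definition asks for: the error is small even relative to |h|.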

where d = −m1 a1 − · · · − mn an , which is precisely the expression for an n-dimensional plane.

Suppose v is a ‘vector’ in Rn and a is a ‘point’ in Rn , and let T : R → Rn be the translation-

3 The Partial Derivative

Now suppose f is a real-valued function, f : Rn → R. The ith partial derivative of f at

Theorem 4.1 If f : Rn → Rm is differentiable at a in Rn , then all of its directional deriva-

Dv f (x) = Df (x)v (4.1)
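
Formula (4.1) can be sanity-checked with a difference quotient. The sketch below uses the point a = (1, 1, 2) and the direction v = (−1, 4, 2) that appear in these notes, but the map `f` and its Jacobian `Df` are hypothetical, chosen only so there is something concrete to differentiate. It compares Df(a)v with a symmetric difference quotient along v.

```python
def f(x, y, z):
    # Hypothetical map f : R^3 -> R^2 (an assumption, not the function from the notes).
    return (x * y + z, x * z * z)

def Df(x, y, z):
    # Jacobian of the hypothetical f, computed by hand.
    return ((y, x, 1.0),
            (z * z, 0.0, 2.0 * x * z))

def matvec(A, v):
    # Matrix-vector product A v.
    return tuple(sum(a * b for a, b in zip(row, v)) for row in A)

def directional_derivative(func, a, v, t=1e-6):
    # Symmetric difference quotient (func(a + t v) - func(a - t v)) / (2 t),
    # a numerical stand-in for the limit defining D_v func(a).
    plus = func(*(ai + t * vi for ai, vi in zip(a, v)))
    minus = func(*(ai - t * vi for ai, vi in zip(a, v)))
    return tuple((p - m) / (2 * t) for p, m in zip(plus, minus))

a = (1.0, 1.0, 2.0)
v = (-1.0, 4.0, 2.0)

exact = matvec(Df(*a), v)                 # Df(a) v
approx = directional_derivative(f, a, v)  # difference quotient along v
print(exact, approx)
```

Here Df(a) has rows (1, 1, 1) and (4, 0, 4), so Df(a)v = (5, 4), and the difference quotient agrees with it up to rounding.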

Example 4.2 Suppose f : R3 → R2 is differentiable at the point a = (1, 1, 2) and v =

Thus, if f is differentiable at a, the task is to find a way to compute Df (a). For

This is true for each i = 1, . . . , n, so

Now consider a vector-valued function f : Rn → Rm , f = (f1 , . . . , fm ). Then, each of its

Theorem 4.3 A function f : Rn → Rm , f = (f1 , . . . , fm ), is differentiable at a if and only

That is, to compute Df (a) we can just compute

Example 4.4 Consider the function f : R2 → R given by

Thus, f is partially differentiable everywhere in R2 . However, f is not differentiable at the

Conclusion: If we know that the component functions of f = (f1 , . . . , fm ) are each

differentiable and, say at a = (1, 1, 2), we have

∂f1 ∂f1 ∂f1

Moreover, if v = (−1, 4, 2) is a vector in R3 , then we can compute the directional derivative

Theorem 5.1 (Chain Rule I) Let f : Rn → Rm and g : Rm → Rp be functions such that

or, if we let zi := gi (y1 , . . . , ym ) and yk := fk (x1 , . . . , xn ),

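\[ \frac{\partial z_i}{\partial x_j} = \frac{\partial z_i}{\partial y_1}\frac{\partial y_1}{\partial x_j} + \cdots + \frac{\partial z_i}{\partial y_m}\frac{\partial y_m}{\partial x_j} \]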
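
In matrix form the chain rule says that the Jacobian of g ◦ f is the product of the Jacobians of g and f. The sketch below checks this on a small example. The maps `f` and `g`, their hand-computed Jacobians, the point `x0`, and the helper names are all assumptions made for illustration. It multiplies Dg(f(x0)) by Df(x0) and compares the result with a finite-difference Jacobian of the composite.

```python
def f(x1, x2):
    # Hypothetical inner map f : R^2 -> R^2.
    return (x1 * x2, x1 + x2)

def g(y1, y2):
    # Hypothetical outer map g : R^2 -> R^2.
    return (y1 * y1 + y2, y1 * y2)

def Df(x1, x2):
    # Jacobian of f, computed by hand.
    return ((x2, x1), (1.0, 1.0))

def Dg(y1, y2):
    # Jacobian of g, computed by hand.
    return ((2.0 * y1, 1.0), (y2, y1))

def matmul(A, B):
    # Matrix product A B for matrices stored as tuples of rows.
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(len(B)))
              for j in range(len(B[0])))
        for i in range(len(A)))

def numerical_jacobian(F, x, t=1e-6):
    # Entry (i, j) approximates dF_i/dx_j by a symmetric difference quotient.
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += t
        xm[j] -= t
        cols.append([(p - m) / (2 * t) for p, m in zip(F(*xp), F(*xm))])
    # cols[j][i] holds dF_i/dx_j, so transpose into rows.
    return tuple(tuple(cols[j][i] for j in range(len(x)))
                 for i in range(len(cols[0])))

x0 = (2.0, 3.0)
chain = matmul(Dg(*f(*x0)), Df(*x0))  # Dg(f(x0)) Df(x0)
direct = numerical_jacobian(lambda x1, x2: g(*f(x1, x2)), x0)
print(chain, direct)
```

Each entry of `chain` is exactly one of the sums in the componentwise formula: row i of Dg(f(x0)) paired with column j of Df(x0).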

Theorem 5.2 (Clairaut: Equality of Mixed Partial Derivatives) If f : Rn → Rm

Therefore, for a, b ≠ 0 we have

Theorem 6.1 If f : Rn → Rm is differentiable at a in Rn , then all of its directional deriva-

Dv f (x) = Df (x)v (6.1)

f (x + tv) − f (x) − tDf (x)v = f (x + tv) − f (x) − Df (x)(tv) (6.2)

and applying the limit (1.4)

|E(tv)| |E(tv)| |E(tv)| |E(h)|

By (6.2) this means

Theorem 6.2 A function f : Rn → Rm , f = (f1 , . . . , fm ), is differentiable at a if and only

That is, to compute Df (a) we can just compute

We have thus demonstrated the limit

which is the definition of differentiability,

If any of these conditions holds, moreover, then

Df (x0 ) = ϕx0 (6.6)

Proof: (1) ⇒ (2): Suppose f is differentiable at x0 , then there is an ε-function ε : Rn → Rm

where the product (x − x0 ) · (x − x0 )T is a matrix product, producing an m × n matrix,

Theorem 6.5 (Chain Rule I) Let f : Rn → Rm and g : Rm → Rp be functions such that

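\[ \frac{\partial (g \circ f)_i}{\partial x_j} = \frac{\partial g_i}{\partial y_1}\frac{\partial f_1}{\partial x_j} + \cdots + \frac{\partial g_i}{\partial y_m}\frac{\partial f_m}{\partial x_j} \]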

or, if we let zi := gi (y1 , . . . , ym ) and yk := fk (x1 , . . . , xn ),

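\[ \frac{\partial z_i}{\partial x_j} = \frac{\partial z_i}{\partial y_1}\frac{\partial y_1}{\partial x_j} + \cdots + \frac{\partial z_i}{\partial y_m}\frac{\partial y_m}{\partial x_j} \]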

Proof: If f is differentiable at x0 , then by Hadamard’s lemma there is an operator-valued

f (x) − f (x0 ) = ϕx0 (x)(x − x0 ), with lim_{x→x0} ϕx0 (x) = Df (x0 ) (6.10)

and similarly since g is differentiable at f (x0 ) there is a ψ : V → L(Rm , Rp ) such that

Letting y = f (x) and substituting into (6.11) we get, by (6.10),

(g ◦ f )(x) − (g ◦ f )(x0 ) = ψy0 (f (x))(f (x) − f (x0 )) (6.12)

g(yi ) − g(xi ) g′(ξi )(yi − xi ) g′(ξi )

h(yj ) − h(xj ) h′(ξj )(yj − xj )
