Math Cheat Sheet
Below is a quick refresher on some math tools from 340 that we’ll assume knowledge of for the
PSets.
1 Basic Probability
1.1 Discrete random variables
A random variable is a variable whose value is uncertain (e.g. the roll of a die). If X is a random
variable that always takes non-negative, integer values (we'll refer to this as a discrete random
variable), then we can write the expected value of X as:
Definition of expected value, form 1: E[X] = \sum_{i=0}^{\infty} Pr[X = i] · i.
The above definition is probably already familiar to most of you. Another way to compute the
expected value (which sometimes results in simpler calculations) is:
Definition of expected value, form 2: E[X] = \sum_{i=0}^{\infty} Pr[X > i].
To see why the two forms agree, write:

\sum_{i=0}^{\infty} Pr[X > i] = \sum_{i=0}^{\infty} \sum_{j=i+1}^{\infty} Pr[X = j] = \sum_{j=0}^{\infty} \sum_{i=0}^{j-1} Pr[X = j] = \sum_{j=0}^{\infty} Pr[X = j] · j = E[X].

We obtain the second equality just by flipping the order of sums: the term Pr[X = j] is summed
once for every i < j. The third equality is obtained by just observing that there are exactly j non-
negative integers less than j, and the last equality is just form 1.
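If it helps, here is a minimal Python sketch comparing the two forms on a concrete distribution (a fair six-sided die, chosen arbitrarily):

    # Sketch: check that the two forms of E[X] agree, here for a fair six-sided die.
    from fractions import Fraction

    pmf = {i: Fraction(1, 6) for i in range(1, 7)}   # Pr[X = i] for i = 1..6

    # Form 1: E[X] = sum_i Pr[X = i] * i
    form1 = sum(p * i for i, p in pmf.items())

    # Form 2: E[X] = sum_{i >= 0} Pr[X > i]  (terms with i >= 6 are 0 and can be dropped)
    form2 = sum(sum(p for j, p in pmf.items() if j > i) for i in range(0, max(pmf)))

    print(form1, form2)   # both are 7/2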
1.2 Continuous random variables

Now suppose X is a continuous random variable taking non-negative real values, described by its
CDF F(x) = Pr[X ≤ x] and its PDF f(x).

• The PDF is just a formal way of discussing the probability that X = x. Because the random
variable is continuous, the probability that X = x is actually zero for all x (what is the
probability that you spend exactly 3.4284203 seconds reading this sentence?). So we think
of dx as being infinitesimally small (the same dx from your calculus classes), and think of
Pr[X = x] as f(x)dx.
So how do we take the expectation of a continuous random variable? We just need to map the
definitions above into the new language.
Definition of expected value, continuous random variables, form 1: E[X] = \int_0^{\infty} x f(x) dx.
You should parse this exactly the same way as form 1 for discrete random variables, except we've
replaced the sum with an integral, and Pr[X = i] is now "f(x)dx ≈ Pr[X = x]." The equivalent
definition for form 2 is also often easier to use in calculations:
Definition of expected value, continuous random variables, form 2: E[X] = \int_0^{\infty} (1 - F(x)) dx.
If F (x) = Pr[X ≤ x], then 1 − F (x) = Pr[X > x], so this is the same as form 2 for discrete
random variables, except we've replaced the sum with an integral. For form 2, it is crucial that the
lower limit of the integral is 0, even when the random variable only takes values (say) > 1. We'll
see this in the examples below.
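One way to see why the two continuous forms agree is the same sum-flipping trick, with the sums replaced by integrals:

E[X] = \int_0^{\infty} x f(x) dx = \int_0^{\infty} \left( \int_0^x dt \right) f(x) dx = \int_0^{\infty} \int_t^{\infty} f(x) dx dt = \int_0^{\infty} Pr[X > t] dt = \int_0^{\infty} (1 - F(t)) dt.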
1.3 Examples
Consider the uniform distribution on the set {4, 5} (4 w.p. 1/2, 5 w.p. 1/2). Then the expected value
as computed by form 1 is:
\sum_{i=0}^{\infty} Pr[X = i] · i = 4 · (1/2) + 5 · (1/2) = 4.5.
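Form 2 gives the same answer; note that the sum really does start at i = 0, even though X is never smaller than 4:

\sum_{i=0}^{\infty} Pr[X > i] = Pr[X > 0] + Pr[X > 1] + Pr[X > 2] + Pr[X > 3] + Pr[X > 4] = 1 + 1 + 1 + 1 + 1/2 = 4.5.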
Now consider the uniform distribution on the interval [4, 5] (equally likely to be any real number
in [4, 5]). Then the PDF associated with this distribution is f(x) = 1 for x ∈ [4, 5], and f(x) = 0 for
x ∉ [4, 5]. And we can compute the expected value by form 1 as:
\int_0^{\infty} x f(x) dx = \int_4^5 x dx = (x^2/2)|_4^5 = 25/2 - 8 = 4.5.
We can also compute it using form 2 as:
\int_0^{\infty} (1 - F(x)) dx = \int_0^4 1 dx + \int_4^5 (5 - x) dx + \int_5^{\infty} 0 dx = 4 + (5x - x^2/2)|_4^5 + 0 = 4 + 1/2 + 0 = 4.5.
Note that it is crucial that we started the integral at 0 and not 4 for form 2, otherwise we would
have incorrectly computed the expectation as .5 instead of 4.5. This isn’t crucial for form 1, since
all the terms in [0, 4] drop out anyway as f (x) = 0.
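If you ever want to sanity check a computation like this, a crude numerical integration is enough. Here is a minimal Python sketch for the Uniform[4, 5] example (the grid over [0, 10] and the step size 1e-4 are arbitrary choices); it also shows how starting form 2 at 4 instead of 0 gives the wrong answer:

    # Sketch: numerically check both forms of E[X] for X ~ Uniform[4, 5].
    step = 1e-4
    grid = [i * step for i in range(int(10 / step))]   # covers [0, 10); integrands are 0 past x = 5

    def f(x):   # PDF
        return 1.0 if 4 <= x <= 5 else 0.0

    def F(x):   # CDF, F(x) = Pr[X <= x]
        return min(max(x - 4.0, 0.0), 1.0)

    form1 = sum(x * f(x) * step for x in grid)                  # ~ 4.5
    form2 = sum((1 - F(x)) * step for x in grid)                # ~ 4.5
    truncated = sum((1 - F(x)) * step for x in grid if x >= 4)  # starting at 4 instead of 0: ~ 0.5

    print(form1, form2, truncated)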
Another useful tool is linearity of expectation: for any two random variables X_1 and X_2, we have
E[X_1 + X_2] = E[X_1] + E[X_2]. Here is one derivation for independent, non-negative, integer-valued
X_1 and X_2 (the identity in fact holds even without independence):

E[X_1 + X_2] = \sum_{i=0}^{\infty} Pr[X_1 + X_2 = i] · i
             = \sum_{i=0}^{\infty} \sum_{j=0}^{i} Pr[X_1 = j] · Pr[X_2 = i - j] · i
             = \sum_{j=0}^{\infty} \sum_{i=j}^{\infty} Pr[X_1 = j] · Pr[X_2 = i - j] · i
             = \sum_{j=0}^{\infty} \sum_{\ell=0}^{\infty} Pr[X_1 = j] · Pr[X_2 = \ell] · (\ell + j)     (changing variables with \ell = i - j)
             = \sum_{j=0}^{\infty} Pr[X_1 = j] · ( j + \sum_{\ell=0}^{\infty} Pr[X_2 = \ell] · \ell )
             = \sum_{j=0}^{\infty} Pr[X_1 = j] · (j + E[X_2])
             = E[X_1] + E[X_2].     (because \sum_{j=0}^{\infty} Pr[X_1 = j] = 1)
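Here is a minimal Python sketch that checks the identity exactly for two independent fair dice (any two small pmfs would do):

    # Sketch: verify E[X1 + X2] = E[X1] + E[X2] exactly for two independent fair dice.
    from fractions import Fraction
    from itertools import product

    pmf1 = {i: Fraction(1, 6) for i in range(1, 7)}
    pmf2 = {i: Fraction(1, 6) for i in range(1, 7)}

    def expectation(pmf):
        return sum(p * v for v, p in pmf.items())

    # Build the pmf of X1 + X2 by summing over all pairs (j, l), as in the derivation above.
    pmf_sum = {}
    for (j, pj), (l, pl) in product(pmf1.items(), pmf2.items()):
        pmf_sum[j + l] = pmf_sum.get(j + l, 0) + pj * pl

    print(expectation(pmf_sum), expectation(pmf1) + expectation(pmf2))   # both are 7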
2 Optimization

2.1 Single-variable, unconstrained optimization

Say we want to find the unconstrained global maximum of a differentiable function f(·). Any value
achieving the maximum must be a critical point (a point x where f'(x) = 0); not all critical points
are local maxima, but all local maxima are critical points. One also needs to confirm that f(·) indeed
achieves its global maximum, e.g. by examining its limits towards ±∞.

Say we want to find the global maximum of f(x) = 4x - x^2. The derivative is 4 - 2x, so there
is a unique critical point at x = 2. So if there is a global maximum, it must be x = 2. We can verify
that lim_{x→±∞} f(x) = -∞, so x = 2 must be the global maximum.[1]
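If you want to double-check a computation like this, here is a minimal sketch using sympy (assuming sympy is available; doing it by hand as above is of course fine):

    # Sketch: find the unconstrained maximum of f(x) = 4x - x^2 symbolically.
    import sympy as sp

    x = sp.symbols('x', real=True)
    f = 4 * x - x**2

    critical_points = sp.solve(sp.diff(f, x), x)
    print(critical_points)                                 # [2]

    # Confirm f really achieves a global maximum: it tends to -oo in both directions.
    print(sp.limit(f, x, sp.oo), sp.limit(f, x, -sp.oo))   # -oo -oo

    print(f.subs(x, critical_points[0]))                   # 4, the maximum value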
2.2 Single-variable, constrained optimization

Now say we want to find the maximum of a differentiable f(·) constrained to an interval [a, b]. Any
of the following approaches works:

• Find all critical points, compute f(a), f(b), and f(x) for all critical points x, and output the largest.
• Confirm that f'(a) > 0 (that is, f is increasing at a) and f'(b) < 0. This proves that neither
a nor b can be the global maximum. Then compute f(x) for all critical points x and output
the largest.
• In either of the above, rather than directly comparing f(x) to f(y), one can instead prove that
f'(z) ≥ 0 on the entire interval [x, y] to conclude that f(y) ≥ f(x).
• Prove that x is a global unconstrained maximum of f(·), and observe that x ∈ [a, b].
There are many other approaches. The point is that at the end of the day, you must directly or
indirectly compare all critical points and all endpoints. You don’t have to directly compute f (·)
at all of these values (the bullets above provide some shortcuts), but you must at least indirectly
compare them. For this class, it is OK to just describe your approach without writing down the
entire calculations (as in the following examples).
Say we want to find the constrained maximum of f(x) = x^2 on the interval [3, 8]. f has no
critical points in this range, so the maximum must be at either x = 3 or x = 8. f'(x) = 2x > 0 on this
entire interval, so the maximum must be at x = 8.
Say we want to find the constrained maximum of f(x) = 3x^2 - x^3 on the interval [-2, 3].
f'(x) = 6x - 3x^2, and therefore f has critical points at 0 and 2. So we need to (at least indirectly)
consider -2, 0, 2, 3. We see that f'(x) ≤ 0 on [-2, 0], so we can immediately rule out 0. We also
see that f'(x) ≤ 0 on [2, 3], so we can immediately rule out 3, and we only need to compare -2
and 2. We can also immediately see that f(-x) > f(x) for all x > 0, so f(-2) > f(2), and therefore
x = -2 is the global constrained maximum.
Say we want to find the constrained maximum of f(x) = 4x - x^2 on the interval [-8, 5]. We
already proved above that x = 2 is the global unconstrained maximum. Therefore x = 2 is also
the global constrained maximum on [-8, 5].
Warning! An incorrect approach. It might be tempting to try the following approach: First, find
all local maxima of f(·). Call this set X. Then, check to see which elements of X lie in [a, b].
Call them Y. Then, output the argmax of f(x) over all x ∈ Y. This approach does not work,
and in fact we already saw a counterexample. Say we want to find the constrained maximum of
f(x) = 3x^2 - x^3 on the interval [-2, 3]. Then f'(x) = 6x - 3x^2, and f has critical points at 0 and
2. We can verify that x = 0 is a local minimum and x = 2 is a local maximum. So x = 2 is the
unique local maximum, and it also lies in [-2, 3]. But we saw that it's incorrect to conclude that
therefore x = 2 is the constrained global maximum (the endpoint x = -2 does better).
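Here is a minimal sympy sketch of the first (compare everything) approach from the bullet list above, applied to this counterexample; it correctly returns x = -2, whereas only looking at local maxima would return x = 2:

    # Sketch: constrained maximization of f(x) = 3x^2 - x^3 on [-2, 3] by comparing
    # the endpoints with every critical point that lies inside the interval.
    import sympy as sp

    x = sp.symbols('x', real=True)
    f = 3 * x**2 - x**3
    a, b = -2, 3

    critical_points = [c for c in sp.solve(sp.diff(f, x), x) if a <= c <= b]   # [0, 2]
    candidates = [a, b] + critical_points

    best = max(candidates, key=lambda c: f.subs(x, c))
    print(best, f.subs(x, best))   # -2 20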
[1] We can also verify that x = 2 is a local maximum by computing f''(2) = -2 < 0, but this isn't necessary.
2.3 Multi-variable, unconstrained optimization
Say now we want to find the unconstrained global maximum of a differentiable multi-variate func-
tion f (·, ·, . . . , ·). Again, any value that is the unconstrained maximum must be a critical point,
where a critical point has ∂f(\vec{x})/∂x_i = 0 for all i. Again, not all critical points are local optima/maxima,
but all local maxima are definitely critical points. One also needs to confirm that f (·) indeed
achieves its global maximum by examining limits towards ∞. Doing this formally can sometimes
be tedious, but in this class we'll only see cases where this is straightforward.[2] Sometimes, it
might also be helpful to think of some variables as being fixed, and solve successive single-variable
optimization problems. Here are some examples that you might reasonably need to solve:
Say you want to maximize f(x_1, x_2) = x_1 - x_1^2 - x_2^2. We can immediately see that for any
x_1, f(x_1, x_2) is maximized at x_2 = 0 (this is what we mean by thinking of x_1 as fixed and solving
a single-variable optimization problem for x_2). Once we've set x_2 = 0, we now just want to
maximize x_1 - x_1^2, which is achieved at x_1 = 1/2. So the unconstrained maximizer is (1/2, 0).
Say you want to maximize f(x_1, x_2) = x_1 x_2 - x_1^2 - x_2^2. We can again think of x_1 as fixed and
see that ∂f/∂x_2 = x_1 - 2x_2, and so for fixed x_1, the unique maximizer is at x_2 = x_1/2. We can then
just optimize x_1 · (x_1/2) - x_1^2 - (x_1/2)^2 = (-3/4) · x_1^2, which is clearly maximized at x_1 = 0. So
the unique global maximizer is (0, 0).

Say you want to maximize f(\vec{x}) = \sum_i f_i(x_i). That is, the function you're trying to maximize
is just the sum of single-variable functions (one for each coordinate of \vec{x}). Then we can simply
maximize each f_i(·) separately, and let x_i^* = \arg\max_{x_i} f_i(x_i). Observe that \vec{x}^* must be the
maximizer of f(\vec{x}). Most (possibly all) of the instances you will need to solve in the PSets will be
of this format.
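For the separable case, here is a minimal sketch that maximizes each coordinate separately (the particular f_i's are made up for illustration and are not from any PSet):

    # Sketch: maximize a separable f(x1, x2) = f1(x1) + f2(x2) coordinate by coordinate.
    # The f_i's below are made-up single-variable examples (each is concave, so its
    # unique critical point is its maximizer).
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f1 = 4 * x1 - x1**2
    f2 = x2 - 3 * x2**2

    best_x1 = sp.solve(sp.diff(f1, x1), x1)[0]   # 2
    best_x2 = sp.solve(sp.diff(f2, x2), x2)[0]   # 1/6

    print((best_x1, best_x2), f1.subs(x1, best_x1) + f2.subs(x2, best_x2))   # (2, 1/6) 49/12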
[2] Sometimes you'll need to be clever, but ideally very few (if any) proofs will require very tedious calculations.