
Machine Learning Notation

Shan-Hung Wu

1 Numbers & Arrays

  a (italic)                A scalar (integer or real)
  A (italic capital)        A scalar constant
  a (bold)                  A vector
  A (bold capital)          A matrix
  A (sans-serif bold)       A tensor
  I_n                       The n × n identity matrix
  D                         A diagonal matrix
  diag(a)                   A square, diagonal matrix with diagonal entries given by a
  a (upright)               A scalar random variable
  a (upright bold)          A vector-valued random variable
  A (upright bold capital)  A matrix-valued random variable

2 Sets & Graphs

  A (calligraphic)   A set
  ℝ                  The set of real numbers
  {0, 1}             The set containing 0 and 1
  {0, 1, ..., n}     The set of all integers between 0 and n
  [a, b]             The real interval including a and b
  (a, b]             The real interval excluding a but including b
  A\B                Set subtraction, i.e., the set containing the elements of A that are not in B
  G                  A graph in which each vertex x^(i) denotes a random variable and each edge denotes a conditional dependency (directed) or a correlation (undirected)
  Pa(x^(i))          The parents of a vertex x^(i) in G

3 Indexing

  a_i        Element i of vector a, with indexing starting at 1
  a_{-i}     All elements of vector a except for element i
  A_{i,j}    Element (i, j) of matrix A
  A_{i,:}    Row i of matrix A
  A_{:,i}    Column i of matrix A
  A_{i,j,k}  Element (i, j, k) of a 3-D tensor A
  A_{:,:,i}  2-D slice of a 3-D tensor
  a_i        Element i of the random vector a

4 Functions

  f : A → B    A function f with domain A and range B
  f ∘ g        Composition of functions f and g
  f(x; θ)      A function of x parametrized by θ (with θ sometimes omitted)
  ln x         Natural logarithm of x
  σ(x)         Logistic sigmoid, i.e., (1 + exp(-x))^{-1}
  ζ(x)         Softplus, ln(1 + exp(x))
  ‖x‖_p        L^p norm of x
  ‖x‖          L^2 norm of x
  x^+          Positive part of x, i.e., max(0, x)
  1(x; cond)   The indicator function of x: 1 if the condition is true, 0 otherwise
  g[f; x]      A functional that maps f to f(x)

Sometimes we use a function f whose argument is a scalar but apply it to a vector, matrix, or tensor: f(x), f(X), or f(X). This means applying f to the array element-wise. For example, if C = σ(X), then C_{i,j,k} = σ(X_{i,j,k}) for all i, j, and k.

5 Calculus

  f'(a) or df/dx(a)    Derivative of f : ℝ → ℝ at input point a
  ∂f/∂x_i(a)           Partial derivative of f : ℝ^n → ℝ with respect to x_i at input a
  ∇f(a) ∈ ℝ^n          Gradient of f : ℝ^n → ℝ at input a
  ∇f(A) ∈ ℝ^{m×n}      Matrix derivatives of f : ℝ^{m×n} → ℝ at input A
  ∇f(A)                Tensor derivatives of f at input A
  J(f)(a) ∈ ℝ^{m×n}    The Jacobian matrix of f : ℝ^n → ℝ^m at input a
  ∇²f(a) or H(f)(a) ∈ ℝ^{n×n}   The Hessian matrix of f : ℝ^n → ℝ at input point a
  ∫ f(x) dx            Definite integral over the entire domain of x
  ∫_S f(x) dx          Definite integral with respect to x over the set S
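As a concrete illustration of the element-wise convention described under Functions, the sketch below applies the scalar sigmoid σ to a 3-D array with NumPy (the array shape and values are my own illustrative choices, not part of the original notes):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid sigma(x) = 1 / (1 + exp(-x)), defined on scalars."""
    return 1.0 / (1.0 + np.exp(-x))

# Applying the scalar function to a 3-D tensor X yields C with
# C[i, j, k] = sigmoid(X[i, j, k]) for all i, j, k.
X = np.zeros((2, 3, 4))
C = sigmoid(X)

print(C.shape)       # (2, 3, 4)
print(C[0, 0, 0])    # 0.5, since sigmoid(0) = 0.5
```

NumPy ufuncs make this convention automatic: any function built from element-wise operations broadcasts over arrays of any shape.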

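The gradient notation ∇f(a) from the Calculus section can be checked numerically. The sketch below compares the analytic gradient of an example function f(x) = x·x (so ∇f(x) = 2x; the function is my own illustrative choice) against a central-difference approximation:

```python
import numpy as np

def f(x):
    # Example scalar field f : R^n -> R; here f(x) = x . x, so grad f(x) = 2x.
    return float(x @ x)

def grad_f(x):
    # Analytic gradient of the example function above.
    return 2.0 * x

def numerical_grad(f, a, eps=1e-6):
    """Central-difference approximation of the gradient of f at point a."""
    g = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = eps
        g[i] = (f(a + e) - f(a - e)) / (2 * eps)
    return g

a = np.array([1.0, -2.0, 3.0])
print(np.allclose(grad_f(a), numerical_grad(f, a), atol=1e-4))  # True
```

The same finite-difference idea extends to the Jacobian J(f)(a) of a vector-valued f by differencing each output component.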
6 Linear Algebra

  A^T      Transpose of matrix A
  A†       Moore-Penrose pseudo-inverse of A
  A ⊙ B    Element-wise (Hadamard) product of A and B
  det(A)   Determinant of A
  tr(A)    Trace of A
  e^(i)    The i-th standard basis vector (a one-hot vector)

9 Typesetting

  Section*     Section that can be skipped on a first reading
  Section**    Section for reference only (will not be taught)
  [Proof]      Prove it yourself
  [Homework]   You have homework
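To make the distinction between the Hadamard product A ⊙ B and the ordinary matrix product concrete, here is a small NumPy sketch (the matrices are illustrative; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse A†):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

H = A * B    # Hadamard product: H[i, j] = A[i, j] * B[i, j]
M = A @ B    # ordinary matrix product, for contrast

A_pinv = np.linalg.pinv(A)   # Moore-Penrose pseudo-inverse A†
# For an invertible A, the pseudo-inverse coincides with the inverse:
print(np.allclose(A_pinv, np.linalg.inv(A)))   # True

print(np.trace(A))           # tr(A)
print(np.linalg.det(A))      # det(A)
```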

7 Probability & Info. Theory

  a ⊥ b             Random variables a and b are independent
  a ⊥ b | c         They are conditionally independent given c
  P(a | b)          Shorthand for the probability P(a = a | b = b)
  P_a(a)            A probability mass function of the discrete random variable a
  p_a(a)            A probability density function of the continuous random variable a
  P(a = a)          Either P_a(a) or p_a(a)
  P(θ)              A probability distribution parametrized by θ
  N(μ, Σ)           The Gaussian distribution with mean μ and covariance matrix Σ
  x ∼ P(θ)          Random variable x has distribution P
  E_{x∼P}[f(x)]     Expectation of f(x) with respect to P
  Var[f(x)]         Variance of f(x)
  Cov[f(x), g(x)]   Covariance of f(x) and g(x)
  H(x)              Shannon entropy of the random variable x
  D_KL(P‖Q)         Kullback-Leibler (KL) divergence from distribution Q to P
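The entropy and KL-divergence notation can be illustrated for discrete distributions. The functions below are my own sketches (in nats, matching the ln convention in this document), not part of the original notes:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(x) = -sum_i p_i ln p_i (in nats); assumes all p_i > 0."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i ln(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))        # 0.0: a distribution's divergence from itself
print(kl_divergence(p, q) >= 0)   # True: KL divergence is non-negative
print(entropy(p))                 # ln 2, the entropy of a fair coin
```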

8 Machine Learning

  X                The set of training examples
  N                Size of X
  (x^(i), y^(i))   The i-th example pair in X (supervised learning)
  x^(i)            The i-th example in X (unsupervised learning)
  D                Dimension of a data point x^(i)
  K                Dimension of a label y^(i)
  X ∈ ℝ^{N×D}      Design matrix, where X_{i,:} denotes x^(i)
  P(x, y)          A data-generating distribution
  F                Hypothesis space of functions to be learnt, i.e., a model
  C[f]             A cost functional of f ∈ F
  C(θ)             A cost function of θ parametrizing f ∈ F
  (x′, y′)         A testing pair
  ŷ                Label predicted by a function f, i.e., ŷ = f(x′) (supervised learning)
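The machine-learning notation above fits together as in the following sketch: a design matrix X ∈ ℝ^{N×D}, a cost function C(θ), and a prediction ŷ = f(x′). The linear model and the noise-free synthetic data are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# N training examples of dimension D, stacked into a design matrix X in R^{N x D}.
N, D = 100, 3
X = rng.normal(size=(N, D))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true   # labels; noise-free for simplicity

def C(theta):
    """Cost function C(theta): mean squared error over the training set."""
    return float(np.mean((X @ theta - y) ** 2))

# Fit theta by least squares, then predict y_hat = f(x') for a test point x'.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
x_test = rng.normal(size=D)
y_hat = float(x_test @ theta_hat)

print(np.allclose(theta_hat, theta_true))   # True: noise-free data is recovered exactly
print(C(theta_hat) < 1e-12)                 # True: the fitted cost is essentially zero
```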
