cheatsheet 2

This cheat sheet covers regression techniques (logistic, ridge, and lasso regression) with their mathematical formulations and optimization by gradient, subgradient, and stochastic gradient descent; convexity; the bias-variance trade-off and model/feature selection; linear algebra, SVD, and PCA; kernels and kernel methods; clustering; neural networks; tree ensembles (bagging, random forests, boosting); nearest neighbors; and the bootstrap.


Logistic Regression

P(Y = 1 | x, w) = σ(w^T x) = 1 / (1 + exp(-w^T x));  P(Y = 0 | x, w) = 1 / (1 + exp(w^T x))
Features can be continuous or discrete. σ(w^T x) itself is not convex in w, but the logistic loss below is.
Gradient descent: w_{k+1} = w_k - η ∇f(w_k)

Ridge Regression (gradient step)

∇f(w_t) = -X^T (y - X w_t) + λ w_t    (dropping constant factors of 2)
w_{t+1} = w_t - η [ -X^T (y - X w_t) + λ w_t ]
The +λ w_t term shrinks the weights at every step (shrinkage built into the gradient); w_0 is not regularized.

Lasso (subgradient step)

∂f(w_t) = -X^T (y - X w_t) + λ sign(w_t)
w_{t+1} = w_t - η [ -X^T (y - X w_t) + λ sign(w_t) ]
The λ sign(w_t) term does not scale with w_t, so lasso can drive coefficients exactly to 0; ridge only shrinks them toward 0.

Stochastic Gradient Descent

w_{t+1} = w_t - η ∇ℓ_t(w_t), where ℓ_t uses a single training example (or a small batch).
E[∇ℓ_t(w)] = ∇L(w): the stochastic gradient is an unbiased estimate of the full gradient.
If ||w_k - w*|| ≤ R and ||∇ℓ(w)|| ≤ G for all k, then E[L(w̄_T) - L(w*)] ≤ RG / √T.
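A minimal numpy sketch of the ridge and lasso update rules above on synthetic data (the data, step size η, and λ are assumed for illustration; the factor of 2 from the squared loss is kept explicit in the code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

def ridge_grad(w, lam):
    # gradient of ||y - Xw||^2 + lam * ||w||^2
    return -2 * X.T @ (y - X @ w) + 2 * lam * w

def lasso_subgrad(w, lam):
    # a subgradient of ||y - Xw||^2 + lam * ||w||_1
    return -2 * X.T @ (y - X @ w) + lam * np.sign(w)

eta, lam = 1e-3, 5.0
w_ridge = np.zeros(5)
w_lasso = np.zeros(5)
for _ in range(5000):
    w_ridge -= eta * ridge_grad(w_ridge, lam)
    w_lasso -= eta * lasso_subgrad(w_lasso, lam)

print("ridge:", np.round(w_ridge, 3))   # shrunk, but typically no exact zeros
print("lasso:", np.round(w_lasso, 3))   # irrelevant coefficients driven (near) zero
```

Running it shows the ridge coefficients shrunk but nonzero, while the lasso coefficients on the irrelevant features sit at (or oscillate around) zero.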


Odds, Log-Odds, and the MLE

Odds: P(Y=1 | x) / P(Y=0 | x) = exp(w^T x + w_0);  log-odds = w^T x + w_0.
Decision rule: ŷ = argmax_y P(Y = y | x), i.e. predict 1 when w^T x + w_0 > 0.
w_MLE = argmax_w Π_i P(y_i | x_i; w) = argmin_w Σ_i log(1 + exp(-y_i x_i^T w))    (labels y_i ∈ {-1, +1})
Low loss ⇔ sign(y_i) = sign(x_i^T w).
J(w) is convex, so it is easy to optimize, but there is no closed-form solution.
When the data are linearly separable, ||w|| → ∞ (overfit): increasing ||w|| without changing sign(x_i^T w) keeps decreasing the loss without stopping.

GD vs. SGD

- SGD: efficient (uses a small sample k << n per step), needs less memory, is faster, and works well on large data.
- GD: uses the entire data set per step.
- Noise has a higher impact on the stability of SGD, and SGD is more sensitive to the learning rate.
- Both gradient estimates are unbiased, and both scale linearly with the number of data points.
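A small numpy sketch of the MLE objective above minimized by plain gradient descent (synthetic data, labels in {-1, +1}; the learning rate and iteration count are assumed values):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = np.where(X @ w_true + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)

def loss(w):
    # J(w) = sum_i log(1 + exp(-y_i x_i^T w))
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

def grad(w):
    # dJ/dw = -sum_i y_i x_i * sigma(-y_i x_i^T w)
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))     # sigma(-y_i x_i^T w)
    return -(X.T @ (y * s))

w = np.zeros(d)
eta = 0.01
for _ in range(2000):
    w -= eta * grad(w)

print("loss:", round(loss(w), 3),
      "train acc:", np.mean(np.sign(X @ w) == y))
```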

Prediction Pitfalls

- Normalize the features: |β_j| indicates importance only if the features are on a common scale (✗ if not scaled).
- Correlated features distort individual coefficients (a useful feature's coefficient can end up near 0); eliminate correlation before interpreting them.
- Correlation ≠ causation: a coefficient says what change to expect in the prediction when a feature changes, not what the feature causes.
- Check for (and drop) features with 0 variance.

Convexity

- A set K is convex if for all x, y ∈ K and λ ∈ [0, 1], (1-λ)x + λy ∈ K.
- f is convex if its epigraph {(x, t) : f(x) ≤ t} is a convex set; equivalently, f((1-λ)x + λy) ≤ (1-λ) f(x) + λ f(y).
- If f is differentiable everywhere: f is convex iff f(y) ≥ f(x) + ∇f(x)^T (y - x) for all x, y ∈ dom(f).
- If f is twice differentiable: f is convex iff ∇²f(x) ⪰ 0 for all x ∈ dom(f).
- A convex function need not have a minimum, but all local minima are global minima.
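A quick numerical illustration of these conditions, using f(x) = log(1 + e^x) (convex, and the building block of the logistic loss); the test points are arbitrary:

```python
import numpy as np

f = lambda x: np.log1p(np.exp(x))
grad_f = lambda x: 1.0 / (1.0 + np.exp(-x))   # f'(x) = sigma(x)

rng = np.random.default_rng(2)
xs, ys = rng.normal(scale=3, size=50), rng.normal(scale=3, size=50)

# first-order condition: f(y) >= f(x) + f'(x) (y - x) for all x, y
print(np.all(f(ys) >= f(xs) + grad_f(xs) * (ys - xs)))                   # True

# definition: f((1-t)x + t*y) <= (1-t) f(x) + t f(y) for t in [0, 1]
t = 0.3
print(np.all(f((1 - t) * xs + t * ys) <= (1 - t) * f(xs) + t * f(ys)))   # True
```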
Lasso Regression

ŵ_lasso = argmin_w Σ_i (y_i - x_i^T w)² + λ ||w||_1

Ridge regularizer r(w) = ||w||²₂:
- convex and smooth ⇒ gradient descent is efficient to optimize
- dense solution; the L2 ball is "smooth"

Lasso regularizer r(w) = ||w||_1:
- convex but non-smooth ⇒ use subgradient methods for the non-smooth part (slower than gradient descent on smooth objectives)
- sparse solution; the L1 ball is "pointy"
- L1 encourages coefficients to be exactly 0 ⇒ feature selection (keep the non-zero ones)

Sub-gradient g at x: any g with f(y) ≥ f(x) + g^T (y - x) for all y.
As λ increases, the feasible set of the equivalent constrained problem shrinks.
Convergence guarantees hold only when f is convex.

Model Selection & Feature Selection: use lasso to select the features with non-zero coefficients, then retrain on that sparse model with λ = 0.
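A minimal sketch contrasting the two solutions on synthetic data: ridge via its closed form, lasso via proximal gradient descent (ISTA, i.e. a gradient step on the squared loss followed by soft-thresholding). ISTA is one standard way to handle the non-smooth L1 term and is not taken from the notes; the data and λ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 100, 8
X = rng.normal(size=(n, d))
w_true = np.array([3.0, -2.0, 0, 0, 0, 0, 1.0, 0])
y = X @ w_true + 0.1 * rng.normal(size=n)
lam = 20.0

# ridge: closed form (X^T X + lam I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# lasso: ISTA -- gradient step on ||y - Xw||^2, then soft-threshold
step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)   # 1/L for the smooth part
w_lasso = np.zeros(d)
for _ in range(2000):
    w_lasso = w_lasso - step * (-2 * X.T @ (y - X @ w_lasso))
    w_lasso = np.sign(w_lasso) * np.maximum(np.abs(w_lasso) - step * lam, 0.0)

print("ridge:", np.round(w_ridge, 2))   # all coefficients shrunk, none exactly 0
print("lasso:", np.round(w_lasso, 2))   # exact zeros on the irrelevant features
```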

Bias-Variance Trade-off

Minimize E_{X,Y}[(Y - ŷ(X))²]; this is equivalent to maximizing the likelihood assuming Gaussian noise.
The optimal predictor is y*(x) = E_{Y|X}[Y | X = x].

E_{Y|X,D}[(Y - f̂_D(x))² | X = x]
  = E_{Y|X}[(Y - y*(x))² | X = x]          irreducible error (label noise)
  + (y*(x) - E_D[f̂_D(x)])²                 bias², from too simple a model
  + E_D[(f̂_D(x) - E_D[f̂_D(x)])²]           variance, from not enough data

Linear Regression: if (X^T X)^{-1} exists, w_MLE = (X^T X)^{-1} X^T y. Demean X with the training mean; the intercept is then b_LS = (1/n) Σ_i y_i.

(The notes sketch learning curves here: training error vs. test error as the polynomial degree p grows.)
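A small simulation of this decomposition (assumptions: a cubic ground truth, polynomial fits of two degrees, Gaussian label noise; none of these specifics are from the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
f_true = lambda x: x - 0.7 * x**3
x_test, sigma, n_train, n_rep = 0.8, 0.3, 30, 2000

for degree in (1, 7):                     # too simple vs. very flexible
    preds = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.uniform(-1, 1, n_train)
        y = f_true(x) + sigma * rng.normal(size=n_train)
        coef = np.polyfit(x, y, degree)   # least-squares polynomial fit
        preds[r] = np.polyval(coef, x_test)
    bias2 = (f_true(x_test) - preds.mean()) ** 2
    var = preds.var()
    print(f"degree {degree}: bias^2={bias2:.4f}  variance={var:.4f}  "
          f"irreducible={sigma**2:.4f}")
```

The simple model shows large bias² and small variance; the flexible one the reverse, while the irreducible σ² term is untouched by either.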
Ridge Regression

Ridge never makes a coefficient exactly zero.
Least squares: if X^T X is not invertible, the solution is not unique and is unstable.

ŵ_ridge = argmin_w Σ_i (y_i - x_i^T w)² + λ ||w||²₂ = argmin_w ||y - Xw||² + λ ||w||² = (X^T X + λI)^{-1} X^T y

Does (X^T X + λI)^{-1} always exist (for λ > 0)?
- X^T X is PSD, so all its eigenvalues are ≥ 0 (def. of PSD): z^T X^T X z = ||Xz||² ≥ 0.
- Eigen pair (v, σ): X^T X v = σ v, so v^T X^T X v = σ for unit v. Let V = [v₁, ..., v_d] be orthonormal (V^T V = V V^T = I) and Λ = diag(σ₁, ..., σ_d), so X^T X = V Λ V^T.
- X^T X + λI = V Λ V^T + λ V V^T = V (Λ + λI) V^T, hence (X^T X + λI)^{-1} = V (Λ + λI)^{-1} V^T, which exists because every σ_i + λ > 0.

Useful gradients
∇_w (w^T A x) = A x;   ∇_w (w^T A) = A;   ∇_w (x^T A w) = A^T x;   for symmetric A, ∇_w (w^T A w) = 2 A w.

f(w) = ||y - Xw||² = (y - Xw)^T (y - Xw) = y^T y - 2 w^T X^T y + w^T X^T X w
∇_w f(w) = 2 X^T X w - 2 X^T y;   ∇²_w f(w) = 2 X^T X
z^T (X^T X) z = ||Xz||² ≥ 0 ⇒ f(w) is convex.
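A short numpy check of the closed form and of the eigendecomposition argument (random data and an illustrative λ):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 2.0

# closed form: (X^T X + lam I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# eigendecomposition route: X^T X = V diag(sig) V^T  (symmetric, so use eigh)
sig, V = np.linalg.eigh(X.T @ X)
w_eig = V @ np.diag(1.0 / (sig + lam)) @ V.T @ (X.T @ y)

print(np.allclose(w_ridge, w_eig))   # True: V (Lambda + lam I)^{-1} V^T X^T y
print(np.all(sig >= -1e-10))         # X^T X is PSD: eigenvalues >= 0
```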

Effect of λ
- λ ↑: more regularization ⇒ more bias, less variance.
- λ → 0: ŵ_ridge → ŵ_LS (overfit end);  λ → ∞: ŵ_ridge → 0 (underfit end). w_0 is not regularized.

Bias-Variance for Ridge
Assume X^T X = nI and y = Xw + ε with ε ~ N(0, σ² I). Then
ŵ_ridge = (X^T X + λI)^{-1} X^T X w + (X^T X + λI)^{-1} X^T ε = n/(n+λ) · w + 1/(n+λ) · X^T ε
E_{Y|X,D}[(Y - x^T ŵ_ridge)² | X = x] = σ² + (λ/(n+λ))² (w^T x)² + σ² n/(n+λ)² ||x||²
ŵ_ridge is a biased estimator.
- A change in bias affects variance; a change in variance might not affect bias.
- Stein paradox: the overall (summed) error might be reduced by accepting some bias.
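A quick simulation of that decomposition under the stated assumption X^T X = nI (here X is built from scaled orthonormal columns; σ, λ, and the true w are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, lam, sigma = 100, 3, 50.0, 0.5
w = np.array([1.0, -2.0, 0.5])

# build X with X^T X = n I (columns = sqrt(n) * orthonormal vectors)
Q, _ = np.linalg.qr(rng.normal(size=(n, d)))
X = np.sqrt(n) * Q

w_hats = []
for _ in range(3000):
    y = X @ w + sigma * rng.normal(size=n)
    w_hats.append(np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y))
w_hats = np.array(w_hats)

print("E[w_ridge]          :", np.round(w_hats.mean(axis=0), 3))
print("n/(n+lam) * w       :", np.round(n / (n + lam) * w, 3))   # biased toward 0
print("empirical var       :", np.round(w_hats.var(axis=0), 5))
print("sigma^2 n/(n+lam)^2 :", round(sigma**2 * n / (n + lam)**2, 5))
```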

Linear Algebra

- A ∈ R^{n×n} orthonormal: A A^T = A^T A = I, and ||Ax||₂ = ||x||₂.
- B ∈ R^{n×n} invertible and symmetric ⇒ B^{-1} is also symmetric.
- C PSD: symmetric with x^T C x ≥ 0 for all x ⇒ its eigenvalues are non-negative.
- ||z||²₂ = Σ_i z_i² = z^T z.  x, y orthogonal: x^T y = 0. If the columns of A are orthogonal, A^T A is diagonal.
- Eigenvalue λ and eigenvector x of A ∈ R^{n×n}: A x = λ x. If A has distinct eigenvalues, its eigenvectors form a linearly independent set.
- span({x₁, ..., xₙ}) = {v : v = Σ_i a_i x_i};  for A ∈ R^{m×n}: range(A) = {v ∈ R^m : v = Ax} and nullspace(A) = {x ∈ R^n : Ax = 0}.
- X^T X is always symmetric and PSD. X^T X is not invertible ⟺ its null space is non-empty ⟺ the columns of X are not linearly independent.

Overfitting & Model Selection Notes
- Few data points ⇒ more likely to overfit.
- LOOCV tends to overestimate the true error.
- For feature sets S ⊆ S': bias with S' ≤ bias with S (more features, lower bias).
Clustering

K-means: objective F(μ, C) = Σ_j Σ_{i ∈ C_j} ||x_i - μ_j||².
- The objective is non-convex, but the alternating assignment/update iterations always converge (to a local optimum).
- Assumes clusters are roughly spherical.

Mixture of Gaussians (EM)
- Observed data y_i; unobserved (latent) cluster assignments z_i.
- Complete-data log-likelihood (two-component case): ℓ(θ, z) = Σ_i [ z_i log φ_{θ₁}(y_i) + (1 - z_i) log φ_{θ₀}(y_i) ].
- Responsibility: γ_i(θ) = E[z_i | θ, y] = P(z_i = 1 | y_i, θ), computed with θ held fixed (the E-step).
- Can model complex cluster shapes and densities, and accommodates elliptical shapes.
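A compact numpy version of the k-means objective and its two alternating steps (random initialization; the data and k are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(50, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
k = 3

mu = X[rng.choice(len(X), k, replace=False)]      # random init
for _ in range(50):
    # assignment step: nearest centroid for each point
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    c = d2.argmin(axis=1)
    # update step: centroid = mean of assigned points (keep old centroid if empty)
    mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                   for j in range(k)])

obj = sum(((X[c == j] - mu[j]) ** 2).sum() for j in range(k))
print("F(mu, C) =", round(obj, 2))
print("centroids:\n", np.round(mu, 2))
```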
Kernel Density Estimate
p̂(X = x) = (1/n) Σ_i K_h(x - x_i)  for a kernel K with bandwidth h.

Bayes optimal classifier: ŷ = argmax_y P(X = x | Y = y) P(Y = y).

Generative vs. Discriminative
- Generative: learn the full joint distribution P(X, Y); enables probabilistic inference (Bayes classification); needs a lot of data (covering all possible combinations of {X, Y}).
- Discriminative: just learn what you need to make a specific class prediction, i.e. only P(Y | X) (e.g. logistic regression). No regard for P(X) or P(X, Y). An easier modeling problem that requires less data, but its utility is limited to queries about P(Y | X).
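A tiny numpy illustration of a Gaussian KDE in one dimension (the sample, bandwidth, and query points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
data = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(1, 1.0, 100)])
h = 0.3                                    # bandwidth (assumed value)

def kde(x, data, h):
    # p_hat(x) = (1/n) * sum_i N(x - x_i; 0, h^2)
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

xs = np.linspace(-4, 4, 9)
print(np.round(kde(xs, data, h), 3))       # density estimate at the query points
```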

Feature Extraction

- Uninformative features ⇒ sparse methods (LASSO).
- Superfluous / correlated features ⇒ dimension reduction / down-sampling.
- Autoencoder: find a low-dimensional representation by prediction (reconstructing the input).
- Convolutional layer: a set of learnable filters (kernels) slides over the input image. The number of weights is determined by the number of filters and the kernel size, not by the resolution of the input image; stride and padding impact the output size.
- Multiple filters in a single layer capture different features and add robustness.
- Non-linear pooling layer: adds non-linearity and reduces the size of the input. Pooling is dimensionality reduction; it summarizes the output of convolving the input with a filter.
- The last few layers are typically fully connected (FC) ⇒ the conv layers do feature extraction, preparing the input for the final FC layers.
- Data augmentation: mirror / zoom, etc.
Kernels
- Solve high dimensionality: much more efficient to compute than explicit features.
- Gaussian (RBF): K(x, x') = exp(-||x - x'||² / (2σ²))
- Polynomial of degree exactly k: K(x, x') = (x^T x')^k;  up to degree k: (1 + x^T x')^k
- Sigmoid: K(x, x') = tanh(γ x^T x' + r)

Kernel trick for regularized LS
ŵ = argmin_w Σ_i (y_i - x_i^T w)² + λ ||w||²,  with w = Σ_i α_i x_i:
α̂ = argmin_α Σ_i (y_i - Σ_j α_j ⟨x_j, x_i⟩)² + λ Σ_i Σ_j α_i α_j ⟨x_i, x_j⟩
  = argmin_α ||y - Kα||² + λ α^T K α
  = (K + λI)^{-1} y
Predictor: f̂(x) = Σ_i α̂_i K(x_i, x).
Both λ and the kernel bandwidth σ regularize the predictor.
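A minimal numpy kernel ridge regression with an RBF kernel, using α̂ = (K + λI)^{-1} y directly (the data, bandwidth σ, and λ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(-3, 3, 60))
y = np.sin(x) + 0.1 * rng.normal(size=60)
sigma, lam = 0.5, 0.1

def rbf(a, b, sigma):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), for 1-D inputs
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))

K = rbf(x, x, sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)   # alpha = (K + lam I)^{-1} y

x_new = np.array([-2.0, 0.0, 2.0])
f_hat = rbf(x_new, x, sigma) @ alpha                   # f(x) = sum_i alpha_i K(x_i, x)
print(np.round(f_hat, 3), "vs sin:", np.round(np.sin(x_new), 3))
```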
Trees & Ensembles

Decision tree: low bias, high variance. ✓ intuitive and interpretable; ✓ deals well with categorical data; ✗ overfits easily without regularization.

Random Forest (RF): averages trees to reduce variance while leaving bias roughly unchanged; the power of averaging comes from weakly correlated predictors. ✓ works well with default parameters; ✗ not as intuitive or interpretable.

Bias: the difference between the optimal prediction and the expected prediction of the best possible trained version of your model.

Bagging (bootstrap aggregating): average trees trained on bootstrapped data that use all d features.
RF: average trees trained on bootstrapped data that use a random subset of the features, reducing the correlation between trees and improving performance.

Boosting: additive model, min Σ_i loss(y_i, Σ_t η_t f_t(x_i)), built from "weak" learners (also trees). Reduces bias. ✓ computationally efficient; ✗ can overfit and needs hyperparameter tuning.
A learning rate that is too high overshoots the minimum and fails to converge.
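A hedged sketch of the bagging-vs-boosting contrast using scikit-learn's stock ensembles (assuming scikit-learn is available; the dataset and hyperparameters are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# bagging-style: many deep, weakly correlated trees averaged (variance reduction)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# boosting: shallow "weak" trees added sequentially (bias reduction)
gb = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                learning_rate=0.1, random_state=0).fit(X_tr, y_tr)

print("RF test acc:", round(rf.score(X_te, y_te), 3))
print("GB test acc:", round(gb.score(X_te, y_te), 3))
```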

Bagging vs. Boosting
- Bagging averages low-bias, lightly dependent classifiers to reduce variance.
- Boosting learns a linear combination of high-bias, highly dependent classifiers to reduce error (bias).
- RF builds its trees independently, using bagging; boosting builds its trees sequentially, each tree learning from the previous trees' errors.

Conv layer shape example
Conv2d(in_channels = 3, out_channels = 28, kernel_size = 9, stride = 1): output spatial size h_out = (h_in - k + 2·pad)/stride + 1, giving an output of shape (N, 28, 250, 250) in the notes' example, with N the batch size.
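A quick PyTorch check of that shape formula (assuming torch is installed; the 258×258 input size is an assumption chosen so that, with padding 0, the output side comes out to 250 as in the notes):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=28, kernel_size=9, stride=1, padding=0)

x = torch.randn(4, 3, 258, 258)           # N=4 images, 3 channels, 258x258
y = conv(x)

print(y.shape)                            # torch.Size([4, 28, 250, 250]); (258 - 9)/1 + 1 = 250
# weights depend only on the kernel and channel counts, not the image size:
n_weights = sum(p.numel() for p in conv.parameters())
print(n_weights)                          # 28*3*9*9 + 28 biases = 6832
```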
Singular Value Decomposition (SVD)

A ∈ R^{m×n}: A = U S V^T with U^T U = I, V^T V = I, and S diagonal with non-negative entries.
A^T A v_i = (U S V^T)^T (U S V^T) v_i = V S^T S V^T v_i = v_i S_ii², and similarly A A^T u_i = u_i S_ii²:
the v_i are eigenvectors of A^T A and the u_i (the first r columns of U) are eigenvectors of A A^T, with eigenvalues diag(S²).
S has values only on its diagonal, ordered from the largest (top left) down to the smallest.
Low-rank approximation: in A = U S V^T, discard the components with small singular values.

Neural Networks

Cross-entropy loss: L(y, ŷ) = -[ y log(ŷ) + (1 - y) log(1 - ŷ) ], with ŷ = g(z) and sigmoid g(z) = 1/(1 + e^{-z}).
Forward pass: z^(l) = W^(l) a^(l-1),  a^(l) = g(z^(l)),  with a^(0) = x.
Backprop is the chain rule applied layer by layer, using g'(z) = g(z)(1 - g(z)):
∂L/∂z^(L) = ∂L/∂ŷ · g'(z^(L))  (for a sigmoid output with cross-entropy this simplifies to ŷ - y),
∂L/∂z^(l) = (W^(l+1))^T ∂L/∂z^(l+1) ⊙ g'(z^(l)),   ∂L/∂W^(l) = ∂L/∂z^(l) (a^(l-1))^T.
Train by stochastic gradient descent.
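A hedged PyTorch sketch of this training loop (forward → loss → backward → step); the architecture and data are invented for illustration and do not match the notes' 3-4-2 example exactly:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 3)
y = (X[:, 0] - X[:, 1] > 0).float().unsqueeze(1)    # toy binary labels

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                              # cross-entropy for y_hat in (0,1)
opt = torch.optim.SGD(model.parameters(), lr=0.5)

for step in range(500):
    opt.zero_grad()
    y_hat = model(X)          # forward: activations and loss
    loss = loss_fn(y_hat, y)
    loss.backward()           # backward: gradients via autodiff (chain rule)
    opt.step()                # step: update the weights

print(round(loss.item(), 4))
print(sum(p.numel() for p in model.parameters()))   # 3*4 + 4 + 4*1 + 1 = 21 parameters
```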
Neural Network Training in Practice
- Use an automatic differentiation Python package: a convenient library with GPU support.
- forward computes the activations and the loss, backward computes the gradients, and step updates the weights.
- Training issues: the objective is no longer convex, and gradients can blow up or vanish. Vanishing gradients ⇒ skip connections (ResNet); sigmoid activations can cause vanishing gradients, so use ReLU.
- Step size, batch size, and momentum all have a large impact on optimizing the training error (and on the validation error).
- Softmax: an ad-hoc approach to represent probabilities by normalizing the output.
- Parameter counting: input 3 → hidden 4 → output 2 gives 3·4 = 12 weights + 4 biases and 4·2 = 8 weights + 2 biases, i.e. 12 + 4 + 8 + 2 = 26 parameters.

PCA
- Given x₁, ..., xₙ ∈ R^d, find a compressed representation: run SVD on the de-meaned data matrix X̃ (rows (x_i - x̄)^T).
- Σ = (1/n) Σ_i (x_i - x̄)(x_i - x̄)^T = (1/n) X̃^T X̃ = U S U^T = Σ_i u_i u_i^T S_ii.
- U_q = the first q eigenvectors of Σ. Objective: min_{U_q} Σ_i ||(x_i - x̄) - U_q U_q^T (x_i - x̄)||² (reconstruction error); equivalently max_{v: v^T v = 1} v^T Σ v, where the top eigenvalue is the variance of the data projected along that direction.
- Projected data: (X - 1 x̄^T) U_q ∈ R^{n×q}. Choose q to explain e.g. 95% of the variance.

Kernel PCA
- Find the eigenvectors of the centered kernel matrix J K J^T.
- Project a new point through its kernel values: ẑ_{new, j} = Σ_i v_{ij} K(x_i, x_new), up to normalization.

Matrix-factorization view of PCA
- Loss = Σ_{ij} ((U V^T)_{ij} - x_{ij})². How to solve: alternating minimization (fix U, minimize over V; then fix V, minimize over U) or gradient descent.
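A short numpy sketch of PCA via SVD of the de-meaned data matrix (synthetic low-rank data embedded in 5 dimensions; q is picked by the 95%-variance rule mentioned above):

```python
import numpy as np

rng = np.random.default_rng(10)
latent = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

X_tilde = X - X.mean(axis=0)                 # de-mean with the (training) mean
U, s, Vt = np.linalg.svd(X_tilde, full_matrices=False)

var_explained = s**2 / np.sum(s**2)
q = np.searchsorted(np.cumsum(var_explained), 0.95) + 1   # explain 95% of variance
Z = X_tilde @ Vt[:q].T                       # projected data, shape (n, q)

print("variance explained:", np.round(var_explained, 3))
print("q =", q, "projected shape:", Z.shape)
```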
KNN
- Non-parametric; the decision boundary is not linear.
- k ↑ ⇒ bias ↑, variance ↓.
- Distance choices: L2 norm, L1 norm, Mahalanobis, L-infinity.
- ✓ no training phase; ✗ computation (and memory) at prediction time grows with the training set.
- KNN and local linear / kernel smoothing regression only work well with a lot of data; without it, the "neighbors" aren't local.
- Curse of dimensionality: as the number of dimensions grows, the contrast between the nearest and the furthest point from a given reference point tends to decrease.

Losses & Activations
- Convex losses: MSE, logistic, hinge. The logistic loss is differentiable everywhere.
- Activations: sigmoid/logistic 1/(1 + e^{-z}); ReLU max(0, z).
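A minimal numpy k-nearest-neighbors classifier (Euclidean distance; the toy data and k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

def knn_predict(X_new, k=5):
    # no training: just store the data; all work happens at prediction time
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]             # indices of the k nearest points
    votes = y_train[nn]
    return (votes.mean(axis=1) > 0.5).astype(int)  # majority vote

X_new = np.array([[0.2, -0.1], [2.8, 3.2], [1.5, 1.5]])
print(knn_predict(X_new, k=5))
```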
Bootstrap

D = {z₁, ..., zₙ} drawn i.i.d. from F. The b-th bootstrapped data set D*_b = {z*₁, ..., z*ₙ} is obtained by drawing n samples with replacement from D, and the b-th bootstrap estimate is θ̂*_b = t(D*_b).
Uses:
- estimate parameters that escape simple analysis (variances, medians)
- confidence intervals
- estimate the error for a particular example
✓ general, simple, meaningful
✗ only asymptotic: few meaningful finite-sample guarantees
✗ computationally intensive
✗ relies on the test statistic and on the (unknown) rate of convergence of F̂ to F
✗ poor performance on extreme statistics (e.g. the max)
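A short numpy bootstrap for the standard error and a percentile confidence interval of the median (the sample and number of resamples B are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12)
D = rng.exponential(scale=2.0, size=80)            # observed sample z_1..z_n ~ F

B = 5000
thetas = np.empty(B)
for b in range(B):
    Db = rng.choice(D, size=len(D), replace=True)  # n draws with replacement from D
    thetas[b] = np.median(Db)                      # theta*_b = t(D*_b)

print("median:", round(np.median(D), 3))
print("bootstrap SE:", round(thetas.std(), 3))
print("95% CI:", np.round(np.percentile(thetas, [2.5, 97.5]), 3))
```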
