AAI Module 3

These are the notes for Module 3 of AAI in a summarized format.

MODULE 3

Variational Autoencoders

CHAPTER 3

University Prescribed Syllabus

Introduction : Basic components of Variational Autoencoders (VAEs), Architecture and Training of VAEs, the loss function, Latent space, Application of VAEs in image generation. Types of Autoencoders : Undercomplete Autoencoders, Sparse Autoencoders, Contractive Autoencoders, Denoising Autoencoders, Variational Autoencoders.

3.1 INTRODUCTION

3.1.1 Basic Components of Variational Autoencoders (VAEs)

GQ. What is the need of an autoencoder? (4 Marks)
GQ. What are the properties of an autoencoder? (4 Marks)
GQ. Describe the architecture of an autoencoder. (4 Marks)

- Variational Autoencoders (VAEs) were introduced in 2013 by Kingma and Welling. In neural network terms, a variational autoencoder consists of an encoder, a decoder, and a loss function.

Fig. 3.1.1 : Encoder-Decoder (Data x → Encoder → z → Decoder → Reconstruction)

- The encoder is a neural network. Its input is a datapoint x, its output is a hidden representation z, and it has weights and biases φ. To be concrete, let's say x is a 28 by 28-pixel photo of a handwritten digit. The encoder 'encodes' the data, which is 784-dimensional, into a latent (hidden) representation space z, which has far fewer than 784 dimensions. This is typically referred to as a 'bottleneck' because the encoder must learn an efficient compression of the data into this lower-dimensional space. Let's denote the encoder q_φ(z|x).
- We note that the lower-dimensional space is stochastic: the encoder outputs parameters to q_φ(z|x), which is a Gaussian probability density function. We can sample from this distribution to get noisy values of the representation z.
- The decoder is another neural network. Its input is the representation z, it outputs the parameters to the probability distribution of the data, and it has weights and biases θ. The decoder is denoted by p_θ(x|z). Continuing the handwritten digit example, let's say the photos are black and white and each pixel is represented as 0 or 1. The probability distribution of a single pixel can then be represented using a Bernoulli distribution.

GQ. Explain the architecture and training process of Variational Autoencoders.

The Architecture of Variational Autoencoder

- The encoder-decoder architecture lies at the heart of Variational Autoencoders (VAEs), distinguishing them from traditional autoencoders.
- The encoder network takes raw input data and transforms it into a probability distribution within the latent space.
- The latent code generated by the encoder is a probabilistic encoding, allowing the VAE to express not just a single point in the latent space but a distribution of potential representations.
- The decoder, in turn, takes a sampled point from the latent distribution and reconstructs it back into data space.
- During training, the model refines both the encoder and decoder parameters to minimize the reconstruction loss, i.e., the disparity between the input data and the decoded output.
- The goal is not just to achieve accurate reconstruction but also to regularize the latent space, ensuring that it conforms to a specified distribution.
- In VAEs, the encoder still maps the input data to a lower-dimensional latent space, but instead of producing a single point in the latent space, the encoder generates a probability distribution over the latent space.
- The decoder then samples from this distribution to generate a new data point. This probabilistic approach to encoding the input allows VAEs to learn a more structured and continuous latent space representation, which is useful for generative modeling and data synthesis.

To go from a traditional autoencoder to a VAE, we need to make two key modifications, as shown in the sketch after this list.

1. First, we need to replace the encoder's output with a probability distribution. Instead of the encoder outputting a point in the latent space, it outputs the parameters of a probability distribution, such as a mean and variance. This distribution is typically a multivariate Gaussian distribution but can be some other distribution as well, e.g. Bernoulli.
2. Second, we introduce a new term in the loss function called the Kullback-Leibler (KL) divergence. This term measures the difference between the learned probability distribution over the latent space and a predefined prior distribution (usually a standard normal distribution).
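The notes stop at the conceptual level here, so the following is a minimal sketch of how these two modifications look in code. PyTorch is an assumption (the notes name no framework), and the layer sizes, the 784-dimensional input and the 2-dimensional latent space are illustrative choices for the handwritten-digit example above. The encoder outputs a mean and log-variance, a latent sample is drawn via the reparameterization trick, and the loss adds the KL divergence to the standard normal prior on top of the Bernoulli reconstruction term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE sketch: 784-dim input, 2 latent dimensions (sizes are illustrative)."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        # Encoder q_phi(z|x): outputs the parameters of a Gaussian over the latent space
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder p_theta(x|z): outputs Bernoulli parameters for each pixel
        self.dec1 = nn.Linear(latent_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, so gradients can flow through mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term: Bernoulli log-likelihood for binary pixels
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
    # KL divergence between N(mu, sigma^2) and the standard normal prior N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

During training, both networks are updated jointly: sample a minibatch, compute `vae_loss`, backpropagate, and step the optimizer; gradients flow through the sampling step because of the reparameterization.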
3.1.3 Undercomplete Autoencoder

GQ. What is the need of an undercomplete autoencoder? (2 Marks)
GQ. Draw the architecture of an undercomplete autoencoder. (2 Marks)

- An Undercomplete Autoencoder has fewer nodes (dimensions) in the middle compared to the Input and Output layers, which is used to obtain important features from the data. In these setups, we tend to call the middle layer a bottleneck.
- In the case of Undercomplete Autoencoders, we are squeezing the information into fewer dimensions (hence the bottleneck) while trying to ensure that we can still get back to the original values. Therefore, we are creating a custom function that compresses the data, which is a way to reduce the dimensionality and extract meaningful information.
- After training the Undercomplete Autoencoder, we typically discard the Decoder and only use the Encoder part.
- An objective of the undercomplete autoencoder is to capture the most important features present in the data. It minimizes the loss function by penalizing the reconstruction g(f(x)) for being different from the input x.

Fig. 3.1.7 : Undercomplete autoencoder Architecture

The figure below shows the I/P and O/P of an undercomplete autoencoder.

Fig. 3.1.8 : Undercomplete autoencoder I/P and O/P

3.1.4 Overcomplete Autoencoder

GQ. What is the need of an overcomplete autoencoder? (2 Marks)
GQ. Draw the architecture of an overcomplete autoencoder. (2 Marks)

- An Overcomplete Autoencoder has more nodes (dimensions) in the middle compared to the Input and Output layers.
- While poor generalization can happen even in undercomplete autoencoders, it is an even more serious problem with overcomplete autoencoders. To avoid poor generalization, we need to introduce regularization.

Fig. 3.1.9 : Overcomplete autoencoder Architecture

3.2.1 Denoising Autoencoder

- One of the simpler variations of the autoencoder is the denoising autoencoder, where the inputs are corrupted and the outputs are clean; the autoencoder basically learns to clean corrupted samples. Such denoising autoencoders can generate more robust representations, which improves classification.
- The autoencoder learns useful features by adding random noise to its inputs and making it recover the original noise-free data. This way the autoencoder can't simply copy the input to its output without learning the features in the data, because the input also contains random noise.
- We are asking it to subtract the noise and produce the underlying meaningful data. This is called a denoising autoencoder.
- In the figure below, the top row contains the original images. We add random Gaussian noise to them and feed the noisy versions to the autoencoder, which tries to recover the original image. The bottom row is the autoencoder output. We can do better by using a more complex autoencoder architecture, such as convolutional autoencoders.

Fig. 3.2.1 : Original image, noisy I/P and O/P

- A basic autoencoder trains to minimize the loss between x and the reconstruction g(f(x)). Denoising autoencoders train to minimize the loss between x and g(f(x + w)), where w is random noise (see the sketch at the end of this section). Denoising autoencoders can't simply memorize the input-output relationship.
- Intuitively, a denoising autoencoder learns a projection from a neighborhood of our training data back onto the training data. In the figure below, noise is added to the original image; it is encoded into the code, which is decoded to return the original input image.

Fig. 3.2.2 : Denoising Autoencoder

Advantages of denoising autoencoder :
- It is simpler to implement; it requires adding just one or two lines of code to a regular autoencoder.
- There is no need to compute the Jacobian of the hidden layer.
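To make the description above concrete, here is a minimal denoising autoencoder sketch. PyTorch and the layer sizes are assumptions, not taken from the notes. The network itself is an ordinary undercomplete autoencoder with a bottleneck code; the denoising behaviour comes entirely from the training step, where Gaussian noise w is added to the input while the loss is still measured against the clean target, i.e. the loss between x and g(f(x + w)).

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Plain undercomplete autoencoder: 784 -> 32 -> 784 (sizes are illustrative)."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, code_dim))            # bottleneck
        self.decoder = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, x_clean, noise_std=0.3):
    """One denoising step: corrupt the input, reconstruct, compare with the clean target."""
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)   # x + w, w ~ Gaussian noise
    x_recon = model(x_noisy)
    loss = nn.functional.mse_loss(x_recon, x_clean)             # loss between x and g(f(x + w))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Dropping the noise line (feeding x_clean directly to the model) recovers the plain undercomplete autoencoder of Section 3.1.3.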
3.2.2 Sparse Autoencoder

- Apart from keeping the code size small (as in undercomplete autoencoders) and denoising, regularization is also used to learn useful features.
- We can regularize the autoencoder by using a sparsity constraint such that only a fraction of the nodes have nonzero values, called active nodes.
- In particular, we add a penalty term to the loss function such that only a fraction of the nodes become active. This forces the autoencoder to represent each input as a combination of a small number of nodes, and demands that it discover interesting structure in the data. This method works even if the code size is large, since only a small subset of the nodes will be active at any time.

Fig. 3.2.3 : Sparse Autoencoder Architecture

- Sparsity is introduced in terms of firing neurons: if a neuron's value is high (near about 1), it is allowed to fire; the rest are not. Sparse autoencoders construct a loss function that penalizes the activations within a layer rather than the weights of the network, which is the opposite of the usual form of regularization.
- The individual nodes that activate are data-dependent: different inputs will result in activations of different nodes through the network. The network selectively activates regions depending on the input data. E.g. L1 Regularization : penalize the absolute value of the vector of activations a in layer h for observation i, as shown in the sketch after this list.
- A neuron with a sigmoid activation function will have values between 0 and 1. We say the neuron is activated when its output is close to 1, and not activated when its output is close to 0. A sparse autoencoder tries to ensure that a neuron is inactive most of the time. Sparse autoencoders have more hidden nodes than input nodes.
- Sparsity may be introduced by additional terms in the loss function during the training process, either by comparing the probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all but the strongest hidden unit activations.
- Advantage of sparse autoencoder : We can achieve an information bottleneck (the same information with fewer active neurons) without reducing the number of neurons in the hidden layers.
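The L1 activation penalty mentioned above can be added to a standard autoencoder in a few lines. This sketch assumes PyTorch; the code size of 1024 (larger than the 784-dimensional input, i.e. overcomplete) and the penalty weight are illustrative. The penalty is applied to the code activations, not to the weights, so the sparsity is data-dependent: different inputs drive different subsets of units towards 1.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Autoencoder with a (possibly large) code; sparsity comes from the loss, not the size."""
    def __init__(self, input_dim=784, code_dim=1024):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)          # activations in [0, 1]; "active" means close to 1
        return self.decoder(code), code

def sparse_loss(x, x_recon, code, sparsity_weight=1e-3):
    recon = nn.functional.mse_loss(x_recon, x)
    # L1 penalty on the activations (not the weights) pushes most units towards 0
    l1_penalty = code.abs().mean()
    return recon + sparsity_weight * l1_penalty
```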
3.2.3 Contractive Autoencoder

- The objective of a contractive autoencoder is to provide a robust learned representation. We can achieve this by adding a penalty term or regularizer to whatever cost or objective function the algorithm is trying to minimize. The result reduces the learned representation's sensitivity towards the training input.
- This regularizer is the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input.
- The Frobenius norm of the Jacobian matrix for the hidden layer is calculated with respect to the input and is basically the sum of squares of all its elements, as in the formula below. If this value is zero, then we don't observe any change in the learned hidden representations as we change the input values. But if the value is very large, then the learned model is unstable as the input values change.
- We generally employ contractive autoencoders as one of several other autoencoder nodes. It is in active mode only when other encoding schemes fail to label a data point.
- Frobenius Norm : the matrix analogue of the L2 vector norm.
- Jacobian matrix : the matrix of all first-order partial derivatives of a vector-valued function.

Regularizing term : ||J_f(x)||_F^2 = Σ_ij (∂h_j(x) / ∂x_i)^2  (see the sketch at the end of this section)

- Contractive autoencoders arrange for similar inputs to have similar activations, i.e., the derivatives of the hidden layer are small with respect to the input.
- Denoising autoencoders make the reconstruction function (encoder + decoder) resist small, finite-sized perturbations of the input, while contractive autoencoders make the feature extraction function (i.e., the encoder) resist infinitesimal perturbations of the input.

Fig. 3.2.4 : Contractive Autoencoder

Advantage of contractive autoencoder :
- Since the gradient is deterministic, we can use second-order optimizers, e.g. conjugate gradient, L-BFGS, etc., which might be more stable than the denoising autoencoder, which uses a sampled gradient.

Fig. 3.2.5 : Contractive Autoencoder
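For a single sigmoid encoder layer h = sigmoid(Wx + b), the Jacobian has the closed form ∂h_j/∂x_i = h_j(1 - h_j) W_ji, so the regularizing term above can be computed without automatic differentiation. The sketch below assumes PyTorch, with illustrative layer sizes and penalty weight.

```python
import torch
import torch.nn as nn

class ContractiveAutoencoder(nn.Module):
    """Single-hidden-layer autoencoder with a contractive penalty (sizes are illustrative)."""
    def __init__(self, input_dim=784, code_dim=64):
        super().__init__()
        self.enc = nn.Linear(input_dim, code_dim)
        self.dec = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))            # hidden representation
        x_recon = torch.sigmoid(self.dec(h))
        return x_recon, h

def contractive_loss(model, x, x_recon, h, lam=1e-4):
    recon = nn.functional.mse_loss(x_recon, x)
    # For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W_ji, so the squared Frobenius
    # norm of the Jacobian factorizes into the two sums combined below.
    W = model.enc.weight                          # shape: (code_dim, input_dim)
    dh = (h * (1.0 - h)) ** 2                     # shape: (batch, code_dim)
    w_sq = (W ** 2).sum(dim=1)                    # shape: (code_dim,)
    jacobian_frob_sq = (dh * w_sq).sum(dim=1).mean()
    return recon + lam * jacobian_frob_sq
```

This mirrors the comparison above: the penalty pushes the encoder's derivatives towards zero, so infinitesimal input perturbations barely move the code.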
