CMPT 413/713: Natural Language Processing: Nat Langlab
Fall 2021
2021-09-28
Neural networks for NLP
Feed-forward NNs Recurrent NNs Recursive NNs
Neural networks
• The neurons are connected to each other, forming a network
• The output of a neuron may feed into the inputs of other neurons
g(z) = { 0 if z ≤ 0
       { 1 if z > 0

h1 = g(x1 + x2)
h2 = g(x1 + x2 − 1)
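These two threshold units can be sketched directly in Python (a minimal illustration, not course code; on binary inputs h1 acts like OR and h2 like AND):

```python
def g(z):
    # hard-threshold activation from the slide
    return 1 if z > 0 else 0

def h1(x1, x2):
    # fires whenever x1 + x2 > 0
    return g(x1 + x2)

def h2(x1, x2):
    # fires only when x1 + x2 > 1
    return g(x1 + x2 - 1)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, h1(x1, x2), h2(x1, x2))
```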
Why nonlinearities
Learn to classify whether points belong to the blue curve or the red curve
https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Expressiveness of neural networks
• Multilayer feed-forward neural nets with nonlinear activation functions are universal approximators
• True for both shallow networks (infinitely wide) and (infinitely) deep networks.
• Consider a network with just 1 hidden layer (with hard-threshold activation functions) and a linear output. With 2^D hidden units, each of which responds to just one input configuration, it can model any boolean function with D inputs.
f′(z) = f(z)(1 − f(z))   (sigmoid)
f′(z) = 1 − f(z)²        (tanh)
f(z) = { 1 if z > 0; 0 if z < 0 }   (hard threshold)
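These derivative identities can be checked numerically (a quick sketch, not course code):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def num_deriv(f, z, eps=1e-6):
    # central finite difference approximation of f'(z)
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = 0.3
# sigmoid: f'(z) = f(z)(1 - f(z))
assert abs(num_deriv(sigmoid, z) - sigmoid(z) * (1 - sigmoid(z))) < 1e-6
# tanh: f'(z) = 1 - f(z)^2
assert abs(num_deriv(math.tanh, z) - (1 - math.tanh(z) ** 2)) < 1e-6
```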
Activation functions
Problems of ReLU? “Dead neurons”: units whose input is always negative output 0 and receive zero gradient, so they stop learning.
Leaky ReLU:
f(z) = { z      if z ≥ 0
       { 0.01z  if z < 0
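A small sketch (illustrative, not course code) showing why the leaky variant avoids dead neurons: ReLU's gradient is exactly zero for negative inputs, while Leaky ReLU keeps a small slope:

```python
def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    return z if z >= 0 else alpha * z

def relu_grad(z):
    # zero gradient for z < 0: a "dead" unit never recovers
    return 1.0 if z > 0 else 0.0

def leaky_relu_grad(z, alpha=0.01):
    # small nonzero gradient keeps learning alive
    return 1.0 if z >= 0 else alpha

print(relu(-2.0), leaky_relu(-2.0))
```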
What is the best activation function?
• Depends on the problem!
• ReLU/Leaky ReLU is often a good choice
• Research into families of activation functions
Parametric leaky ReLU:  f(α, x) = { αx  for x < 0
                                  { x   for x ≥ 0
ELU:  f(α, x) = { α(eˣ − 1)  for x ≤ 0
                { x          for x > 0
Swish:  f(x) = x σ(βx) = x / (1 + e^(−βx))
  β = 0 → f(x) = x/2
  β → ∞ → f(x) = ReLU(x)
α, β can either be constant or trainable parameters
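The limiting behaviour of swish in β can be checked with a small sketch (illustrative only):

```python
import math

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def swish(x, beta=1.0):
    # x * sigmoid(beta * x)
    return x / (1.0 + math.exp(-beta * x))

# beta = 0 gives x/2; large beta approaches ReLU
assert abs(swish(3.0, beta=0.0) - 1.5) < 1e-12
assert abs(swish(3.0, beta=50.0) - 3.0) < 1e-6   # positive side: ~x
assert abs(swish(-3.0, beta=50.0)) < 1e-6        # negative side: ~0
```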
Matrix Notation
• Learn parameters θ = {W^(1), b^(1), W^(2), b^(2), w^(o), b^(o)}

Input: x ∈ ℝ^d
Hidden layers:
h1 = tanh(W1 x + b1),  W1 ∈ ℝ^{d1×d},  b1 ∈ ℝ^{d1}
h2 = tanh(W2 h1 + b2), W2 ∈ ℝ^{d2×d1}, b2 ∈ ℝ^{d2}
Output:
y = σ(wᵀ h2 + b),  w ∈ ℝ^{d2},  b ∈ ℝ
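The forward pass in this notation can be sketched in NumPy (layer sizes here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, d1, d2 = 4, 5, 3                 # hypothetical layer sizes

W1, b1 = rng.normal(size=(d1, d)), np.zeros(d1)
W2, b2 = rng.normal(size=(d2, d1)), np.zeros(d2)
w, b = rng.normal(size=d2), 0.0

x = rng.normal(size=d)
h1 = np.tanh(W1 @ x + b1)           # h1 in R^{d1}
h2 = np.tanh(W2 @ h1 + b2)          # h2 in R^{d2}
y = sigmoid(w @ h2 + b)             # scalar output in (0, 1)
print(y)
```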
Loss functions
• Binary classification:
  y = σ(w · h2 + b)
  L(y, y*) = −y* log y − (1 − y*) log(1 − y)
• Regression:
  y = w · h2 + b
  L_MSE(y, y*) = (y − y*)²
• Multiclass classification (C classes):
  y_i = softmax_i(W h2 + b),  W ∈ ℝ^{C×d2},  b ∈ ℝ^C
  L(y, y*) = −Σ_{i=1}^{C} y*_i log y_i

θ = {W1, b1, W2, b2, w, b}
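The three losses can be written out directly (a sketch, not course code; y* is the target, one-hot for the multiclass case):

```python
import numpy as np

def bce(y, y_star):
    # binary cross-entropy from the slide
    return -y_star * np.log(y) - (1 - y_star) * np.log(1 - y)

def mse(y, y_star):
    return (y - y_star) ** 2

def softmax(z):
    e = np.exp(z - z.max())   # shift by max for numerical stability
    return e / e.sum()

def cross_entropy(y, y_star):
    # y_star is a one-hot target vector
    return -np.sum(y_star * np.log(y))

y = softmax(np.array([2.0, 1.0, 0.1]))
print(y, cross_entropy(y, np.array([1.0, 0.0, 0.0])))
```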
Optimization
Gradient descent:
θ^(t+1) = θ^(t) − η ∇_θ L(θ)
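The update rule as a loop, on a toy one-parameter loss (illustrative only):

```python
def grad_L(theta):
    # toy loss L(theta) = (theta - 3)^2, so dL/dtheta = 2(theta - 3)
    return 2.0 * (theta - 3.0)

theta, eta = 0.0, 0.1
for t in range(100):
    theta = theta - eta * grad_L(theta)   # theta^(t+1) = theta^(t) - eta * grad
print(theta)   # converges toward the minimizer 3.0
```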
Neural networks as
computational graphs
Computational graph
[Figure: computational graph for the two-layer network: x and W1 feed ⋅ to give W1x, b1 is added, f1 gives h1; h1, W2, b2, f2 give h2; h2, Wo, bo, σ give y; y and the target y* feed the cross-entropy loss L_CE.]

Focus on computation:
• Nodes represent operations
• Edges are values passed from one operator to the next
For a node computing h = f(z):

∂L/∂z = (∂L/∂h)(∂h/∂z)

downstream gradient = upstream gradient × local gradient

where ∂L/∂h is the upstream gradient, ∂h/∂z = ∂f(z)/∂z is the local gradient, and ∂L/∂z is the downstream gradient (the gradient w.r.t. the node's input).
Backpropagation
Multiple inputs: z = f(x1, x2)
∂L/∂x1 = (∂L/∂z)(∂z/∂x1)
∂L/∂x2 = (∂L/∂z)(∂z/∂x2)
Compute gradients for each input.

Multiple output branches: x feeds several successors z1 = f1(x), z2 = f2(x), …
Sum the gradients of the branches:
∂L/∂x = Σ_{i=1}^{n} (∂L/∂z_i)(∂z_i/∂x),   {z1, …, zn} = successors of x
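The branch-summing rule can be verified numerically on a toy function where x feeds two branches (illustrative only):

```python
def L(x):
    z1 = 2.0 * x    # branch 1
    z2 = x * x      # branch 2
    return z1 + z2

x = 1.5
# sum over branches: (dL/dz1)(dz1/dx) + (dL/dz2)(dz2/dx) = 2 + 2x
analytic = 2.0 + 2.0 * x

eps = 1e-6
numeric = (L(x + eps) - L(x - eps)) / (2 * eps)
print(analytic, numeric)
```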
Backpropagation API
For a node z = f(x1, …, xk), each node (operator) implements a local forward/backward API:
• forward(inputs): compute the output f(x1, …, xk)
• backward(upstream gradient): multiply the upstream gradient by the local gradients ∂f/∂x1, ∂f/∂x2, …, ∂f/∂xk to get the gradient with respect to each input
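As a minimal sketch of this API (hypothetical class, not a framework's actual interface), an add node has local gradients of 1 for both inputs:

```python
class AddGate:
    """Example operator implementing the local forward/backward API."""

    def forward(self, x1, x2):
        self.inputs = (x1, x2)
        return x1 + x2

    def backward(self, upstream):
        # local gradients: d(x1 + x2)/dx1 = d(x1 + x2)/dx2 = 1
        return upstream * 1.0, upstream * 1.0

gate = AddGate()
print(gate.forward(2.0, 3.0), gate.backward(0.5))
```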
Example: MultiplyGate
http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes03-neuralnets.pdf
Credits: Chris Manning (Stanford cs224n)
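In the same spirit as the cs224n notes linked above, a multiply gate might be sketched like this (illustrative class, not the notes' exact code); note each input's gradient is the upstream gradient times the *other* input:

```python
class MultiplyGate:
    """z = x * y; backward returns (dL/dx, dL/dy) given dL/dz."""

    def forward(self, x, y):
        self.x, self.y = x, y     # cache inputs for the backward pass
        return x * y

    def backward(self, dz):
        # local gradients: dz/dx = y, dz/dy = x
        return dz * self.y, dz * self.x

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dx, dy = gate.backward(2.0)
print(z, dx, dy)
```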
Backpropagation in general computational graph
∂L/∂x = Σ_{i=1}^{n} (∂L/∂z_i)(∂z_i/∂x),   {z1, …, zn} = successors of x
Simplified example
Forward pass: compute the function value
x and W feed ⋅ to give Wx; + b gives z; σ gives y; f_L gives L
L(y, y*) = −y* log y − (1 − y*) log(1 − y)

Backward pass: gradient using the chain rule, with local derivatives at each node
∂L/∂L = 1
∂L/∂z = (∂L/∂y)(∂σ(z)/∂z), local derivative ∂σ(z)/∂z = σ(z)(1 − σ(z))
∂L/∂b = (∂L/∂z)(∂(Wx + b)/∂b) = ∂L/∂z, since ∂(Wx + b)/∂b = 1
∂L/∂(Wx) = (∂L/∂z)(∂(Wx + b)/∂(Wx)) = ∂L/∂z, since ∂(Wx + b)/∂(Wx) = 1
∂L/∂W = (∂L/∂(Wx))(∂(Wx)/∂W) = (∂L/∂(Wx)) xᵀ, local derivative ∂(Wx)/∂W = xᵀ
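This backward pass can be sketched and checked numerically in NumPy (illustrative only; W is treated as a single row, and dL/dz simplifies to y − y* for sigmoid plus cross-entropy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W = rng.normal(size=3)        # single row of weights
b = 0.0
x, y_star = rng.normal(size=3), 1.0

# forward pass
z = W @ x + b
y = sigmoid(z)
L = -y_star * np.log(y) - (1 - y_star) * np.log(1 - y)

# backward pass via the chain rule
dz = y - y_star               # (dL/dy)(dsigma/dz) simplifies to y - y*
db = dz                       # since d(Wx + b)/db = 1
dW = dz * x                   # since d(Wx)/dW = x^T

# numerical check of dL/db with a central difference
eps = 1e-6
def loss(b_):
    y_ = sigmoid(W @ x + b_)
    return -y_star * np.log(y_) - (1 - y_star) * np.log(1 - y_)
print(db, (loss(b + eps) - loss(b - eps)) / (2 * eps))
```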
An example
Try to compute the gradients yourself!
http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes03-neuralnets.pdf
Designing classifiers
with neural networks
Feature design: partly eliminated
Rise of deep-learning frameworks
PyTorch, TensorFlow, Keras, Theano, …

Provide frequently used components that can be connected together, so you no longer need to code all the pieces of your model and optimizer yourself:
• Easy to build complex models
• Connect up neural building blocks
• Mix and match selection of loss functions, regularizers, and optimizers
• Optimize using auto-differentiation, no need to hand-code optimizers for specific models
• Deal with numerical stability issues
• Deal with efficient computation (e.g. batching, using GPUs)
• Provide (some) experiment logging and visualization tools

Allows researchers and developers to focus on:
• Modeling the problem
• Designing the network
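What auto-differentiation does under the hood can be illustrated with a toy scalar version (a deliberately minimal sketch, nothing like a real framework's API):

```python
class Scalar:
    """Minimal autograd value: records parents and local gradients."""

    def __init__(self, data, parents=()):
        self.data, self.grad, self._parents = data, 0.0, parents

    def __add__(self, other):
        # local gradient of addition is 1 w.r.t. each input
        return Scalar(self.data + other.data, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # local gradient of multiplication is the other input
        return Scalar(self.data * other.data,
                      [(self, other.data), (other, self.data)])

    def backward(self, upstream=1.0):
        self.grad += upstream                 # sum gradients over branches
        for parent, local in self._parents:
            parent.backward(upstream * local)  # chain rule

a, b = Scalar(2.0), Scalar(3.0)
c = a * b + a          # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)
```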
Resources
• There is a lot more to learn about neural networks!
• TA Tutorials
• Optimization
• Deep learning books