
Transformer Architecture

A Large Language Model (LLM) is a neural network.

Self-supervised learning (SSL) is a machine learning technique where models are trained to predict part of the input data using the rest of the data.

Example: Masked Language Modeling (MLM)
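
A minimal sketch of the MLM idea; the token ids and mask id below are made up for illustration:

# Masked Language Modeling (MLM), conceptual sketch
tokens = [12, 7, 512, 33, 90]    # hypothetical token ids for a sentence
MASK_ID = 0                      # assumed id of the special [MASK] token

masked_input = list(tokens)
masked_input[2] = MASK_ID        # hide one token: [12, 7, 0, 33, 90]
target = tokens[2]               # 512 -- the model learns to predict this
# Training minimizes the prediction error (e.g. cross-entropy) at the
# masked position, using the unmasked tokens as context.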

ATTENTION MECHANISM

Midterm Project for the LLM course: use Attention and the Transformer to build a simple LLM model.

Self-Attention

import torch.nn as nn

# Trainable projections that map token embeddings (d_in) to
# query, key, and value vectors (d_out)
W_q = nn.Linear(d_in, d_out, bias=False)   # query projection
W_k = nn.Linear(d_in, d_out, bias=False)   # key projection
W_v = nn.Linear(d_in, d_out, bias=False)   # value projection
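
A minimal usage sketch, assuming d_in and d_out were set (e.g. d_in = 4, d_out = 3) before creating the layers above; the input tensor is a toy example:

import torch

torch.manual_seed(0)
x = torch.randn(6, d_in)     # 6 token embeddings of size d_in (toy input)

queries = W_q(x)             # (6, d_out) query vectors
keys    = W_k(x)             # (6, d_out) key vectors
values  = W_v(x)             # (6, d_out) value vectors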

TRANSFORMER MODEL

✓ Embedding
✓ Positional Embedding
✓ Attention
✓ Dense Layer
✓ Residual Connections: x → F(x) + x
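
A minimal sketch that wires these pieces together in PyTorch; the layer sizes, the use of nn.MultiheadAttention, and the overall layout are assumptions for illustration, not the lecture's reference implementation:

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Toy block: attention + dense (feed-forward) layer, each wrapped
    in a residual connection x + F(x)."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # self-attention over the sequence
        x = x + attn_out                   # residual connection
        x = x + self.ff(x)                 # residual around the dense layer
        return x

class ToyTransformer(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embedding
        self.pos_emb = nn.Embedding(max_len, d_model)      # positional embedding
        self.block = TransformerBlock(d_model)

    def forward(self, token_ids):                          # (batch, seq_len)
        positions = torch.arange(token_ids.shape[1])
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        return self.block(x)

# Usage: out = ToyTransformer()(torch.randint(0, 1000, (2, 10)))  # (2, 10, 64)
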
softmax(attention scores) → attention weights → Context Vector (for a given query (word) embedding)

Dot-product Attention:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
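
A minimal sketch of that computation in PyTorch; the tensors and d_k below are toy values for illustration:

import math
import torch

torch.manual_seed(0)
d_k = 3
queries = torch.randn(6, d_k)   # toy query vectors, one per token
keys    = torch.randn(6, d_k)   # toy key vectors
values  = torch.randn(6, d_k)   # toy value vectors

scores  = queries @ keys.T                                 # attention scores (6, 6)
weights = torch.softmax(scores / math.sqrt(d_k), dim=-1)   # attention weights
context = weights @ values                                 # context vectors (6, d_k)
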
Extending Single-head Attention

We simply stack multiple single-head attention modules to obtain multi-head attention.
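
A minimal sketch of that stacking, assuming a toy single-head module; the class names and dimensions are mine, chosen for illustration:

import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention (no masking, toy version)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_q = nn.Linear(d_in, d_out, bias=False)
        self.W_k = nn.Linear(d_in, d_out, bias=False)
        self.W_v = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        weights = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return weights @ v

class MultiHeadAttentionWrapper(nn.Module):
    """Multi-head attention built by stacking single-head modules
    and concatenating their outputs along the feature dimension."""
    def __init__(self, d_in, d_out, num_heads):
        super().__init__()
        self.heads = nn.ModuleList(
            SelfAttention(d_in, d_out) for _ in range(num_heads)
        )

    def forward(self, x):
        return torch.cat([head(x) for head in self.heads], dim=-1)

# Example: 6 tokens with 4-dim embeddings, 2 heads of size 3 → output (6, 6)
mha = MultiHeadAttentionWrapper(d_in=4, d_out=3, num_heads=2)
out = mha(torch.randn(6, 4))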
