Module1_L5_GPT_variants
Generative Pre-trained Transformer (GPT)
• GPT is a family of AI models built by OpenAI.
• It stands for Generative Pre-trained Transformer:
• Generative: Generative AI is a technology capable of producing content, such as text and imagery.
• Pre-trained: Pre-trained models are saved networks that have already been taught, using a large data set,
to resolve a problem or accomplish a specific task.
• Transformer: A transformer is a deep learning architecture that transforms an input into another type of
output.
• GPT is a generative AI technology that has been previously trained to transform its input into a different
type of output.
• Initially, the GPT family consisted only of LLMs (large language models), but OpenAI has expanded it to include
two new models:
• GPT-4o: a large multimodal model (LMM)
• GPT-4o mini: a small language model (SLM)
• GPT models generate human-like responses to a prompt – initially text-based.
• But GPT-4o and GPT-4o mini can also work with image and audio inputs because they're multimodal (see the
sketch below).
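A minimal sketch of how a text-plus-image prompt could be sent to GPT-4o, assuming the OpenAI Python SDK and an API key in the environment; the prompt text and image URL are placeholders invented for this example:

```python
# Minimal sketch: sending a text + image prompt to GPT-4o with the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set; the prompt and image URL are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # the large multimodal model mentioned above
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this picture."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # the model's text reply
```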
• GPT-1
• is the first version of OpenAI’s language model.
• It followed Google's 2017 paper Attention Is All You Need, in which researchers introduced the transformer
architecture.
• The transformer architecture serves as the framework for Google Search, Google Translate, autocomplete, and all
large language models (LLMs), including Bard and ChatGPT.
• GPT-2
• is the second transformer-based language model by OpenAI.
• It's open-source, trained with unsupervised learning, and has over 1.5 billion parameters.
• GPT-2 was designed specifically to predict and generate the next sequence of text to follow a
given sentence.
GPT-3
• The third iteration of OpenAI's GPT model has 175 billion parameters.
• It was trained on Wikipedia entries as well as the open-source data set Common Crawl.
• It can generate computer code and shows improved performance in niche areas of content creation such as
storytelling.
GPT-4
• GPT-4 is the most recent major version of OpenAI's GPT series.
• It’s a large multimodal model (LMM), meaning it's capable of parsing image
inputs as well as text.
• It exhibits human-level performance on a variety of professional and academic benchmarks.
How GPT works
1. Pre-training (video: https://www.youtube.com/watch?v=8Lqi-F8g_ps)
• The model is trained on a large dataset consisting of text from the internet.
• During this phase, GPT learns grammar, facts about the world, and some reasoning abilities by predicting the
next word in a sentence.
• It builds a general understanding of language and context from this extensive data (a toy sketch of next-word
prediction follows below).
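To make next-word prediction concrete, here is a toy sketch that "trains" a bigram counter on a tiny invented corpus and uses it to guess the next word; real pre-training optimises a neural network over web-scale text, so the corpus and the predict_next helper below are purely illustrative:

```python
# Toy illustration of the pre-training objective: learn to predict the next word.
# Real GPT pre-training optimises a neural network over web-scale text; this
# bigram counter only sketches the "predict the next token" idea.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word in the training text.
next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often during 'training'."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("sat"))  # 'on' -- the word that most often followed 'sat'
```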
2. Fine-tuning
• After pre-training, the model undergoes fine-tuning on a smaller and more
focused dataset.
• This dataset usually contains examples directly related to the intended application, as in the sketch below.
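A sketch of what such a focused dataset might look like, assuming the chat-style JSONL format used by OpenAI's fine-tuning service (check the current documentation for the exact schema); the example prompts, responses, and file name are invented:

```python
# Sketch: writing a tiny fine-tuning dataset as JSONL, one training example per line.
# Each example pairs a user prompt with the assistant response we want the model to learn.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Summarise: The invoice is due on 1 March."},
        {"role": "assistant", "content": "Invoice due 1 March."},
    ]},
    {"messages": [
        {"role": "user", "content": "Summarise: The meeting moved to Friday at 10 am."},
        {"role": "assistant", "content": "Meeting moved to Friday, 10 am."},
    ]},
]

with open("finetune_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")  # one JSON object per line
```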
3. Tokenization
• Tokenization is the process of breaking input text into smaller pieces, known as tokens, which can be words,
subwords, or individual characters.
• These tokens are then converted into numerical representations that the model can process (see the sketch
below).
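A small sketch using the open-source tiktoken tokenizer; the sample sentence is arbitrary and the exact token IDs depend on the chosen encoding (cl100k_base here), so treat the printed values as illustrative:

```python
# Sketch: turning text into tokens and numerical IDs with the tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # encoding used by several recent OpenAI models

text = "Transformers are powerful."
token_ids = enc.encode(text)                   # text -> list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]  # each ID back to its word/subword piece

print(token_ids)  # the numerical representation the model actually processes
print(pieces)     # the word/subword pieces the text was split into
```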
4. Transformer architecture
• GPT uses the transformer architecture, which includes mechanisms like self-
attention.
• Self-attention enables the model to weigh the significance of individual words within a sentence, enhancing its
comprehension of context and of the connections among words (see the sketch below).
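A minimal numerical sketch of the scaled dot-product self-attention idea, written with NumPy; the tiny shapes, random weights, and function names are invented for illustration and leave out multi-head attention and the rest of the transformer block:

```python
# Minimal sketch of scaled dot-product self-attention on a toy sequence.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Each token's output is a weighted mix of all tokens' value vectors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # how strongly each token attends to every other token
    weights = softmax(scores, axis=-1)           # attention weights sum to 1 per token
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                      # 4 tokens, 8-dimensional embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (4, 8): one context-aware vector per token
```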
5. Generation
• During the generation phase, the model receives a prompt and generates a coherent and contextually relevant
continuation based on its training data.
• It predicts one token at a time, using the previously generated tokens as context.
• This process continues until the desired output length is reached.
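A toy sketch of this token-by-token loop; the hard-coded lookup table stands in for a trained GPT model, and all names here are invented for illustration:

```python
# Toy sketch of autoregressive generation: predict one token at a time, feeding
# each prediction back in as context, until no continuation is found or the
# desired output length is reached.
toy_model = {
    ("once",): "upon",
    ("once", "upon"): "a",
    ("upon", "a"): "time",
    ("a", "time"): ".",
}

def generate(prompt_tokens, max_new_tokens=4, context_size=2):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        context = tuple(tokens[-context_size:])  # the previously generated tokens are the context
        next_token = toy_model.get(context)      # "predict" the next token from that context
        if next_token is None:
            break                                # stop when the model has no continuation
        tokens.append(next_token)                # the new token becomes part of the context
    return " ".join(tokens)

print(generate(["once"]))  # once upon a time .
```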