Module1_L5_GPT_variants

The document provides an overview of the Generative Pre-trained Transformer (GPT) models developed by OpenAI, detailing their evolution from GPT-1 to GPT-4, including their capabilities and architectures. It explains the processes involved in training these models, such as pre-training, fine-tuning, tokenization, and generation. GPT models are designed to generate human-like responses and can handle multimodal inputs, including text, images, and audio.


L5: GPT and its variants

Generative Pre-trained Transformer (GPT)
• GPT is a family of AI models built by OpenAI.
• It stands for Generative Pre-trained Transformer:
• Generative: Generative AI is a technology capable of producing content, such as text and imagery.
• Pre-trained: Pre-trained models are saved networks that have already been taught, using a large data set,
to resolve a problem or accomplish a specific task.
• Transformer: A transformer is a deep learning architecture that transforms an input into another type of
output.
• GPT is a generative AI technology that has been previously trained to transform its input into a different
type of output.
• Initially, the GPT family consisted only of LLMs (large language models), but OpenAI has since expanded it to
include two newer models:
• GPT-4o: a large multimodal model (LMM)
• GPT-4o mini: a small language model (SLM)
• GPT models generate human-like responses to a prompt – initially text-based.
• But GPT-4o and GPT-4o mini can also work with images and audio inputs because they're multimodal.

• GPT-1
• is the first version of OpenAI’s language model.
• It followed Google’s 2017 paper Attention Is All You Need, in which researchers introduced
the first general transformer model.
• That transformer architecture now serves as the framework for Google Search, Google Translate, autocomplete,
and all large language models (LLMs), including Bard and ChatGPT.
• GPT-2
• is the second transformer-based language model by OpenAI.
• It’s open-source, trained with unsupervised learning, and has about 1.5 billion parameters.
• GPT-2 was designed specifically to predict and generate the next sequence of text to follow a
given sentence.

GPT-3
• The third iteration of OpenAI’s GPT model has 175 billion parameters.
• It was trained on Wikipedia entries as well as the open-source dataset Common Crawl.
• It can generate computer code and performs better in niche areas of
content creation, such as storytelling.
GPT-4
• GPT-4 is the most recent model from OpenAI.
• It’s a large multimodal model (LMM), meaning it's capable of parsing image
inputs as well as text.
• It exhibits human-level performance across a variety of professional and academic
benchmarks.

How GPT works
1. Pre-training (video: https://www.youtube.com/watch?v=8Lqi-F8g_ps)
• The model is trained on a large dataset consisting of text from the internet.
• During this phase, GPT learns grammar, facts about the world, and some
reasoning abilities by predicting the next word in a sentence.
• It builds a general understanding of language and context from this extensive
data.
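
To make this concrete, here is a minimal, hypothetical sketch of the next-word (next-token) prediction objective that drives pre-training. The tiny model below (an embedding plus a linear layer) merely stands in for a full transformer, and the random token batch stands in for real internet text.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a GPT-style language model (real GPT models are large transformers).
vocab_size, embed_dim, context_len = 100, 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq)
        return self.head(self.embed(tokens))   # logits over the vocabulary

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One pre-training step: predict token t+1 from the tokens up to position t.
batch = torch.randint(0, vocab_size, (4, context_len + 1))   # placeholder for real text
inputs, targets = batch[:, :-1], batch[:, 1:]
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()

Pre-training repeats this step over enormous amounts of text, which is how the model picks up grammar, facts, and context.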
2. Fine-tuning
• After pre-training, the model undergoes fine-tuning on a smaller and more
focused dataset.
• This dataset usually contains examples directly related to the intended
application.
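
A hedged sketch of the fine-tuning step, using the Hugging Face transformers library with "gpt2" as an example checkpoint; the two question-answer strings are made-up placeholders for the smaller, task-focused dataset described above.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")      # start from pre-trained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Tiny, made-up task dataset; a real fine-tuning set would be much larger.
examples = [
    "Q: What does GPT stand for? A: Generative Pre-trained Transformer.",
    "Q: Who develops GPT models? A: OpenAI.",
]

model.train()
for text in examples:
    enc = tokenizer(text, return_tensors="pt")
    # For causal LM fine-tuning, the labels are the input ids themselves;
    # the library shifts them internally to compute the next-token loss.
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()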

3. Tokenization
• Tokenization is the process of breaking input text into smaller parts, known as tokens;
tokens can be words, subwords, or individual characters.
• These tokens are then converted into numerical representations that the model
can process.
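
For illustration, a short sketch of tokenization using OpenAI's tiktoken library (the "gpt2" encoding is used here as an example); it shows text being split into integer token ids and decoded back.

import tiktoken

enc = tiktoken.get_encoding("gpt2")
tokens = enc.encode("GPT breaks text into tokens.")
print(tokens)               # a list of integer token ids the model can process
print(enc.decode(tokens))   # decoding the ids recovers the original text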
4. Transformer architecture
• GPT uses the transformer architecture, which includes mechanisms like self-
attention.
• Self-attention enables the model to assess the significance of individual words
within a sentence, enhancing its comprehension of context and connections
among words.
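
A minimal sketch of single-head, causally masked scaled dot-product self-attention; real GPT models stack many layers of multi-head attention, and the weight matrices here are random placeholders.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)     # how strongly each token attends to the others
    # Causal mask: a token may only attend to itself and earlier tokens.
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)         # attention weights sum to 1 per token
    return weights @ v                          # context-aware token representations

x = torch.randn(5, 16)                          # 5 tokens, model dimension 16
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)          # shape: (5, 8)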

5. Generation
• During the generation phase, the model receives a prompt and generates a
coherent and contextually relevant continuation based on its training data.
• It predicts one token at a time, using the previously generated tokens as context.
• This process continues until the desired output length is reached.
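
A sketch of this token-by-token generation loop, again using the Hugging Face transformers library with "gpt2" as an example model and greedy decoding for brevity; production systems typically use sampling and stop conditions rather than a fixed length.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("GPT generates text", return_tensors="pt")["input_ids"]
for _ in range(20):                             # desired output length in tokens
    with torch.no_grad():
        logits = model(ids).logits              # (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()            # greedy: pick the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))                 # prompt plus generated continuation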
