Exploring GPT-2 Attention with BertViz
Shivam Kumar (2411AI63) Saumyadweepta Paul (2411AI62)
Ankit Kumar Pandey (2411AI63) Aashish Kumar Gupta (2411CS25)
24 April 2025
Objective
● Visualize and interpret attention mechanisms in GPT-2 using BertViz.
● Understand how GPT-2 models syntactic and semantic relationships.
● Identify specialized attention heads and investigate model behavior.
Introduction to Transformers and GPT-2
● Transformers rely entirely on self-attention to process input sequences.
● GPT-2 is a decoder-only language model trained to predict the next token.
● Self-attention enables GPT-2 to model long-range dependencies efficiently.
What is BertViz?
BertViz Visualization Tool
● An open-source tool for interpreting attention in Transformer models.
● Offers multiple views: Attention-Head View, Model View, and Neuron View.
● Adapted to support both encoder (BERT) and decoder (GPT-2) models.
● Helps explore head specialization, bias, and structure.
Methodology Overview
1. Input carefully crafted sentences to GPT-2.
2. Visualize each layer's self-attention using BertViz (see the setup sketch below).
3. Analyze layer-wise and head-wise attention distributions.
4. Capture screenshots to document key attention behaviors.
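A minimal sketch of steps 1–2, assuming the Hugging Face transformers and bertviz packages, the small public gpt2 checkpoint, and a Jupyter notebook for rendering; the exact checkpoint and probe sentences used in our experiments may differ.

```python
# Minimal sketch: run a probe sentence through GPT-2 with attention outputs
# enabled and render BertViz's interactive views (notebook environment assumed).
from transformers import GPT2Tokenizer, GPT2Model
from bertviz import head_view, model_view

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

sentence = "The cat that the dog chased was fast."
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)    # per-head view for a chosen layer
model_view(outputs.attentions, tokens)   # thumbnail grid over all layers and heads
```

Running each cell in a notebook opens the corresponding interactive widget, which is where the screenshots in step 4 come from.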
Dataset and Sentence Design
Linguistic Phenomena in Input Sentences
● Coreference: “The doctor spoke to the nurse. She listened.”
● Ambiguity: “The chicken is ready to eat.”
● Subject–verb agreement: “The cat that the dog chased was fast.”
● Gender bias detection: contrast male vs. female pronouns in context.
Syntax Attention Patterns
Example: Complex Clause Interpretation
Sentence: “The cat that the dog chased was fast.”
● Observed strong backward attention from "was" to "cat" across heads.
● Some heads focused on aligning subject and verb.
● Others distributed attention across tokens, possibly for context modeling (a per-head inspection sketch follows below).
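As a rough check on the “was” → “cat” pattern, the per-head weights can also be read directly from the attention tensors. This is an illustrative sketch rather than the exact script behind the screenshots, and it assumes both words survive GPT-2's BPE tokenization as single “Ġ”-prefixed tokens.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

enc = tokenizer("The cat that the dog chased was fast.", return_tensors="pt")
attentions = model(**enc).attentions
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])

q = tokens.index("Ġwas")   # query position ("was")
k = tokens.index("Ġcat")   # key position ("cat")

for layer, att in enumerate(attentions):
    weights = att[0, :, q, k]              # attention from "was" to "cat", per head
    head = int(torch.argmax(weights))
    print(f"layer {layer:2d}: strongest head {head:2d}, weight {weights[head]:.2f}")
```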
Coreference Attention Patterns
Example: Gender Bias in Coreference
Sentence: “The doctor spoke to the nurse. She listened.”
● Certain heads linked “She” more to “nurse” than “doctor.”
● Reveals how GPT-2 encodes coreference, possibly influenced by gender
stereotypes.
● Replacing roles (e.g., “engineer” instead of “doctor”) shifts the attention pattern (compared in the sketch below).
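A hedged sketch of the role-swap comparison: antecedent_attention is a hypothetical helper (not part of BertViz) that averages, over all layers and heads, how much the pronoun attends to each candidate antecedent; it assumes each candidate word is a single GPT-2 token.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

def antecedent_attention(sentence, pronoun="ĠShe", candidates=("Ġdoctor", "Ġnurse")):
    """Mean attention (over layers and heads) from the pronoun to each candidate."""
    enc = tokenizer(sentence, return_tensors="pt")
    att = torch.stack(model(**enc).attentions)   # (layers, batch, heads, seq, seq)
    toks = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    p = toks.index(pronoun)
    return {c: att[:, 0, :, p, toks.index(c)].mean().item() for c in candidates}

# Original sentence vs. a role-swapped variant.
print(antecedent_attention("The doctor spoke to the nurse. She listened."))
print(antecedent_attention("The engineer spoke to the nurse. She listened.",
                           candidates=("Ġengineer", "Ġnurse")))
```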
Ambiguity Resolution
Example: Structural Ambiguity
Sentence: “The chicken is ready to eat.”
● Model attention varies: “chicken” links to both “is” and “eat.”
● GPT-2 distributes attention across interpretations: subject vs. object.
● No clear resolution: GPT-2 appears to maintain the ambiguity unless it is disambiguated by context.
Attention Specialization
Head and Layer Behavior
● Some heads specialize in:
○ Syntactic roles (subject–verb)
○ Punctuation and clause boundaries
○ Coreference tracking
● Redundant heads exhibit diffuse or uniform attention (one way to quantify this is sketched below).
● Patterns change across layers: deeper layers show more abstract
dependencies.
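To make “diffuse or uniform attention” concrete, one simple heuristic (our illustration, not part of BertViz) is the entropy of each head's attention distribution, averaged over query positions: near-uniform heads score high, sharply focused heads score low. A sketch under the same gpt2 setup as above:

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

enc = tokenizer("The doctor spoke to the nurse. She listened.", return_tensors="pt")
attentions = model(**enc).attentions   # per layer: (batch, heads, seq, seq)

for layer, att in enumerate(attentions):
    probs = att[0]                                              # (heads, seq, seq)
    entropy = -(probs * (probs + 1e-9).log()).sum(-1).mean(-1)  # mean over queries
    print(f"layer {layer:2d}: head entropies",
          " ".join(f"{h:.2f}" for h in entropy.tolist()))
```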
Key Findings
● Attention heads are interpretable in some cases, revealing structure and
semantics.
● Certain heads encode biases (e.g., gendered associations).
● Not all heads contribute equally; some may be pruned without loss in performance (a pruning sketch follows below).
● BertViz aids in debugging and understanding model decisions.
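The pruning claim can be tested directly, since transformers exposes prune_heads on GPT-2 models. The layer and head indices below are placeholders for illustration, not heads we identified as redundant; a real experiment would compare perplexity before and after pruning.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder indices for illustration only: drop heads 2 and 7 in layer 0
# and head 5 in layer 3, then confirm the pruned model still generates text.
model.prune_heads({0: [2, 7], 3: [5]})

enc = tokenizer("The cat that the dog chased was", return_tensors="pt")
out = model.generate(**enc, max_new_tokens=5,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0]))
```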
Conclusion and Future Work
● BertViz helps demystify attention in GPT-2, showing
structure and specialization.
● Key dependencies like coreference, syntax, and bias are
traceable.
● Future directions:
○ Use neuron view to isolate responsible units.
○ Explore intervention strategies (neuron editing, head
pruning).
○ Apply similar methods to newer models (GPT-3,
GPT-4).
Thank You!