What Does Each Transformer Block Learn?

LLMs learn in layers inside the Transformer: we have N Transformer blocks, stacked one after another. Assume N = 12.

What each Transformer block learns (roughly):

• Blocks 1–4: basic patterns (tokens, positions, simple relations)
• Blocks 5–8: syntax (phrases, grammar)
• Blocks 9–12: semantics (meaning, long-range context)

Just like reading a book: letters → words → sentences → meaning. Humans can't reverse this order while reading.

Something similar happens in the Transformer, and it is emergent behavior, not something explicitly hard-coded. It happens because of two things (see the minimal code sketch below):

• Stacked composition: the model builds ideas step by step, layer on top of layer.
• Residual learning: each layer adds a small improvement without forgetting what was already learned.

Understanding emerges layer by layer. That is what each Transformer block learns.

#llm #ai #deeplearning
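For readers who want to see the mechanics, here is a minimal PyTorch sketch of those two ingredients: stacked composition (a plain loop over N blocks) and residual learning (each block adds to its input instead of replacing it). The names TinyBlock, TinyTransformer, d_model and n_heads are illustrative assumptions, not the internals of any specific LLM.

```python
# Minimal sketch of stacked composition + residual learning.
# TinyBlock / TinyTransformer and their sizes are simplified illustrations,
# not the exact architecture of any production LLM.
import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Residual learning: each sub-layer ADDS its output to x instead of
        # replacing it, so a block refines what earlier blocks produced
        # without forgetting it.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x

class TinyTransformer(nn.Module):
    def __init__(self, n_blocks=12, d_model=64):
        super().__init__()
        # Stacked composition: block i reads the output of block i-1,
        # so representations are built up layer by layer.
        self.blocks = nn.ModuleList([TinyBlock(d_model) for _ in range(n_blocks)])

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

if __name__ == "__main__":
    model = TinyTransformer(n_blocks=12, d_model=64)
    tokens = torch.randn(1, 10, 64)   # (batch, seq_len, d_model) dummy embeddings
    print(model(tokens).shape)        # torch.Size([1, 10, 64])
```

A real LLM has the same shape, just with far more parameters. The point is only that each block's output is its input plus a learned update, which is what lets understanding accumulate layer by layer.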