Aymeric Roucher’s Post

Aymeric Roucher

Building Agents, formerly at Hugging Face | Polytechnique - Cambridge

STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨

A lone scientist just proposed the Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.
➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro

Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger and had 1,000x as many authors 😂 (Alexia is alone on the paper)

What's this sorcery? In short: it's a very tiny Transformer, but it loops over itself at two different frequencies, updating two latent variables (i.e. two vectors): one is the proposed answer and the other is... the reasoning. Representing reasoning with a vector makes sense: it's much more efficient than building reasoning by generating loads of tokens. (See the sketch below.)

Alexia Jolicoeur-Martineau started from the paper Hierarchical Reasoning Model, published a few months ago, which had already shown breakthrough improvement on ARC-AGI for its small size (27M).

Hierarchical Reasoning Model had introduced one main feature:
🔎 Deep supervision
In their model, one part (here one layer) would run at high frequency, and another would run at lower frequency, only every n steps. They had used a recurrent architecture where these layers would repeat many times; but to make it work they had to make many approximations, including not fully backpropagating the loss through all layers.

Alexia studied what was useful and what wasn't, and cleaned up the architecture as follows:

Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself.

Why use 2 latent variables?
➡️ She provides a crystal-clear explanation: the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed optimal.

As with all great research, when reading this paper I felt like everything fell into place naturally: this new setup is a much more elegant way to process reasoning than generating huge chains of tokens, as all flagship models currently do.

One caveat: TRM does not generate text; it works on fixed-length outputs, like the grids of Sudoku or ARC. But there's no real blocker to adapting it to text, and I see a high probability this gets done over the next few weeks.

This might be the breakthrough that we've awaited for so long!
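To make the two-frequency idea concrete, here is a minimal PyTorch-style sketch of how such a loop could look. This is my own reading of the description above, not the paper's code: the module name, dimensions, step counts, and the use of a plain MLP core (instead of a tiny Transformer block) are all illustrative assumptions.

```python
# Minimal sketch of the two-frequency recursive loop described above
# (an interpretation of the post, not the authors' actual code).
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    def __init__(self, dim=128, n_inner=6, n_outer=3):
        super().__init__()
        self.n_inner = n_inner  # high-frequency updates of the reasoning latent z
        self.n_outer = n_outer  # low-frequency updates of the answer latent y
        # one tiny shared network, reused recursively at every step
        # (the real model would use a small Transformer block here)
        self.core = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, x):
        # x: embedded input, shape (batch, dim)
        y = torch.zeros_like(x)  # proposed-answer latent (updated rarely)
        z = torch.zeros_like(x)  # reasoning latent (updated often)
        for _ in range(self.n_outer):
            for _ in range(self.n_inner):
                # high-frequency loop: refine the reasoning given input + current answer
                z = z + self.core(torch.cat([x, y, z], dim=-1))
            # low-frequency step: refine the proposed answer from the reasoning
            y = y + self.core(torch.cat([x, y, z], dim=-1))
        return y  # would be decoded into the fixed-length output grid by a separate head

# quick smoke test
model = TinyRecursiveSketch()
out = model(torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 128])
```

The point of the sketch is only the control flow: a single tiny network applied over and over, with one latent refreshed every inner step and the other refreshed once per outer step.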

  • diagram
Dr. Dzmitry Ashkinadze

NLP Engineer @MDPI | PhD in Bioinformatics @ETH Zürich

1w

Interesting stuff. I'm wondering what the two cycles from the brain are that the model was inspired by.

Charles Lorin

Sales | Marketing | Partnerships -> AI Startups

1w

Louis-Marie Lorin on "better thinking per joule". Wondering how the dual rhythm impacts training stability and generalisation.

Ankush Singal

Senior Data Scientist | Prompt Engineering, Large Language Models | Freelancer, Patent Law | TS/SCI + FBI clearance

1w

Aymeric Roucher How is it different from MoA (Mixture of Agents)?

Seems too good to be true; I'll wait to see the results on the official website. Interesting idea though!

Andrés Cotton

Chevening Scholar | LLMs and NLP | MSc at University of Edinburgh

1w

Would you mind sharing the link to the paper? I'd like to check it out.

Anel Music 🍉

Senior Machine Learning Engineer @ Accenture Applied Intelligence

1w

I love research being done in the small-model regime, but comparing a model trained for specific tasks like ARC and Sudoku to a general-purpose model is, at the very least, somewhat unfair.

Alex S.

Open-minded researcher | Consultant | Author | Impact-driven | Long-term focus | Science, Tech, Biz | IT/AI

1w

HRM, TRM and other similar architectures are good for tests like ARC, not more. But if you combine XRM + LLM...
