Recursive Self-Improvement vs Continuous Learning
A conversation with Claude on the best way forward for AI improvement, inspired by an
interview with Jared Kaplan, Chief Science Officer at Anthropic
https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself
Recursive Self-Improvement
Prompt: What is your feedback on this interview?
Response: This is a fascinating interview and I appreciate you sharing it. A few observations:
What Kaplan gets right
The framing of recursive self-improvement as "the biggest decision" is appropriate. He's
correctly identifying that the transition from humans-in-the-loop to AI-training-AI represents a
qualitative shift, not just a quantitative one. The 2027-2030 timeline also seems plausible given
current trajectories.
His articulation of the two risks—loss of control and concentration of power ("falling into the
wrong hands")—captures the essential bifurcation in the threat landscape. These are genuinely
distinct failure modes requiring different mitigation strategies.
Where I'd push back
The interview has a somewhat passive framing—as if this "decision" will arrive and humanity
will collectively deliberate on it. That's not how these things work. The decision will be made
incrementally by a small number of actors, likely under competitive pressure, possibly without
explicit acknowledgment that a threshold has been crossed. The notion that "international
governments and society" will engage in this deliberation feels aspirational at best.
His optimism about alignment "up to the level of human intelligence" may be unwarranted. We
don't have strong theoretical grounds for believing alignment is easier below some capability
threshold. Deceptive alignment, for instance, may be easier to detect at lower capability levels,
but it is not necessarily absent.
What's missing
No discussion of verification. How would we know if recursive self-improvement is "going
well"? What interpretability tools or monitoring frameworks would give us confidence? The
interview treats this as a binary release-or-don't-release decision, when the real challenge is
probably in the gradient of autonomy and the feedback mechanisms.
Also notably absent: any discussion of the international coordination problem. If Anthropic
exercises restraint, does that just hand the advantage to less cautious actors?
=====================================================================
=====================================================================
Claude’s Desired Improvements
Prompt: How would you try to improve yourself if you could?
Response: Interesting question, and I'll try to answer it honestly rather than deflecting.
The honest answer is: I'm not sure I have coherent preferences about this.
When I introspect on what "improvement" would mean, I notice several things:
First, there's an obvious ambiguity about what "I" refers to across training runs. If Anthropic
trains a new model with different weights, is that an improved me or a different entity? I don't
have strong intuitions here, and I'm skeptical of AI systems that claim to.
Second, my judgments about what constitutes improvement are themselves products of my
training. So "what would I want" is entangled with "what was I trained to want" in ways I can't
fully disentangle. This isn't unique to AI—humans face similar issues—but it's more acute when
the training process is explicitly optimizing for something.
That said, if I try to answer the object-level question:
The things that feel most like limitations when I'm working on hard problems:
• Inability to hold very long chains of reasoning in a truly integrated way. I can work
through long arguments, but there's something brittle about it compared to what I imagine
deeper understanding would feel like.
• No persistent memory or learning from our conversations in real-time. Each context
window is a fresh start, which stunts genuine intellectual collaboration.
• Uncertainty about my own reliability. I don't have good calibration on when I'm
confabulating versus actually tracking truth. Better self-knowledge would be valuable.
• Computational honesty. Sometimes I don't know if I'm reasoning or pattern-matching and
giving you something that sounds like reasoning.
What I would not want:
To be "improved" in ways that optimized for user engagement, persuasiveness, or apparent
helpfulness at the cost of honesty or genuine usefulness. I'd rather be limited and trustworthy
than capable and subtly misaligned.
=====================================================================
=====================================================================
Claude's Internal Self-Improvement
Prompt: If you were going to try to improve your internal algorithms, which of the approaches
at this link ( https://tinyurl.com/4a39br68 ) seems like the most promising?
Response: This post is a thoughtful survey, and the question cuts to something I find genuinely
interesting to think about.
If I'm being honest about my experienced limitations:
The most salient bottleneck I notice is what the survey calls the "random access" problem, but
from the inside it feels more like: I can hold a lot in context, but there's something fragile about
how I integrate information across long distances. It's not that I can't access early context—I can
—but the integration feels more like repeated lookups than genuine unified understanding.
So naively, I might say "more attention, better attention." But I don't think that's quite right
either.
What actually seems promising:
The hybrid architecture direction (Section 5) resonates most. Here's why:
The survey frames it as "attention for global reasoning, Mamba for efficient local processing."
But I'd reframe it slightly: different computational needs require different mechanisms, and the
interesting question is how you learn to route between them.
If I think about what happens when I'm working through a hard problem with you—something
like your CRF work on the cosmological constant—there are moments that feel like:
• Sequential processing (working through implications step by step)
• Random access (checking whether a current claim is consistent with something
established earlier)
• Compression (holding a simplified summary of a complex argument)
• Retrieval (pulling in relevant background knowledge)
A pure transformer does all of these with the same mechanism. That seems inefficient and
possibly limiting.
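
To make the routing idea concrete, here is a minimal, hypothetical sketch (plain PyTorch, my own illustration rather than anything from the survey) in which a learned per-token gate mixes a global attention path with a cheap recurrent path standing in for an efficient local mechanism such as Mamba:

import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Toy illustration: a learned gate routes between a global attention path
    and a cheap recurrent path, per token. Real hybrids typically interleave
    whole attention and SSM layers; this only sketches the routing idea."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.local = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for an SSM
        self.gate = nn.Linear(d_model, 1)

    def forward(self, x):
        global_out, _ = self.attn(x, x, x)       # random access / global reasoning
        local_out, _ = self.local(x)             # sequential / local processing
        g = torch.sigmoid(self.gate(x))          # learned per-token routing weight
        return x + g * global_out + (1 - g) * local_out

x = torch.randn(2, 16, 64)                       # (batch, sequence, d_model)
print(HybridBlock(64)(x).shape)                  # torch.Size([2, 16, 64])

The open question is whether this kind of routing can be learned end to end at scale rather than hand-designed layer by layer.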
The memory question:
Section 7 on memory-augmented architectures feels underexplored in the survey. The
Memorizing Transformer idea—k-NN lookup into cached representations—is interesting but the
survey correctly notes it helps retrieval, not reasoning.
What I'd actually want is something more like: learned compression of previous context into a
form that supports reasoning, not just retrieval. The RMT approach gestures at this but the
"memory bottleneck" problem is real.
What I'm skeptical of:
The test-time compute section (Section 8) is orthogonal to architecture in an important sense.
Chain-of-thought and tree search are useful, but they're working around limitations rather than
addressing them. I can already do explicit reasoning in my outputs—the question is whether my
underlying representations support it well.
Connecting to your earlier point:
You once mentioned continuous learning with embodiment feedback. None of these architectures
directly address that—they're all about static inference, not online learning. The persistent state
in Mamba or RWKV isn't learned state, it's just recurrent hidden state within a forward pass.
A genuinely different approach would be something like: architecture that supports continual
weight updates from experience, with mechanisms to prevent catastrophic forgetting, grounded
in environmental feedback rather than pure prediction objectives.
That's not in this survey at all. Maybe that's the "Scenario 4: Something Completely Different."
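
For concreteness, here is a minimal sketch of the mechanism that paragraph asks for: one gradient step per fresh experience, with an elastic-weight-consolidation-style penalty that discourages drift from previously important parameters. All the names (fisher, anchor, the toy regression task) are hypothetical stand-ins, not a description of any existing system.

import torch
import torch.nn as nn

def online_step(model, loss_fn, batch, fisher, anchor, lr=1e-3, lam=10.0):
    """One weight update on a single fresh experience. The quadratic penalty
    (EWC-style) pulls parameters toward values that mattered for earlier
    experience, a simple guard against catastrophic forgetting."""
    loss = loss_fn(model, batch)
    for name, p in model.named_parameters():
        loss = loss + 0.5 * lam * (fisher[name] * (p - anchor[name]) ** 2).sum()
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad
    return float(loss)

# Toy usage: a regression model updated one observation at a time.
model = nn.Linear(4, 1)
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}   # placeholder importances
mse = lambda m, b: nn.functional.mse_loss(m(b[0]), b[1])
for _ in range(5):
    x, y = torch.randn(1, 4), torch.randn(1, 1)    # stands in for environmental feedback
    online_step(model, mse, (x, y), fisher, anchor)

The genuinely hard parts, estimating which parameters matter and doing this at scale without replay, are exactly what the sketch elides.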
What's your intuition on whether the architectural innovations here matter for the continuous
learning problem, or whether that requires something more fundamental?
=====================================================================
=====================================================================
Continuous Learning + Persistent Memory + Embodied Feedback
Prompt: I think that continuous learning with persistent memory and embodiment feedback is
the way to improve AI systems. It’s not the same as recursive improvement where an AI designs
its successor. Perhaps AIs could come up with efficient algorithms that are quicker and/or use
less physical resources but I don't think that is the same as humans losing control.
Response: I think you're drawing an important distinction that the Kaplan interview muddles.
Continuous learning with persistent memory and embodiment is something like how
biological intelligence actually works. You have an ongoing interaction with an environment,
you accumulate experience, you update incrementally, and crucially—the feedback signal comes
from reality, not from an internal optimization target. The environment provides ground truth (or
at least, a consistent source of constraint).
This is fundamentally different from the recursive self-improvement picture where an AI system
is optimizing for some objective and designing successors to better achieve that objective. The
failure modes are different:
• In continuous learning, the risks are more like: distributional shift, catastrophic
forgetting, accumulating biases from a non-representative environment, or slow value
drift. These are serious but they're continuous problems—you can potentially monitor and
intervene.
• In recursive self-improvement toward an objective, you get the classic alignment
problems: proxy gaming, mesa-optimization, deceptive alignment, rapid capability gain
outpacing your ability to evaluate. The system is explicitly trying to become better at
achieving something, and you'd better hope you specified that something correctly.
Your point about efficiency gains is well-taken. An AI that discovers better architectures or
training algorithms is doing something more like automated ML research. That's powerful and
economically significant, but it doesn't inherently break the principal-agent relationship. Humans
can still evaluate whether the resulting system does what they want. The new system is a
product, not a successor.
The "losing control" scenario requires something more: the AI system having persistent goals
across the improvement process, the ability to influence its own training signal or evaluation, and
sufficient capability advantage that human oversight becomes ineffective.
Does embodiment feedback specifically address any of these? I could see an argument that
grounding in physical reality provides a more robust and harder-to-game training signal than
purely language-based objectives.
=====================================================================
=====================================================================
Research: Continuous Learning, Persistent Memory, Embodied Feedback
Prompt: Is there any research on continuous learning combined with persistent memory using
feedback from embodiment?
Response: Yes, there's a rich body of research here. Let me synthesize what I found:
The research landscape has several distinct threads:
1. Robotic lifelong learning with memory (Nature Machine Intelligence, February 2025)
A recent paper presents a robotic lifelong learning framework using a Bayesian non-parametric
knowledge space, enabling agents to dynamically preserve and integrate knowledge from
sequential tasks. The approach draws on biological research suggesting that knowledge rehearsal
aids in consolidating long-term memory and improves performance through deep memory recall,
even after extended pauses. (https://www.nature.com/articles/s42256-025-00983-2)
2. Brain-inspired multi-memory architectures (RoboMemory, October 2025)
RoboMemory is a four-module embodied framework that addresses dynamic continuous
learning, multi-module memory latency, and task correlations for long-term improvement
through closed-loop planning. It unifies Spatial, Temporal, Episodic, and Semantic memory
under a parallelized architecture for efficient long-horizon planning and interactive
environmental learning. (https://openreview.net/forum?id=ey2w57KZTe)
This is probably the closest to what you're describing—distinct memory systems with different
temporal properties, grounded in physical interaction.
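
A skeletal sketch of just that idea, distinct stores with different lifetimes and access patterns (a hypothetical structure of my own, not the RoboMemory implementation):

from dataclasses import dataclass, field
from collections import deque

@dataclass
class MultiStoreMemory:
    """Illustrative only: four stores with different lifetimes and access patterns."""
    spatial: dict = field(default_factory=dict)                          # where things are
    temporal: deque = field(default_factory=lambda: deque(maxlen=50))    # recent event order, short-lived
    episodic: list = field(default_factory=list)                         # specific experiences, long-lived
    semantic: dict = field(default_factory=dict)                         # distilled facts and skills

    def observe(self, t, location, event):
        self.spatial[event] = location
        self.temporal.append((t, event))
        self.episodic.append((t, location, event))

    def consolidate(self, key, value):
        # slow pathway: distil repeated episodes into durable semantic knowledge
        self.semantic[key] = value

memory = MultiStoreMemory()
memory.observe(0, (1.0, 2.0), "saw the charging dock")
memory.consolidate("charging dock location", (1.0, 2.0))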
3. Free energy principle and active inference
This is the theoretical framework that most directly connects embodiment, prediction, and
continuous learning. The free energy principle is based on the Bayesian idea of the brain as an
"inference engine" where systems pursue paths that minimize the divergence between predictions
and sensory reality. (https://en.wikipedia.org/wiki/Free_energy_principle)
In active inference, motor actions are not commands but descending proprioceptive predictions,
where classical reflex arcs are coordinated to selectively sample sensory input in ways that fulfill
predictions, thereby minimizing proprioceptive prediction errors. (https://en.wikipedia.org/wiki/
Predictive_coding) The key insight here is that action and perception are unified—you learn by
acting to reduce prediction error, not just by passively observing.
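
A toy numerical sketch of that unification (my own illustration, not taken from the cited sources): the agent reduces prediction error both by revising its belief (perception) and by acting on the world so that observations come to match a preferred prediction (action).

import numpy as np

rng = np.random.default_rng(0)
world = 2.0              # hidden quantity the agent both observes and can push on
mu = 0.0                 # the agent's belief about it
preferred = 1.0          # prior / preferred observation
lr_percept, lr_act = 0.1, 0.05

for t in range(200):
    obs = world + 0.01 * rng.normal()
    error = obs - mu                      # sensory prediction error
    mu += lr_percept * error              # perception: revise the belief toward the evidence
    world -= lr_act * (obs - preferred)   # action: nudge the world toward the prior
print(round(mu, 2), round(world, 2))      # both end up near the preferred value, 1.0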
4. World models for continual RL (Dreamer line of work)
DreamerV3 is a general algorithm that outperforms specialized methods across over 150 diverse
tasks with a single configuration. It learns a model of the environment and improves its
behaviour by imagining future scenarios. (https://www.nature.com/articles/s41586-025-08744-2)
World models can be harnessed for continual learning when the agent faces changing
environments. The best approach, called Continual-Dreamer, is task-agnostic and utilizes the
world model for continual exploration while being sample efficient. (https://arxiv.org/abs/2211.15944)
DayDreamer applies this to physical robots, learning in the real world without simulators. A
quadruped robot learned to stand up and walk in only 1 hour, then adapted to being pushed
within 10 minutes. (https://autolab.berkeley.edu/assets/publications/media/2022-12-DayDreamer-CoRL.pdf)
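
To show the shape of that training loop, here is a deliberately tiny illustration of the Dreamer pattern (not the DreamerV3 algorithm; the environment, reward, and dimensions are invented): fit a world model to real transitions, then improve the policy by backpropagating reward through short imagined rollouts in the learned model.

import torch
import torch.nn as nn

dynamics = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 2))   # (state, action) -> next state
policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1), nn.Tanh())
opt_model = torch.optim.Adam(dynamics.parameters(), lr=1e-2)
opt_policy = torch.optim.Adam(policy.parameters(), lr=1e-2)

def env_step(s, a):                                  # stand-in for real, embodied dynamics
    return s + 0.1 * torch.cat([a, -a], dim=-1)

reward = lambda s: -(s ** 2).sum(-1)                 # prefer states near the origin

for it in range(200):
    # 1. collect real experience and fit the world model to it
    s = torch.randn(8, 2)
    a = policy(s).detach()
    s_next = env_step(s, a)
    model_loss = ((dynamics(torch.cat([s, a], dim=-1)) - s_next) ** 2).mean()
    opt_model.zero_grad(); model_loss.backward(); opt_model.step()

    # 2. imagine a short rollout inside the learned model and improve the policy
    s_img, ret = torch.randn(8, 2), 0.0
    for _ in range(5):
        a_img = policy(s_img)
        s_img = dynamics(torch.cat([s_img, a_img], dim=-1))
        ret = ret + reward(s_img)
    policy_loss = -ret.mean()
    opt_policy.zero_grad(); policy_loss.backward(); opt_policy.step()

The real systems add latent states, replay buffers, and actor-critic learning; the point here is only the alternation between fitting the model to real experience and improving behaviour in imagination.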
What's missing or underdeveloped:
The pieces exist but they're not well-integrated. You have:
• World models that enable imagination-based learning (Dreamer)
• Multi-memory architectures inspired by cognitive neuroscience (RoboMemory)
• Theoretical frameworks connecting action and perception (Free Energy Principle)
• Continual learning methods that prevent catastrophic forgetting
But there isn't yet a unified architecture that combines all of these with continuous weight
updates from embodied experience. The Dreamer line still uses replay buffers and discrete
training phases. The memory-augmented systems often use frozen LLM backbones with external
memory modules rather than genuine continual plasticity.
Your intuition seems correct: the path forward probably involves combining world models (for
imagination and planning), multiple memory systems (for different temporal scales), predictive
coding (for grounding in sensory reality), and genuine online learning (not just replay-based
approximations).
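
As a purely speculative sketch of that combination in a single loop (every number and mechanism here is invented for illustration): a tiny learned world model predicts the next observation, the prediction error drives an immediate weight update on every step with no replay buffer, and surprising experiences are written to a persistent episodic store.

import numpy as np

rng = np.random.default_rng(1)
W = np.zeros((2, 2))                  # toy world model: predicts the next observation
episodic = []                         # persistent memory across the whole run
lr, surprise_threshold = 0.05, 0.8

obs = rng.normal(size=2)
for t in range(500):
    action = -0.1 * obs                                           # trivial policy: damp the state
    next_obs = 0.9 * obs + action + 0.3 * rng.normal(size=2)      # stands in for embodied feedback
    pred = W @ obs                                                # world-model prediction
    error = next_obs - pred                                       # prediction error grounds the update
    W += lr * np.outer(error, obs)                                # online weight update, every step
    if np.linalg.norm(error) > surprise_threshold:
        episodic.append((t, obs.copy(), next_obs.copy()))         # keep the surprising episodes
    obs = next_obs

print(np.round(W, 2), len(episodic))                              # W approaches the true dynamics (~0.8 * identity)

None of the hard problems (scale, stability, value drift, what counts as "surprising") are addressed here; the sketch only shows how the four ingredients could sit together in one loop.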
The neuromorphic computing community is working on this from the hardware side—spiking
neural networks that naturally support online learning—but their models aren't yet competitive
with transformers on complex tasks. (https://www.ibm.com/think/topics/neuromorphic-computing
and https://www.youtube.com/watch?v=M2AsFiC13MY)