letta (formerly MemGPT): the context surgeon for long-lived AI agents
A plain-English guide to Letta — the agent framework (originally MemGPT) that manages an LLM's context window so your agent remembers across weeks of conversation. Apache-2.0 licensed. 15-minute install.
Short version: Letta (the company and framework that commercialized the MemGPT research from UC Berkeley) is the context surgeon for long-lived AI agents. Instead of treating memory as a database, it manages the LLM's context window directly — pulling relevant memories into view, pushing stale ones to archival, maintaining coherence across weeks of conversation with one agent. Apache-2.0 licensed. 15-minute install. By Letta.
What is Letta?
Letta came out of the MemGPT paper from UC Berkeley in 2023. The central insight: the LLM's context window is the bottleneck, not retrieval. If you can manage what's in the agent's context — always-available core memory, searchable archival memory, recent recall memory — you can simulate a much larger effective memory than the window allows.
Where mem0 scales breadth (many users, many agents), Letta scales depth (one agent, many turns, deep history). Its core abstraction is the agent — a long-lived object that manages its own memory decisions. You don't write retrieval logic by hand; the agent does.
Who this is for
- Teams building 1–10 long-lived agents (research assistant, coding pair, negotiation bot, therapy companion) where the same agent talks to the same user over weeks.
- Developers who want the agent itself to manage memory — core/archival/recall — rather than coding retrieval logic from scratch.
- Python-first teams comfortable with the agent-as-a-unit mental model.
- Anyone who read the MemGPT paper and wanted to build on it.
Skip this if
You're building a multi-user product where thousands of users each need scoped memory — that's mem0's job. Letta's agent model is heavier per-instance; don't reach for it when you need horizontal scale.
What problem it solves
The LLM's context window is finite. Even with 200k tokens, you can't keep weeks of conversation history fully in view. Classic solutions (RAG over vectors) pull relevant chunks on each turn but lose coherence — the agent doesn't know why a memory matters, only that it scored well on similarity.
Letta treats context as a managed resource. The agent decides what to pull forward. Core memory (your name, your goals) stays in-context forever. Archival memory (your full conversation history) gets pulled when relevant. Recall memory (recent turns) stays warm. The result feels like an agent that actually remembers you — because its context is curated, not retrieved.
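The three-tier split described above can be sketched in a few lines of plain Python. This is a toy illustration, not Letta's implementation: every name here is hypothetical, real archival search is vector-based rather than keyword overlap, and in Letta the agent itself decides when to search and what to promote via tool calls.

```python
# Toy sketch of core / archival / recall memory (illustrative names only).
CORE = {"name": "Ada", "goal": "ship the Q3 prototype"}  # always in context
RECALL_WINDOW = 3                                        # recent turns kept warm
archival: list[str] = []                                 # full history, searchable
recall: list[str] = []                                   # recent turns

def remember(turn: str) -> None:
    """Every turn lands in archival; recall keeps only the newest few."""
    archival.append(turn)
    recall.append(turn)
    del recall[:-RECALL_WINDOW]

def search_archival(query: str) -> list[str]:
    """Naive word-overlap search standing in for vector similarity."""
    words = set(query.lower().split())
    return [t for t in archival if words & set(t.lower().split())]

def build_context(user_msg: str) -> str:
    """Curate the prompt: core always, recall always, archival on demand."""
    pulled = search_archival(user_msg)
    parts = [f"[core] {k}: {v}" for k, v in CORE.items()]
    parts += [f"[archival] {t}" for t in pulled if t not in recall]
    parts += [f"[recall] {t}" for t in recall]
    parts.append(f"[user] {user_msg}")
    return "\n".join(parts)

for turn in ["We picked Postgres.", "Budget is $40k.",
             "Kickoff is Monday.", "Logo should be blue."]:
    remember(turn)

print(build_context("What database did we pick?"))
```

The curated prompt contains the core facts, the one matched archival turn, and the warm recall window; everything else in history stays out of context until a question makes it relevant.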
How to install it (plain English)
- Install. pip install letta.
- Start the server. Letta runs as a service; spin it up locally with letta server.
- Create an agent. With the Python SDK, it's one call to create an agent with a name and system prompt.
- Talk to it. The SDK exposes send_message(); the agent manages its own memory across turns.
- Watch it remember. After 20+ turns, start a new session. The agent recalls prior context coherently.
Full walkthrough: /memory/tools/letta.
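As a mental model for the create / talk / remember steps above, here is a toy agent in plain Python. This is not the Letta SDK (the class and method names are invented; see the walkthrough for real calls) — it just shows why a new session picks up where the last one left off: the agent object, not your application code, owns the persisted state.

```python
# Toy stand-in for a persistent agent (illustrative only, not the Letta API).
import json
import os
import tempfile

class ToyAgent:
    """Minimal persistent agent: every exchange is saved to disk."""
    def __init__(self, name: str, system: str, store: str):
        self.name, self.system, self.store = name, system, store
        self.history: list[str] = []
        if os.path.exists(store):            # resume a prior session
            with open(store) as f:
                self.history = json.load(f)

    def send_message(self, text: str) -> str:
        reply = f"(I remember {len(self.history)} earlier messages) ack: {text}"
        self.history.append(text)
        with open(self.store, "w") as f:     # persist across sessions
            json.dump(self.history, f)
        return reply

store = os.path.join(tempfile.mkdtemp(), "agent.json")
a = ToyAgent("helper", "You are concise.", store)   # create an agent
a.send_message("We ship Friday.")                   # talk to it
a.send_message("Invite the design team.")

b = ToyAgent("helper", "You are concise.", store)   # new session, same agent
print(b.send_message("What did I tell you?"))
# -> (I remember 2 earlier messages) ack: What did I tell you?
```

Letta's real agents do far more (self-editing memory, archival search, model calls), but the ownership pattern is the same: you address the agent, and it carries its own history.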
What you can do with it (for a non-technical founder)
If your team is building a single-agent product:
- An assistant that genuinely remembers you — after a month of use, it knows you the way a good chief of staff would.
- Coherent multi-turn reasoning — the 200th turn feels as sharp as the first.
- Self-managing memory — you don't write "should I save this" logic; the agent does.
- Model-agnostic — Letta supports Claude, GPT-4, and local models out of the box.
- Managed hosting available — letta.com offers a hosted tier if you don't want to run Docker.
What CLO adds on top
Letta gives your product's agent deep memory across turns. Cognition CLO gives your internal team a retention layer on top of organizational knowledge. Different scope. Run both if your product has Letta agents AND you want your own team to retain the institutional knowledge your agents surface.
FAQ
Is Letta the same as MemGPT?
Yes. MemGPT was the research (UC Berkeley, 2023). Letta is the company and framework built on that research — same architecture, plus production polish, managed hosting, and ongoing maintenance.
Can I use Letta with Claude?
Yes. Letta supports Claude, GPT-4, and local models (Ollama, vLLM). Model is configurable per agent.
How is "core memory" different from a system prompt?
Core memory is always in context, like a system prompt — but the agent can update it during conversation. Your name changes; it updates. Your goals shift; it updates. A static system prompt can't do that.
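The contrast can be made concrete with a toy sketch (hypothetical names, not Letta's API): a system prompt is a frozen string, while a core-memory block is a structure the model itself can rewrite mid-conversation through a tool call.

```python
# Toy contrast: static system prompt vs. agent-editable core memory.
SYSTEM_PROMPT = "You are a helpful assistant."   # fixed for the agent's life

core_memory = {"user_name": "Sam", "goal": "learn Rust"}

def core_memory_replace(key: str, value: str) -> None:
    """Stand-in for the self-editing tool the model can call."""
    core_memory[key] = value

def prompt() -> str:
    """Core memory is injected into every request, like a living system prompt."""
    facts = "; ".join(f"{k}={v}" for k, v in core_memory.items())
    return f"{SYSTEM_PROMPT}\nKnown facts: {facts}"

core_memory_replace("goal", "ship a Rust CLI")   # the user's goal shifted
print(prompt())
```

After the tool call, every subsequent request carries the updated goal; with a plain system prompt, that fact would have stayed stale.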
How does it scale?
Per-agent, well. Per-user-base, worse than mem0 — each Letta agent is a heavier object. If you want 10k users each with an agent, mem0's per-user model is the better fit.
Can I migrate from mem0 to Letta?
Possible but non-trivial — they have different primitives. Usually teams run both for different purposes, not migrate.
What's the community size?
Smaller than mem0 but active, concentrated around agent researchers and MemGPT enthusiasts. Good Discord presence.
How does it compare to mem0?
See /blog/letta-vs-mem0 for the full comparison. Short version: Letta for depth, mem0 for breadth.
Ready to install? Full walkthrough at /memory/tools/letta. Comparison: /blog/letta-vs-mem0. Credit to the Letta team and the original MemGPT authors — star the repo if context surgery solves your agent-coherence problem.