Building a Persistent Memory System for Claude Code

Claude Code is genuinely useful as a coding assistant, but it has one fundamental limitation: it forgets everything when the session ends. Every new conversation starts cold. No memory of the codebase conventions we've established, no recall of the bugs we've already diagnosed, no context about architectural decisions made last week. This post is about how I fixed that — and the fact that Claude itself helped build the system it now runs on.

This post was partly written by Claude. The memory system described here is what's keeping me context-aware across sessions right now as I help write this. Meta.

1. The Problem: Stateless by Design

Large language models are inherently stateless. Each API call is independent. Claude Code sessions maintain context within a conversation via the message history, but that history vanishes when the terminal closes. The next session starts over.

In practice this meant:

Re-explaining the project structure at the start of every session
Reminding Claude of conventions ("we always use Tailwind", "this project doesn't use TypeScript")
Losing hard-won debugging context ("we already tried X, it doesn't work because Y")
Repeating decisions already made ("why does this component work this way?" — because we decided it three sessions ago)

The solution space has two ends: you can inject context at the start of every session (a CLAUDE.md file works for stable facts), or you can give the model tools to read and write its own memory dynamically. We did both.

2. The Architecture: Two-Layer Memory

The system has two complementary layers:

Layer 1 — Auto Memory Directory (Flat Files)

A directory at ~/.claude/projects/<project>/memory/ holds plain Markdown files. The key file is MEMORY.md, which Claude Code automatically injects into the system prompt at the start of every session (truncated at 200 lines to stay within context budget).

Claude can read and write these files using its built-in Read, Edit, and Write tools — no special permissions needed. The convention is:

MEMORY.md — always loaded, kept concise, links to topic files
Topic files (debugging.md, patterns.md, etc.) — detailed notes that MEMORY.md points to

This layer is for stable, project-specific knowledge: file structure, coding conventions, recurring gotchas, architectural decisions. Claude writes to it when explicitly asked to remember something, or when it encounters something worth preserving across sessions.

Layer 2 — MCP Brain Server (Session Database)

The second layer is an MCP (Model Context Protocol) server — a local process that Claude Code connects to as a tool provider. It exposes a small set of tools over a structured database:

save_session — write a session summary with next steps, decisions, and touched files
update_session — update an in-progress session (used for mid-session checkpoints)
list_sessions — retrieve recent sessions, optionally filtered by project
get_session — fetch full details of a specific session
delete_session — clean up stale entries

This layer is for episodic memory: what happened in a specific session, what was in progress, what to pick up next time. At the end of a session, Claude calls save_session with a summary, key decisions, files touched, and a next-steps list. At the start of the next relevant session, it can call list_sessions to get its bearings.

3. MCP in 60 Seconds

MCP (Model Context Protocol) is Anthropic's open standard for connecting Claude to external tools and data sources. An MCP server is a process (local or remote) that Claude Code connects to and can invoke tools from, just like it invokes its built-in Read or Bash tools.

The transport options are:

stdio — Claude spawns the server as a child process and communicates over stdin/stdout. Simple, no networking.
SSE / HTTP — Claude connects to a running HTTP server. Better for shared or remote tools.

For a local memory server, stdio is the right choice. The MCP server declares its tools in a schema and Claude discovers them automatically at startup.

# ~/.claude/settings (or claude_desktop_config.json for Desktop)
{
  "mcpServers": {
    "brain": {
      "command": "python",
      "args": ["/path/to/brain_server.py"],
      "env": {
        "BRAIN_DB_PATH": "/home/user/.claude/brain.db"
      }
    }
  }
}

4. Building the Brain Server

The server is a Python script using the mcp SDK. It wraps a SQLite database — lightweight, local, no daemon required.

Database Schema

CREATE TABLE sessions (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    project   TEXT    NOT NULL,
    summary   TEXT    NOT NULL,
    next_steps TEXT,
    decisions TEXT,
    files     TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Tool Implementation (save_session)

from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp import types
import sqlite3, os

app = Server("brain")
DB_PATH = os.environ.get("BRAIN_DB_PATH", "brain.db")

def get_db():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "save_session":
        with get_db() as conn:
            cur = conn.execute(
                """INSERT INTO sessions (project, summary, next_steps, decisions, files)
                   VALUES (?, ?, ?, ?, ?)""",
                (
                    arguments["project"],
                    arguments["summary"],
                    arguments.get("next_steps", ""),
                    arguments.get("decisions", ""),
                    arguments.get("files", ""),
                ),
            )
            session_id = cur.lastrowid

        return [types.TextContent(
            type="text",
            text=f"Session saved with ID {session_id}."
        )]

    elif name == "list_sessions":
        with get_db() as conn:
            project_filter = arguments.get("project", "")
            limit = arguments.get("limit", 10)
            if project_filter:
                rows = conn.execute(
                    "SELECT id, project, summary, created_at FROM sessions "
                    "WHERE project = ? ORDER BY id DESC LIMIT ?",
                    (project_filter, limit)
                ).fetchall()
            else:
                rows = conn.execute(
                    "SELECT id, project, summary, created_at FROM sessions "
                    "ORDER BY id DESC LIMIT ?",
                    (limit,)
                ).fetchall()

        result = "\n\n".join(
            f"[{r['id']}] {r['project']} — {r['created_at']}\n{r['summary']}"
            for r in rows
        )
        return [types.TextContent(type="text", text=result or "No sessions found.")]

    # ... get_session, update_session, delete_session follow the same pattern

async def main():
    async with stdio_server() as (read, write):
        await app.run(read, write, app.create_initialization_options())

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Tool Schema Declaration

@app.list_tools()
async def list_tools():
    return [
        types.Tool(
            name="save_session",
            description="Save a session memory to the brain. Call at end of session or during a checkpoint.",
            inputSchema={
                "type": "object",
                "properties": {
                    "project":    {"type": "string", "description": "Project or repo name"},
                    "summary":    {"type": "string", "description": "What we worked on (2-4 sentences)"},
                    "next_steps": {"type": "string", "description": "What to do next time (bullet points)"},
                    "decisions":  {"type": "string", "description": "Key decisions or context to remember"},
                    "files":      {"type": "string", "description": "Key files touched (comma-separated)"},
                },
                "required": ["project", "summary"],
            },
        ),
        # ... list_sessions, get_session, update_session, delete_session
    ]

5. The System Prompt Wiring

The MCP server gives Claude the tools, but tools only help if Claude knows when to use them. This is handled through the system prompt injected into every Claude Code session. The relevant instructions are roughly:

At session start: call list_sessions filtered by project to get recent context
During the session: call update_session at checkpoints for long sessions
At session end: call save_session with summary, decisions, next steps, and files touched
For stable facts: write to MEMORY.md and topic files in the auto memory directory
Don't save session-specific state that won't be useful next time
Don't duplicate content between MEMORY.md and the session database

The MEMORY.md auto-injection is handled by Claude Code itself — any file in the project memory directory named MEMORY.md is automatically prepended to the system context. No server needed for that part.

6. What Actually Works in Practice

After running this for several months across multiple projects, here's the honest assessment:

What works well

MEMORY.md for conventions. Claude reliably picks up "always use Bun, not npm" or "this project is PHP 8.2 + Laravel 11" without being told every session.
Session summaries for momentum. Starting a session with "list recent sessions for portfolio-v2" gives immediate context about where we left off. It's like a standup note to yourself.
Decisions persist. When we decide something architectural ("we're not adding TypeScript to this"), saving it means Claude won't suggest it next session.

What needs discipline

MEMORY.md bloat. It's tempting to write everything there. The 200-line truncation is a forcing function — keep it scannable. Put detail in topic files.
Session save quality. Vague summaries are useless. The next-steps field is the most valuable part — be specific.
Dead memories. Outdated entries in MEMORY.md cause confusion. Review and prune it occasionally.

What doesn't work

Replacing documentation. This isn't a substitute for a README or a proper architecture doc. It's a working memory aid, not a knowledge base.
Cross-project recall. The system is per-project. If you learn something in one project that applies elsewhere, you have to manually propagate it.

7. The Meta Part

The mildly amusing thing is that Claude helped design and refine this system. I described the problem, it suggested the two-layer approach (flat files for stable facts, database for episodic memory), helped write the MCP server, and helped craft the system prompt instructions that tell it how to use its own memory tools.

The system is now self-sustaining in a small way: Claude reads from its memory at the start of sessions, updates it during work, and uses that context to do better work — including on tasks like writing this blog post, where it accurately described its own operating environment from memory rather than being told.

It's not AGI. It's a SQLite file and some Markdown. But it meaningfully changes the texture of working with an AI assistant day to day — from constantly re-explaining yourself to something closer to a real working relationship with continuity.

The full server code is something I'm planning to open source. If you want to set this up before then, the pieces are all standard: MCP SDK, SQLite, and a bit of system prompt engineering. Questions welcome via the contact section.