Building Khadam: How I Gave Claude a Permanent Home on My Server
I'm a solo founder. I run five active projects, manage my own server infrastructure, and do all the engineering myself. I've been using AI coding assistants for a while but kept running into the same wall: every session starts cold. The AI doesn't know my codebase, doesn't remember what we decided last week, and has no idea that Zal filed three bug reports this morning.
So I built Khadam — an AI assistant that lives on my server, knows my projects deeply, watches over my infrastructure, and talks to me via Telegram like a colleague would. This is the story of how I built it, what I changed along the way, and what I learned.
This post was written with Khadam. It searched its own session history, pulled architecture notes from memory, and helped draft the text. The system describing itself using its own memory tools is the whole point.
1. The Starting Point: February 25, 2026
It started with a simple idea: install Claude Code on a server and give it persistent memory. The first session on February 25 was just the scaffolding — Claude Code installed, a Brain MCP server built on SQLite, a brain.sh CLI with an fzf session picker, and a tmux auto-save hook.
The Brain MCP server exposed five tools to Claude: save_session, update_session, list_sessions, get_session, delete_session. At the end of every coding session, Claude would summarize what happened — decisions made, files touched, next steps — and store it in SQLite. At the start of the next session, it could read that back and have context immediately.
This worked. But it was still just a better Claude Code. It didn't act on anything autonomously. I had to open a terminal to use it. That needed to change.
2. Making It Interactive: March 23, 2026
The first real upgrade was turning Khadam into a Telegram bot. I wrote a webhook listener in Python — a small FastAPI service running on port 5679 — that received Telegram messages and passed them to Claude Code via CLI.
The first version used one-shot claude -p calls. Every message was a new conversation with no continuity. That was useless for anything beyond simple queries.
The fix was Claude Code's --resume SESSION_ID flag. Instead of starting fresh, each message continues the same Claude conversation. The session ID is persisted in a file (~/.khadam-session). When a 10-minute gap between messages is detected, the session closes, auto-saves a summary to Brain, and the next message starts fresh.
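The resume-or-start-fresh decision is small enough to sketch. This is a minimal version, assuming the session file's modification time doubles as the "last message" timestamp and that the session ID comes from Claude Code's own output. The helper names `build_claude_argv` and `remember_session` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

GAP_SECONDS = 10 * 60  # a 10-minute silence ends the conversation


def build_claude_argv(prompt: str, session_file: Path, now: float) -> tuple[list[str], bool]:
    """Return (argv, gap_expired).

    Assumption: the session file's mtime doubles as the last-activity time.
    gap_expired=True tells the caller to summarize/save the old session first.
    """
    argv = ["claude", "-p", prompt]
    if not session_file.exists():
        return argv, False
    session_id = session_file.read_text().strip()
    if not session_id:
        return argv, False
    idle = now - session_file.stat().st_mtime
    if idle < GAP_SECONDS:
        return argv + ["--resume", session_id], False  # continue the same conversation
    return argv, True  # gap detected: start fresh


def remember_session(session_file: Path, session_id: str) -> None:
    """Persist the ID; the write also refreshes mtime, marking new activity."""
    session_file.write_text(session_id)
```

When `gap_expired` comes back true, the caller would run the summary-and-save step, clear the file, and then spawn the fresh command.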
I also added 14 slash commands. Three are instant (no Claude involved): /help, /reset, /sessions. Eleven are smart — they get translated to focused prompts and passed to Claude within the active conversation, so you can follow up on the results:
/status → check server health, containers, disk
/logs → tail recent logs from a specified service
/fix → assess and fix a GitHub issue end-to-end
/commit → stage, summarize, and commit recent changes
/pr → create a pull request with a drafted description
/review → code review the current working changes
/deploy → check deploy status and run deploy script
/search → web search via SearXNG
/docs → look up library documentation via Context7
/issues → list open GitHub issues
/simplify → refactor recent code for clarity and correctness
Each smart command ends with a formatting instruction: "Plain text only, no markdown." That's because Telegram doesn't render markdown headers or dash lists; it has its own syntax.
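A sketch of how instant and smart commands might be split, with the plain-text instruction appended to every smart prompt. The prompt wording below is hypothetical; only the command names and the suffix come from the list above.

```python
from __future__ import annotations

PLAIN_TEXT = "\n\nPlain text only, no markdown."
INSTANT = {"/help", "/reset", "/sessions"}  # answered by the listener itself, no Claude

# Hypothetical prompt wording; {arg} is whatever follows the command.
SMART_PROMPTS = {
    "/status": "Check server health: containers, disk, memory.",
    "/logs": "Tail the recent logs of this service: {arg}",
    "/fix": "Assess and fix this GitHub issue end-to-end: {arg}",
    "/commit": "Stage, summarize and commit the recent changes.",
    "/pr": "Create a pull request with a drafted description.",
    "/review": "Code review the current working changes.",
    "/deploy": "Check deploy status, then run the deploy script.",
    "/search": "Search the web via SearXNG for: {arg}",
    "/docs": "Look up library documentation via Context7 for: {arg}",
    "/issues": "List the open GitHub issues.",
    "/simplify": "Refactor the recent code for clarity and correctness.",
}


def translate(text: str) -> tuple[str, str]:
    """Return (kind, payload): 'instant' handled locally, 'claude' a focused prompt, 'chat' passthrough."""
    command, _, arg = text.strip().partition(" ")
    if command in INSTANT:
        return "instant", command
    template = SMART_PROMPTS.get(command)
    if template is None:
        return "chat", text  # ordinary message goes to Claude unchanged
    return "claude", template.format(arg=arg.strip()) + PLAIN_TEXT
```

Because smart commands resolve to an ordinary prompt in the active session, a follow-up like "now fix the second one" lands in the same conversation.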
3. Adding Family: The Dyba Feature
My wife Noradibah — Dyba — is not technical. She doesn't know what a webhook is and doesn't care. But I wanted her to have access to the same assistant, so I added her Telegram chat ID to the allowed senders list.
The first thing she used it for was a skincare reminder. She has an 8-step morning routine and was forgetting steps. We built a gamified reminder system through n8n: 8am daily, Telegram message, streak tracking, point system for consistency. She's been using it every day.
It matters because it proves the system works beyond just engineering tasks. A non-technical user can interact with Claude through a conversational interface and get real value from it. The interface — Telegram — is something she already uses. The AI layer is invisible.
4. The Big Day: March 26, 2026
This was the day the system went from "a useful tool" to "an actual assistant." Three major things happened in a single day.
4a. Real-Time Streaming
The original webhook listener called proc.communicate() — it waited for Claude to finish, then sent the complete response. For long tasks this meant 30-60 seconds of silence before anything appeared on Telegram. That feels broken.
I rewrote the response handling to read Claude's stdout line by line using --output-format stream-json. As Claude produces output, the listener accumulates it and edits the Telegram message every 3 seconds. You see the response building in real time.
For responses over 3,800 characters (a safety margin under Telegram's 4,096-character message limit), the system automatically posts to Telegraph, Telegram's instant article platform, and sends a link. The message body becomes a preview with "Read full response →".
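A minimal sketch of the overflow path, assuming a Telegraph access token has already been obtained via Telegraph's `createAccount` method. `createPage` takes a JSON array of nodes; the page title "Khadam" and the 500-character preview length are hypothetical choices.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

TELEGRAM_SAFE_LIMIT = 3800  # margin below Telegram's 4096-character cap


def to_telegraph_nodes(text: str) -> list[dict]:
    """One <p> node per non-empty paragraph, in Telegraph's Node format."""
    return [{"tag": "p", "children": [para]} for para in text.split("\n\n") if para.strip()]


def preview_message(text: str, url: str, preview_chars: int = 500) -> str:
    return text[:preview_chars].rstrip() + "…\n\nRead full response → " + url


def publish_to_telegraph(access_token: str, title: str, text: str) -> str:
    """POST to Telegraph's createPage and return the page URL."""
    data = urllib.parse.urlencode({
        "access_token": access_token,
        "title": title,
        "content": json.dumps(to_telegraph_nodes(text)),
    }).encode()
    with urllib.request.urlopen("https://api.telegra.ph/createPage", data=data, timeout=10) as resp:
        body = json.load(resp)
    if not body.get("ok"):
        raise RuntimeError(body.get("error", "Telegraph error"))
    return body["result"]["url"]


def render_reply(text: str, access_token: str) -> str:
    """Short replies go straight to Telegram; long ones become a preview plus link."""
    if len(text) <= TELEGRAM_SAFE_LIMIT:
        return text
    url = publish_to_telegraph(access_token, "Khadam", text)
    return preview_message(text, url)
```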
Tool activity is also shown inline during streaming. When Claude runs a bash command or reads a file, you see a status line like [running: bash] or [reading: routes/web.php] before the response text appears. You know it's working, not frozen.
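The streaming loop can be sketched as a generator over Claude's stdout lines that yields a snapshot at most every 3 seconds. The event shape assumed here (`type: "assistant"` events carrying `text` and `tool_use` content blocks, with the Read tool's path in `input.file_path`) matches what recent Claude Code versions emit, but check it against your version; the function names are hypothetical.

```python
from __future__ import annotations

import json
import time
from typing import Callable, Iterable, Iterator

EDIT_INTERVAL = 3.0  # seconds between Telegram message edits


def render_event(event: dict) -> str:
    """Turn one stream-json event into display text (event shape is an assumption)."""
    if event.get("type") != "assistant":
        return ""  # system/user/result events are not shown
    parts = []
    for block in event.get("message", {}).get("content", []):
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
        elif block.get("type") == "tool_use":
            name = block.get("name", "tool")
            path = block.get("input", {}).get("file_path")
            if name == "Read" and path:
                parts.append(f"[reading: {path}]\n")
            else:
                parts.append(f"[running: {name.lower()}]\n")
    return "".join(parts)


def stream_snapshots(lines: Iterable[str], clock: Callable[[], float] = time.monotonic) -> Iterator[str]:
    """Yield the accumulated text at most once per EDIT_INTERVAL, then once more at the end."""
    buffer, last_sent = "", ""
    last_edit = clock()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip any non-JSON noise on stdout
        buffer += render_event(event)
        now = clock()
        if buffer != last_sent and now - last_edit >= EDIT_INTERVAL:
            last_edit, last_sent = now, buffer
            yield buffer
    if buffer and buffer != last_sent:
        yield buffer  # final edit; Telegram rejects edits that change nothing
```

In use, `lines` would be the stdout of `subprocess.Popen([...,"--output-format", "stream-json", "--verbose"], stdout=subprocess.PIPE, text=True)` (recent versions require `--verbose` with stream-json in print mode), and each yielded snapshot would be passed to Telegram's `editMessageText`.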
4b. The 7-Phase Humanlike Upgrade
I wanted Khadam to feel like messaging a colleague, not a chatbot. That meant it needed to be topic-aware, context-aware, and smart about which model to use. I rebuilt the core of the webhook listener around seven new capabilities:
1. Batch window. A 1.5-second queue collects rapid-fire messages and combines them into a single Claude call. If you send three quick messages in a row, they arrive as one coherent prompt.
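One way to build the batch window is a per-chat debounce with asyncio. This sketch assumes the window restarts on every new message (a fixed window from the first message would also match the description); the `MessageBatcher` class and its `handle` callback are hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

BATCH_WINDOW = 1.5  # seconds


class MessageBatcher:
    def __init__(self, handle: Callable[[int, str], Awaitable[None]], window: float = BATCH_WINDOW):
        self._handle = handle  # receives (chat_id, combined_text)
        self._window = window
        self._pending: dict[int, list[str]] = {}
        self._timers: dict[int, asyncio.Task] = {}

    def add(self, chat_id: int, text: str) -> None:
        """Queue a message; must be called from inside a running event loop."""
        self._pending.setdefault(chat_id, []).append(text)
        timer = self._timers.get(chat_id)
        if timer is not None:
            timer.cancel()  # each new message restarts the window
        self._timers[chat_id] = asyncio.create_task(self._flush_later(chat_id))

    async def _flush_later(self, chat_id: int) -> None:
        await asyncio.sleep(self._window)
        # Pop state before awaiting the handler so a new message can't cancel it mid-call.
        texts = self._pending.pop(chat_id, [])
        self._timers.pop(chat_id, None)
        if texts:
            await self._handle(chat_id, "\n".join(texts))
```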
2. Complexity classification. Before spawning Claude, a pure-Python classify_complexity() function reads the message and routes it to the right model:
- Haiku: short acknowledgements ("ok", "thanks"), /help, /sessions, simple status pings
- Sonnet: questions, most slash commands, anything conversational (the default)
- Opus: /fix, /deploy, /review, file attachments, long action requests
The classifier is pure Python with no AI call, so routing adds negligible latency. Routing is escalation-only: if a Sonnet session hits a task it can't handle, it escalates to Opus, and it never downgrades mid-session. Budget caps enforce cost discipline: $0.10 per session for Haiku, $0.50 for Sonnet, $1.00 for Opus.
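A rule-based classifier in this spirit might look like the sketch below. The acknowledgement list, the 400-character threshold, and the action words are hypothetical; only the tier mapping and the escalation-only rule come from the description above.

```python
from __future__ import annotations

MODEL_ORDER = ["haiku", "sonnet", "opus"]  # cheapest to most capable

ACKS = {"ok", "okay", "thanks", "thank you", "noted"}
LIGHT_COMMANDS = {"/help", "/sessions"}
HEAVY_COMMANDS = {"/fix", "/deploy", "/review"}
ACTION_WORDS = ("fix", "implement", "deploy", "refactor", "migrate")


def classify_complexity(text: str, has_attachment: bool = False) -> str:
    """Pick a model tier with plain string checks, no AI call."""
    stripped = text.strip()
    lowered = stripped.lower()
    command = lowered.split(maxsplit=1)[0] if lowered else ""
    if has_attachment or command in HEAVY_COMMANDS:
        return "opus"
    if command in LIGHT_COMMANDS or lowered.rstrip("!. ") in ACKS:
        return "haiku"
    if len(stripped) > 400 and any(word in lowered for word in ACTION_WORDS):
        return "opus"  # long action request
    return "sonnet"  # conversational default


def choose_model(current: str | None, proposed: str) -> str:
    """Escalation-only: never drop below the tier this session already uses."""
    if current is None:
        return proposed
    return max(current, proposed, key=MODEL_ORDER.index)
```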
3. FTS5 full-text search. The Brain SQLite database got an FTS5 virtual table indexing session summaries, decisions, next steps, tags, and project names. Three sync triggers keep the FTS index current on every insert, update, and delete.
Before spawning Claude, the webhook listener runs an FTS5 query against relevant keywords from the incoming message and injects matching past sessions into the system prompt. This is the pre-fetch layer — Claude has context before it even starts responding.
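The FTS5 layer and the pre-fetch can be sketched together. The `sessions` column names are assumptions; the three triggers follow SQLite's documented pattern for external-content FTS5 tables (updates and deletes are written as special `'delete'` rows). The keyword extraction is a deliberately crude, hypothetical stand-in.

```python
from __future__ import annotations

import re
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    project TEXT, summary TEXT, decisions TEXT, next_steps TEXT, tags TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE VIRTUAL TABLE IF NOT EXISTS sessions_fts USING fts5(
    summary, decisions, next_steps, tags, project,
    content='sessions', content_rowid='id'
);
CREATE TRIGGER IF NOT EXISTS sessions_ai AFTER INSERT ON sessions BEGIN
    INSERT INTO sessions_fts(rowid, summary, decisions, next_steps, tags, project)
    VALUES (new.id, new.summary, new.decisions, new.next_steps, new.tags, new.project);
END;
CREATE TRIGGER IF NOT EXISTS sessions_ad AFTER DELETE ON sessions BEGIN
    INSERT INTO sessions_fts(sessions_fts, rowid, summary, decisions, next_steps, tags, project)
    VALUES ('delete', old.id, old.summary, old.decisions, old.next_steps, old.tags, old.project);
END;
CREATE TRIGGER IF NOT EXISTS sessions_au AFTER UPDATE ON sessions BEGIN
    INSERT INTO sessions_fts(sessions_fts, rowid, summary, decisions, next_steps, tags, project)
    VALUES ('delete', old.id, old.summary, old.decisions, old.next_steps, old.tags, old.project);
    INSERT INTO sessions_fts(rowid, summary, decisions, next_steps, tags, project)
    VALUES (new.id, new.summary, new.decisions, new.next_steps, new.tags, new.project);
END;
"""

STOPWORDS = {"the", "and", "what", "for", "with", "this", "that", "are", "was", "you"}


def keywords(message: str, limit: int = 6) -> list[str]:
    """Unique lowercase words of 3+ characters, minus stopwords."""
    found: list[str] = []
    for word in re.findall(r"[a-z0-9]{3,}", message.lower()):
        if word not in STOPWORDS and word not in found:
            found.append(word)
    return found[:limit]


def prefetch_context(conn: sqlite3.Connection, message: str, limit: int = 3) -> str:
    """Return matching past sessions as lines to inject into the system prompt."""
    terms = keywords(message)
    if not terms:
        return ""
    query = " OR ".join(f'"{t}"' for t in terms)  # quoted terms keep FTS5 syntax inert
    rows = conn.execute(
        "SELECT s.created_at, s.project, s.summary FROM sessions_fts "
        "JOIN sessions s ON s.id = sessions_fts.rowid "
        "WHERE sessions_fts MATCH ? ORDER BY sessions_fts.rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return "\n".join(f"- [{d}] {p}: {s}" for d, p, s in rows)
```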
4. Self-initiated search. Claude also has access to brain.search_sessions() as an MCP tool it can call during a conversation. If the topic shifts or the pre-fetch missed something, Claude can search its own history. It will say things like "we talked about this on March 20 — you decided to keep the audit controller separate, building on that..."
5. Topic tracking. Every Claude response ends with a metadata block:
<!--khadam-meta
topic: okhalal
subtopic: IHA audit module
-->
The webhook listener strips this before sending to Telegram and saves it to the chat_messages table. Over time this builds a map of what we talked about, which topics recur, and which projects are getting attention.
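Stripping and parsing the metadata block is a single regex pass. A minimal sketch; `split_meta` is a hypothetical name, and writing the result into `chat_messages` is left out because that table's schema isn't described here.

```python
from __future__ import annotations

import re

META_RE = re.compile(r"<!--khadam-meta\s*(.*?)-->", re.DOTALL)


def split_meta(response: str) -> tuple[str, dict[str, str]]:
    """Return (text for Telegram, parsed key/value metadata)."""
    meta: dict[str, str] = {}
    for block in META_RE.findall(response):
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip():
                meta[key.strip()] = value.strip()
    visible = META_RE.sub("", response).rstrip()  # the user never sees the block
    return visible, meta
```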
6. Session boundary management. A 10-minute gap triggers automatic session close, summary, and save to both Brain and Obsidian. Stale sessions (where --resume fails because the Claude process is gone) auto-recover: clear the stored session ID and retry once with a fresh session.
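The stale-session recovery can be sketched as "try with `--resume`, and on failure forget the ID and retry exactly once." This assumes a non-zero exit code is how a failed resume shows up; `run_claude` is hypothetical, and `runner` is injectable so the logic can be tested without Claude installed.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable


def run_claude(prompt: str, session_file: Path,
               runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> subprocess.CompletedProcess:
    def attempt(session_id: str | None) -> subprocess.CompletedProcess:
        argv = ["claude", "-p", prompt]
        if session_id:
            argv += ["--resume", session_id]
        return runner(argv, capture_output=True, text=True)

    session_id = session_file.read_text().strip() if session_file.exists() else ""
    result = attempt(session_id or None)
    if result.returncode != 0 and session_id:
        session_file.unlink(missing_ok=True)  # stale ID: forget it
        result = attempt(None)  # retry once with a fresh session
    return result
```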
7. Obsidian sync. Every session summary also saves to my Obsidian vault on the server. Brain is the search index; Obsidian is the human-readable archive. The same content lives in both places, formatted appropriately for each.
4c. Removing n8n Entirely
n8n was running as a Docker container and handling six workflows: Telegram bot routing, skincare reminders, news digest, issue checking, web scraping, and general automation. It worked, but it consumed 300-500MB of RAM just to exist and added a middle layer I didn't need.
I migrated all six workflows into the Python webhook listener and cron jobs in a single session. The key insight was that n8n's Telegram webhook URL path was already /webhook/telegram-bot — I added that same endpoint to the listener and just changed nginx's proxy_pass from port 5678 (n8n) to port 5679 (listener). Zero DNS changes, zero Telegram reconfiguration.
The news digest became a standalone Python script (khadam-news-digest.py) triggered by cron at 8:05am MYT. It queries SearXNG with four searches — Malaysia news, Malaysia tech/business, world news, AI/tech — then passes the results to Claude for a curated briefing delivered to Telegram. The whole thing runs without n8n, without an extra API key, using the same Claude Code webhook already running.
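The digest flow is four searches, one prompt, one Claude call. This sketch assumes a local SearXNG at a hypothetical URL with the JSON output format enabled in its `settings.yml` (SearXNG returns `results` entries with `title`, `url`, and `content`). The query strings paraphrase the four searches above, and delivery to Telegram is reduced to a `print`.

```python
from __future__ import annotations

import json
import subprocess
import urllib.parse
import urllib.request

SEARXNG_URL = "http://localhost:8080/search"  # hypothetical; JSON format must be enabled
QUERIES = ["Malaysia news", "Malaysia tech business", "world news", "AI technology news"]


def searxng(query: str, limit: int = 5) -> list[dict]:
    url = SEARXNG_URL + "?" + urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.load(resp).get("results", [])[:limit]


def build_digest_prompt(sections: dict[str, list[dict]]) -> str:
    lines = [
        "Curate a short morning briefing from these search results.",
        "Pick the five most important stories and add one line of commentary each.",
        "Plain text only, no markdown.",
        "",
    ]
    for query, results in sections.items():
        lines.append(f"== {query} ==")
        for r in results:
            lines.append(f"- {r.get('title', '')} ({r.get('url', '')}): {r.get('content', '')}")
    return "\n".join(lines)


def main() -> None:
    sections = {q: searxng(q) for q in QUERIES}
    briefing = subprocess.run(["claude", "-p", build_digest_prompt(sections)],
                              capture_output=True, text=True, check=True).stdout
    print(briefing)  # the real script would send this to Telegram instead
```

Wired to a `__main__` guard, it could be scheduled with a crontab line such as `5 8 * * * python3 /opt/khadam/khadam-news-digest.py` (path hypothetical, and this assumes the server clock runs on MYT).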
The issue checker cron went from hourly to every 5 minutes to match n8n's previous frequency. Skincare reminders were already handled by separate crons. After the migration, I stopped the n8n container. The server got 400MB back and one less moving part to maintain.
5. The Auto Issue Fixer
My teammate Zal files bugs and feature requests as GitHub issues on OKHalal. Before Khadam, the workflow was: Zal files issue → I eventually see it → I fix it when I get around to it.
The new workflow:
- Zal files issue on GitHub
- Cron (every 5 min) checks for new issues via GitHub API
- Telegram message sent: "New issue #N — working on it..."
- Claude Code reads the issue, identifies relevant files, implements a minimal fix
- Creates branch fix/issue-N and commits with "Fix #N: description"
- Pushes and creates a PR via gh pr create
- Telegram message sent: "PR #M ready, please review and deploy"
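The polling half of that pipeline is mostly bookkeeping: fetch open issues, diff against what's been seen, and hand new ones to Claude with the fixer prompt. A sketch under stated assumptions: seen issue numbers live in a JSON file, the first run only seeds that file (so an existing backlog isn't auto-fixed), and the prompt text paraphrases the rules described below. The function names are hypothetical; the GitHub endpoint and its habit of including PRs in the issues list are real.

```python
from __future__ import annotations

import json
import urllib.request
from pathlib import Path


def fetch_open_issues(repo: str, token: str) -> list[dict]:
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues?state=open&per_page=50",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        items = json.load(resp)
    return [i for i in items if "pull_request" not in i]  # this endpoint also returns PRs


def new_issues(issues: list[dict], seen_file: Path) -> list[dict]:
    """Return issues not seen before and record every current number."""
    first_run = not seen_file.exists()
    seen = set() if first_run else set(json.loads(seen_file.read_text()))
    fresh = [] if first_run else [i for i in issues if i["number"] not in seen]
    seen.update(i["number"] for i in issues)
    seen_file.write_text(json.dumps(sorted(seen)))
    return fresh


def fix_prompt(issue: dict) -> str:
    n = issue["number"]
    return (
        f"Fix GitHub issue #{n}: {issue['title']}\n\n{issue.get('body') or ''}\n\n"
        "Read the issue thoroughly and look at the existing code first. Make minimal changes "
        "that follow the project's existing patterns. Run php artisan test if relevant. "
        f"Create branch fix/issue-{n}, commit as 'Fix #{n}: <description>', push, and open a PR "
        "with gh pr create. Do not deploy."
    )
```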
I review the PR manually and deploy if it looks good. Khadam never auto-deploys. That's deliberate — pushing code is cheap, deploying has blast radius. The human stays in the loop at the deploy step.
The prompt that drives the fixer is careful: read the issue thoroughly, look at the existing code first, make minimal changes, follow the project's existing patterns, run php artisan test if relevant. The goal is a fix that's good enough to merge, not a refactor.
6. The Full Architecture
Here's how the whole system looks now:
Telegram message
↓
Webhook listener (Python, port 5679)
↓
classify_complexity() → Haiku / Sonnet / Opus
↓
FTS5 search: inject relevant past sessions into prompt
↓
Claude Code (--resume SESSION_ID or fresh)
↓
Stream stdout line-by-line
↓
Edit Telegram message every 3s
(or post to Telegraph if > 3800 chars)
↓
Strip <!--khadam-meta--> blocks
↓
Save topic metadata to DB
On 10-minute gap or session close:
→ Claude summarizes session
→ Save to Brain (SQLite + FTS5)
→ Save to Obsidian vault
Parallel systems running alongside this:
- Daily digest — cron at 8:05am MYT, SearXNG → Claude → Telegram
- Health watchdog — every 15 minutes, checks disk/memory/containers, alerts only on state changes (not repeated spam)
- Issue checker — every 5 minutes, GitHub API, triggers issue fixer on new issues
- Skincare reminders — 8am daily cron for Dyba, streak tracking
- Session auto-save — Claude Code Stop hook saves every Claude Code session to Brain + Obsidian
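The watchdog's "alert only on state changes" rule reduces to remembering the last state of each check and reporting transitions. A minimal sketch with only a disk check; the 90% threshold, the state file, and the function names are hypothetical, and memory/container checks would feed the same `transitions` call.

```python
from __future__ import annotations

import json
import shutil
from pathlib import Path

DISK_ALERT_PCT = 90  # hypothetical threshold


def disk_state(path: str = "/") -> str:
    usage = shutil.disk_usage(path)
    return "alert" if usage.used * 100 / usage.total >= DISK_ALERT_PCT else "ok"


def transitions(current: dict[str, str], state_file: Path) -> dict[str, tuple[str, str]]:
    """Return {check: (old, new)} for checks whose state changed, then persist the new states."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    changed = {}
    for check, state in current.items():
        old = previous.get(check, "ok")  # unknown checks are assumed healthy
        if old != state:
            changed[check] = (old, state)
    state_file.write_text(json.dumps(current))
    return changed
```

A 15-minute cron run would call `transitions({"disk": disk_state()}, state_file)` and message Telegram only when the result is non-empty, so a full disk produces one alert and one recovery notice rather than an alert every run.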
7. What It's Actually Like to Use
On a typical morning, the 8:05am digest arrives unprompted: five curated news stories with commentary, server health summary, any overnight issues or PRs. I read it over breakfast.
During the day I message Khadam like I'd message a developer on the team. "What's the status of the IHA audit module?" — it searches its session history, checks recent commits, and gives me a summary. "Zal reported the mobile slider isn't responsive on the landing page" — it reads the issue, looks at the relevant Vue component, and within a few minutes sends back a PR link.
When I'm deep in a session and need something from three conversations ago, I don't have to describe the context again. Khadam already pulled it from Brain. It might say "we worked on this on March 20, you decided to keep the audit controller separate because of the permission scoping — building on that..."
The difference between this and the Claude app is the difference between messaging a colleague who knows your codebase and cold-calling a consultant for every question. The knowledge accumulates. The context persists. The actions happen.
8. What I Would Do Differently
Start with the webhook listener, not n8n
n8n was a fine starting point but I should have moved to pure Python earlier. n8n adds indirection, eats RAM, and creates a dependency for what is ultimately just HTTP routing and cron scheduling. Python + systemd + cron is simpler, more debuggable, and costs nothing.
Design for Telegram formatting from day one
Telegram uses its own markup: *bold*, _italic_, `code`. Standard markdown — headers, dashes, asterisk lists — doesn't render correctly. Every response that used markdown headers looked broken in Telegram. I should have set the formatting constraints upfront in the system prompt instead of correcting it repeatedly.
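A safety net for stray markdown could be a small converter that runs before sending with Telegram's `MarkdownV2` parse mode: headers become bold lines, dash or asterisk bullets become "•", and every character MarkdownV2 reserves is escaped. This is a hypothetical helper, not part of the setup described above; inline `**bold**` is simply escaped to literal asterisks rather than converted.

```python
from __future__ import annotations

import re

# Characters Telegram's MarkdownV2 requires escaping outside entities, plus backslash itself.
MDV2_SPECIAL = "\\_*[]()~`>#+-=|{}.!"


def escape_mdv2(text: str) -> str:
    return "".join("\\" + ch if ch in MDV2_SPECIAL else ch for ch in text)


def to_telegram_mdv2(markdown: str) -> str:
    out = []
    for line in markdown.splitlines():
        header = re.match(r"^#{1,6}\s+(.*)$", line)
        bullet = re.match(r"^\s*[-*]\s+(.*)$", line)
        if header:
            out.append("*" + escape_mdv2(header.group(1)) + "*")  # header -> bold line
        elif bullet:
            out.append("• " + escape_mdv2(bullet.group(1)))
        else:
            out.append(escape_mdv2(line))
    return "\n".join(out)
```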
Add FTS5 earlier
For the first month, the brain search used basic keyword matching against session summaries. It worked but missed context that was in decisions, tags, or next-steps fields. FTS5 with full indexing is not hard to add and the improvement in recall quality was immediate. I should have built it that way from the start.
Never restart the service that's serving you
Khadam runs as a systemd service (okhalal-webhook.service). Claude Code runs inside that process. If I restart the service, I kill my own response — the session is gone, the Telegram message never arrives. I learned this the hard way and it's now a hard rule: never restart the webhook service from within a Claude session. Restart it manually from the server if needed.
9. The Objections
"Why not just use the Claude app?" Because the Claude app doesn't know my codebase, doesn't watch my server, doesn't fix my GitHub issues while I sleep, doesn't deliver my morning briefing, and doesn't remember anything between sessions. The Claude app is a general-purpose chat interface. Khadam is a system that acts on my behalf in my specific context.
"Isn't this reinventing the wheel?" Partially. The memory system, the webhook listener, the model routing, the auto issue fixer — those required real engineering. The underlying intelligence is Claude. I didn't build the AI. I built the infrastructure that makes the AI useful for my specific situation. That's not reinventing the wheel; that's fitting a wheel to your particular vehicle.
"Isn't this expensive?" The model routing is the main cost control. Haiku handles simple messages at near-zero cost. Sonnet handles most conversations. Opus is reserved for actions. Budget caps on each tier prevent runaway spend. The daily cost for a full day of active use — including the digest, watchdog, and several coding sessions — is well within a reasonable solo-founder infrastructure budget.
10. What's Next
The scripts are public at github.com/nurikhwanidris/khadam. Secrets are gitignored. The architecture diagram is at draw.nurikhwanidris.my.
The next things I'm thinking about: voice message support (Telegram allows it, the listener doesn't process it yet), smarter watchdog alerting with trend detection instead of just state changes, and better cost reporting so I can see exactly where each day's spend went.
The bigger picture goal hasn't changed since February 25: an AI that runs with me, not just for me. Something that accumulates context, takes initiative on small decisions, and escalates the important ones to me. Not a chatbot. A collaborator with memory and tools and a permanent address on my server.
Khadam is Malay for "servant" — someone who takes care of things so you can focus on what matters. That's what I built. It's doing exactly that.