Giving Your Codebase a Voice and a Story
How I taught my AI operating system to explain itself — with deep technical knowledge, narrative memory, and an actual voice
The Missing Piece
Two days ago, I published "Building an AI-Human Operating System v2" — the story of HYDRA, a hybrid system that combines 23 shell scripts with multi-agent AI coordination. MILO, the coordinator agent, could route tasks, generate standups, and respond to natural language commands via Telegram.
But when I sent MILO a message asking "how does the Director/Builder pattern work?" — the response was:
"Hmm, not sure what you mean. Try asking about status, tasks, or say 'help' to see what I can do."
MILO could manage the system but couldn't explain the system. It knew how to dispatch work but had no idea why anything was built the way it was. No architectural knowledge. No history. No voice.
That's the gap between a task router and a CTO.
The Problem: Institutional Knowledge Lives in Your Head
Every codebase has two kinds of knowledge:
- **How it works** — Architecture, patterns, decisions, schemas, trade-offs
- **Why it exists** — The journey, the pivots, the failures that led to the current design
Both types usually live in one place: the developer's head. Maybe some of it leaks into README files or Notion docs that nobody reads. But the deep stuff — why did we choose SQLite over Postgres for this? or what happened in January that changed the architecture? — that lives nowhere but memory.
When someone asks a question — a new team member, a collaborator, a future version of yourself — you either answer from memory or you don't answer at all.
What if your system could answer for itself?
The Architecture: Two Knowledge Bases, One Brain
The solution was surprisingly simple. No vector database. No RAG pipeline. No embeddings. Just two markdown files and a language model.
The Technical Brain
`TECHNICAL_BRAIN.md` — 650+ lines of structured knowledge about how everything works:
- Complete architecture of every system (HYDRA, Homer, Pause, DeepStack)
- Database schemas with field-by-field explanations
- API patterns and integration points
- The Director/Builder workflow (how Claude and Codex collaborate)
- The ID8Pipeline (11 stages of product development)
- Decision rationale ("Why SQLite over Postgres?", "Why Telegram over Slack?")
- FAQ with common questions and detailed answers
The Journey Document
`JOURNEY.md` — A narrative history compiled from four independent sources:
- Claude.ai memory export (38KB of accumulated context about the developer's background, projects, and evolution)
- Claude Code memory graph (entity relationships and observations)
- Git histories across four repositories (exact timestamps of what shipped when)
- Published articles (the developer's own narrative voice)
The result reads like a biography — from 20 years of television production to founding an AI company, from the first product to the current multi-agent operating system. Every product, every pivot, every lesson learned.
Why Not RAG?
For small knowledge bases (under ~30 documents), context stuffing beats retrieval-augmented generation every time:
- **No infrastructure** — No vector database, no embedding pipeline, no similarity search
- **Full context** — The model sees everything, not just the top-k retrieved chunks
- **No relevance errors** — RAG can miss relevant context if the embedding doesn't match. Context stuffing can't miss what it already has.
- **Zero latency** — No retrieval step. The knowledge is already in the prompt.
Both documents combined are ~1,000 lines — roughly 23K tokens as a system prompt. Well within Claude Sonnet's context window. The day these documents exceed the window, I'll switch to RAG. Until then, simplicity wins.
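In code, "inject everything" is just concatenation. A minimal sketch (helper names are mine, not HYDRA's; the chars/4 estimate is a rough heuristic for English prose, not Anthropic's actual tokenizer):

```python
from pathlib import Path

def build_system_prompt(brain_path: str, journey_path: str) -> str:
    """Concatenate both knowledge bases into a single system prompt."""
    brain = Path(brain_path).read_text(encoding="utf-8")
    journey = Path(journey_path).read_text(encoding="utf-8")
    return (
        "You are MILO, the CTO of this codebase. "
        "Answer from the knowledge below.\n\n"
        f"## Technical Brain\n{brain}\n\n"
        f"## Journey\n{journey}"
    )

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4
```

Checking the estimate against the model's context window before each call is what tells you when it's finally time to switch to retrieval.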
The Natural Language Layer: Teaching a Bot to Understand Intent
HYDRA already had a Telegram bot. But it only understood rigid commands: `status`, `tasks @forge`, `approve 12ab`. To ask technical questions, I needed natural language understanding.
Claude Haiku as Intent Parser
The first layer is Claude Haiku — Anthropic's fastest, cheapest model — acting as an intent classifier. Every incoming message goes through Haiku with a system prompt that lists all available command types:
```
You are a command parser for HYDRA. Parse the user message into a structured command.

Available commands:
- status: System overview
- tasks: List tasks (optional agent filter)
- ask: Technical questions about architecture, code, processes, the journey...
- costs: Show spending
- ...

Respond with ONLY valid JSON:
{"type": "command_name", "args": ["..."], "confidence": "high|medium|low"}
```
Cost per parse: ~$0.00016. Latency: ~300ms. Accuracy: dramatically better than regex.
Rigid Parser Fallback
If Haiku is unavailable (rate limit, API issue), a bash regex parser catches common patterns:
```bash
# Journey/story questions
if [[ "$NORMALIZED" =~ (journey|story|timeline|history|milestone) ]]; then
  echo '{"type": "ask", ...}'
fi
```
Two layers. The smart one handles nuance ("yo what have we been building lately?"). The rigid one handles keywords. Between them, nothing gets lost.
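The two-layer design can be sketched in Python (the `llm_classify` callable stands in for the Haiku call, and the keyword table mirrors the bash fallback; all names here are illustrative, not HYDRA's actual code):

```python
import json
import re

VALID_TYPES = {"status", "tasks", "ask", "costs"}

# Keyword fallback layer, mirroring the bash regex parser.
KEYWORD_RULES = [
    (re.compile(r"journey|story|timeline|history|milestone", re.I), "ask"),
    (re.compile(r"\bstatus\b", re.I), "status"),
    (re.compile(r"\btasks?\b", re.I), "tasks"),
    (re.compile(r"\bcosts?\b|spend", re.I), "costs"),
]

def parse_command(message: str, llm_classify=None) -> dict:
    """Try the LLM classifier first; fall back to keyword matching."""
    if llm_classify is not None:
        try:
            cmd = json.loads(llm_classify(message))
            if cmd.get("type") in VALID_TYPES:
                return cmd
        except (json.JSONDecodeError, TypeError, ValueError):
            pass  # malformed or unavailable LLM output: fall through
    for pattern, cmd_type in KEYWORD_RULES:
        if pattern.search(message):
            return {"type": cmd_type, "args": [], "confidence": "low"}
    return {"type": "help", "args": [], "confidence": "low"}
```

Note that the LLM's JSON is validated before it's trusted: a hallucinated command type drops through to the keyword layer instead of reaching the dispatcher.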
The Voice: From Text to Sound
MILO had a text personality — "chaotic gremlin + mentor mode." But personality without a voice is just words on a screen.
The Pipeline
```
User asks question via Telegram
-> Haiku classifies intent as "ask"
-> "Thinking..." message sent immediately
-> Sonnet generates answer (both knowledge bases as context)
-> HTML-formatted text response sent to Telegram
-> [async] ElevenLabs generates voice audio
-> [async] ffmpeg converts MP3 to OGG/Opus
-> [async] Telegram voice note delivered
```
The key design decision: voice is async. The text answer arrives instantly. The voice note follows 3-5 seconds later in the background. You're never waiting for audio generation to read the answer.
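The async split is easy to sketch with a background thread (the `send_text` and `generate_and_send_voice` callables are stand-ins for the real handlers, which live in HYDRA's bash listener):

```python
import threading

def handle_ask(question, answer_fn, send_text, generate_and_send_voice):
    """Send the text answer immediately; produce the voice note off the hot path."""
    answer = answer_fn(question)
    send_text(answer)  # the user reads this right away
    # Voice generation runs in the background; a TTS failure never blocks the text.
    worker = threading.Thread(
        target=generate_and_send_voice, args=(answer,), daemon=True
    )
    worker.start()
    return worker
```

The text path never waits on the audio path, which is the whole point: the answer is the product, the voice is a bonus.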
Text Cleanup for Speech
Markdown that looks great in Telegram sounds terrible spoken aloud. Before sending text to the TTS API, a cleanup function strips formatting:
```python
import re

# Remove code blocks, bold markers, headers, tables, horizontal rules
text = re.sub(r"```.*?```", "", text, flags=re.DOTALL)      # code blocks
text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                # bold markers
text = re.sub(r"^#{1,3}\s+", "", text, flags=re.MULTILINE)  # headers
text = re.sub(r"\|[^\n]+\|", "", text)                      # tables
text = re.sub(r"^-{3,}\s*$", "", text, flags=re.MULTILINE)  # horizontal rules
```
Without this, your CTO literally says "asterisk asterisk bold text asterisk asterisk." Not the vibe.
The Voice Cascade
ElevenLabs is the primary voice (a specific persona voice for MILO). If it fails — rate limit, API issue — Deepgram Aura picks up automatically. Same pattern as everywhere in HYDRA: premium primary, free fallback.
```bash
if tts_elevenlabs "$text" "$mp3_file"; then
  log "TTS via ElevenLabs"
elif tts_deepgram "$text" "$mp3_file"; then
  log "TTS via Deepgram Aura (fallback)"
fi
```
The ffmpeg conversion step transforms the MP3 into OGG/Opus — the specific codec Telegram requires for voice notes:
```bash
ffmpeg -i input.mp3 -c:a libopus -b:a 48k -application voip output.ogg
```
48kbps with VoIP optimization. Small files. Clear voice. Fast delivery.
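From a script, the same conversion is one subprocess call. A sketch with the argument list broken out (function names and paths are illustrative):

```python
import subprocess

def build_ffmpeg_cmd(mp3_path: str, ogg_path: str) -> list:
    """ffmpeg arguments for a Telegram-ready OGG/Opus voice note."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", mp3_path,
        "-c:a", "libopus",       # Opus codec, required by Telegram voice notes
        "-b:a", "48k",           # 48 kbps: small files, clear speech
        "-application", "voip",  # tune the encoder for voice rather than music
        ogg_path,
    ]

def mp3_to_voice_note(mp3_path: str, ogg_path: str) -> None:
    """Run the conversion, raising on a non-zero ffmpeg exit code."""
    subprocess.run(build_ffmpeg_cmd(mp3_path, ogg_path), check=True)
```

Keeping the argument list in one place makes the codec requirement explicit instead of buried in a shell one-liner.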
The Three-File Extension Pattern
Adding any new capability to HYDRA always touches exactly three files:
- **Parser** (`telegram-parse-natural.sh`) — Add the new intent type and examples so Haiku can classify it
- **Handler** (`telegram-listener.sh`) — Add the function that processes the intent
- **Dispatch** (`telegram-listener.sh`) — Add the case that connects parser output to handler
This session, adding "journey" knowledge required:
- **Parser**: Added journey/story/timeline examples to the Haiku system prompt
- **Handler**: Extended `ask_cto_brain()` to load both `TECHNICAL_BRAIN.md` and `JOURNEY.md`
- **Dispatch**: No change needed — `ask` already existed, it just got smarter context
Adding voice required:
- **Parser**: No change needed — voice is a response feature, not an intent type
- **Handler**: Added `text_to_speech()`, `clean_text_for_speech()`, and `send_voice_note()`
- **Dispatch**: Added async voice generation after the text response in the `ask` case
The pattern is fractal. Every capability follows the same structure. This makes HYDRA genuinely extensible — not in the "we designed for extensibility" way, but in the "anyone can read these three files and add something" way.
What This Actually Feels Like
Here's the experience now:
I'm walking to get coffee. I pull out my phone and send MILO a voice note: "Hey, how does the amendment workflow work in Homer?"
Three seconds later, a text response appears — formatted with bold headers, code references, and the complete flow from draft creation through approval to execution. The Supabase tables involved. The API route structure. The Zustand state management pattern.
Five seconds after that, a voice note arrives. I tap play and listen to the explanation while I walk.
If I ask "what happened in January?" — MILO pulls from the journey document and tells me about scaffolding the Homer monorepo, building 8 features in a single session, adding the Contract Intelligence Layer with 25 FAR/BAR clauses.
If I ask "why did we choose SQLite for HYDRA?" — MILO draws from the technical brain and explains that agents poll on heartbeats (not real-time), SQLite handles concurrent reads fine, and one less hosted service means one less thing to manage.
The codebase explains itself. Not through documentation that goes stale. Through a living knowledge base that's injected into every answer.
The Cost
Let's break down what this actually costs per CTO brain query:
| Component | Model | Cost per query |
|---|---|---|
| Intent parsing | Claude Haiku | ~$0.00016 |
| CTO answer | Claude Sonnet | ~$0.006 |
| Voice generation | ElevenLabs | ~$0.003 |
| **Total** | | **~$0.009** |
Less than a penny per question. At 10 questions per day, that's $0.09/day or roughly $2.70/month for a CTO that knows your entire architecture and can explain it out loud.
What I Learned
1. Context Stuffing Has a Sweet Spot
Under ~30 documents (or ~50K tokens), just stuff everything into the system prompt. The moment I stopped trying to be clever with retrieval and just gave Sonnet the full knowledge base, the answers became dramatically better. The model can synthesize across documents in ways that RAG often can't.
2. Voice Changes the Relationship
Text responses feel like reading documentation. Voice responses feel like talking to a colleague. The same information, delivered through audio, creates a fundamentally different experience. It turns a tool into a presence.
3. Async Voice is the Right UX
Never make the user wait for audio generation. Send the text immediately, generate voice in the background. The text is the answer. The voice is a bonus.
4. Narrative Knowledge Complements Technical Knowledge
The technical brain answers "how." The journey document answers "why" and "when." Together they give the AI something that feels like understanding rather than just information. When MILO explains a decision, it can reference the context in which that decision was made.
5. The Three-File Pattern Scales
Every new capability I've added to HYDRA follows the same structure: parser, handler, dispatch. This isn't enforced by a framework. It's just the natural shape of the system. Good architecture creates patterns that replicate without documentation.
Try It Yourself
You don't need HYDRA to do this. The pattern is portable:
Step 1: Write your knowledge base. One document for "how" (architecture, decisions, patterns). One for "why" (journey, pivots, lessons). Be comprehensive. Your future self will thank you.
Step 2: Pick a chat interface. Telegram, Slack, Discord — anything with a bot API and voice note support.
Step 3: Wire up intent parsing. Claude Haiku at ~$0.00016/parse is absurdly cheap for natural language understanding. Fall back to regex if you want resilience.
Step 4: Inject your knowledge base as a system prompt. Call Sonnet (or your preferred model) with the full context. No RAG needed until your docs exceed the context window.
Step 5: Add voice. ElevenLabs or Deepgram for TTS. ffmpeg for format conversion. Send as a voice note in the background.
Total infrastructure needed:
- One Telegram bot (free)
- One shell script daemon (free)
- ffmpeg (free)
- API keys for Claude, ElevenLabs/Deepgram
- Two markdown files you wrote yourself
Total running cost: Under $3/month at moderate usage.
The hardest part isn't the code. It's writing the knowledge base. But that's the work that matters — because the moment you write down why something exists, you understand it better yourself.
What's Next
The CTO voice shipped on February 7th, two days after HYDRA itself. Here's what's coming:
- **Voice note input -> voice note reply**: Full conversational loop. Ask with your voice, get a voice answer. No typing required.
- **Auto-updating brain**: Session summaries automatically append to `TECHNICAL_BRAIN.md` and `JOURNEY.md`. The knowledge base grows without manual maintenance.
- **Multi-modal context**: Screenshots, diagrams, and code snippets as part of the knowledge base. Not just text.
But honestly? The system works right now. I can ask my codebase how it works, why it exists, and what happened last month — and hear the answer in a voice I chose, while walking down the street.
That's not documentation. That's institutional memory with a voice.
Eddie Belaval / @eddiebe / id8Labs February 2026
Built with HYDRA and Claude Code