How I Built a Personal AI That Knows My Entire Business
I built an AI that can represent me accurately on my personal site — not with RAG or vector search, but with a simpler pattern that works better at this scale. Here's the architecture, the tradeoffs, and what building a real knowledge base about yourself actually involves.
There's a version of "personal AI" that most people imagine: you upload your documents, it searches them with vectors, and you can ask questions. That's a valid architecture. It's also more complex than necessary for what I actually wanted.
What I wanted was an AI that could represent me accurately on my personal site — not as a chatbot with canned responses, but as a version of me that speaks from genuine knowledge of how I think, what I've built, and what I actually believe. The simpler approach turned out to work better.
The Problem With Generic AI on a Personal Site
If you put a generic AI assistant on your personal site — even a good one — it will confidently fabricate details about you based on cultural inference, patterns from its training data, or whatever sounds plausible. Ask it where you grew up and it might guess correctly. Ask it about your approach to pricing a freelance project and it'll give you generic advice about freelancing rather than how you specifically think about it.
This is the core problem. A model that doesn't know you specifically cannot represent you specifically. And "doesn't know" plus "confidently answers anyway" is worse than just not having a chatbot at all.
The solution isn't a smarter model. It's a model that only knows what you've told it.
The Architecture: Full-Context Injection
My implementation is deliberately simple. There's no vector search, no embeddings, no chunking strategy. Instead, it works like this:
- A Supabase table, `personal_ai_entries`, stores Q&A pairs. Each row is a question and my answer in my own words.
- When a visitor opens the chat, the client fetches all entries from a public API route.
- Those entries are injected wholesale into the system prompt before any conversation happens.
- The model (Claude via OpenRouter) responds strictly from that context, with an explicit instruction not to fill gaps with inference.
The system prompt makes the constraint explicit:
```typescript
function buildSystemPrompt(entries: KBEntry[]): string {
  // Serialize every entry as a Q/A pair, separated by blank lines.
  const kb = entries.map((e) => `Q: ${e.question}\nA: ${e.answer}`).join("\n\n");
  return `You are Wonsuk Choi, responding to a visitor on your personal website.
CRITICAL: The knowledgebase below is the ONLY source of truth about Wonsuk.
Do NOT use your training data, assumptions, or cultural inference to fill in gaps.
If the knowledgebase doesn't cover something, say "I haven't shared that yet."
KNOWLEDGEBASE (Wonsuk's own words — treat this as ground truth):
${kb}`;
}
```
The model gets one instruction it has to follow above everything else: if something isn't in the knowledge base, say so. This has reduced hallucination more than any other prompt engineering approach I've tried, because it gives the model a clear fallback instead of leaving it to improvise.
Why Not RAG?
RAG (retrieval-augmented generation) is the standard approach for large knowledge bases — you embed documents, store vectors in a database, and at query time you retrieve only the relevant chunks to fit in the context window.
For a personal knowledge base, this creates problems:
Retrieval misses matter more. If the model retrieves the wrong chunks or fails to retrieve a relevant one, it still has to answer. Without the right context, it'll either hallucinate or give a generic response — the exact failure mode I was trying to avoid. With full-context injection, either the information is there or it isn't.
Chunk boundaries lose voice. Splitting your own writing into 500-token chunks strips out the continuity that makes it sound like you. The model sees fragments rather than complete thoughts, and the responses feel assembled rather than genuine.
Latency and cost of full injection are acceptable at this scale. A personal knowledge base of 50–100 Q&A pairs fits comfortably in a context window and costs fractions of a cent per conversation. The complexity of vector search isn't justified until the knowledge base is much larger.
If the knowledge base grew to thousands of entries, I'd switch to RAG. At the scale of a personal site, full injection is simpler and more reliable.
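To check whether a knowledge base is still small enough for full injection, a back-of-the-envelope estimate is usually enough. The sketch below is not part of my implementation: the 4-characters-per-token ratio is a rough heuristic for English text, and `budgetFraction`, `contextWindowTokens`, and `usdPerMillionTokens` are placeholder parameters you would fill in from your provider's current limits and pricing.

```typescript
// Assumed shape of one knowledge base entry (question plus answer).
interface KBEntry {
  question: string;
  answer: string;
}

// Rough heuristic: about 4 characters per token for English text.
// Real tokenizers differ, so treat the result as an order-of-magnitude estimate.
const CHARS_PER_TOKEN = 4;

// Estimates the tokens the serialized knowledge base adds to the system prompt.
function estimateTokens(entries: KBEntry[]): number {
  const chars = entries.reduce(
    (sum, e) => sum + `Q: ${e.question}\nA: ${e.answer}\n\n`.length,
    0,
  );
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Input cost in US dollars of sending `tokens` once.
// usdPerMillionTokens is an assumption: look up your provider's input price.
function estimateInputCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

// True while the knowledge base uses at most `budgetFraction` of the context window.
// The 0.25 default is an arbitrary illustrative threshold, not a recommendation.
function fitsFullInjection(
  tokens: number,
  contextWindowTokens: number,
  budgetFraction = 0.25,
): boolean {
  return tokens <= contextWindowTokens * budgetFraction;
}
```

Because the system prompt is sent with every request, the input cost applies to each turn of a conversation, not once per visitor.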
The Supabase Schema
```sql
create table public.personal_ai_entries (
  id uuid primary key default gen_random_uuid(),
  question text not null,
  answer text not null,
  created_at timestamptz not null default now()
);
```
The table allows public reads and authenticated writes. The entries are fetched on the client side and passed to the chat API alongside the conversation history.
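The client-side half of that flow might look roughly like the sketch below. The route name `/api/personal-ai/entries`, the `FetchLike` type, and the helper names are all hypothetical; the only parts taken from this post are the `question`/`answer` columns and the `{ messages, knowledgebase }` request body that the chat endpoint reads.

```typescript
interface KBEntry {
  question: string;
  answer: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Minimal subset of fetch, so the loader can be exercised with a stub.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical route name; substitute your own public entries endpoint.
const ENTRIES_URL = "/api/personal-ai/entries";

// Loads every entry once, when the chat opens.
async function loadEntries(fetchImpl: FetchLike): Promise<KBEntry[]> {
  const res = await fetchImpl(ENTRIES_URL);
  if (!res.ok) throw new Error(`Failed to load entries: ${res.status}`);
  const rows = (await res.json()) as KBEntry[];
  // Keep only the two fields the prompt needs; drop id and created_at.
  return rows.map(({ question, answer }) => ({ question, answer }));
}

// Builds the JSON body the chat endpoint expects.
function buildChatBody(messages: ChatMessage[], knowledgebase: KBEntry[]): string {
  return JSON.stringify({ messages, knowledgebase });
}
```

In a browser you would pass the global `fetch` as `fetchImpl`, keep the returned entries in component state, and send `buildChatBody(...)` as the POST body on every turn.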
The API Route
The chat endpoint receives the messages and the preloaded knowledge base from the client:
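```typescript
import { NextRequest } from "next/server";
// buildSystemPrompt is defined above; streamOpenRouter is a project helper (not shown).

export async function POST(req: NextRequest) {
  // The client sends the conversation so far plus the preloaded knowledge base.
  const { messages, knowledgebase } = await req.json();
  const systemPrompt = buildSystemPrompt(knowledgebase ?? []);

  // Prepend the system prompt so it applies to every turn.
  const stream = await streamOpenRouter({
    messages: [{ role: "system", content: systemPrompt }, ...messages],
    model: "anthropic/claude-sonnet-4-5",
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```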
Streaming keeps the response feeling fast. The client handles the SSE stream and renders incrementally.
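For the client side, a minimal sketch of incremental SSE handling could look like this. It is not my actual client code: `extractSSEData` and `readSSE` are illustrative names, and what each `data:` payload contains (raw text, or a JSON chunk to parse) depends on what your streaming helper forwards, so that part is left to the `onData` callback.

```typescript
// Splits buffered SSE text into complete `data:` payloads plus the unfinished remainder.
function extractSSEData(buffer: string): { payloads: string[]; rest: string } {
  // Events are separated by a blank line; the last piece may still be incomplete.
  const parts = buffer.split(/\r?\n\r?\n/);
  const rest = parts.pop() ?? "";
  const payloads: string[] = [];
  for (const part of parts) {
    const data = part
      .split(/\r?\n/)
      // Ignore comment lines (starting with ":") and other fields such as "event:".
      .filter((line) => line.startsWith("data:"))
      // Strip the field name and the single optional space that follows it.
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data) payloads.push(data);
  }
  return { payloads, rest };
}

// Reads a streamed response body and calls onData once per complete payload.
async function readSSE(
  body: ReadableStream<Uint8Array>,
  onData: (payload: string) => void,
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    buffer += decoder.decode(value, { stream: true });
    const { payloads, rest } = extractSSEData(buffer);
    buffer = rest;
    payloads.forEach(onData);
  }
}
```

A chat component would call `readSSE(response.body, ...)` and append each payload's text to the message being rendered, which is what produces the incremental effect.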
Building the Knowledge Base
The technical setup takes a few hours. Building the knowledge base is the actual work.
I approached it like a structured interview with myself. The categories I covered: how I work, what I've built and why, how I think about pricing and clients, what I believe about product development, things I've changed my mind about, what a typical week looks like.
The entries that produce the best chat responses are the ones where I resisted the urge to sound good and just described things accurately. "I prefer working alone because collaboration overhead slows me down" is more useful than a diplomatic version that hedges. The model mirrors your voice — if you're vague, it's vague.
The knowledge base also tells you something about yourself. Writing 80 honest Q&A pairs about how you think and work is a reasonably useful exercise independent of the AI application.
What the "Business" Part Means
Beyond the personal site chatbot, I extended the same pattern to my dashboard. The AI there has access to live context — pending todos, site statuses, recent journal entries, subscriber counts, active subscriptions across all seven products — injected alongside the same personal knowledge base.
This creates something qualitatively different from a standard AI assistant: it knows the current state of everything I'm running, and it knows how I think about those things from the knowledge base. The combination is what makes it feel like talking to someone who understands the situation rather than explaining it from scratch every time.
The architecture for that is the same: system prompt injection, no vectors, just context. The live data comes from the same Supabase tables the dashboard reads from.
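As a rough illustration of how live data and the knowledge base can share one system prompt, here is a sketch. The `LiveContext` shape, its field names, and `buildDashboardPrompt` are hypothetical and cover only a few of the data sources mentioned above; the Q/A serialization mirrors `buildSystemPrompt`.

```typescript
interface KBEntry {
  question: string;
  answer: string;
}

// Hypothetical snapshot of dashboard state; field names are illustrative only.
interface LiveContext {
  pendingTodos: string[];
  siteStatuses: Record<string, string>;
  subscriberCount: number;
}

// Combines a timestamped snapshot of live data with the static knowledge base.
function buildDashboardPrompt(entries: KBEntry[], live: LiveContext, now: Date): string {
  const kb = entries.map((e) => `Q: ${e.question}\nA: ${e.answer}`).join("\n\n");
  const todos = live.pendingTodos.map((t) => `- ${t}`).join("\n") || "(none)";
  const sites =
    Object.entries(live.siteStatuses)
      .map(([name, status]) => `- ${name}: ${status}`)
      .join("\n") || "(none)";

  return [
    // The timestamp lets the model tell how fresh the snapshot is.
    `LIVE CONTEXT (as of ${now.toISOString()}):`,
    `Pending todos:\n${todos}`,
    `Site statuses:\n${sites}`,
    `Subscribers: ${live.subscriberCount}`,
    "",
    "KNOWLEDGEBASE (treat this as ground truth):",
    kb,
  ].join("\n");
}
```

Keeping the live section separate from the knowledge base section makes it easier to see, when reading a logged prompt, which facts are current state and which are standing views.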
The Honest Limitation
This only works as well as the knowledge base you build. A sparse or vague knowledge base produces a sparse and vague AI. The model can't add specificity it doesn't have.
The other limitation: it's static in time. If my views on something change, the knowledge base needs to be updated manually. There's no mechanism for it to learn from conversations or update its own entries. For a personal site that I control, that's a feature — I decide what's in it. But it means maintenance is ongoing.