Building a Knowledge Base Chat with Supabase and Claude
A complete walkthrough of building a RAG-based knowledge base chat: pgvector schema, embedding models, the retrieval function, chunking strategy, and the Claude prompt pattern — including the gotchas that cause silent failures.
The basic idea is simple: store your documents in Supabase, embed them as vectors, search for the relevant chunks when a user asks a question, and pass those chunks to Claude to generate an answer. The implementation has a handful of non-obvious decisions that determine whether the system actually works in production.
This is the full setup — schema, embeddings, retrieval function, and the Claude prompt pattern — with the gotchas that cost the most time.
When to Use This vs. Full-Context Injection
If your knowledge base is small and curated — say, under 100 documents — you can often skip vector search entirely and inject all of it directly into the system prompt. Simpler, no embedding costs, and retrieval can't fail because there's no retrieval step.
Vector search becomes worth it when the knowledge base is large enough that injecting everything would overflow the context window or produce worse answers from noise. The crossover is roughly when your documents stop fitting comfortably in 50–100k tokens. Below that threshold, consider whether RAG complexity is actually necessary.
This post covers the RAG approach for when you genuinely need it.
Schema Setup
Enable pgvector and create your documents table. Use halfvec instead of vector — it stores embeddings as 16-bit floats rather than 32-bit, cutting storage in half with negligible quality loss, and it unlocks HNSW indexing for models with more than 2000 dimensions.
create extension if not exists vector with schema extensions;

create table documents (
  id bigint primary key generated always as identity,
  title text not null,
  content text not null,
  embedding extensions.halfvec(1536),
  metadata jsonb,
  created_at timestamptz default now()
);

create index on documents
  using hnsw (embedding extensions.halfvec_cosine_ops);
The HNSW index is the right default — unlike IVFFlat, it updates incrementally on write, so you don't need to rebuild it as you add documents. IVFFlat requires the table to already contain data before you create the index (its clusters are derived from the existing data distribution); build it on an empty table and the clusters are meaningless, so recall collapses silently.
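To actually hit that index at query time, you need a retrieval function. Here's a minimal sketch — the function name `match_documents` and its parameters are illustrative choices, not fixed API; `<=>` is pgvector's cosine-distance operator, so similarity is `1 - distance`, and the `order by` on the raw distance expression is what lets Postgres use the HNSW index:

```sql
-- Sketch of a retrieval function (names are illustrative).
-- Returns the chunks closest to a query embedding by cosine similarity.
create or replace function match_documents(
  query_embedding extensions.halfvec(1536),
  match_count int default 5
)
returns table (id bigint, title text, content text, similarity float)
language sql stable
as $$
  select
    d.id,
    d.title,
    d.content,
    1 - (d.embedding <=> query_embedding) as similarity
  from documents d
  order by d.embedding <=> query_embedding
  limit match_count;
$$;
```

Defined this way, the function is callable from the client as a Supabase RPC (e.g. `supabase.rpc('match_documents', { query_embedding, match_count })`), which keeps the distance math in the database where the index lives.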