Building a Knowledge Base Chat with Supabase and Claude
A complete walkthrough of building a RAG-based knowledge base chat: pgvector schema, embedding models, the retrieval function, chunking strategy, and the Claude prompt pattern — including the gotchas that cause silent failures.
The basic idea is simple: store your documents in Supabase, embed them as vectors, search for the relevant chunks when a user asks a question, and pass those chunks to Claude to generate an answer. The implementation has a handful of non-obvious decisions that determine whether the system actually works in production.
This is the full setup — schema, embeddings, retrieval function, and the Claude prompt pattern — with the gotchas that cost the most time.
When to Use This vs. Full-Context Injection
If your knowledge base is small and curated — say, under 100 documents — you can often skip vector search entirely and inject all of it directly into the system prompt. Simpler, no embedding costs, and retrieval can't fail because there's no retrieval step.
Vector search becomes worth it when the knowledge base is large enough that injecting everything would overflow the context window or produce worse answers from noise. The crossover is roughly when your documents stop fitting comfortably in 50–100k tokens. Below that threshold, consider whether RAG complexity is actually necessary.
This post covers the RAG approach for when you genuinely need it.
Schema Setup
Enable pgvector and create your documents table. Use halfvec instead of vector — it stores embeddings as 16-bit floats rather than 32-bit, cutting storage in half with negligible quality loss, and it unlocks HNSW indexing for models with more than 2000 dimensions.
create extension if not exists vector with schema extensions;

create table documents (
  id bigint primary key generated always as identity,
  title text not null,
  content text not null,
  embedding extensions.halfvec(1536),
  metadata jsonb,
  created_at timestamptz default now()
);

create index on documents
  using hnsw (embedding extensions.halfvec_cosine_ops);
The HNSW index is the right default — unlike IVFFlat, it updates incrementally on write, so you don't need to rebuild it as you add documents. IVFFlat requires the table to already contain data before you create the index (its clusters are derived from the existing data distribution); build it on an empty table and the clusters are meaningless, so recall collapses silently.
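To actually hit that index at query time, you need a retrieval function. Here's a minimal sketch — the function name `match_documents` and its parameters are illustrative choices, not fixed API; `<=>` is pgvector's cosine-distance operator, so similarity is `1 - distance`, and the `order by` on the raw distance expression is what lets Postgres use the HNSW index:

```sql
-- Sketch of a retrieval function (names are illustrative).
-- Returns the chunks closest to a query embedding by cosine similarity.
create or replace function match_documents(
  query_embedding extensions.halfvec(1536),
  match_count int default 5
)
returns table (id bigint, title text, content text, similarity float)
language sql stable
as $$
  select
    d.id,
    d.title,
    d.content,
    1 - (d.embedding <=> query_embedding) as similarity
  from documents d
  order by d.embedding <=> query_embedding
  limit match_count;
$$;
```

Defined this way, the function is callable from the client as a Supabase RPC (e.g. `supabase.rpc('match_documents', { query_embedding, match_count })`), which keeps the distance math in the database where the index lives.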