Case study · RAG search

Cited RAG search across 2M legal documents

Search that answers with citations across two million documents — reranked, redacted, and audit-trailed for a legal environment.

p95 search latency: <400ms

Client: Legal services firm
Industry: Legal
Team: 120 fee earners
Timeline: 9 weeks
Codename: Mosaic

01 · Problem

Institutional knowledge, unfindable

Twenty years of precedent lived across document management silos. Finding relevant prior work meant asking senior people or re-doing research the firm had already paid for.

02 · Why it mattered

The cost of leaving it alone

Associates burned billable hours reconstructing work that already existed. Worse, inconsistent use of precedent created a quality risk the partners could feel but not measure.

03 · Architecture

Retrieval with receipts

A retrieval pipeline where every answer carries citations to source documents, redaction rules run before display, and every query is audit-logged.

2.1M documents chunked into 12.4M segments, embedded with voyage-3-large
pgvector + HNSW for retrieval, cross-encoder reranking on top
Redaction layer strips privileged and client-identifying content according to matter walls
Every query and result set is audit-trailed for compliance review

Stack: Pinecone · Claude · pgvector · Voyage embeddings
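
The flow described above (retrieve, rerank, redact, log) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production code: the `Chunk` fields, the `search` signature, the term-overlap scorer standing in for the cross-encoder, and the in-memory cosine scan standing in for the pgvector HNSW index are all hypothetical.

```python
import math
import time
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Chunk:
    chunk_id: str
    doc_id: str            # source document, used as the citation
    matter_id: str         # hypothetical field: matter the document belongs to
    text: str
    embedding: List[float]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k):
    # Brute-force stand-in for the vector index. With pgvector this step
    # would be a SQL query ordered by the cosine-distance operator (<=>).
    return sorted(index, key=lambda c: cosine(query_vec, c.embedding), reverse=True)[:k]


def rerank(query, chunks, top_n):
    # Placeholder for a cross-encoder: scores each chunk by query-term overlap.
    terms = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(terms & set(c.text.lower().split())),
        reverse=True,
    )[:top_n]


def redact(chunks, allowed_matters):
    # Assumed matter-wall rule: mask text from matters the user is not cleared for.
    return [
        c if c.matter_id in allowed_matters else replace(c, text="[REDACTED]")
        for c in chunks
    ]


def search(query, query_vec, index, user, allowed_matters, audit_log, k=20, top_n=3):
    candidates = retrieve(query_vec, index, k)
    ranked = rerank(query, candidates, top_n)
    visible = redact(ranked, allowed_matters)   # redaction runs before display
    results = [
        {"citation": c.doc_id, "chunk_id": c.chunk_id, "text": c.text}
        for c in visible
    ]
    # Every query and its result set is recorded for later review.
    audit_log.append({
        "ts": time.time(),
        "user": user,
        "query": query,
        "results": [r["chunk_id"] for r in results],
    })
    return results
```

The ordering is the point of the sketch: redaction sits between ranking and display, and the audit entry is written on every call, so nothing reaches the user that is not both filtered and logged.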

04 · Implementation

How it was built

Weeks 1–3: corpus ingestion, chunking strategy, and embedding pipeline
Weeks 4–5: retrieval quality tuning against a partner-built eval set of 200 questions
Weeks 6–7: redaction rules, matter walls, and audit logging with compliance
Weeks 8–9: rollout to two practice groups, then firm-wide

05 · Results

What the numbers say

p95 latency: 380ms
Answers cited: 100%
Eval precision: 0.91
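
For readers less familiar with the metric, p95 latency is the value that 95% of requests come in at or under. A minimal sketch of one common way to compute it, the nearest-rank method, is shown here; how the figure was actually measured for this project is not stated, so treat the function below as an illustration only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]
```

Calling `percentile(latencies_ms, 95)` on a list of per-request latencies returns the p95 in the same unit as the samples.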

06 · After launch

What happened next

Usage settled at ~900 queries a day. The eval set grew into a living quality benchmark: every index update replays it, so retrieval quality is a number, not an opinion.
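
Replaying the eval set on every index update can be sketched as below. This assumes the metric is precision@k averaged over questions and that each eval item pairs a question with a set of relevant document ids; the source does not specify either, and `replay`, `gate`, and the tolerance value are hypothetical names and choices.

```python
def precision_at_k(retrieved, relevant, k):
    # Fraction of the top-k retrieved ids that are marked relevant.
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def replay(eval_set, search_fn, k=5):
    # eval_set: list of (question, set_of_relevant_doc_ids)
    # search_fn: callable returning a ranked list of doc ids for a question
    scores = [precision_at_k(search_fn(q), relevant, k) for q, relevant in eval_set]
    return sum(scores) / len(scores)


def gate(score, baseline, tolerance=0.01):
    # Pass the index update only if quality has not dropped beyond the tolerance.
    return score >= baseline - tolerance
```

Run against the new index before it goes live, a check like `gate(replay(eval_set, search_fn), baseline)` turns a quality regression into a failed check that blocks the update.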

This system is an example of AI Agents & Internal Assistants work.
