Case study · RAG search

Cited RAG search across 2M legal documents

Search that answers with citations across two million documents — reranked, redacted, and audit-trailed for a legal environment.

<400ms
p95 search latency
Client: Legal services firm · Industry: Legal · Team: 120 fee earners · Timeline: 9 weeks · Codename: Mosaic

01 · Problem

Institutional knowledge, unfindable

Twenty years of precedent lived across document management silos. Finding relevant prior work meant asking senior people or redoing research the firm had already paid for.

02 · Why it mattered

The cost of leaving it alone

Associates burned billable hours reconstructing work that already existed. Worse, inconsistent use of precedent created a quality risk the partners could feel but not measure.

03 · Architecture

Retrieval with receipts

A retrieval pipeline where every answer carries citations to source documents, redaction rules run before display, and every query is audit-logged.

  • 2.1M documents chunked into 12.4M segments, embedded with voyage-3-large
  • pgvector + HNSW for retrieval, cross-encoder reranking on top
  • Redaction layer strips privileged and client-identifying content by matter walls
  • Every query and result set is audit-trailed for compliance review
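
The answer path described above (enforce matter walls, retrieve, rerank, redact, cite, audit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the firm's implementation. A linear cosine scan stands in for the pgvector HNSW index, and `rerank` is any caller-supplied scoring function standing in for the cross-encoder. `Segment`, `search`, the `CLIENT-<digits>` redaction pattern, and the JSON audit record shape are all hypothetical.

```python
import json
import math
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Segment:
    """One chunk of a source document (hypothetical schema)."""
    segment_id: str
    doc_id: str
    matter_id: str
    text: str
    embedding: list[float]


# Hypothetical redaction rule: mask client identifiers of the form CLIENT-<digits>.
CLIENT_ID = re.compile(r"\bCLIENT-\d+\b")


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: str,
    query_vec: list[float],
    segments: list[Segment],
    walled_matters: set[str],
    rerank: Callable[[str, str], float],
    audit_log: list[str],
    user: str,
    k: int = 50,
    top_n: int = 5,
) -> list[dict]:
    # Matter walls: segments from matters this user is walled off from are never ranked.
    visible = [s for s in segments if s.matter_id not in walled_matters]
    # Stage 1: coarse vector retrieval (a linear scan standing in for an HNSW index).
    candidates = sorted(visible, key=lambda s: cosine(query_vec, s.embedding), reverse=True)[:k]
    # Stage 2: rerank candidates with a query/text scorer (stand-in for a cross-encoder).
    ranked = sorted(candidates, key=lambda s: rerank(query, s.text), reverse=True)[:top_n]
    # Redact before display; every result carries a document#segment citation.
    results = [
        {"citation": f"{s.doc_id}#{s.segment_id}", "text": CLIENT_ID.sub("[REDACTED]", s.text)}
        for s in ranked
    ]
    # Audit trail: one append-only JSON record per query and its result set.
    audit_log.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "citations": [r["citation"] for r in results],
    }))
    return results
```

Applying the wall filter before any ranking, as here, means a walled-off document can neither appear in nor influence the results. With a real approximate-nearest-neighbour index, the same guarantee typically means pushing the filter into the index query, because post-filtering a fixed top-k can leave too few results.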

Stack: Pinecone · Claude · pgvector · Voyage embeddings

04 · Implementation

How it was built

  • Weeks 1–3: corpus ingestion, chunking strategy, and embedding pipeline
  • Weeks 4–5: retrieval quality tuning against a partner-built eval set of 200 questions
  • Weeks 6–7: redaction rules, matter walls, and audit logging, built with compliance
  • Weeks 8–9: rollout to two practice groups, then firm-wide

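
The case study reports 2.1M documents split into 12.4M segments, roughly six segments per document on average, but does not describe the chunking strategy itself. As an illustration only, here is a naive fixed-window chunker with overlap. `chunk_words` and its `size` and `overlap` defaults are hypothetical placeholders, not the project's settings.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    The defaults are hypothetical placeholders, not the project's settings.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a window boundary retrievable from either side. Legal documents often chunk better along their own structure (clauses, headings, numbered paragraphs), so a structure-aware splitter is a common alternative to fixed windows.
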
05 · Results

What the numbers say

  • p95 latency: 380ms
  • Answers carrying citations: 100%
  • Eval precision: 0.91

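
A p95 latency figure means 95% of queries completed at or below that time. The case study does not say how its percentile was computed. The sketch below uses the nearest-rank method, one common definition; interpolating methods give slightly different values on small samples. `p95` is a hypothetical helper.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample such that at least
    95% of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]  # rank is 1-based
```

Integer arithmetic for the rank avoids floating-point surprises when `0.95 * n` lands near a whole number.
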
06 · After launch

What happened next

Usage settled at ~900 queries a day. The eval set grew into a living quality benchmark: every index update replays it, so retrieval quality is a number, not an opinion.
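
The case study does not define its "eval precision" metric. Assuming it is mean precision@k over the partner-built questions, each paired with the IDs of documents judged relevant, a replay-on-every-index-update check could look like this. `precision_at_k`, `replay`, `gate`, `k=5`, and the `tolerance` value are hypothetical choices, not details from the project.

```python
from typing import Callable

# Hypothetical eval-set shape: (question, ids of documents judged relevant).
EvalSet = list[tuple[str, set[str]]]


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def replay(eval_set: EvalSet, retrieve: Callable[[str], list[str]], k: int = 5) -> float:
    """Mean precision@k over every question (assumes a non-empty eval set)."""
    scores = [precision_at_k(retrieve(q), relevant, k) for q, relevant in eval_set]
    return sum(scores) / len(scores)


def gate(eval_set: EvalSet, retrieve: Callable[[str], list[str]], baseline: float,
         tolerance: float = 0.02, k: int = 5) -> tuple[bool, float]:
    """Pass an index update only if the replayed score stays within `tolerance` of `baseline`."""
    score = replay(eval_set, retrieve, k)
    return score >= baseline - tolerance, score
```

With a recorded baseline such as the 0.91 reported above, an index update whose replayed score fell more than the tolerance below it would be flagged before release instead of being discovered by users.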

This system is an example of AI Agents & Internal Assistants work.


Need a similar system?

Let's talk through your version of this — same architecture thinking, scoped to your operations and tools.

30 minutes · no pitch deck · reply within 24h if you write instead
