Abstract (30 sec)

Retrieval‑augmented generation shines when answers cite evidence you can audit. Skip magic prompts; use small, testable patterns that tie claims to sources and fail honestly when evidence is thin.

Overview (2 min)

  • **Goal.** Useful, defensible answers: grounded text + citations + uncertainty when needed.
  • **Method.** Treat RAG as a *pipeline*: normalize → retrieve → compose → verify → cite.
  • **Discipline.** Evaluate faithfulness, coverage, and *answerability*; then tune retrieval *before* clever prompting.

1) Anti‑patterns to avoid

  • “Prompt harder.” Masking poor retrieval with verbose prompts.
  • “Vector monoculture.” One embedding space for everything.
  • “Citation cosplay.” Dumping links that weren’t actually used to craft the answer.

2) Three small patterns that carry far

A) Attribution‑First Compose

Compose from snippets you’ve actually retrieved; attach cites at the sentence level.

  • Retrieve k passages with score + source.
  • Compose answer only from those passages.
  • Emit a claims→evidence map.

Why: prevents unsupported statements; reviewers can audit fast.
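
Here's a minimal sketch of the claims→evidence map, assuming a `Passage` record from your retriever; the token‑overlap heuristic is a stand‑in for a real entailment check, and every name here (`Passage`, `claims_to_evidence`, `min_overlap`) is illustrative, not a fixed API.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str   # source identifier for citations
    text: str     # retrieved snippet
    score: float  # retriever score

def claims_to_evidence(draft_sentences: list[str],
                       passages: list[Passage],
                       min_overlap: float = 0.5) -> dict[str, list[str]]:
    """Map each answer sentence to the passages that support it.

    Support is crude token overlap here; swap in an entailment model
    in production. Unsupported sentences are dropped, not emitted.
    """
    mapping: dict[str, list[str]] = {}
    for sent in draft_sentences:
        tokens = set(sent.lower().split())
        supporters = [
            p.doc_id for p in passages
            if tokens
            and len(tokens & set(p.text.lower().split())) / len(tokens) >= min_overlap
        ]
        if supporters:  # keep only claims we can cite
            mapping[sent] = supporters
    return mapping
```

Dropping unsupported sentences is the point: the answer can only say what retrieval can back, and the map doubles as the reviewer's audit trail.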

B) Query Routing (Cheap & Cheerful)

Route queries to the right retriever/index before embedding.

  • Simple router: keyword heuristics + intent classifier.
  • Separate indexes for policy, product, code, etc.
  • Fall back to a general index if confidence is low.

Why: improves recall without inflating context windows.
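
A router this cheap really does carry far. The sketch below is keyword heuristics only; the route names and regexes are assumptions to replace with your own taxonomy, and an intent classifier can slot in beside the regex scores.

```python
import re

# Hypothetical route names; point them at whatever indexes you actually run.
ROUTES = {
    "policy":  re.compile(r"\b(policy|compliance|gdpr|retention)\b", re.I),
    "product": re.compile(r"\b(pricing|plan|feature|sku)\b", re.I),
    "code":    re.compile(r"\b(api|sdk|endpoint|traceback)\b", re.I),
}

def route(query: str, min_hits: int = 1) -> str:
    """Pick the index whose keywords match most; fall back to 'general'."""
    scores = {name: len(pat.findall(query)) for name, pat in ROUTES.items()}
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return best if hits >= min_hits else "general"  # low confidence -> general index

# route("What does our GDPR retention policy say?")  # -> "policy"
```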

C) Hybrid Retrieval with “Must‑Include” Filters

Blend sparse + dense; enforce filters like date, author, or section.

  • BM25 (sparse) + embeddings (dense) with weighted merge.
  • Hard filters: date > last_policy_update or doc_type: FAQ.
  • Deduplicate near‑identical chunks before composing.

Why: fewer near‑duplicate distractors; fresher, on‑topic evidence.
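
A minimal sketch of the weighted merge, assuming each retriever returns a doc_id→score map and that the hard filters (date, doc_type) were already applied inside the retrievers. Min‑max normalization makes BM25 and cosine scores comparable; reciprocal‑rank fusion is a common drop‑in alternative.

```python
def hybrid_merge(sparse: dict[str, float],   # doc_id -> BM25 score
                 dense: dict[str, float],    # doc_id -> cosine similarity
                 alpha: float = 0.5,         # weight on the sparse side
                 top_k: int = 10) -> list[tuple[str, float]]:
    """Blend sparse and dense scores after per-list min-max normalization."""
    def norm(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores tie
        return {d: (s - lo) / span for d, s in scores.items()}

    s, d = norm(sparse), norm(dense)
    merged = {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
              for doc in set(s) | set(d)}
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Deduplicate on the merged list (a cosine threshold or MinHash both work) before handing chunks to the composer.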

3) Guardrails & evaluation you can explain

Minimal eval set (start here)

  • 25–50 real queries with gold citations.
  • Label: faithfulness (0/1), coverage (0–2), answerable? (Y/N).
  • Track retrieval metrics (recall@k, MRR) and answer metrics (exact‑cite rate).
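
Both retrieval metrics fit in a few lines. A sketch, assuming gold citations are stored per query as a set of doc_ids:

```python
def recall_at_k(gold: set[str], retrieved: list[str], k: int) -> float:
    """Fraction of gold passages that show up in the top-k retrieved."""
    if not gold:
        return 0.0
    return len(gold & set(retrieved[:k])) / len(gold)

def mrr(gold: set[str], retrieved: list[str]) -> float:
    """Reciprocal rank of the first relevant passage; 0 if none retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in gold:
            return 1.0 / rank
    return 0.0
```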

CI‑style checks

  • Regression on recall@k when indexes change.
  • Drift alarms when answers cite stale sources.
  • Red‑team prompts for jailbreaking and over‑claiming.
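
The recall regression can be a plain pytest gate. A sketch reusing `recall_at_k` from above; `eval_queries` and the baseline number are placeholders for your own eval set and last accepted index build:

```python
BASELINE_RECALL_AT_5 = 0.82  # assumed value, pinned when the index last shipped

def eval_queries():
    """Stub: yield (gold_ids, retrieved_ids) per query.
    In practice, read your CSV eval sheet and call the live retriever."""
    yield {"doc_a"}, ["doc_a", "doc_b"]

def test_recall_regression():
    scores = [recall_at_k(gold, retrieved, k=5)
              for gold, retrieved in eval_queries()]
    mean_recall = sum(scores) / len(scores)
    # Fail the build if a reindex silently degrades retrieval.
    assert mean_recall >= BASELINE_RECALL_AT_5 - 0.02, (
        f"recall@5 dropped to {mean_recall:.2f} (baseline {BASELINE_RECALL_AT_5})")
```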

4) Production notes

  • Chunking: align to semantic units (sections), store char offsets for precise cites.
  • Metadata: source, version, updated_at → enables “freshness” filters.
  • Transparency UI: show citations inline; let users expand the passage.
  • Honest fallback: if coverage < threshold → “Insufficient evidence” with suggested queries.
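
The chunking, metadata, and fallback notes fit in a single sketch: a chunk record carrying offsets and freshness metadata, plus the honest fallback. Field names and the threshold are assumptions to adapt:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Chunk:
    source: str           # e.g. "handbook.md"
    version: str
    updated_at: datetime  # enables freshness filters at query time
    start: int            # char offsets into the source document,
    end: int              # so a citation can point at the exact span
    text: str

COVERAGE_THRESHOLD = 0.6  # assumed tuning knob; calibrate on your eval set

def answer_or_abstain(coverage: float, answer: str) -> str:
    """Honest fallback: abstain when evidence is too thin to stand behind."""
    if coverage < COVERAGE_THRESHOLD:
        return ("Insufficient evidence. Try narrowing the question "
                "or naming the product/version.")
    return answer
```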

5) A reference flow (print it and pin it up)

User Q
  → Normalize (lowercase, strip boilerplate, detect intent)
  → Route (policy|product|code|general)
  → Retrieve (hybrid; filters; top-k)
  → Compose (from retrieved; sentence-level cites)
  → Verify (faithfulness; coverage; answerable?)
  → Output (answer + cites + confidence + follow-ups)
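
If you prefer the flow as code, one way to wire it is with each stage injected as a callable, so every box in the diagram stays independently testable. Stage names mirror the diagram and are illustrative only:

```python
def rag_pipeline(user_q: str, normalize, route, retrieve, compose, verify) -> dict:
    """Run the reference flow end to end; each argument is one pipeline stage."""
    q = normalize(user_q)                    # lowercase, strip boilerplate, intent
    passages = retrieve(q, index=route(q))   # hybrid merge + filters + top-k
    draft = compose(q, passages)             # sentences drawn only from passages
    evidence = verify(draft, passages)       # claims -> evidence map
    coverage = len(evidence) / max(len(draft), 1)
    return {
        "answer": " ".join(evidence),        # only cited sentences survive
        "citations": evidence,
        "confidence": round(coverage, 2),
    }
```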

Downloadables (optional)

  • Starter eval sheet (CSV)
  • Query router rules (YAML)
  • Claims→Evidence template (MD)

CTA. Want the eval starter kit? Email me; I’ll share a lean, no‑nonsense pack.