Abstract (30 sec). Retrieval‑augmented generation shines when answers cite evidence you can audit. Skip magic prompts; use small, testable patterns that tie claims to sources and fail honestly when evidence is thin.
Overview (2 min)
- **Goal.** Useful, defensible answers: grounded text + citations + uncertainty when needed.
- **Method.** Treat RAG as a *pipeline*: normalize → retrieve → compose → verify → cite.
- **Discipline.** Evaluate faithfulness, coverage, and *answerability*; then tune retrieval *before* clever prompting.
1) Anti‑patterns to avoid
- “Prompt harder.” Masking poor retrieval with verbose prompts.
- “Vector monoculture.” One embedding space for everything.
- “Citation cosplay.” Dumping links that weren’t actually used to craft the answer.
2) Three small patterns that carry far
A) Attribution‑First Compose
Compose from snippets you’ve actually retrieved; attach cites at the sentence level.
- Retrieve k passages with score + source.
- Compose answer only from those passages.
- Emit a claims→evidence map.
Why: prevents unsupported statements; reviewers can audit fast.
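A minimal sketch of that claims→evidence map, assuming a retriever that hands back passages with an id, source, and score (the names and shapes here are illustrative, not a fixed schema):

```python
from dataclasses import dataclass

@dataclass
class Passage:
    id: str
    source: str
    text: str
    score: float

def compose_with_cites(sentences_and_support: list[tuple[str, list[Passage]]]) -> dict:
    """Build an answer plus a claims→evidence map.

    Each answer sentence is paired with the passages that support it;
    sentences with no support are dropped rather than emitted unsupported.
    """
    answer_parts, claims_to_evidence = [], {}
    for sentence, support in sentences_and_support:
        if not support:  # honest failure: never emit an unsupported claim
            continue
        cite_ids = [p.id for p in support]
        answer_parts.append(f"{sentence} [{', '.join(cite_ids)}]")
        claims_to_evidence[sentence] = [
            {"id": p.id, "source": p.source, "score": p.score} for p in support
        ]
    return {"answer": " ".join(answer_parts), "claims_to_evidence": claims_to_evidence}
```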
B) Query Routing (Cheap & Cheerful)
Route queries to the right retriever/index before embedding.
- Simple router: keyword heuristics + intent classifier.
- Separate indexes for policy, product, code, etc.
- Fall back to a general index if confidence is low.
Why: improves recall without inflating context windows.
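A cheap-and-cheerful router in that spirit; the index names and keyword lists below are placeholders you would swap for your own:

```python
ROUTES = {
    "policy": ["policy", "compliance", "retention", "gdpr"],
    "product": ["pricing", "plan", "feature", "sku"],
    "code": ["api", "sdk", "endpoint", "traceback"],
}

def route_query(query: str, min_hits: int = 1) -> str:
    """Pick an index by keyword heuristics; fall back to 'general' on low confidence."""
    q = query.lower()
    scores = {name: sum(kw in q for kw in kws) for name, kws in ROUTES.items()}
    best_index, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_index if best_score >= min_hits else "general"

# route_query("What is our data retention policy?")  -> "policy"
```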
C) Hybrid Retrieval with “Must‑Include” Filters
Blend sparse + dense; enforce filters like date, author, or section.
- BM25 (sparse) + embeddings (dense) with weighted merge.
- Hard filters: date > last_policy_update or doc_type: FAQ.
- Deduplicate near‑identical chunks pre‑compose.
Why: fewer near‑duplicate distractors; fresher, on‑topic evidence.
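One way to sketch the weighted merge, hard filters, and dedup, assuming you already have per-chunk BM25 and dense scores keyed by chunk id (field names are assumptions):

```python
from datetime import date

def hybrid_retrieve(chunks, sparse_scores, dense_scores,
                    last_policy_update: date, alpha: float = 0.5, k: int = 8):
    """Blend sparse + dense scores, enforce hard filters, drop near-duplicate chunks."""
    merged = []
    for chunk in chunks:
        # Example hard filters from above: fresh docs or FAQ entries only.
        if not (chunk["updated_at"] > last_policy_update or chunk["doc_type"] == "FAQ"):
            continue
        score = alpha * sparse_scores[chunk["id"]] + (1 - alpha) * dense_scores[chunk["id"]]
        merged.append((score, chunk))
    merged.sort(key=lambda sc: sc[0], reverse=True)

    seen, results = set(), []
    for score, chunk in merged:
        fingerprint = chunk["text"][:200]  # crude near-duplicate check on the leading text
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        results.append(chunk)
        if len(results) == k:
            break
    return results
```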
3) Guardrails & evaluation you can explain
Minimal eval set (start here)
- 25–50 real queries with gold citations.
- Label: faithfulness (0/1), coverage (0–2), answerable? (Y/N).
- Track retrieval metrics (recall@k, MRR) and answer metrics (exact‑cite rate).
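A small scorer for the retrieval half of that sheet, assuming each eval row carries the gold citation ids and the retrieved ids in rank order (the data shape is an assumption):

```python
def recall_at_k(gold_ids: set[str], retrieved_ids: list[str], k: int) -> float:
    """Fraction of gold citations found in the top-k retrieved passages."""
    if not gold_ids:
        return 0.0
    hits = gold_ids & set(retrieved_ids[:k])
    return len(hits) / len(gold_ids)

def mrr(gold_ids: set[str], retrieved_ids: list[str]) -> float:
    """Reciprocal rank of the first gold citation; 0 if none was retrieved."""
    for rank, pid in enumerate(retrieved_ids, start=1):
        if pid in gold_ids:
            return 1.0 / rank
    return 0.0

# Average these over the 25–50 labeled queries to get the tracked numbers.
```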
CI‑style checks
- Regression on recall@k when indexes change.
- Drift alarms when answers cite stale sources.
- Red‑team prompts for jailbreaking and over‑claiming.
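The recall@k regression could be a one-function CI gate; the baseline value and tolerance below are placeholders you would pick yourself:

```python
def check_recall_regression(current_recall_at_k: float,
                            baseline_recall_at_k: float,
                            max_drop: float = 0.02) -> None:
    """Fail the build if recall@k fell more than `max_drop` after an index change."""
    drop = baseline_recall_at_k - current_recall_at_k
    if drop > max_drop:
        raise AssertionError(
            f"recall@k regressed by {drop:.3f} "
            f"(baseline {baseline_recall_at_k:.3f}, current {current_recall_at_k:.3f})"
        )
```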
4) Production notes
- Chunking: align to semantic units (sections), store char offsets for precise cites.
- Metadata: source, version, updated_at → enables “freshness” filters.
- Transparency UI: show citations inline; let users expand the passage.
- Honest fallback: if coverage < threshold → “Insufficient evidence” with suggested queries.
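Those notes might translate into chunk metadata and a fallback roughly like this (field names and the coverage threshold are assumptions, not a prescription):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    text: str
    source: str        # enables inline citations
    version: str
    updated_at: date   # enables "freshness" filters
    char_start: int    # offsets into the source doc for precise cites
    char_end: int

def answer_or_fallback(answer: str, coverage: float, threshold: float = 0.6) -> str:
    """Return the answer only when enough of its claims are evidence-backed."""
    if coverage < threshold:
        return "Insufficient evidence. Try narrowing the question or adding a date range."
    return answer
```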
5) A reference flow (print + pin on a wall)
```
User Q
  → Normalize (lowercase, strip boilerplate, detect intent)
  → Route (policy|product|code|general)
  → Retrieve (hybrid; filters; top-k)
  → Compose (from retrieved; sentence-level cites)
  → Verify (faithfulness; coverage; answerable?)
  → Output (answer + cites + confidence + follow-ups)
```
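The normalize step at the top of the flow is the one piece not sketched above; a minimal version, with illustrative boilerplate patterns and intent labels, might look like:

```python
import re

BOILERPLATE = re.compile(r"^(hi|hello|hey)[,!.\s]+", re.IGNORECASE)

def normalize(user_q: str) -> dict:
    """Lowercase, strip greeting boilerplate, and tag a rough intent."""
    q = BOILERPLATE.sub("", user_q.strip()).lower()
    intent = "how_to" if q.startswith(("how", "what steps")) else "lookup"
    return {"query": q, "intent": intent}

# normalize("Hi! How do I rotate API keys?")
#   -> {"query": "how do i rotate api keys?", "intent": "how_to"}
```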
Downloadables (optional)
- Starter eval sheet (CSV)
- Query router rules (YAML)
- Claims→Evidence template (MD)
CTA. Want the eval starter kit? Email me; I’ll share a lean, no‑nonsense pack.