AI with Michal

RAG (retrieval-augmented generation)

A pattern where the model answers after retrieving relevant chunks from your documents, CRM notes, or knowledge base, instead of relying only on weights baked in at training time.

Michal Juhas · Last reviewed May 2, 2026

What is RAG (retrieval-augmented generation)?

RAG means the AI pulls short passages from your own files before it writes an answer, instead of guessing from memory alone. You get answers that lean on your playbooks and policies, which makes them easier to check than a free-form essay.

Illustration: Internal handbook snippets feeding an AI answer with small source chips attached

In practice

  • Internal chatbots often say answers come from your help center or handbook; behind the scenes that is the same RAG idea recruiters meet when TA says "only quote our policy PDF." Engineers may shorten it to "it needs retrieval" when answers sound generic.
  • When your team uploads approved interview questions into a shared folder and tells the assistant to read there first, that is the spirit of RAG in everyday language, even if nobody says the acronym in kickoff.
  • Vendor demos talk about "grounding in your content" when they show a sidebar of source snippets next to the reply. That is the moment recruiters recognize the pattern without caring about embeddings.

Quick read, then how hiring teams use it

This is for recruiters, sourcers, TA, and HR partners who need the same vocabulary in debriefs, vendor calls, and policy reviews. Skim the first section when you need a fast shared picture. Use the second when you are deciding how it shows up in the ATS, sourcing tools, or candidate communications.

Plain-language summary

  • What it means for you: RAG means "read our files first, then answer," so the assistant is not guessing from memory alone.
  • How you would use it: You put the policy PDF where the bot is told to look, you ask the question, you check the cited snippet.
  • How to get started: Upload three short Markdown pages, ask the same question twice after you edit a page, watch the answer change.
  • When it is a good time: When answers sound generic or when legal asks where a sentence came from.

When you are running live reqs and tools

  • What it means for you: Retrieval-augmented generation is retrieve-then-read: chunk your sources, embed or index them, fetch the top-k passages for each question, condition the model on them, and cite. It competes with giant chat threads and with fine-tuning; see the sketch after this list for the bare-bones version.
  • When it is a good time: When Markdown for AI packs replace "paste the whole drive," and when ownership of stale docs is explicit.
  • How to use it: Measure hit rate on internal eval questions, monitor embedding drift when vendors change models, and keep human editors.
  • How to get started: Read r/Rag "how do I learn" threads, prototype on one handbook, then decide if you need a vector DB.
  • What to watch for: Confident citations to outdated comp bands, and "RAG" slides without deletion policy for old PDFs.
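
For readers who want to see the retrieve-then-read loop concretely, here is a minimal Python sketch. It assumes a local folder called policies/ full of approved Markdown files, uses naive keyword overlap instead of embeddings, and leaves the actual model call as a placeholder; every name here is illustrative, not a recommended stack.

```python
# Minimal retrieve-then-read sketch. Assumptions: a policies/ folder of approved
# Markdown files, naive keyword scoring instead of embeddings, and a model call
# you would wire up yourself. Names are illustrative, not a specific vendor SDK.
from pathlib import Path

def chunk(text: str, size: int = 800) -> list[str]:
    """Split a document into rough paragraph-sized chunks."""
    chunks, buf = [], ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) > size:
            chunks.append(buf.strip())
            buf = ""
        buf += para + "\n\n"
    if buf.strip():
        chunks.append(buf.strip())
    return chunks

def retrieve(question: str, corpus: dict[str, list[str]], k: int = 3):
    """Score every chunk by keyword overlap; return the top-k with their source files."""
    words = set(question.lower().split())
    scored = [
        (sum(w in c.lower() for w in words), name, c)
        for name, chunks in corpus.items()
        for c in chunks
    ]
    return sorted(scored, reverse=True)[:k]

# Index: read and chunk every approved Markdown file.
corpus = {p.name: chunk(p.read_text(encoding="utf-8")) for p in Path("policies").glob("*.md")}

# Fetch top-k, condition the model on the slices, and demand citations.
question = "What is the relocation budget for senior hires?"
context = "\n\n".join(f"[source: {name}]\n{text}" for _, name, text in retrieve(question, corpus))
prompt = (
    "Answer using only the sources below and name the source file for every claim. "
    "If the sources do not cover the question, say so.\n\n"
    f"{context}\n\nQuestion: {question}"
)
# answer = call_model(prompt)  # placeholder: send the prompt to whichever LLM your team uses
```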

Where we talk about this

Recruiting and sourcing sessions keep praising Markdown knowledge bases and project folders over one long chat thread. That is RAG culture even before you buy vectors. We rehearse it in Workshops.

Around the web (opinions and rabbit holes)

Third-party creators move fast. Treat these as starting points, not endorsements, and double-check anything before you wire candidate data.


Long chat versus RAG

Pattern | Strength | Weakness
Long thread memory | Convenient | Drift, hard audit
RAG from files | Grounded, portable | Needs curation
Hybrid | Best of both | More moving parts

Frequently asked questions

What problem does RAG solve for TA teams?
It grounds answers in text you already approved: employer brand lines, interview rubrics, relocation summaries, and internal FAQs. That cuts generic "AI slop" and gives reviewers a path from a sentence back to a source chunk, which matters when hiring managers ask "where did that number come from" in a debrief. RAG is not magic; garbage retrieval still produces confident wrong answers, so you invest in file hygiene, owners, and deletion rules the same way you would for a wiki. Pair with Markdown for AI so diffs stay readable when legal requests a change log.
Is a folder of Markdown files RAG?
It can be the knowledge half. Full RAG still needs retrieval (search, embeddings, or hand-picked links per question) plus a prompt that forces cite-or-quote behavior and a human who retires stale files. Many workshop setups start with organized Markdown in a project or repo before buying vectors, because curation beats embedding math early on. If filenames still say "final_FINAL_v3," retrieval will confidently cite the wrong era. Treat the folder like a product surface with owners and review dates, not a junk drawer.
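
One way to picture the "cite-or-quote" half is a standing instruction block attached alongside the hand-picked files. The filenames and wording below are placeholders for illustration, not tested policy language.

```python
# Hypothetical cite-or-quote instructions to pair with a curated Markdown folder.
# Filenames and phrasing are placeholders; adapt them to your own governance rules.
CITE_OR_QUOTE = """\
Use only the attached files: interview_rubric.md, relocation_faq.md, comp_bands.md.
For every factual claim, quote the exact sentence and name the file it came from.
If no attached file answers the question, reply: "Not covered in the approved docs."
Do not answer from general knowledge about other companies' policies.
"""
```
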
How is RAG different from pasting a long PDF into chat?
Blind paste burns LLM tokens, buries the instructions you actually care about, and mixes multiple policy versions in one blob. RAG selects smaller slices per question, so the model sees relevant paragraphs and leaves headroom for user context. Retrieval quality is a product decision, not a model pick: chunk boundaries, table handling, and languages all change results. You still need humans to confirm that the retrieved snippet is the current policy, especially after comp or visa rules change mid-quarter. Log which document version supplied each answer so audits do not depend on chat-history scrolling skills.
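
A rough back-of-envelope comparison of the two approaches, assuming about four characters per token (a common ballpark, not an exact tokenizer) and made-up document sizes:

```python
# Back-of-envelope headroom check; all numbers are illustrative, not measured.
handbook_chars = 60 * 3_000   # a ~60-page handbook at roughly 3,000 characters per page
slice_chars = 3 * 800         # top-3 retrieved chunks of ~800 characters each

print(handbook_chars // 4, "tokens to paste the whole handbook")  # ~45,000
print(slice_chars // 4, "tokens for the retrieved slices")        # ~600
```
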
What are common RAG failure modes in recruiting?
Stale PDFs, half-tables split across chunks, mixed languages with English-only embeddings, and PII sitting in files that should never hit a vendor. Retrieval can also return an older policy version if titles are ambiguous. Live sessions surface an organizational failure mode: nobody owns deletions, so assistants quote 2019 guidance with confidence. Run quarterly audits tied to reqs, log which corpus version answered each thread, and add an escalation path when confidence is low. Hallucination checks still apply after retrieval. Train coordinators to spot "right file, wrong section" merges before they reach candidates.
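
A minimal sketch of that answer log, assuming a flat CSV file and illustrative field names rather than any specific ATS schema:

```python
# Append one row per assistant answer so audits can trace it back to a corpus version.
# File name, fields, and values are illustrative only.
import csv
import datetime

def log_answer(path, req_id, question, source_file, corpus_version, confidence):
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([
            datetime.date.today().isoformat(),
            req_id, question, source_file, corpus_version, confidence,
        ])

log_answer("rag_answers.csv", "REQ-1042", "What is the H-1B transfer timeline?",
           "visa_faq.md", "2026-04", "low")  # low confidence -> route to a human
```
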
Does RAG remove the need for verification?
No. Models misread tables, merge two similar clauses, or quote the right file with the wrong interpretation. Keep verify-before-send for candidate-facing text and spot-check internal summaries until error rates flatten. Pair RAG with habits from the hallucination entry, especially for numbers, URLs, and eligibility statements. If leadership wants "zero humans," push back with audit requirements: someone still owns the corpus and the incident log when a candidate receives wrong guidance. Publish who reviews low-confidence retrievals so accountability does not vanish behind a "grounded" badge.
Where should we start without engineers?
Curate ten canonical Markdown files: tone, outreach patterns, intake questions, scorecard anchors, and booking links. Link them from a Gem or project instructions and rehearse weekly updates as a five-minute stand-up item. Read How to use AI in recruiting while you build the library so prompts match your governance story. When files stabilize and search pain appears, then evaluate vector search with IT instead of starting there by default. Capture "top ten questions hiring managers asked last month" and verify each answer manually before you trust retrieval to do it alone.
When is full vector search worth it?
When the corpus is too large to hand-pick chunks per req, when multiple teams need the same knowledge with fast refresh, or when duplicate near-copies make keyword search brittle. Until then, disciplined folders plus literal search inside a slice often ship faster with less vendor surface. Cost, latency, and embedding drift when providers change models are real operational taxes. Pilot with an evaluation set of twenty real questions hiring managers asked last quarter; if manual retrieval already fails, vectors might earn their keep.
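
A sketch of that pilot, reusing the naive retrieve() and corpus from the earlier retrieve-then-read sketch; the questions and expected filenames are made up for illustration:

```python
# Hit-rate check: did the expected source file come back in the top-k for each
# real question hiring managers asked last quarter? Entries here are examples.
eval_set = [
    ("What is the relocation budget for senior hires?", "relocation_faq.md"),
    ("Which scorecard anchors apply to staff engineers?", "scorecard_anchors.md"),
    # ...eighteen more real questions and the file that should answer each one
]

hits = sum(
    expected in {name for _, name, _ in retrieve(q, corpus)}
    for q, expected in eval_set
)
print(f"hit rate: {hits}/{len(eval_set)}")
```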
