AI with Michal

LLM tokens

The chunks of text that models bill and reason over; context windows cap how much instruction, job description, and candidate material fits in one call, which shapes how recruiters package prompts and attachments.

Michal Juhas · Last reviewed May 2, 2026

What are LLM tokens?

A token is a small chunk of text the model reads and bills against, often part of a word or a punctuation mark. Long resumes and huge pastes use more tokens, so short summaries help the model focus and can lower cost.
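
A minimal sketch, using the open-source tiktoken tokenizer, of how you might compare a full paste against a summary before sending it; the file name and summary text are placeholders for your own material:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by many recent OpenAI models

# Hypothetical inputs: swap in your own resume text and summary.
full_resume = open("resume.txt", encoding="utf-8").read()
short_summary = (
    "Senior recruiter, 8 years agency plus in-house. "
    "Led EMEA tech sourcing; ATS experience: Greenhouse, Lever."
)

print("full resume tokens:", len(enc.encode(full_resume)))
print("summary tokens:", len(enc.encode(short_summary)))
```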

Illustration: Long inputs metered into small segments feeding a compact prompt for the model

In practice

  • On a ChatGPT invoice or admin screen you see "tokens used this month" next to dollar amounts. Finance forwards the invoice and asks recruiting why usage spiked in Q3 when hiring was busy.
  • Trainers warn "do not paste fifty resumes at once" because the app may cut off the bottom of the pile when it hits limits. The UI might only say "message too long," but the limit is counted in tokens.
  • Partner sales decks compare cost "per thousand tokens" when pitching cheaper models to TA tech buyers who mostly care about monthly spend; the sketch after this list converts per-token rates into a monthly figure.
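
To see why per-thousand-token rates and monthly spend are the same number viewed differently, here is a back-of-envelope converter; the rates and volumes below are illustrative placeholders, not any vendor's real pricing:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 in_rate_per_1k: float, out_rate_per_1k: float) -> float:
    """Estimate monthly spend from per-thousand-token rates."""
    return (input_tokens / 1000) * in_rate_per_1k + (output_tokens / 1000) * out_rate_per_1k

# Example: a team pasting ~2M input tokens and generating ~400k output tokens a month.
print(f"${monthly_cost(2_000_000, 400_000, 0.005, 0.015):,.2f}")
```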

Quick read, then how hiring teams use it

This is for recruiters, sourcers, TA, and HR partners who need the same vocabulary in debriefs, vendor calls, and policy reviews. Skim the first section when you need a fast shared picture. Use the second when you are deciding how it shows up in the ATS, sourcing tools, or candidate communications.

Plain-language summary

  • What it means for you: Tokens are how the computer counts text chunks for billing and for "how much fits in one box." Long pastes cost more and can get cut off at the bottom.
  • How you would use it: You summarize before you paste, you delete old chat junk, and you attach only the pages that matter.
  • How to get started: Take one twenty-page PDF, extract the three paragraphs your sourcer actually reads, and compare the model's output on the excerpt versus the full file.
  • When it is a good time: When finance forwards a usage spike email or when the UI says "message too long."

When you are running live reqs and tools

  • What it means for you: Tokens meter prompts, tool outputs, and retrieval chunks against a context window. They drive cost, latency, and truncation risk, which is why token discipline pairs with Markdown for AI hygiene.
  • When it is a good time: When you wire workflow automation or bulk resume parsing.
  • How to use it: Pre-compress with headings, tables, and excerpts; keep canonical sources outside the thread for audits. A budget-check sketch follows this list.
  • How to get started: Watch OpenAI's tokenizer demo, then set team norms on max paste sizes.
  • What to watch for: Optimizing only for token count and stripping compliance-relevant detail, or assuming "bigger windows" fix hallucination risk.
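
A minimal budget-check sketch, assuming the tiktoken tokenizer and an illustrative 128k-token window (check your vendor's published limit), that refuses to send a prompt the model would silently truncate:

```python
import tiktoken

MAX_CONTEXT = 128_000      # assumed window; confirm against your vendor's docs
RESPONSE_HEADROOM = 4_000  # tokens reserved for the model's answer

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_window(system_prompt: str, task: str, attachments: list[str]) -> bool:
    """Return True only if everything fits with room left for a response."""
    total = sum(len(enc.encode(text)) for text in [system_prompt, task, *attachments])
    return total + RESPONSE_HEADROOM <= MAX_CONTEXT
```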

Where we talk about this

Sourcing automation days talk about tokens when webhooks ship huge JSON blobs into models. AI in recruiting days talk about tokens when intake packs get bloated. Bring your worst paste to Workshops.

Around the web (opinions and rabbit holes)

Third-party creators move fast. Treat YouTube videos, Reddit threads, and Quora answers as starting points, not endorsements, and double-check anything before you wire candidate data.


Rough mental model

Input style             | Typical outcome
Lean Markdown SOP       | Predictable, cheap reruns
Full PDF dump           | Noisy parse, higher cost
Chat thread archaeology | Important lines may truncate

Frequently asked questions

Why should recruiters care about tokens?
Tokens drive cost, latency, and truncation: a twenty-page PDF plus ten Slack threads can crowd out your actual instructions or get silently cut mid-document, which is how subtle factual errors slip through. Finance notices when spend spikes and nobody changed headcount. Teaching teams to summarize in Markdown for AI and to attach curated excerpts usually improves quality per dollar. Add monitoring on automation jobs so sudden token spikes flag a broken loop before invoices arrive. Print a one-page token checklist beside your intake form so coordinators know why a pasted thread is riskier than a three-bullet brief tied to one decision.
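
A sketch of that spike monitoring, assuming you can log tokens per run of an automation job; the threshold and rolling window are arbitrary starting points, not recommendations:

```python
from statistics import mean

run_history: list[int] = []  # tokens consumed by recent runs of one automation job

def record_run(tokens_used: int, spike_factor: float = 3.0) -> None:
    """Log a run and alert when it blows past the rolling baseline."""
    baseline = mean(run_history[-10:]) if len(run_history) >= 10 else None
    if baseline and tokens_used > spike_factor * baseline:
        print(f"ALERT: run used {tokens_used} tokens vs baseline ~{baseline:.0f}; "
              "check for a broken loop before the invoice arrives")
    run_history.append(tokens_used)
```
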
Are tokens the same as words?
Roughly correlated but not identical: short common words may pack into one token while rare words, URLs, or code split into several. Vendor UIs show estimates; treat them as directional, not payroll-grade accounting. When comparing models, run the same JD and resume through each tokenizer preview so you are not fooled by formatting differences. Explain this to hiring managers so they stop asking why a "short" JD exploded the budget. Log tokenizer differences when you switch vendors mid-quarter so finance can reconcile invoices without guessing which team ran the spike.
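
A quick way to see the divergence for yourself, using the open-source tiktoken tokenizer; the sample strings are placeholders:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
samples = [
    "the recruiter called the candidate",
    "antidisestablishmentarianism",
    "https://example.com/careers?req=12345&src=linkedin",
]
for text in samples:
    # Word count and token count diverge most on rare words and URLs.
    print(f"{len(text.split())} word(s) -> {len(enc.encode(text))} tokens | {text}")
```
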
How does this tie to system instructions?
System and user content share the same context budget, so bloated boilerplate steals room for candidate specifics. Teams move stable rules into system instructions and keep each task message short, structured, and scoped to one decision. Revisit length quarterly when marketing updates brand voice or legal adds disclaimers. If your system prompt is longer than your job description, you are probably hiding policy in the wrong place. Split evergreen compliance text from per-req facts so sourcers can reuse packs without duplicating tokens on every paste.
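
A minimal sketch of treating system boilerplate and the task message as one shared budget, assuming tiktoken and an illustrative 8,000-token cap; the file name is hypothetical:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
BUDGET = 8_000  # assumed per-call input budget your team agreed on

system_prompt = open("evergreen_rules.txt", encoding="utf-8").read()  # stable rules
task_message = "Summarize this resume against the req brief in five bullets."

used = len(enc.encode(system_prompt)) + len(enc.encode(task_message))
print(f"{used}/{BUDGET} input tokens used; {BUDGET - used} left for candidate material")
```
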
What about images or resumes?
Multimodal inputs carry their own limits, pricing, and parsing quirks; OCR resume text still counts as tokens and can introduce garbage characters. Decide what must be in-model versus what stays in the ATS for human review, especially around hallucination risk on dates and employers. Prefer structured fields your ATS already validated over raw PDF dumps when automation is downstream. Test accessibility paths for candidates uploading scans. When marketing wants cover images analyzed, document consent and retention separately from resume flows so security reviews stay clear.
Does a bigger context window fix everything?
No. Very long contexts can dilute focus, increase cost, and tempt teams to skip curation. Better retrieval plus smaller trusted snippets usually beats "send the whole drive," and RAG hygiene still matters. Automation needs monitoring when windows grow because silent truncation moves further down the file. Teach recruiters that bigger windows are not permission to skip summarizing. Benchmark quality on your longest realistic packet before you promise hiring managers unlimited attachments, because latency and failure rates climb faster than marketing slides admit.
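
A sketch of "smaller trusted snippets" in practice: greedily pack the highest-ranked snippets into a fixed token budget instead of sending everything. The tiktoken tokenizer and the 2,000-token budget are assumptions:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def pack_snippets(ranked_snippets: list[str], budget: int = 2_000) -> list[str]:
    """Keep the highest-ranked snippets that fit the budget; skip the rest."""
    chosen, used = [], 0
    for snippet in ranked_snippets:  # assumed already sorted by relevance
        cost = len(enc.encode(snippet))
        if used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen
```
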
Where can we learn more practically?
Read How to write better AI prompts, tighten Markdown for AI packs, and rehearse packaging in a workshop before you wire high-volume workflow automation. Bring a real "too long" thread and time-box how small you can make it without losing decisions hiring managers care about. After class, pick one automation and cap inputs for thirty days while you measure error rate and cost; publish results internally so teams copy the winning pattern instead of reverting to dumps overnight. If you still see truncation, split into two chained calls with explicit handoffs rather than one hero prompt nobody can debug.
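
A sketch of that chained-call pattern, with call_model as a placeholder for whatever vendor endpoint you use; the prompts are illustrative, not tested templates:

```python
def call_model(system: str, user: str) -> str:
    """Placeholder: wire this to your provider's chat endpoint."""
    raise NotImplementedError

def screen_candidate(resume_excerpt: str, req_brief: str) -> str:
    # Call 1: compress the raw excerpt into a structured summary (the handoff).
    summary = call_model(
        system="Extract employers, dates, and skills as short bullets.",
        user=resume_excerpt,
    )
    # Call 2: evaluate the small, trusted summary against the req, not the raw dump.
    return call_model(
        system="Compare the candidate summary to the req brief in five bullets.",
        user=f"REQ BRIEF:\n{req_brief}\n\nCANDIDATE SUMMARY:\n{summary}",
    )
```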

← Back to AI glossary in practice