Question 1

What is a validation study in hiring?

Accepted Answer

A validation study tests the link between an assessment score and a real job outcome (performance ratings, retention, time-to-productivity) in your specific context. It is the evidence base that justifies using a test to make employment decisions. Two common approaches are criterion-related validity (does a high score predict high performance?) and content validity (does the test sample the actual tasks of the job?). In the United States, the Uniform Guidelines on Employee Selection Procedures, which the EEOC enforces, call for validation evidence when a selection procedure shows adverse impact against a protected group. In the EU, GDPR's limits on solely automated decisions and the EU AI Act, which treats AI used in recruitment as high-risk, push similar requirements for automated scoring. Without validation, you are relying on the vendor's generalised research, which may not transfer to your roles, your hiring pipeline, or your workforce.

Question 2

Why do most recruiting teams skip validation studies?

Accepted Answer

Validation requires time, job analysis work, a sample size large enough to detect a meaningful correlation, access to performance data, and statistical expertise most TA teams do not have in-house. Vendors often present their own validity research, which covers their general client base but not your specific roles, seniority levels, or industry segment. The result is widespread use of unvalidated assessments across hiring pipelines, sometimes legally defensible by proxy (borrowed validation), sometimes not. AI-based [resume screening](/ai-glossary-in-practice/ai-based-resume-screening) and scoring tools face the same gap: vendors frequently lack role-specific validation evidence, and buying teams rarely ask for it during procurement. The risk surfaces in [adverse impact](/ai-glossary-in-practice/adverse-impact) audits or legal challenges, not on demo day.

Question 3

What does a basic validation study involve?

Accepted Answer

At minimum: a job analysis to define the competencies the assessment claims to measure, a sample of employees or candidates with both assessment scores and a performance criterion (ratings, output metrics, 90-day retention), statistical analysis showing the correlation is meaningful and not an artefact, and a differential prediction check to confirm the tool does not predict differently across protected groups. Sample size is a constraint: you typically need 100 to 300 matched pairs for a reliable criterion study, because with smaller samples the uncertainty around a modest correlation is too wide to rule out zero. Smaller organisations often rely on synthetic validity (building evidence from job components shared across similar roles) or transportability studies that borrow the vendor's validation, where the burden of proof is on showing the general evidence applies to your context. Document the process thoroughly because regulators and plaintiff attorneys ask for it.
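To make the sample-size point concrete, here is a minimal sketch of the correlation step. It assumes you already have matched pairs of assessment scores and a numeric performance criterion. The function names `pearson_r` and `fisher_ci` are illustrative, and the Fisher z interval is one common approximation rather than the only acceptable method. It does not cover the differential prediction check, which needs group-level regression analysis.

```python
import math

def pearson_r(scores, outcomes):
    """Pearson correlation between assessment scores and a performance criterion."""
    n = len(scores)
    if n != len(outcomes) or n < 4:
        raise ValueError("need at least 4 matched pairs of equal length")
    mean_s = sum(scores) / n
    mean_o = sum(outcomes) / n
    sxy = sum((s - mean_s) * (o - mean_o) for s, o in zip(scores, outcomes))
    sxx = sum((s - mean_s) ** 2 for s in scores)
    syy = sum((o - mean_o) ** 2 for o in outcomes)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for r using the Fisher z transform.

    Only valid for -1 < r < 1 and n > 3.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Same observed correlation (0.30), two hypothetical sample sizes.
low_30, high_30 = fisher_ci(0.30, 30)      # roughly (-0.07, 0.60): includes zero
low_200, high_200 = fisher_ci(0.30, 200)   # roughly (0.17, 0.42): excludes zero
```

With 30 matched pairs, an observed correlation of 0.30 is still compatible with no relationship at all. With 200 pairs the interval sits clearly above zero, which is why small samples rarely support a standalone criterion study.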

Question 4

How does this apply to AI screening tools?

Accepted Answer

AI tools that score resumes, rank candidates, or predict interview outcomes are selection procedures under EEOC guidance and are subject to the same validation requirements as traditional tests. The challenge is that many AI vendors use proprietary models trained on historical hiring data, which can encode the biases of past decisions. A validation study should check both whether the tool predicts performance and whether it shows [adverse impact](/ai-glossary-in-practice/adverse-impact) against protected groups. New York City's Local Law 144 makes an independent [AI bias audit](/ai-glossary-in-practice/ai-bias-audit) a legal requirement for automated employment decision tools, and other jurisdictions, including California, have adopted or are developing their own rules for automated decision systems in hiring. Buyers who skip validation on AI tools face both legal and reputational risk when the model degrades or auditors arrive.
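A common first screen for adverse impact is the four-fifths rule from the Uniform Guidelines: compare each group's selection rate with the highest group's rate and flag ratios below 0.8. The sketch below assumes you can count applicants and candidates advanced by the tool per group. The group labels and counts are made up for illustration, and a flagged ratio is a prompt for further statistical review, not a legal finding.

```python
def impact_ratios(applicants, selected):
    """Selection rate of each group divided by the highest group's rate."""
    rates = {}
    for group, n_applied in applicants.items():
        if n_applied <= 0:
            raise ValueError(f"group {group!r} has no applicants")
        rates[group] = selected.get(group, 0) / n_applied
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no candidates were selected in any group")
    return {group: rate / top_rate for group, rate in rates.items()}

def flag_adverse_impact(ratios, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)

# Hypothetical counts: candidates scored by the tool, and those it advanced.
applicants = {"group_a": 200, "group_b": 150}
selected = {"group_a": 60, "group_b": 30}

ratios = impact_ratios(applicants, selected)   # group_a: 1.0, group_b: about 0.67
flagged = flag_adverse_impact(ratios)          # ["group_b"]
```

Here group_a is selected at 30% and group_b at 20%, so group_b's ratio of about 0.67 falls below 0.8 and would warrant a closer look.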

Question 5

What is the difference between validation and reliability?

Accepted Answer

Reliability means the tool produces consistent scores: the same candidate gets a similar score on a retest. Validity means the scores mean something real about job performance. A reliable tool can still be invalid. A cognitive test might produce highly consistent scores across administrations but predict sales performance only weakly for your specific product and customer base. Both matter, but validity is the higher bar: reliability is necessary but not sufficient. In practice, vendors often report reliability coefficients because they are easy to measure internally; validity evidence tied to job outcomes is harder to produce and less commonly shared. Ask both questions during procurement: how consistent is this, and how well does it predict performance in roles like mine?

Question 6

Who owns validation studies in a TA team?

Accepted Answer

Typically industrial-organisational psychologists or assessment specialists within HR, or outside consultants hired during vendor evaluation or litigation. In organisations without specialist expertise, the legal or people analytics team often owns the question by default, usually when a challenge arrives rather than proactively. Best practice is assigning ownership before deploying any high-stakes assessment: define who commissions the study, who reviews the output, and who monitors [adverse impact](/ai-glossary-in-practice/adverse-impact) metrics over time. Join [AI in recruiting workshops](/workshops) where procurement and compliance questions around AI-based assessments come up in live Q&A with practitioners who have run the procurement process and faced the audits.

Validation study (selection)
