AI with Michal

Validation study (selection)

A structured research process that tests whether a hiring assessment, test, or scoring tool actually predicts the job outcome it claims to predict, such as performance or retention, in the specific role and organisation where it will be used.

Michal Juhas · Last reviewed June 26, 2026

What is a validation study in selection?

A validation study is the research that tests whether a hiring assessment actually does what it claims. Rather than taking a vendor at their word that a test predicts performance, a validation study collects data from your actual hiring pipeline or workforce and tests the statistical link between assessment scores and real job outcomes.

For recruiters and TA leaders, this matters in two practical ways. First, it is the legal defence when a regulator or plaintiff asks whether your selection process is job-related. Second, it is the quality signal that tells you whether the assessment is worth the time it adds to your hiring process. An assessment that does not predict performance is friction with no return.

The concept applies equally to traditional psychometric tests and to AI-based tools that rank resumes or score candidate responses. A high-performing model in a vendor demo is not evidence of validity in your specific context.

In practice

  • A large retail employer uses a personality questionnaire for store manager hiring without a local validation study. Three years later, an adverse impact claim surfaces and legal asks for the validation file. The vendor provides a general study covering retail broadly, which is not strong enough to defend the specific use case.
  • A TA ops lead at a mid-size tech company commissions a content validity study before deploying a work sample test for software engineers. The process takes six weeks and a small external consultant fee, but the team can point to job task alignment when a candidate challenges the process.
  • An AI resume screener is evaluated during vendor renewal. The people analytics team runs a retrospective analysis against performance data for the past 18 months and finds the tool predicts top-performer placement slightly better than the prior manual process, but shows a small disparity for one demographic group that triggers a calibration review before renewal.

Quick read, then how hiring teams use it

This is for recruiters, TA leaders, and HR business partners who need to understand validation before procuring assessments or defending selection processes. Skim the first section for the core concept. Use the second when you are in a vendor evaluation, a compliance review, or building the case for or against an assessment tool.

Plain-language summary

  • What it means for you: A validation study is the evidence that a test or AI scorer actually predicts job performance in your specific context, not just in the vendor's general research.
  • How you would use it: Ask for validation evidence during procurement and check whether the study covers roles, seniority levels, and demographic groups comparable to your hiring population.
  • How to get started: Pull one high-volume role, pull performance data for people hired via your current process, and check whether scores from the assessment you are evaluating correlate with the outcome you care about.
  • When it is a good time: Before deploying any assessment at scale, and during any annual review of tools that contribute to employment decisions.
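
The "how to get started" check above can be sketched in a few lines of Python. This is a minimal illustration, not a full criterion study: the function name `pearson_r` and both data lists are invented for the example, and ten pairs is far too few to draw a conclusion from.

```python
from math import sqrt
from statistics import mean


def pearson_r(scores, outcomes):
    """Pearson correlation between assessment scores and a job outcome."""
    if len(scores) != len(outcomes) or len(scores) < 3:
        raise ValueError("need at least 3 matched score/outcome pairs")
    mean_x, mean_y = mean(scores), mean(outcomes)
    # Sum of cross-products of deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, outcomes))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in scores))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in outcomes))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores or outcomes have no variation")
    return cross / (spread_x * spread_y)


# Hypothetical data: one row per hire, matched by position in the list.
assessment_scores = [62, 71, 55, 80, 68, 90, 47, 75, 66, 84]
performance_ratings = [3.1, 3.4, 2.8, 4.0, 3.0, 4.2, 2.9, 3.3, 3.6, 3.9]

r = pearson_r(assessment_scores, performance_ratings)
print(f"n = {len(assessment_scores)}, r = {r:.2f}")
```

A correlation on its own is a first look. It says nothing about whether the relationship holds across demographic groups or whether the sample is large enough to trust.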

When you are running live reqs and tools

  • What it means for you: Every assessment tool in your pipeline that influences hiring decisions carries a validation obligation. AI-based tools are explicitly included under EEOC guidance and increasingly under state and local law.
  • When it is a good time: Before procurement approval and before any tool reaches more than a pilot cohort. Revisit annually.
  • How to use it: Request the technical manual and validity evidence from the vendor. Check whether it covers your role family and seniority. If it does not, ask for a transport validity rationale or commission a small local study.
  • How to get started: Start with your most-used assessment. Map who owns validity oversight. Establish a cadence for monitoring adverse impact metrics on any tool that produces a score used in hiring decisions.
  • What to watch for: Vendors citing test reliability instead of validity evidence, generic industry studies presented as role-specific validation, and AI tools with no published adverse impact data.
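
One way to start monitoring adverse impact metrics is to compare selection rates between groups. The sketch below uses the four-fifths rule of thumb associated with the Uniform Guidelines, which the text above does not name; it is a screening heuristic, not a legal test. The function name, group labels, and counts are all hypothetical.

```python
def impact_ratios(counts):
    """Return each group's selection rate divided by the highest group rate.

    counts maps a group label to (selected, applicants).
    """
    rates = {
        group: selected / applicants
        for group, (selected, applicants) in counts.items()
        if applicants > 0
    }
    if not rates:
        raise ValueError("no group has any applicants")
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("nobody was selected, so ratios are undefined")
    return {group: rate / top_rate for group, rate in rates.items()}


# Hypothetical pass-through counts for one assessment stage.
stage_counts = {
    "group_a": (48, 120),  # 40% pass the stage
    "group_b": (27, 90),   # 30% pass the stage
}

ratios = impact_ratios(stage_counts)
# Flag groups whose ratio falls below the four-fifths (0.8) threshold.
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]

for group, ratio in ratios.items():
    print(f"{group}: impact ratio {ratio:.2f}")
print("needs review:", flagged)
```

A flagged ratio is a prompt to investigate, ideally with someone who can judge statistical significance and sample size, rather than a finding in itself.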

Where we talk about this

On AI with Michal live sessions, assessment validity comes up in AI in recruiting blocks when participants are evaluating AI screening and scoring tools and need to separate vendor claims from evidence. The membership community includes HR leaders who have navigated regulatory audits involving AI-based selection tools.

Around the web (opinions and rabbit holes)

Third-party creators move fast. Treat these as starting points, not endorsements.

YouTube

  • Searches for "selection validation study HR" and "EEOC uniform guidelines hiring" on YouTube surface I-O psychology explainers and employment law discussions aimed at HR practitioners.

Reddit

  • r/IOPsychology has detailed discussions on validity types, sample size requirements, and how practitioners run studies in applied settings.
  • r/humanresources has candid threads on assessment procurement and what happens when legal gets involved after a challenge.

Quora

  • Searches for "hiring assessment validity" and "how to validate a pre-employment test" collect a range of practitioner and academic answers worth filtering by applied context.

Frequently asked questions

What is a validation study in hiring?
A validation study tests the link between an assessment score and a real job outcome (performance ratings, retention, time-to-productivity) in your specific context. It is the evidence base that justifies using a test to make employment decisions. Two common approaches are criterion validity (does a high score predict high performance?) and content validity (does the test sample the actual tasks of the job?). In the United States, the Uniform Guidelines on Employee Selection Procedures call for validity evidence when a selection procedure shows adverse impact, so validating before a procedure is used at scale is the safer practice. In the EU, GDPR and emerging AI regulation push similar requirements for automated scoring. Without validation, you are relying on the vendor's generalised research, which may not transfer to your roles, your hiring pipeline, or your workforce.
Why do most recruiting teams skip validation studies?
Validation requires time, job analysis work, a sample size large enough to detect a meaningful correlation, access to performance data, and statistical expertise most TA teams do not have in-house. Vendors often present their own validity research, which covers their general client base but not your specific roles, seniority levels, or industry segment. The result is widespread use of unvalidated assessments across hiring pipelines, sometimes legally defensible by proxy (borrowed validation), sometimes not. AI-based resume screening and scoring tools face the same gap: vendors frequently lack role-specific validation evidence, and buying teams rarely ask for it during procurement. The risk surfaces in adverse impact audits or legal challenges, not on demo day.
What does a basic validation study involve?
At minimum: a job analysis to define the competencies the assessment claims to measure, a sample of employees or candidates with both assessment scores and a performance criterion (ratings, output metrics, 90-day retention), statistical analysis showing the correlation is meaningful and not an artefact, and a differential prediction check to confirm the tool does not predict differently across protected groups. Sample size is a constraint: you typically need 100 to 300 matched pairs for a reliable criterion study. Smaller organisations often rely on synthetic validity (combining job analysis data across similar roles) or transport validity studies from the vendor, where the burden of proof is on showing the general evidence applies to your context. Document the process thoroughly because regulators and plaintiff attorneys ask for it.
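To see why sample size is a constraint, it helps to look at how wide the uncertainty around a correlation is at different sample sizes. The sketch below uses the standard Fisher z-transformation for an approximate 95% confidence interval. The observed correlation of 0.30 and the two sample sizes are illustrative assumptions, and the function name `correlation_ci` is made up for this example.

```python
from math import atanh, sqrt, tanh


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("need more than 3 matched pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = atanh(r)                   # transform r to the z scale
    std_err = 1 / sqrt(n - 3)      # standard error on the z scale
    low = tanh(z - z_crit * std_err)
    high = tanh(z + z_crit * std_err)
    return low, high


# Same hypothetical observed correlation, two different sample sizes.
observed_r = 0.30
for pairs in (30, 200):
    low, high = correlation_ci(observed_r, pairs)
    print(f"n = {pairs}: r = {observed_r:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With 30 pairs the interval includes zero, so the data cannot rule out "no relationship". With 200 pairs the same observed correlation gives an interval that stays above zero.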
How does this apply to AI screening tools?
AI tools that score resumes, rank candidates, or predict interview outcomes are selection procedures under EEOC guidance and are subject to the same validation requirements as traditional tests. The challenge is that many AI vendors use proprietary models trained on historical hiring data, which can encode the biases of past decisions. A validation study should check both whether the tool predicts performance and whether it shows adverse impact against protected groups. New York City requires bias audits of automated employment decision tools, and other jurisdictions, including California, have adopted or are developing rules that cover automated employment decisions. Buyers who skip validation on AI tools face both legal and reputational risk when the model degrades or auditors arrive.
What is the difference between validity and reliability?
Reliability means the tool produces consistent scores: the same candidate gets the same score on a retest. Validity means the scores mean something real about job performance. A reliable tool can still be invalid. A cognitive test might produce highly consistent scores across administrations but predict sales performance only weakly for your specific product and customer base. Both matter, but validity is the higher bar. Reliability is necessary but not sufficient. In practice, vendors often report reliability coefficients because they are easy to measure internally; validity evidence tied to job outcomes is harder to produce and less commonly shared. Ask both questions during procurement: how consistent is this, and how well does it predict performance in roles like mine?
Who owns validation studies in a TA team?
Typically industrial-organisational psychologists or assessment specialists within HR, or outside consultants hired during vendor evaluation or litigation. In organisations without specialist expertise, the legal or people analytics team often owns the question by default, usually when a challenge arrives rather than proactively. Best practice is assigning ownership before deploying any high-stakes assessment: define who commissions the study, who reviews the output, and who monitors adverse impact metrics over time. Join AI in recruiting workshops where procurement and compliance questions around AI-based assessments come up in live Q&A with practitioners who have run the procurement process and faced the audits.
