Question 1

Why do duplicate candidate records accumulate in an ATS?

Accepted Answer

Duplicates form when the same person enters the system through more than one channel: a direct application, a sourcer import, an agency submission, a referral form, or a re-application after a role closes. Each channel typically creates a new record rather than matching to an existing one. Variations in name format, different email addresses (personal versus work), and name changes after marriage or legal updates all defeat simple exact-match checks. In high-volume ATS environments, duplicate rates can climb above 20 percent within three years if no deduplication rule is applied at ingestion. The cost shows up in outreach logs and sourcing reports before it shows up in candidate complaints.

Question 2

What is the difference between deduplication and record merging?

Accepted Answer

Deduplication is the detection step: the system identifies that two records likely represent the same person, using matching logic based on email address, phone number, name similarity, and sourcing history. Merging is the resolution step: the system or a human reviewer decides which record becomes the primary, which fields from the secondary record carry over (notes, tags, previous applications), and what happens to the secondary record (typically it is archived or deleted). Many ATS platforms detect but do not merge automatically because field-level merge rules require business decisions. For example, when two records list conflicting current companies, a human, not an algorithm, needs to choose which one to keep.

Question 3

How does AI improve candidate deduplication beyond exact-match logic?

Accepted Answer

Exact-match logic catches the same email address appearing twice. AI-based matching, typically using [semantic search](/ai-glossary-in-practice/semantic-search) or embedding similarity, catches the same person under "Jon Smith" and "Jonathan Smith" with different phone numbers and overlapping work history. This is a form of fuzzy or probabilistic matching: the system scores record pairs on weighted criteria and flags pairs above a confidence threshold for review. The risk is false positives, where two different people with similar profiles are merged and one candidate's history is lost entirely. AI deduplication tools should surface match confidence scores and keep a human review gate for any pair below maximum confidence rather than auto-merging everything.
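
The weighted scoring described above can be sketched roughly as follows. This is a minimal illustration, not how any particular ATS does it: the weights, both thresholds, and the field names (`email`, `phone`, `name`, `employer`) are assumptions, and plain string similarity from Python's standard `difflib` stands in for embedding similarity.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; tune them against pairs your team has reviewed.
WEIGHTS = {"email": 0.4, "phone": 0.2, "name": 0.25, "employer": 0.15}
REVIEW_THRESHOLD = 0.3       # at or above this, a human looks at the pair
AUTO_MERGE_THRESHOLD = 0.95  # only near-certain pairs skip review


def _norm(value) -> str:
    """Lowercase and trim; treat missing values as empty strings."""
    return (value or "").strip().lower()


def _digits(value) -> str:
    """Keep only digits so '+1 (555) 010-1234' and '15550101234' compare equal."""
    return "".join(ch for ch in (value or "") if ch.isdigit())


def exact(a: str, b: str) -> float:
    """1.0 when both values are present and identical, else 0.0."""
    return 1.0 if a and a == b else 0.0


def fuzzy(a, b) -> float:
    """String similarity in [0, 1]; 0.0 if either side is missing."""
    a, b = _norm(a), _norm(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted sum of per-field similarities for one record pair."""
    return (
        WEIGHTS["email"] * exact(_norm(r1.get("email")), _norm(r2.get("email")))
        + WEIGHTS["phone"] * exact(_digits(r1.get("phone")), _digits(r2.get("phone")))
        + WEIGHTS["name"] * fuzzy(r1.get("name"), r2.get("name"))
        + WEIGHTS["employer"] * fuzzy(r1.get("employer"), r2.get("employer"))
    )


def classify(score: float) -> str:
    """Route a pair to auto-merge, human review, or no action."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "distinct"


jon = {"name": "Jon Smith", "email": "jon@mail.example",
       "phone": "555-0101", "employer": "Acme"}
jonathan = {"name": "Jonathan Smith", "email": "jsmith@acme.example",
            "phone": "555-0199", "employer": "Acme"}
print(classify(match_score(jon, jonathan)))
```

Under these assumed weights, the "Jon Smith" and "Jonathan Smith" pair lands in the review band rather than being merged automatically, which matches the human review gate recommended above.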

Question 4

What merge rules do TA ops teams need to define before running a deduplication project?

Accepted Answer

Before merging at scale, define field-level resolution rules: which record's email address wins when two differ, whether notes from both records are concatenated or only the primary record's notes are kept, how overlapping tags are handled, whether application history from both records appears on the merged record, and how GDPR deletion requests interact with merged records (if a candidate requests erasure of one record, is the merged record deleted as well?). Document these rules before you touch a single record. Running deduplication without defined merge rules typically produces a cleaned record count and a support queue of panicked recruiters asking why their candidate notes vanished.
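
One way to make such rules explicit is to write them down as data and apply them mechanically. The sketch below assumes a hypothetical rule table with three strategies (`primary_wins`, `concatenate`, `union`) and illustrative field names. The point is that every field has a documented outcome before any record is touched; the specific choices shown are placeholders for your own.

```python
from copy import deepcopy

# Hypothetical rule table: one documented strategy per field.
MERGE_RULES = {
    "email": "primary_wins",   # fall back to secondary only if primary is empty
    "notes": "concatenate",    # keep both recruiters' notes
    "tags": "union",           # de-duplicate overlapping tags
    "applications": "union",   # show application history from both records
}


def merge_records(primary: dict, secondary: dict, rules: dict = MERGE_RULES) -> dict:
    """Return a new merged record; neither input is modified."""
    merged = deepcopy(primary)
    for field, rule in rules.items():
        p, s = primary.get(field), secondary.get(field)
        if rule == "primary_wins":
            merged[field] = p if p else s
        elif rule == "concatenate":
            merged[field] = "\n".join(x for x in (p, s) if x)
        elif rule == "union":
            merged[field] = sorted(set(p or []) | set(s or []))
        else:
            # Fail loudly rather than silently dropping a field.
            raise ValueError(f"No merge strategy named {rule!r} for field {field!r}")
    return merged


primary = {"email": "jon@mail.example", "notes": "Strong backend profile",
           "tags": ["python"], "applications": ["REQ-1"]}
secondary = {"email": "jsmith@acme.example", "notes": "Met at meetup",
             "tags": ["python", "remote"], "applications": ["REQ-2"]}
merged = merge_records(primary, secondary)
```

Raising an error on an unknown strategy is a deliberate choice here: a field with no defined rule should stop the merge, not vanish quietly.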

Question 5

What are the GDPR implications of candidate deduplication?

Accepted Answer

Under GDPR, merging records means you need a clear lawful basis for retaining the data from both source records in the merged record, and the merged record must satisfy any data subject rights request made against either original record. If a candidate submitted a data subject access request (DSAR) under one email address and you merge that record into a second one, the DSAR must still be fulfilled. Similarly, if a candidate requests deletion of the record created by their application but not the one created by a sourcing import, you need to know which fields came from which source. Log the merge operation with a timestamp so you can reconstruct what data existed before and after.
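
A merge log entry that supports this kind of reconstruction might look like the sketch below. The structure and key names (`merged_at`, `field_sources`, and so on) are hypothetical, and records are assumed to carry an `id` field. It captures a UTC timestamp, a snapshot of both records before the merge, the result, and which source record each merged field came from.

```python
from copy import deepcopy
from datetime import datetime, timezone


def build_merge_log(primary: dict, secondary: dict, merged: dict) -> dict:
    """Describe one merge so the pre-merge state can be reconstructed later."""
    field_sources = {}
    for field, value in merged.items():
        if primary.get(field) == value:
            field_sources[field] = primary["id"]
        elif secondary.get(field) == value:
            field_sources[field] = secondary["id"]
        else:
            # Value was built from both records (for example, concatenated notes).
            field_sources[field] = "combined"
    return {
        "merged_at": datetime.now(timezone.utc).isoformat(),
        "primary_id": primary["id"],
        "secondary_id": secondary["id"],
        "before": {"primary": deepcopy(primary), "secondary": deepcopy(secondary)},
        "after": deepcopy(merged),
        "field_sources": field_sources,
    }


primary = {"id": "c-1", "email": "jon@mail.example", "notes": "Applied directly"}
secondary = {"id": "c-2", "email": "jsmith@acme.example",
             "notes": "Sourced import", "phone": "555-0101"}
merged = {"id": "c-1", "email": "jon@mail.example",
          "notes": "Applied directly\nSourced import", "phone": "555-0101"}
log_entry = build_merge_log(primary, secondary, merged)
```

With `field_sources` recorded, a deletion request scoped to the sourcing-import record can be mapped to the specific fields it contributed.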

Question 6

When should TA ops teams run a full ATS deduplication project?

Accepted Answer

Run a deduplication project before migrating to a new ATS (duplicate records in the old system become double the problem in the new one), before a GDPR audit, or when sourcing analytics show anomalies such as candidates appearing multiple times in stage counts or outreach logs showing the same person messaged twice within a suppression window. Many teams also trigger a project when [candidate nurturing](/ai-glossary-in-practice/candidate-nurturing) sequences start flagging deliverability issues, which are often caused by duplicate records defeating the outreach frequency rules. Do not run a full merge project without a backup snapshot of all affected records, regardless of which tool you use.

Question 7

How does deduplication connect to multi-channel sourcing and AI tools?

Accepted Answer

As teams add more sourcing channels, including [AI browser automation](/ai-glossary-in-practice/ai-browser-automation-recruiting), job board imports, LinkedIn exports, and referral forms, the volume of inbound records increases and the likelihood of duplicates grows with it. AI sourcing tools that pull profiles from multiple platforms often create one record per source by default. If your ATS does not deduplicate at ingestion, an actively sourced candidate can accumulate three or four records within a single search project. Build deduplication rules at the API or webhook level rather than cleaning up retroactively. [Recruiting webhooks](/ai-glossary-in-practice/recruiting-webhooks) that post to a matching endpoint before creating a new ATS record are far cheaper than a manual cleanup project six months later.
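
The match-before-create pattern can be sketched without any web framework. The `CandidateStore` class, its `ingest` method, and the payload keys below are all hypothetical, the store is in memory, and matching is exact on normalized email only. A real matching endpoint would add fuzzy scoring and a review queue on top of this.

```python
def normalize_email(email) -> str:
    """Trim and lowercase so trivially different spellings match."""
    return (email or "").strip().lower()


class CandidateStore:
    """In-memory stand-in for an ATS that checks for a match before creating."""

    def __init__(self):
        self.records = {}       # candidate id -> record
        self.email_index = {}   # normalized email -> candidate id
        self._next_id = 1

    def ingest(self, payload: dict) -> tuple:
        """Attach the inbound payload to an existing record or create a new one."""
        email = normalize_email(payload.get("email"))
        source = payload.get("source", "unknown")
        existing_id = self.email_index.get(email) if email else None
        if existing_id is not None:
            # Same person arriving through another channel: record the source only.
            self.records[existing_id]["sources"].append(source)
            return ("matched", existing_id)
        candidate_id = self._next_id
        self._next_id += 1
        self.records[candidate_id] = {
            "email": email,
            "name": payload.get("name"),
            "sources": [source],
        }
        if email:
            self.email_index[email] = candidate_id
        return ("created", candidate_id)


store = CandidateStore()
first = store.ingest({"email": "Jon@Mail.example ", "name": "Jon Smith", "source": "direct"})
second = store.ingest({"email": "jon@mail.example", "name": "Jonathan Smith", "source": "agency"})
```

Here the second payload resolves to the first record instead of creating a new one, and the record keeps a list of every channel the candidate arrived through.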

| Approach | How it works | Risk |
| --- | --- | --- |
| Exact email match | Flag records with identical email | Misses name-change and multi-email cases |
| Fuzzy name plus employer | Score similarity on name and last company | False positives for common names |
| Embedding similarity | Vector match across full profile text | Opaque scoring, harder to audit |
| Manual review queue | Human reviews all flagged pairs | Slow at scale but lowest false-positive risk |
