AI with Michal

Candidate deduplication and merge rules

Logic applied inside an ATS or CRM to detect when two records represent the same candidate and to decide which fields to keep, merge, or discard, so that sourcing history, application data, and outreach limits are preserved and the same person is not contacted twice.

Michal Juhas · Last reviewed May 24, 2026

What is candidate deduplication?

Candidate deduplication is the process of finding and resolving records in your ATS or CRM where the same person appears more than once. Merge rules are the field-level decisions that govern which data survives when two records are combined: whose email address becomes primary, whether notes from both records are preserved, and what happens to the secondary record after the merge completes.

Together these two concepts make up the data hygiene foundation that sourcing analytics, outreach frequency limits, and GDPR compliance all depend on. If a candidate appears twice, every metric that touches that candidate is wrong.

Illustration: two duplicate candidate record cards detected by a matching engine, reviewed at a human gate, and merged into a single canonical record with a field-resolution indicator and audit log chip

In practice

  • A sourcer who sends an outreach message to a candidate, gets no reply, and then accidentally re-sources the same person from a different platform a month later is running into a deduplication failure: the ATS did not recognize the two records as the same person, so the suppression logic did not fire.
  • A TA ops team preparing for a GDPR audit discovers that 12 percent of the ATS records are duplicates, which means data subject access request responses for those candidates may be incomplete wherever only one of the two records was found and returned.
  • A recruiter who runs a pipeline report and sees a candidate appearing in two different stages simultaneously is seeing the visual symptom of a duplicate that was partially progressed on each record independently.

Quick read, then how hiring teams use it

This is for TA ops practitioners, recruiters, and data owners who maintain ATS data quality. Skim the first section for the shared vocabulary. Use the second when you are designing deduplication rules, planning a cleanup project, or building ingestion logic for a new sourcing channel.

Plain-language summary

  • What it means for you: When the same person exists twice in your ATS, your outreach limits, stage reports, and candidate history are all wrong. Deduplication finds those pairs; merge rules decide what survives.
  • How you would use it: Define the match criteria (email, phone, name plus date of birth, or name plus last employer) and the field-level resolution rules before running any merge. Run a sample of 100 detected pairs through the rules manually before batch-processing thousands.
  • How to get started: Export all records with duplicate email addresses from your ATS. That list is your baseline. Nearly every pair on it is a true duplicate (shared family or agency inboxes are the usual exceptions). Start merge rules there before tackling the harder fuzzy-name cases.
  • When it is a good time: Before an ATS migration, before a GDPR audit, or when sourcing metrics start showing candidates appearing in multiple pipeline stages simultaneously.
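The baseline export described above can be sketched in a few lines of Python. This is a minimal illustration, not a feature of any particular ATS: the field names (`id`, `email`) and the sample rows are assumptions standing in for whatever your export actually contains.

```python
from collections import defaultdict

def normalize_email(email):
    """Lowercase and trim so 'Ana@Example.com ' matches 'ana@example.com'."""
    return email.strip().lower()

def find_email_duplicates(records):
    """Group record ids by normalized email; return only groups with 2+ records."""
    groups = defaultdict(list)
    for record in records:
        email = record.get("email")
        if email:  # records with no email cannot be matched this way
            groups[normalize_email(email)].append(record["id"])
    return {email: ids for email, ids in groups.items() if len(ids) > 1}

def duplicate_rate(records, duplicate_groups):
    """Share of records that are surplus copies (group size minus one per group)."""
    surplus = sum(len(ids) - 1 for ids in duplicate_groups.values())
    return surplus / len(records) if records else 0.0

# Hypothetical export rows; the schema is an assumption, not a real ATS format.
records = [
    {"id": 1, "email": "ana.novak@example.com"},
    {"id": 2, "email": "Ana.Novak@Example.com "},
    {"id": 3, "email": "j.smith@example.com"},
    {"id": 4, "email": None},
]
dupes = find_email_duplicates(records)
rate = duplicate_rate(records, dupes)
print(dupes)
print(f"{rate:.0%} of records are surplus copies")
```

Normalizing before grouping matters: without it, casing and stray whitespace hide pairs that are otherwise exact matches. The resulting groups are the sample to walk through your merge rules by hand before any batch run.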

When you are running live reqs and tools

  • What it means for you: Every new sourcing channel you add increases your duplicate creation rate unless you build deduplication at ingestion. ATS-level deduplication after the fact is far more expensive than a matching check at the API or webhook level.
  • When it is a good time: Before you launch a new sourcing integration. If you are wiring recruiting webhooks or API imports from a new tool, add a match-check step that queries the ATS by email and phone before creating a new record.
  • How to use it: Write your merge rules into a one-page document: field priority, note concatenation behavior, GDPR implications, and what audit log is created for each merge. Store that document with your ATS configuration so it survives staff changes.
  • How to get started: Run a duplicate count query in your ATS (most platforms expose this in admin or analytics). If the duplicate rate is above 5 percent, start a cleanup project. If it is below 5 percent, focus on building ingestion-level deduplication to keep it there.
  • What to watch for: Auto-merge tools that do not log what was merged and from which source, false positives where two different candidates share a common name and employer, GDPR deletion requests that only delete one of two duplicate records, and outreach tools that cache email addresses from before a merge and continue suppressing a now-deduplicated record.
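The match-check step at ingestion could look roughly like the sketch below. Everything here is hypothetical: `InMemoryATS` is a stand-in for a real ATS API, and the payload fields (`name`, `email`, `phone`, `source`) are assumed names for what a webhook or import might deliver.

```python
import re

def normalize_email(email):
    return (email or "").strip().lower()

def normalize_phone(phone):
    """Keep digits only so '+1 (555) 010-0199' and '1 555 010 0199' compare equal."""
    return re.sub(r"\D", "", phone or "")

class InMemoryATS:
    """Hypothetical stand-in for an ATS; a real integration would call its API."""

    def __init__(self):
        self.records = {}
        self._next_id = 1

    def find_match(self, email, phone):
        """Return the id of a record sharing the email or phone, else None."""
        email, phone = normalize_email(email), normalize_phone(phone)
        for record_id, record in self.records.items():
            if email and record["email"] == email:
                return record_id
            if phone and record["phone"] == phone:
                return record_id
        return None

    def create(self, payload):
        record_id = self._next_id
        self._next_id += 1
        self.records[record_id] = {
            "name": payload.get("name", ""),
            "email": normalize_email(payload.get("email")),
            "phone": normalize_phone(payload.get("phone")),
            "sources": [payload.get("source", "unknown")],
        }
        return record_id

def ingest(ats, payload):
    """Match first, create only when nothing matches. Returns (action, record_id)."""
    match_id = ats.find_match(payload.get("email"), payload.get("phone"))
    if match_id is not None:
        # Note the new source on the existing record instead of duplicating it.
        ats.records[match_id]["sources"].append(payload.get("source", "unknown"))
        return "matched", match_id
    return "created", ats.create(payload)

ats = InMemoryATS()
first = ingest(ats, {"name": "Ana Novak", "email": "ana@example.com",
                     "phone": "+1 (555) 010-0199", "source": "job_board"})
second = ingest(ats, {"name": "A. Novak", "email": "ana.work@example.org",
                      "phone": "1 555 010 0199", "source": "sourcing_tool"})
print(first, second, len(ats.records))
```

In this sketch the second payload matches on phone even though the email differs, so no new record is created. A phone-only match is weaker evidence than an email match (shared household or office numbers exist), so a real integration might route those to review rather than attach them silently.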

Where we talk about this

On AI with Michal live sessions, deduplication surfaces whenever teams connect a new sourcing tool to their ATS and discover that the integration creates new records rather than matching existing ones. In sourcing automation workshops, we wire deduplication logic at the webhook level so the problem does not accumulate. If you want the room conversation, start at Sourcing Lab and bring your current ATS and the sourcing integrations you are running.

Deduplication approach comparison

Approach | How it works | Risk
Exact email match | Flag records with identical email | Misses name-change and multi-email cases
Fuzzy name plus employer | Score similarity on name and last company | False positives for common names
Embedding similarity | Vector match across full profile text | Opaque scoring, harder to audit
Manual review queue | Human reviews all flagged pairs | Slow at scale but lowest false-positive risk

Frequently asked questions

Why do duplicate candidate records accumulate in an ATS?
Duplicates form when the same person enters the system through more than one channel: a direct application, a sourcer import, an agency submission, a referral form, or a re-application after a role closes. Each channel typically creates a new record rather than matching to an existing one. Name format variations, different email addresses (personal versus work), and name changes after marriage or legal updates all defeat simple exact-match checks. High-volume ATS environments can accumulate duplicate rates above 20 percent within three years if no deduplication rule is applied at ingestion. The cost appears in outreach logs and sourcing reports before it appears in candidate complaints.
What is the difference between deduplication and record merging?
Deduplication is the detection step: the system identifies that two records likely represent the same person using matching logic based on email address, phone number, name similarity, and sourcing history. Merging is the resolution step: the system or a human reviewer decides which record becomes the primary, which fields from the secondary record to carry over (notes, tags, previous applications), and what happens to the secondary record (typically archived or deleted). Many ATS platforms detect but do not merge automatically because field-level merge rules require business decisions. For example, when two records have conflicting current companies, a human needs to choose which one to keep, not an algorithm.
How does AI improve candidate deduplication beyond exact-match logic?
Exact-match logic catches the same email address appearing twice. AI-based matching, typically using semantic search or embedding similarity, catches the same person under "Jon Smith" and "Jonathan Smith" with different phone numbers and overlapping work history. This is a form of fuzzy or probabilistic matching: the system scores record pairs on weighted criteria and flags pairs above a confidence threshold for review. The risk is false positives, where two different people with similar profiles get merged and one candidate's history is lost entirely. AI deduplication tools should surface match confidence scores and keep a human review gate for any pair that falls short of a very high auto-merge threshold, rather than auto-merging everything.
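The weighted scoring and review gate can be illustrated with a small sketch. Note the substitution: it uses character-level similarity from Python's standard `difflib` rather than embeddings, and the weights and thresholds are assumed values for illustration, not recommendations.

```python
from difflib import SequenceMatcher

# Assumed weights and thresholds; tune them against a manually reviewed sample.
WEIGHTS = {"name": 0.5, "employer": 0.3, "phone": 0.2}
AUTO_MERGE_AT = 0.95
REVIEW_AT = 0.70

def similarity(a, b):
    """Character-level similarity in [0, 1]; missing values score 0."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()

def match_score(rec_a, rec_b):
    """Weighted sum of per-field similarities."""
    return sum(w * similarity(rec_a.get(f), rec_b.get(f)) for f, w in WEIGHTS.items())

def route(score):
    """Map a score to an action; only near-certain pairs skip human review."""
    if score >= AUTO_MERGE_AT:
        return "auto_merge_candidate"
    if score >= REVIEW_AT:
        return "human_review"
    return "no_match"

# Hypothetical records for illustration.
jon = {"name": "Jon Smith", "employer": "Acme Corp", "phone": "5550100199"}
jonathan = {"name": "Jonathan Smith", "employer": "Acme Corp", "phone": "5550100142"}
maria = {"name": "Maria Garcia", "employer": "Globex", "phone": "4420100888"}

for other in (jonathan, maria):
    score = match_score(jon, other)
    print(other["name"], round(score, 2), route(score))
```

With these assumed weights, the Jon and Jonathan pair lands in the review band rather than being merged automatically, which is the behavior the human gate is meant to guarantee. Surfacing the per-field similarities alongside the total makes the reviewer's decision easier to audit.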
What merge rules do TA ops teams need to define before running a deduplication project?
Before merging at scale, define field-level resolution rules: which record's email address wins when two differ, whether notes from both records are concatenated or only the primary record's notes are kept, how overlapping tags are handled, whether application history from both records appears on the merged record, and how GDPR deletion requests interact with merged records (if a candidate requests erasure of one record, is the merged record deleted as well?). Document these rules before you touch a single record. Running deduplication without defined merge rules typically produces a cleaned record count and a support queue of panicked recruiters asking why their candidate notes vanished.
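One way to make such rules explicit is to write them as data and apply them with a small function. The sketch below is an assumption-heavy illustration: the rule names, field names, and the choice to flag conflicting companies for a human are examples, not defaults of any ATS.

```python
# Assumed field-level rules for illustration; the real ones are a business decision.
MERGE_RULES = {
    "email": "prefer_primary",
    "current_company": "needs_human",  # conflicting values are not auto-resolved
    "notes": "concatenate",
    "tags": "union",
    "applications": "union",
}

def merge_records(primary, secondary, rules=MERGE_RULES):
    """Return (merged_record, conflicts) without mutating either input."""
    merged, conflicts = {"id": primary["id"]}, []
    for field, rule in rules.items():
        a, b = primary.get(field), secondary.get(field)
        if rule == "prefer_primary":
            merged[field] = a if a else b
        elif rule == "concatenate":
            merged[field] = [*(a or []), *(b or [])]
        elif rule == "union":
            # dict.fromkeys drops repeats while keeping first-seen order
            merged[field] = list(dict.fromkeys([*(a or []), *(b or [])]))
        elif rule == "needs_human":
            if a and b and a != b:
                merged[field] = a  # provisional value until a reviewer decides
                conflicts.append({"field": field, "primary": a, "secondary": b})
            else:
                merged[field] = a or b
    return merged, conflicts

# Hypothetical pair of duplicate records.
primary = {"id": 101, "email": "ana@example.com", "current_company": "Acme",
           "notes": ["Phone screen: strong"], "tags": ["python", "senior"],
           "applications": ["REQ-12"]}
secondary = {"id": 202, "email": "ana.work@example.org", "current_company": "Globex",
             "notes": ["Sourced from job board"], "tags": ["senior", "remote"],
             "applications": ["REQ-12", "REQ-30"]}

merged, conflicts = merge_records(primary, secondary)
print(merged)
print(conflicts)
```

Keeping the rules in one dictionary means the one-page merge document and the code can be checked against each other. Returning conflicts separately, instead of silently picking a winner, is what keeps recruiter notes and contested fields from vanishing without a trace.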
What are the GDPR implications of candidate deduplication?
Under GDPR, merging records means you need a clear lawful basis for retaining the data from both source records in the merged record, and the merged record must satisfy any data subject rights requests from either original record. If a candidate submitted a data subject access request on one email address and you merge that record into a second one, the DSAR must still be fulfilled. Similarly, if a candidate requests deletion of the record created by their application but not the one created by a sourcing import, you need to know which fields came from which source. Log the merge operation with a timestamp so you can reconstruct what data existed before and after.
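A merge log that supports this kind of reconstruction might record a timestamp, both pre-merge snapshots, and which source record supplied each surviving field. The sketch below is illustrative only: the entry layout and names such as `field_provenance` and `merged_by` are assumptions, and it is not legal guidance on what a compliant log must contain.

```python
import json
from datetime import datetime, timezone

def build_merge_log(primary, secondary, merged, merged_by):
    """Snapshot both inputs and record which source record supplied each field."""
    provenance = {}
    for field, value in merged.items():
        if field == "id":
            continue
        sources = []
        if primary.get(field) == value:
            sources.append(primary["id"])
        if secondary.get(field) == value:
            sources.append(secondary["id"])
        # A value matching neither input was combined from both records.
        provenance[field] = sources or [primary["id"], secondary["id"]]
    return {
        "merged_at": datetime.now(timezone.utc).isoformat(),
        "merged_by": merged_by,
        "surviving_id": merged["id"],
        "retired_id": secondary["id"],
        "before": {"primary": primary, "secondary": secondary},
        "after": merged,
        "field_provenance": provenance,
    }

# Hypothetical records for illustration.
primary = {"id": 101, "email": "ana@example.com", "notes": ["Phone screen"]}
secondary = {"id": 202, "email": "ana.work@example.org",
             "notes": ["Sourced from job board"]}
merged = {"id": 101, "email": "ana@example.com",
          "notes": ["Phone screen", "Sourced from job board"]}

entry = build_merge_log(primary, secondary, merged, merged_by="ta-ops-reviewer")
print(json.dumps(entry, indent=2))
```

With per-field provenance on file, a request that targets only the sourcing-import record can be traced to the fields that came from it. The log itself contains personal data, so it presumably needs the same retention and erasure handling as the records it describes.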
When should TA ops teams run a full ATS deduplication project?
Run a deduplication project before migrating to a new ATS (duplicate records in the old system become double the problem in the new one), before a GDPR audit, or when sourcing analytics show anomalies such as candidates appearing multiple times in stage counts or outreach logs showing the same person messaged twice within a suppression window. Many teams also trigger a project when candidate nurturing sequences start flagging deliverability issues, which are often caused by duplicate records defeating the outreach frequency rules. Do not run a full merge project without a backup snapshot of all affected records, regardless of which tool you use.
How does deduplication connect to multi-channel sourcing and AI tools?
As teams add more sourcing channels, including AI browser automation, job board imports, LinkedIn exports, and referral forms, the volume of inbound records increases and the likelihood of duplicates grows with it. AI sourcing tools that pull profiles from multiple platforms often create one record per source by default. If your ATS does not deduplicate at ingestion, an actively sourced candidate can accumulate three or four records within a single search project. Build deduplication rules at the API or webhook level rather than retroactively. Recruiting webhooks that post to a matching endpoint before creating a new ATS record are far cheaper than a manual cleanup project six months later.
