How document verification works step by step, why retained ID images are a breach honeypot, and how to verify a document without storing it while still meeting GDPR and NIST requirements.
- Document verification confirms an identity document is authentic, valid and belongs to the person presenting it, through data extraction, tamper checks and an optional biometric face match.
- The five-stage process (capture, extract, authenticate, cross-reference, face match plus liveness) is now table stakes. Two vendors can run the same checks and carry opposite risk.
- The real 2026 differentiator is data architecture: does the provider retain a central copy of every ID image and biometric template, or verify and forget?
- Retained images are a standing liability. The IDMerit leak exposed roughly one billion identity records; the Discord age-verification breach exposed about 70,000 government-ID images.
- NIST now requires injection-attack detection and analysis for generative-AI media, separate from and additional to liveness, so smile-and-blink alone no longer meets the bar.
- You can verify a document without storing it, using user-held keys, sharded storage and a reusable credential, which keeps the audit trail without the honeypot.
TL;DR
Document verification confirms an identity document is genuine and belongs to the person holding it, through capture, optical character recognition, authenticity checks against built-in security features, database cross-referencing and an optional face match with liveness detection. By 2026 those steps are commodity. The decision that actually changes your risk is architecture: most providers upload and retain a copy of every ID image and biometric template, which under the General Data Protection Regulation is a data-minimisation problem and a breach honeypot. Named incidents at IDMerit, Discord and Sumsub show the cost. This guide explains the process, the new NIST requirements on injection and deepfake media, and how to verify a document without storing it centrally.
What is document verification?
Document verification is the process of confirming that an identity document, such as a passport, driver's license or national ID card, is authentic, valid and belongs to the person presenting it. It combines data extraction through optical character recognition (OCR), authenticity and tamper checks against the document's security features, and increasingly a biometric face match with liveness detection to defeat fraud. It is a pillar of identity proofing, the broader exercise of establishing that a person is who they claim to be before you grant them an account or access.
For regulated firms this is not optional. Under the United States Customer Identification Program rule, 31 CFR 1020.220, a bank must obtain at least a customer's name, date of birth, address and identification number before opening an account, and may verify identity by reference to an unexpired government-issued photo ID such as a driver's license or passport. The Financial Action Task Force (FATF), the global anti-money-laundering standard setter, ties this kind of document checking directly to customer due diligence in its 2020 Guidance on Digital Identity. So the check sits inside know your customer (KYC) software and feeds the wider onboarding decision.
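As a minimal sketch of the four minimum data points named above, the check below reports which of them an onboarding record still lacks. The `CustomerRecord` class, its field names and `missing_cip_fields` are hypothetical names chosen for illustration; they do not come from the rule text or from any vendor API, and a real programme would apply far more than a presence check.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class CustomerRecord:
    """Hypothetical onboarding record; field names are illustrative only."""
    name: Optional[str] = None
    date_of_birth: Optional[date] = None
    address: Optional[str] = None
    identification_number: Optional[str] = None


def missing_cip_fields(record: CustomerRecord) -> list[str]:
    """Return the names of the minimum fields that are still empty."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        # Treat None and blank strings as "not yet collected".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


partial = CustomerRecord(name="Ada Example", date_of_birth=date(1990, 1, 1))
print(missing_cip_fields(partial))  # ['address', 'identification_number']
```

Collecting the fields is only the first half of the rule as described above; verifying them, for example against an unexpired photo ID, is the step the rest of this guide covers.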
How does document verification work step by step?
The modern flow has five stages and, end to end, it can complete in seconds. First is capture: the user photographs or scans the front and back of the document, or the device reads the chip. Second is data extraction, where OCR lifts the printed fields and, on a machine-readable travel document such as a passport, the machine-readable zone (MRZ) is parsed. ICAO Doc 9303, the authoritative travel-document standard, defines the MRZ format and a check-digit algorithm using a repeating 7-3-1 weighting modulo 10, so a misread or crudely altered field can be caught by arithmetic rather than merely read and trusted.
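The 7-3-1 check digit is simple enough to sketch. Each character is mapped to a number (digits keep their value, A to Z become 10 to 35, the filler `<` is 0), multiplied by the repeating weights 7, 3, 1, summed, and reduced modulo 10. The function names below are illustrative, and the sample values are taken from a commonly reproduced specimen passport line rather than from any real document.

```python
WEIGHTS = (7, 3, 1)  # repeating weighting described in the text


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"character not allowed in an MRZ: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum of the field's characters, modulo 10."""
    total = sum(mrz_char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_consistent(field: str, check_char: str) -> bool:
    """True when the printed check digit matches the computed one."""
    return check_char in "0123456789" and len(check_char) == 1 \
        and mrz_check_digit(field) == int(check_char)


print(mrz_check_digit("L898902C3"))           # 6 (document number)
print(mrz_check_digit("740812"))              # 2 (date of birth, YYMMDD)
print(field_is_consistent("120415", "9"))     # True (expiry date)
print(field_is_consistent("120416", "9"))     # False (one digit altered)
```

A matching check digit only shows the field is internally consistent. A forger who recomputes the digit passes this test, which is why it is one signal among the authenticity checks rather than a verdict on its own.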
Third is authenticity and tamper detection: the system validates built-in security features such as holograms, microprint, fonts, layout templates and the MRZ checksums, and looks for signs of editing, recapture or a printed copy. This is the step that separates a real check from a simple document check that only reads the text. Fourth is database cross-referencing against issuing-authority or third-party records where available. Fifth is an optional biometric face match with liveness, covered next. Each stage narrows the space a fraudster can occupy, and the order matters: authenticity before identity.
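The ordering point can be made concrete with a short sketch: run the stages in sequence and stop at the first failure, so a document that fails authentication never reaches the face match. Everything here is hypothetical, including the stage names, the `run_pipeline` function and the boolean flags standing in for real checks; it illustrates the control flow, not any vendor's implementation.

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[dict], bool]


@dataclass
class StageResult:
    stage: str
    passed: bool


def run_pipeline(submission: dict, stages: list[tuple[str, Check]]) -> list[StageResult]:
    """Run stages in order and stop at the first one that fails."""
    results: list[StageResult] = []
    for name, check in stages:
        passed = check(submission)
        results.append(StageResult(name, passed))
        if not passed:
            break  # later stages are skipped: authenticity before identity
    return results


# Placeholder checks: each just reads a flag a real system would compute.
STAGES: list[tuple[str, Check]] = [
    ("capture", lambda s: bool(s.get("image_captured"))),
    ("extract", lambda s: bool(s.get("fields_extracted"))),
    ("authenticate", lambda s: bool(s.get("security_features_ok"))),
    ("cross_reference", lambda s: bool(s.get("records_match"))),
    ("face_match_liveness", lambda s: bool(s.get("face_and_liveness_ok"))),
]

forged = {
    "image_captured": True,
    "fields_extracted": True,
    "security_features_ok": False,  # tamper check fails
    "records_match": True,
    "face_and_liveness_ok": True,
}
print([r.stage for r in run_pipeline(forged, STAGES)])
# ['capture', 'extract', 'authenticate']
```

In this sketch every stage is mandatory. A production flow would need to treat cross-referencing as conditional on record availability and the face match as optional, as the text describes.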
What is a selfie check and liveness detection?
A selfie check captures a live image of the person and compares it to the photo on the document, mapping facial landmarks to confirm the holder is the document's owner. Liveness detection, more precisely presentation-attack detection (PAD), confirms a real, present human rather than a printed photo, a mask, a screen replay or a deepfake. The two run together: the face match answers "is this the same face," and PAD answers "is this a live person in front of the sensor."
The bar moved in 2025. NIST Special Publication 800-63A, the identity-proofing volume of the final 800-63-4 revision, sets a measurable requirement for remote proofing at identity assurance level 2 (IAL2). Where biometrics are collected and compared remotely, the credential service provider must implement presentation-attack detection with an Impostor Attack Presentation Accept Rate (IAPAR) below 0.07, tested to ISO/IEC 30107-3:2023. NIST does not declare liveness obsolete. It adds a distinct requirement, discussed below, for detecting injected media. Practical liveness within the flow is necessary but, on its own, no longer sufficient. For the attacker's view of how these checks are probed, see our guide to why your KYC vendor is your biggest data-breach risk.
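The IAPAR threshold is ordinary arithmetic: the share of impostor attack presentations that the system wrongly accepts. The sketch below assumes you already have counts from a test run; the function names and the sample counts are illustrative, and a conformant evaluation under ISO/IEC 30107-3 involves a defined test protocol that this calculation does not capture.

```python
IAPAR_LIMIT = 0.07  # the rate must be strictly below this value


def iapar(accepted_attacks: int, total_attacks: int) -> float:
    """Fraction of impostor attack presentations that were accepted."""
    if total_attacks <= 0:
        raise ValueError("total_attacks must be positive")
    if not 0 <= accepted_attacks <= total_attacks:
        raise ValueError("accepted_attacks must be between 0 and total_attacks")
    return accepted_attacks / total_attacks


def meets_iapar_limit(accepted_attacks: int, total_attacks: int) -> bool:
    """True only when the measured rate is below the limit, not equal to it."""
    return iapar(accepted_attacks, total_attacks) < IAPAR_LIMIT


print(meets_iapar_limit(6, 100))  # True: 0.06 is below 0.07
print(meets_iapar_limit(7, 100))  # False: 0.07 is not below 0.07
```

The strict comparison matters at the boundary: a system that accepts exactly 7 of 100 attacks does not clear a "below 0.07" bar.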
What documents can be verified?
The core set is primary government-issued identity documents: passports, driver's licenses, national ID cards, residence permits and visas. Beyond identity, verification extends to supporting evidence such as proof-of-address documents (utility bills and bank statements) and, for business onboarding, company records like incorporation certificates and registry extracts. Coverage breadth matters because a person from any country may need to onboard, and leading systems support large libraries of document templates across many countries, though exact counts vary by vendor and should be read as a published figure, not a universal standard.
The document type drives the checks. A passport offers a chip and an MRZ to read and validate; a paper utility bill offers neither, so it relies on layout, issuer and data consistency. For business verification, the documents and registries differ again, which is why company onboarding runs through dedicated know your business (KYB) software rather than the same flow as a personal ID. Address evidence has its own pattern, set out in our note on proof-of-address verification. Matching the check to the document is part of doing the job properly.
How does it differ from identity verification?
Document verification confirms the document itself is genuine, valid and unaltered. Identity verification is the broader process of confirming that the person is who they claim to be, of which the document check is one component, sitting alongside biometrics, database and sanctions screening, and sometimes knowledge-based checks. Put simply, the document side proves the ID is real; identity verification proves it is yours and that you exist.
| Question | Document verification | Identity verification |
|---|---|---|
| What it confirms | The document is genuine and unaltered | The person is who they claim to be |
| Core methods | OCR, security-feature and tamper checks, MRZ validation | Document verification plus face match, liveness, database and sanctions screening |
| Output | Document is valid or suspect | Identity is proofed to an assurance level |
| Where it sits | One step inside onboarding | The full onboarding decision |
The distinction matters for scoping a build. You can have a valid document presented by the wrong person, or a genuine person holding a forged document, and only the combination catches both. Document verification is the evidence; identity proofing is the verdict that anti-money-laundering rules actually require.
Manual or automated document verification software?
Manual review means a human compares a document image against records and judgement. It is slow, inconsistent at scale, vulnerable to fatigue and bias, and it does not survive onboarding volumes in the thousands. Automated document verification software uses OCR, computer vision and forensic models to scan, validate and cross-check in seconds, with a consistent decision and an audit log. For any regulated firm onboarding at volume, automation is the baseline, usually delivered through an API or software development kit so it runs inside the existing onboarding flow. Our piece on KYC automation that replaces manual verification walks through that shift.
Here is the seed of the whole argument. Once accuracy is good enough across vendors, accuracy stops being the differentiator. The thing that separates a compliance asset from a future breach headline is no longer how well the software reads a hologram. It is what the software does with your customer's data afterwards: whether the provider keeps a central copy of every ID image and biometric template, or verifies and forgets. Hold that thought through the next two sections, because it is the part most buyer guides skip.
What does the 2026 fraud landscape look like?
The threat model has shifted from physical forgery to synthetic media. Attackers now generate fake IDs and deepfake selfies with generative-AI tools, and run injection attacks that feed forged video straight into the camera stream instead of holding a fake to a real lens. This is why NIST SP 800-63A, in Section 3.14, now requires credential service providers to implement controls confirming digital media comes from a genuine sensor, to detect virtual cameras, emulators and injection. The text says providers SHALL analyse submitted media for modification, manipulation, tampering and forgery, and SHOULD analyse it for signatures of generative-AI and deepfake tools.
The key point for buyers is precise: NIST treats injection-attack detection and deepfake-media analysis as requirements that are distinct from, and additional to, presentation-attack detection. Liveness is not banned or obsolete; it is one layer, and the standard now mandates a second. A system that passes a smile-and-blink prompt but cannot tell a real sensor from an injected stream meets neither the spirit nor the letter of the current guidance. That raises the bar on the verification itself, and, as the next section argues, on the data you keep once verification is done.
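One way to picture the layering is a decision function in which liveness, sensor genuineness and tamper analysis are all mandatory, while generative-AI analysis is recommended. This is a sketch under assumptions: the `MediaSignals` fields, the `evaluate` function and the mapping of a skipped SHOULD-level check to manual review are illustrative choices, not wording from NIST or behaviour of any named product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MediaSignals:
    pad_passed: bool                        # presentation-attack detection (liveness)
    genuine_sensor: bool                    # no virtual camera, emulator or injection
    tamper_analysis_passed: bool            # modification, manipulation, forgery
    genai_signature_found: Optional[bool]   # None means the analysis was not run


def evaluate(signals: MediaSignals) -> tuple[str, list[str]]:
    """Return a decision and the reasons behind it."""
    reasons: list[str] = []
    if not signals.pad_passed:
        reasons.append("presentation attack suspected")
    if not signals.genuine_sensor:
        reasons.append("injected or virtual media source")
    if not signals.tamper_analysis_passed:
        reasons.append("media modification or forgery detected")
    if signals.genai_signature_found:
        reasons.append("generative-AI signature detected")
    if reasons:
        return "reject", reasons
    if signals.genai_signature_found is None:
        # Assumption: a skipped recommended check is escalated, not ignored.
        return "review", ["generative-AI analysis not run"]
    return "accept", []


# Passes a smile-and-blink prompt, but the stream came from a virtual camera.
injected = MediaSignals(True, False, True, False)
print(evaluate(injected))
# ('reject', ['injected or virtual media source'])
```

The example shows the buyer's point in miniature: a passing liveness result does not rescue a submission whose media source fails the separate injection check.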
Is document verification secure and GDPR-compliant?
It can be, but only if data handling is. Here is the uncomfortable truth most vendor guides bury: traditional document verification requires the user to upload an ID image and the vendor to retain a copy of it, often alongside a biometric template. Under the General Data Protection Regulation, Recital 64 says a controller should not retain personal data for the sole purpose of being able to react to potential requests, and Article 9(1) classifies biometric data used to uniquely identify a person as a special category whose processing is prohibited unless a specific exception applies. Every retained ID image and template is therefore both a minimisation problem and a standing breach target.
The cost is not hypothetical, and three named, dated incidents make it concrete. In November 2025, Cybernews found an unprotected, password-less MongoDB database linked to identity vendor IDMerit that exposed roughly one billion identity records across 26 countries, including national ID numbers, dates of birth and full names. It was secured only the next day, and IDMerit disputed that its core systems were compromised. In October 2025, Discord disclosed a breach of a third-party customer-service vendor in which attackers accessed government-ID images submitted for age verification, affecting about 70,000 users, as NBC News reported and as Discord's own 3 October 2025 disclosure set out. Discord disputed a threat actor's far larger extortion claim of millions of IDs, and the named vendor, 5CA, has publicly denied that it handled any government IDs for the client. And an intrusion at Sumsub's third-party support platform, begun in July 2024, went undetected for around 18 months until a January 2026 audit. Different firms, same lesson: retained images are the liability. We unpack the pattern in our analysis of why your KYC vendor is your biggest data-breach risk and the Coinbase data breach of 2025.
How to verify a document without storing it?
The contrarian payoff is that you can run all five verification stages and then not retain the raw personal data centrally at all. Verifying a document without storing it starts with moving the trust boundary: verify authenticity and identity at the point of capture, keep the customer in control of the key, and shard what remains so no single system holds a complete record. In Zyphe's model, the NFC chip is read to ICAO 9303 and eIDAS standards with two-step liveness and no image upload, verified data is sharded across a decentralised network of more than 60,000 nodes under a 29-of-100 threshold scheme, and the customer holds the key. There is no master key and no central honeypot, yet an authorised party can still reconstruct a complete record and export an audit-ready trail on demand.
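To show what a 29-of-100 threshold means in practice, here is a minimal sketch using Shamir's secret sharing, one well-known threshold scheme. This is an assumption made for illustration: the text does not say which scheme Zyphe uses, and the function names, the prime and the integer secret below are not taken from any product. Real systems would typically share an encryption key rather than the record itself and would add share authentication.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the chosen field")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789
shards = split_secret(key, threshold=29, shares=100)
print(recover_secret(shards[:29]) == key)    # True: any 29 shards suffice
print(recover_secret(shards[40:69]) == key)  # True: a different 29 also work
```

With 28 or fewer shards the interpolation yields an unrelated value, which is the property behind the "no single system holds a complete record" claim: an attacker who compromises a handful of nodes learns nothing usable.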
The European direction of travel supports this. Regulation (EU) 2024/1183, the second electronic identification and trust services regulation (eIDAS 2), entered into force on 20 May 2024 and establishes the EU Digital Identity Wallet for reusable, user-controlled credentials. That is the same idea as a reusable KYC passport: verify once, re-present elsewhere, without re-uploading or re-storing the document. Steelmanning the other side, a well-run central store in a SOC 2 environment, encrypted at rest and tightly minimised, can pass examinations for years, and many do. The honest counter is that encryption did not save the firms above, because the failure was access and retention, not cipher strength. For the approach in practice, see how it works. The cost case sits in our note on KYC cost reduction, where Zyphe models a total KYC cost that is a fraction of a conventional stack's. That figure is a self-published estimate, contingent on a healthy returning-user rate and verify-then-shred, not an independently audited one.
How do you choose a provider?
Score providers on coverage, fraud defence, speed and the criterion incumbents omit: what happens to the data after the check. Use the scorecard below to compare a centralising profile against a verify-without-storing profile. Most well-known providers, including Sumsub, Onfido, Veriff, Jumio, Trulioo and Persona, retain document images centrally; that is a fact about architecture, not a slur on their detection quality, and a head-to-head sits in our identity verification software comparison.

Download: the provider scorecard (PDF) is a one-page, print-ready version of this comparison you can keep beside your vendor shortlist and share with risk and compliance teams.
| Evaluation criterion | Centralising incumbent profile | Verify-without-storing profile |
|---|---|---|
| Document and country coverage | Broad published template libraries | Broad published template libraries |
| PAD per ISO/IEC 30107-3 | Usually present | Present |
| Injection and deepfake-media detection (NIST 800-63A 3.14) | Varies by vendor | Required by design |
| Speed and UX drop-off | Fast, image upload step | Fast, NFC read, no image upload |
| API or SDK integration | Standard | Single API, about 15-minute integration |
| KYC, AML and sanctions coverage | Yes | Yes |
| Retains a central copy of every ID and biometric template | Yes, honeypot risk | No central PII |
The last row is the decisive one. Two systems can run identical OCR, face match and liveness and carry opposite risk depending on whether they keep the data. Read the pricing model alongside it, because usage-based pricing with no minimums changes the maths for low-volume and seasonal onboarding. Architecture is the question that turns an onboarding check from a future incident into a durable asset.
The bottom line
Document verification in 2026 is a solved process and an unsolved decision. The five stages, capture, extract, authenticate, cross-reference and match, are commodity, and the new NIST requirements on injection and deepfake media simply raise the floor everyone must clear. What still splits the field is the data: keep a central copy of every ID image and biometric template and you build a honeypot that GDPR penalises and attackers find, as IDMerit, Discord and Sumsub each show in their own way. Verify the document and the human, then provably forget the raw personal data, and you keep the audit trail without the liability. Architecture, not accuracy, is the choice that ages well.
Related resources
- KYC software with no central PII honeypot
- KYB software for business document verification
- Proof-of-address verification explained
- Why your KYC vendor is your biggest data-breach risk
- Identity verification software comparison 2026
- How Zyphe verifies identity without storing PII
Cited sources
- NIST SP 800-63A, Identity Proofing and Enrollment (PAD, injection and deepfake-media requirements)
- GDPR Recital 64, identity of the data subject and retention
- GDPR Article 9, processing of special categories of personal data
- 31 CFR 1020.220, Customer Identification Program requirements for banks (eCFR)
- Regulation (EU) 2024/1183, eIDAS 2 and the EU Digital Identity Wallet (EUR-Lex)
- ICAO Doc 9303 Part 4, machine-readable passports and MRZ check digits
- Cybernews, global identity data leak exposes roughly one billion records (IDMerit)
- NBC News, 70,000 government ID photos exposed in Discord user hack
- Discord, update on a security incident involving third-party customer service (3 October 2025 disclosure)
- SecurityWeek, customer-service firm 5CA denies responsibility for the Discord data breach
- Fortune, Discord age-verification breach and Persona exposure
Michelangelo Frigo (Co-Founder at Zyphe)
Michelangelo Frigo is a privacy and identity infrastructure expert and co-founder of Zyphe.