
Document Verification in 2026: How It Works, and How to Verify an ID Without Storing It

Michelangelo Frigo, Co-Founder at Zyphe. Published July 3, 2026. Updated July 3, 2026.

How document verification works step by step, why retained ID images are a breach honeypot, and how to verify a document without storing it under GDPR and NIST.

Key takeaways
  • Document verification confirms an identity document is authentic, valid and belongs to the person presenting it, through data extraction, tamper checks and an optional biometric face match.
  • The five-stage process (capture, extract, authenticate, cross-reference, face match plus liveness) is now table stakes. Two vendors can run the same checks and carry opposite risk.
  • The real 2026 differentiator is data architecture: does the provider retain a central copy of every ID image and biometric template, or verify and forget?
  • Retained images are a standing liability. The IDMerit leak exposed roughly one billion identity records; the Discord age-verification breach exposed about 70,000 government-ID images.
  • NIST now requires injection-attack detection and analysis of submitted media for manipulation, and recommends screening for generative-AI signatures, all separate from and additional to liveness, so smile-and-blink alone no longer meets the bar.
  • You can verify a document without storing it, using user-held keys, sharded storage and a reusable credential, which keeps the audit trail without the honeypot.


TL;DR

Document verification confirms an identity document is genuine and belongs to the person holding it, through capture, optical character recognition, authenticity checks against built-in security features, database cross-referencing and an optional face match with liveness detection. By 2026 those steps are commodity. The decision that actually changes your risk is architecture: most providers have the user upload an ID image and then retain a copy of it, often with a biometric template, which under the General Data Protection Regulation is a data-minimisation problem and a breach honeypot. Named incidents at IDMerit, Discord and Sumsub show the cost. This guide explains the process, the new NIST requirements on injection and deepfake media, and how to verify a document without storing it centrally.

What is document verification?

Document verification is the process of confirming that an identity document, such as a passport, driver's license or national ID card, is authentic, valid and belongs to the person presenting it. It combines data extraction through optical character recognition (OCR), authenticity and tamper checks against the document's security features, and increasingly a biometric face match with liveness detection to defeat fraud. It is a pillar of identity proofing, the broader exercise of establishing that a person is who they claim to be before you grant them an account or access.

For regulated firms this is not optional. Under the United States Customer Identification Program rule, 31 CFR 1020.220, a bank must obtain at least a customer's name, date of birth, address and identification number before opening an account, and may verify identity by reference to an unexpired government-issued photo ID such as a driver's license or passport. The Financial Action Task Force (FATF), the global anti-money-laundering standard setter, ties this kind of document checking directly to customer due diligence in its 2020 Guidance on Digital Identity. So the check sits inside know your customer (KYC) software and feeds the wider onboarding decision.

How does document verification work step by step?

The modern flow has five stages and, end to end, can complete in seconds. First is capture: the user photographs or scans the front and back of the document, or the device reads the chip. Second is data extraction, where OCR lifts the printed fields and, on passports and ID cards that carry one, the machine-readable zone (MRZ) is parsed. ICAO Doc 9303, the authoritative travel-document standard, defines the MRZ format and a check-digit algorithm using a repeating 7-3-1 weighting modulo 10, so key fields can be machine-checked for internal consistency, catching misreads and crude edits, rather than merely read.
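To make the check-digit rule concrete, here is a minimal Python sketch of the 7-3-1 calculation as described in ICAO Doc 9303: digits keep their value, letters A to Z map to 10 to 35, and the "<" filler counts as zero. The function names are our own, not from any library, and the sample fields are illustrative.

```python
# Minimal sketch of the ICAO Doc 9303 check-digit rule (function names are our own).
WEIGHTS = (7, 3, 1)  # weights repeat across the field: 7, 3, 1, 7, 3, 1, ...


def mrz_char_value(ch: str) -> int:
    """Digits keep their value, A-Z map to 10-35, and the '<' filler counts as 0."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"character not allowed in an MRZ field: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum of the character values, modulo 10."""
    total = sum(mrz_char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def field_is_consistent(field: str, printed_check: str) -> bool:
    """True when the check digit printed in the MRZ matches the recomputed one."""
    return printed_check == str(mrz_check_digit(field))


if __name__ == "__main__":
    print(mrz_check_digit("L898902C3"))        # 6 (document number)
    print(mrz_check_digit("740812"))           # 2 (date of birth, YYMMDD)
    print(field_is_consistent("740813", "2"))  # False: one edited digit breaks the match
```

Because the rule is public arithmetic, a matching digit shows consistency, not authenticity: anyone editing a field can recompute it. Authenticity comes from the security features checked next and, on chipped documents, from the chip's signed data.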

Third is authenticity and tamper detection: the system validates built-in security features such as holograms, microprint, fonts, layout templates and the MRZ checksums, and looks for signs of editing, recapture or a printed copy. This is the step that separates a real check from a simple document check that only reads the text. Fourth is database cross-referencing against issuing-authority or third-party records where available. Fifth is an optional biometric face match with liveness, covered next. Each stage narrows the space a fraudster can occupy, and the order matters: authenticity before identity.
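The ordering argument can be sketched as a short-circuiting pipeline. This is an illustration under stated assumptions, not any vendor's engine: the `Evidence` shape, the stage names and the failure flags (such as `mrz_checksum_fail` or `recapture`) are hypothetical stand-ins for what real capture, OCR and forensic models would report.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evidence:
    """Hypothetical output of capture and OCR for one applicant."""
    extracted: dict[str, str]
    flags: set[str] = field(default_factory=set)  # problems raised by upstream models


Check = Callable[[Evidence], bool]
REQUIRED_FIELDS = ("name", "date_of_birth", "document_number")

# Stage order mirrors the text: authenticity is settled before identity.
STAGES: list[tuple[str, Check]] = [
    ("capture", lambda ev: "unreadable_image" not in ev.flags),
    ("extract", lambda ev: all(k in ev.extracted for k in REQUIRED_FIELDS)),
    ("authenticate", lambda ev: not ev.flags & {"mrz_checksum_fail", "template_mismatch", "recapture"}),
    ("cross_reference", lambda ev: "registry_mismatch" not in ev.flags),
    ("face_match", lambda ev: "face_mismatch" not in ev.flags),
]


def run_pipeline(ev: Evidence, stages: list[tuple[str, Check]] = STAGES) -> tuple[bool, list[tuple[str, bool]]]:
    """Run stages in order, stop at the first failure, and return a per-stage log."""
    log: list[tuple[str, bool]] = []
    for name, check in stages:
        passed = check(ev)
        log.append((name, passed))
        if not passed:
            return False, log
    return True, log


if __name__ == "__main__":
    data = {"name": "A N Other", "date_of_birth": "740812", "document_number": "L898902C3"}
    print(run_pipeline(Evidence(data)))
    # A failed MRZ checksum stops the run at "authenticate"; face_match never executes.
    print(run_pipeline(Evidence(data, {"mrz_checksum_fail"})))
```

Stopping at the first failure means a genuine face can never rescue a forged document, and the per-stage log is a natural starting point for an audit trail.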

What is a selfie check and liveness detection?

A selfie check captures a live image of the person and compares it to the photo on the document, mapping facial landmarks to confirm the holder is the document's owner. Liveness detection, more precisely presentation-attack detection (PAD), confirms a real, present human rather than a printed photo, a mask, a screen replay or a deepfake. The two run together: the face match answers "is this the same face," and PAD answers "is this a live person in front of the sensor."
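The two questions can be kept separate in code as well. A minimal sketch, assuming face embeddings (numeric vectors) have already been produced by some face-recognition model not shown here, and that a PAD component returns a liveness score between 0 and 1. The thresholds 0.6 and 0.5 are placeholders; real systems calibrate them against measured error rates.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the two embeddings point the same way; 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero length")
    return dot / (norm_a * norm_b)


def selfie_decision(doc_embedding: list[float], selfie_embedding: list[float],
                    liveness_score: float, match_threshold: float = 0.6,
                    liveness_threshold: float = 0.5) -> dict[str, bool]:
    """Accept only if it is the same face AND a live person at the sensor."""
    same_face = cosine_similarity(doc_embedding, selfie_embedding) >= match_threshold
    live = liveness_score >= liveness_threshold
    return {"same_face": same_face, "live": live, "accept": same_face and live}


if __name__ == "__main__":
    doc, selfie = [0.9, 0.1, 0.4], [0.88, 0.12, 0.41]  # made-up 3-d embeddings
    print(selfie_decision(doc, selfie, liveness_score=0.93)["accept"])  # True
    print(selfie_decision(doc, selfie, liveness_score=0.20)["accept"])  # False: right face, not live
```

A near-perfect face match with a low liveness score still fails, which is the point: a replayed photo of the right person is the classic presentation attack.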

The bar moved in 2025. NIST Special Publication 800-63A, the identity-proofing volume of the final 800-63-4 revision, requires that at identity assurance level 2 (IAL2) remote, where biometrics are collected and compared remotely, the credential service provider implement presentation-attack detection meeting an Impostor Attack Presentation Accept Rate (IAPAR) below 0.07, tested to ISO/IEC 30107-3:2023. NIST does not declare liveness obsolete. It adds a distinct requirement, discussed below, for detecting injected media. Practical liveness within the flow is necessary but, on its own, no longer sufficient. For the attacker's view of how these checks are probed, see our guide to why your KYC vendor is your biggest data-breach risk.
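The IAPAR figure is a simple ratio: of the impostor attack presentations made in testing, the share the system accepted. The sketch below computes it per attack-instrument species and, as a conservative assumption on our part, treats the worst species as decisive. The actual test protocol is defined by ISO/IEC 30107-3 and run by the testing lab, and the counts in the example are invented.

```python
def iapar(accepted_attacks: int, attack_presentations: int) -> float:
    """Impostor Attack Presentation Accept Rate for one attack-instrument species:
    the share of impostor attack presentations that the system accepted."""
    if attack_presentations <= 0:
        raise ValueError("need at least one attack presentation")
    if not 0 <= accepted_attacks <= attack_presentations:
        raise ValueError("accepted attacks must lie between 0 and the total")
    return accepted_attacks / attack_presentations


def meets_iapar_bar(results: dict[str, tuple[int, int]], limit: float = 0.07) -> bool:
    """Conservative reading (our assumption): the worst species must stay below the limit."""
    if not results:
        raise ValueError("no test results supplied")
    worst = max(iapar(accepted, total) for accepted, total in results.values())
    return worst < limit


if __name__ == "__main__":
    # Invented lab counts: species -> (accepted attacks, attack presentations).
    lab = {"printed_photo": (2, 150), "screen_replay": (6, 150), "silicone_mask": (9, 150)}
    print(meets_iapar_bar(lab))  # True: worst species is 9/150 = 0.06
    lab["silicone_mask"] = (12, 150)
    print(meets_iapar_bar(lab))  # False: 12/150 = 0.08
```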

What documents can be verified?

The core set is primary government-issued identity documents: passports, driver's licenses, national ID cards, residence permits and visas. Beyond identity, verification extends to supporting evidence such as proof-of-address documents (utility bills and bank statements) and, for business onboarding, company records like incorporation certificates and registry extracts. Coverage breadth matters because a person from any country may need to onboard, and leading systems support large libraries of document templates across many countries, though exact counts vary by vendor and should be read as a published figure, not a universal standard.

The document type drives the checks. A passport offers a chip and an MRZ to read and validate; a paper utility bill offers neither, so it relies on layout, issuer and data consistency. For business verification, the documents and registries differ again, which is why company onboarding runs through dedicated know your business (KYB) software rather than the same flow as a personal ID. Address evidence has its own pattern, set out in our note on proof-of-address verification. Matching the check to the document is part of doing the job properly.
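Matching the check to the document can be expressed as a lookup from document type to a check plan. The mapping below is a hypothetical illustration, not a coverage list: real issuers differ by country (not every national ID carries an MRZ, and some driver's licences do), and the check names are our own.

```python
# Hypothetical check names and mapping, for illustration only.
BASE_CHECKS: dict[str, tuple[str, ...]] = {
    "passport": ("mrz_check_digits", "security_features", "face_match"),
    "national_id": ("mrz_check_digits", "security_features", "face_match"),
    "drivers_license": ("security_features", "face_match"),
    "utility_bill": ("issuer_and_layout", "name_address_consistency", "issue_date_recency"),
    "incorporation_certificate": ("issuer_and_layout", "company_registry_lookup"),
}


def plan_checks(document_type: str, has_chip: bool = False) -> list[str]:
    """Checks to run for one document; the chip is read only if the document has one."""
    try:
        checks = list(BASE_CHECKS[document_type])
    except KeyError:
        raise ValueError(f"unsupported document type: {document_type}") from None
    if has_chip:
        checks.insert(0, "chip_read")
    return checks


if __name__ == "__main__":
    print(plan_checks("passport", has_chip=True))
    print(plan_checks("utility_bill"))  # no MRZ or chip to rely on
```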

How does it differ from identity verification?

Document verification confirms the document itself is genuine, valid and unaltered. Identity verification is the broader process of confirming that the person is who they claim to be, of which the document check is one component, sitting alongside biometrics, database and sanctions screening, and sometimes knowledge-based checks. Put simply, the document side proves the ID is real; identity verification proves it is yours and that you exist.

| Question | Document verification | Identity verification |
|---|---|---|
| What it confirms | The document is genuine and unaltered | The person is who they claim to be |
| Core methods | OCR, security-feature and tamper checks, MRZ validation | Document verification plus face match, liveness, database and sanctions screening |
| Output | Document is valid or suspect | Identity is proofed to an assurance level |
| Where it sits | One step inside onboarding | The full onboarding decision |

The distinction matters for scoping a build. You can have a valid document presented by the wrong person, or a genuine person holding a forged document, and only the combination catches both. Document verification is the evidence; identity proofing is the verdict that anti-money-laundering rules actually require.

Manual or automated document verification software?

Manual review means a human inspects a document image and compares it against reference records using their own judgement. It is slow, inconsistent at scale, vulnerable to fatigue and bias, and it does not survive onboarding volumes in the thousands. Automated document verification software uses OCR, computer vision and forensic models to scan, validate and cross-check in seconds, with a consistent decision and an audit log. For any regulated firm onboarding at volume, automation is the baseline, usually delivered through an API or software development kit (SDK) so it runs inside the existing onboarding flow. Our piece on KYC automation that replaces manual verification walks through that shift.
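What "a consistent decision and an audit log" means in practice can be sketched as a deterministic rule plus a structured log line. The policy here (a failed or missing authenticity or face-match check rejects, any other failure goes to a human) is a hypothetical example, not a regulatory requirement, and the log records check outcomes rather than document data.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical policy: a failed (or missing) hard check rejects; other failures go to a human.
HARD_CHECKS = ("authenticate", "face_match")


def decide(results: dict[str, bool]) -> str:
    """Map per-check results to one outcome; the same inputs always give the same answer."""
    if not results:
        raise ValueError("no check results to decide on")
    if any(not results.get(name, False) for name in HARD_CHECKS):
        return "reject"
    if all(results.values()):
        return "approve"
    return "manual_review"


def audit_entry(case_id: str, results: dict[str, bool], when: Optional[datetime] = None) -> str:
    """One JSON line recording what was checked and decided; no document data is included."""
    when = when or datetime.now(timezone.utc)
    record = {"case_id": case_id, "checks": results,
              "decision": decide(results), "decided_at": when.isoformat()}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(decide({"authenticate": True, "face_match": True, "cross_reference": False}))  # manual_review
    print(audit_entry("case-001", {"authenticate": True, "face_match": True}))
```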

Here is the seed of the whole argument. Once accuracy is good enough across vendors, accuracy stops being the differentiator. The thing that separates a compliance asset from a future breach headline is no longer how well the software reads a hologram. It is what the software does with your customer's data afterwards: whether the provider keeps a central copy of every ID image and biometric template, or verifies and forgets. Hold that thought through the next two sections, because it is the part most buyer guides skip.

What does the 2026 fraud landscape look like?

The threat model has shifted from physical forgery to synthetic media. Attackers now generate fake IDs and deepfake selfies with generative-AI tools, and run injection attacks that feed forged video straight into the camera stream instead of holding a fake to a real lens. This is why NIST SP 800-63A, in Section 3.14, now requires credential service providers to implement controls confirming digital media comes from a genuine sensor, to detect virtual cameras, emulators and injection. The text says providers SHALL analyse submitted media for modification, manipulation, tampering and forgery, and SHOULD analyse it for signatures of generative-AI and deepfake tools.
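One layer commonly used against injected streams is to bind each capture to a fresh, single-use server challenge, so video recorded or rendered in advance cannot contain the right answer. The sketch below shows only that binding, using Python's standard `hmac` and `secrets` modules. `nonce_seen_in_media` is a hypothetical value that some media-analysis step would read back out of the capture (for example, a code the user is asked to show or say). It does not detect virtual cameras or emulators, and it would not stop a real-time deepfake able to render the challenge, which is why media analysis is still needed on top.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-deployment secret
_spent: set[str] = set()              # nonces already used (in memory, for the sketch)


def _tag(session_id: str, nonce: str, issued_at: int) -> str:
    """HMAC over the challenge so a client cannot invent or alter one."""
    message = f"{session_id}|{nonce}|{issued_at}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge(session_id: str, now: Optional[float] = None) -> tuple[str, int, str]:
    """Server side: a fresh random value the capture must show, plus its tag."""
    issued_at = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)
    return nonce, issued_at, _tag(session_id, nonce, issued_at)


def accept_capture(session_id: str, nonce: str, issued_at: int, tag: str,
                   nonce_seen_in_media: str, now: Optional[float] = None,
                   ttl_seconds: int = 60) -> bool:
    """Accept only media bound to a challenge we issued, still fresh, used once."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(tag, _tag(session_id, nonce, issued_at)):
        return False                          # forged challenge or wrong session
    if now - issued_at > ttl_seconds or nonce in _spent:
        return False                          # stale or replayed
    _spent.add(nonce)                         # one attempt per challenge
    return nonce_seen_in_media == nonce       # media made before the challenge cannot contain it


if __name__ == "__main__":
    nonce, issued, tag = issue_challenge("session-1")
    print(accept_capture("session-1", nonce, issued, tag, nonce_seen_in_media=nonce))  # True
    print(accept_capture("session-1", nonce, issued, tag, nonce_seen_in_media=nonce))  # False: replay
```

The in-memory `_spent` set is for illustration only; a real deployment would need shared, persistent storage so a nonce cannot be replayed against a different server.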

The key point for buyers is precise: NIST treats injection-attack detection and deepfake-media analysis as requirements that are distinct from, and additional to, presentation-attack detection. Liveness is not banned or obsolete; it is one layer, and the standard now mandates a second. A system that passes a smile-and-blink prompt but cannot tell a real sensor from an injected stream meets neither the spirit nor the letter of the current guidance. That raises the bar on the verification itself, and, as the next section argues, on the data you keep once verification is done.

Is document verification secure and GDPR-compliant?

It can be, but only if data handling is. Here is the uncomfortable truth most vendor guides bury: traditional document verification has the user upload an ID image and the vendor keep a copy of it, often alongside a biometric template. Under the General Data Protection Regulation, Recital 64 says a controller should not retain personal data for the sole purpose of being able to react to potential requests, and Article 9(1) classifies biometric data processed to uniquely identify a person as a special category whose processing is prohibited unless a specific exception applies. Every retained ID image and template is therefore both a minimisation problem and a standing breach target.

The cost is not hypothetical, and three named, dated incidents make it concrete. In November 2025, Cybernews found an unprotected, password-less MongoDB database linked to identity vendor IDMerit exposing roughly one billion identity records across 26 countries, including national ID numbers, dates of birth and full names; it was secured only the next day, and IDMerit disputed that its core systems were compromised. In October 2025, Discord disclosed a breach of a third-party customer-service vendor in which attackers accessed government-ID images submitted for age verification, affecting about 70,000 users, as NBC News reported and Discord's own 3 October 2025 disclosure set out. Discord disputed a threat actor's far larger extortion claim of millions of IDs, and the named vendor, 5CA, publicly denies that it handled any government IDs for the client. And an intrusion at Sumsub's third-party support platform, begun in July 2024, went undetected for around 18 months until a January 2026 audit. Different firms, same lesson: retained identity data is the liability. We unpack the pattern in our analysis of why your KYC vendor is your biggest data-breach risk and in our review of the 2025 Coinbase data breach.
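For an age gate, the purpose in the Discord case, minimisation can be blunt: keep the answer, not the evidence. A minimal sketch, assuming the document has already passed authenticity checks upstream (`document_ok`) and the date of birth has been read from it; the function names are hypothetical. In Python, letting a value go out of scope does not securely erase it, and real erasure also has to cover logs, caches, backups and any processor's copies.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Whole years completed between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def verify_age_and_forget(dob: date, document_ok: bool, today: date,
                          min_age: int = 18) -> dict[str, object]:
    """Hypothetical helper: return only what an age gate needs to keep.
    The date of birth (and the ID image it came from) is not part of the result."""
    return {
        "document_ok": document_ok,
        "meets_min_age": document_ok and age_on(dob, today) >= min_age,
        "checked_on": today.isoformat(),
    }


if __name__ == "__main__":
    print(verify_age_and_forget(date(2008, 7, 4), True, date(2026, 7, 3)))
    # {'document_ok': True, 'meets_min_age': False, 'checked_on': '2026-07-03'}
```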

How to verify a document without storing it?

The contrarian payoff is that you can run all five verification stages and then not retain the raw personal data centrally at all. Learning how to verify a document without storing it starts with moving the trust boundary: verify authenticity and identity at the point of capture, keep the customer in control of the key, and shard what remains so no single system holds a complete record. In Zyphe's model, the NFC chip is read to ICAO 9303 and eIDAS standards with two-step liveness and no image upload, verified data is sharded across a decentralised network of more than 60,000 nodes under a 29-of-100 threshold scheme, and the customer holds the key. There is no master key and no central honeypot, yet an authorised party can still reconstruct a complete record and export an audit-ready trail on demand.
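The article does not specify which threshold scheme sits behind the 29-of-100 figure, so the sketch below uses Shamir's secret sharing, the textbook k-of-n scheme, purely to illustrate the property: any 29 shares reconstruct the value, while 28 reveal nothing about it. It is a toy in pure Python (no share authentication, no side-channel hardening), and in practice such schemes typically protect an encryption key rather than a raw record; whether Zyphe does exactly this is an assumption.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic happens in this finite field


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Shamir k-of-n: hide `secret` as the constant term of a random degree-(k-1) polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term, i.e. the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)            # e.g. a data-encryption key
    shares = split(key, k=29, n=100)
    print(reconstruct(shares[:29]) == key)    # True: any 29 shares suffice
    print(reconstruct(shares[50:79]) == key)  # True
    print(reconstruct(shares[:28]) == key)    # False, with overwhelming probability
```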

The European direction of travel supports this. Regulation (EU) 2024/1183, the second electronic identification and trust services regulation (eIDAS 2), entered into force on 20 May 2024 and establishes the EU Digital Identity Wallet for reusable, user-controlled credentials. That is the same idea as a reusable KYC passport: verify once, re-present elsewhere, without re-uploading or re-storing the document. Steelmanning the other side, a well-run central store in a SOC 2 environment, encrypted at rest and tightly minimised, can pass examinations for years, and many do. The honest counter is that encryption did not save the firms above, because the failure was access and retention, not cipher strength.

For how Zyphe verifies a document without storing it in practice, see how it works. The cost case sits in our note on KYC cost reduction, where Zyphe models a total KYC cost that is a fraction of a conventional stack's. That is a self-published estimate, contingent on a healthy returning-user rate and on verify-then-shred, not an independently audited figure.

How do you choose a provider?

Score providers on coverage, fraud defence, speed and the criterion incumbents omit. Use the scorecard below to compare a centralising profile against a verify-without-storing profile. Most well-known providers, including Sumsub, Onfido, Veriff, Jumio, Trulioo and Persona, retain document images centrally; that is a fact about architecture, not a slur on their detection quality, and a head-to-head sits in our identity verification software comparison.

Provider scorecard comparing a centralising incumbent that retains every ID image against a verify-without-storing profile, with the decisive row marking central PII retention as honeypot risk.

Download: the provider scorecard (PDF) is a one-page, print-ready version of this comparison you can keep beside your vendor shortlist and share with risk and compliance teams.

| Evaluation criterion | Centralising incumbent profile | Verify-without-storing profile |
|---|---|---|
| Document and country coverage | Broad published template libraries | Broad published template libraries |
| PAD per ISO/IEC 30107-3 | Usually present | Present |
| Injection and deepfake-media detection (NIST 800-63A 3.14) | Varies by vendor | Required by design |
| Speed and UX drop-off | Fast, image upload step | Fast, NFC read, no image upload |
| API or SDK integration | Standard | Single API, about 15-minute integration |
| KYC, AML and sanctions coverage | Yes | Yes |
| Retains a central copy of every ID and biometric template | Yes, honeypot risk | No central PII |

The last row is the decisive one. Two systems can run identical OCR, face match and liveness and carry opposite risk depending on whether they keep the data. Read the pricing model alongside it, because usage-based pricing with no minimums changes the maths for low-volume and seasonal onboarding. Architecture is the question that turns an onboarding check from a future incident into a durable asset.

The bottom line

Document verification in 2026 is a solved process and an unsolved decision. The five stages (capture, extract, authenticate, cross-reference and match) are commodity, and the new NIST requirements on injection and deepfake media simply raise the floor everyone must clear. What still splits the field is the data: keep a central copy of every ID image and biometric template and you build a honeypot that GDPR penalises and attackers find, as IDMerit, Discord and Sumsub each show in their own way. Verify the document and the human, then provably forget the raw personal data, and you keep the audit trail without the liability. Architecture, not accuracy, is the choice that ages well.


Michelangelo Frigo, co-founder of Zyphe, is a privacy and identity infrastructure expert.

Frequently Asked Questions

What is document verification?

Document verification is the process of confirming that an identity document, such as a passport, driver's license or national ID card, is authentic, valid and belongs to the person presenting it. It uses OCR data extraction, authenticity and tamper checks against security features, and often a biometric face match with liveness. It underpins KYC and anti-money-laundering onboarding and broader fraud prevention.

How does document verification work step by step?

The user captures or scans the ID, OCR extracts the data, the system checks authenticity and security features such as holograms, the machine-readable zone and microprint for tampering, cross-references databases where available, and optionally matches a live selfie with liveness detection. On a passport the MRZ is validated and, if the passport is electronic, the chip is read as well. The full process typically completes in seconds.

What documents can be verified?

Government-issued identity documents (passports, driver's licenses, national ID cards, residence permits and visas), proof-of-address documents (utility bills and bank statements), and business documents for KYB such as incorporation certificates. Leading systems support large libraries of document templates across many countries, though exact coverage figures vary by vendor and should be read as published claims.

How does document verification differ from identity verification?

Document verification confirms the document itself is genuine and unaltered. Identity verification is the broader process of confirming the person is who they claim to be, of which document verification is one component alongside biometrics, liveness and database checks. Put plainly, document verification proves the ID is real; identity verification proves it is yours and that you exist.

Is document verification secure and GDPR-compliant?

It can be, but only with proper data handling. The General Data Protection Regulation says controllers should not retain personal data solely to react to potential requests (Recital 64) and treats biometric data used to identify a person as a special category (Article 9). Privacy-first systems verify without storing raw PII centrally, avoiding the honeypot risk that retained ID images create.

What is a selfie check and liveness detection?

A selfie check captures a live image and matches the face to the ID photo to confirm document ownership. Liveness, or presentation-attack detection, confirms a real, present human rather than a photo, mask, replay or deepfake. NIST SP 800-63A sets a presentation-attack metric and, separately, requires injection-attack detection and analysis of submitted media for manipulation, and recommends screening for generative-AI signatures, so smile-and-blink liveness alone no longer meets the standard.

What is automated document verification software?

Automated document verification software uses OCR, computer vision and forensic models to scan, validate and cross-check identity documents in seconds, far faster and more consistently than manual review. It is usually delivered through an API or SDK so it runs inside existing onboarding flows. Beyond accuracy, the key differentiator between vendors is data architecture: whether they retain a central copy.

Can you verify a document without storing it?

Yes. Privacy-first architectures verify authenticity and identity at capture, then avoid retaining raw PII centrally, using user-held encryption keys, sharded storage so no single node holds a full record, and a reusable KYC passport. This removes the central breach honeypot while staying audit-ready by construction, and Zyphe models it as costing a fraction of a conventional KYC stack, under stated assumptions.

Verify documents without storing them

Zyphe checks ID documents for authenticity and matches them to a live user — without retaining a copy of the document.
