Sovereign document intelligence for the Republic.

DigiLekh — India’s own document intelligence

India's sovereign Intelligent Document Processing (IDP) platform. Extract every field from scanned forms, ledgers and manuscripts across 13 scripts, printed or handwritten, and deliver structured, decision-ready data. Air-gapped on your department's server. Data never leaves India.

Air-gapped deployable · DPDP Act 2023 · CERT-In posture · ISO 27001

Privacy isn't claimed. It's private, always.

Certified or compliant across SOC 2, HIPAA, GDPR and ISO 27001, we deliver enterprise-grade security: your data stays yours.

ISO 27001:2023 certified · ✓ CERTIFIED
AICPA SOC 2 certified · ✓ CERTIFIED
HIPAA compliant · ✓ COMPLIANT
GDPR · ✓ COMPLIANT
DPDP · ✓ READY
The Commissioners

Built for institutions that govern records.

Every level of the administrative apparatus runs on paper that must now run on data. DigiLekh is configured for each register: no department is too specific, no archive too old.

I.

Urban & Development Authorities

Municipal corporations, smart-city missions, regional development boards. Building plans, mutation registers, property tax files, water and sewerage records, trade licences.

II.

Revenue & Land Administration

Record rooms holding jamabandi, khatauni, patta, chitta, cadastral maps and mutation registers across every district, tehsil and sub-division.

III.

Judicial & Quasi-Judicial Bodies

District and sessions courts, tribunals, commissions, regulatory authorities. Pleadings, judgements, case bundles and evidence ledgers in multiple regional scripts.

IV.

Welfare & Benefit Missions

Directorates administering pensions, rations, rural employment, scholarships and subsidies. Eligibility files, beneficiary registers, life certificates, utilisation returns.

V.

Public Sector Undertakings

PSUs in power, railways, banking, defence electronics, oil & gas, telecom. Payrolls, procurement files, vigilance records, stores ledgers and technical drawings.

VI.

Archives, Libraries & Heritage Bodies

National and state archives, oriental research institutes, manuscript missions and museum registries. Palm-leaf manuscripts, ruler-period records, rare print collections.

§ The Wedge

Four reasons procurement officers shortlist DigiLekh.

Side-by-side: our position versus legacy ECM platforms and global IDP tools.

01

Data Sovereignty

Hosted in your SDC, secured by your rules.

DigiLekh

True on-prem, air-gapped AI on your department's infrastructure or a MeitY-empanelled SDC.

Legacy

Built for cloud. On-prem AI deployments struggle with model updates and GPU licensing.

02

Linguistic Nuance

Built for 22 Bhashini languages, not retrofitted.

DigiLekh

Native Indic engine. Handwritten Devanagari, Tamil, Bengali, Urdu. HWR-first design.

Legacy

Global tools treat Indian scripts as secondary translation tasks. Accuracy falls off a cliff.

03

Forensic Shield

Pixel-level tamper detection at ingestion.

DigiLekh

ELA, JPEG Ghost Maps, synthetic-media detection. Catches fraud before workflow.

Legacy

Conventional DMS has zero forensic capability. Vulnerable to GAN-generated content.

04

Legal Defensibility

Audit logs mapped to the Indian Evidence Act.

DigiLekh

Non-repudiation, SHA-2 hashing, digital signature integration. Court-admissible by design.

Legacy

Audit trails built for international standards. Gaps around Indian Evidence Act specifics.

The Solution Architecture

Four pillars. One sovereign spine.

DigiLekh is a single source of truth across the entire document lifecycle, from the moment a page is captured to the dashboard that a secretary reads on the 1st of the month. It slots into the Government of India IT fabric: eOffice, API Setu and DigiLocker.

I.

Smart Digitisation

Archival preservation at the point of entry. A proprietary mobile and desktop scanning ecosystem that makes capture a one-time, lifetime process.

  • NAI-grade capture. 300–600 DPI · TIFF v6.0 · lossless LZW compression · PDF/A-3 archival output.
  • Edge enhancement. Automated curvature correction, deskewing, despeckling and background cleaning for aged paper and bound registers.
  • Chain of custody. SHA-256 checksum per page, barcode/QR separator mapping, tamper-evident ingest log.
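The chain-of-custody bullet above can be illustrated with a minimal sketch. It assumes each page arrives as raw bytes and that the tamper-evident ingest log is built as a hash chain, which is one common way to get that property. The names `page_checksum`, `append_entry` and `verify_log` are hypothetical and are not DigiLekh's API.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


def page_checksum(page_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of one scanned page."""
    return hashlib.sha256(page_bytes).hexdigest()


def _entry_hash(body: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, page_id: str, page_bytes: bytes) -> dict:
    """Append a log entry that commits to the page and to the previous entry."""
    body = {
        "page_id": page_id,
        "sha256": page_checksum(page_bytes),
        "prev": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = dict(body, entry_hash=_entry_hash(body))
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("page_id", "sha256", "prev")}
        if entry["prev"] != prev or entry["entry_hash"] != _entry_hash(body):
            return False
        prev = entry["entry_hash"]
    return True
```

Because every entry commits to the hash of the one before it, changing a stored page checksum after the fact makes `verify_log` return `False` from that entry onward.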
II.

Visual Extraction

Beyond OCR. A VLM engine that understands the language of administration — forms, ledgers, notings, stamps — not just the text on them.

  • Native multilingual. All 22 Bhashini-supported scheduled languages, in native and romanised scripts — printed and handwritten.
  • HWR mastery. Handwriting recognition calibrated for old registers, departmental notings and cursive Devanagari.
  • Structural recovery. Automatic field mapping to eOffice, SPARROW, NGDRS and FHIR R4 metadata standards.
III.

Forensic Validation

The first line of digital defence. DigiLekh catches manipulation before it enters the workflow — not after a grievance surfaces.

  • Pixel-level forensics. Error Level Analysis (ELA) and JPEG Ghost Maps identify insertions and copy-paste tampering at ingestion.
  • Authority APIs. Real-time cross-verification against SHCIL (e-Stamp) and DigiLocker-issued URIs via API Setu.
  • Synthetic-media detection. Surfaces GAN-generated modifications in photographs and signatures that pass routine visual inspection.
IV.

Predictive Intelligence

Dashboards that mirror national monitoring standards. Raw documents become decision-grade intelligence without a separate BI tool.

  • Pendency Command Centre. Real-time visibility into file aging, SLA adherence and departmental bottlenecks.
  • DPDP Compliance Monitor. Automated RoPA logs, data-erasure job tracking, grievance-redressal SLAs.
  • Scheme Monitoring. Track the progress of critical Government schemes across districts and talukas, with drill-down to the source document.
The Workflow

How DigiLekh works.

One sovereign pipeline. From the scanner on the ground to the monthly report on the Secretary's desk — with a forensic gate that catches tampered pages before a single record is trusted.

The Extraction Pipeline

From paper to verified record.

Capture

Scan the record.

Field staff use the DigiLekh app to photograph revenue records, muster rolls, pension files, FIRs, pay slips — whatever the registry holds. Auto-deskew, auto-crop, auto-page detection. Works offline, syncs when the department VPN reconnects.

Extract

Read every field.

Upload a batch or point DigiLekh at an existing folder. The extraction console runs full OCR, handwriting recognition and key-field parsing on your department's own server or GPU. Every inference happens on-premises.

Validation layer

Catch the fraud before the file moves.

Every scanned page passes a forensic gate. At the image level — ELA tampering detection, copy-move clone analysis, seal and signature verification. At the data level — duplicate entries, amount inconsistencies, date-sequence errors. Suspect pages are quarantined and escalated with a full audit trail before any human sees a 'clean' record.
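The data-level checks named above (duplicate entries, amount inconsistencies, date-sequence errors) can be sketched in a few lines. This is an illustration only: the row layout, the field names `entry_no`, `items`, `total` and `date`, and the function `data_level_flags` are assumptions, and amounts are held as integer paise to avoid floating-point rounding.

```python
from collections import Counter
from datetime import date


def data_level_flags(rows: list) -> list:
    """Return (row_index, reason) pairs for rows that need human review."""
    flags = []
    # Count how often each entry number occurs across the whole register.
    counts = Counter(row["entry_no"] for row in rows)
    for i, row in enumerate(rows):
        # Duplicate entries: the same entry number appearing more than once.
        if counts[row["entry_no"]] > 1:
            flags.append((i, "duplicate entry"))
        # Amount inconsistency: line items must add up to the stated total.
        if sum(row["items"]) != row["total"]:
            flags.append((i, "amount mismatch"))
        # Date-sequence error: an entry dated before its predecessor.
        if i > 0 and row["date"] < rows[i - 1]["date"]:
            flags.append((i, "date out of sequence"))
    return flags
```

In a pipeline of this shape, any row that returns a flag would be held back for review instead of being passed on as clean.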

Review & govern

Structured. Searchable. Governed.

What reaches the reviewer is already validated. Consolidated results appear in a table — PII masked by default for unauthorised reviewers, confidence flagged per field, one-click export to your DMS, eOffice, or GIS. Every read is logged. Every edit is traceable.
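As a rough sketch of how default PII masking and per-field confidence flags might combine in the review table: the threshold, the field names and the functions `mask` and `reviewer_view` below are all assumptions made for illustration, not DigiLekh defaults.

```python
LOW_CONFIDENCE = 0.85  # assumed review threshold, not a DigiLekh default
PII_FIELDS = {"aadhaar_no", "mobile_no"}  # hypothetical field names


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "X" * len(value)
    return "X" * (len(value) - visible) + value[-visible:]


def reviewer_view(fields: dict, authorised: bool) -> dict:
    """Build the row a reviewer sees from {name: (value, confidence)} pairs."""
    view = {}
    for name, (value, confidence) in fields.items():
        # PII is masked unless the reviewer is explicitly authorised.
        if name in PII_FIELDS and not authorised:
            value = mask(value)
        view[name] = {"value": value, "needs_review": confidence < LOW_CONFIDENCE}
    return view
```

Masking by default means an unauthorised reviewer can still confirm the last four characters of an identifier without seeing the full value.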

The data now exists — and is trusted. Next, it must think.
The Intelligence Layer

From verified record to actionable brief.

Intelligence gathering

Connect the dots across registers.

Every record becomes searchable. Entities (persons, khatas, case numbers, account holders) surface across jamabandi, pay slips, court orders and scheme rolls. An on-premises vector database indexes 2M+ documents with sub-second query times. Duplicates, inconsistencies and cross-register linkages emerge automatically.
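The idea behind vector search can be shown with a brute-force sketch. It assumes each document has already been turned into an embedding vector by some model and ranks documents by cosine similarity. A production vector database would use an approximate nearest-neighbour index to stay fast at millions of documents. The names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list, index: dict, k: int = 3) -> list:
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Records from different registers that describe the same entity should land close together in the embedding space, which is how cross-register linkages can surface from a single query.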

Sentiment analysis

Hear the tone at scale.

Run thousands of grievance letters, inspection reports, feedback forms or constituent correspondence through an Indic-fine-tuned sentiment engine. Tone distribution, recurring themes and critical-level escalations are routed automatically to the right desk and logged with source-letter attribution, with 91.2% reviewer agreement.

Custom reports

Every report. Your way.

Build once, run monthly. Select columns, aggregations, filters. Export to PDF, XLSX, CSV, eOffice workflow, GIS shapefile or JSON. Schedule auto-dispatch to CAG, DDO or P&A every first of the month. Templates are department-owned, auditable, and portable across administrations.

The Tongues

We read every major Indian language.

Not "some." Not "Hindi and English, the rest soon." Every major script the Constitution recognises, and the heritage scripts the archives still hold.

22+ Scheduled languages
13 Writing scripts
6+ Heritage scripts
हिन्दी · বাংলা · தமிழ் · తెలుగు · मराठी · ગુજરાતી · ಕನ್ನಡ · മലയാളം · ਪੰਜਾਬੀ · ଓଡ଼ିଆ · অসমীয়া · اردو · संस्कृत · नेपाली · कोंकणी · डोगरी

+ Heritage scripts: Modi · Kaithi · Sharada · Grantha · Nandinagari · Siddham

📜 Before: Palm-leaf manuscript (Input)
ॐ असतो मा सद्गमय ।
तमसो मा ज्योतिर्गमय ।
मृत्योर्मा अमृतं गमय ॥
Bṛhadāraṇyaka Upaniṣad 1.3.28 · c. 800 BCE
DigiLekh AI extraction
📊 After: Structured record (Output)
Text: ॐ असतो मा सद्गमय ।
IAST: Om asato mā sadgamaya
Source: Bṛhadāraṇyaka Up. 1.3.28
Script: Devanagari
Confidence: 94.2%
Heritage & Manuscripts

Preserving India's written inheritance.

The Government of India is undertaking one of the largest manuscript preservation efforts in history. DigiLekh's handwritten text recognition handles ancient Sanskrit, Pali, Persian and regional scripts, converting fragile palm-leaf and paper manuscripts into searchable digital archives.

  • Custom HTR models trained per script lineage, not generic
  • Confidence-tagged transliteration with IAST and diacritics
  • Scholar-in-the-loop review for critical editions
  • Cryptographic provenance — manuscript, folio, date, custodian
The Deployment Posture

Three postures. Your risk. Your rules.

Most IDP vendors force a single model. DigiLekh is architected for a spectrum: from quick-start sovereign SaaS to a fully air-gapped appliance for classified workloads.

I.

Sovereign SaaS

Hosted in India · MeitY-empanelled CSP · DPDP-aligned

The fastest path to production. DigiLekh runs in an Indian cloud region with full platform feature parity, RBAC, DigiLocker and API Setu integrations.

  • All workflow stages live
  • 22 Bhashini languages
  • DPDP + ISO 27001 + SOC 2
  • Data residency enforceable at deployment
  • Multi-department tenancy
II. (On Demand)

On-Premises

State Data Centre · NIC SDC · dedicated GPU

For departments with sensitive data that must remain on their own metal. DigiLekh deploys to existing infrastructure or an NIC-managed SDC with customer-managed storage.

  • Full platform features, customer-controlled
  • Open-source models hosted locally
  • Configurable logging and audit
  • Customer-held KMS and encryption
  • Optional managed updates by Predusk AI
III.

Air-Gapped Appliance

Defence · Intelligence · Strategic PSUs

A fully isolated device for classified workloads. Pre-deployed models, offline licence activation, no external APIs and no outbound telemetry.

  • Fully offline, no internet dependency
  • Pre-loaded model set
  • Offline licence tokens
  • Maximum isolation, zero external exposure
  • Designed for classified environments
Feature availability varies by posture. Air-gapped deployment ships a curated subset of models and capabilities, briefed per engagement.
The Commission

One platform. Configured for your department.

End-to-end today, extensible tomorrow. New document types, new languages and new integrations are added as modules without replacing what already works.

i.

Your documents

Land records, court files, welfare forms, manuscripts, pay bills, FIRs: whatever your department processes.

ii.

Your languages

Configure the language pack for your state. Hindi + Urdu for UP. Tamil for TN. Bengali + Santali for Jharkhand.

iii.

Your workflow

Define extraction fields, validation rules, approval chains and export formats specific to your process.

iv.

Your output

Structured data, searchable archives, decision dashboards and eOffice integration in the format you need.

The Next Step

A sample, before a procurement. On your terms.

Book a sample to have our team run a live extraction on documents you provide, or try a sample yourself with a sanitised set from your department's registry. Either way, nothing leaves your premises without your consent.