A platform that reads, classifies and routes thousands of documents a month

An insurance group was drowning its teams in claims and policy paperwork. We built the platform that sorts everything automatically, plus an assistant that answers questions across the whole document history.

Details anonymized and adapted for confidentiality.

The problem

An insurance group, several entities across several countries. Every month, thousands of documents come in: claim filings, endorsements, policies, supporting docs, letters. All of it as PDFs, photos, low-quality scans, in three languages.

Sorting was manual. Case handlers spent a large chunk of their day opening attachments, guessing what each one was, and dropping it in the right folder of the right entity. Average time before a document landed in the right place: 2 to 3 days. And nobody had the full picture: each entity worked in its own corner.

What we built

Ingestion pipeline - Every incoming document is read by OCR, then a language model extracts its type, the entity it belongs to, the policy number and key dates. Structured data out, ready to route.
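
To make "structured data out" concrete, here is a minimal sketch of validating what the language model returns before anything is routed. It assumes the model replies with a JSON object; the field names (doc_type, entity, policy_number, key_dates) and the list of document types are illustrative, not the platform's actual schema.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical type labels, loosely based on the document kinds named in this case study.
DOC_TYPES = {"claim_filing", "endorsement", "policy", "supporting_doc", "letter"}


@dataclass
class ExtractedDocument:
    doc_type: str
    entity: str
    policy_number: Optional[str]
    key_dates: Dict[str, date]


def parse_extraction(raw: str) -> ExtractedDocument:
    """Turn the model's JSON reply into a typed record, rejecting unknown types."""
    data = json.loads(raw)
    doc_type = data["doc_type"]
    if doc_type not in DOC_TYPES:
        raise ValueError(f"unknown document type: {doc_type!r}")
    # Dates are assumed to arrive as ISO 8601 strings (YYYY-MM-DD).
    key_dates = {
        name: date.fromisoformat(value)
        for name, value in data.get("key_dates", {}).items()
    }
    return ExtractedDocument(
        doc_type=doc_type,
        entity=data["entity"],
        policy_number=data.get("policy_number"),
        key_dates=key_dates,
    )
```

Rejecting malformed output at this step keeps a bad extraction from silently becoming a bad routing decision.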

Classification and routing - The document is filed automatically into the right folder of the right entity, with a confidence score. Below a threshold, it goes to a human review queue instead of being guessed.
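
The routing rule itself is simple. A minimal sketch, assuming each classification carries a confidence between 0 and 1; the 0.9 threshold and the folder naming are placeholders, since the real values are not given here.

```python
from dataclasses import dataclass

# Illustrative value only; the actual threshold is not stated in this case study.
REVIEW_THRESHOLD = 0.9


@dataclass
class Classification:
    entity: str
    doc_type: str
    confidence: float


def route(c: Classification, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the destination folder, or the review queue when confidence is low."""
    if c.confidence < threshold:
        return "review_queue"
    # Hypothetical folder layout: one folder per entity and document type.
    return f"{c.entity}/{c.doc_type}"
```

For example, route(Classification("entity_a", "policy", 0.97)) files the document under entity_a/policy, while the same document at 0.42 confidence goes to the review queue.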

RAG assistant over the whole history - A handler asks a question in plain language and gets a sourced answer, with links to the exact documents. No more digging through ten folders to reconstruct a case.
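
The retrieval half of the assistant can be sketched in a few lines. This is a toy version under stated assumptions: embeddings are plain lists of floats computed elsewhere, the index is an in-memory list rather than a real vector store, and the function names are hypothetical. What it shows is how document IDs travel with the passages so the answer can link back to its sources.

```python
import math
from typing import List, Sequence, Tuple

# One index entry: (document ID, embedding vector, passage text).
IndexEntry = Tuple[str, Sequence[float], str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], index: List[IndexEntry], k: int = 3) -> List[Tuple[str, str]]:
    """Return the k passages most similar to the query, with their document IDs."""
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[1]), reverse=True)
    return [(doc_id, text) for doc_id, _, text in ranked[:k]]


def build_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Assemble a prompt that asks the model to cite the document IDs it used."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"{sources}\n\nQuestion: {question}"
    )
```

A production setup would swap the sorted list for a vector store query, but the contract stays the same: passages in, passages plus IDs out.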

Oversight dashboard per entity - Volumes, turnaround, automation rate, review queues. Each entity sees its own numbers, the group sees all of it.
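
As an illustration of one dashboard number, the automation rate is the share of documents routed with no human touch. A minimal sketch, assuming routing events are available as (entity, was_auto_routed) pairs; that event shape is an assumption, not the platform's data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def automation_rate_by_entity(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of documents per entity that were routed without human review."""
    totals: Dict[str, int] = defaultdict(int)
    automated: Dict[str, int] = defaultdict(int)
    for entity, was_auto_routed in events:
        totals[entity] += 1
        if was_auto_routed:
            automated[entity] += 1
    return {entity: automated[entity] / totals[entity] for entity in totals}
```

The group-level view is the same computation with every event mapped to a single key.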

The result

Routing time dropped from 2-3 days to a few minutes for the vast majority of documents.
Around 85% of documents classified with no human touch, the rest sent to targeted review.
Handlers got several hours back per day, redirected to actually working the cases.
One multi-entity source of truth, no more per-country silos.
History searchable in seconds instead of manual digging.

The stack

OCR + language models for extraction and classification
Async ingestion pipeline, scalable across volume spikes
RAG with a vector store over the full document history
Integration API with the entities’ existing business systems
Confidence thresholds and a human review queue for ambiguous cases
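
One common way to keep an async pipeline stable during volume spikes is to cap how many documents are processed at once. The sketch below uses Python's asyncio with a semaphore; process_document is a placeholder standing in for OCR, extraction and routing, and the concurrency limit is an arbitrary example value, not a description of the production setup.

```python
import asyncio
from typing import List, Sequence


async def process_document(doc_id: str) -> str:
    """Placeholder for the real work: OCR, extraction, classification, routing."""
    await asyncio.sleep(0)  # yields control, as a real I/O call would
    return f"processed:{doc_id}"


async def ingest(doc_ids: Sequence[str], max_concurrency: int = 8) -> List[str]:
    """Process all documents, with at most max_concurrency in flight at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(doc_id: str) -> str:
        async with semaphore:
            return await process_document(doc_id)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(doc_id) for doc_id in doc_ids))
```

Calling asyncio.run(ingest(["a", "b", "c"])) processes the batch; a spike only lengthens the queue instead of overloading the OCR and model services.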

A problem close to one of these?