Booking 800 documents a month without typing a single one
An accounting firm drowning in client PDFs, photos and scans. We built a pipeline that turns that chaos into entries ready to push into the accounting software.
Details anonymized and adapted for confidentiality.
The problem
An accounting firm of around fifteen people. Every month-end, clients send their documents in every format imaginable: crooked scanned PDFs, photos of receipts shot with flash, bank exports as CSV with columns that change from one bank to the next, supplier invoices buried in email attachments.
On average, an accountant spent 3 to 4 hours per file per month opening, reading and retyping all of it by hand into the software. Across 60 active files, at roughly 3.5 hours each, that came to over 200 hours of pure data entry every month. The real work, analysis and advice, came last.
What we built
Universal intake - One drop box per client (email plus web drag and drop). PDF, JPG, scans, bank CSVs, it does not matter. Everything lands in one place, timestamped and tied to the right file.
AI extraction - An OCR plus language model pipeline that reads each document and pulls out the structured fields: supplier, date, net and VAT amounts, rate, invoice number, currency. Bank statements are parsed line by line, whatever the bank’s format.
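To make "whatever the bank's format" concrete, here is a minimal sketch of the statement-parsing idea. It assumes exports differ only in header names, delimiter and number formatting. The alias table, field names and sample rows are hypothetical, not the firm's real configuration, and dates are left as raw strings.

```python
import csv
import io
from decimal import Decimal

# Hypothetical aliases: canonical field -> header names seen in bank exports.
HEADER_ALIASES = {
    "date": {"date", "booking date", "transaction date"},
    "label": {"label", "description", "details"},
    "amount": {"amount", "value"},
}


def canonical_name(header):
    """Return the canonical field for a raw header, or None if unknown."""
    cleaned = header.strip().lower()
    for field, aliases in HEADER_ALIASES.items():
        if cleaned in aliases:
            return field
    return None


def parse_amount(raw):
    """Accept '1,234.50', '1.234,50' and '-1 200,00' style amounts."""
    text = raw.strip().replace(" ", "").replace("\u00a0", "")
    if "," in text and "." in text:
        # When both marks appear, the right-most one is the decimal mark.
        if text.rfind(",") > text.rfind("."):
            text = text.replace(".", "").replace(",", ".")
        else:
            text = text.replace(",", "")
    else:
        # Assumption: a lone comma is a decimal mark, not a thousands mark.
        text = text.replace(",", ".")
    return Decimal(text)


def parse_statement(csv_text):
    """Parse one bank export into rows keyed by canonical field names."""
    header_line = csv_text.splitlines()[0]
    # Pick the candidate delimiter that occurs most often in the header.
    delimiter = max(";,\t", key=header_line.count)
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for raw in reader:
        row = {}
        for header, value in raw.items():
            if header is None or value is None:
                continue  # ragged line: skip cells without a header or value
            field = canonical_name(header)
            if field is not None:
                row[field] = value.strip()
        row["amount"] = parse_amount(row["amount"])
        rows.append(row)
    return rows


# Two made-up exports describing the same transaction in different layouts.
BANK_A = "Date,Description,Amount\n2024-01-31,Office rent,-1200.00\n"
BANK_B = "Booking date;Details;Value\n31/01/2024;Office rent;-1 200,00\n"

rows_a = parse_statement(BANK_A)
rows_b = parse_statement(BANK_B)
```

Both sample exports end up with the same keys and the same decimal amount, so the later mapping and validation steps only need to handle one row shape.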
Rules and chart of accounts - Automatic mapping to the right accounts based on the supplier and the file’s history. The system learns from the accountant’s past corrections and suggests the right expense account from a supplier’s second occurrence onward.
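The "second occurrence onward" behaviour can be pictured as a small memory of past choices. This is a minimal sketch, assuming history is stored per client file and per supplier and that the most frequently chosen account wins. The class name, file identifiers and account code are hypothetical.

```python
from collections import Counter, defaultdict


class AccountMapper:
    """Remembers which account was chosen for each (file, supplier) pair."""

    def __init__(self):
        self._history = defaultdict(Counter)

    @staticmethod
    def _key(client_file, supplier):
        # Normalise case and whitespace so 'ACME ' and 'acme' share a history.
        return (client_file, supplier.strip().lower())

    def record(self, client_file, supplier, account):
        """Store the account the accountant validated or corrected."""
        self._history[self._key(client_file, supplier)][account] += 1

    def suggest(self, client_file, supplier):
        """Return the most frequently chosen account, or None if unseen."""
        counts = self._history.get(self._key(client_file, supplier))
        if not counts:
            return None
        return counts.most_common(1)[0][0]


mapper = AccountMapper()

# First invoice from this supplier: no history, so it goes to review.
first = mapper.suggest("client-042", "Acme Telecom")

# The accountant picks an account (hypothetical code) during validation.
mapper.record("client-042", "Acme Telecom", "6260")

# Second invoice, slightly different spelling: the account is now suggested.
second = mapper.suggest("client-042", "ACME Telecom ")

# History is per client file, so another file starts from scratch.
other_file = mapper.suggest("client-007", "Acme Telecom")
```

Keying on the client file as well as the supplier reflects the text's "file's history": the same supplier can legitimately book to different accounts for different clients.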
Targeted human control - Nothing is pushed blind. A validation interface surfaces only the doubtful documents (unreadable amount, inconsistent VAT, unknown supplier). The rest clears in one click.
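The three reasons for doubt listed above can be expressed as simple checks. This is a minimal sketch, assuming extraction returns None for fields it could not read and that VAT is checked as net times rate within a small rounding tolerance. The field names, tolerance and sample values are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

TOLERANCE = Decimal("0.02")  # assumed rounding tolerance, in currency units


@dataclass
class Extracted:
    supplier: Optional[str]
    net: Optional[Decimal]
    vat: Optional[Decimal]
    rate: Optional[Decimal]  # e.g. Decimal("0.20") for a 20% rate


def review_reasons(doc, known_suppliers):
    """Return the reasons a document needs human review (empty list = clear)."""
    reasons = []
    if doc.net is None or doc.vat is None or doc.rate is None:
        reasons.append("unreadable amount")
    elif abs(doc.net * doc.rate - doc.vat) > TOLERANCE:
        # The VAT amount does not match net x rate within the tolerance.
        reasons.append("inconsistent VAT")
    if doc.supplier is None or doc.supplier not in known_suppliers:
        reasons.append("unknown supplier")
    return reasons


known = {"Acme Telecom"}

clean = Extracted("Acme Telecom", Decimal("100.00"), Decimal("20.00"), Decimal("0.20"))
doubtful = Extracted("New Vendor", Decimal("100.00"), Decimal("25.00"), Decimal("0.20"))
unreadable = Extracted("Acme Telecom", None, Decimal("20.00"), Decimal("0.20"))

clean_reasons = review_reasons(clean, known)
doubtful_reasons = review_reasons(doubtful, known)
unreadable_reasons = review_reasons(unreadable, known)
```

Documents with an empty reason list are the ones that can clear in one click. Anything else is surfaced in the validation interface along with its reasons.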
The result
- From 3-4 h down to 35 min per file per month. The accountant validates instead of typing.
- Over 90% of documents extracted with no intervention, the rest flagged for review.
- Around 160 hours a month handed back to the team, reinvested in client advisory.
- VAT discrepancies caught upstream, before the error reaches the filing.
- Direct export into the existing accounting software, no tool change.
The stack
- OCR plus language model for structured document extraction
- Python pipeline orchestrating intake, parsing and accounting mapping
- In-house rules layer that learns from the accountant’s corrections
- Lightweight web validation interface for targeted control
- Export connector into the accounting software already in place