How I Would Automate Żabka's Delivery Document Processing Using AI

The Scaling Problem No One Talks About

When a retail network reaches Żabka's scale, with over 10,000 franchise locations active across Poland, the daily reconciliation of Delivery Notes (WZ documents) with supplier invoices and central Purchase Orders becomes one of the largest hidden operational costs in the business.

Each store receives multiple deliveries per day. Each delivery produces a WZ document. That document needs to be matched against the original PO in the ERP, validated for quantity and price discrepancies, and either approved or flagged. Across a 10,000-store network, this creates an estimated 50,000+ documents per day (roughly five per store) flowing through the back office. No manual process can handle that volume consistently.

The problem isn't speed alone; it's consistency. Franchisees and data-entry clerks working under time pressure make errors. Errors mean delayed variance claims, incorrect inventory levels, and cash-flow friction. The standard response is to hire more people. The correct response is to architect a system that makes discrepancies impossible to miss.

Strategic Goal: Management by Exception

The design principle behind this architecture is Management by Exception: a model where human operators at Żabka HQ only ever look at documents the system has already determined require attention. The roughly 97% of deliveries projected to be fully compliant are processed and closed automatically. Only flagged transactions reach a human desk.

This shifts the role of the back-office team from data entry to exception resolution and supplier relationship management. That is dramatically higher-value work at a fraction of the current headcount cost. It also means the 10,001st store adds almost no incremental processing burden: the system scales with the network at near-zero marginal cost.

Technical Architecture: Three-Phase Pipeline

The system is built as a sequential processing pipeline. Each phase passes structured data to the next. No phase waits on human input unless the validation logic specifically routes a document to a review queue.

Phase 1 | Ingestion and Extraction (Make.com + Claude 3.5 Sonnet)

The pipeline triggers the moment a franchisee scans or emails a delivery note. Make.com acts as the cloud orchestrator: it monitors a dedicated inbox or scanning endpoint, detects incoming documents, and routes them to the AI processing node without any human intervention.

The AI extraction node uses Claude 3.5 Sonnet via API. The key design decision here is to avoid template-based OCR rules entirely. Building and maintaining regex parsers for each regional supplier's document format is expensive and brittle: when a supplier changes their invoice layout, the parser breaks and documents back up. Instead, Claude receives a strict system prompt instructing it to extract a normalized JSON payload regardless of how the source document is formatted.

The system prompt defines a fixed schema: supplier tax ID, store code, delivery date, WZ number, and a line-items array containing EAN, description, quantity, and unit price for each product. Claude is explicitly instructed to return only valid JSON, use null for missing fields, and never invent values. Model output is not strictly deterministic, but with a fixed schema it is consistent enough to be passed directly into a validation script without further cleaning.
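As a minimal sketch of what the receiving side of that schema might look like, the following parses the model's JSON text into typed records and rejects anything that does not match. The key names (supplier_tax_id, store_code, and so on) and the function parse_delivery_note are illustrative assumptions, not names taken from an existing Żabka system or from the actual prompt.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical key names mirroring the schema described in the text.
HEADER_FIELDS = ("supplier_tax_id", "store_code", "delivery_date", "wz_number")
ITEM_FIELDS = ("ean", "description", "quantity", "unit_price")


@dataclass
class LineItem:
    ean: Optional[str]
    description: Optional[str]
    quantity: Optional[Decimal]
    unit_price: Optional[Decimal]


@dataclass
class DeliveryNote:
    supplier_tax_id: Optional[str]
    store_code: Optional[str]
    delivery_date: Optional[str]  # kept as text; date parsing is out of scope here
    wz_number: Optional[str]
    line_items: list[LineItem]


def _to_decimal(value):
    # JSON null stays None; numbers go through str() to avoid float artefacts.
    return None if value is None else Decimal(str(value))


def parse_delivery_note(raw: str) -> DeliveryNote:
    """Parse the model's JSON text; raise ValueError if the shape is wrong."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    missing = [f for f in HEADER_FIELDS + ("line_items",) if f not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    items = []
    for row in data["line_items"]:
        absent = [f for f in ITEM_FIELDS if f not in row]
        if absent:
            raise ValueError(f"line item missing keys: {absent}")
        items.append(LineItem(
            ean=row["ean"],
            description=row["description"],
            quantity=_to_decimal(row["quantity"]),
            unit_price=_to_decimal(row["unit_price"]),
        ))
    return DeliveryNote(
        supplier_tax_id=data["supplier_tax_id"],
        store_code=data["store_code"],
        delivery_date=data["delivery_date"],
        wz_number=data["wz_number"],
        line_items=items,
    )
```

Keys must be present even when their value is null, which matches the instruction to use null for missing fields rather than omit them. Quantities and prices are held as Decimal so that later PLN comparisons are exact.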

Phase 2 | Logic and Validation (Python + SQL)

The extracted JSON payload is immediately passed to a Python validation service. This layer is entirely deterministic: no AI, pure business logic. It does three things.

First, PO lookup: the script queries the central SQL database (or SAP via API) to pull the original Purchase Order for this store, supplier, and date combination. If no matching PO exists, the document is flagged as unmatched and escalated immediately.
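The lookup step could be sketched as follows, using an in-memory SQLite database as a stand-in for the central SQL database. The table names (purchase_orders, po_lines), column names, and sample values are all hypothetical; a real ERP or SAP schema would differ.

```python
import sqlite3
from decimal import Decimal
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the PO database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE purchase_orders (
            po_number TEXT PRIMARY KEY,
            store_code TEXT, supplier_tax_id TEXT, expected_date TEXT
        );
        CREATE TABLE po_lines (
            po_number TEXT, ean TEXT, quantity INTEGER, unit_price TEXT
        );
        INSERT INTO purchase_orders
            VALUES ('PO-1001', 'Z00123', '0000000000', '2024-05-06');
        INSERT INTO po_lines
            VALUES ('PO-1001', '5900000000001', 48, '3.99');
        """
    )
    return conn


def find_purchase_order(conn: sqlite3.Connection, store_code: str,
                        supplier_tax_id: str, delivery_date: str) -> Optional[dict]:
    """Return the PO for this store, supplier, and date, or None if unmatched."""
    row = conn.execute(
        "SELECT po_number FROM purchase_orders "
        "WHERE store_code = ? AND supplier_tax_id = ? AND expected_date = ?",
        (store_code, supplier_tax_id, delivery_date),  # parameterised, never string-built
    ).fetchone()
    if row is None:
        return None  # caller flags the document as unmatched and escalates
    po_number = row[0]
    lines = conn.execute(
        "SELECT ean, quantity, unit_price FROM po_lines WHERE po_number = ?",
        (po_number,),
    ).fetchall()
    return {
        "po_number": po_number,
        "lines": {
            ean: {"quantity": Decimal(qty), "unit_price": Decimal(price)}
            for ean, qty, price in lines
        },
    }
```

Prices are stored as text in this sketch so they convert to Decimal without rounding. Returning None rather than raising keeps the "no matching PO" case an ordinary routing outcome instead of an error path.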

Second, variance analysis: the script runs a line-by-line comparison between the WZ quantities and PO quantities, and between WZ prices and centrally contracted rates. Quantity tolerances are configurable per supplier contract: a supplier with a 98% delivery SLA might have a 2% tolerance, while a critical fresh-goods supplier might have zero. Price tolerances are set at PLN 0.01, effectively zero.
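A minimal sketch of that line-by-line comparison is below. The function name compare_lines, the dictionary shapes, and the flag types are assumptions made for illustration. It also assumes a difference must strictly exceed the tolerance to be flagged, and it only checks lines present on the WZ; PO lines that never arrived would need a second pass.

```python
from decimal import Decimal

PRICE_TOLERANCE = Decimal("0.01")  # PLN, as stated in the text


def compare_lines(wz_lines, po_lines, qty_tolerance_pct):
    """Compare delivered lines against the PO.

    wz_lines: list of dicts with 'ean', 'quantity', 'unit_price' (Decimal values)
    po_lines: dict mapping EAN to {'quantity', 'unit_price'} (Decimal values)
    qty_tolerance_pct: Decimal, e.g. Decimal("2") for a 2% tolerance
    Returns a list of flag dicts; an empty list means every line passed.
    """
    flags = []
    for line in wz_lines:
        ean = line["ean"]
        po = po_lines.get(ean)
        if po is None:
            # Delivered item that was never ordered: the whole line is at risk.
            flags.append({"ean": ean, "type": "NOT_ON_PO",
                          "value_at_risk": line["quantity"] * line["unit_price"]})
            continue

        qty_diff = line["quantity"] - po["quantity"]
        allowed = po["quantity"] * qty_tolerance_pct / Decimal("100")
        if abs(qty_diff) > allowed:
            flags.append({"ean": ean, "type": "QUANTITY", "delta": qty_diff,
                          "value_at_risk": abs(qty_diff) * po["unit_price"]})

        price_diff = line["unit_price"] - po["unit_price"]
        if abs(price_diff) > PRICE_TOLERANCE:
            flags.append({"ean": ean, "type": "PRICE", "delta": price_diff,
                          "value_at_risk": abs(price_diff) * line["quantity"]})
    return flags
```

For example, a PO line of 48 units at PLN 3.99 delivered as 36 units produces a single QUANTITY flag with a delta of minus 12 and a value at risk of PLN 47.88.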

Third, routing: documents where every line item passes validation are auto-committed to the ERP and archived. Documents with any flag are routed to a regional manager queue with a pre-populated exception report showing exactly what mismatched, the affected SKUs, and the financial value at risk. A manager reviewing this queue never needs to open the original document.
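The routing decision might be sketched like this. It assumes each flag is a dictionary carrying an 'ean' and a Decimal 'value_at_risk'; the queue names and the route_document function are hypothetical placeholders for whatever the ERP and review tooling actually expose.

```python
from decimal import Decimal


def route_document(wz_number, flags):
    """Decide where a validated document goes and build its exception report.

    flags: list of dicts, each with at least 'ean' and 'value_at_risk' (Decimal).
    """
    if not flags:
        # Every line passed: commit to the ERP with no human touchpoint.
        return {"wz_number": wz_number, "status": "AUTO_APPROVED",
                "queue": "erp_commit"}

    total = sum((f["value_at_risk"] for f in flags), Decimal("0"))
    return {
        "wz_number": wz_number,
        "status": "FLAGGED",
        "queue": "regional_manager_review",
        "affected_eans": sorted({f["ean"] for f in flags}),  # de-duplicated SKUs
        "value_at_risk_pln": total,
        "flags": flags,  # full detail, so the original scan need not be opened
    }
```

Three price flags of PLN 7.20 each, for instance, yield a single report with three affected EANs and a total value at risk of PLN 21.60.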

Phase 3 | Executive Visibility (Power BI)

All processed transactions, both approved and flagged, are written to a central data warehouse in real time. Power BI connects via DirectQuery, enabling the supply chain team to see a live view across the network without waiting for end-of-day batch reports.

The dashboard is structured around three views. The Exception Queue is used by regional managers and shows open flag count and value at risk, with alerts triggered by any HIGH severity flag. The Supplier Reliability Map is used by procurement leads and shows variance rate per supplier and region, with alerts triggered when a supplier exceeds 3% variance for three consecutive weeks. The Network Inventory Delta is used by demand planning and surfaces systemic short-delivery patterns across SKUs before they become stockout events.
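The supplier alert rule can be stated precisely in a few lines. The sketch below is a Python illustration of the logic only, not how Power BI would implement it; it assumes the rule looks at the three most recent weeks and that "exceeds" means strictly greater than 3%. The function name should_alert is hypothetical.

```python
from typing import Sequence


def should_alert(weekly_variance_rates: Sequence[float],
                 threshold: float = 0.03, streak: int = 3) -> bool:
    """True if the most recent `streak` weeks all exceed the threshold.

    weekly_variance_rates: oldest first, one value per week, as fractions
    (0.04 means 4% of delivered lines showed a variance).
    """
    if len(weekly_variance_rates) < streak:
        return False  # not enough history to establish a streak
    return all(rate > threshold for rate in weekly_variance_rates[-streak:])
```

A supplier with weekly rates of 1%, 4%, 5%, and 3.5% would trigger the alert, while one bad week followed by a recovery would not.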

What the System Actually Produces

To make this concrete, here is what a typical batch output looks like for one store's morning deliveries. This is the exception queue a regional manager sees after the pipeline has processed ten incoming WZ documents.

Seven documents are auto-approved with no further action required. The manager sees two flagged items. The first shows a quantity variance of minus 12 units on a single SKU from a Lay's delivery, with a calculated value at risk of PLN 47.88. The second shows a price discrepancy of PLN 0.15 per unit across three Mondelez SKUs, totalling PLN 21.60. One further document is escalated with no matching PO found in the system, flagging PLN 312 of unmatched spend.

The manager resolves all three items in under ten minutes. Without this system, identifying those same three discrepancies manually would have taken 40 to 90 minutes of document cross-checking.

[Embed: Detection rate chart (embed-chart-detection.html)]

Projected Business Impact

The efficiency numbers are directional estimates based on published benchmarks from FMCG document automation projects, adjusted for Poland's labour market and Żabka's known operational scale. They are not audited figures; exact numbers would emerge from a pilot on a subset of stores.

The core metrics are as follows. Average processing time per document falls by approximately 93%, from roughly 18 minutes to under 1.5 minutes. Approximately 97% of deliveries are processed and closed with zero human touchpoints. Cost per compliant transaction drops from an estimated PLN 4 to 8 when handled manually to close to PLN 0 at scale. The estimated payback period is 6 to 9 months, based on FTE reduction and recovered margin leakage.

The more important number is the detection rate. Manual processes in FMCG logistics typically catch 40 to 45% of true delivery discrepancies; the rest slip through unclaimed. An automated system running consistent validation logic is projected to catch upwards of 97% after a 60-day calibration period. At Żabka's volume, that difference could represent several million PLN in recovered margin annually.

Honest Risk Assessment

A concept study that does not address implementation risks is not worth much.

The most significant technical risk is SAP and ERP integration complexity. The PO lookup step depends entirely on the quality of Żabka's ERP data model. If POs are stored inconsistently (different formats per product category, partial records, legacy supplier IDs), the validator needs additional cleansing layers before matching works reliably. In practice, this is typically 40 to 60% of the real engineering effort in any enterprise automation project.

The second risk is extraction errors on edge cases. Claude 3.5 Sonnet is highly capable at structured extraction but is not infallible. Heavily degraded scans, handwritten corrections on printed documents, or non-standard supplier formats can produce extraction errors. The mitigation is a confidence-scoring layer that routes low-confidence extractions to human review before they hit the validator. This keeps a human check on every uncertain document while still automating the clear majority.
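One simple way such a confidence-scoring layer might start is with field completeness as a proxy, on the assumption that the extraction step returns no confidence values of its own. The sketch below is a heuristic only: the key names, the 0.95 threshold, and both function names are hypothetical, and a production version would likely add checks such as EAN checksum validation or a second extraction pass.

```python
# Hypothetical key names mirroring the extraction schema described earlier.
HEADER_FIELDS = ("supplier_tax_id", "store_code", "delivery_date", "wz_number")
ITEM_FIELDS = ("ean", "description", "quantity", "unit_price")


def extraction_confidence(payload: dict) -> float:
    """Share of expected fields that are non-null, between 0.0 and 1.0.

    A document with no line items scores 0.0, since nothing can be validated.
    """
    items = payload.get("line_items") or []
    if not items:
        return 0.0
    expected = len(HEADER_FIELDS) + len(ITEM_FIELDS) * len(items)
    present = sum(payload.get(f) is not None for f in HEADER_FIELDS)
    present += sum(item.get(f) is not None
                   for item in items for f in ITEM_FIELDS)
    return present / expected


def needs_human_review(payload: dict, threshold: float = 0.95) -> bool:
    """Route to the review queue when too many fields came back null."""
    return extraction_confidence(payload) < threshold
```

Completeness catches the cases where the model correctly returned null for an unreadable field, but not cases where it misread a digit. The threshold would need tuning during the calibration period.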

The third risk is change management at the franchise level. The pipeline only works if franchisees scan and submit documents correctly. This is a people and process problem, not a technical one, and it is often the slowest part of any FMCG automation rollout. Training, a simple submission interface, and fast feedback loops (franchisees see a confirmation within minutes of submitting) are as important as the technical architecture.

The Bottom Line

The technology to build this exists today and is affordable at scale. Claude API, Make.com, and Power BI are production-grade platforms already running in enterprise environments. This architecture connects them in a way that is specifically designed around Żabka's operational reality: varied supplier formats, extremely high document volumes, and the need for real-time HQ visibility without bureaucratic overhead.

The result is a back-office function that scales with the network at near-zero marginal cost, where every compliant delivery is closed automatically, every discrepancy is surfaced immediately, and the supply chain team spends its time on supplier relationships, not spreadsheets.