Every plant manager wants the same three things from an APS (advanced planning and scheduling system): fast, correct, explainable. Most vendors will quietly deliver two of them, usually fast and explainable, with 'correct' loosely defined. We refused that tradeoff. This post walks through how we get all three under realistic load.
The problem, formally
Given N work orders, M machines, K resource constraints (operators, tools, materials), and a current snapshot of in-progress jobs, find a feasible assignment that minimises total weighted tardiness. We use a CP-SAT model from Google OR-Tools as the core, with a custom incremental layer on top.
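To make the objective concrete, here is a minimal sketch of how total weighted tardiness can be computed for a finished schedule. The `ScheduledOrder` type, its field names, and the minute-based time unit are assumptions for illustration, not our production schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledOrder:
    order_id: str
    end: int     # planned completion, in minutes from the horizon start
    due: int     # due time, same unit
    weight: int  # priority weight; larger means lateness costs more


def total_weighted_tardiness(orders):
    """Sum of weight * max(0, end - due) over all orders."""
    return sum(o.weight * max(0, o.end - o.due) for o in orders)


schedule = [
    ScheduledOrder("WO-A", end=480, due=500, weight=3),  # early: contributes 0
    ScheduledOrder("WO-B", end=620, due=600, weight=2),  # 20 min late: 40
    ScheduledOrder("WO-C", end=700, due=640, weight=1),  # 60 min late: 60
]
print(total_weighted_tardiness(schedule))  # 100
```

Finishing early earns nothing, so the solver has no incentive to rush one order at the expense of a later, heavier one.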
Trick 1: warm-start from the previous solution
Most replans are tiny perturbations of a known-good schedule — one machine fault, a rush order, an operator calling in sick. We seed the CP-SAT solver with the previous solution as a hint. In practice this drops first-feasible time from ~6 seconds to ~280 ms for typical perturbations.
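The part of warm-starting that lives outside the solver is deciding which previous values are still worth hinting. The sketch below shows one plausible filter. The dictionary shapes and the `build_hints` name are hypothetical, and the solver call itself is left out: with OR-Tools, each retained value would be handed to the model through its `AddHint` method.

```python
def build_hints(previous, current_order_ids, available_machines):
    """Keep only the previous assignments that still make sense to hint.

    previous: {order_id: {"machine": str, "start": int}} from the last solve.
    """
    hints = {}
    for order_id, slot in previous.items():
        if order_id not in current_order_ids:
            continue  # order was cancelled or completed since the last solve
        if slot["machine"] not in available_machines:
            continue  # machine is down: leave this order unhinted
        hints[order_id] = slot
    return hints


previous = {
    "WO-1": {"machine": "M1", "start": 0},
    "WO-2": {"machine": "M2", "start": 30},
    "WO-3": {"machine": "M1", "start": 60},
}
# Perturbation: M2 has faulted and a rush order WO-4 has arrived.
hints = build_hints(previous, {"WO-1", "WO-2", "WO-3", "WO-4"}, {"M1"})
print(sorted(hints))  # ['WO-1', 'WO-3']
```

Orders without a hint (the displaced WO-2 and the new WO-4 here) are simply left for the solver to place, while everything untouched by the perturbation starts from its known-good slot.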
Trick 2: the constraint set is denormalised at write time
Our work_orders table has a precomputed JSONB column called solver_payload. Every time a work order, machine, or resource changes, a Postgres trigger updates the payload. The solver reads everything from this single column instead of joining four tables — turning a 14-join query into a single index lookup.
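As a rough illustration of what denormalising at write time means, the sketch below flattens one work order and its related rows into a single document. In our system this work happens in a Postgres trigger; the Python function, the field names, and the sample rows are all assumptions chosen for readability.

```python
import json


def build_solver_payload(order, machines, resources):
    """Flatten one work order and its related rows into a single document."""
    return {
        "order_id": order["id"],
        "due": order["due"],
        # Machines able to run this order's operation.
        "eligible_machines": sorted(
            m["id"] for m in machines if order["operation"] in m["operations"]
        ),
        # Operators, tools, and materials the order needs.
        "resources": sorted(
            r["id"] for r in resources if r["id"] in order["resource_ids"]
        ),
    }


order = {"id": "WO-7", "due": 600, "operation": "mill",
         "resource_ids": ["T-44", "OP-3"]}
machines = [{"id": "M1", "operations": ["mill", "drill"]},
            {"id": "M2", "operations": ["weld"]}]
resources = [{"id": "T-44"}, {"id": "OP-3"}, {"id": "T-9"}]

payload = build_solver_payload(order, machines, resources)
print(json.dumps(payload, sort_keys=True))
```

The cost moves to the write path, which is the right trade when replans vastly outnumber master-data changes. The index that serves the read side: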
CREATE INDEX idx_work_orders_solver_active
ON work_orders USING gin (solver_payload jsonb_path_ops)
WHERE status IN ('scheduled','in_progress');
Trick 3: streaming intermediate solutions
The UI subscribes via WebSocket. As the solver finds better solutions, we stream them — even if the optimal is still 800 ms away, the planner sees the first feasible schedule in under 300 ms. They can act immediately if it's good enough; they can wait if they want optimal. Operator agency is the point.
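The publishing side can be sketched as a filter that forwards only improvements. This example leaves out both the solver and the WebSocket: in OR-Tools, solutions reach application code through a `CpSolverSolutionCallback` subclass, and here that feed is stood in for by a plain list. The message shape and the `improving_messages` name are hypothetical.

```python
import json


def improving_messages(solutions):
    """Yield a JSON message for each solution that beats the best seen so far.

    `solutions` is any iterable of (objective, assignment) pairs in the order
    the solver reports them; a lower objective is better.
    """
    best = None
    for objective, assignment in solutions:
        if best is not None and objective >= best:
            continue  # not an improvement, nothing new to show the planner
        best = objective
        yield json.dumps(
            {"objective": objective, "assignment": assignment}, sort_keys=True
        )


found = [
    (120, {"WO-1": "M1"}),
    (120, {"WO-1": "M2"}),  # equal cost: skipped
    (90, {"WO-1": "M2"}),
    (95, {"WO-1": "M1"}),   # worse than 90: skipped
]
messages = list(improving_messages(found))
print(len(messages))  # 2
```

Because the function is a generator, the first message is available as soon as the first feasible solution arrives, without waiting for the search to finish.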
Trick 4: solver explanations as first-class output
Every constraint that matters carries a human-readable label. When a job slips, the solver returns 'tardy because tool T-44 is allocated to higher-priority order WO-2241 from 09:00 to 13:00' rather than a bare 'no feasible solution.' This single feature has spared us dozens of support tickets.
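One way to produce such a message is to look up which labelled allocation holds the contested resource at the time the late job wanted it. The sketch below is a simplified stand-in for that lookup, not our solver integration: the `Allocation` type, the `explain_tardiness` function, and the order WO-2300 are hypothetical, and times are zero-padded "HH:MM" strings so they compare correctly as text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    resource: str
    order_id: str
    priority: int  # larger means more important
    start: str     # zero-padded "HH:MM"
    end: str       # zero-padded "HH:MM"


def explain_tardiness(order_id, priority, resource, wanted_start, allocations):
    """Return a label naming the allocation that blocks `order_id`, or None."""
    for a in allocations:
        blocks = a.resource == resource and a.start <= wanted_start < a.end
        if blocks and a.order_id != order_id and a.priority > priority:
            return (
                f"tardy because tool {a.resource} is allocated to "
                f"higher-priority order {a.order_id} from {a.start} to {a.end}"
            )
    return None


allocations = [
    Allocation("T-44", "WO-2241", priority=9, start="09:00", end="13:00"),
]
print(explain_tardiness("WO-2300", 5, "T-44", "10:30", allocations))
```

Returning `None` when nothing blocks the job keeps the caller honest: no explanation is better than an invented one.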
Why we don't use 'AI' more in our marketing
CP-SAT is exact. Configured for reproducible search (a fixed random seed and fixed worker settings), it is also deterministic: given identical input, it produces the same output, and any schedule it returns is provably feasible. That's the opposite of what most people mean by 'AI' in 2026. We do use ML, for demand forecasting and predictive maintenance, but the moment a planner asks 'why is this schedule the way it is,' we want a precise answer, not a probability distribution.