RC2026

System Design

Supply Genie

Supply Chain Coordinator AI Worker.

Artifacts

Stack

System Design · Backend Engineering · AI Orchestration · Voice Streaming

Synopsis

When a shipping container carrying temperature-sensitive inventory is delayed at the port, a Slack alert isn't enough. You need a system that can call the warehouse floor, email the customer, re-route inventory in the ERP, and coordinate logistics — sequentially, without dropping context, and without hallucinating steps.

Most "AI in logistics" projects I looked at were thin wrappers around an LLM — a two-paragraph system prompt plus unstructured API access — and they struggled the moment a webhook dropped a packet or a contractor talked over the bot on a phone call.

This project is a conceptual demo, not a production system. It's an honest attempt to explore the architectural patterns — idempotency, state machines, human-in-the-loop gates, concurrency control — that would be needed to make supply chain AI actually reliable. Heavily inspired by how HappyRobot approaches this problem.

Architecture

System Layers

Layer 01

Frontend Layer (React)

Operator Workspace built with Next.js and Shadcn UI — a command center for watching what the AI commits to the database, not for running it. Mixed-initiative chat, action timelines, and approval gates for any action that touches external parties.


Layer 02

Backend & Orchestration (FastAPI)

FastAPI backend where the LLM is constrained to a defined set of function tools — it can't touch external systems directly. Python handles retries, circuit breakers, and state transitions. The AI plans; Python executes.
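The plan/execute split can be sketched as a tool registry: the LLM's output is only a tool name plus arguments, and Python owns the retry loop and circuit breaker. A minimal sketch, with illustrative tool names and breaker thresholds (not the project's actual values):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; retries after `cooldown` seconds."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

TOOLS = {}  # name -> callable; the LLM may only reference names in this dict

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
def notify_slack(channel: str, text: str) -> dict:
    # Stand-in for a real Slack SDK call.
    return {"ok": True, "channel": channel}

def execute(plan_step: dict, breaker: CircuitBreaker, retries=2) -> dict:
    """Execute one LLM-planned step. The model never calls external APIs itself."""
    fn = TOOLS.get(plan_step["tool"])
    if fn is None:
        return {"ok": False, "error": f"unknown tool {plan_step['tool']!r}"}
    for attempt in range(retries + 1):
        if not breaker.allow():
            return {"ok": False, "error": "circuit open"}
        try:
            result = fn(**plan_step["args"])
            breaker.record(True)
            return result
        except Exception as exc:
            breaker.record(False)
            if attempt == retries:
                return {"ok": False, "error": str(exc)}
```

Because the registry is a closed set, a hallucinated tool name degrades into a structured error instead of an arbitrary API call.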


Layer 03

Connectors & Services

Real Twilio server-side media streams for two-way active communication, Slack SDK for team alerts, Resend for email dispatches, and mock TMS/WMS endpoints.
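Twilio Media Streams deliver call audio over a WebSocket as JSON frames with an `event` field (`connected`, `start`, `media`, `stop`), where `media` frames carry base64-encoded 8 kHz mu-law audio. A minimal frame dispatcher might look like this (handler structure is illustrative, not the project's actual code):

```python
import base64
import json

def handle_twilio_frame(raw: str, state: dict):
    """Dispatch one Twilio Media Streams WebSocket frame.

    Returns the decoded audio chunk for `media` frames so a downstream
    VAD/ASR stage can consume it; returns None for control frames.
    """
    frame = json.loads(raw)
    event = frame.get("event")
    if event == "start":
        # Remember which stream this connection belongs to.
        state["stream_sid"] = frame["start"]["streamSid"]
    elif event == "media":
        return base64.b64decode(frame["media"]["payload"])
    elif event == "stop":
        state["stream_sid"] = None
    return None
```

Keeping the handler a pure function over frames makes it easy to drain the receive queue without blocking the event loop.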


Layer 04

Data & Storage (PostgreSQL)

Fully relational PostgreSQL schema managed via SQLAlchemy & Alembic, covering Incidents, Approvals, POs, Shipments, and Action Execution logs.
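A representative slice of that schema, sketched as DDL (table and column names are illustrative, not the project's exact schema; SQLite dialect here so the sketch is self-contained, while the project itself targets PostgreSQL via SQLAlchemy/Alembic):

```python
import sqlite3

SCHEMA = """
CREATE TABLE incidents (
    id          INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL,            -- e.g. 'supplier_delay', 'worker_absence'
    status      TEXT NOT NULL DEFAULT 'open',
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE purchase_orders (
    id       INTEGER PRIMARY KEY,
    status   TEXT NOT NULL DEFAULT 'draft',
    version  INTEGER NOT NULL DEFAULT 0   -- optimistic-concurrency counter
);
CREATE TABLE approvals (
    id          INTEGER PRIMARY KEY,
    incident_id INTEGER NOT NULL REFERENCES incidents(id),
    action      TEXT NOT NULL,            -- e.g. 'email_customer'
    decided     TEXT                      -- NULL until the operator decides
);
CREATE TABLE action_executions (
    id              INTEGER PRIMARY KEY,
    incident_id     INTEGER NOT NULL REFERENCES incidents(id),
    tool            TEXT NOT NULL,
    idempotency_key TEXT UNIQUE,          -- dedupes replayed webhooks
    result          TEXT
);
"""

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The `version` column on `purchase_orders` and the `UNIQUE` idempotency key are what the concurrency safeties below hang off.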


Core Engineering

Field Notes
01.

Omnichannel Flow Control

Architected two modes of execution: a wait-and-watch mode, where inbound event webhooks are matched against active playbooks, and a proactive voice-agent layer that triggers stateful updates via function calling. Kept real-time voice non-blocking by tuning queue-draining behavior and managing duplex SIP limits.
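The wait-and-watch mode reduces to matching inbound webhook events against registered playbooks. A minimal sketch, with hypothetical event types and action steps:

```python
from dataclasses import dataclass, field

@dataclass
class Playbook:
    """One wait-and-watch playbook: fires when an event matches its trigger."""
    trigger: str                               # webhook event type
    condition: callable = lambda payload: True # optional payload predicate
    steps: list = field(default_factory=list)  # ordered actions to spawn

# Illustrative registry; first matching playbook wins.
PLAYBOOKS = [
    Playbook("supplier_delay",
             condition=lambda p: p.get("hours_late", 0) >= 4,
             steps=["notify_slack", "update_shipment_eta", "voice_call_warehouse"]),
    Playbook("supplier_delay",
             steps=["notify_slack"]),
    Playbook("worker_absence",
             steps=["notify_slack", "voice_call_replacement"]),
]

def allocate(event: dict) -> list:
    """Return the ordered action steps for an inbound webhook, or [] if nothing matches."""
    for pb in PLAYBOOKS:
        if pb.trigger == event.get("type") and pb.condition(event.get("payload", {})):
            return pb.steps
    return []
```

Ordering the registry from most to least specific means a severe delay escalates to a voice call while a minor one only pings Slack.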

02.

Concurrency & Safeties

Enforced Idempotency-Key headers combined with two-layer optimistic concurrency. When out-of-band writes collide, application-layer checks fall back to explicit row locks, guaranteeing strictly serialized updates to critical financial documents (POs).
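Those two layers can be sketched as an idempotency-key table plus a version-checked compare-and-swap UPDATE (SQLite here so the sketch runs self-contained; in PostgreSQL the collision fallback would be an explicit `SELECT ... FOR UPDATE` row lock):

```python
import sqlite3

def seed_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
    conn.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO purchase_orders VALUES (1, 'draft', 0)")
    return conn

def update_po(conn, po_id, new_status, expected_version, idem_key):
    """Apply a PO status change exactly once: dedupe on the idempotency
    key, then compare-and-swap on the version column."""
    try:
        conn.execute("INSERT INTO idempotency_keys VALUES (?)", (idem_key,))
    except sqlite3.IntegrityError:
        return "duplicate"   # replayed webhook: the change was already applied
    cur = conn.execute(
        "UPDATE purchase_orders SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, po_id, expected_version))
    # On conflict the caller re-reads the row and may escalate to a row lock.
    return "ok" if cur.rowcount == 1 else "conflict"
```

A duplicate webhook short-circuits before touching the PO; a stale writer gets a conflict instead of silently clobbering a newer state.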

03.

Metrics & Telemetry

Wrapped core API boundaries with OpenTelemetry (OTel). Generated per-endpoint metrics and detailed spans covering request latency, escalation ratios, auto-resolution rates, and payload traces, feeding a purpose-built KPI dashboard.

Execution Flow

Dispatch
Execution Trace (4 steps)
01

Trigger Event Detected

The upstream ERP issues a webhook reporting a supplier delay or an unexpected worker absence.

02

Playbook Allocation

Idempotency checks dedupe the request, then the AI orchestrator spawns action runs (Slack notification, database shift update, voice protocol sequence).

03

Execution & Human Approval

The agent autonomously fires Twilio SIP requests or Slack alerts. On hitting a gated boundary (e.g. an email to a paying customer), the workflow pauses and surfaces its execution logs to the React Operator Dashboard.
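The pause-at-a-gate behavior is a small state machine over the run's steps. A minimal sketch, with an illustrative set of gated actions:

```python
import enum

class RunState(enum.Enum):
    RUNNING = "running"
    AWAITING_APPROVAL = "awaiting_approval"
    DONE = "done"

# Actions that touch external parties require a human gate (illustrative set).
GATED_ACTIONS = {"email_customer", "issue_refund"}

class ActionRun:
    def __init__(self, steps):
        self.steps, self.cursor = steps, 0
        self.state = RunState.RUNNING
        self.log = []

    def advance(self):
        """Execute steps until done, or pause at the first gated boundary."""
        while self.cursor < len(self.steps):
            step = self.steps[self.cursor]
            if step in GATED_ACTIONS:
                self.state = RunState.AWAITING_APPROVAL  # surfaced to the operator UI
                return self.state
            self.log.append(step)                        # stand-in for real execution
            self.cursor += 1
        self.state = RunState.DONE
        return self.state

    def approve(self):
        """Operator approval executes the gated step and resumes the run."""
        assert self.state is RunState.AWAITING_APPROVAL
        self.log.append(self.steps[self.cursor])
        self.cursor += 1
        self.state = RunState.RUNNING
        return self.advance()
```

Because the pause is a persisted state rather than a prompt asking the model to wait, the gate holds even if the LLM misbehaves.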

04

Conclusion & Observation

The operator grants approval. The flow unblocks, final states reach the database through OTel-instrumented boundaries, and the analytics dashboard updates to reflect the incremented auto-resolved metric.

Reflections
Outcomes
  • Decoupling cognition from execution makes failures predictable. The LLM plans; Python handles retries, backoff, and circuit breakers. Removing the AI from the failure loop keeps the context window clean.
  • Idempotency isn't glamorous, but it's foundational. Without enforced Idempotency-Keys at the HTTP layer, duplicate webhooks from a stuttering WMS would have caused the agent to dispatch multiple replacement workers.
  • Human-in-the-loop is architectural, not cosmetic. Pausing execution at the database level, rather than prompting the AI to ask for permission, is what makes operations teams actually trust the system.
Next Steps
  • Extending the agent protocol to support LangGraph multi-agent orchestration for branching scenarios.
  • Improving conversational latency optimization using purely local-server Voice Activity Detection (VAD).