Supply Genie
Supply Chain Coordinator AI Worker.
An AI coordinator that resolves common supply chain exceptions end-to-end across voice and email, executes enterprise actions via APIs, and escalates complex cases to humans. Replaces repetitive manual coordination across calls, emails, Slack, PO systems, and ticketing tools.
Problem: Operations teams are overloaded with repetitive exceptions (supplier delays, status checks). Manual coordination across disconnected systems like TMS/WMS, Slack, and email leads to slower resolutions and missed SLAs.
How it was built: Architected a durable orchestration engine. Integrated real external APIs: the Slack SDK, Twilio server-side media streams, Resend for email, and Gemini 2.5 Flash for reasoning. Built a React-based Operator Workspace for human-in-the-loop approvals, timeline tracing, and KPI observability via OpenTelemetry.
Stack: FastAPI, React, Python, Twilio, OpenTelemetry, Postgres, LangGraph, Gemini 2.5 Flash
Tags: System Design, Backend Engineering, AI Orchestration, Voice Streaming
GitHub: https://github.com/RohithReddy20/supply-genie
Overview
When a shipping container carrying temperature-sensitive inventory is delayed at the port, a Slack alert isn't enough. You need a system that can call the warehouse floor, email the customer, re-route inventory in the ERP, and coordinate logistics — sequentially, without dropping context, and without hallucinating steps.
Most "AI in logistics" projects I looked at were thin wrappers around an LLM — a two-paragraph system prompt plus unstructured API access — and they struggled the moment a webhook dropped a packet or a contractor talked over the bot on a phone call.
This project is a conceptual demo, not a production system. It's an honest attempt to explore the architectural patterns — idempotency, state machines, human-in-the-loop gates, concurrency control — that would be needed to make supply chain AI actually reliable. Heavily inspired by how HappyRobot approaches this problem.
Architecture
Frontend Layer (React): Operator Workspace built with Next.js and Shadcn UI — a command center for watching what the AI commits to the database, not for running it. Mixed-initiative chat, action timelines, and approval gates for any action that touches external parties.
Backend & Orchestration (FastAPI): FastAPI backend where the LLM is constrained to a defined set of function tools — it can't touch external systems directly. Python handles retries, circuit breakers, and state transitions. The AI plans; Python executes.
Connectors & Services: Real Twilio server-side media streams for two-way voice conversations, the Slack SDK for team alerts, Resend for email dispatch, and mock TMS/WMS endpoints.
Data & Storage (PostgreSQL): Fully relational PostgreSQL schema managed with SQLAlchemy and Alembic, covering incidents, approvals, POs, shipments, and action-execution logs.
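The "AI plans, Python executes" split can be sketched as a whitelisted tool registry: the LLM only emits tool names and arguments, and anything outside the registry is rejected rather than guessed. The tool name `notify_slack` and the plan shape here are illustrative, not the project's actual API.

```python
# Sketch: LLM output is a plan (tool names + args); Python dispatches
# from an explicit whitelist, so the model never calls external systems.
from typing import Any, Callable

TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function as an LLM-callable tool."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@tool("notify_slack")
def notify_slack(channel: str, message: str) -> str:
    # Real version would use the Slack SDK; mocked here.
    return f"posted to {channel}: {message}"

def execute_plan(plan: list[dict]) -> list[str]:
    """Run an LLM-produced plan. Unknown tools raise instead of executing."""
    results = []
    for step in plan:
        fn = TOOL_REGISTRY.get(step["tool"])
        if fn is None:
            raise ValueError(f"tool not allowed: {step['tool']}")
        results.append(fn(**step["args"]))
    return results
```

Retries, backoff, and circuit breakers then live in `execute_plan`'s deterministic layer, never in the prompt.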
Core Engineering
Omnichannel Flow Control: Two execution modes: a wait-and-watch path that ingests event webhooks and matches them against active playbooks, and a proactive voice-agent layer that triggers stateful updates via function calling. Kept real-time voice latency low and non-blocking by tuning queue-draining behavior and managing duplex SIP stream limits.
Concurrency & Safeties: Enforced Idempotency-Key headers combined with two-layer optimistic concurrency. When out-of-band writes collide, application-layer checks lock the affected database rows, strictly serializing updates to critical financial documents (POs).
Metrics & Telemetry: Wrapped core API boundaries with OpenTelemetry (OTel), emitting per-endpoint metrics and detailed spans covering request latency, escalation ratios, auto-resolution rates, and payload traces, all feeding the built-in KPI dashboard.
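The optimistic-concurrency layer can be illustrated with a version-checked update, an in-memory stand-in for the `UPDATE ... WHERE version = :expected` pattern the database layer would use. Class and field names here are hypothetical.

```python
# Sketch of optimistic concurrency on a PO: every row carries a version;
# a write only commits if the caller's expected version still matches,
# otherwise it is rejected and must be re-read and retried.
from dataclasses import dataclass

@dataclass
class PurchaseOrder:
    po_id: str
    status: str
    version: int = 0

class StaleWriteError(Exception):
    """Raised when a concurrent writer already bumped the version."""

def update_po(po: PurchaseOrder, new_status: str, expected_version: int) -> PurchaseOrder:
    """Compare-and-set update, mirroring `UPDATE ... WHERE version = :expected`."""
    if po.version != expected_version:
        raise StaleWriteError(
            f"PO {po.po_id}: expected v{expected_version}, found v{po.version}"
        )
    po.status = new_status
    po.version += 1
    return po
```

In the real schema this check would run inside a transaction with a row lock as the second layer; the sketch shows only the version comparison.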
Execution Flow
Trigger Event Detected: The upstream ERP issues a webhook reporting a supplier delay or an unexpected worker absence.
Playbook Allocation: The request passes idempotency checks, then the AI orchestrator spawns action runs (Slack notification, database shift update, voice protocol sequence).
Execution & Human Approval: The agent autonomously fires Twilio SIP requests or Slack alerts. On hitting a gated boundary (e.g. an email to a paying customer), the workflow pauses and surfaces its execution log to the React Operator Dashboard.
Resolution & Observation: The operator grants approval, the flow unblocks, and final states reach the database through OTel-instrumented boundaries. The analytics dashboard increments its auto-resolved metric.
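The pause-at-a-gate behavior in the flow above can be modeled as a small state machine: a run executes until it hits a gated action, parks in an awaiting-approval state, and only resumes once the operator approves. `GATED_ACTIONS` and the action names are assumptions for illustration.

```python
# Sketch: an action run pauses at gated boundaries and resumes on approval.
from enum import Enum

class RunState(Enum):
    RUNNING = "running"
    AWAITING_APPROVAL = "awaiting_approval"
    RESOLVED = "resolved"

GATED_ACTIONS = {"email_customer"}  # anything touching external parties

class ActionRun:
    def __init__(self, actions: list[str]):
        self.actions = actions
        self.cursor = 0
        self.state = RunState.RUNNING
        self.log: list[str] = []

    def advance(self) -> None:
        """Execute until done or until a gated action requires a human."""
        while self.cursor < len(self.actions):
            action = self.actions[self.cursor]
            if action in GATED_ACTIONS:
                self.state = RunState.AWAITING_APPROVAL
                return  # surfaced to the operator dashboard
            self.log.append(f"executed {action}")
            self.cursor += 1
        self.state = RunState.RESOLVED

    def approve(self) -> None:
        """Operator approval: run the gated action, then resume the flow."""
        self.log.append(f"executed {self.actions[self.cursor]} (approved)")
        self.cursor += 1
        self.state = RunState.RUNNING
        self.advance()
```

Because the pause lives in the run's persisted state rather than in the prompt, a restart or a slow approval cannot skip the gate.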
Key Learnings
Decoupling cognition from execution makes failures predictable. The LLM plans; Python handles retries, backoff, and circuit breakers. Removing the AI from the failure loop keeps the context window clean.
Idempotency isn't glamorous but it's foundational. Without enforced Idempotency-Keys at the HTTP layer, duplicate webhooks from a stuttering WMS would have caused the agent to dispatch multiple replacement workers.
Human-in-the-loop is architectural, not cosmetic. Pausing execution at a database level — rather than prompting the AI to ask for permission — is what makes operations teams actually trust the system.
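The Idempotency-Key point can be shown with a toy deduplication handler; the in-memory dict stands in for the database-backed key store a real deployment would need, and the handler name is hypothetical.

```python
# Sketch: the first request with a given Idempotency-Key executes its
# side effect; replays (e.g. a stuttering WMS re-sending a webhook)
# return the cached result instead of re-dispatching the action.
from typing import Callable

_seen: dict[str, str] = {}  # stand-in for a persistent key store

def handle_webhook(idempotency_key: str, dispatch: Callable[[], str]) -> tuple[str, bool]:
    """Returns (result, was_replay). `dispatch` performs the side effect."""
    if idempotency_key in _seen:
        return _seen[idempotency_key], True
    result = dispatch()
    _seen[idempotency_key] = result
    return result, False
```

The key must be stored atomically with the side effect's outcome; in the project this lives in the same Postgres transaction as the action-execution log.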
