RC2026

System Design

Supply Genie

Supply Chain Coordinator AI Worker.

Artifacts

Stack

System Design · Backend Engineering · AI Orchestration · Voice Streaming

Synopsis

When a shipping container carrying temperature-sensitive inventory is delayed at the port, a Slack alert isn't enough. You need a system that can call the warehouse floor, email the customer, re-route inventory in the ERP, and coordinate logistics — sequentially, without dropping context, and without hallucinating steps.

Most "AI in logistics" projects I looked at were thin wrappers around an LLM — a two-paragraph system prompt plus unstructured API access — and they struggled the moment a webhook dropped a packet or a contractor talked over the bot on a phone call.

This project is a conceptual demo, not a production system. It's an honest attempt to explore the architectural patterns — idempotency, state machines, human-in-the-loop gates, concurrency control — that would be needed to make supply chain AI actually reliable. Heavily inspired by how HappyRobot approaches this problem.

Architecture

System Layers

Layer 01

Frontend Layer (React)

Operator Workspace built with Next.js and Shadcn UI — a command center for watching what the AI commits to the database, not for running it. Mixed-initiative chat, action timelines, and approval gates for any action that touches external parties.


Layer 02

Backend & Orchestration (FastAPI)

FastAPI backend where the LLM is constrained to a defined set of function tools — it can't touch external systems directly. Python handles retries, circuit breakers, and state transitions. The AI plans; Python executes.
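The plan/execute split can be sketched as a tool registry: the LLM's output is only a tool name plus arguments, and Python owns the retry loop and circuit breaker. A minimal sketch, with illustrative tool names and breaker thresholds (not the project's actual values):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; retries after `cooldown` seconds."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def allow(self):
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.cooldown

    def record(self, ok):
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

TOOLS = {}  # name -> callable; the LLM may only reference names in this dict

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
def notify_slack(channel: str, text: str) -> dict:
    # Stand-in for a real Slack SDK call.
    return {"ok": True, "channel": channel}

def execute(plan_step: dict, breaker: CircuitBreaker, retries=2) -> dict:
    """Execute one LLM-planned step. The model never calls external APIs itself."""
    fn = TOOLS.get(plan_step["tool"])
    if fn is None:
        return {"ok": False, "error": f"unknown tool {plan_step['tool']!r}"}
    for attempt in range(retries + 1):
        if not breaker.allow():
            return {"ok": False, "error": "circuit open"}
        try:
            result = fn(**plan_step["args"])
            breaker.record(True)
            return result
        except Exception as exc:
            breaker.record(False)
            if attempt == retries:
                return {"ok": False, "error": str(exc)}
```

Because the registry is a closed set, a hallucinated tool name degrades into a structured error instead of an arbitrary API call.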


Layer 03

Connectors & Services

Real Twilio server-side media streams for two-way active communication, Slack SDK for team alerts, Resend for email dispatches, and mock TMS/WMS endpoints.
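Twilio Media Streams deliver call audio over a WebSocket as JSON frames with an `event` field (`connected`, `start`, `media`, `stop`), where `media` frames carry base64-encoded 8 kHz mu-law audio. A minimal frame dispatcher might look like this (handler structure is illustrative, not the project's actual code):

```python
import base64
import json

def handle_twilio_frame(raw: str, state: dict):
    """Dispatch one Twilio Media Streams WebSocket frame.

    Returns the decoded audio chunk for `media` frames so a downstream
    VAD/ASR stage can consume it; returns None for control frames.
    """
    frame = json.loads(raw)
    event = frame.get("event")
    if event == "start":
        # Remember which stream this connection belongs to.
        state["stream_sid"] = frame["start"]["streamSid"]
    elif event == "media":
        return base64.b64decode(frame["media"]["payload"])
    elif event == "stop":
        state["stream_sid"] = None
    return None
```

Keeping the handler a pure function over frames makes it easy to drain the receive queue without blocking the event loop.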


Layer 04

Data & Storage (PostgreSQL)

Fully relational PostgreSQL schema managed via SQLAlchemy & Alembic, covering Incidents, Approvals, POs, Shipments, and Action Execution logs.
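A representative slice of that schema, sketched as DDL (table and column names are illustrative, not the project's exact schema; SQLite dialect here so the sketch is self-contained, while the project itself targets PostgreSQL via SQLAlchemy/Alembic):

```python
import sqlite3

SCHEMA = """
CREATE TABLE incidents (
    id          INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL,            -- e.g. 'supplier_delay', 'worker_absence'
    status      TEXT NOT NULL DEFAULT 'open',
    created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE purchase_orders (
    id       INTEGER PRIMARY KEY,
    status   TEXT NOT NULL DEFAULT 'draft',
    version  INTEGER NOT NULL DEFAULT 0   -- optimistic-concurrency counter
);
CREATE TABLE approvals (
    id          INTEGER PRIMARY KEY,
    incident_id INTEGER NOT NULL REFERENCES incidents(id),
    action      TEXT NOT NULL,            -- e.g. 'email_customer'
    decided     TEXT                      -- NULL until the operator decides
);
CREATE TABLE action_executions (
    id              INTEGER PRIMARY KEY,
    incident_id     INTEGER NOT NULL REFERENCES incidents(id),
    tool            TEXT NOT NULL,
    idempotency_key TEXT UNIQUE,          -- dedupes replayed webhooks
    result          TEXT
);
"""

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The `version` column on `purchase_orders` and the `UNIQUE` idempotency key are what the concurrency safeties below hang off.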


Core Engineering

Field Notes
01.

Omnichannel Flow Control

Architected two modes of execution: a wait-and-watch mode, where inbound event webhooks are matched against active playbooks, and a proactive voice-agent layer that triggers stateful updates via function calling. Kept real-time voice non-blocking by tuning queue-draining behavior and managing duplex SIP limits.
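The wait-and-watch mode reduces to matching inbound webhook events against registered playbooks. A minimal sketch, with hypothetical event types and action steps:

```python
from dataclasses import dataclass, field

@dataclass
class Playbook:
    """One wait-and-watch playbook: fires when an event matches its trigger."""
    trigger: str                               # webhook event type
    condition: callable = lambda payload: True # optional payload predicate
    steps: list = field(default_factory=list)  # ordered actions to spawn

# Illustrative registry; first matching playbook wins.
PLAYBOOKS = [
    Playbook("supplier_delay",
             condition=lambda p: p.get("hours_late", 0) >= 4,
             steps=["notify_slack", "update_shipment_eta", "voice_call_warehouse"]),
    Playbook("supplier_delay",
             steps=["notify_slack"]),
    Playbook("worker_absence",
             steps=["notify_slack", "voice_call_replacement"]),
]

def allocate(event: dict) -> list:
    """Return the ordered action steps for an inbound webhook, or [] if nothing matches."""
    for pb in PLAYBOOKS:
        if pb.trigger == event.get("type") and pb.condition(event.get("payload", {})):
            return pb.steps
    return []
```

Ordering the registry from most to least specific means a severe delay escalates to a voice call while a minor one only pings Slack.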

02.

Concurrency & Safeties

Enforced Idempotency-Key headers combined with two-layer optimistic concurrency. When out-of-band writes collide, application-layer checks fall back to explicit row locks, guaranteeing strictly serialized updates to critical financial documents (POs).
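Those two layers can be sketched as an idempotency-key table plus a version-checked compare-and-swap UPDATE (SQLite here so the sketch runs self-contained; in PostgreSQL the collision fallback would be an explicit `SELECT ... FOR UPDATE` row lock):

```python
import sqlite3

def seed_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
    conn.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO purchase_orders VALUES (1, 'draft', 0)")
    return conn

def update_po(conn, po_id, new_status, expected_version, idem_key):
    """Apply a PO status change exactly once: dedupe on the idempotency
    key, then compare-and-swap on the version column."""
    try:
        conn.execute("INSERT INTO idempotency_keys VALUES (?)", (idem_key,))
    except sqlite3.IntegrityError:
        return "duplicate"   # replayed webhook: the change was already applied
    cur = conn.execute(
        "UPDATE purchase_orders SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, po_id, expected_version))
    # On conflict the caller re-reads the row and may escalate to a row lock.
    return "ok" if cur.rowcount == 1 else "conflict"
```

A duplicate webhook short-circuits before touching the PO; a stale writer gets a conflict instead of silently clobbering a newer state.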

03.

Metrics & Telemetry

Wrapped core API boundaries with OpenTelemetry (OTel). Generated per-endpoint metrics and detailed spans covering request latency, escalation ratios, auto-resolution rates, and payload traces, feeding a purpose-built KPI dashboard.

Execution Flow

Dispatch
Execution Trace (4 steps)
01

Trigger Event Detected

The upstream ERP issues a webhook reporting a supplier delay or an unexpected worker absence.

02

Playbook Allocation

Idempotency checks dedupe the request, then the AI orchestrator spawns action runs (Slack notification, database shift update, voice protocol sequence).

03

Execution & Human Approval

The agent autonomously fires Twilio SIP requests or Slack alerts. On hitting a gated boundary (e.g. an email to a paying customer), the workflow pauses and surfaces its execution logs to the React Operator Dashboard.
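The pause-at-a-gate behavior is a small state machine over the run's steps. A minimal sketch, with an illustrative set of gated actions:

```python
import enum

class RunState(enum.Enum):
    RUNNING = "running"
    AWAITING_APPROVAL = "awaiting_approval"
    DONE = "done"

# Actions that touch external parties require a human gate (illustrative set).
GATED_ACTIONS = {"email_customer", "issue_refund"}

class ActionRun:
    def __init__(self, steps):
        self.steps, self.cursor = steps, 0
        self.state = RunState.RUNNING
        self.log = []

    def advance(self):
        """Execute steps until done, or pause at the first gated boundary."""
        while self.cursor < len(self.steps):
            step = self.steps[self.cursor]
            if step in GATED_ACTIONS:
                self.state = RunState.AWAITING_APPROVAL  # surfaced to the operator UI
                return self.state
            self.log.append(step)                        # stand-in for real execution
            self.cursor += 1
        self.state = RunState.DONE
        return self.state

    def approve(self):
        """Operator approval executes the gated step and resumes the run."""
        assert self.state is RunState.AWAITING_APPROVAL
        self.log.append(self.steps[self.cursor])
        self.cursor += 1
        self.state = RunState.RUNNING
        return self.advance()
```

Because the pause is a persisted state rather than a prompt asking the model to wait, the gate holds even if the LLM misbehaves.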

04

Conclusion & Observation

The operator grants approval. The flow unblocks, final states reach the database through OTel-instrumented boundaries, and the analytics dashboard updates to reflect the incremented auto-resolved metric.

Reflections
Outcomes
  • Decoupling cognition from execution makes failures predictable. The LLM plans; Python handles retries, backoff, and circuit breakers. Removing the AI from the failure loop keeps the context window clean.
  • Idempotency isn't glamorous, but it's foundational. Without enforced Idempotency-Keys at the HTTP layer, duplicate webhooks from a stuttering WMS would have caused the agent to dispatch multiple replacement workers.
  • Human-in-the-loop is architectural, not cosmetic. Pausing execution at the database level, rather than prompting the AI to ask for permission, is what makes operations teams actually trust the system.
Next Steps
  • Extending the agent protocol to support LangGraph multi-agent orchestration for branching scenarios.
  • Improving conversational latency optimization using purely local-server Voice Activity Detection (VAD).