The Challenge
What Ozzie Was Facing
Ozzie is a conversational AI assistant used inside enterprise workflows. The core engineering challenge was not simply calling an LLM; it was building the infrastructure layer that kept the product reliable and cost-efficient at scale: streaming responses without connection timeouts, routing requests across model providers based on cost and latency, storing conversation context efficiently, and keeping the platform within budget as usage grew.
The Solution
What We Built
We built an API-first platform on AWS Lambda and API Gateway, with WebSocket support for streaming. A model router service selected among providers based on real-time latency and per-token cost metrics logged to DynamoDB. Conversation history was compressed and stored in S3 with a retrieval index in Redis, keeping context retrieval under 40 ms. The entire infrastructure was defined in CDK, and blue-green deployments ensured zero-downtime releases as model integrations were updated.
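The routing decision can be sketched as a small scoring function. This is an illustrative sketch, not the production code: the provider names, the metric fields, and the cost/latency weighting are all assumptions, and in the real system the metrics were read from DynamoDB rather than hard-coded.

```python
# Hypothetical per-provider metrics; in production these came from
# DynamoDB, refreshed with real-time latency and per-token cost data.
PROVIDERS = {
    "provider_a": {"p50_latency_ms": 420, "cost_per_1k_tokens": 0.0100},
    "provider_b": {"p50_latency_ms": 950, "cost_per_1k_tokens": 0.0015},
}

def route(providers, max_latency_ms=1000, latency_weight=0.001):
    """Pick the provider with the lowest blended cost/latency score.

    Score = cost_per_1k_tokens + latency_weight * p50_latency_ms.
    The blending weight is illustrative; the source does not describe
    the actual trade-off used in production.
    """
    # Drop providers that blow the latency budget outright.
    eligible = {
        name: m for name, m in providers.items()
        if m["p50_latency_ms"] <= max_latency_ms
    }
    if not eligible:
        raise RuntimeError("no provider meets the latency budget")
    return min(
        eligible,
        key=lambda n: eligible[n]["cost_per_1k_tokens"]
        + latency_weight * eligible[n]["p50_latency_ms"],
    )
```

Setting `latency_weight=0.0` collapses the score to pure token cost, which is a useful knob when a batch workload cares about budget more than responsiveness.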
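The context-storage path (compressed history in S3, retrieval index in Redis) can be illustrated with in-memory stand-ins. The dictionaries below substitute for S3 and Redis, and the key layout is a made-up example, not the production schema.

```python
import json
import zlib

# In-memory stand-ins: object_store plays the role of S3,
# index plays the role of the Redis retrieval index.
object_store = {}   # object key -> compressed bytes
index = {}          # conversation_id -> object key

def save_context(conversation_id, messages):
    """Compress a conversation's messages and index them for retrieval."""
    key = f"context/{conversation_id}.json.zlib"
    object_store[key] = zlib.compress(json.dumps(messages).encode("utf-8"))
    index[conversation_id] = key
    return key

def load_context(conversation_id):
    """Look up the object key in the index, then decompress the payload."""
    key = index[conversation_id]
    return json.loads(zlib.decompress(object_store[key]).decode("utf-8"))
```

The split matters for latency: the hot lookup (conversation ID to object key) stays in Redis, while the bulkier compressed payload lives in cheaper object storage.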

Results
