The Challenge
What MetekuAI Was Facing
MetekuAI analyses product reviews at scale, using LLMs to extract structured sentiment, feature mentions, and competitive signals. The engineering challenge was to design an inference pipeline that could process over a million reviews cost-efficiently, since synchronous LLM calls were too expensive and too slow for bulk processing, while guaranteeing that every review was processed exactly once and that structured outputs were validated before being stored.
The Solution
What We Built
We designed the inference pipeline as an asynchronous job queue built on SQS with long polling. An ingestion service enqueued review batches; auto-scaling ECS worker tasks pulled them off the queue, called the LLM API in JSON mode with structured output schemas, validated the results against Pydantic schemas, and wrote them to PostgreSQL. The key mechanisms, each sketched below, were:

- Long-polling SQS consumers, so idle workers block on the queue instead of busy-polling.
- Pydantic validation of every structured output before anything reaches the database.
- Idempotency enforced via a review-fingerprint deduplication table, so each review is processed exactly once even when messages are redelivered.
- A dead-letter queue for malformed outputs, paired with a retry strategy using exponential backoff for transient failures.
- Model-tier routing to manage cost: simpler classification tasks used a cheaper model, complex extraction used a higher-tier model, with the decision made per review from text length and category signals.
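
A minimal sketch of the worker's consume loop, assuming boto3 and a hypothetical queue URL; `process_batch` is a stub standing in for the real LLM call, validation, and database write:

```python
import boto3

# Hypothetical queue URL; the real queue name is not part of this case study.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/review-batches"

sqs = boto3.client("sqs")

def process_batch(body: str) -> None:
    """Placeholder for the real work: call the LLM, validate, write to Postgres."""
    ...

def poll_forever() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=10,  # SQS maximum per receive call
            WaitTimeSeconds=20,      # long polling: block up to 20s on an empty queue
            VisibilityTimeout=300,   # must exceed worst-case batch processing time
        )
        for msg in resp.get("Messages", []):
            process_batch(msg["Body"])
            # Delete only after success; if processing raises, the message
            # reappears after the visibility timeout and is retried.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```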
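
The validation gate might look like the following, assuming Pydantic v2; the schema fields are illustrative, since the actual extraction schema is not published here:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class FeatureMention(BaseModel):
    feature: str
    sentiment: Literal["positive", "negative", "neutral"]

class ReviewExtraction(BaseModel):
    overall_sentiment: Literal["positive", "negative", "neutral", "mixed"]
    feature_mentions: list[FeatureMention]
    competitor_mentions: list[str]

def validate_llm_output(raw: str) -> ReviewExtraction | None:
    """Gate between the LLM and PostgreSQL: only schema-valid rows get stored."""
    try:
        return ReviewExtraction.model_validate_json(raw)
    except ValidationError:
        # Malformed output: return None so the caller can route the message
        # toward the dead-letter queue instead of writing a bad row.
        return None
```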
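
Exactly-once processing can be enforced with an insert-if-absent claim against the deduplication table. A sketch assuming a psycopg-style cursor and a hypothetical `processed_reviews` table keyed on the fingerprint:

```python
import hashlib

def review_fingerprint(source_id: str, review_text: str) -> str:
    """Content-derived key: the same review always maps to the same fingerprint."""
    return hashlib.sha256(f"{source_id}:{review_text}".encode("utf-8")).hexdigest()

# ON CONFLICT DO NOTHING makes the claim atomic: exactly one worker wins,
# even if SQS redelivers the message or two workers race on the same review.
CLAIM_SQL = """
    INSERT INTO processed_reviews (fingerprint)
    VALUES (%s)
    ON CONFLICT (fingerprint) DO NOTHING
    RETURNING fingerprint;
"""

def try_claim(cursor, source_id: str, review_text: str) -> bool:
    cursor.execute(CLAIM_SQL, (review_fingerprint(source_id, review_text),))
    return cursor.fetchone() is not None  # False means a duplicate: skip it
```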
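
Retries and the dead-letter queue work together: transient API failures are retried in-process with exponential backoff, while messages that keep failing (or whose outputs fail validation) are moved aside by the queue's redrive policy. In this sketch, `TransientAPIError` is a stand-in for whatever retryable exceptions the LLM client actually raises, and the DLQ ARN is hypothetical:

```python
import random
import time

class TransientAPIError(Exception):
    """Stand-in for the LLM client's retryable errors (rate limits, timeouts)."""

MAX_ATTEMPTS = 5

def call_with_backoff(fn, *args, **kwargs):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return fn(*args, **kwargs)
        except TransientAPIError:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # give up; the message will eventually land in the DLQ
            # 1s, 2s, 4s, 8s ... plus jitter so retries don't synchronize
            time.sleep(2 ** attempt + random.random())

# Queue-level safety net: after maxReceiveCount failed receives, SQS moves
# the message to the dead-letter queue for inspection.
REDRIVE_POLICY = {
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:review-dlq",
    "maxReceiveCount": "5",
}
```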
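
The per-review routing decision is a cheap heuristic that runs before the LLM call. The thresholds, category set, and model names below are illustrative assumptions, not production values:

```python
CHEAP_MODEL = "fast-tier-model"      # hypothetical name: simple classification
PREMIUM_MODEL = "high-tier-model"    # hypothetical name: complex extraction

LENGTH_THRESHOLD = 600               # characters; illustrative cutoff
COMPLEX_CATEGORIES = {"electronics", "software"}  # illustrative category signals

def route_model(review_text: str, category: str) -> str:
    """Route long or complex-category reviews to the higher-tier model."""
    if len(review_text) > LENGTH_THRESHOLD or category in COMPLEX_CATEGORIES:
        return PREMIUM_MODEL
    return CHEAP_MODEL
```

Because routing runs per review rather than per batch, a single batch can mix tiers, and the cheaper model absorbs the bulk of the volume without capping quality on the hard cases.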

Results
