Prerequisites
Required Knowledge
This tutorial assumes you're comfortable with Python async/await patterns, REST API design, and have at least a passing familiarity with LLM APIs. You should know how to reason about stateful systems — agentic workflows are fundamentally about orchestrating state transitions across multiple LLM calls, tool invocations, and external service interactions.
Specifically, you need:
- Python 3.11+ with async programming (asyncio, async generators)
- FastAPI basics — routing, dependency injection, Pydantic models
- OpenAI or Anthropic SDK — we'll use `openai` but the patterns transfer directly
- Understanding of tool calling / function calling in LLM APIs
- Basic familiarity with Redis or any async-capable key-value store for state persistence
Development Environment
You need Python 3.11 or later. Python 3.12 is recommended for the improved asyncio performance and better error messages. The architecture we're building uses async throughout — a synchronous LLM call inside an async framework blocks the event loop, stalling every in-flight request under load.
Redis 7+ for workflow state persistence. You can run it locally via Docker:
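One way to start it, assuming Docker is installed (the container name is illustrative):

```shell
docker run -d --name agentic-redis -p 6379:6379 redis:7-alpine
```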
An OpenAI API key (or Anthropic — the patterns are identical). Set it in your environment:
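For example:

```shell
export OPENAI_API_KEY="sk-..."   # or ANTHROPIC_API_KEY if you're using the Anthropic SDK
```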
Dependencies
We keep the dependency footprint intentional. Every package here serves a specific purpose:
Install them:
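A plausible minimal set for the stack described in this tutorial (exact packages and extras are illustrative):

```shell
# fastapi        — HTTP routing, validation, dependency injection
# openai         — async LLM client
# redis          — async workflow state persistence
# pydantic-settings — typed configuration from environment variables
# tenacity       — retry logic for LLM and tool calls
pip install "fastapi[standard]" openai "redis[hiredis]" pydantic-settings tenacity
```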
Project Setup
Initialize Project
Structure matters for agentic systems because you'll be adding new agents, tools, and workflow definitions over time. Use a layout that makes those extension points obvious:
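A layout along these lines keeps each extension point in its own package (names are illustrative, not prescribed):

```text
app/
├── main.py          # FastAPI app + lifespan wiring
├── config.py        # settings, parsed once at startup
├── agents/          # one module per specialized agent
├── tools/           # tool implementations + their JSON schemas
└── workflows/       # workflow definitions tying agents together
tests/
```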
Create the project:
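For example (directory names are illustrative):

```shell
mkdir -p app/agents app/tools app/workflows tests
touch app/__init__.py app/config.py app/main.py
touch app/agents/__init__.py app/tools/__init__.py app/workflows/__init__.py
```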
Configure Build Tools
Create config.py first — everything else imports from here:
The lru_cache ensures settings are parsed once at startup, not on every request — without the cache, FastAPI's dependency injection system would construct a fresh Settings object per request.
Add Dependencies
Set up main.py with a proper lifespan context that initializes the Redis connection pool and OpenAI client once at startup:
Core Implementation
Building the Foundation
The core abstraction is a workflow — a named, stateful sequence of agent invocations. Each workflow run has a unique ID, persists its state to Redis, and can be resumed after interruption.
Start with the state store:
Adding Business Logic
The agent is the unit of reasoning. It runs an LLM call with a set of tools, processes tool calls, and loops until the model decides it's done or we hit the iteration limit:
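A sketch of that loop. The LLM client is abstracted as an async callable returning an OpenAI-style assistant message dict (with the real SDK you would wrap `chat.completions.create` here); the message shapes follow the OpenAI tool-calling convention:

```python
import json
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict                      # JSON schema for the arguments
    func: Callable[..., Awaitable[str]]   # the actual async implementation


class Agent:
    """Runs the tool-calling loop until the model returns plain content
    or the iteration limit trips."""

    def __init__(self, llm, tools: list[Tool], max_iterations: int = 10):
        # `llm` is any async callable (messages, tools) -> assistant message dict.
        self.llm = llm
        self.tools = {t.name: t for t in tools}
        self.max_iterations = max_iterations

    async def run(self, system: str, user: str) -> str:
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]
        for _ in range(self.max_iterations):
            message = await self.llm(messages, list(self.tools.values()))
            messages.append(message)
            tool_calls = message.get("tool_calls") or []
            if not tool_calls:                  # model decided it's done
                return message.get("content") or ""
            for call in tool_calls:
                tool = self.tools[call["function"]["name"]]
                args = json.loads(call["function"]["arguments"])
                try:
                    result = await tool.func(**args)
                except Exception as exc:        # surface the failure to the model
                    result = json.dumps({"error": str(exc)})
                messages.append(
                    {"role": "tool", "tool_call_id": call["id"], "content": result}
                )
        raise RuntimeError("agent exceeded max_iterations")
```

Injecting `llm` keeps the loop testable with a scripted fake and makes swapping providers a one-line change.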
Connecting Services
Now build the research workflow that ties the state store and agent together. This workflow uses an agent to research a topic by searching the web and synthesizing results:
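A simplified sketch of the driver. For clarity the store and agent are duck-typed here (async `save(run_id, dict)` and async `run(system, user) -> str`); the prompts and interface shapes are illustrative:

```python
import uuid


async def run_research_workflow(store, agent, topic: str) -> str:
    """One workflow run: create state, run the agent, persist the outcome.
    Returns the run ID so callers can poll for status."""
    run_id = uuid.uuid4().hex
    await store.save(run_id, {"status": "running", "topic": topic})
    try:
        answer = await agent.run(
            system="You are a research assistant. Search the web, then synthesize.",
            user=f"Research this topic and summarize your findings: {topic}",
        )
    except Exception as exc:
        # Persist the failure so the run can be inspected or resumed later.
        await store.save(run_id, {"status": "failed", "error": str(exc)})
        raise
    await store.save(run_id, {"status": "completed", "result": answer})
    return run_id
```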
Adding Features
Feature 1: Core Capability — Multi-Step Orchestration
Real agentic systems involve multiple specialized agents. Build an orchestrator that decomposes a task, delegates to specialized agents, and synthesizes results:
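A sketch of the plan → delegate → synthesize shape. Agents are duck-typed (async `run(system, user) -> str`), subtasks fan out concurrently with `asyncio.gather`, and the planner's JSON contract is illustrative (in production, validate it with a Pydantic model instead of raw `json.loads`):

```python
import asyncio
import json


async def orchestrate(planner, specialists: dict, synthesizer, task: str) -> str:
    """Decompose `task` with a planner agent, delegate subtasks to specialist
    agents in parallel, then synthesize one answer."""
    plan_raw = await planner.run(
        system=(
            "Decompose the task into subtasks. Reply with JSON only: "
            '[{"agent": "<name>", "task": "<subtask>"}, ...]'
        ),
        user=task,
    )
    plan = json.loads(plan_raw)  # validate with Pydantic in production
    results = await asyncio.gather(*[
        specialists[step["agent"]].run(
            system="You are a specialist. Complete your subtask thoroughly.",
            user=step["task"],
        )
        for step in plan
    ])
    return await synthesizer.run(
        system="Combine these partial results into one coherent answer.",
        user=json.dumps(list(results)),
    )
```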
Feature 2: Extensions — Streaming Responses
Agentic tasks take time. Stream intermediate progress via Server-Sent Events so the UI stays responsive:
Feature 3: Polish — Background Task Execution
For long-running workflows, don't block the HTTP request. Use FastAPI's BackgroundTasks to start the workflow asynchronously and return the ID immediately:
Error Handling
Error Classification
In agentic systems, errors fall into distinct categories that require different recovery strategies:
| Category | Examples | Strategy |
|---|---|---|
| Transient | Rate limit, timeout, network blip | Retry with backoff |
| Model | Malformed tool call JSON, refusal | Prompt correction + retry |
| Tool | External API down, bad URL | Return error to model, let it adapt |
| Logic | Infinite loop, contradictory goals | Max iterations + circuit breaker |
| Fatal | Invalid API key, disk full | Fail fast, alert on-call |
Recovery Strategies
Tool failures should be returned to the model as structured errors — the model can often adapt its strategy. Only escalate to workflow failure when the agent cannot make progress:
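A sketch of a wrapper that converts any tool failure into a structured error string the model can read and adapt to (the payload shape is illustrative):

```python
import json


async def execute_tool_safely(tool_func, **arguments) -> str:
    """Run a tool; on failure, return a structured error instead of raising,
    so the agent loop can hand it back to the model as a tool result."""
    try:
        return await tool_func(**arguments)
    except Exception as exc:
        return json.dumps({
            "error": type(exc).__name__,
            "message": str(exc),
            "recoverable": True,  # hint that retrying differently may work
        })
```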
Detect loop behavior — if the model calls the same tool with the same arguments repeatedly, it's stuck:
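One minimal way to detect this: count identical (tool, arguments) pairs per run and trip a threshold (the class and threshold value are illustrative):

```python
from collections import Counter


class LoopDetector:
    """Flags when the agent repeats the identical tool call too many times."""

    def __init__(self, threshold: int = 3):
        self._counts: Counter = Counter()
        self._threshold = threshold

    def record(self, tool_name: str, arguments_json: str) -> bool:
        """Returns True once this exact (tool, args) pair hits the threshold."""
        key = (tool_name, arguments_json)
        self._counts[key] += 1
        return self._counts[key] >= self._threshold
```

Call `record()` before executing each tool call; when it returns True, abort the run or inject a corrective system message.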
User-Facing Messages
Never expose raw LLM errors or stack traces to API consumers. Map internal states to clean, actionable messages:
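A sketch of that mapping, keyed by the categories in the table above (wording is illustrative):

```python
# Messages safe to show API consumers -- no stack traces, no raw model output.
USER_MESSAGES = {
    "transient": "The service is temporarily busy. Please retry in a moment.",
    "model": "The assistant produced an invalid response. We're retrying automatically.",
    "tool": "An external data source is unavailable. Results may be incomplete.",
    "logic": "The task could not be completed within its step budget.",
    "fatal": "An internal error occurred. The team has been notified.",
}


def user_facing_error(category: str) -> dict:
    return {
        "error": USER_MESSAGES.get(category, USER_MESSAGES["fatal"]),
        "retryable": category in ("transient", "model"),
    }
```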
Deployment
Environment Configuration
All configuration goes through environment variables — no hardcoded values, no config files committed to the repo:
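The variables this stack reads might look like the following (names beyond the API key and Redis URL are illustrative):

```shell
OPENAI_API_KEY=sk-...
REDIS_URL=redis://localhost:6379/0
LOG_LEVEL=info
MAX_AGENT_ITERATIONS=10
```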
For production Redis, use Redis Cluster or Upstash with TLS:
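The only change needed is the connection URL — the `rediss://` scheme enables TLS (host and credentials are placeholders):

```shell
REDIS_URL=rediss://default:<password>@<host>:<port>/0
```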
Dockerfile for the FastAPI service:
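A minimal sketch, assuming a `requirements.txt` at the repo root and the `app/` layout from earlier (paths are illustrative):

```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app/ app/

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```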
CI/CD Pipeline
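The original pipeline definition is not shown; a minimal GitHub Actions sketch running lint and tests against a Redis service container might look like this (all names and steps are illustrative):

```yaml
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports: ["6379:6379"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: ruff check app/
      - run: pytest
```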
Monitoring Setup
Instrument your workflows so you can debug production issues without guessing:
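A stdlib-only sketch: a context manager that emits one structured JSON log line per workflow step, carrying the run ID so production traces can be stitched together (the field names are illustrative):

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("workflows")


@contextmanager
def instrument_step(run_id: str, step: str):
    """Wrap a workflow step; log its outcome and duration with the run ID."""
    start = time.perf_counter()
    status = "error"
    try:
        yield
        status = "ok"
    finally:
        logger.info(json.dumps({
            "run_id": run_id,
            "step": step,
            "status": status,
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        }))
```

Used as `with instrument_step(run_id, "plan"): ...` around each agent call, this gives you greppable, machine-parseable logs without any extra dependencies.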
Key metrics to track in Prometheus/Datadog:
- `workflow_duration_seconds` — histogram by workflow type
- `agent_iterations_count` — histogram to catch runaway agents
- `llm_tokens_total` — counter by model for cost attribution
- `tool_call_errors_total` — counter by tool and error type
- `workflow_status_total` — counter by status (completed/failed)
Next Steps
Recommended Extensions
Persistent memory across workflow runs. Currently, each workflow starts fresh. Add a vector store (Pinecone, Qdrant) to embed and retrieve relevant context from past runs. This lets the agent say "I already researched this last week" and avoid redundant work.
Human-in-the-loop interruption. Set workflow status to AWAITING_INPUT mid-execution and expose a POST /workflows/{id}/input endpoint. The agent pauses at a checkpoint, waits for user input, then resumes. Critical for high-stakes decisions.
Agent evaluation framework. Build a test harness that runs each agent against fixed scenarios and scores the output. Track scores in CI — catch regressions before they reach production.
Cost guardrails. Track token consumption per workflow and abort with a clear error when it exceeds your budget. A rogue prompt that triggers 50 LLM calls costs real money.
Multi-model routing. Route simple tool calls to gpt-4o-mini and complex reasoning to gpt-4o. This cuts token costs by 60-80% on typical workloads without measurable quality loss.
Further Reading
- OpenAI Function Calling documentation — canonical reference for tool calling patterns
- LangGraph — graph-based workflow orchestration with first-class support for cycles, branching, and human-in-the-loop
- Instructor — enforces structured output from LLMs using Pydantic, eliminates the JSON parsing fragility in planner responses
- Prefect — production workflow orchestration that integrates well with agentic pipelines when you need durability guarantees beyond Redis TTLs
Conclusion
FastAPI's async-native design makes it a natural fit for agentic AI backends where every operation — LLM calls, tool invocations, state persistence — is I/O-bound. The architecture laid out here separates concerns cleanly: FastAPI handles HTTP routing and request validation, Pydantic models enforce typed state across workflow steps, Redis persists workflow state for crash recovery, and tenacity handles the retry logic that production LLM integrations demand.
The key implementation decisions that pay off in production: use structured logging with a run ID propagated through every step, persist workflow state externally so workers are stateless and horizontally scalable, validate LLM outputs with Pydantic before acting on them, and version your prompts as files rather than inline strings. Start with a single-agent workflow backed by a task queue, get it stable in production, and add multi-agent orchestration only when you have empirical evidence that a single agent cannot handle the task. The framework gives you the async primitives and validation layer — the production reliability comes from how consistently you apply them.