When our team at a Series B SaaS company noticed that 62% of new signups never completed their first core action, we knew the onboarding flow was the bottleneck. Over six months, we redesigned the entire onboarding system to handle 50,000+ new users per month while improving activation rates from 38% to 71%. This is the full story — architecture decisions, mistakes, and measurable outcomes.
The Problem: Activation Was Bleeding Revenue
Our product — a B2B project management tool — had solid acquisition numbers. Marketing was delivering 50K signups monthly. But the activation funnel told a different story:
- 38% activation rate (defined as creating first project + inviting one teammate)
- Average time-to-value: 14 minutes (industry benchmark: under 5 minutes)
- Day-1 retention: 22%
- Support tickets about "getting started": 340/month
The existing onboarding was a five-screen wizard built three years prior. It asked for company size, industry, team role, preferred integrations, and notification preferences — none of which helped users reach their aha moment faster.
Architecture: Event-Driven Onboarding Pipeline
We replaced the monolithic wizard with an event-driven system built on AWS.
System Overview
Key Design Decisions
Decision 1: Progressive disclosure over upfront collection. Instead of asking five questions before the user touches the product, we embedded onboarding into the actual product experience. Each step was a real action — creating a project, adding a task, inviting a teammate — with contextual guidance overlaid.
Decision 2: DynamoDB for progress tracking. We needed sub-10ms reads for checking onboarding state on every page load. DynamoDB gave us single-digit millisecond reads with a simple partition key (userId) and sort key (stepId).
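As a rough sketch of that key design (table and attribute names here are illustrative, not our production schema), each onboarding step is one item under the user's partition:

```python
from datetime import datetime, timezone

def progress_item(user_id: str, step_id: str, status: str) -> dict:
    """Build one onboarding-progress item keyed by userId (partition key)
    + stepId (sort key), so all of a user's steps live in one partition."""
    return {
        "userId": user_id,      # partition key
        "stepId": step_id,      # sort key: one item per onboarding step
        "status": status,       # e.g. "pending" | "completed" | "skipped"
        "updatedAt": datetime.now(timezone.utc).isoformat(),
    }

# Checking onboarding state on page load is then a single-partition query,
# e.g. with boto3: table.query(KeyConditionExpression=Key("userId").eq(uid))
# which is what keeps reads in the single-digit-millisecond range.
```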
Decision 3: Lambda-based orchestrator. The orchestrator evaluates which step to show next based on the user's current progress, their plan tier, and A/B test assignments. We chose Lambda over a long-running service because onboarding decisions are stateless — all state lives in DynamoDB.
Decision 4: Kinesis for analytics collection. Every onboarding interaction — step views, completions, time-on-step, abandonment — streams through Kinesis into Redshift. This gave us real-time dashboards without impacting the critical path.
Implementation: The Checklist Pattern
We settled on what I call the "checklist pattern" — a persistent, collapsible sidebar widget that tracks progress and provides quick-jump navigation to incomplete steps.
Client-Side Architecture
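The widget renders from a compact progress payload assembled server-side on each request. A minimal sketch of that payload (field names assumed, not our actual API contract):

```python
def checklist_payload(flow: list[str], completed: set[str]) -> dict:
    """Assemble the state the checklist widget renders: per-step status
    plus an overall completion count for the progress bar."""
    steps = [{"stepId": s, "done": s in completed} for s in flow]
    return {
        "steps": steps,
        "completed": sum(1 for s in steps if s["done"]),
        "total": len(flow),
    }
```

The client stays dumb: it renders whatever the payload says and quick-jumps to the first step where `done` is false, so flow changes never require a frontend deploy.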
Contextual Tooltips Over Modal Wizards
The biggest UX win was replacing modal wizards with contextual tooltips anchored to actual UI elements. When a user needed to create their first project, we highlighted the "New Project" button with a pulsing indicator and a tooltip explaining the action — instead of showing a disconnected wizard screen.
What Went Wrong
Mistake 1: Over-Engineering the Step Graph
Our initial design supported arbitrary DAG-based step dependencies with conditional branching based on 12 different user attributes. In practice, we used three linear flows (solo users, team users, enterprise users). The complexity added two weeks of development time and made debugging step ordering issues significantly harder. We eventually simplified to three hardcoded flows.
Mistake 2: WebSocket for Progress Updates
We initially used WebSocket connections to push real-time onboarding progress updates to the client. For a feature where users typically complete steps minutes apart, polling every 30 seconds would have been simpler, cheaper (no persistent connection infrastructure), and equally effective. We burned a week on WebSocket reconnection logic and connection management.
Mistake 3: Not Instrumenting Abandonment Early
We had completion tracking from day one but didn't add abandonment tracking until month two. This meant we couldn't identify where users were dropping off during the critical first four weeks. When we finally added it, we discovered that 40% of abandonments happened at the "invite teammate" step — users on free plans didn't have teammates to invite. We made that step skippable, and activation jumped 8 percentage points.
Results After Six Months
| Metric | Before | After | Change |
|---|---|---|---|
| Activation rate | 38% | 71% | +87% |
| Time-to-value | 14 min | 4.2 min | -70% |
| Day-1 retention | 22% | 41% | +86% |
| Onboarding support tickets | 340/mo | 89/mo | -74% |
| Onboarding completion rate | 31% | 68% | +119% |
Infrastructure Costs
The entire onboarding system costs approximately $340/month at 50K users/month:
- DynamoDB: $45/month (on-demand billing, ~2M reads + 500K writes)
- Lambda (orchestrator): $12/month (~500K invocations)
- Kinesis + Redshift: $280/month (analytics pipeline)
- API Gateway: $3/month
The Kinesis-to-Redshift pipeline accounts for 82% of the cost. If I rebuilt this today, I'd have Kinesis Data Firehose deliver straight to S3 and query with Athena, cutting the analytics cost by roughly 60%.
What I Would Change
If starting over, three things would be different:
1. Start with three hardcoded flows, not a generic engine. Build the abstraction only when you have evidence of needing a fourth flow. We never did.
2. Use server-sent events instead of WebSockets. Onboarding updates are unidirectional (server → client). SSE is simpler to implement, works through proxies without special configuration, and automatically reconnects.
3. Build abandonment tracking before completion tracking. Knowing where users fail is more actionable than knowing where they succeed. The first dashboard should show drop-off points, not completion funnels.
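On the second point: the SSE wire format is just text frames over a long-lived HTTP response, which is most of why it's simpler than WebSockets. A minimal sketch of framing a progress update (payload shape assumed):

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Format one server-sent event: a named event plus JSON data,
    terminated by a blank line per the SSE wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# Browser side is just the built-in EventSource, with reconnection for free:
#   new EventSource("/onboarding/stream")
#     .addEventListener("progress", e => render(JSON.parse(e.data)))
```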
Conclusion
SaaS onboarding is a systems problem masquerading as a UX problem. The visual design of the onboarding flow matters, but the infrastructure underneath — progress tracking, step orchestration, analytics — determines whether you can iterate fast enough to find what works.
The event-driven architecture gave us the flexibility to run A/B tests on step ordering, content, and flow structure without redeploying the application. DynamoDB's consistent single-digit millisecond reads meant the onboarding state check never added perceptible latency. And shipping the analytics pipeline from day one (even if we should have tracked abandonment earlier) meant every decision was backed by data.
The 38% → 71% activation improvement translated to roughly $2.1M in additional annual revenue from users who would have otherwise churned before experiencing the product's value. That number alone justified the six months of engineering investment.