SaaS Onboarding Flows at Scale: Lessons from Production

Lessons from running SaaS onboarding flows in production: architecture decisions, measurable results, and an honest retrospective.

Muneer Puthiya Purayil 11 min read

When our team at a Series B SaaS company noticed that 62% of new signups never completed their first core action, we knew the onboarding flow was the bottleneck. Over six months, we redesigned the entire onboarding system to handle 50,000+ new users per month while improving activation rates from 38% to 71%. This is the full story — architecture decisions, mistakes, and measurable outcomes.

The Problem: Activation Was Bleeding Revenue

Our product — a B2B project management tool — had solid acquisition numbers. Marketing was delivering 50K signups monthly. But the activation funnel told a different story:

  • 38% activation rate (defined as creating first project + inviting one teammate)
  • Average time-to-value: 14 minutes (industry benchmark: under 5 minutes)
  • Day-1 retention: 22%
  • Support tickets about "getting started": 340/month
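
The activation definition above is simple enough to express as a predicate over a user's event history. The following is a minimal sketch, not the production query: the `ProductEvent` shape and the event names `project_created` and `teammate_invited` are assumptions made for illustration.

```typescript
// Hypothetical event shape; field and event names are assumptions.
interface ProductEvent {
  userId: string;
  type: string; // e.g. "project_created", "teammate_invited"
}

// A user is activated once they have created a project AND invited a teammate.
function isActivated(events: ProductEvent[]): boolean {
  const types = new Set(events.map(e => e.type));
  return types.has('project_created') && types.has('teammate_invited');
}

// Activation rate = activated signups / all signups, as a percentage.
function activationRate(signupIds: string[], events: ProductEvent[]): number {
  if (signupIds.length === 0) return 0;
  const byUser = new Map<string, ProductEvent[]>();
  for (const e of events) {
    const list = byUser.get(e.userId) ?? [];
    list.push(e);
    byUser.set(e.userId, list);
  }
  const activated = signupIds.filter(id => isActivated(byUser.get(id) ?? [])).length;
  return (activated / signupIds.length) * 100;
}
```

Defining the metric in code early keeps dashboards and experiments measuring the same thing when the definition is later debated.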

The existing onboarding was a five-screen wizard built three years prior. It asked for company size, industry, team role, preferred integrations, and notification preferences — none of which helped users reach their aha moment faster.

Architecture: Event-Driven Onboarding Pipeline

We replaced the monolithic wizard with an event-driven system built on AWS.

System Overview

┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│ Client App  │────▶│ API Gateway  │────▶│   Onboarding    │
│   (React)   │     │    (Kong)    │     │  Orchestrator   │
└─────────────┘     └──────────────┘     │    (Lambda)     │
                                         └────────┬────────┘
                                                  │
                        ┌─────────────────────────┼──────────────────┐
                        │                         │                  │
                   ┌────▼─────┐            ┌──────▼───┐       ┌──────▼──────┐
                   │ Progress │            │ Template │       │  Analytics  │
                   │ Tracker  │            │  Engine  │       │  Collector  │
                   │(DynamoDB)│            │ (Lambda) │       │ (Kinesis →  │
                   └──────────┘            └──────────┘       │  Redshift)  │
                                                              └─────────────┘

Key Design Decisions

Decision 1: Progressive disclosure over upfront collection. Instead of asking five questions before the user touches the product, we embedded onboarding into the actual product experience. Each step was a real action — creating a project, adding a task, inviting a teammate — with contextual guidance overlaid.

Decision 2: DynamoDB for progress tracking. We needed sub-10ms reads for checking onboarding state on every page load. DynamoDB gave us single-digit millisecond reads with a simple partition key (userId) and sort key (stepId).

typescript
// DynamoDB schema for onboarding progress.
// One item per (userId, stepId): the partition key groups a user's steps,
// the sort key identifies the individual step.
const OnboardingProgressSchema = {
  TableName: 'onboarding-progress',
  KeySchema: [
    { AttributeName: 'userId', KeyType: 'HASH' },
    { AttributeName: 'stepId', KeyType: 'RANGE' },
  ],
  AttributeDefinitions: [
    { AttributeName: 'userId', AttributeType: 'S' },
    { AttributeName: 'stepId', AttributeType: 'S' },
  ],
  BillingMode: 'PAY_PER_REQUEST',
};

// Progress record structure
interface OnboardingStep {
  userId: string;
  stepId: string; // e.g., "create_project", "invite_teammate"
  status: 'pending' | 'active' | 'completed' | 'skipped';
  startedAt?: string; // ISO 8601 timestamp
  completedAt?: string; // ISO 8601 timestamp
  metadata: Record<string, unknown>;
  ttl: number; // Unix epoch seconds; auto-cleanup after 90 days
}
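
Because `userId` is the partition key, a single Query returns every step record for a user, which is what the page-load check needs. The sketch below shows how the query parameters and the 90-day `ttl` value might be built. It assumes the document-client style of the AWS SDK, where attribute values are plain JavaScript values; `buildProgressQuery` and `ttlFromNow` are hypothetical helper names, not part of the original system.

```typescript
const TTL_DAYS = 90;

// DynamoDB TTL attributes hold a Unix epoch timestamp in seconds, not milliseconds.
function ttlFromNow(nowMs: number, days: number = TTL_DAYS): number {
  return Math.floor(nowMs / 1000) + days * 24 * 60 * 60;
}

// Parameters for a Query that returns all step records for one user.
// No sort-key condition is given, so every stepId under the partition matches.
function buildProgressQuery(userId: string) {
  return {
    TableName: 'onboarding-progress',
    KeyConditionExpression: 'userId = :uid',
    ExpressionAttributeValues: { ':uid': userId },
  };
}
```

With only a handful of steps per user, the whole onboarding state fits in one small Query response.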

Decision 3: Lambda-based orchestrator. The orchestrator evaluates which step to show next based on the user's current progress, their plan tier, and A/B test assignments. We chose Lambda over a long-running service because onboarding decisions are stateless — all state lives in DynamoDB.

typescript
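// One node in the step graph. UserProfile, getUserProfile, and getABVariant
// are defined elsewhere in the codebase.
interface StepDefinition {
  id: string;
  required: boolean;
  condition: (profile: UserProfile, variant: string) => boolean;
  dependsOn?: string[];
}

// Returns the next step to show, or null when onboarding is finished.
export async function determineNextStep(
  userId: string,
  progress: OnboardingStep[]
): Promise<StepDefinition | null> {
  const userProfile = await getUserProfile(userId);
  const abVariant = await getABVariant(userId, 'onboarding-v2');

  const stepGraph: StepDefinition[] = [
    {
      id: 'create_project',
      required: true,
      condition: () => true,
    },
    {
      id: 'add_first_task',
      required: true,
      condition: () => true,
      dependsOn: ['create_project'],
    },
    {
      id: 'invite_teammate',
      required: false,
      condition: (profile) => profile.plan !== 'solo',
      dependsOn: ['create_project'],
    },
    {
      id: 'connect_integration',
      required: false,
      condition: (_, variant) => variant === 'with_integrations',
      dependsOn: ['create_project'],
    },
  ];

  const completedIds = new Set(
    progress.filter(s => s.status === 'completed').map(s => s.stepId)
  );
  // Skipped steps are not offered again, but they do not satisfy dependencies.
  const skippedIds = new Set(
    progress.filter(s => s.status === 'skipped').map(s => s.stepId)
  );

  // First step that is unfinished, has its dependencies met, and applies to this user.
  const next = stepGraph.find(step => {
    if (completedIds.has(step.id) || skippedIds.has(step.id)) return false;
    const depsResolved = (step.dependsOn ?? []).every(d => completedIds.has(d));
    return depsResolved && step.condition(userProfile, abVariant);
  });

  // Array.prototype.find returns undefined when nothing matches.
  return next ?? null;
}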

Decision 4: Kinesis for analytics collection. Every onboarding interaction — step views, completions, time-on-step, abandonment — streams through Kinesis into Redshift. This gave us real-time dashboards without impacting the critical path.
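
To make the analytics stream concrete, here is a sketch of what one interaction event and its stream record might look like. The `OnboardingEvent` fields and the helper names are assumptions for illustration. The record shape mirrors the `StreamName` / `PartitionKey` / `Data` fields of a Kinesis PutRecord request, with `Data` left as a JSON string that would still need to be encoded to bytes before sending.

```typescript
type OnboardingEventType = 'step_viewed' | 'step_completed' | 'step_abandoned';

// Hypothetical event envelope; field names are illustrative.
interface OnboardingEvent {
  userId: string;
  stepId: string;
  type: OnboardingEventType;
  occurredAt: string; // ISO 8601 timestamp
  timeOnStepMs?: number;
}

// Time-on-step derived from the view timestamp and the finish timestamp.
function timeOnStepMs(viewedAt: string, finishedAt: string): number {
  return Date.parse(finishedAt) - Date.parse(viewedAt);
}

// Using userId as the partition key routes one user's events to the same shard.
function toStreamRecord(event: OnboardingEvent, streamName: string) {
  return {
    StreamName: streamName,
    PartitionKey: event.userId,
    Data: JSON.stringify(event), // encode to bytes before handing to the SDK
  };
}
```

Emitting these records asynchronously, outside the request that serves the user, is what keeps analytics off the critical path.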

Implementation: The Checklist Pattern

We settled on what I call the "checklist pattern" — a persistent, collapsible sidebar widget that tracks progress and provides quick-jump navigation to incomplete steps.

Client-Side Architecture

typescript
import { useEffect, useState } from 'react';
// useAuth, api (an axios-style client), WS_URL, and OnboardingStep come from the application.

// Value returned by the useOnboarding hook
interface OnboardingContext {
  steps: OnboardingStep[];
  currentStep: OnboardingStep | null;
  completionPercentage: number;
  markComplete: (stepId: string, metadata?: Record<string, unknown>) => Promise<void>;
  skipStep: (stepId: string) => Promise<void>;
  dismiss: () => void;
}

function useOnboarding(): OnboardingContext {
  const { user } = useAuth();
  const [steps, setSteps] = useState<OnboardingStep[]>([]);

  useEffect(() => {
    if (!user) return;
    let cancelled = false;

    const fetchProgress = async () => {
      const response = await api.get(`/onboarding/progress/${user.id}`);
      // Ignore the response if the effect was cleaned up while the request was in flight
      if (!cancelled) setSteps(response.data.steps);
    };

    fetchProgress();

    // Real-time updates via WebSocket
    const ws = new WebSocket(`${WS_URL}/onboarding?userId=${user.id}`);
    ws.onmessage = (event) => {
      const update = JSON.parse(event.data);
      setSteps(prev =>
        prev.map(s => s.stepId === update.stepId ? { ...s, ...update } : s)
      );
    };

    return () => {
      cancelled = true;
      ws.close();
    };
  }, [user]);

  const markComplete = async (stepId: string, metadata: Record<string, unknown> = {}) => {
    // Optimistic update: show the step as completed before the request resolves
    const original = steps.find(s => s.stepId === stepId);
    setSteps(prev =>
      prev.map(s =>
        s.stepId === stepId
          ? { ...s, status: 'completed' as const, completedAt: new Date().toISOString() }
          : s
      )
    );
    try {
      await api.post(`/onboarding/complete`, { stepId, metadata });
    } catch (error) {
      // Roll back only the affected step so concurrent updates to other steps survive
      if (original) {
        setSteps(prev => prev.map(s => (s.stepId === stepId ? original : s)));
      }
      throw error;
    }
  };

  const completedCount = steps.filter(s => s.status === 'completed').length;

  return {
    steps,
    currentStep: steps.find(s => s.status === 'active') ?? null,
    // Guard against division by zero before the first fetch resolves
    completionPercentage: steps.length === 0 ? 0 : (completedCount / steps.length) * 100,
    markComplete,
    skipStep: async (stepId) => {
      await api.post(`/onboarding/skip`, { stepId });
    },
    dismiss: () => {
      void api.post(`/onboarding/dismiss`);
    },
  };
}

Contextual Tooltips Over Modal Wizards

The biggest UX win was replacing modal wizards with contextual tooltips anchored to actual UI elements. When a user needed to create their first project, we highlighted the "New Project" button with a pulsing indicator and a tooltip explaining the action — instead of showing a disconnected wizard screen.

typescript
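import { useEffect, useState } from 'react';
import type { ReactNode } from 'react';

interface TooltipProps {
  stepId: string;
  targetSelector: string; // CSS selector of the element to highlight
  content: ReactNode;
}

function OnboardingTooltip({ stepId, targetSelector, content }: TooltipProps) {
  const { currentStep } = useOnboarding();
  const [position, setPosition] = useState({ top: 0, left: 0 });
  const isActive = currentStep?.stepId === stepId;

  useEffect(() => {
    if (!isActive) return;

    const target = document.querySelector(targetSelector);
    if (!target) return;

    // getBoundingClientRect is viewport-relative, so the tooltip's CSS is
    // expected to use position: fixed.
    const updatePosition = () => {
      const rect = target.getBoundingClientRect();
      setPosition({
        top: rect.bottom + 8,
        left: rect.left + rect.width / 2,
      });
    };

    updatePosition();
    // Keep the tooltip anchored when the layout moves
    window.addEventListener('resize', updatePosition);
    window.addEventListener('scroll', updatePosition, true);

    target.classList.add('onboarding-highlight');
    return () => {
      target.classList.remove('onboarding-highlight');
      window.removeEventListener('resize', updatePosition);
      window.removeEventListener('scroll', updatePosition, true);
    };
  }, [isActive, targetSelector]);

  if (!isActive) return null;

  return (
    <div
      className="onboarding-tooltip"
      style={{ top: position.top, left: position.left }}
    >
      {content}
    </div>
  );
}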
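
One case the component above does not cover is a target that sits near the edge of the viewport, where a tooltip centered on the target can overflow the screen. A possible guard, sketched as a pure function so it can be unit tested without a DOM, is shown below. `tooltipPosition` and its parameters are hypothetical, and unlike the component it returns the tooltip's left edge rather than its center point.

```typescript
// Minimal subset of DOMRect needed for positioning.
interface TargetRect {
  left: number;
  bottom: number;
  width: number;
}

// Centers the tooltip under the target, then clamps it inside the viewport.
function tooltipPosition(
  target: TargetRect,
  tooltipWidth: number,
  viewportWidth: number,
  gap: number = 8,
  margin: number = 8
): { top: number; left: number } {
  const centered = target.left + target.width / 2 - tooltipWidth / 2;
  // If the tooltip is wider than the viewport, pin it to the left margin.
  const maxLeft = Math.max(margin, viewportWidth - tooltipWidth - margin);
  const left = Math.min(Math.max(centered, margin), maxLeft);
  return { top: target.bottom + gap, left };
}
```

The result can be passed straight to the `style` prop, provided the tooltip's CSS does not also apply a horizontal centering transform.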

What Went Wrong

Mistake 1: Over-Engineering the Step Graph

Our initial design supported arbitrary DAG-based step dependencies with conditional branching based on 12 different user attributes. In practice, we used three linear flows (solo users, team users, enterprise users). The complexity added two weeks of development time and made debugging step ordering issues significantly harder. We eventually simplified to three hardcoded flows.
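
For comparison, the simplified design can be sketched in a few lines. The three flow names come from the paragraph above, but which steps belong to which flow is an assumption made here for illustration, as are the `FLOWS` table and the `nextStep` helper.

```typescript
type Flow = 'solo' | 'team' | 'enterprise';

// Ordered step lists per flow. The step assignments are illustrative assumptions.
const FLOWS: Record<Flow, readonly string[]> = {
  solo: ['create_project', 'add_first_task'],
  team: ['create_project', 'add_first_task', 'invite_teammate'],
  enterprise: ['create_project', 'add_first_task', 'invite_teammate', 'connect_integration'],
};

// The next step is the first one in the flow that is neither completed nor skipped.
function nextStep(flow: Flow, finished: ReadonlySet<string>): string | null {
  return FLOWS[flow].find(id => !finished.has(id)) ?? null;
}
```

A linear list makes step ordering obvious at a glance, which is exactly what the DAG version made hard to debug.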

Mistake 2: WebSocket for Progress Updates

We initially used WebSocket connections to push real-time onboarding progress updates to the client. For a feature where users typically complete steps minutes apart, polling every 30 seconds would have been simpler, cheaper (no persistent connection infrastructure), and equally effective. We burned a week on WebSocket reconnection logic and connection management.
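
The polling alternative is small enough to show in full. This is a sketch under stated assumptions: `fetchSteps` stands in for whatever call returns the user's current progress, and `startPolling` and `changedSteps` are hypothetical names.

```typescript
interface StepStatus {
  stepId: string;
  status: string;
}

// Returns the steps that are new or whose status differs from the previous poll.
function changedSteps(prev: StepStatus[], next: StepStatus[]): StepStatus[] {
  const before = new Map(prev.map(s => [s.stepId, s.status] as const));
  return next.filter(s => before.get(s.stepId) !== s.status);
}

// Polls immediately, then every intervalMs. Returns a function that stops polling.
function startPolling(
  fetchSteps: () => Promise<StepStatus[]>,
  onChange: (changed: StepStatus[]) => void,
  intervalMs: number = 30000
): () => void {
  let last: StepStatus[] = [];
  let stopped = false;

  const tick = async (): Promise<void> => {
    try {
      const next = await fetchSteps();
      if (stopped) return;
      const changed = changedSteps(last, next);
      last = next;
      if (changed.length > 0) onChange(changed);
    } catch {
      // A failed poll is simply retried on the next interval.
    }
  };

  void tick();
  const timer = setInterval(tick, intervalMs);
  return () => {
    stopped = true;
    clearInterval(timer);
  };
}
```

There is no reconnection logic to write: a dropped request is just a missed tick.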

Mistake 3: Not Instrumenting Abandonment Early

We had completion tracking from day one but didn't add abandonment tracking until month two. This meant we couldn't identify where users were dropping off during the critical first four weeks. When we finally added it, we discovered that 40% of abandonments happened at the "invite teammate" step — users on free plans didn't have teammates to invite. We made that step skippable, and activation jumped 8 percentage points.
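
The drop-off figure quoted above is a share of all abandonments, which can be computed from view and completion events alone. The sketch below treats any viewed-but-never-completed step as abandoned. The `StepEvent` shape is an assumption, and a real pipeline would also need a time window to separate abandoned users from those still in progress.

```typescript
// Hypothetical event shape for illustration.
interface StepEvent {
  userId: string;
  stepId: string;
  type: 'step_viewed' | 'step_completed';
}

// For each step, its share of all abandonments (viewed but never completed).
function abandonmentShareByStep(events: StepEvent[]): Map<string, number> {
  const keyOf = (e: StepEvent) => JSON.stringify([e.userId, e.stepId]);

  const completed = new Set<string>();
  for (const e of events) {
    if (e.type === 'step_completed') completed.add(keyOf(e));
  }

  const counted = new Set<string>(); // avoids double-counting repeat views
  const abandonedPerStep = new Map<string, number>();
  let total = 0;
  for (const e of events) {
    const key = keyOf(e);
    if (e.type !== 'step_viewed' || completed.has(key) || counted.has(key)) continue;
    counted.add(key);
    abandonedPerStep.set(e.stepId, (abandonedPerStep.get(e.stepId) ?? 0) + 1);
    total++;
  }

  // total is greater than zero whenever abandonedPerStep has entries.
  const share = new Map<string, number>();
  abandonedPerStep.forEach((count, stepId) => share.set(stepId, count / total));
  return share;
}
```

A step that dominates this map is the first candidate for being made skippable or removed.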

Results After Six Months

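| Metric | Before | After | Relative change |
|---|---|---|---|
| Activation rate | 38% | 71% | +87% |
| Time-to-value | 14 min | 4.2 min | -70% |
| Day-1 retention | 22% | 41% | +86% |
| Onboarding support tickets | 340/mo | 89/mo | -74% |
| Onboarding completion rate | 31% | 68% | +119% |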
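
The change figures in the table are relative changes, not percentage-point differences: activation rose by 33 points, which is an 87% relative increase over the 38% baseline. A short sketch of both calculations, with hypothetical function names:

```typescript
// Relative change: how much the metric moved as a fraction of its starting value.
function relativeChangePct(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Percentage-point change: the plain difference between two percentages.
function percentagePointChange(beforePct: number, afterPct: number): number {
  return afterPct - beforePct;
}

const activationRelative = Math.round(relativeChangePct(38, 71)); // 87
const activationPoints = percentagePointChange(38, 71); // 33
```

Stating which of the two a report uses avoids readers mistaking an 87% relative lift for an 87-point one.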

Infrastructure Costs

The entire onboarding system costs approximately $340/month at 50K users/month:

  • DynamoDB: $45/month (on-demand billing, ~2M reads + 500K writes)
  • Lambda (orchestrator): $12/month (~500K invocations)
  • Kinesis + Redshift: $280/month (analytics pipeline)
  • API Gateway: $3/month

The Kinesis-to-Redshift pipeline accounts for 82% of the cost. If I rebuilt this today, I'd use Kinesis Data Firehose (since renamed Amazon Data Firehose) to deliver directly to S3 and query with Athena, cutting the analytics cost by roughly 60%.
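
The arithmetic behind these figures is easy to check. The sketch below uses the line items listed above and applies the rough 60% saving to the analytics line only. That saving is the estimate stated above, not a measured result, and the names are hypothetical.

```typescript
interface MonthlyCosts {
  dynamodb: number;
  lambda: number;
  analytics: number; // Kinesis + Redshift
  apiGateway: number;
}

const currentCosts: MonthlyCosts = { dynamodb: 45, lambda: 12, analytics: 280, apiGateway: 3 };

function totalCost(c: MonthlyCosts): number {
  return c.dynamodb + c.lambda + c.analytics + c.apiGateway;
}

// Fraction of the total bill spent on the analytics pipeline.
function analyticsShare(c: MonthlyCosts): number {
  return c.analytics / totalCost(c);
}

// Applies an estimated fractional saving to the analytics line only.
function withAnalyticsSaving(c: MonthlyCosts, saving: number): MonthlyCosts {
  return { ...c, analytics: c.analytics * (1 - saving) };
}

const projectedCosts = withAnalyticsSaving(currentCosts, 0.6);
```

Under that estimate the analytics line drops from $280 to about $112 per month, and the total from $340 to about $172.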

What I Would Change

If starting over, three things would be different:

  1. Start with three hardcoded flows, not a generic engine. Build the abstraction only when you have evidence of needing a fourth flow. We never did.

  2. Use server-sent events instead of WebSockets. Onboarding updates are unidirectional (server → client). SSE is simpler to implement, works through proxies without special configuration, and automatically reconnects.

  3. Build abandonment tracking before completion tracking. Knowing where users fail is more actionable than knowing where they succeed. The first dashboard should show drop-off points, not completion funnels.
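
To illustrate the second point, the server side of SSE amounts to writing text in the `text/event-stream` format. Below is a sketch of serializing one progress update. The event name `onboarding-progress` and the `StepUpdate` shape are assumptions for illustration.

```typescript
interface StepUpdate {
  stepId: string;
  status: string;
}

// Serializes one update in the text/event-stream wire format:
// an "event:" line, a "data:" line, and a blank line that ends the message.
function formatSseMessage(update: StepUpdate, eventName: string = 'onboarding-progress'): string {
  // JSON.stringify without indentation never emits raw newlines,
  // so the payload always fits on a single "data:" line.
  const payload = JSON.stringify(update);
  return `event: ${eventName}\ndata: ${payload}\n\n`;
}
```

In the browser, an `EventSource` pointed at the endpoint would receive these through `addEventListener('onboarding-progress', ...)` and reconnect on its own if the connection drops.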

Conclusion

SaaS onboarding is a systems problem masquerading as a UX problem. The visual design of the onboarding flow matters, but the infrastructure underneath — progress tracking, step orchestration, analytics — determines whether you can iterate fast enough to find what works.

The event-driven architecture gave us the flexibility to run A/B tests on step ordering, content, and flow structure without redeploying the application. DynamoDB's consistent single-digit millisecond reads meant the onboarding state check never added perceptible latency. And shipping the analytics pipeline from day one (even if we should have tracked abandonment earlier) meant every decision was backed by data.

The 38% → 71% activation improvement translated to roughly $2.1M in additional annual revenue from users who would have otherwise churned before experiencing the product's value. That number alone justified the six months of engineering investment.


Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
