When our team at a Series B SaaS company noticed that 62% of new signups never completed their first core action, we knew the onboarding flow was the bottleneck. Over six months, we redesigned the entire onboarding system to handle 50,000+ new users per month while improving activation rates from 38% to 71%. This is the full story — architecture decisions, mistakes, and measurable outcomes.
The Problem: Activation Was Bleeding Revenue
Our product — a B2B project management tool — had solid acquisition numbers. Marketing was delivering 50K signups monthly. But the activation funnel told a different story:
- 38% activation rate (defined as creating first project + inviting one teammate)
- Average time-to-value: 14 minutes (industry benchmark: under 5 minutes)
- Day-1 retention: 22%
- Support tickets about "getting started": 340/month
The existing onboarding was a five-screen wizard built three years prior. It asked for company size, industry, team role, preferred integrations, and notification preferences — none of which helped users reach their aha moment faster.
Architecture: Event-Driven Onboarding Pipeline
We replaced the monolithic wizard with an event-driven system built on AWS.
System Overview
Key Design Decisions
Decision 1: Progressive disclosure over upfront collection. Instead of asking five questions before the user touches the product, we embedded onboarding into the actual product experience. Each step was a real action — creating a project, adding a task, inviting a teammate — with contextual guidance overlaid.
Decision 2: DynamoDB for progress tracking. We needed sub-10ms reads for checking onboarding state on every page load. DynamoDB gave us single-digit millisecond reads with a simple partition key (userId) and sort key (stepId).
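As a rough sketch of that key design (table and attribute names here are illustrative, not our production schema), each onboarding step is one item under the user's partition:

```python
from datetime import datetime, timezone

def progress_item(user_id: str, step_id: str, status: str) -> dict:
    """Build one onboarding-progress item keyed by userId (partition key)
    + stepId (sort key), so all of a user's steps live in one partition."""
    return {
        "userId": user_id,      # partition key
        "stepId": step_id,      # sort key: one item per onboarding step
        "status": status,       # e.g. "pending" | "completed" | "skipped"
        "updatedAt": datetime.now(timezone.utc).isoformat(),
    }

# Checking onboarding state on page load is then a single-partition query,
# e.g. with boto3: table.query(KeyConditionExpression=Key("userId").eq(uid))
# which is what keeps reads in the single-digit-millisecond range.
```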
Decision 3: Lambda-based orchestrator. The orchestrator evaluates which step to show next based on the user's current progress, their plan tier, and A/B test assignments. We chose Lambda over a long-running service because onboarding decisions are stateless — all state lives in DynamoDB.
Decision 4: Kinesis for analytics collection. Every onboarding interaction — step views, completions, time-on-step, abandonment — streams through Kinesis into Redshift. This gave us real-time dashboards without impacting the critical path.
Implementation: The Checklist Pattern
We settled on what I call the "checklist pattern" — a persistent, collapsible sidebar widget that tracks progress and provides quick-jump navigation to incomplete steps.
Client-Side Architecture
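The widget renders from a compact progress payload assembled server-side on each request. A minimal sketch of that payload (field names assumed, not our actual API contract):

```python
def checklist_payload(flow: list[str], completed: set[str]) -> dict:
    """Assemble the state the checklist widget renders: per-step status
    plus an overall completion count for the progress bar."""
    steps = [{"stepId": s, "done": s in completed} for s in flow]
    return {
        "steps": steps,
        "completed": sum(1 for s in steps if s["done"]),
        "total": len(flow),
    }
```

The client stays dumb: it renders whatever the payload says and quick-jumps to the first step where `done` is false, so flow changes never require a frontend deploy.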
Contextual Tooltips Over Modal Wizards
The biggest UX win was replacing modal wizards with contextual tooltips anchored to actual UI elements. When a user needed to create their first project, we highlighted the "New Project" button with a pulsing indicator and a tooltip explaining the action — instead of showing a disconnected wizard screen.
What Went Wrong
Mistake 1: Over-Engineering the Step Graph
Our initial design supported arbitrary DAG-based step dependencies with conditional branching based on 12 different user attributes. In practice, we used three linear flows (solo users, team users, enterprise users). The complexity added two weeks of development time and made debugging step ordering issues significantly harder. We eventually simplified to three hardcoded flows.
Mistake 2: WebSocket for Progress Updates
We initially used WebSocket connections to push real-time onboarding progress updates to the client. For a feature where users typically complete steps minutes apart, polling every 30 seconds would have been simpler, cheaper (no persistent connection infrastructure), and equally effective. We burned a week on WebSocket reconnection logic and connection management.
Mistake 3: Not Instrumenting Abandonment Early
We had completion tracking from day one but didn't add abandonment tracking until month two. This meant we couldn't identify where users were dropping off during the critical first four weeks. When we finally added it, we discovered that 40% of abandonments happened at the "invite teammate" step — users on free plans didn't have teammates to invite. We made that step skippable, and activation jumped 8 percentage points.
Results After Six Months
| Metric | Before | After | Change |
|---|---|---|---|
| Activation rate | 38% | 71% | +87% |
| Time-to-value | 14 min | 4.2 min | -70% |
| Day-1 retention | 22% | 41% | +86% |
| Onboarding support tickets | 340/mo | 89/mo | -74% |
| Onboarding completion rate | 31% | 68% | +119% |
Infrastructure Costs
The entire onboarding system costs approximately $340/month at 50K users/month:
- DynamoDB: $45/month (on-demand billing, ~2M reads + 500K writes)
- Lambda (orchestrator): $12/month (~500K invocations)
- Kinesis + Redshift: $280/month (analytics pipeline)
- API Gateway: $3/month
The Kinesis-to-Redshift pipeline accounts for 82% of the cost. If I rebuilt this today, I'd have Kinesis Data Firehose deliver straight to S3 and query with Athena, cutting the analytics cost by roughly 60%.
What I Would Change
If starting over, three things would be different:
1. Start with three hardcoded flows, not a generic engine. Build the abstraction only when you have evidence of needing a fourth flow. We never did.
2. Use server-sent events instead of WebSockets. Onboarding updates are unidirectional (server → client). SSE is simpler to implement, works through proxies without special configuration, and automatically reconnects.
3. Build abandonment tracking before completion tracking. Knowing where users fail is more actionable than knowing where they succeed. The first dashboard should show drop-off points, not completion funnels.
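On the second point: the SSE wire format is just text frames over a long-lived HTTP response, which is most of why it's simpler than WebSockets. A minimal sketch of framing a progress update (payload shape assumed):

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Format one server-sent event: a named event plus JSON data,
    terminated by a blank line per the SSE wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# Browser side is just the built-in EventSource, with reconnection for free:
#   new EventSource("/onboarding/stream")
#     .addEventListener("progress", e => render(JSON.parse(e.data)))
```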
Conclusion
SaaS onboarding is a systems problem masquerading as a UX problem. The visual design of the onboarding flow matters, but the infrastructure underneath — progress tracking, step orchestration, analytics — determines whether you can iterate fast enough to find what works.
The event-driven architecture gave us the flexibility to run A/B tests on step ordering, content, and flow structure without redeploying the application. DynamoDB's consistent single-digit millisecond reads meant the onboarding state check never added perceptible latency. And shipping the analytics pipeline from day one (even if we should have tracked abandonment earlier) meant every decision was backed by data.
The 38% → 71% activation improvement translated to roughly $2.1M in additional annual revenue from users who would have otherwise churned before experiencing the product's value. That number alone justified the six months of engineering investment.