How long did the RLS migration take?

Phase 1 (RLS on shared schema) took 3 weeks: 1 week for implementation, 1 week for testing, 1 week for gradual rollout. Phase 2 (schema-per-tenant for enterprise) took 6 weeks. Total project: 3 months with two engineers.

Did any queries break during the RLS rollout?

Yes — 12 queries that intentionally crossed tenant boundaries (admin tools, analytics aggregation, billing calculations). We identified these during testing and created a dedicated database role that bypasses RLS, used exclusively by internal admin services with audit logging.

How do you handle database migrations with schema-per-tenant?

We built a migration orchestrator that applies Rails migrations to all 15 enterprise schemas in parallel. Each migration runs in its own transaction with automatic rollback on failure. The orchestrator reports per-schema migration status and alerts on any failure. Total migration time: 30-60 seconds for 15 schemas.
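The orchestrator's shape can be illustrated with a minimal sketch in plain Ruby. It is not our production code: `migrate_all`, the schema names, and the simulated failure are all hypothetical, and the block stands in for "run the pending migrations for this schema inside a transaction".

```ruby
# Hypothetical sketch: run one migration block per schema in parallel threads
# and collect a per-schema status hash. The block is a stand-in for the real
# per-schema work (open a transaction, apply migrations, roll back on error).
def migrate_all(schemas, &migration)
  threads = schemas.map do |schema|
    Thread.new do
      begin
        migration.call(schema)
        [schema, :ok]
      rescue StandardError => e
        # A failure in one schema is recorded and does not stop the others.
        [schema, "failed: #{e.message}"]
      end
    end
  end
  # Thread#value joins the thread and returns the block's last expression.
  threads.map(&:value).to_h
end

# Illustrative schema names, not the real enterprise tenants.
schemas = %w[tenant_acme tenant_globex tenant_initech]

status = migrate_all(schemas) do |schema|
  raise "lock timeout" if schema == "tenant_globex" # simulated failure
end

failed = status.reject { |_schema, result| result == :ok }
puts status.inspect
puts "ALERT: #{failed.keys.join(', ')}" unless failed.empty?
```

Returning a status per schema, rather than raising on the first error, is what makes per-schema reporting and alerting possible.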

What monitoring do you have for tenant isolation?

Three layers: (1) automated tests that attempt cross-tenant access on every API endpoint (runs on every PR), (2) runtime monitoring that alerts if any query executes without `app.current_tenant` set, (3) weekly audit of all cross-tenant admin queries with review by the security team.
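Layer (2) can be approximated in application code with a small guard around query execution. The sketch below is an assumption-laden illustration: `TenantGuard`, `with_tenant`, and `execute` are hypothetical names, and the alert is just an in-memory array where a real system would page or log.

```ruby
# Hypothetical guard: tracks the current tenant and records an alert whenever
# a query is executed while no tenant is set.
class TenantGuard
  attr_reader :alerts

  def initialize
    @alerts = []
    @current_tenant = nil
  end

  # Sets the tenant for the duration of the block, then restores the old value.
  def with_tenant(tenant_id)
    previous = @current_tenant
    @current_tenant = tenant_id
    yield
  ensure
    @current_tenant = previous
  end

  # The block stands in for the real database call.
  def execute(sql)
    @alerts << "query without tenant: #{sql}" if @current_tenant.nil?
    yield sql
  end
end

guard = TenantGuard.new
guard.with_tenant("tenant-a") { guard.execute("SELECT * FROM projects") { |sql| sql } }
guard.execute("SELECT * FROM projects") { |sql| sql } # no tenant set, so this is flagged
puts guard.alerts.inspect
```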

Multi-Tenant Architecture at Scale: Lessons from Production

In 2024, we rebuilt the multi-tenant architecture of a B2B project management SaaS serving 2,400 tenants. The original architecture — shared database with application-level tenant filtering — had produced three cross-tenant data leakage incidents in six months. This is the story of migrating to a robust multi-tenant architecture under production load.

Starting Point

The application was a Rails-based project management tool on AWS. All 2,400 tenants shared a single PostgreSQL RDS instance with no row-level security. Tenant isolation relied entirely on application code: every ActiveRecord query had to include `where(tenant_id: current_tenant.id)`. Three incidents in six months proved this approach insufficient:

Incident 1: A developer forgot the tenant scope on a new API endpoint. 12 tenants' project data was visible to any authenticated user for 3 hours before detection.
Incident 2: A background job processing queue lost tenant context when retrying failed jobs, causing file attachments to be associated with the wrong tenant.
Incident 3: A search feature indexed all tenants' data without the tenant filter, exposing project names and descriptions across tenants for 2 days.

After the third incident, three enterprise customers threatened to cancel ($180,000 ARR at risk). The board approved a 3-month engineering investment to fix the architecture.

Architecture Decisions

Why PostgreSQL RLS + Schema-per-Tenant Hybrid

We evaluated three options:

Fix application code (add better testing, code review): Rejected. The root cause was that isolation depended on developer discipline. More discipline wouldn't eliminate the risk.
Row-Level Security on the shared schema: Implemented as phase 1. Provides database-enforced isolation without data migration.
Schema-per-tenant for enterprise customers: Implemented as phase 2. Provides physical isolation for the 15 enterprise customers that demanded it.

Migration Strategy

```sql
-- Phase 1: Add RLS to all existing tables
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects FORCE ROW LEVEL SECURITY;

CREATE POLICY projects_tenant_isolation ON projects
  USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Repeat for all 34 tables with tenant_id
```

The RLS migration was non-destructive — it added policies without changing data. We deployed it table-by-table over two weeks, monitoring for query failures.
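Repeating the `projects` statements for every table is easier to script than to write by hand. A minimal sketch, assuming a hypothetical `rls_statements` helper and an illustrative table list (the real list would be the 34 tables with `tenant_id`); it only prints SQL and does not connect to a database:

```ruby
# Hypothetical helper: builds the three RLS statements for one table,
# mirroring the statements used for `projects`.
def rls_statements(table)
  # Table names are interpolated into SQL, so accept plain identifiers only.
  unless table.match?(/\A[a-z_][a-z0-9_]*\z/)
    raise ArgumentError, "unsafe table name: #{table.inspect}"
  end

  [
    "ALTER TABLE #{table} ENABLE ROW LEVEL SECURITY;",
    "ALTER TABLE #{table} FORCE ROW LEVEL SECURITY;",
    "CREATE POLICY #{table}_tenant_isolation ON #{table} " \
      "USING (tenant_id = current_setting('app.current_tenant')::uuid);"
  ]
end

# Illustrative table names only.
TENANT_TABLES = %w[projects tasks comments].freeze

TENANT_TABLES.each do |table|
  rls_statements(table).each { |sql| puts sql }
end
```

Generating the statements one table at a time also fits a table-by-table rollout, since each table's output can be reviewed and applied separately.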

Measurable Results

Metric	Before	After	Change
Cross-tenant data incidents	3 in 6 months	0 in 12 months	-100%
Enterprise customer churn risk	$180K ARR	$0	Eliminated
Query latency (p50)	12ms	13ms	+8%
Query latency (p99)	145ms	152ms	+5%
Monthly infrastructure cost	$4,200	$5,800	+38%

The 38% cost increase came from the schema-per-tenant infrastructure for 15 enterprise customers. The 5-8% latency increase from RLS policy evaluation was negligible.

What Went Wrong

RLS broke admin panels. Our internal admin tools queried across tenants for support and analytics. RLS correctly blocked these queries. We had to create a separate database role for admin operations that bypassed RLS, with audit logging on every cross-tenant query.

Background jobs lost tenant context. Sidekiq workers inherited the web request's tenant context through thread-local variables, but on retry, this context was lost. We had to serialize tenant_id into every job's payload and set the PostgreSQL session variable at the start of each job execution.
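The explicit-context pattern looks roughly like the sketch below. It is plain Ruby rather than Sidekiq code, and `TenantJob`, `AttachFileJob`, and `FakeConnection` are hypothetical names; `FakeConnection` only records statements, standing in for a real PostgreSQL connection. `set_config` is the standard PostgreSQL function for setting a session variable with a bound parameter.

```ruby
require "json"

# Stand-in for a database connection: records statements instead of running them.
class FakeConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def exec_params(sql, params = [])
    @statements << [sql, params]
  end
end

# Hypothetical job base class: tenant_id travels in the serialized payload,
# so a retry sees the same tenant as the first attempt.
class TenantJob
  def self.serialize(tenant_id:, args:)
    JSON.generate("tenant_id" => tenant_id, "args" => args)
  end

  def self.run(payload_json, connection)
    payload = JSON.parse(payload_json)
    # fetch raises KeyError if a job was enqueued without a tenant.
    tenant_id = payload.fetch("tenant_id")
    # Set the session variable that the RLS policies read, before any query runs.
    connection.exec_params(
      "SELECT set_config('app.current_tenant', $1, false)", [tenant_id]
    )
    new.perform(*payload.fetch("args"))
  end
end

class AttachFileJob < TenantJob
  def perform(file_name)
    "attached #{file_name}"
  end
end

conn = FakeConnection.new
payload = TenantJob.serialize(
  tenant_id: "11111111-1111-1111-1111-111111111111", args: ["spec.pdf"]
)
puts AttachFileJob.run(payload, conn)
```

Because the payload is the only source of tenant context, a retried job behaves exactly like the first attempt, and a payload without a tenant fails before any query runs.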

Schema migrations for 15 tenants. Applying migrations to 15 separate schemas increased deployment time from 30 seconds to 4 minutes. We parallelized schema migrations to bring this back to under 1 minute.

Honest Retrospective

The biggest win was eliminating cross-tenant incidents entirely. In the 12 months since the migration, we have had zero data leakage events. This alone justified the investment.

What we'd do differently:

Start with RLS from day one. The retrofit cost 3 engineer-months; implementing RLS at project start would have been 2 days.
Design background jobs with explicit tenant context from the beginning. Implicit context through thread-local variables is fragile.
Implement cross-tenant access testing in CI before the first incident, not after three.

Conclusion

Multi-tenant data isolation cannot depend on application code alone. PostgreSQL RLS provides database-enforced isolation that prevents cross-tenant access regardless of application bugs, forgotten WHERE clauses, or lost context in background jobs. The performance overhead is minimal (5-8% latency increase), and the operational overhead is manageable. For any SaaS handling sensitive customer data, RLS should be enabled from day one: retrofitting it cost us 3 engineer-months, against an estimated 2 days at project start.

FAQ


Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
