How many tenants can you put on a single database shard?

It depends on workload. As a rough guideline, a db.r6g.xlarge (4 vCPU, 32 GiB RAM) handling typical SaaS CRUD traffic can host 100-500 tenants per shard. Monitor per-shard CPU, IOPS, and connection count, and rebalance when any shard consistently exceeds 70% utilization on any of them.
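One way to make "consistently exceeds 70%" concrete is to require that every sample in a monitoring window is over the threshold on at least one resource. The sketch below assumes that reading; `ShardSample`, `NeedsRebalance`, and the window-based rule are illustrative names and choices, not part of any particular monitoring stack.

```go
package main

// ShardSample is a hypothetical point-in-time utilization reading for one
// shard. Each value is a fraction of capacity (0.0 to 1.0).
type ShardSample struct {
	CPU         float64
	IOPS        float64
	Connections float64
}

// peak returns the highest of the three utilization figures.
func (s ShardSample) peak() float64 {
	p := s.CPU
	if s.IOPS > p {
		p = s.IOPS
	}
	if s.Connections > p {
		p = s.Connections
	}
	return p
}

// NeedsRebalance reports whether every sample in the window exceeds the
// threshold on at least one resource. A single hot sample surrounded by
// quiet ones does not trigger a rebalance.
func NeedsRebalance(window []ShardSample, threshold float64) bool {
	if len(window) == 0 {
		return false
	}
	for _, s := range window {
		if s.peak() <= threshold {
			return false
		}
	}
	return true
}
```

Calling `NeedsRebalance(samples, 0.7)` over, say, the last hour of samples would match the 70% guideline; the window length is a tuning choice.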

How do you handle tenant data migration between shards?

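Use dual-write during migration: start writing to both the old and new shards, backfill the tenant's existing data into the new shard, switch reads to the new shard once the two copies match, then stop writing to the old shard. Done in this order, the migration requires no downtime. The migration tool should be idempotent and resumable so that a failure partway through can be retried safely.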
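The phase ordering can be sketched as a small router that decides where reads and writes go. Everything here is hypothetical (`Phase`, `Store`, `MigratingStore`, `MemStore`), and copying existing rows into the new shard is outside the sketch.

```go
package main

import "errors"

// Phase is a hypothetical migration state for one tenant.
type Phase int

const (
	PhaseOldOnly   Phase = iota // reads and writes go to the old shard
	PhaseDualWrite              // writes go to both shards, reads stay on old
	PhaseReadNew                // writes go to both shards, reads move to new
	PhaseNewOnly                // the old shard no longer receives traffic
)

// Store is a stand-in for a shard client.
type Store interface {
	Write(key, value string) error
	Read(key string) (string, error)
}

// MigratingStore routes one tenant's traffic according to its phase.
type MigratingStore struct {
	Old, New Store
	Phase    Phase
}

func (m *MigratingStore) Write(key, value string) error {
	switch m.Phase {
	case PhaseOldOnly:
		return m.Old.Write(key, value)
	case PhaseDualWrite, PhaseReadNew:
		// Both writes must succeed. Retrying after an error is safe only
		// if Write is idempotent.
		if err := m.Old.Write(key, value); err != nil {
			return err
		}
		return m.New.Write(key, value)
	default:
		return m.New.Write(key, value)
	}
}

func (m *MigratingStore) Read(key string) (string, error) {
	if m.Phase >= PhaseReadNew {
		return m.New.Read(key)
	}
	return m.Old.Read(key)
}

// ErrNotFound is returned by MemStore for missing keys.
var ErrNotFound = errors.New("not found")

// MemStore is an in-memory Store used only to exercise the sketch.
type MemStore map[string]string

func (s MemStore) Write(key, value string) error { s[key] = value; return nil }

func (s MemStore) Read(key string) (string, error) {
	v, ok := s[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}
```

Keeping the phase per tenant means tenants can be migrated one at a time, and a failed migration can be rolled back by moving the phase backwards while the old shard is still receiving writes.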

When should you move from shared infrastructure to dedicated per-tenant infrastructure?

When a tenant's willingness to pay justifies the operational overhead. A tenant paying $100/month doesn't justify a dedicated database ($50-200/month in infrastructure). A tenant paying $10,000/month easily justifies dedicated resources. The threshold is typically around $2,000-5,000 per month in recurring revenue per tenant.

How do you prevent one tenant from consuming all available connections?

Set per-tenant connection limits in the application layer. A connection pool per tenant (or tenant tier) with a maximum size ensures no single tenant can exhaust the database's connection capacity. PgBouncer or ProxySQL can enforce this at the proxy level for database-per-tenant setups.
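An application-layer limit can be as simple as a counting semaphore per tenant. The sketch below is one possible shape, assuming a non-blocking acquire that fails fast when the tenant is at its cap; `TenantConnLimiter`, `Acquire`, and `ErrTenantAtLimit` are made-up names.

```go
package main

import (
	"errors"
	"sync"
)

// ErrTenantAtLimit is returned when a tenant has no free connection slots.
var ErrTenantAtLimit = errors.New("tenant connection limit reached")

// TenantConnLimiter caps concurrent connections per tenant. Each tenant
// gets a buffered channel that acts as a counting semaphore.
type TenantConnLimiter struct {
	mu        sync.Mutex
	perTenant int
	slots     map[string]chan struct{}
}

// NewTenantConnLimiter allows up to perTenant concurrent connections for
// each tenant.
func NewTenantConnLimiter(perTenant int) *TenantConnLimiter {
	return &TenantConnLimiter{
		perTenant: perTenant,
		slots:     make(map[string]chan struct{}),
	}
}

// Acquire reserves a slot for tenantID without blocking. On success it
// returns a release function that the caller must invoke exactly once
// when the connection is returned.
func (l *TenantConnLimiter) Acquire(tenantID string) (release func(), err error) {
	l.mu.Lock()
	sem, ok := l.slots[tenantID]
	if !ok {
		sem = make(chan struct{}, l.perTenant)
		l.slots[tenantID] = sem
	}
	l.mu.Unlock()

	select {
	case sem <- struct{}{}:
		return func() { <-sem }, nil
	default:
		return nil, ErrTenantAtLimit
	}
}
```

A caller would wrap each database checkout in `Acquire` and `defer release()`. Failing fast rather than queueing is a design choice; a blocking variant with a timeout may suit some workloads better.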

Multi-Tenant Architecture Best Practices for High Scale Teams

High-scale multi-tenant systems serve thousands of tenants with varying workload patterns, data volumes, and performance requirements. At this scale, the architecture decisions around data partitioning, resource allocation, and tenant routing determine whether the platform remains economically viable.

Tenant-Aware Data Partitioning

Sharding by Tenant

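    import (
        "hash/crc32"
        "sync"
    )

    type ShardRouter struct {
        mu             sync.Mutex
        shards         []DatabaseShard
        tenantShardMap map[string]int
    }

    // GetShard returns the shard assigned to tenantID, assigning one on
    // first use. tenantShardMap must be initialized before the first call.
    func (r *ShardRouter) GetShard(tenantID string) *DatabaseShard {
        r.mu.Lock()
        defer r.mu.Unlock()

        if shardIdx, ok := r.tenantShardMap[tenantID]; ok {
            return &r.shards[shardIdx]
        }
        // Hash-modulo placement for new tenants. The assignment is stored
        // in the map, so adding shards later does not move existing tenants.
        hash := crc32.ChecksumIEEE([]byte(tenantID))
        shardIdx := int(hash % uint32(len(r.shards)))
        r.tenantShardMap[tenantID] = shardIdx
        return &r.shards[shardIdx]
    }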
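The router above places new tenants with a hash modulo the shard count and relies on the stored map to keep them in place. If placement needs to stay stable without a stored map as shards are added, a consistent-hash ring is the usual alternative. The following is a minimal sketch of that technique, not a replacement for the router; `Ring`, `NewRing`, and the virtual-node naming scheme are assumptions, and it expects at least one shard.

```go
package main

import (
	"hash/crc32"
	"sort"
	"strconv"
)

// Ring is a minimal consistent-hash ring. Each shard is placed at several
// positions (virtual nodes) so tenants spread more evenly.
type Ring struct {
	points []uint32       // sorted hash positions
	owner  map[uint32]int // hash position -> shard index
}

// NewRing builds a ring for shardCount shards with vnodes positions each.
// shardCount and vnodes are assumed to be at least 1.
func NewRing(shardCount, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]int)}
	for s := 0; s < shardCount; s++ {
		for v := 0; v < vnodes; v++ {
			label := "shard-" + strconv.Itoa(s) + "-" + strconv.Itoa(v)
			h := crc32.ChecksumIEEE([]byte(label))
			if _, taken := r.owner[h]; taken {
				continue // rare hash collision: the first placement wins
			}
			r.owner[h] = s
			r.points = append(r.points, h)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Shard returns the index of the shard that owns tenantID: the first ring
// position at or after the tenant's hash, wrapping to the start.
func (r *Ring) Shard(tenantID string) int {
	h := crc32.ChecksumIEEE([]byte(tenantID))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}
```

With this layout, growing from N to N+1 shards only moves tenants onto the new shard; tenants never shuffle between existing shards, which is the property plain modulo hashing lacks.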

Noisy Neighbor Detection

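    import (
        "sync"
        "sync/atomic"
    )

    // TenantMetrics tracks per-tenant request counts. Both maps must be
    // initialized before use.
    type TenantMetrics struct {
        mu            sync.RWMutex
        requestCounts map[string]*atomic.Int64
        resourceUsage map[string]*ResourceUsage
    }

    func (m *TenantMetrics) RecordRequest(tenantID string) {
        m.mu.RLock()
        counter, ok := m.requestCounts[tenantID]
        m.mu.RUnlock()

        if !ok {
            m.mu.Lock()
            // Re-check under the write lock: another goroutine may have
            // created the counter in the meantime.
            if counter, ok = m.requestCounts[tenantID]; !ok {
                counter = &atomic.Int64{}
                m.requestCounts[tenantID] = counter
            }
            m.mu.Unlock()
        }
        counter.Add(1)
    }

    // DetectNoisyNeighbors returns the tenants whose request count exceeds
    // threshold times the per-tenant average.
    func (m *TenantMetrics) DetectNoisyNeighbors(threshold float64) []string {
        var total int64
        counts := make(map[string]int64)
        m.mu.RLock()
        for tid, counter := range m.requestCounts {
            c := counter.Load()
            counts[tid] = c
            total += c
        }
        m.mu.RUnlock()

        if len(counts) == 0 {
            return nil
        }
        avg := float64(total) / float64(len(counts))
        var noisy []string
        for tid, count := range counts {
            if float64(count) > avg*threshold {
                noisy = append(noisy, tid)
            }
        }
        return noisy
    }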

Rate Limiting Per Tenant

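    import (
        "sync"

        "golang.org/x/time/rate"
    )

    type TenantRateLimiter struct {
        limiters     map[string]*rate.Limiter
        mu           sync.RWMutex
        defaultRate  rate.Limit
        defaultBurst int
    }

    // Allow reports whether tenantID may make a request now, creating a
    // limiter with the default rate and burst on the tenant's first request.
    func (trl *TenantRateLimiter) Allow(tenantID string) bool {
        trl.mu.RLock()
        limiter, exists := trl.limiters[tenantID]
        trl.mu.RUnlock()

        if !exists {
            trl.mu.Lock()
            // Re-check under the write lock so concurrent first requests
            // share one limiter instead of overwriting each other's.
            if limiter, exists = trl.limiters[tenantID]; !exists {
                limiter = rate.NewLimiter(trl.defaultRate, trl.defaultBurst)
                trl.limiters[tenantID] = limiter
            }
            trl.mu.Unlock()
        }

        return limiter.Allow()
    }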
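The limiter above applies one default rate to every tenant. To vary limits by plan, a lookup from tier to rate and burst is usually enough. The tier names and numbers in this sketch are placeholders chosen for illustration, not recommendations.

```go
package main

// Tier is a hypothetical pricing tier.
type Tier string

const (
	TierFree       Tier = "free"
	TierPro        Tier = "pro"
	TierEnterprise Tier = "enterprise"
)

// TierLimit holds a sustained request rate and a burst allowance.
type TierLimit struct {
	PerSecond float64
	Burst     int
}

// tierLimits maps each tier to its limit. The values are placeholders.
var tierLimits = map[Tier]TierLimit{
	TierFree:       {PerSecond: 10, Burst: 20},
	TierPro:        {PerSecond: 100, Burst: 200},
	TierEnterprise: {PerSecond: 1000, Burst: 2000},
}

// LimitFor returns the limit for a tier, falling back to the free tier
// when the tier is unknown so that a misconfigured tenant is never
// unlimited.
func LimitFor(t Tier) TierLimit {
	if l, ok := tierLimits[t]; ok {
		return l
	}
	return tierLimits[TierFree]
}
```

When a tenant's limiter is first created, the result could be passed to `rate.NewLimiter(rate.Limit(l.PerSecond), l.Burst)` in place of the defaults.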


Anti-Patterns to Avoid

Single-shard hot tenants. A tenant generating 100x average load on a single shard degrades performance for all co-located tenants. Implement automatic tenant migration between shards based on load metrics.

Global rate limits instead of per-tenant. A global rate limit of 10,000 req/s means one aggressive tenant can consume the entire allocation. Per-tenant limits ensure fair resource distribution.

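No tenant-level caching isolation. A single Redis instance shared across all tenants means one tenant's cache eviction pattern affects others. At minimum, prefix keys by tenant so tenants cannot collide. Prefixing alone does not stop one tenant's writes from evicting another's entries, so give high-value tenants dedicated cache instances.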
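Both levels of cache isolation, key prefixing on a shared instance and a dedicated instance for selected tenants, can be sketched as a small router. `CacheRouter` and the `t:<tenant>:<key>` format are assumptions, and the sketch assumes tenant IDs never contain a colon.

```go
package main

// CacheRouter picks a cache instance per tenant and namespaces keys.
// Addresses are opaque strings here; a real client would dial them.
type CacheRouter struct {
	SharedAddr string
	Dedicated  map[string]string // tenantID -> dedicated instance address
}

// Addr returns the tenant's dedicated instance if one is configured,
// otherwise the shared instance.
func (c CacheRouter) Addr(tenantID string) string {
	if addr, ok := c.Dedicated[tenantID]; ok {
		return addr
	}
	return c.SharedAddr
}

// Key prefixes key with the tenant ID so that two tenants using the same
// logical key never read or overwrite each other's entries. It assumes
// tenant IDs contain no ':' characters; otherwise keys could be ambiguous.
func (c CacheRouter) Key(tenantID, key string) string {
	return "t:" + tenantID + ":" + key
}
```

Prefixing keys even on dedicated instances keeps the key format uniform, which makes it easier to move a tenant between shared and dedicated caches later.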

Production Checklist

Tenant-aware sharding with consistent hashing
Per-tenant rate limiting with configurable tiers
Noisy neighbor detection and automated throttling
Tenant migration between shards without downtime
Per-tenant metrics and SLO tracking
Automated shard rebalancing
Cache isolation per tenant or tenant tier
Connection pool isolation per shard

Conclusion

High-scale multi-tenancy is a resource management problem. The platform must allocate compute, storage, network, and cache resources fairly across thousands of tenants while maintaining performance SLOs for each. The key mechanisms — sharding, per-tenant rate limiting, noisy neighbor detection, and tiered resource allocation — work together to prevent any single tenant from degrading the experience for others.

FAQ


Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
