System Design

Database Sharding Best Practices for Enterprise Teams

Battle-tested best practices for database sharding in enterprise teams, including anti-patterns to avoid and a ready-to-use checklist.

Muneer Puthiya Purayil 11 min read

Database sharding introduces operational complexity that enterprises must manage for years. The decision to shard is irreversible in practice — once data is distributed across shards, consolidating it requires a migration effort comparable to the original sharding project. These best practices help enterprise teams implement sharding that scales without creating an operational nightmare.

When Enterprises Actually Need Sharding

Sharding becomes necessary when vertical scaling reaches its limit and read replicas cannot absorb the write load. For most enterprise PostgreSQL deployments, that threshold is around 5-10TB of data with 50,000+ write transactions per second. Before sharding, exhaust these alternatives: connection pooling (PgBouncer), read replicas, table partitioning, query optimization, and caching layers.

The decision framework: if your database is write-bound (not read-bound), growing faster than vertical scaling can accommodate, and your data has a natural partition key, sharding is the right next step.

Best Practices

1. Choose the Shard Key Based on Query Patterns, Not Data Structure

The shard key determines query performance, data distribution, and operational complexity for the life of the system.

sql
-- Good: tenant ID as shard key for multi-tenant SaaS
-- Queries naturally scope to a single tenant
SELECT * FROM orders WHERE tenant_id = 'acme-corp' AND status = 'pending';
-- Routes to exactly one shard

-- Bad: order date as shard key
-- Range queries span all shards, and recent shards become hot
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days';
-- Must query all shards and merge results

For enterprise multi-tenant systems, tenant_id is almost always the correct shard key. It provides natural isolation and predictable routing, and it aligns with compliance requirements for data residency.

2. Implement a Routing Layer with Shard Map

Centralize shard routing in a dedicated service that maps shard keys to database connections.

typescript
import { Pool, QueryResult } from 'pg'; // assumes node-postgres

interface ShardConfig {
  shardId: string;
  host: string;
  port: number;
  database: string;
  weight: number; // For weighted routing during migration
}

class ShardRouter {
  private shardMap: Map<string, ShardConfig>;
  private pools: Map<string, Pool>;
  private consistentHash: ConsistentHashRing;

  constructor(shards: ShardConfig[]) {
    this.shardMap = new Map(shards.map(s => [s.shardId, s]));
    this.pools = new Map();
    this.consistentHash = new ConsistentHashRing(
      shards.map(s => s.shardId),
      150 // virtual nodes per shard
    );

    for (const shard of shards) {
      this.pools.set(shard.shardId, new Pool({
        host: shard.host,
        port: shard.port,
        database: shard.database,
        max: 20,
      }));
    }
  }

  getAllShardIds(): string[] {
    return Array.from(this.pools.keys());
  }

  // Route by shard key via consistent hashing
  getPool(shardKey: string): Pool {
    const shardId = this.consistentHash.getNode(shardKey);
    return this.getPoolById(shardId);
  }

  // Direct lookup when the shard is already known (monitoring, migrations)
  getPoolById(shardId: string): Pool {
    const pool = this.pools.get(shardId);
    if (!pool) throw new Error(`No pool for shard ${shardId}`);
    return pool;
  }

  async query(shardKey: string, sql: string, params: unknown[]): Promise<QueryResult> {
    const pool = this.getPool(shardKey);
    return pool.query(sql, params);
  }

  async scatter(sql: string, params: unknown[]): Promise<QueryResult[]> {
    const promises = Array.from(this.pools.values()).map(pool =>
      pool.query(sql, params)
    );
    return Promise.all(promises);
  }
}

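The ConsistentHashRing the router depends on is assumed rather than shown. A minimal sketch with virtual nodes follows; the md5-based hash and the linear scan are illustrative choices (a production ring would binary-search the sorted positions):

```typescript
import { createHash } from "node:crypto";

// Minimal consistent hash ring with virtual nodes. Each physical shard is
// placed on the ring many times so keys spread evenly and adding or
// removing a shard only remaps a fraction of keys.
class ConsistentHashRing {
  // Sorted positions on the ring, each mapping back to a physical shard id
  private ring: { position: number; nodeId: string }[] = [];

  constructor(nodeIds: string[], virtualNodes = 150) {
    for (const nodeId of nodeIds) {
      for (let v = 0; v < virtualNodes; v++) {
        this.ring.push({ position: this.hash(`${nodeId}#${v}`), nodeId });
      }
    }
    this.ring.sort((a, b) => a.position - b.position);
  }

  // First 8 hex chars of md5 interpreted as a 32-bit ring position
  private hash(key: string): number {
    return parseInt(createHash("md5").update(key).digest("hex").slice(0, 8), 16);
  }

  // Walk clockwise to the first virtual node at or after the key's position
  getNode(key: string): string {
    const pos = this.hash(key);
    for (const entry of this.ring) {
      if (entry.position >= pos) return entry.nodeId;
    }
    return this.ring[0].nodeId; // wrapped past the end of the ring
  }
}
```

The 150 virtual nodes per shard matches the constant passed by the router; higher counts smooth the distribution at the cost of a larger ring.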
3. Include the Shard Key in Every Table's Primary Key

Every table that participates in sharding must include the shard key in its primary key and all foreign key relationships. This ensures queries can be routed to a single shard.

sql
-- Every table includes tenant_id in the primary key
CREATE TABLE orders (
    tenant_id UUID NOT NULL,
    order_id UUID NOT NULL,
    customer_id UUID NOT NULL,
    status VARCHAR(50) NOT NULL,
    total_cents BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (tenant_id, order_id)
);

CREATE TABLE order_items (
    tenant_id UUID NOT NULL,
    order_id UUID NOT NULL,
    item_id UUID NOT NULL,
    product_id UUID NOT NULL,
    quantity INT NOT NULL,
    unit_price BIGINT NOT NULL,
    PRIMARY KEY (tenant_id, order_id, item_id),
    FOREIGN KEY (tenant_id, order_id) REFERENCES orders (tenant_id, order_id)
);

-- Indexes must also include tenant_id for efficient shard-local queries
CREATE INDEX idx_orders_customer ON orders (tenant_id, customer_id, created_at DESC);
CREATE INDEX idx_orders_status ON orders (tenant_id, status);

4. Plan for Cross-Shard Queries

Some queries inherently span shards. Design explicit scatter-gather patterns for these cases.

typescript
class CrossShardQueryExecutor {
  constructor(private router: ShardRouter) {}

  async aggregateAcrossShards<T>(
    sql: string,
    params: unknown[],
    merge: (results: T[][]) => T[]
  ): Promise<T[]> {
    const results = await this.router.scatter(sql, params);
    const typed = results.map(r => r.rows as T[]);
    return merge(typed);
  }

  async getGlobalOrderStats(): Promise<OrderStats> {
    const perShardStats = await this.aggregateAcrossShards<ShardStats>(
      `SELECT COUNT(*) as order_count, SUM(total_cents) as revenue
       FROM orders WHERE created_at > NOW() - INTERVAL '30 days'`,
      [],
      (results) => results.flat()
    );

    // node-postgres returns COUNT/SUM as strings, so coerce before summing
    const totalOrders = perShardStats.reduce((sum, s) => sum + Number(s.order_count), 0);
    const totalRevenue = perShardStats.reduce((sum, s) => sum + Number(s.revenue), 0);

    return {
      totalOrders,
      totalRevenue,
      // Derive the average from global totals; averaging per-shard averages
      // skews the result toward small shards
      avgOrderValue: totalOrders > 0 ? totalRevenue / totalOrders : 0,
    };
  }
}

5. Implement Shard-Aware Migrations

Schema changes must reach every shard consistently. A migration that fails on one shard while succeeding on others leaves the fleet in an inconsistent state, so validate everywhere first and roll back on partial failure.

typescript
class ShardMigrationRunner {
  constructor(
    private router: ShardRouter,
    private migrationStore: MigrationStore
  ) {}

  async migrate(migration: Migration): Promise<MigrationResult> {
    const shardIds = this.router.getAllShardIds();
    const results: Map<string, 'success' | 'failed'> = new Map();

    // Phase 1: Validate migration on all shards (dry run)
    for (const shardId of shardIds) {
      const isValid = await this.validateMigration(shardId, migration);
      if (!isValid) {
        return { status: 'aborted', reason: `Validation failed on ${shardId}` };
      }
    }

    // Phase 2: Apply migration shard by shard
    for (const shardId of shardIds) {
      try {
        await this.applyMigration(shardId, migration);
        results.set(shardId, 'success');
      } catch (error) {
        results.set(shardId, 'failed');
        // Roll back the shards that already succeeded
        for (const [completedShard, status] of results) {
          if (status === 'success') {
            await this.rollbackMigration(completedShard, migration);
          }
        }
        return { status: 'rolled_back', failedShard: shardId, error };
      }
    }

    await this.migrationStore.record(migration.id, 'completed');
    return { status: 'completed' };
  }
}

6. Monitor Shard Balance and Hotspots

Uneven data distribution leads to hotspot shards that degrade overall system performance.

typescript
class ShardMonitor {
  constructor(private router: ShardRouter) {}

  async getShardHealth(): Promise<ShardHealthReport[]> {
    const shardIds = this.router.getAllShardIds();
    const reports: ShardHealthReport[] = [];

    for (const shardId of shardIds) {
      // Look the pool up by id directly; hashing the id as if it were a
      // shard key would route to the wrong shard
      const pool = this.router.getPoolById(shardId);
      const [size, connections, slowQueries] = await Promise.all([
        pool.query('SELECT pg_database_size(current_database()) as size'),
        pool.query('SELECT count(*) as active FROM pg_stat_activity WHERE state = $1', ['active']),
        pool.query(`SELECT count(*) as slow FROM pg_stat_activity
                    WHERE state = 'active' AND NOW() - query_start > INTERVAL '5 seconds'`),
      ]);

      reports.push({
        shardId,
        sizeBytes: Number(size.rows[0].size),
        activeConnections: Number(connections.rows[0].active),
        slowQueries: Number(slowQueries.rows[0].slow),
      });
    }

    return reports;
  }

  detectHotspots(reports: ShardHealthReport[]): string[] {
    const avgSize = reports.reduce((s, r) => s + r.sizeBytes, 0) / reports.length;
    return reports
      .filter(r => r.sizeBytes > avgSize * 1.5 || r.activeConnections > 50)
      .map(r => r.shardId);
  }
}


Anti-Patterns to Avoid

Sharding Too Early

Premature sharding adds distributed systems complexity to a problem that PostgreSQL table partitioning could solve. A single well-tuned PostgreSQL instance handles more load than most applications will ever need.

Auto-Increment IDs Across Shards

Sequential IDs collide across shards. Use UUIDs (v7 for sortability) or a distributed ID generator (Snowflake-style) from the start.
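A Snowflake-style generator fits in a few lines. This is a sketch; the custom epoch, bit widths, and class name are illustrative assumptions, not a fixed standard:

```typescript
// Sketch of a Snowflake-style 64-bit ID: 41-bit millisecond timestamp,
// 10-bit worker id, 12-bit per-millisecond sequence. IDs are unique per
// worker and roughly time-sortable across the fleet.
const CUSTOM_EPOCH = 1700000000000n; // assumed custom epoch (ms since Unix epoch)

class SnowflakeGenerator {
  private lastMs = -1n;
  private sequence = 0n;

  constructor(private workerId: bigint) {
    if (workerId < 0n || workerId > 1023n) {
      throw new Error("workerId must fit in 10 bits");
    }
  }

  next(): bigint {
    let now = BigInt(Date.now());
    if (now === this.lastMs) {
      this.sequence = (this.sequence + 1n) & 4095n; // 12-bit sequence
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond; spin until the next tick
        while (BigInt(Date.now()) <= this.lastMs) {}
        now = BigInt(Date.now());
      }
    } else {
      this.sequence = 0n;
    }
    this.lastMs = now;
    return ((now - CUSTOM_EPOCH) << 22n) | (this.workerId << 12n) | this.sequence;
  }
}
```

Each shard-writing process gets a distinct worker id, so generators never coordinate at write time; the trade-off versus UUIDv7 is the need to assign those worker ids.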

Ignoring Cross-Shard Transaction Requirements

If your business logic requires atomic operations across entities on different shards, you need either saga patterns or to reconsider your shard key so those entities co-locate.
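A saga replaces the cross-shard transaction with shard-local steps plus compensating actions. A minimal executor sketch, with hypothetical step names:

```typescript
// Minimal saga executor: each step has a forward action and a compensation.
// If any step fails, the steps that already completed are compensated in
// reverse order instead of relying on a distributed transaction.
interface SagaStep<T> {
  name: string;
  action: () => Promise<T>;
  compensate: (result: T) => Promise<void>;
}

async function runSaga(steps: SagaStep<unknown>[]): Promise<"committed" | "compensated"> {
  const completed: { step: SagaStep<unknown>; result: unknown }[] = [];
  for (const step of steps) {
    try {
      completed.push({ step, result: await step.action() });
    } catch {
      // Undo in reverse order; real systems persist saga state so
      // compensation survives a coordinator crash
      for (const { step: done, result } of completed.reverse()) {
        await done.compensate(result);
      }
      return "compensated";
    }
  }
  return "committed";
}
```

Compensations must be idempotent, since a crashed coordinator may retry them; that operational burden is why co-locating the entities under one shard key is usually the better fix.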

Shard Key That Changes

If the shard key value can change (e.g., a user's region), every record must be moved between shards when it changes. Choose immutable shard keys.

Enterprise Readiness Checklist

  • Shard key chosen based on query pattern analysis
  • Shard key included in all primary keys and foreign keys
  • Routing layer implemented with consistent hashing
  • Cross-shard query patterns identified and scatter-gather implemented
  • Shard-aware migration tooling with rollback capability
  • Monitoring dashboards for per-shard metrics (size, connections, latency)
  • Hotspot detection and alerting configured
  • Data rebalancing runbook documented and tested
  • UUID or distributed ID generation for all primary keys
  • Backup and restore procedures tested per-shard and cross-shard
  • DR plan accounts for partial shard failures
  • Connection pooling configured per shard

Conclusion

Enterprise database sharding succeeds when the shard key aligns with both query patterns and business isolation requirements. For multi-tenant SaaS systems, tenant ID provides natural sharding that satisfies both performance and compliance needs. Invest heavily in the routing layer, shard-aware migration tooling, and monitoring before the first production shard split — these operational foundations determine whether sharding scales gracefully or becomes an ongoing source of incidents.
