
Distributed Caching Best Practices for High Scale Teams

Battle-tested best practices for distributed caching at high scale, including anti-patterns to avoid and a ready-to-use checklist.

Muneer Puthiya Purayil · 12 min read

High-scale distributed caching operates in a fundamentally different regime than enterprise caching. When your cache layer handles millions of operations per second, serves as the primary read path for user-facing requests, and fronts a database that cannot absorb the redirected load if the cache fails, every architectural decision carries a significant blast radius. These best practices come from operating Redis clusters handling 2M+ operations per second across hundreds of nodes.

The High-Scale Caching Challenge

At high scale, the cache is not optional infrastructure — it is the primary data serving layer. Your database exists as a persistence mechanism, not a query engine. A cold cache event (cluster restart, network partition) can generate a thundering herd that overwhelms the database within seconds. The cache must be engineered with the same rigor as the database itself.

Best Practices

1. Shard the Cache Layer with Consistent Hashing

A single Redis instance tops out at approximately 100K-200K operations per second. Beyond that, you need a sharded cache topology.

```go
import (
	"context"
	"time"

	"github.com/golang/groupcache/consistenthash"
	"github.com/redis/go-redis/v9"
)

type ShardedCache struct {
	ring  *consistenthash.Map
	pools map[string]*redis.Client
}

func NewShardedCache(nodes []string) *ShardedCache {
	// 150 virtual nodes per shard smooths key distribution across the ring.
	ring := consistenthash.New(150, nil)
	ring.Add(nodes...)

	pools := make(map[string]*redis.Client)
	for _, node := range nodes {
		pools[node] = redis.NewClient(&redis.Options{
			Addr:         node,
			PoolSize:     100,
			MinIdleConns: 20,
			ReadTimeout:  50 * time.Millisecond,
			WriteTimeout: 50 * time.Millisecond,
		})
	}

	return &ShardedCache{ring: ring, pools: pools}
}

func (c *ShardedCache) Get(ctx context.Context, key string) (string, error) {
	node := c.ring.Get(key)
	return c.pools[node].Get(ctx, key).Result()
}
```

Use Redis Cluster mode for automatic sharding, or implement application-level sharding with consistent hashing for finer control over data placement.

2. Implement Tiered TTL Strategy

Not all cached data has the same freshness requirements. Use tiered TTLs based on data volatility and business impact.

```go
type TTLTier int

const (
	HotData    TTLTier = iota // 30-60 seconds: user sessions, active shopping carts
	WarmData                  // 5-15 minutes: product details, user profiles
	ColdData                  // 1-24 hours: catalog data, configuration
	StaticData                // 24-168 hours: country codes, currency rates
)

func (t TTLTier) Duration() time.Duration {
	switch t {
	case HotData:
		return 30 * time.Second
	case WarmData:
		return 10 * time.Minute
	case ColdData:
		return 6 * time.Hour
	case StaticData:
		return 72 * time.Hour
	default:
		return 5 * time.Minute
	}
}

type TieredCache struct {
	client *redis.Client
}

func (c *TieredCache) Set(ctx context.Context, key string, value interface{}, tier TTLTier) error {
	data, err := json.Marshal(value)
	if err != nil {
		return err
	}
	// Up to 10% jitter prevents related keys from expiring in lockstep.
	jitter := time.Duration(rand.Intn(int(tier.Duration() / 10)))
	return c.client.Set(ctx, key, data, tier.Duration()+jitter).Err()
}
```

Adding random jitter (up to 10% of the TTL) prevents synchronized expiration of related keys, which would otherwise cause a cache stampede.

3. Build Probabilistic Early Expiration

Instead of waiting for keys to expire and causing cache misses, proactively refresh keys before they expire.

```go
type ProbabilisticCache struct {
	client *redis.Client
	delta  float64 // Controls refresh aggressiveness
}

type CachedEntry struct {
	Value     json.RawMessage `json:"v"`
	ExpiresAt int64           `json:"e"` // Unix timestamp
	TTL       int64           `json:"t"` // Original TTL in seconds
}

func (c *ProbabilisticCache) Get(ctx context.Context, key string, loader func() (interface{}, error)) (json.RawMessage, error) {
	raw, err := c.client.Get(ctx, key).Bytes()
	if err == redis.Nil {
		return c.loadAndCache(ctx, key, loader)
	}
	if err != nil {
		return nil, err
	}

	var entry CachedEntry
	if err := json.Unmarshal(raw, &entry); err != nil {
		// Corrupt entry: treat it as a miss and rebuild it.
		return c.loadAndCache(ctx, key, loader)
	}

	now := time.Now().Unix()
	timeRemaining := float64(entry.ExpiresAt - now)
	threshold := float64(entry.TTL) * c.delta

	// Probabilistically refresh before expiration: the closer the key is
	// to expiry, the more likely a request triggers a background reload.
	if timeRemaining < threshold && rand.Float64() < (1-timeRemaining/threshold) {
		// Detach from the request context so cancellation does not abort the refresh.
		go c.loadAndCache(context.Background(), key, loader)
	}

	return entry.Value, nil
}

func (c *ProbabilisticCache) loadAndCache(ctx context.Context, key string, loader func() (interface{}, error)) (json.RawMessage, error) {
	value, err := loader()
	if err != nil {
		return nil, err
	}

	data, err := json.Marshal(value)
	if err != nil {
		return nil, err
	}
	const ttl = 300 // seconds
	entry := CachedEntry{
		Value:     data,
		ExpiresAt: time.Now().Unix() + ttl,
		TTL:       ttl,
	}

	entryData, _ := json.Marshal(entry) // CachedEntry always marshals cleanly
	c.client.Set(ctx, key, entryData, ttl*time.Second)

	return data, nil
}
```

4. Use Pipeline and Batch Operations

At high QPS, per-key round trips to Redis become the bottleneck. Batch operations into pipelines.

```go
func (c *ShardedCache) GetMany(ctx context.Context, keys []string) (map[string]string, error) {
	// Group keys by shard so each node receives a single pipelined request.
	shardKeys := make(map[string][]string)
	for _, key := range keys {
		node := c.ring.Get(key)
		shardKeys[node] = append(shardKeys[node], key)
	}

	results := make(map[string]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for node, nodeKeys := range shardKeys {
		wg.Add(1)
		go func(n string, ks []string) {
			defer wg.Done()

			pipe := c.pools[n].Pipeline()
			cmds := make([]*redis.StringCmd, len(ks))

			for i, k := range ks {
				cmds[i] = pipe.Get(ctx, k)
			}

			// Exec returns redis.Nil when any key is missing; per-command
			// errors are inspected below, so misses are simply skipped.
			_, _ = pipe.Exec(ctx)

			mu.Lock()
			defer mu.Unlock()
			for i, cmd := range cmds {
				if val, err := cmd.Result(); err == nil {
					results[ks[i]] = val
				}
			}
		}(node, nodeKeys)
	}

	wg.Wait()
	return results, nil
}
```

5. Implement Cache Warming and Preloading

At high scale, a cold cache can take minutes to warm up organically, during which the database bears unsustainable load.

```go
type CacheEntry struct {
	Key   string
	Value string
}

type CacheWarmer struct {
	cache   *ShardedCache
	sources []WarmingSource
}

type WarmingSource struct {
	Name     string
	Loader   func(ctx context.Context, batchSize int, offset int) ([]CacheEntry, error)
	Priority int
	TTLTier  TTLTier
}

func (w *CacheWarmer) WarmAll(ctx context.Context) error {
	// Warm highest-priority sources first.
	sort.Slice(w.sources, func(i, j int) bool {
		return w.sources[i].Priority < w.sources[j].Priority
	})

	for _, source := range w.sources {
		log.Printf("Warming cache from source: %s", source.Name)
		offset := 0
		batchSize := 1000
		warmed := 0

		for {
			entries, err := source.Loader(ctx, batchSize, offset)
			if err != nil {
				log.Printf("Warning: failed to load batch from %s at offset %d: %v", source.Name, offset, err)
				break
			}
			if len(entries) == 0 {
				break
			}

			// Group writes by shard, then pipeline each shard's batch.
			byNode := make(map[string][]CacheEntry)
			for _, entry := range entries {
				node := w.cache.ring.Get(entry.Key)
				byNode[node] = append(byNode[node], entry)
			}
			for node, batch := range byNode {
				pipe := w.cache.pools[node].Pipeline()
				for _, entry := range batch {
					pipe.Set(ctx, entry.Key, entry.Value, source.TTLTier.Duration())
				}
				if _, err := pipe.Exec(ctx); err != nil {
					log.Printf("Warning: pipeline write to %s failed: %v", node, err)
				}
			}

			warmed += len(entries)
			offset += batchSize
		}

		log.Printf("Warmed %d entries from %s", warmed, source.Name)
	}

	return nil
}
```

6. Monitor the Four Golden Signals for Caching

```go
type CacheMetrics struct {
	hits           prometheus.Counter // hit/miss rates are derived from these
	misses         prometheus.Counter // counters via rate() in PromQL
	latencyHist    prometheus.Histogram
	evictions      prometheus.Counter
	memoryUsage    prometheus.Gauge
	connectionPool *prometheus.GaugeVec
}

func (m *CacheMetrics) RecordOperation(hit bool, latency time.Duration) {
	m.latencyHist.Observe(latency.Seconds())
	if hit {
		m.hits.Inc()
	} else {
		m.misses.Inc()
	}
}
```

Target metrics: hit rate > 95%, p99 latency < 5ms, memory usage < 80% of max, eviction rate near zero during normal operations.


Anti-Patterns

Storing Large Objects in Cache

Redis performance degrades significantly for values larger than 100KB. Compress large values or split them into smaller chunks. A 1MB cached value blocks other operations on the same Redis thread for several milliseconds.

Using KEYS Command in Production

The KEYS pattern matching command blocks Redis for the duration of the scan. Use SCAN with cursor-based iteration for pattern-based lookups in production.

Treating Cache as Durable Storage

If your application fails when cache is unavailable, you have built a cache dependency rather than a cache. Every code path must handle cache miss gracefully.

High-Scale Readiness Checklist

  • Cache sharded across multiple Redis nodes
  • Consistent hashing for shard routing
  • Tiered TTL strategy with jitter
  • Probabilistic early expiration for hot keys
  • Pipeline/batch operations for multi-key access
  • Cache warming procedure for cold start scenarios
  • Hit rate, latency, eviction, and memory monitoring
  • Circuit breaker on cache client for graceful degradation
  • Value size limits enforced (< 100KB per key)
  • SCAN used instead of KEYS for pattern operations
  • Load tested at 3x expected peak operations
  • Fallback to database verified under cache failure

Conclusion

High-scale distributed caching demands treating the cache as a primary data serving infrastructure rather than an optimization layer. Shard the cache for horizontal scaling, implement probabilistic early expiration to eliminate stampedes, batch operations for throughput, and warm the cache proactively to avoid cold-start degradation. Monitor hit rates per key pattern — a drop below 90% signals either incorrect TTLs or ineffective cache key design.

Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
