
Distributed Caching Best Practices for High Scale Teams

Battle-tested best practices for distributed caching at high scale, including anti-patterns to avoid and a ready-to-use checklist.

Muneer Puthiya Purayil · 12 min read

High-scale distributed caching operates in a fundamentally different regime than enterprise caching. When your cache layer handles millions of operations per second, serves as the primary read path for user-facing requests, and fronts a database that cannot absorb the redirected load if the cache fails, every architectural decision carries a significant blast radius. These best practices come from operating Redis clusters handling 2M+ operations per second across hundreds of nodes.

The High-Scale Caching Challenge

At high scale, the cache is not optional infrastructure — it is the primary data serving layer. Your database exists as a persistence mechanism, not a query engine. A cold cache event (cluster restart, network partition) can generate a thundering herd that overwhelms the database within seconds. The cache must be engineered with the same rigor as the database itself.

Best Practices

1. Shard the Cache Layer with Consistent Hashing

A single Redis instance tops out at approximately 100K-200K operations per second. Beyond that, you need a sharded cache topology.

```go
import (
	"context"
	"time"

	"github.com/golang/groupcache/consistenthash"
	"github.com/redis/go-redis/v9"
)

type ShardedCache struct {
	ring  *consistenthash.Map
	pools map[string]*redis.Client
}

func NewShardedCache(nodes []string) *ShardedCache {
	// 150 virtual nodes per shard smooths key distribution across the ring.
	ring := consistenthash.New(150, nil)
	ring.Add(nodes...)

	pools := make(map[string]*redis.Client)
	for _, node := range nodes {
		pools[node] = redis.NewClient(&redis.Options{
			Addr:         node,
			PoolSize:     100,
			MinIdleConns: 20,
			ReadTimeout:  50 * time.Millisecond,
			WriteTimeout: 50 * time.Millisecond,
		})
	}

	return &ShardedCache{ring: ring, pools: pools}
}

func (c *ShardedCache) Get(ctx context.Context, key string) (string, error) {
	node := c.ring.Get(key)
	return c.pools[node].Get(ctx, key).Result()
}
```

Use Redis Cluster mode for automatic sharding, or implement application-level sharding with consistent hashing for finer control over data placement.

2. Implement Tiered TTL Strategy

Not all cached data has the same freshness requirements. Use tiered TTLs based on data volatility and business impact.

```go
type TTLTier int

const (
	HotData    TTLTier = iota // 30-60 seconds: user sessions, active shopping carts
	WarmData                  // 5-15 minutes: product details, user profiles
	ColdData                  // 1-24 hours: catalog data, configuration
	StaticData                // 24-168 hours: country codes, currency rates
)

func (t TTLTier) Duration() time.Duration {
	switch t {
	case HotData:
		return 30 * time.Second
	case WarmData:
		return 10 * time.Minute
	case ColdData:
		return 6 * time.Hour
	case StaticData:
		return 72 * time.Hour
	default:
		return 5 * time.Minute
	}
}

type TieredCache struct {
	client *redis.Client
}

func (c *TieredCache) Set(ctx context.Context, key string, value interface{}, tier TTLTier) error {
	data, err := json.Marshal(value)
	if err != nil {
		return err
	}
	// Up to 10% jitter prevents related keys from expiring in lockstep.
	jitter := time.Duration(rand.Intn(int(tier.Duration() / 10)))
	return c.client.Set(ctx, key, data, tier.Duration()+jitter).Err()
}
```

Adding random jitter (up to 10% of the TTL) prevents synchronized expiration of related keys, which would otherwise cause a cache stampede.

3. Build Probabilistic Early Expiration

Instead of waiting for keys to expire and causing cache misses, proactively refresh keys before they expire.

```go
type ProbabilisticCache struct {
	client *redis.Client
	delta  float64 // Controls refresh aggressiveness
}

type CachedEntry struct {
	Value     json.RawMessage `json:"v"`
	ExpiresAt int64           `json:"e"` // Unix timestamp
	TTL       int64           `json:"t"` // Original TTL in seconds
}

func (c *ProbabilisticCache) Get(ctx context.Context, key string, loader func() (interface{}, error)) (json.RawMessage, error) {
	raw, err := c.client.Get(ctx, key).Bytes()
	if err == redis.Nil {
		return c.loadAndCache(ctx, key, loader)
	}
	if err != nil {
		return nil, err
	}

	var entry CachedEntry
	if err := json.Unmarshal(raw, &entry); err != nil {
		// Corrupt entry: treat it as a miss and rebuild it.
		return c.loadAndCache(ctx, key, loader)
	}

	now := time.Now().Unix()
	timeRemaining := float64(entry.ExpiresAt - now)
	threshold := float64(entry.TTL) * c.delta

	// Probabilistically refresh before expiration: the closer the key is
	// to expiry, the more likely a request triggers a background reload.
	if timeRemaining < threshold && rand.Float64() < (1-timeRemaining/threshold) {
		// Detach from the request context so cancellation does not abort the refresh.
		go c.loadAndCache(context.Background(), key, loader)
	}

	return entry.Value, nil
}

func (c *ProbabilisticCache) loadAndCache(ctx context.Context, key string, loader func() (interface{}, error)) (json.RawMessage, error) {
	value, err := loader()
	if err != nil {
		return nil, err
	}

	data, err := json.Marshal(value)
	if err != nil {
		return nil, err
	}
	const ttl = 300 // seconds
	entry := CachedEntry{
		Value:     data,
		ExpiresAt: time.Now().Unix() + ttl,
		TTL:       ttl,
	}

	entryData, _ := json.Marshal(entry) // CachedEntry always marshals cleanly
	c.client.Set(ctx, key, entryData, ttl*time.Second)

	return data, nil
}
```

4. Use Pipeline and Batch Operations

At high QPS, per-key round trips to Redis become the bottleneck. Batch operations into pipelines.

```go
func (c *ShardedCache) GetMany(ctx context.Context, keys []string) (map[string]string, error) {
	// Group keys by shard so each node receives a single pipelined request.
	shardKeys := make(map[string][]string)
	for _, key := range keys {
		node := c.ring.Get(key)
		shardKeys[node] = append(shardKeys[node], key)
	}

	results := make(map[string]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for node, nodeKeys := range shardKeys {
		wg.Add(1)
		go func(n string, ks []string) {
			defer wg.Done()

			pipe := c.pools[n].Pipeline()
			cmds := make([]*redis.StringCmd, len(ks))

			for i, k := range ks {
				cmds[i] = pipe.Get(ctx, k)
			}

			// Exec returns redis.Nil when any key is missing; per-command
			// errors are inspected below, so misses are simply skipped.
			_, _ = pipe.Exec(ctx)

			mu.Lock()
			defer mu.Unlock()
			for i, cmd := range cmds {
				if val, err := cmd.Result(); err == nil {
					results[ks[i]] = val
				}
			}
		}(node, nodeKeys)
	}

	wg.Wait()
	return results, nil
}
```

5. Implement Cache Warming and Preloading

At high scale, a cold cache can take minutes to warm up organically, during which the database bears unsustainable load.

```go
type CacheEntry struct {
	Key   string
	Value string
}

type CacheWarmer struct {
	cache   *ShardedCache
	sources []WarmingSource
}

type WarmingSource struct {
	Name     string
	Loader   func(ctx context.Context, batchSize int, offset int) ([]CacheEntry, error)
	Priority int
	TTLTier  TTLTier
}

func (w *CacheWarmer) WarmAll(ctx context.Context) error {
	// Warm highest-priority sources first.
	sort.Slice(w.sources, func(i, j int) bool {
		return w.sources[i].Priority < w.sources[j].Priority
	})

	for _, source := range w.sources {
		log.Printf("Warming cache from source: %s", source.Name)
		offset := 0
		batchSize := 1000
		warmed := 0

		for {
			entries, err := source.Loader(ctx, batchSize, offset)
			if err != nil {
				log.Printf("Warning: failed to load batch from %s at offset %d: %v", source.Name, offset, err)
				break
			}
			if len(entries) == 0 {
				break
			}

			// Group writes by shard, then pipeline each shard's batch.
			byNode := make(map[string][]CacheEntry)
			for _, entry := range entries {
				node := w.cache.ring.Get(entry.Key)
				byNode[node] = append(byNode[node], entry)
			}
			for node, batch := range byNode {
				pipe := w.cache.pools[node].Pipeline()
				for _, entry := range batch {
					pipe.Set(ctx, entry.Key, entry.Value, source.TTLTier.Duration())
				}
				if _, err := pipe.Exec(ctx); err != nil {
					log.Printf("Warning: pipeline write to %s failed: %v", node, err)
				}
			}

			warmed += len(entries)
			offset += batchSize
		}

		log.Printf("Warmed %d entries from %s", warmed, source.Name)
	}

	return nil
}
```

6. Monitor the Four Golden Signals for Caching

```go
type CacheMetrics struct {
	hits           prometheus.Counter // hit/miss rates are derived from these
	misses         prometheus.Counter // counters via rate() in PromQL
	latencyHist    prometheus.Histogram
	evictions      prometheus.Counter
	memoryUsage    prometheus.Gauge
	connectionPool *prometheus.GaugeVec
}

func (m *CacheMetrics) RecordOperation(hit bool, latency time.Duration) {
	m.latencyHist.Observe(latency.Seconds())
	if hit {
		m.hits.Inc()
	} else {
		m.misses.Inc()
	}
}
```

Target metrics: hit rate > 95%, p99 latency < 5ms, memory usage < 80% of max, eviction rate near zero during normal operations.


Anti-Patterns

Storing Large Objects in Cache

Redis performance degrades significantly for values larger than 100KB. Compress large values or split them into smaller chunks. A 1MB cached value blocks other operations on the same Redis thread for several milliseconds.

Using KEYS Command in Production

The KEYS pattern matching command blocks Redis for the duration of the scan. Use SCAN with cursor-based iteration for pattern-based lookups in production.

Treating Cache as Durable Storage

If your application fails when cache is unavailable, you have built a cache dependency rather than a cache. Every code path must handle cache miss gracefully.

High-Scale Readiness Checklist

  • Cache sharded across multiple Redis nodes
  • Consistent hashing for shard routing
  • Tiered TTL strategy with jitter
  • Probabilistic early expiration for hot keys
  • Pipeline/batch operations for multi-key access
  • Cache warming procedure for cold start scenarios
  • Hit rate, latency, eviction, and memory monitoring
  • Circuit breaker on cache client for graceful degradation
  • Value size limits enforced (< 100KB per key)
  • SCAN used instead of KEYS for pattern operations
  • Load tested at 3x expected peak operations
  • Fallback to database verified under cache failure

Conclusion

High-scale distributed caching demands treating the cache as a primary data serving infrastructure rather than an optimization layer. Shard the cache for horizontal scaling, implement probabilistic early expiration to eliminate stampedes, batch operations for throughput, and warm the cache proactively to avoid cold-start degradation. Monitor hit rates per key pattern — a drop below 90% signals either incorrect TTLs or ineffective cache key design.

Muneer Puthiya Purayil

SaaS Architect & AI Systems Engineer. 10+ years shipping production infrastructure across fintech, automotive, e-commerce, and healthcare.
