System Design

Complete Guide to Event-Driven Architecture with Go

A comprehensive guide to implementing Event-Driven Architecture using Go, covering architecture, code examples, and production-ready patterns.

Muneer Puthiya Purayil · 17 min read

Event-driven architecture fundamentally changes how Go services communicate. Instead of synchronous HTTP calls creating tight coupling between services, events flow through message brokers, allowing each service to evolve independently. This guide covers everything from foundational patterns to production-hardened implementations.

Core Concepts

Event-driven architecture (EDA) decomposes systems into producers that emit events and consumers that react to them. In Go, this maps naturally to goroutines and channels at the local level, and to Kafka or NATS at the distributed level.

Three primary patterns emerge:

  1. Event Notification — a service emits a thin event signaling something happened, and consumers fetch additional data as needed
  2. Event-Carried State Transfer — events contain the full state change, eliminating the need for consumers to call back
  3. Event Sourcing — the event log is the source of truth, with current state derived by replaying events
```go
// Event Notification — thin event
type OrderCreated struct {
	EventID   string    `json:"event_id"`
	OrderID   string    `json:"order_id"`
	Timestamp time.Time `json:"timestamp"`
}

// Event-Carried State Transfer — full payload
type OrderCreatedFull struct {
	EventID    string          `json:"event_id"`
	OrderID    string          `json:"order_id"`
	CustomerID string          `json:"customer_id"`
	Items      []OrderItem     `json:"items"`
	Total      decimal.Decimal `json:"total"`
	Currency   string          `json:"currency"`
	Timestamp  time.Time       `json:"timestamp"`
}
```
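The first two patterns are illustrated above; event sourcing has no snippet. Here is a minimal, dependency-free sketch of deriving current state by replaying an event log. All type and field names in it are illustrative, not part of the code elsewhere in this guide:

```go
package main

import "fmt"

// Event is the marker interface for everything in the log.
type Event interface{ isEvent() }

type OrderPlaced struct {
	OrderID string
	Total   int
}
type OrderPaid struct{ OrderID string }
type OrderCancelled struct{ OrderID string }

func (OrderPlaced) isEvent()    {}
func (OrderPaid) isEvent()      {}
func (OrderCancelled) isEvent() {}

// OrderState is never stored directly; it is derived on demand.
type OrderState struct {
	OrderID string
	Total   int
	Status  string
}

// Replay folds over the ordered event log to rebuild current state.
// The log, not the state, is the source of truth.
func Replay(events []Event) OrderState {
	var s OrderState
	for _, e := range events {
		switch ev := e.(type) {
		case OrderPlaced:
			s.OrderID, s.Total, s.Status = ev.OrderID, ev.Total, "placed"
		case OrderPaid:
			s.Status = "paid"
		case OrderCancelled:
			s.Status = "cancelled"
		}
	}
	return s
}

func main() {
	log := []Event{OrderPlaced{"o-1", 4200}, OrderPaid{"o-1"}}
	fmt.Println(Replay(log).Status) // paid
}
```

In a real system the log would live in an append-only store (a Kafka compacted topic or an events table), and snapshots would bound replay time.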

Setting Up Kafka with Go

The segmentio/kafka-go library provides the most idiomatic Go experience. Unlike confluent-kafka-go which wraps librdkafka via CGO, kafka-go is a pure Go implementation — no C dependencies, simpler cross-compilation, and easier debugging.

```go
package kafka

import (
	"context"
	"encoding/json"
	"fmt"
	"strings"
	"time"

	"github.com/segmentio/kafka-go"
)

type Producer struct {
	writer *kafka.Writer
}

func NewProducer(brokers []string) *Producer {
	return &Producer{
		writer: &kafka.Writer{
			Addr:         kafka.TCP(brokers...),
			Balancer:     &kafka.LeastBytes{},
			BatchSize:    100,
			BatchTimeout: 10 * time.Millisecond,
			RequiredAcks: kafka.RequireAll,
			Async:        false,
		},
	}
}

func (p *Producer) Publish(ctx context.Context, topic string, key string, event interface{}) error {
	payload, err := json.Marshal(event)
	if err != nil {
		return fmt.Errorf("marshal event: %w", err)
	}

	// %T yields a package-qualified name like "events.OrderCreated";
	// strip the qualifier so the header matches the bare names that
	// consumers register handlers under.
	eventType := fmt.Sprintf("%T", event)
	if i := strings.LastIndex(eventType, "."); i >= 0 {
		eventType = eventType[i+1:]
	}

	return p.writer.WriteMessages(ctx, kafka.Message{
		Topic: topic,
		Key:   []byte(key),
		Value: payload,
		Headers: []kafka.Header{
			{Key: "event_type", Value: []byte(eventType)},
			{Key: "produced_at", Value: []byte(time.Now().UTC().Format(time.RFC3339Nano))},
		},
	})
}

func (p *Producer) Close() error {
	return p.writer.Close()
}
```

Building a Consumer Group

Consumer groups distribute partitions across instances for horizontal scaling. Here's a production-ready consumer implementation:

```go
package kafka

import (
	"context"
	"log/slog"
	"time"

	"github.com/segmentio/kafka-go"
)

type Handler func(ctx context.Context, msg kafka.Message) error

type Consumer struct {
	reader  *kafka.Reader
	handler Handler
	logger  *slog.Logger
}

func NewConsumer(brokers []string, groupID, topic string, handler Handler, logger *slog.Logger) *Consumer {
	return &Consumer{
		reader: kafka.NewReader(kafka.ReaderConfig{
			Brokers:        brokers,
			GroupID:        groupID,
			Topic:          topic,
			MinBytes:       1e3,  // 1 KB
			MaxBytes:       10e6, // 10 MB
			MaxWait:        250 * time.Millisecond,
			CommitInterval: 0, // synchronous, explicit commits
			StartOffset:    kafka.LastOffset,
		}),
		handler: handler,
		logger:  logger,
	}
}

func (c *Consumer) Run(ctx context.Context) error {
	c.logger.Info("consumer started", "topic", c.reader.Config().Topic, "group", c.reader.Config().GroupID)

	for {
		select {
		case <-ctx.Done():
			c.logger.Info("consumer shutting down")
			return c.reader.Close()
		default:
		}

		msg, err := c.reader.FetchMessage(ctx)
		if err != nil {
			if ctx.Err() != nil {
				return nil
			}
			c.logger.Error("fetch failed", "error", err)
			continue
		}

		if err := c.handler(ctx, msg); err != nil {
			c.logger.Error("handler failed",
				"topic", msg.Topic,
				"partition", msg.Partition,
				"offset", msg.Offset,
				"error", err,
			)
			// Push the failed message to a dead-letter topic so the
			// partition is not blocked; sendToDLQ is not shown here.
			if dlqErr := c.sendToDLQ(ctx, msg, err); dlqErr != nil {
				c.logger.Error("DLQ publish failed", "error", dlqErr)
			}
		}

		if err := c.reader.CommitMessages(ctx, msg); err != nil {
			c.logger.Error("commit failed", "error", err)
		}
	}
}
```

Event Routing and Dispatch

A clean event router decouples message consumption from business logic:

```go
type EventRouter struct {
	handlers map[string]Handler
	fallback Handler
	logger   *slog.Logger
}

func NewEventRouter(logger *slog.Logger) *EventRouter {
	return &EventRouter{
		handlers: make(map[string]Handler),
		logger:   logger,
	}
}

func (r *EventRouter) Register(eventType string, handler Handler) {
	r.handlers[eventType] = handler
}

func (r *EventRouter) SetFallback(handler Handler) {
	r.fallback = handler
}

func (r *EventRouter) Dispatch(ctx context.Context, msg kafka.Message) error {
	eventType := ""
	for _, h := range msg.Headers {
		if h.Key == "event_type" {
			eventType = string(h.Value)
			break
		}
	}

	handler, ok := r.handlers[eventType]
	if !ok {
		if r.fallback != nil {
			return r.fallback(ctx, msg)
		}
		r.logger.Warn("no handler registered", "event_type", eventType)
		return nil
	}

	return handler(ctx, msg)
}
```

Usage ties everything together:

```go
package main

import (
	"context"
	"log"
	"log/slog"
	"os/signal"
	"syscall"
)

func main() {
	logger := slog.Default()
	router := NewEventRouter(logger)

	router.Register("OrderCreated", handleOrderCreated)
	router.Register("OrderShipped", handleOrderShipped)
	router.Register("OrderCancelled", handleOrderCancelled)

	consumer := NewConsumer(
		[]string{"kafka-1:9092", "kafka-2:9092"},
		"order-processor",
		"order-events",
		router.Dispatch,
		logger,
	)

	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	if err := consumer.Run(ctx); err != nil {
		log.Fatal(err)
	}
}
```


Exactly-Once Processing with the Outbox Pattern

Dual writes — updating a database and publishing an event — create consistency risks. The transactional outbox pattern solves this by writing events to a database table within the same transaction as the state change, then polling that table (or using CDC) to publish them to Kafka. Strictly speaking, this yields at-least-once delivery; "exactly-once" processing additionally requires consumers to deduplicate by event ID.

```go
type OutboxEntry struct {
	ID          string     `db:"id"`
	AggregateID string     `db:"aggregate_id"`
	EventType   string     `db:"event_type"`
	Payload     []byte     `db:"payload"`
	CreatedAt   time.Time  `db:"created_at"`
	PublishedAt *time.Time `db:"published_at"`
}

func (s *OrderService) CreateOrder(ctx context.Context, cmd CreateOrderCommand) error {
	tx, err := s.db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("begin tx: %w", err)
	}
	defer tx.Rollback()

	order := Order{
		ID:         uuid.New().String(),
		CustomerID: cmd.CustomerID,
		Items:      cmd.Items,
		Status:     "created",
	}

	if _, err := tx.ExecContext(ctx,
		"INSERT INTO orders (id, customer_id, items, status) VALUES ($1, $2, $3, $4)",
		order.ID, order.CustomerID, order.Items, order.Status,
	); err != nil {
		return fmt.Errorf("insert order: %w", err)
	}

	event := OrderCreatedFull{
		EventID:    uuid.New().String(),
		OrderID:    order.ID,
		CustomerID: order.CustomerID,
		Items:      order.Items,
		Timestamp:  time.Now().UTC(),
	}
	payload, err := json.Marshal(event)
	if err != nil {
		return fmt.Errorf("marshal event: %w", err)
	}

	if _, err := tx.ExecContext(ctx,
		"INSERT INTO outbox (id, aggregate_id, event_type, payload) VALUES ($1, $2, $3, $4)",
		event.EventID, order.ID, "OrderCreated", payload,
	); err != nil {
		return fmt.Errorf("insert outbox: %w", err)
	}

	return tx.Commit()
}
```

The outbox poller runs as a separate goroutine:

```go
func (p *OutboxPoller) Poll(ctx context.Context) error {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			entries, err := p.fetchUnpublished(ctx, 100)
			if err != nil {
				p.logger.Error("fetch outbox entries failed", "error", err)
				continue
			}

			for _, entry := range entries {
				// Wrap the stored JSON in json.RawMessage so Publish
				// re-emits it verbatim; a bare []byte would be
				// base64-encoded by json.Marshal. Note entry.EventType
				// is not propagated here; a raw-publish variant that
				// sets the event_type header explicitly is a natural
				// extension (not shown).
				if err := p.producer.Publish(ctx, p.topic, entry.AggregateID, json.RawMessage(entry.Payload)); err != nil {
					p.logger.Error("publish failed", "entry_id", entry.ID, "error", err)
					break // preserve order: retry from this entry next tick
				}
				if err := p.markPublished(ctx, entry.ID); err != nil {
					p.logger.Error("mark published failed", "entry_id", entry.ID, "error", err)
				}
			}
		}
	}
}
```

Observability and Metrics

Production event systems need deep observability. Use OpenTelemetry for traces that span producer-to-consumer:

```go
import (
	"context"
	"fmt"

	"github.com/segmentio/kafka-go"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("event-processor")

func (c *Consumer) tracedHandler(ctx context.Context, msg kafka.Message) error {
	// Extract trace context from message headers.
	// NewKafkaHeaderCarrier (not shown) adapts []kafka.Header to
	// OpenTelemetry's propagation.TextMapCarrier interface.
	carrier := NewKafkaHeaderCarrier(msg.Headers)
	ctx = otel.GetTextMapPropagator().Extract(ctx, carrier)

	ctx, span := tracer.Start(ctx, fmt.Sprintf("process %s", msg.Topic),
		trace.WithAttributes(
			attribute.String("messaging.system", "kafka"),
			attribute.String("messaging.destination", msg.Topic),
			attribute.Int("messaging.kafka.partition", msg.Partition),
			attribute.Int64("messaging.kafka.offset", msg.Offset),
		),
	)
	defer span.End()

	return c.handler(ctx, msg)
}
```

Combine tracing with Prometheus metrics for consumer lag, processing latency, and error rates:

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	eventsProcessed = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "events_processed_total",
		Help: "Total events processed by type and status",
	}, []string{"event_type", "status"})

	processingDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "event_processing_duration_seconds",
		Help:    "Event processing duration",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 12),
	}, []string{"event_type"})
)
```

Graceful Shutdown and Backpressure

Graceful shutdown prevents message loss during deployments. Backpressure prevents a slow consumer from unbounded memory growth:

```go
func (c *Consumer) RunWithBackpressure(ctx context.Context, maxInFlight int) error {
	sem := make(chan struct{}, maxInFlight)

	for {
		select {
		case <-ctx.Done():
			// Drain the semaphore: once we hold every slot, all
			// in-flight handlers have finished.
			for i := 0; i < maxInFlight; i++ {
				sem <- struct{}{}
			}
			return c.reader.Close()
		case sem <- struct{}{}:
		}

		msg, err := c.reader.FetchMessage(ctx)
		if err != nil {
			<-sem
			if ctx.Err() != nil {
				return nil
			}
			continue
		}

		go func(m kafka.Message) {
			defer func() { <-sem }()
			if err := c.handler(ctx, m); err != nil {
				c.logger.Error("handler error", "error", err)
			}
			// Concurrent handlers can commit offsets out of order,
			// trading strict ordering for throughput; after a crash,
			// some messages may be redelivered (at-least-once).
			if err := c.reader.CommitMessages(ctx, m); err != nil {
				c.logger.Error("commit failed", "error", err)
			}
		}(msg)
	}
}
```

Conclusion

Building event-driven architecture in Go rewards you with a system that's straightforward to reason about, easy to operate, and fast enough for all but the most extreme throughput requirements. The combination of goroutines for concurrency, channels for internal coordination, and libraries like kafka-go for external messaging creates an ecosystem where the hard parts of distributed systems — exactly-once processing, consumer group management, graceful shutdown — can be handled with clear, idiomatic code.

The patterns covered here — outbox for consistency, event routing for clean dispatch, OpenTelemetry for observability — compose together naturally. Start with the simplest implementation that meets your requirements, instrument everything from day one, and evolve toward more sophisticated patterns like event sourcing only when the business domain demands it.
