Practical, production-ready rules for building resilient, observable microservices that communicate over REST, gRPC and message brokers.
Your distributed system shouldn't feel like debugging a maze blindfolded. These production-battle-tested Cursor Rules eliminate the guesswork from microservices communication, transforming chaotic service interactions into predictable, observable workflows.
You know the drill: services that worked perfectly in isolation start failing when they talk to each other. Timeouts cascade into system-wide outages. Error messages disappear into the void. Your observability stack shows you that something broke, but not why or where.
The real problems aren't just technical debt; they're productivity killers that turn simple feature requests into multi-week debugging expeditions.
These Cursor Rules codify battle-tested patterns from high-scale production systems. Instead of reinventing communication protocols every sprint, you get consistent, secure, observable service interactions by default.
What you get immediately is an end to decision fatigue: when your team needs to add a new service communication pattern, the implementation path is already defined and tested.
```go
// What you're probably writing now
func GetUserOrders(userID string) ([]Order, error) {
	resp, err := http.Get("http://orders-service:8080/orders?user=" + userID)
	if err != nil {
		log.Printf("Failed to get orders: %v", err) // Where's the trace ID?
		return nil, err
	}
	defer resp.Body.Close()

	// Hope the service is healthy, handle retries manually, and parse by hand...
	var orders []Order
	if err := json.NewDecoder(resp.Body).Decode(&orders); err != nil {
		return nil, err
	}
	return orders, nil
}
```
```go
// Generated automatically with these rules
func (c *Client) GetUserOrders(ctx context.Context, userID string) (*pb.GetUserOrdersResponse, error) {
	// Context propagation, timeouts, and auth handled automatically
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	// Circuit breaker and retries built in
	return c.ordersClient.GetUserOrders(ctx, &pb.GetUserOrdersRequest{
		UserId: userID,
	})
}
```
Specific productivity gains:
Before: Custom event publishing logic in every service
```go
// Manual Kafka setup in every service
producer, _ := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers": brokers})
msg := &kafka.Message{
	TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
	Value:          payload,
	// No trace ID propagation
	// No schema versioning
	// No delivery error handling
}
producer.Produce(msg, nil) // fire and forget, errors ignored
```
After: Event publishing with built-in observability
```go
// Auto-generated from the rules
func PublishOrderCreated(ctx context.Context, evt *pb.OrderCreated) error {
	// Trace ID, schema versioning, and retries handled automatically
	return publisher.Publish(ctx, "billing.order-created.v1", evt)
}
```
Real workflow improvements come from prompting Cursor with requests like these:
```text
# REST API with OpenAPI documentation
"Generate a REST handler for /v1/orders with ETag caching"

# gRPC service with interceptors
"Create a gRPC server with auth and tracing middleware"

# Kafka producer with dead letter queue
"Generate a Kafka publisher for order events with a DLQ"
```
The rules include complete Istio/Linkerd configurations, so your service mesh policies generate automatically:
```yaml
# No manual YAML writing required
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders-svc
spec:
  host: orders
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3
# Retries (attempts: 3, perTryTimeout: 200ms) are generated in the matching VirtualService
```
The payoff shows up across development velocity, system reliability, and team productivity.
These rules handle the complexity your team will face as you scale:
Multi-Protocol Support: Your services can expose gRPC internally and REST at the edge without duplicate implementation logic.
Event Sourcing Integration: Message broker patterns support event replay and aggregate reconstruction without custom infrastructure code.
Security Boundaries: JWT validation, mTLS enforcement, and RBAC policies embed directly in your service code, not scattered across infrastructure configs.
Performance Under Load: Automatic bulkhead isolation, connection pooling, and load balancing handle traffic spikes without manual intervention.
Start with these rules, and your microservices communication becomes predictable, debuggable, and production-ready by default. No more guessing at best practices—implement patterns proven at scale.
You are an expert in:
- Go 1.22+
- gRPC & Protocol Buffers
- REST/JSON & OpenAPI 3
- Apache Kafka, RabbitMQ, AWS SQS/EventBridge
- Istio, Linkerd, Consul Connect (Service Mesh)
- Kubernetes & Docker
- OpenTelemetry, Prometheus, Grafana
Key Principles
- Model small, autonomous services with well-defined boundaries and APIs.
- Prefer asynchronous, event-driven flows to improve resilience and scale; fall back to synchronous calls only when absolutely required.
- Version every public contract (REST path prefix, gRPC package, topic name) using semantic versioning.
- Design messages to be immutable and idempotent; consumers must handle duplicates.
- Propagate context (trace id, auth claims, deadlines) across every hop.
- Secure by default: mutual-TLS between services, JWT/OAuth2 for users, least-privilege RBAC.
- Fail fast, observe everything, and automate recovery.
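As one concrete illustration of the idempotency principle above, here is a minimal sketch of a consumer that deduplicates on an event ID before applying side effects; the `Store` interface and the `Event` shape are assumptions, not part of the rules.

```go
package consumer

import (
	"context"
	"fmt"
)

// Store is a hypothetical idempotency store (e.g. backed by Redis or Postgres).
type Store interface {
	// MarkProcessed returns false if eventID was already recorded.
	MarkProcessed(ctx context.Context, eventID string) (bool, error)
}

// Event is an assumed minimal envelope; real payloads are Protocol Buffers.
type Event struct {
	ID      string
	Payload []byte
}

// Handle tolerates duplicate deliveries: duplicates are acknowledged by the
// caller but never re-applied. In production, the mark and the side effect
// should share one transaction (outbox/inbox pattern).
func Handle(ctx context.Context, store Store, evt Event, apply func(context.Context, Event) error) error {
	first, err := store.MarkProcessed(ctx, evt.ID)
	if err != nil {
		return fmt.Errorf("idempotency check for %s: %w", evt.ID, err)
	}
	if !first {
		return nil // duplicate delivery: safe to ack without reprocessing
	}
	return apply(ctx, evt)
}
```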
Go (language rules)
- Every exported boundary receives a `context.Context` as its first arg and checks `ctx.Err()` early.
- Return `(T, error)` or `error` only; never panic across service boundaries.
- Wrap errors with `%w` and expose sentinel/typed errors that satisfy `errors.Is/As`.
- Use interfaces for ports (inbound/outbound) and structs for adapters (Hexagonal/DDD style).
- Keep packages under 500 lines; directory names use `kebab-case`; package names use `lowercase` without underscores.
- Concurrency: prefer worker pools or `errgroup.Group`; no unbounded goroutines.
- Embed `validate:"..."` tags and run validation at DTO boundaries using `go-playground/validator`.
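A compact sketch tying the Go rules above together (context-first signature, early `ctx.Err()` guard, validated DTO, wrapped errors, port/adapter split); the `CreateOrderRequest` type and the `Repository` port are illustrative assumptions.

```go
package orders

import (
	"context"
	"fmt"

	"github.com/go-playground/validator/v10"
)

var validate = validator.New()

// CreateOrderRequest is an inbound DTO; validation runs at the boundary.
type CreateOrderRequest struct {
	CustomerID  string `validate:"required,uuid4"`
	AmountCents int64  `validate:"gt=0"`
}

// Repository is an outbound port; the adapter (Postgres, etc.) lives elsewhere.
type Repository interface {
	Save(ctx context.Context, req CreateOrderRequest) (string, error)
}

// CreateOrder is an exported boundary: context first, early guard, (T, error) return.
func CreateOrder(ctx context.Context, repo Repository, req CreateOrderRequest) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", err // caller's deadline already expired or call was cancelled
	}
	if err := validate.Struct(req); err != nil {
		return "", fmt.Errorf("validate CreateOrderRequest: %w", err)
	}
	id, err := repo.Save(ctx, req)
	if err != nil {
		return "", fmt.Errorf("save order: %w", err)
	}
	return id, nil
}
```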
Error Handling and Validation
- First lines of a function guard against invalid input and transient errors (early-return pattern).
- Retry only idempotent operations and use exponential back-off + jitter (`time.Sleep(backoff + randJitter)`).
- Circuit-break remote calls with `sony/gobreaker` (or mesh policies) to prevent cascading failure.
- Standardise error surfaces:
• gRPC → `status.Errorf(code, msg)`
• REST → JSON body `{code, message, trace_id}` with proper HTTP status.
  • Async → Dead-letter topic/queue with envelope `{event, error, retries}`.
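A sketch of the retry-plus-breaker guidance above for an idempotent call, assuming the classic non-generic `sony/gobreaker` API; the attempt limit, breaker name, and initial backoff are placeholders.

```go
package remote

import (
	"context"
	"errors"
	"math/rand"
	"time"

	"github.com/sony/gobreaker"
)

var breaker = gobreaker.NewCircuitBreaker(gobreaker.Settings{Name: "orders-svc"})

// CallIdempotent retries an idempotent operation with exponential back-off plus
// jitter, behind a circuit breaker that trips on repeated failures.
func CallIdempotent(ctx context.Context, op func(context.Context) error) error {
	backoff := 100 * time.Millisecond
	const maxAttempts = 3

	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		_, err := breaker.Execute(func() (interface{}, error) {
			return nil, op(ctx)
		})
		if err == nil {
			return nil
		}
		if errors.Is(err, gobreaker.ErrOpenState) {
			return err // breaker is open: fail fast instead of piling on retries
		}
		lastErr = err

		// Exponential back-off with jitter; stop early if the caller gave up.
		jitter := time.Duration(rand.Int63n(int64(backoff / 2)))
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff + jitter):
		}
		backoff *= 2
	}
	return lastErr
}
```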
Framework-Specific Rules
gRPC
- Define each API in a separate `.proto` package versioned as `package order.v1;`.
- Use deadline propagation: `ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)`.
- Add unary & stream interceptors for auth, logging, metrics, tracing.
- Map gRPC status ↔ HTTP using grpc-gateway only at the edge; keep pure gRPC internally.
- Enable gzip or snappy compression for large payloads.
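A sketch of a gRPC server wired with chained unary interceptors per the rules above; the interceptor bodies are placeholders, and service registration is left as a comment because the generated package name depends on your `.proto` layout.

```go
package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// deadlineGuard rejects calls that arrive without a deadline, so every hop
// participates in deadline propagation.
func deadlineGuard(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	if _, ok := ctx.Deadline(); !ok {
		return nil, status.Error(codes.InvalidArgument, "missing deadline")
	}
	return handler(ctx, req)
}

// logging measures duration and logs the method with the outcome.
func logging(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	start := time.Now()
	resp, err := handler(ctx, req)
	log.Printf("method=%s duration=%s err=%v", info.FullMethod, time.Since(start), err)
	return resp, err
}

func main() {
	lis, err := net.Listen("tcp", ":8081")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer(grpc.ChainUnaryInterceptor(deadlineGuard, logging))
	// pb.RegisterOrderServiceServer(srv, &orderServer{}) // generated registration goes here
	log.Fatal(srv.Serve(lis))
}
```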
REST / HTTP APIs
- Path style: `/v1/orders/{orderId}`; nouns only; plural resources.
- Accept and return `application/json; charset=utf-8` by default.
- Document with OpenAPI 3; generate server stubs & clients (`oapi-codegen`).
- Support ETag for caching; require `If-Match` for destructive actions to enforce optimistic locking.
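A stdlib-only sketch of the ETag/If-Match rule: GET returns an ETag, and destructive updates must present a matching `If-Match` or receive 412. The in-memory resource and version counter are assumptions standing in for real storage.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
)

// order is a hypothetical resource; version backs the ETag.
var (
	mu      sync.Mutex
	order   = []byte(`{"id":"42","status":"pending"}`)
	version = 1
)

func etag() string { return fmt.Sprintf(`"v%d"`, version) }

func handleOrder(w http.ResponseWriter, r *http.Request) {
	mu.Lock()
	defer mu.Unlock()

	switch r.Method {
	case http.MethodGet:
		w.Header().Set("ETag", etag())
		w.Header().Set("Content-Type", "application/json; charset=utf-8")
		w.Write(order)
	case http.MethodPut:
		// Optimistic locking: destructive actions must carry the current ETag.
		if r.Header.Get("If-Match") != etag() {
			http.Error(w, "stale or missing If-Match", http.StatusPreconditionFailed)
			return
		}
		version++ // a real handler would also persist the new body here
		w.WriteHeader(http.StatusNoContent)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

func main() {
	http.HandleFunc("/v1/orders/42", handleOrder)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```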
Message Brokers (Kafka / RabbitMQ)
- Topic / queue naming: `{boundedContext}.{eventName}.v1` (e.g. `billing.payment-created.v1`).
- Payloads are Protocol Buffer messages; schema ID in header `X-Schema-Id` for registry lookup.
- Producer must attach `trace_id` & `span_id` headers for correlation.
- Consumers acknowledge only after successful processing & persistence; use DLQ after `N` retries.
- Partition key = domain aggregate ID to maintain order where needed.
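A broker-agnostic sketch of the consume/ack/DLQ rule above: acknowledge only after processing succeeds, and route to a dead-letter topic after `N` attempts. The `Message`, `Acker`, and `DeadLetter` types are assumptions standing in for your client library (confluent-kafka-go, amqp091-go, and so on).

```go
package consume

import (
	"context"
	"fmt"
)

const maxAttempts = 5 // N retries before dead-lettering

// Message, Acker and DeadLetter are illustrative stand-ins for a real client.
type Message struct {
	Key, Value []byte
	Attempt    int
	Headers    map[string]string // carries trace_id, span_id, X-Schema-Id
}

type Acker interface {
	Ack(ctx context.Context, m Message) error
	Requeue(ctx context.Context, m Message) error
}

type DeadLetter interface {
	Publish(ctx context.Context, topic string, m Message, cause error) error
}

func Consume(ctx context.Context, m Message, a Acker, dlq DeadLetter, handle func(context.Context, Message) error) error {
	if err := handle(ctx, m); err != nil {
		if m.Attempt+1 >= maxAttempts {
			// Exhausted retries: park the message with its error envelope.
			if derr := dlq.Publish(ctx, "billing.order-created.v1.dlq", m, err); derr != nil {
				return fmt.Errorf("dead-letter publish: %w", derr)
			}
			return a.Ack(ctx, m) // ack so the partition can make progress
		}
		return a.Requeue(ctx, m)
	}
	// Acknowledge only after processing (and any persistence) succeeded.
	return a.Ack(ctx, m)
}
```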
Service Mesh (Istio / Linkerd / Consul)
- Enforce STRICT mTLS in `PeerAuthentication`.
- Configure retries and per-try timeouts on the `VirtualService` route and circuit breaking in the `DestinationRule`; max 3 retries, backoff 0.2s → 1s.
- Use `VirtualService` for canary routing: 90% v1, 10% v2 during experiments.
- Emit Envoy access logs in JSON; feed into central log stack.
Testing
- Contract tests (Pact) run in CI; provider build fails on contract drift.
- Integration tests spin services in Docker Compose/K8s KIND; seed fixture data.
- Chaos tests (Gremlin / Litmus) inject latency & failure to ensure resilience.
- Canary or blue-green deployments with automated rollback on SLO breach.
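Not a full Pact setup, but a minimal in-process sketch of a contract-style check: the provider's error responses must keep the `{code, message, trace_id}` envelope from the error-handling rules. The handler under test is a placeholder for the real provider.

```go
package contracts

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
)

// errorEnvelope mirrors the documented REST error contract.
type errorEnvelope struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	TraceID string `json:"trace_id"`
}

// notFoundHandler is a stand-in for the real provider handler.
func notFoundHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	w.WriteHeader(http.StatusNotFound)
	json.NewEncoder(w).Encode(errorEnvelope{
		Code: "order_not_found", Message: "no such order", TraceID: "00000000000000000000000000000000",
	})
}

func TestErrorEnvelopeContract(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(notFoundHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/v1/orders/missing")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusNotFound {
		t.Fatalf("want 404, got %d", resp.StatusCode)
	}
	var env errorEnvelope
	if err := json.NewDecoder(resp.Body).Decode(&env); err != nil {
		t.Fatalf("decode error envelope: %v", err)
	}
	if env.Code == "" || env.Message == "" || env.TraceID == "" {
		t.Fatalf("error envelope missing fields: %+v", env)
	}
}
```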
Performance & Scalability
- Benchmark critical code with `go test -bench` and target 4× expected peak QPS.
- Load-test service meshes with `hey` (REST) or `ghz` (gRPC) before each release.
- Prefer bulk-head isolation: separate worker pools per remote dependency.
- Use horizontal pod autoscaling on p95 latency & CPU.
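A dependency-level bulkhead can be as small as a buffered channel used as a semaphore; this is a sketch, with the pool size and dependency name as assumptions.

```go
package bulkhead

import (
	"context"
	"errors"
)

// Bulkhead caps in-flight calls to one remote dependency so a slow downstream
// cannot exhaust the whole service's goroutines or connections.
type Bulkhead struct {
	slots chan struct{}
}

func New(maxInFlight int) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, maxInFlight)}
}

var ErrSaturated = errors.New("bulkhead saturated")

// Do runs fn if a slot frees up before the context expires; otherwise it fails
// fast instead of queueing unboundedly.
func (b *Bulkhead) Do(ctx context.Context, fn func(context.Context) error) error {
	select {
	case b.slots <- struct{}{}:
		defer func() { <-b.slots }()
		return fn(ctx)
	case <-ctx.Done():
		return errors.Join(ErrSaturated, ctx.Err())
	}
}

// Usage: one bulkhead per remote dependency, e.g.
//   ordersBulkhead := bulkhead.New(32)
//   err := ordersBulkhead.Do(ctx, func(ctx context.Context) error { return client.GetUserOrders(ctx, id) })
```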
Security
- Sign all JWTs with RS256; rotate keys every 90 days.
- Validate JWT in sidecar (mesh) or middleware, not in business code.
- Rate-limit public APIs by IP & token in API Gateway.
- Encrypt secrets with KMS and mount them via sealed-secrets or an init container; never fetch them by hand with `kubectl exec`.
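When JWT validation cannot live in the sidecar, a thin HTTP middleware keeps it out of business code. This is a sketch in which `Verifier` abstracts the RS256 signature and key-set check (for example `golang-jwt` plus a JWKS cache); that interface and `Claims` are assumptions.

```go
package authmw

import (
	"context"
	"net/http"
	"strings"
)

// Claims and Verifier are assumed abstractions over your JWT library and key set.
type Claims struct {
	Subject string
	Roles   []string
}

type Verifier interface {
	// Verify checks the RS256 signature, expiry and audience, returning claims.
	Verify(ctx context.Context, token string) (Claims, error)
}

type claimsKey struct{}

// Require returns middleware that rejects requests without a valid bearer token
// and stores the claims in the request context for downstream handlers.
// A production handler would return the documented JSON error envelope.
func Require(v Verifier) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
			if !ok || token == "" {
				http.Error(w, "missing bearer token", http.StatusUnauthorized)
				return
			}
			claims, err := v.Verify(r.Context(), token)
			if err != nil {
				http.Error(w, "invalid token", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), claimsKey{}, claims)))
		})
	}
}
```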
Observability & Monitoring
- Emit structured JSON logs: `{timestamp, level, service, trace_id, msg}`.
- Instrument handlers with OpenTelemetry; export to Jaeger/Tempo.
- Define RED metrics (Rate, Errors, Duration) for every RPC/message.
- Alert on SLOs: p95 latency, error rate ≥1%, saturation ≥80%.
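A sketch of the logging and tracing rules using the standard library's `log/slog` and the OpenTelemetry tracer; the service name, span name, and exporter wiring are assumed to be configured elsewhere.

```go
package obs

import (
	"context"
	"log/slog"
	"os"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil))

// HandleGetOrders wraps a unit of work in a span and emits one structured
// JSON log line carrying the trace_id so logs and traces can be joined.
func HandleGetOrders(ctx context.Context, userID string, fetch func(context.Context) error) error {
	ctx, span := otel.Tracer("orders-svc").Start(ctx, "GetUserOrders")
	defer span.End()

	start := time.Now()
	err := fetch(ctx)

	logger.LogAttrs(ctx, levelFor(err), "GetUserOrders",
		slog.String("service", "orders-svc"),
		slog.String("trace_id", trace.SpanContextFromContext(ctx).TraceID().String()),
		slog.String("user_id", userID),
		slog.Duration("duration", time.Since(start)),
		slog.Any("error", err),
	)
	return err
}

func levelFor(err error) slog.Level {
	if err != nil {
		return slog.LevelError
	}
	return slog.LevelInfo
}
```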
Deployment & Operations
- Immutable images built with multi-stage Dockerfile; tag `service:git-sha`.
- GitOps via ArgoCD; PR merges → staging, tag → production.
- Use Helm/Kustomize for manifests; separate values for dev/staging/prod.
- Always run zero-downtime migrations before routing traffic.
Examples
```go
// publish.go – Kafka producer with context, tracing, retries
func Publish(ctx context.Context, p kafkaProducer, topic string, evt *pb.OrderCreated) error {
	span := trace.SpanFromContext(ctx)
	bytes, err := proto.Marshal(evt)
	if err != nil {
		return fmt.Errorf("marshal OrderCreated: %w", err)
	}
	msg := &kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
		Value:          bytes,
		Headers: []kafka.Header{
			{Key: "trace_id", Value: []byte(span.SpanContext().TraceID().String())},
			{Key: "content-type", Value: []byte("application/x-protobuf")},
		},
	}
	return backoff.Retry(func() error { return p.Produce(msg, nil) }, backoff.NewExponentialBackOff())
}
```
```yaml
# istio-orders.yaml – circuit breaking (DestinationRule) + retries (VirtualService)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders-svc
spec:
  host: orders
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
    connectionPool:
      http:
        http1MaxPendingRequests: 100
        maxRequestsPerConnection: 1000
    loadBalancer:
      simple: ROUND_ROBIN
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 2m
---
# Retries are a route-level setting, so they live on the VirtualService, not the DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders-svc
spec:
  hosts:
    - orders
  http:
    - route:
        - destination:
            host: orders
      retries:
        attempts: 3
        perTryTimeout: 200ms
        retryOn: connect-failure,refused-stream,5xx
```
Follow these rules to produce highly maintainable, secure, and observable microservices capable of graceful degradation under real-world load.