Comprehensive Rules for designing, implementing, and operating microservices communication patterns in a TypeScript/Node.js ecosystem.
Transform your microservices architecture from a maintenance nightmare into a resilient, observable, and scalable distributed system. These Cursor Rules eliminate the guesswork from service communication patterns and establish production-ready standards that actually work at scale.
Every developer who's built microservices knows the pain: services that work perfectly in isolation but fail spectacularly when they need to talk to each other. You've probably experienced:
The root cause? Most teams wing it when designing service communication, leading to brittle point-to-point integrations that become unmaintainable as the system grows.
These Cursor Rules establish a comprehensive framework for microservices communication that prioritizes resilience, observability, and maintainability from day one. Instead of retrofitting communication patterns, you'll build them correctly from the start.
Core Philosophy: Principle of Least Synchrony. Default to asynchronous, event-driven communication and fall back to synchronous calls only when the caller truly needs an immediate response. This shift removes most cascade-failure scenarios, because a slow or failed consumer no longer blocks its producer.
What You Get:
With mandatory correlation IDs and distributed tracing, you'll trace requests across services in seconds instead of hours. No more log diving across multiple services trying to piece together what happened.
Built-in circuit breakers and timeout policies prevent cascade failures. When one service struggles, it fails fast and doesn't bring down dependent services.
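gRPC for internal communication can deliver roughly 2-5x better performance than JSON-over-REST, although the actual gain depends on payload size and workload. The rules steer protocol selection by use case: gRPC for internal calls that need p95 under 100 ms, REST otherwise.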
API-first contracts mean teams can develop independently without constant coordination meetings. Changes are backward compatible by default.
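To make the fail-fast behaviour of a circuit breaker concrete, here is a minimal hand-rolled sketch. It is not the `opossum` API. The class name `SimpleBreaker`, the failure threshold, and the injectable clock are assumptions chosen for illustration, and per-call timeouts are left out for brevity.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class SimpleBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, resetTimeoutMs = 30000, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening (assumed value)
    this.resetTimeoutMs = resetTimeoutMs;     // how long to stay open before allowing trial calls
    this.now = now;                           // injectable clock, useful for tests
  }

  // An open circuit rejects immediately; once the reset timeout elapses it turns half-open.
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  // A failure while half-open, or too many consecutive failures, (re)opens the circuit.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  // Wrap an outbound call: fail fast when open, otherwise record the outcome.
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) throw new Error('circuit open: failing fast');
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }
}
```

A caller would wrap each outbound request, for example `breaker.call(() => fetchInventory(id))`, where `fetchInventory` stands for any hypothetical promise-returning client call. A production breaker would also limit the number of concurrent trial calls while half-open.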
// Typical ad-hoc service call - what could go wrong?
async function processOrder(orderId: string) {
  const order = await fetch(`http://order-service/orders/${orderId}`);
  const payment = await fetch(`http://payment-service/process`, {
    method: 'POST',
    body: JSON.stringify(order)
  });
  // Hope nothing fails, no retries, no observability
}
Problems:
// Contract-first with built-in resilience
import { SpanStatusCode } from '@opentelemetry/api';
import { CreatePaymentCmd, CreatePaymentCmdSchema } from '@/contracts';
import { PaymentServiceClient } from '@/grpc/payment-service';

// RequestContext (tracer + correlation ID) is assumed to be defined by the service;
// the client is passed in explicitly because a plain function has no `this`.
async function processOrder(
  orderId: string,
  context: RequestContext,
  paymentClient: PaymentServiceClient
) {
  const span = context.tracer.startSpan('process-order');
  try {
    // Type-safe, validated contract
    const paymentCmd: CreatePaymentCmd = {
      orderId,
      correlationId: context.correlationId
    };
    // Validate at boundary
    const validated = CreatePaymentCmdSchema.parse(paymentCmd);
    // Circuit breaker + retry built-in
    const result = await paymentClient.createPayment(validated, {
      timeout: 3000,
      retry: { maxAttempts: 3 }
    });
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (error) {
    // `error` is `unknown` under strict mode, so narrow it before recording
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw error;
  } finally {
    span.end();
  }
}
Benefits:
Before: Synchronous Coupling
// Order service directly calls inventory, payment, shipping
async function createOrder(orderData: any) {
  await inventoryService.reserveItems(orderData.items);
  await paymentService.processPayment(orderData.payment);
  await shippingService.scheduleDelivery(orderData.shipping);
  // If shipping fails, everything fails
}
After: Event Choreography
// Order service publishes event, other services react independently
// (orderRepository, eventPublisher and reserveItems are assumed to be module-level dependencies)
async function createOrder(orderData: CreateOrderCmd) {
  const order = await orderRepository.create(orderData);
  // Publish event - other services will react
  await eventPublisher.publish('sales.order.created', {
    orderId: order.id,
    userId: order.userId,
    items: order.items,
    correlationId: order.correlationId
  });
  return order;
}

// Inventory service reacts to event
async function handleOrderCreated(event: OrderCreatedEvent) {
  try {
    await reserveItems(event.items);
    // Publish success event
    await eventPublisher.publish('inventory.items.reserved', {
      orderId: event.orderId,
      correlationId: event.correlationId
    });
  } catch (error) {
    // Publish failure event for compensation
    await eventPublisher.publish('inventory.reservation.failed', {
      orderId: event.orderId,
      // `error` is `unknown` under strict mode, so narrow before reading `.message`
      reason: error instanceof Error ? error.message : String(error),
      correlationId: event.correlationId
    });
  }
}
Results:
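# Install core dependencies
npm install @grpc/grpc-js @grpc/proto-loader kafkajs amqplib
npm install fastify fastify-zod pino opossum
npm install zod @opentelemetry/api @opentelemetry/sdk-node
# Tracing packages imported by infrastructure/tracing.ts
npm install @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-jaeger
# Development tools
npm install -D typescript @types/node ts-node nodemon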
// tsconfig.json
{
  "compilerOptions": {
    "strict": true,
    "exactOptionalPropertyTypes": true,
    "noImplicitOverride": true,
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "node"
  }
}
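// contracts/order-commands.ts
import { z } from 'zod';

// OrderLine is not defined in the original; this shape is illustrative only
export interface OrderLine {
  sku: string;
  quantity: number;
}
export const OrderLineSchema = z.object({
  sku: z.string().min(1),
  quantity: z.number().int().positive()
});

export interface CreateOrderCmd {
  id: string;
  userId: string;
  items: OrderLine[];
  correlationId: string;
}
// Runtime schema mirrors the interface field for field
export const CreateOrderCmdSchema = z.object({
  id: z.string().uuid(),
  userId: z.string().uuid(),
  items: z.array(OrderLineSchema).min(1),
  correlationId: z.string().uuid()
});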
// contracts/index.ts - barrel export
export * from './order-commands';
export * from './payment-commands';
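// grpc/order-service.ts
import { credentials, loadPackageDefinition } from '@grpc/grpc-js';
import { loadSync } from '@grpc/proto-loader';
// Order is assumed to be exported from the contracts package alongside CreateOrderCmd
import type { CreateOrderCmd, Order } from '@/contracts';

const packageDefinition = loadSync('proto/order.proto', {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true
});
// The dynamically loaded package is untyped, hence the cast
const orderProto = loadPackageDefinition(packageDefinition) as any;

export class OrderServiceClient {
  private client: any;

  constructor(address: string) {
    // The second argument must be channel credentials; insecure credentials here
    // assume the service mesh terminates mTLS. Channel options come third.
    this.client = new orderProto.OrderService(address, credentials.createInsecure(), {
      'grpc.keepalive_time_ms': 20000,
      'grpc.max_receive_message_length': 4194304
    });
  }

  // Wrap the callback-style unary call in a Promise
  async createOrder(request: CreateOrderCmd): Promise<Order> {
    return new Promise((resolve, reject) => {
      this.client.CreateOrder(request, (error: any, response: any) => {
        if (error) reject(error);
        else resolve(response);
      });
    });
  }
}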
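// infrastructure/kafka-publisher.ts
import { Kafka, Producer } from 'kafkajs';

export class EventPublisher {
  private producer: Producer;

  constructor() {
    const kafka = new Kafka({
      clientId: 'order-service',
      brokers: ['localhost:9092']
    });
    // Idempotent producer with a single in-flight request preserves ordering on retry
    this.producer = kafka.producer({
      maxInFlightRequests: 1,
      idempotent: true,
      transactionTimeout: 30000
    });
  }

  // Call once at startup; send() rejects while the producer is disconnected
  async connect(): Promise<void> {
    await this.producer.connect();
  }

  async publish<E extends { orderId: string; correlationId: string }>(
    topic: string,
    event: E
  ): Promise<void> {
    await this.producer.send({
      topic,
      messages: [{
        // Keying by orderId keeps all events for one order on the same partition
        key: event.orderId,
        value: JSON.stringify(event),
        headers: {
          'correlation-id': event.correlationId,
          'event-type': topic,
          'timestamp': Date.now().toString()
        }
      }]
    });
  }
}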
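// infrastructure/tracing.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

const sdk = new NodeSDK({
  serviceName: 'order-service',
  instrumentations: [getNodeAutoInstrumentations()],
  // Jaeger collector HTTP endpoint; adjust host and port for your environment
  traceExporter: new JaegerExporter({
    endpoint: 'http://jaeger:14268/api/traces'
  })
});

// Start before loading application code so auto-instrumentation can patch modules
sdk.start();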
Teams using these patterns typically see:
The rules also cover enterprise-grade patterns you'll need as you scale:
Stop building fragile distributed systems. These Cursor Rules give you the battle-tested patterns that scale from startup to enterprise. Your future self (and your on-call rotation) will thank you.
Get production-ready microservices communication patterns working in your codebase today. The complexity of distributed systems doesn't have to be your complexity.
You are an expert in TypeScript, Node.js, gRPC, REST/HTTP, Apache Kafka, RabbitMQ, AWS SQS/SNS/EventBridge, NATS, Istio Service Mesh, Kubernetes, OpenTelemetry, Camunda & Temporal, Redis Pub/Sub.
Key Principles
- API-First: Define and version explicit, language-agnostic contracts (OpenAPI or protobuf). Never expose internal representations.
- Principle of Least Synchrony: Default to asynchronous, event-driven messaging; fall back to synchronous calls only when the caller truly needs an immediate response.
- Resilience Over Perfection: Design every exchange to tolerate retries, duplication, out-of-order delivery, and partial failure.
- Observable by Default: Every request or message MUST be traceable end-to-end with correlation IDs and OpenTelemetry spans.
- Security Built-In: Mutual TLS for service-to-service traffic, OAuth2/JWT for edge authentication, RBAC for authorization.
- Small, Focused Services: Keep bounded contexts clear; never share databases across services—share via events or APIs.
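As a rough illustration of the "Observable by Default" principle, the sketch below reuses an inbound correlation ID unchanged and only mints a new one when none is present. The helper names (`resolveCorrelationId`, `outboundHeaders`) and the ID generator are hypothetical; real code would more likely use a UUID and let OpenTelemetry propagate trace context alongside it.

```typescript
type HeaderMap = Record<string, string | undefined>;

const CORRELATION_HEADER = 'x-correlation-id';

// Hypothetical ID generator for the sketch; swap in a UUID library in real code.
function newCorrelationId(): string {
  return `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`;
}

// Reuse the inbound ID unchanged, or mint one at the edge when none is present.
function resolveCorrelationId(
  inbound: HeaderMap,
  generate: () => string = newCorrelationId
): string {
  const existing = inbound[CORRELATION_HEADER];
  return existing !== undefined && existing.length > 0 ? existing : generate();
}

// Build headers for an outbound call or message, carrying the same ID forward.
function outboundHeaders(
  inbound: HeaderMap,
  extra: Record<string, string> = {}
): Record<string, string> {
  return { ...extra, [CORRELATION_HEADER]: resolveCorrelationId(inbound) };
}
```

Every outbound HTTP call, gRPC metadata object, or message header set would be built from the inbound headers this way, so one ID ties together all log lines and spans for a request.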
TypeScript (Node.js)
- Use ESM syntax; one top-level export per file. File names: kebab-case ending with `.ts`.
- Enable `strict`, `exactOptionalPropertyTypes`, and `noImplicitOverride` in `tsconfig.json`.
- All public contract types live in `/contracts`; export ONLY interfaces and their Zod schemas (never classes) and re-export through an `index.ts` barrel.
- Validate incoming payloads with Zod schemas co-located with the interface (same file). Example:
```ts
import { z } from 'zod';
// OrderLine and OrderLineSchema are assumed to be defined elsewhere in /contracts

export interface CreateOrderCmd { id: string; userId: string; items: OrderLine[] }
export const CreateOrderCmdSchema = z.object({
  id: z.string().uuid(),
  userId: z.string().uuid(),
  items: z.array(OrderLineSchema).min(1)
});
```
- Use async functions with explicit `return` types (`Promise<Result<T,E>>`). Avoid `any`, avoid throwing inside business code—return rich error objects instead.
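The `Result<T,E>` type is referenced above but not defined. One possible shape is a discriminated union, sketched here. The `ok`/`err` helpers, the `InsufficientStock` error, and `reserveStock` are illustrative names, not part of the rules.

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Hypothetical domain error carrying enough detail for the caller to act on.
interface InsufficientStock {
  kind: 'insufficient-stock';
  sku: string;
  requested: number;
  available: number;
}

// Business code returns a rich error object instead of throwing.
async function reserveStock(
  sku: string,
  requested: number,
  available: number
): Promise<Result<number, InsufficientStock>> {
  if (requested > available) {
    return err<InsufficientStock>({ kind: 'insufficient-stock', sku, requested, available });
  }
  return ok(available - requested); // remaining stock after the reservation
}
```

Callers branch on `result.ok`, and the compiler narrows to either `value` or `error`, which keeps failure handling explicit at every call site.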
Error Handling and Validation
- Validate at API boundary; reject as early as possible. Use HTTP `400` / gRPC `INVALID_ARGUMENT` for validation failures.
- Centralise error mapping: `DomainError -> TransportError` adapter layer; ensure no domain leakage.
- Retry Strategy Matrix:
• HTTP : retry only idempotent verbs (`GET`, `PUT`) with exponential back-off, ≤ 3 steps; send non-idempotent requests (e.g., `POST`) at most once.
• gRPC : enable the client-side retry policy in the gRPC service config; respect `retryableStatusCodes`.
• Messaging : publisher confirms + dead-letter queues; consumer side idempotency key.
- Circuit Breakers (e.g., `opossum`) wrap all outbound synchronous calls; timeout ≤ 3000 ms, half-open after 30 s.
- Always attach `x-correlation-id`; propagate unchanged.
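The HTTP row of the retry matrix could be sketched roughly as follows. The base delay, the cap, and the function names are assumptions; only the idempotent verbs and the three-step limit come from the rules above.

```typescript
// Verbs listed as retryable in the matrix above.
const IDEMPOTENT_METHODS = new Set(['GET', 'PUT']);

function isRetryable(method: string): boolean {
  return IDEMPOTENT_METHODS.has(method.toUpperCase());
}

// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
// baseMs and capMs are assumed values.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 2000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}

// Runs `operation`, retrying up to `maxRetries` times for idempotent verbs only.
async function withRetry<T>(
  method: string,
  operation: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  const retries = isRetryable(method) ? maxRetries : 0;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries) throw error; // budget exhausted or verb not retryable
      await sleep(backoffDelayMs(attempt + 1));
    }
  }
}
```

In practice this wrapper would sit inside the circuit breaker, so that retries stop as soon as the breaker opens. Adding random jitter to the delay is a common refinement that is omitted here.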
Framework-Specific Rules
REST (Fastify)
- Register one route per file under `/routes`; schema-based validation with `fastify-zod`.
- Use structured logging (`pino`). Each handler MUST call `req.log.info` once with summary {path, status, latencyMs}.
gRPC (grpc-js)
- Services defined in `/proto`; generate TS via `@grpc/proto-loader` + `grpc-tools`. Do NOT commit generated JS to VCS.
- Unary over streaming unless you need >1 MiB payload or long-lived sessions. Keep message < 256 KiB.
- Enable `grpc.keepalive_time_ms = 20000` and `grpc.max_receive_message_length = 4194304`.
Apache Kafka
- Topic naming: `<boundedContext>.<entity>.<event>` (e.g., `sales.order.created`).
- Default partitions = 3; replication factor = 3. Compression = `lz4`.
- Consumers use `kafkajs` with `eachBatch` processing; commit offsets only after successful business operation + outbox write.
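A small helper can enforce the `<boundedContext>.<entity>.<event>` naming convention at the point where topics are created or published to. This is a sketch: the segment pattern (lowercase, starting with a letter, hyphens allowed) is an assumption, since the rules only give the three-part shape.

```typescript
// Assumed segment syntax: lowercase letter first, then letters, digits or hyphens.
const TOPIC_SEGMENT = /^[a-z][a-z0-9-]*$/;

// Build a topic name from its three parts, rejecting malformed segments early.
function topicName(boundedContext: string, entity: string, event: string): string {
  const parts = [boundedContext, entity, event];
  for (const part of parts) {
    if (!TOPIC_SEGMENT.test(part)) {
      throw new Error(`invalid topic segment: "${part}"`);
    }
  }
  return parts.join('.');
}

// Check an existing topic string against the same convention.
function isValidTopic(topic: string): boolean {
  const parts = topic.split('.');
  return parts.length === 3 && parts.every((part) => TOPIC_SEGMENT.test(part));
}
```

For example, `topicName('sales', 'order', 'created')` yields the `sales.order.created` topic used in the rule above.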
RabbitMQ / AMQP
- Exchanges: `topic` type with dot notation routing keys. Every queue must set a message TTL ≤ 7 days.
- Publish with `messageId`, `timestamp`, `x-retry-count` headers. Dead-letter to `{queue}.dlq`.
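The retry-count header and the `{queue}.dlq` convention imply a decision on each consumer failure: requeue with an incremented count, or dead-letter. A minimal sketch of that decision as a pure function follows. The retry limit of 3 and the type names are assumptions, as the rules do not state a limit.

```typescript
interface DeliveryHeaders {
  messageId: string;
  timestamp: number;
  'x-retry-count': number;
}

type Redelivery =
  | { action: 'retry'; headers: DeliveryHeaders }
  | { action: 'dead-letter'; queue: string; headers: DeliveryHeaders };

// Assumed limit; tune per queue.
const MAX_RETRIES = 3;

// Decide what to do with a message whose handler failed.
function onConsumerFailure(
  queue: string,
  headers: DeliveryHeaders,
  maxRetries: number = MAX_RETRIES
): Redelivery {
  if (headers['x-retry-count'] >= maxRetries) {
    // Retry budget exhausted: route to the dead-letter queue, headers untouched.
    return { action: 'dead-letter', queue: `${queue}.dlq`, headers };
  }
  // Republish with the same messageId so consumers can still deduplicate.
  return {
    action: 'retry',
    headers: { ...headers, 'x-retry-count': headers['x-retry-count'] + 1 },
  };
}
```

Keeping the decision separate from the `amqplib` channel calls makes it easy to unit test. The consumer then only has to act on the returned `action`.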
Istio Service Mesh
- Enforce mTLS `STRICT` in the mesh.
- Outbound traffic policy `REGISTRY_ONLY`—all external calls declared via `ServiceEntry`.
- Traffic policy: 2 retries, 25 ms base delay, jitter 0.2.
Kubernetes
- Deploy one container per microservice; liveness probe < 10 s, readiness probe < 2 s.
- HPA based on `cpu` OR `kafka-lag` custom metric (Prometheus Adapter).
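For intuition on how lag-based autoscaling behaves, the Horizontal Pod Autoscaler computes `ceil(currentReplicas * currentMetric / targetMetric)`. The sketch below reproduces that arithmetic and clamps the result. Capping at the partition count is an assumption that reflects the fact that extra consumers in a group beyond the number of partitions sit idle. The HPA's tolerance band and stabilization windows are ignored.

```typescript
// Replica count the HPA formula would ask for, clamped to [minReplicas, maxReplicas].
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number, // e.g. observed consumer lag per pod
  targetMetric: number,  // e.g. target lag per pod
  minReplicas = 1,
  maxReplicas = 10
): number {
  const raw = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

With 3 replicas, an observed lag of 3000 per pod and a target of 1000, the formula asks for 9 replicas. With the default of 3 partitions from the Kafka rules, passing `maxReplicas = 3` keeps the deployment at 3, which signals that the partition count must grow before scaling out helps.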
Additional Sections
Testing Guidelines
- Contract Tests: Generate pact files (`pact-js`) on consumer CI; verify them on provider CI.
- Integration Tests: Spin up docker-compose with Kafka/RabbitMQ + service under test; run Jest with `--runInBand`.
- Performance: Use `k6` scripts covering p95 ≤ 200 ms (internal) / 500 ms (external). Alarm on regression >10 %.
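- Chaos: Inject faults via Istio `VirtualService` (e.g., abort 30 % of requests and add a 1 s fixed delay) for resilience verification. Istio fault injection supports only delays and aborts, so true packet loss needs a network-level tool.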
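The performance gate above comes down to two comparisons on the p95 latency: against the budget, and against a baseline plus 10 %. A sketch of that arithmetic, using the nearest-rank percentile method, is shown here. The function names and the baseline parameter are illustrative and not tied to the `k6` API.

```typescript
// Nearest-rank percentile: the smallest sample with at least p % of samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)]!;
}

interface LatencyVerdict {
  p95: number;
  withinBudget: boolean;
  regressed: boolean;
}

// budgetMs: 200 for internal or 500 for external endpoints, per the guideline above.
function checkLatency(samplesMs: number[], budgetMs: number, baselineP95Ms: number): LatencyVerdict {
  const p95 = percentile(samplesMs, 95);
  return {
    p95,
    withinBudget: p95 <= budgetMs,
    regressed: p95 > baselineP95Ms * 1.1, // more than 10 % slower than the baseline
  };
}
```

A CI step could fail the build when `withinBudget` is false and raise an alarm when `regressed` is true, with the baseline taken from the last accepted run.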
Observability
- Use OpenTelemetry SDK; auto-instrument HTTP, gRPC, and Kafka.
- Export traces to Jaeger; metrics to Prometheus; logs to Loki. Retain 90 days.
- Dashboards: p50/p95 latency, error rate, saturation (RED/USE metrics), inflight requests, queue lag.
Security
- All secrets delivered via Kubernetes secrets + sealed-secrets; never in env files committed to VCS.
- Edge traffic (Ingress) secured with TLS 1.3, HSTS 1 year.
- JWTs must include `sub`, `aud`, `exp`, `iss`; verify clock skew ≤ 30 s.
- Broker credentials rotated every 30 days; enable SCRAM-SHA-512 for Kafka.
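The JWT rule can be read as a claims check applied after signature verification. The sketch below covers only that claims step; it assumes the token has already been verified by a JWT library and that its payload is passed in decoded. The function and type names are hypothetical.

```typescript
interface JwtClaims {
  sub?: string;
  aud?: string | string[];
  exp?: number; // seconds since the epoch
  iss?: string;
}

const MAX_SKEW_SECONDS = 30;

// Returns a list of problems; an empty list means the claims are acceptable.
// This does NOT verify the signature.
function validateClaims(
  claims: JwtClaims,
  expectedAud: string,
  expectedIss: string,
  nowSeconds: number
): string[] {
  const problems: string[] = [];
  if (!claims.sub) problems.push('missing sub');
  if (claims.iss !== expectedIss) problems.push('unexpected iss');

  // `aud` may be a single string or an array of audiences.
  const audiences =
    claims.aud === undefined ? [] : Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (audiences.indexOf(expectedAud) === -1) problems.push('unexpected aud');

  if (claims.exp === undefined) {
    problems.push('missing exp');
  } else if (nowSeconds > claims.exp + MAX_SKEW_SECONDS) {
    // Allow up to 30 s of clock skew between issuer and verifier.
    problems.push('token expired');
  }
  return problems;
}
```

Returning every problem at once, rather than stopping at the first, tends to make rejected-token logs more useful. The edge gateway would reject the request whenever the list is non-empty.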
Performance & Scalability
- Use gRPC for internal calls requiring p95 < 100 ms; REST otherwise.
- Offload heavy computation to async background workers reading from Kafka/RabbitMQ.
- Prefer partition/consumer-group scaling over vertical scaling.
Common Pitfalls & Guards
- ❌ Direct synchronous cascade calls across >2 services → ✅ replace with choreography events.
- ❌ Mixing process manager + saga logic inside multiple services → ✅ centralize in Workflow Engine (Temporal/Camunda).
- ❌ Huge, unversioned JSON payloads → ✅ strict versioning (`v1`, `v2`) and feature toggles.
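One way to apply strict payload versioning is to tag every event with its version and upcast older versions at the consumer boundary, so business logic only ever sees the latest shape. The event fields and the legacy currency default below are invented for illustration.

```typescript
// Hypothetical v1 payload: total as a bare number.
interface OrderCreatedV1 {
  version: 'v1';
  orderId: string;
  total: number;
}

// Hypothetical v2 payload: total carries an explicit currency.
interface OrderCreatedV2 {
  version: 'v2';
  orderId: string;
  total: { amount: number; currency: string };
}

type AnyOrderCreated = OrderCreatedV1 | OrderCreatedV2;

// Assumed default for events published before currency was recorded.
const LEGACY_CURRENCY = 'USD';

// Upcast any known version to the latest one; v2 passes through unchanged.
function toLatest(event: AnyOrderCreated): OrderCreatedV2 {
  switch (event.version) {
    case 'v1':
      return {
        version: 'v2',
        orderId: event.orderId,
        total: { amount: event.total, currency: LEGACY_CURRENCY },
      };
    case 'v2':
      return event;
  }
}
```

Because the `version` field discriminates the union, adding a `v3` variant makes the compiler flag `toLatest` until the new case is handled.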
Directory Layout (monorepo example)
```
apps/
orders-service/
src/
routes/
grpc/
consumers/
producers/
domain/
contracts/
infrastructure/
Dockerfile
payments-service/
packages/
shared-contracts/
otel-config/
```