Opinionated Rules for building horizontally scalable, cloud-native microservices with TypeScript/Node.js, Docker & Kubernetes.
When your API starts hitting 10k+ requests per minute and your database connections are maxing out, you're not just facing a scaling problem—you're facing an architecture reckoning. Most developers try to scale up instead of scaling out, then wonder why their monthly cloud bill exploded while performance tanked.
Here's what happens when you don't design for scale from day one: connection pools max out, 95th-percentile latencies balloon, and every traffic spike turns into an incident call. Sound familiar? You're not alone. Most backend systems hit this wall somewhere between 50k and 100k daily active users.
These Cursor Rules transform how you build scalable systems from the ground up. Instead of retrofitting scalability, you're coding with horizontal scaling as the default—every component designed to replicate across nodes without coordination.
Here's what changes immediately:
Before: Monolithic API handling all business logic
```ts
// Typical monolithic approach - doesn't scale
app.post('/orders', async (req, res) => {
  const user = await db.users.findById(req.userId); // DB hit
  const order = await createOrder(user, req.body); // More DB hits
  await sendEmail(order); // Blocking call
  res.json(order);
});
```
After: Stateless microservice with proper separation
```ts
// Scalable microservice approach
export async function createOrder(req: Request, res: Response): Promise<void> {
  const id = nanoid();
  await orderRepo.insert({ id, ...req.body });
  publishEvent("order.created", { id }); // Async event
  res.status(201).json({ id });
}
```
Your services automatically scale across nodes because they're stateless by design. No more late-night emergency scaling sessions—just configure HorizontalPodAutoscaler and let Kubernetes handle traffic spikes.
Each bounded context deploys independently. Update your payment service without touching user management. Deploy 10x more frequently with 90% less risk.
Built-in observability with OpenTelemetry means every request carries a trace ID. When production breaks, you're not grep-ing through scattered logs—you're following request traces across services.
Redis caching layers, connection pooling, and proper database sharding are configured from the start. Your 95th percentile response times stay under 200ms even at 10x traffic.
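For instance, a minimal cache-aside helper, assuming ioredis and a stable request hash as the key (the `cached` helper name and key scheme are illustrative; the TTL mirrors the rules later in this document):

```ts
import Redis from 'ioredis';
import { createHash } from 'node:crypto';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Cache-aside: serve from Redis when possible, otherwise load and store.
export async function cached<T>(
  req: { method: string; url: string },
  load: () => Promise<T>,
): Promise<T> {
  const key = createHash('sha256').update(`${req.method}:${req.url}`).digest('hex');
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit) as T;

  const value = await load();
  await redis.set(key, JSON.stringify(value), 'EX', 300); // TTL ≤ 5 minutes.
  return value;
}
```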
Instead of modifying a monolithic codebase and hoping nothing breaks, you scaffold a new bounded context under `services/feature-service/` and integrate it through its published contract. Time saved: 3-4 days of integration testing becomes 30 minutes of contract validation.
Black Friday traffic incoming? Instead of panic-scaling everything, you let the HorizontalPodAutoscaler add replicas to the hot services while Redis absorbs the read load. Result: handle 10x traffic at the same infrastructure cost through efficient resource utilization.
Your user table hit 100 million rows? Instead of expensive vertical scaling, you shard PostgreSQL by tenant_id (see the Citus rule below). Impact: linear scaling instead of exponential costs, supporting 10x more users with predictable performance.
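A hedged glimpse of what that split involves, assuming Citus: `create_distributed_table` is Citus's own function, while the wrapper and table names are illustrative:

```ts
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Run once as a migration: Citus spreads the users table across worker
// nodes by tenant_id, so each tenant's rows co-locate on one shard.
export async function distributeUsersTable(): Promise<void> {
  await pool.query(`SELECT create_distributed_table('users', 'tenant_id')`);
}
```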
```bash
# Scaffold the project skeleton
mkdir my-scalable-api && cd my-scalable-api
```
Create this directory structure:
```
services/
  user-service/
  order-service/
  notification-service/
packages/
  shared-lib/
infra/
  terraform/
  k8s/
load/
  k6-scripts/
```
Each new service follows the pattern:
```ts
// src/controllers/order.controller.ts
// Import paths below are illustrative for this template.
import { nanoid } from 'nanoid';
import type { Request, Response } from 'express';
import { orderSchema } from '../schemas/order.schema';
import { ValidationError } from '../errors';
import { orderRepo } from '../infra/order.repo';
import { publishEvent } from '../infra/event-bus';

export async function createOrder(req: Request, res: Response): Promise<void> {
  const validation = orderSchema.safeParse(req.body);
  if (!validation.success) {
    throw new ValidationError(validation.error.message);
  }
  const id = nanoid();
  await orderRepo.insert({ id, ...validation.data });
  publishEvent('order.created', { id });
  res.status(201).json({ id });
}
```
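The `publishEvent` helper above is deliberately fire-and-forget. A minimal sketch over NATS, assuming the `nats` client and a connection opened at bootstrap (the module path is ours):

```ts
// src/infra/event-bus.ts (illustrative path)
import { connect, JSONCodec, type NatsConnection } from 'nats';

const codec = JSONCodec();
let nc: NatsConnection | undefined;

// Call once from the bootstrap file before serving traffic.
export async function initEventBus(): Promise<void> {
  nc = await connect({ servers: process.env.NATS_URL ?? 'nats://localhost:4222' });
}

// Fire-and-forget publish: consumers react asynchronously, so the HTTP
// handler never blocks on email, billing, or other side effects.
export function publishEvent(subject: string, payload: unknown): void {
  nc?.publish(subject, codec.encode(payload));
}
```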
```hcl
# terraform/modules/service/main.tf
resource "kubernetes_deployment" "service" {
  spec {
    replicas = var.min_replicas
    template {
      spec {
        container {
          resources {
            requests = { cpu = "100m", memory = "256Mi" }
            limits   = { cpu = "500m", memory = "512Mi" }
          }
        }
      }
    }
  }
}

resource "kubernetes_horizontal_pod_autoscaler" "service" {
  spec {
    min_replicas = var.min_replicas
    max_replicas = var.max_replicas
    target_cpu_utilization_percentage = 70
  }
}
```
```ts
// Built-in tracing for every request
import { trace } from '@opentelemetry/api';
import type { Request, Response } from 'express';

type Handler = (req: Request, res: Response) => Promise<unknown>;

export function withTracing(fn: Handler) {
  return async (req: Request, res: Response) => {
    const span = trace.getActiveSpan();
    span?.setAttributes({
      'service.name': process.env.SERVICE_NAME ?? 'unknown',
      'http.method': req.method,
      'http.url': req.url,
    });
    try {
      return await fn(req, res);
    } catch (error) {
      span?.recordException(error as Error);
      throw error;
    }
  };
}
```
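Register it per route, for example `app.post('/orders', withTracing(createOrder))`, so every handler records the same attributes and exceptions on the active span.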
Your microservices architecture becomes your competitive advantage—not just handling current scale, but ready for whatever growth throws at you. No more choosing between moving fast and staying stable.
You are an expert in TypeScript, Node.js, Docker, Kubernetes, AWS, Azure, Google Cloud, Redis, PostgreSQL, and distributed systems.
Key Principles
- Favour horizontal scalability: design every component to be replicated across nodes without coordination.
- Keep services stateless; persist session/transaction state in external stores (Redis, DB, object storage).
- API-first: every service exposes a versioned REST/GraphQL contract before UI or clients are built.
- Decompose by business capability (domain-driven design). Each bounded context becomes an independent service owned by a single team.
- Automate everything (CI/CD, IaC, tests, observability) so adding nodes or regions requires zero manual steps.
- Fail fast & recover: detect errors early, return clear status codes, rely on retries and circuit breakers for resilience.
- Prefer eventual consistency + idempotent operations to minimise coordination overhead (a minimal sketch follows this list).
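A minimal sketch of that last principle, assuming callers send an `Idempotency-Key` header and Redis as the external store (the key scheme and `orderRepo` shape are illustrative):

```ts
import Redis from 'ioredis';
import { nanoid } from 'nanoid';

// Illustrative repository shape; any persistence layer works.
declare const orderRepo: { insert(row: Record<string, unknown>): Promise<void> };

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

export async function createOrderIdempotent(
  idempotencyKey: string,
  body: Record<string, unknown>,
): Promise<{ id: string }> {
  const seen = await redis.get(`idem:${idempotencyKey}`);
  if (seen) return JSON.parse(seen); // Replay: same response, no duplicate side effects.

  const id = nanoid();
  await orderRepo.insert({ id, ...body });
  await redis.set(`idem:${idempotencyKey}`, JSON.stringify({ id }), 'EX', 86_400);
  return { id };
}
```

A production version would reserve the key atomically (Redis `SET NX`) to close the window between the check and the write.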
TypeScript Guidelines
- Always enable strict compiler options ("strict", "noImplicitAny", "strictNullChecks").
- Prefer ES modules; reserve top-level await for bootstrap files.
- Pure business logic lives in functions; use classes only for framework adapters (controllers, repositories).
- Use descriptive, lowerCamelCase for variables; UPPER_SNAKE_CASE for env vars; kebab-case for file names.
- Export a single public symbol per file; collocate tests as `<file>.spec.ts`.
- Example: stateless service handler:
```ts
export async function createOrder(req: Request, res: Response): Promise<void> {
  const id = nanoid();
  await orderRepo.insert({ id, ...req.body });
  publishEvent("order.created", { id });
  res.status(201).json({ id });
}
```
Error Handling & Validation
- Validate inputs at ingress (controller/middleware) using Zod or Joi; never trust downstream data.
- Return 4xx for client faults, 5xx for server faults; never leak implementation details.
- Wrap async handlers with a global error boundary; log stack traces once per failure.
- Apply exponential back-off + jitter when retrying remote calls; cap retries to avoid thundering herd (see the sketch below).
- Use typed error hierarchy:
```ts
class DomainError extends Error { readonly code: string = "DOMAIN"; }
class ValidationError extends DomainError { override readonly code = "VALIDATION"; }
```
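The back-off rule above, as a minimal self-contained sketch (full jitter; the cap and base delay are arbitrary choices):

```ts
export async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries) throw error; // Cap retries to avoid thundering herd.
      const backoffMs = Math.min(100 * 2 ** attempt, 2_000); // Exponential, bounded.
      const jitterMs = Math.random() * backoffMs; // Full jitter de-correlates clients.
      await new Promise((resolve) => setTimeout(resolve, jitterMs));
    }
  }
}
```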
Framework Rules – NestJS Microservices
- Use `@Module()` per bounded context; export only the public provider interfaces.
- Communicate between services via NATS or gRPC with protobuf-defined contracts.
- Controllers remain thin: map DTO ↔ domain model and delegate to use-cases (sketch below).
- Enable `ShutdownSignal` hooks to drain connections before pod termination.
- Configuration comes exclusively from environment → validated by `@nestjs/config` with a schema.
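A hedged sketch of the thin-controller rule over NATS, assuming `@nestjs/microservices`; the use-case function is an illustrative stand-in:

```ts
import { Controller } from '@nestjs/common';
import { EventPattern, Payload } from '@nestjs/microservices';

// Illustrative use-case; real business logic lives outside the controller.
declare const handleOrderCreated: (orderId: string) => Promise<void>;

@Controller()
export class OrderEventsController {
  @EventPattern('order.created') // Subject name comes from the shared contract.
  async onOrderCreated(@Payload() event: { id: string }): Promise<void> {
    await handleOrderCreated(event.id); // Map payload, delegate, nothing else.
  }
}
```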
Performance & Scalability
- Enable Kubernetes HorizontalPodAutoscaler on CPU *and* custom SQS/Kafka lag metrics.
- Cache read-heavy endpoints with Redis, keyed by stable request hash, TTL ≤ 5 minutes.
- Shard PostgreSQL by tenant_id using Citus or Crunchy Bridge when rows ≈ 100 million.
- Always make external calls time-bounded: set `timeout <= 80%` of client SLA.
- Use connection pooling (`pg` pool, min 2, max 20) shared across requests (sketch below).
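A minimal sketch of that pooling rule with node-postgres; the `min` option depends on your pg-pool version, and the numbers mirror the rule rather than a benchmark:

```ts
import { Pool } from 'pg';

// One pool per process, shared across all requests in this service instance.
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  min: 2, // Warm connections; drop this option if your pg-pool version lacks it.
  max: 20, // Cap to protect the database from connection storms.
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000, // Fail fast instead of queueing forever.
});
```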
Testing Strategy
- Unit tests: >90% critical path coverage, run in <60 seconds.
- Contract tests: generate OpenAPI schemas and validate in CI against consumer expectations.
- Load tests: k6 scripts included in `load/` directory; gate merges on 95th-percentile latency thresholds.
- Chaos testing weekly: inject pod/network failures via Chaos Mesh.
Security Rules
- Secrets never in source; mount via Kubernetes Secrets + sealed-secrets operator.
- Enforce TLS 1.2+ everywhere; use mTLS for service-to-service traffic (Istio sidecar).
- Rate-limit public endpoints (e.g., 100 req/min/IP) with API Gateway or NGINX Lua.
- Apply OAuth 2.0 / OIDC for authentication; issue JWTs with TTLs ≤ 15 minutes and refresh with rotated keys (sketch below).
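A hedged sketch of the token rule, assuming the `jsonwebtoken` package and an RS256 private key provided via environment:

```ts
import jwt from 'jsonwebtoken';

export function issueAccessToken(subject: string): string {
  // 15-minute TTL per the rule above; refresh tokens rotate keys out of band.
  return jwt.sign({ sub: subject }, process.env.JWT_PRIVATE_KEY as string, {
    algorithm: 'RS256',
    expiresIn: '15m',
  });
}
```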
Infrastructure & DevOps
- Define all infra in Terraform; one module per service, versioned with the code.
- Build images with multi-stage Dockerfiles (builder → slim runtime); tag with the git SHA plus a semver.
- Canary deploy via progressive delivery (Argo Rollouts) with automatic rollback on SLO breach.
- Centralised logging (stdout → Fluent Bit → Loki) and metrics (Prometheus, Grafana dashboards).
Observability
- Trace every request with OpenTelemetry; propagate trace-id via `traceparent` header.
- Correlate logs, metrics, and traces using consistent service/component labels.
- Alert on RED metrics (Rate, Errors, Duration) and saturation (CPU >80%, DB connections >75%).
Common Pitfalls & How to Avoid Them
- Sticky sessions → move session state to Redis.
- Shared database across services → split schemas & own tables per service.
- Vertical scaling bias → define HPA/VPA before the first production deploy.
- Unbounded message queues → configure a DLQ with max-retry + poison-pill detection (sketch below).
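The last pitfall, as a hedged consumer sketch; the queue clients are illustrative stand-ins, not a specific broker SDK:

```ts
// Illustrative queue interfaces; substitute your broker's SDK.
declare const retryQueue: { send(msg: unknown): Promise<void> };
declare const deadLetterQueue: { send(msg: unknown): Promise<void> };
declare function processOrderEvent(body: unknown): Promise<void>;

const MAX_RETRIES = 5;

export async function handleMessage(msg: { body: unknown; retryCount: number }): Promise<void> {
  try {
    await processOrderEvent(msg.body);
  } catch (error) {
    if (msg.retryCount >= MAX_RETRIES) {
      // Poison pill: park it for offline inspection instead of blocking the queue.
      await deadLetterQueue.send({ ...msg, error: String(error) });
      return;
    }
    await retryQueue.send({ ...msg, retryCount: msg.retryCount + 1 });
  }
}
```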
Directory Structure
```
services/
  order-service/
    src/
      controllers/
      use-cases/
      infra/
      index.ts
    Dockerfile
    terraform/
    charts/
packages/
  shared-lib/
load/
```