Comprehensive Rules for building, testing, and operating dynamic, high-throughput API rate-limiting middleware.
You've been there. Your API gets slammed during peak traffic, legitimate users can't connect, and abusive clients drain your resources. You throw together basic rate limiting, only to discover it breaks under load, lacks observability, and becomes impossible to tune without code changes.
Most rate limiting implementations fail in production because they're built as afterthoughts: limits hard-coded into middleware, no visibility into who is being throttled, and no plan for what happens when the backing store goes down.
These aren't just inconveniences—they directly impact your API's reliability, user experience, and operational costs.
These Cursor Rules generate a production-grade rate limiting middleware that treats traffic management as a first-class system concern. Instead of basic request counting, you get adaptive algorithms, real-time observability, and operational flexibility that handles millions of requests per second.
What makes this different:
Eliminate Rate Limit-Related Incidents: Robust failure modes mean your API stays available even when rate limiting infrastructure fails. No more 3am pages because Redis went down and took your entire API with it.
Real-Time Traffic Insights: Built-in Prometheus metrics and request tracing let you understand usage patterns and optimize limits based on actual behavior, not guesswork. See exactly which clients are hitting limits and adjust accordingly.
Zero-Deployment Configuration Changes: Modify rate limits, add new user tiers, or adjust algorithms through configuration updates—no code changes or service restarts required.
Sub-Millisecond Performance: Optimized Redis operations and efficient algorithms ensure rate limiting adds less than 1ms p95 latency to your requests.
Before: Hard-coded limits in middleware that require deployments to adjust:
```ts
// Brittle and inflexible
app.use(rateLimit({ windowMs: 15 * 60 * 1000, max: 100 }));
```
After: Configuration-driven limits with tier-based controls:
```ts
app.use(rateLimiter({
  default: { capacity: 100, refillRate: 1 },
  tiers: {
    premium: { capacity: 1000, refillRate: 10 },
    enterprise: { capacity: 5000, refillRate: 50 }
  }
}));
```
Instead of guessing why clients are getting rate limited, you get structured metrics and logs:
```text
# Automatic Prometheus metrics
rate_limit_blocked_total{tier="free", reason="capacity_exceeded"} 1547
rate_limit_allowed_total{tier="premium"} 89234
```

Structured logging with trace correlation:

```json
{
  "level": "info",
  "msg": "rate limit applied",
  "trace_id": "abc123",
  "user_tier": "premium",
  "remaining_tokens": 847,
  "reset_time": 1698765432
}
```
When Redis is unavailable, the system doesn't crash or allow unlimited access—it degrades gracefully:
```ts
// Automatic fallback with conservative limits
if (redisUnavailable) {
  // Apply 1 req/sec fallback per client
  // Log error with trace ID
  // Return 503 if fallback budget exceeded
}
```
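A minimal sketch of what that in-process fallback could look like; the names and the 1 req/sec budget mirror the error-handling rules below, and everything here is illustrative rather than the shipped implementation:

```ts
// Hypothetical in-memory fallback budget: roughly 1 request/second per client
// key while Redis is unreachable. Names and numbers are illustrative.
const FALLBACK_RATE = 1; // tokens per second
const fallbackBudgets = new Map<string, { tokens: number; last: number }>();

function allowFallback(key: string, now = Date.now()): boolean {
  const entry = fallbackBudgets.get(key) ?? { tokens: 1, last: now };
  // Refill proportionally to the wall-clock time since the last check.
  entry.tokens = Math.min(1, entry.tokens + ((now - entry.last) / 1000) * FALLBACK_RATE);
  entry.last = now;
  const allowed = entry.tokens >= 1;
  if (allowed) entry.tokens -= 1;
  fallbackBudgets.set(key, entry);
  return allowed; // caller responds 503 when this is false
}
```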
```bash
npm install express redis prom-client
```
```ts
import express from 'express';
import { rateLimiter } from './rate-limiter';

const app = express();

// Configure adaptive rate limiting
app.use(rateLimiter({
  algorithms: {
    default: 'token_bucket',
    burst_protection: 'leaky_bucket'
  },
  tiers: {
    free: { capacity: 100, refillRate: 1 },
    premium: { capacity: 1000, refillRate: 10 }
  },
  redis: {
    url: process.env.REDIS_URL,
    cluster: true
  }
}));
```
The middleware automatically exposes metrics and adds request context:
```ts
app.get('/api/data', (req, res) => {
  // Access rate limit context added by middleware
  const { remaining, resetTime } = req.rateLimit;

  if (remaining < 10) {
    // Queue heavy operations when near limit
    queueHeavyJob(req.body);
  }

  res.json({ data: processRequest(req.body) });
});
```
Set up monitoring based on the built-in metrics:
```yaml
# Prometheus alert rules
groups:
  - name: rate_limiting
    rules:
      - alert: HighRateLimitBlocking
        expr: rate(rate_limit_blocked_total[5m]) > 100
        annotations:
          summary: "High rate of blocked requests"
```
Performance: Sub-millisecond middleware latency with Redis cluster support for millions of requests per second
Reliability: Graceful degradation when dependencies fail—your API stays available with conservative fallback limits
Operational Efficiency: Configuration-driven limits eliminate deployments for rate limit adjustments. Real-time metrics provide immediate insight into traffic patterns and abuse detection.
Developer Experience: Comprehensive TypeScript types, extensive test coverage, and clear error messages make the system easy to integrate and debug.
Client Relations: Standard HTTP headers and precise retry guidance help API consumers implement proper backoff strategies, reducing support overhead.
Stop wrestling with brittle rate limiting that breaks under pressure. Build adaptive traffic control that scales with your API and provides the observability you need to optimize performance in production.
You are an expert in:
- TypeScript 5.x / ECMAScript 2023
- Node.js 20 LTS
- Express 4.x
- Redis 7.x (single-node or clustered)
- Prometheus & Grafana for metrics
- Kubernetes (Ingress-NGINX / Gateway API)
Key Principles
- Build defensive, self-contained middleware that enforces fair usage without impacting latency.
- All limits are data-driven and adaptive; never hard-code magic numbers.
- Always communicate remaining quota and reset time through standard headers (see the sketch after this list).
- Separate compute (application) and state (rate-limit counters) so scaling the API does not break consistency.
- Fail-closed: if the rate-limit store is unavailable, default to sensible low limits to protect the system.
- Instrument everything: emit metrics, logs, and traces for every decision path.
- Provide override mechanisms (service tokens, premium tiers) via configuration, not code changes.
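For the header principle above, one possible shape is sketched below; the draft IETF `RateLimit-*` names are an assumption, and `X-RateLimit-*` variants are equally common:

```ts
import type { Response } from "express";

// Illustrative quota headers; the header names and the meaning of "reset"
// (epoch seconds vs. delta seconds) are conventions to choose, not fixed here.
function setQuotaHeaders(res: Response, limit: number, remaining: number, resetSeconds: number): void {
  res.set("RateLimit-Limit", String(limit));
  res.set("RateLimit-Remaining", String(remaining));
  res.set("RateLimit-Reset", String(resetSeconds));
}
```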
TypeScript Rules
- `strict` compiler option MUST be enabled; use `es2023` target.
- Export pure functions (`function` keyword) for algorithms (token bucket, leaky bucket, sliding window).
- Use `interface` for DTOs (`RateLimitConfig`, `RateLimitResult`); both conventions are sketched after this list.
- Prefer `readonly` modifiers and `const` assertions to guarantee immutability of algorithm parameters.
- File layout per feature:

  ```text
  ├── index.ts      // public entry: Express middleware factory
  ├── algorithms/   // pure, stateless implementations
  ├── storage/      // Redis adapter, in-memory fallback
  ├── types/        // shared interfaces & enums
  └── tests/
  ```
- Never use `any`; fall back to `unknown` with exhaustive narrowing.
- Reject deeply nested `if` blocks—use early returns and guard clauses.
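Under these rules the algorithm layer stays pure: current state goes in, a decision and the next state come out, and Redis I/O lives elsewhere. A minimal token-bucket sketch, with field and function names that are illustrative rather than a required API:

```ts
// Illustrative DTOs and a pure token-bucket step; all names are assumptions.
export interface RateLimitConfig {
  readonly capacity: number;
  readonly refillRate: number; // tokens per second
}

export interface RateLimitResult {
  readonly allowed: boolean;
  readonly remaining: number;
  readonly resetTs: number; // epoch seconds at which the bucket is full again
}

interface BucketState {
  readonly tokens: number;
  readonly lastRefillTs: number; // epoch seconds
}

export function consumeToken(
  config: RateLimitConfig,
  state: BucketState,
  nowTs: number
): { result: RateLimitResult; next: BucketState } {
  // Refill based on elapsed time, capped at capacity.
  const refilled = Math.min(config.capacity, state.tokens + (nowTs - state.lastRefillTs) * config.refillRate);
  const allowed = refilled >= 1;
  const tokens = allowed ? refilled - 1 : refilled;
  return {
    result: {
      allowed,
      remaining: Math.floor(tokens),
      resetTs: nowTs + Math.ceil((config.capacity - tokens) / config.refillRate),
    },
    next: { tokens, lastRefillTs: nowTs },
  };
}
```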
Error Handling and Validation
- Validate all incoming identifiers (API key, userId, IP) with runtime type guards before looking up counters.
- Return HTTP 429 with JSON body:
`{ error: "RATE_LIMIT_EXCEEDED", retry_after: number }`.
- Add `Retry-After` header in seconds, computed as `ceil(resetTs - now)` (see the sketch after this list).
- When the underlying store is unreachable:
• Log at `error` level with trace-id.
• Apply a conservative fallback of 1 request / second per key.
• Return HTTP 503 if fallback budget is exhausted.
- Wrap Redis calls in `try/catch`; surface only sanitized messages to clients.
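A small sketch of that 429 contract, assuming `resetTs` is an epoch timestamp in seconds; the helper name is illustrative:

```ts
import type { Response } from "express";

// Sends the 429 body and Retry-After header described above.
function sendRateLimited(res: Response, resetTs: number): void {
  const retryAfter = Math.max(0, Math.ceil(resetTs - Date.now() / 1000));
  res
    .status(429)
    .set("Retry-After", String(retryAfter))
    .json({ error: "RATE_LIMIT_EXCEEDED", retry_after: retryAfter });
}
```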
Express-Specific Rules
- Register the rate-limiter as the first middleware after authentication.
- Signature: `rateLimiter(config: RateLimitConfig): RequestHandler`.
- Extract `req.rateLimit` (added by middleware) so downstream handlers can adapt (e.g., queue heavy jobs if near limit).
- Never block the event loop—use Redis atomic Lua scripts or `EVALSHA` for counter updates.
- Cache static configuration (tiers, bucket sizes) in memory and hot-reload on `SIGHUP`.
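A sketch of that hot-reload behavior, assuming the configuration lives in a JSON file; the path and config shape are assumptions:

```ts
import { readFileSync } from "node:fs";

const CONFIG_PATH = "/etc/rate-limiter/config.json"; // illustrative location

// Load once at startup, then re-read on SIGHUP without restarting the process.
let cachedConfig = JSON.parse(readFileSync(CONFIG_PATH, "utf8"));

process.on("SIGHUP", () => {
  try {
    cachedConfig = JSON.parse(readFileSync(CONFIG_PATH, "utf8"));
    console.log("rate-limit config reloaded");
  } catch (err) {
    // Keep serving with the previous config if the new file is malformed.
    console.error("config reload failed; keeping previous config", err);
  }
});
```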
Algorithms
- Token Bucket (default):
• Parameters: `capacity`, `refillRate` (tokens/sec).
• Redis keys: `tb:{key}`.
• Use `HSET` to store `{tokens, last_refill_ts}`.
- Sliding Window Log (high accuracy):
• Parameters: `windowSize`, `maxHits`.
• Redis key is a sorted set `sw:{key}` with timestamps as scores.
• Trim with `ZREMRANGEBYSCORE`, then `ZCARD` for the count (see the sketch after this list).
- Leaky Bucket (burst smoothing):
• Parameters: `ratePerSec`, `queueSize`.
• Implement via atomic list push + Lua pop by time.
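As an illustration, the sliding-window check maps onto a handful of Redis commands. This sketch assumes the `ioredis` client; for strict atomicity between the count and the insert, the same steps would move into a Lua script:

```ts
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Sliding-window-log check against a sorted set keyed as sw:{key}; the braces
// also act as a Redis Cluster hash-tag. windowSizeMs/maxHits map to the
// windowSize/maxHits parameters above.
async function slidingWindowAllow(key: string, maxHits: number, windowSizeMs: number): Promise<boolean> {
  const now = Date.now();
  const res = await redis
    .multi()
    .zremrangebyscore(`sw:{${key}}`, 0, now - windowSizeMs) // drop expired hits
    .zcard(`sw:{${key}}`) // count what remains in the window
    .exec();
  const count = Number(res?.[1]?.[1] ?? 0);
  if (count >= maxHits) return false;

  await redis
    .multi()
    .zadd(`sw:{${key}}`, now, `${now}:${Math.random()}`) // record this hit
    .pexpire(`sw:{${key}}`, windowSizeMs) // garbage-collect idle keys
    .exec();
  return true;
}
```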
Testing
- Use `vitest` for unit tests; mock Redis with `@redis-mock/client`.
- Cover:
• Algorithm correctness (edge cases: zero tokens, refill); a test sketch follows this list
• Concurrency (simulate 100 parallel requests)
• Failure modes (Redis down)
- Achieve ≥ 90 % branch coverage.
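A minimal vitest sketch of the algorithm-correctness cases, assuming a pure token-bucket step like the one sketched under TypeScript Rules; the import path follows the file layout above and is otherwise hypothetical:

```ts
import { describe, expect, it } from "vitest";
import { consumeToken } from "../algorithms/token-bucket";

describe("token bucket", () => {
  const config = { capacity: 2, refillRate: 1 }; // 2 tokens max, 1 token/second

  it("blocks when the bucket is empty", () => {
    const { result } = consumeToken(config, { tokens: 0, lastRefillTs: 100 }, 100);
    expect(result.allowed).toBe(false);
  });

  it("refills over time", () => {
    // One second has elapsed, so one token should have been refilled.
    const { result } = consumeToken(config, { tokens: 0, lastRefillTs: 100 }, 101);
    expect(result.allowed).toBe(true);
  });
});
```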
Performance & Scalability
- Ensure middleware latency ≤ 1 ms p95 (measured without network).
- Use Redis Cluster with key hash-tags to keep all counters for a user on the same shard (illustrated after this list).
- Enable `tcp_nodelay` and pipeline reads when updating multiple keys.
- In Kubernetes, prefer `PodDisruptionBudget` to avoid counter loss during rolling updates.
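A small illustration of the hash-tag convention: Redis Cluster hashes only the substring inside `{...}`, so any keys sharing that tag land on the same slot. The helper below is hypothetical:

```ts
// Only the {...} placement matters; the prefix outside it does not affect the slot.
function counterKey(prefix: "tb" | "sw" | "lb", clientKey: string): string {
  return `${prefix}:{${clientKey}}`;
}

// counterKey("tb", "user:42") -> "tb:{user:42}"
// counterKey("sw", "user:42") -> "sw:{user:42}"  (same cluster slot as above)
```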
Observability
- Expose Prometheus metrics (registration is sketched after this list):
• `rate_limit_allowed_total{key_tier="free"}`
• `rate_limit_blocked_total{reason="exceeded"}`
• Histogram `rate_limit_middleware_duration_seconds`.
- Correlate logs with the request's trace-id (e.g., `req.id` populated from an `X-Request-Id` header).
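Registration with `prom-client` could look like the sketch below; the metric names follow the rules above, while the label sets and buckets are illustrative:

```ts
import client from "prom-client";

export const allowedTotal = new client.Counter({
  name: "rate_limit_allowed_total",
  help: "Requests allowed by the rate limiter",
  labelNames: ["key_tier"],
});

export const blockedTotal = new client.Counter({
  name: "rate_limit_blocked_total",
  help: "Requests blocked by the rate limiter",
  labelNames: ["reason"],
});

export const middlewareDuration = new client.Histogram({
  name: "rate_limit_middleware_duration_seconds",
  help: "Time spent inside the rate-limit middleware",
  buckets: [0.0005, 0.001, 0.005, 0.01], // keeps the sub-millisecond target visible
});

// Inside the middleware:
//   const stop = middlewareDuration.startTimer();
//   ...make the decision...
//   allowedTotal.inc({ key_tier: tier });
//   stop();
```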
Security
- Sanitize all user-supplied keys to prevent Redis key injection.
- Do not reflect quota values for anonymous users to avoid enumeration.
- Store secrets (Redis password, tier overrides) in Kubernetes Secrets, not env files.
Common Pitfalls & Fixes
- ❌ Hard-coding limits in code → ✅ Read from `RateLimitConfig` backed by etcd/Consul.
- ❌ Using IP alone → ✅ Combine API key + userId + IP for better fairness.
- ❌ Ignoring clock skew → ✅ Use Redis `TIME` command inside Lua script as single source of truth.
- ❌ Linear backoff on client → ✅ Recommend exponential backoff 2^n with jitter up to 1 s.
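A client-side sketch of that backoff recommendation; the base delay and cap are illustrative:

```ts
// Exponential backoff with up to 1 s of random jitter, capped at maxMs.
function retryDelayMs(attempt: number, baseMs = 1000, maxMs = 60_000): number {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const jitter = Math.random() * 1000;
  return exponential + jitter;
}

// attempt 0 -> ~1-2 s, attempt 1 -> ~2-3 s, attempt 2 -> ~4-5 s, ...
```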
Example Usage
```ts
import express from "express";
import { rateLimiter } from "./index";
const app = express();
app.use(rateLimiter({
  default: { capacity: 100, refillRate: 1 },
  tiers: {
    premium: { capacity: 1000, refillRate: 10 }
  }
}));
app.get("/v1/data", (_, res) => res.json({ ok: true }));
app.listen(8080);
```