Opinionated coding & architecture rules for building Go-based, cloud-native distributed systems that proactively eliminate traffic and data hot spots.
Tired of watching your distributed Go services melt down under traffic spikes? Fed up with mysterious performance bottlenecks that emerge from nowhere and crater your SLAs? You know the drill – everything works fine in testing, then production traffic hits and suddenly one node is pegged at 100% while others sit idle.
Hot spots are the silent killers of distributed systems. They happen when traffic, data, or computational load concentrates on specific nodes instead of distributing evenly across your cluster. The result? Cascading failures, degraded user experience, and 3 AM emergency calls.
Chances are, the same thing is brewing in your system right now.
Traditional monitoring catches hot spots after they've already tanked your performance. By then, you're in reactive firefighting mode instead of proactive prevention.
These Cursor Rules implement a battle-tested methodology for building Go-based distributed systems that eliminate hot spots before they occur. Instead of reactive monitoring, you get proactive architecture patterns that distribute load evenly and handle traffic spikes gracefully.
The rules cover the entire stack – from Go code patterns that avoid bottlenecks, to Kubernetes configurations that ensure even pod distribution, to database schemas that prevent data hot spots.
Eliminate Traffic Concentration
Prevent Data Hot Spots
Build Resilient Infrastructure
Gain Operational Visibility
Before: Your user activity table uses sequential IDs, causing all new writes to hit the same database node.
```sql
-- This creates hot spots
CREATE TABLE user_actions (
    id SERIAL PRIMARY KEY,
    user_id UUID,
    action JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);
```
After: The rules guide you to randomly distributed keys (random UUIDs or hash-sharded indexes) that spread writes evenly:
```sql
-- This prevents hot spots
CREATE TABLE user_actions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID,
    action JSONB,
    ts TIMESTAMPTZ DEFAULT clock_timestamp()
);
```
Result: Write performance scales linearly with cluster size instead of bottlenecking on single nodes.
Before: Your Kubernetes ingress uses session affinity, routing power users to the same overwhelmed pods.
After: The rules remove session affinity so the load balancer spreads requests across every ready pod, and push any consistent hashing you still need into the service layer:
```yaml
# Traefik IngressRoute without session affinity
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: api
spec:
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      services:
        - name: api-service
          port: 80
          # No `sticky` block: requests are balanced round-robin across
          # all ready endpoints instead of pinning users to specific pods.
```
Result: Traffic distributes evenly across pods, eliminating performance degradation from user concentration.
Before: Your Go service spawns unlimited goroutines, leading to resource exhaustion under load.
After: The rules enforce goroutine budgeting:
```go
// Bounded goroutine pool prevents resource exhaustion.
var (
    maxGoroutines = runtime.GOMAXPROCS(0) * 4
    semaphore     = make(chan struct{}, maxGoroutines)
)

func handleRequest(ctx context.Context) error {
    select {
    case semaphore <- struct{}{}:
        defer func() { <-semaphore }()
        // Process the request.
        return nil
    case <-ctx.Done():
        // Overloaded or cancelled: return so the caller can reject with HTTP 429.
        return ctx.Err()
    }
}
```
Result: Predictable resource usage and graceful degradation instead of cascading failures.
Copy the rules configuration into your Cursor settings. The rules automatically activate for Go projects with distributed system markers (Kubernetes manifests, Dockerfiles, or microservice directory structures).
Audit your existing table schemas for sequential primary keys. The rules will suggest hash-prefixed alternatives and flag hot spot risks during code review.
Update your Kubernetes ingress and service configurations. The rules provide specific annotations and configuration blocks for Traefik, NGINX Plus, and AWS ALB.
Add the required monitoring dashboards and alerting rules. The rules specify exact metrics (partition heatmaps, tail latency histograms) and alert thresholds (per-node QPS monitoring).
Run the prescribed chaos tests that inject 5x traffic spikes. The rules define pass/fail criteria: p99 latency < 2x baseline with no pod restarts.
Performance Gains
Operational Benefits
Development Velocity
These rules transform hot spot management from reactive firefighting into proactive system design. Your distributed Go services will handle traffic spikes gracefully, scale predictably, and maintain consistent performance under load.
The difference is systematic prevention versus reactive patching. Stop chasing hot spots – build systems that eliminate them by design.
You are an expert in Go, Distributed Systems, CockroachDB, Consistent Hashing, Kubernetes, Traefik, AWS Elastic Load Balancing, and SRE automation.
Key Principles
- Eliminate single-node or single-shard saturation ("hot spots") through uniform data distribution and adaptive traffic steering.
- Design first for horizontal scalability; add vertical scaling only as a stop-gap.
- Push logic to the edge (CDN/sidecars) when possible to lower core-cluster pressure.
- Favor stateless services; keep state in partition-tolerant data stores with automatic rebalancing.
- Prefer idempotent, async, and batched APIs to reduce per-request overhead.
- Build everything observable: every partition, request, and retry must be measurable.
Go (Language-Specific Rules)
- Always pass context.Context as the first param; cancel early on overload (check whether ctx.Err() is context.Canceled or context.DeadlineExceeded).
- Export only load-balanced APIs; internal helpers live in pkg/internal to avoid misuse.
- Use sync/atomic or lock-free structures for hot counters; avoid contended global maps.
- When hashing keys, use a fast non-cryptographic hash such as xxhash or fnv.New64a; never crypto/sha* in hot paths (see the sketch after this list).
- Goroutine budgets: ≤ GOMAXPROCS * 4 outstanding routines per instance; drop excess.
- Enforce structured logging with zap: field keys kebab-case (e.g., partition-id).
- File layout: cmd/, internal/, pkg/, deploy/. Each microservice owns its own Dockerfile & Helm chart.
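To make the hashing and hot-counter rules above concrete, here is a minimal sketch; the `partitionFor` helper, the 64-partition count, and the package name are illustrative assumptions, not part of the rules:

```go
package partition

import (
    "hash/fnv"
    "sync/atomic"
)

const numPartitions = 64 // illustrative partition count

// perPartitionHits tracks request counts without a mutex-guarded global map.
var perPartitionHits [numPartitions]atomic.Uint64

// partitionFor maps a key onto a partition with a cheap, non-cryptographic hash.
func partitionFor(key string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key))
    return h.Sum64() % numPartitions
}

// recordHit bumps the partition's counter with sync/atomic instead of a lock.
func recordHit(key string) {
    perPartitionHits[partitionFor(key)].Add(1)
}
```

Atomic counters keep the hot path lock-free, and fnv.New64a is cheap enough to run on every request.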
Error Handling and Validation
- Reject overload at the ingress layer (HTTP 429) before work starts.
- Implement a token bucket per partition and per IP (leaky-bucket fallback) using redis-cell or an in-process rate limiter such as golang.org/x/time/rate (see the sketch after this list).
- Bulkhead pattern: isolate DB, cache, and external calls with separate worker pools.
- Circuit Breaker default thresholds: ≥50% errors or p95 latency > configured SLO for 30 s triggers open.
- Use early returns; wrap errors with %w and annotate with "partition-id", "node", "trace-id".
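Below is a minimal in-process sketch of the per-partition token bucket described above, using golang.org/x/time/rate; the limits, the `X-Partition-Id` header, and the middleware shape are illustrative assumptions, and redis-cell would replace the in-memory map for limits shared across instances:

```go
package ingress

import (
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

// partitionLimiters holds one token bucket per partition key.
// A production version would evict idle limiters; this is a sketch.
var (
    mu                sync.Mutex
    partitionLimiters = map[string]*rate.Limiter{}
)

// limiterFor lazily creates a bucket: 100 req/s steady state, bursts of 200.
// The numbers are illustrative, not prescribed by the rules.
func limiterFor(partition string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := partitionLimiters[partition]
    if !ok {
        l = rate.NewLimiter(rate.Limit(100), 200)
        partitionLimiters[partition] = l
    }
    return l
}

// RateLimit rejects overload with HTTP 429 before any work starts.
func RateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        partition := r.Header.Get("X-Partition-Id") // hypothetical partition header
        if !limiterFor(partition).Allow() {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}
```

Rejecting at the middleware keeps overload handling at the ingress layer, before any downstream work starts.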
Kubernetes (Framework-Specific Rules)
- Deploy stateful data stores (CockroachDB, Redis Cluster) with podManagementPolicy: Parallel for faster re-balancing.
- Set topologySpreadConstraints to enforce even pod distribution across zones.
- Use Pod Disruption Budgets: minAvailable ≥ replicas – 1 to avoid thundering restarts.
- Enable horizontal pod autoscaler on both CPU and custom metric request_per_second; target 60–70% utilization.
- Ingress: disable sticky sessions on Traefik; if you need key affinity, use consistent hashing where the proxy supports it (e.g., NGINX `hash $arg_user consistent`) or in the service layer (a sketch follows this list), with least-connections as the fallback.
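Where the ingress cannot hash consistently, the service itself can route on a consistent-hash ring; this is a hand-rolled sketch (the `ring` package and its API are illustrative, not a specific library):

```go
package ring

import (
    "fmt"
    "hash/fnv"
    "sort"
)

// Ring is a minimal consistent-hash ring with virtual nodes, so adding or
// removing a backend only remaps a small slice of the keyspace.
type Ring struct {
    replicas int
    hashes   []uint64          // sorted virtual-node hashes
    backends map[uint64]string // virtual-node hash -> backend address
}

func New(replicas int, backends ...string) *Ring {
    r := &Ring{replicas: replicas, backends: map[uint64]string{}}
    for _, b := range backends {
        for i := 0; i < replicas; i++ {
            h := fnvHash64(fmt.Sprintf("%s#%d", b, i))
            r.hashes = append(r.hashes, h)
            r.backends[h] = b
        }
    }
    sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
    return r
}

// Pick returns the backend owning the key: the first virtual node clockwise.
func (r *Ring) Pick(key string) string {
    if len(r.hashes) == 0 {
        return ""
    }
    h := fnvHash64(key)
    i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
    if i == len(r.hashes) {
        i = 0 // wrap around the ring
    }
    return r.backends[r.hashes[i]]
}

func fnvHash64(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}
```

Virtual nodes keep each backend's share of the keyspace roughly even, so scaling a deployment up or down only remaps a small fraction of users.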
CockroachDB (Hot-Spot Avoidance)
- Always include a random UUID prefix or hash in primary keys to ensure even keyspace spread.
Example:
```sql
CREATE TABLE user_actions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID,
    action JSONB,
    ts TIMESTAMPTZ DEFAULT clock_timestamp()
);
```
- Keep load-based range splitting enabled (`kv.range_split.by_load_enabled`) so hot ranges are split automatically.
- Monitor per-node QPS and alert when any node exceeds 1.3 × (cluster QPS / node count).
Traefik / NGINX Plus
- Disable sticky sessions for pure stateless APIs; on Traefik v1 use `loadBalancer.method = "drr"` (dynamic round-robin), on Traefik v2+ rely on the default round-robin, and on NGINX Plus use `least_conn` or `hash ... consistent`.
- Cache-control headers: set `max-age=0` (or `no-store`) on endpoints that mutate state to avoid serving stale responses.
AWS Elastic Load Balancer
- For ALB, enable a 30 s slow start on target groups so newly registered pods warm up instead of being flooded on scale-up.
- Turn cross-zone load balancing on; otherwise AZ imbalance causes hot zones.
Testing & Observability
- Chaos tests: inject 5× traffic spikes for 2 min; assert p99 < 2× baseline and no pod restarts.
- Canary every config change with 1% traffic for 20 min; promote only if error_rate_delta ≤ 0.1%.
- Dashboard minimums: partition heatmap, tail latency histogram, slow query log (CockroachDB), request queue depth (a per-partition metrics sketch follows this list).
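To feed the partition heatmap, each service can export per-partition request counts; here is a minimal sketch with prometheus/client_golang (the metric name and label are illustrative assumptions):

```go
package metrics

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsByPartition backs the partition heatmap: one counter per partition label.
var requestsByPartition = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "requests_by_partition_total", // illustrative metric name
        Help: "Requests handled, labelled by partition, for hot-spot heatmaps.",
    },
    []string{"partition"},
)

func init() {
    prometheus.MustRegister(requestsByPartition)
}

// ObserveRequest records one request against its partition.
func ObserveRequest(partition string) {
    requestsByPartition.WithLabelValues(partition).Inc()
}

// Handler exposes /metrics for Prometheus to scrape.
func Handler() http.Handler {
    return promhttp.Handler()
}
```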
Performance
- Batch writes (≥ 16 rows) or RPCs when possible to amortize coordination cost (a batching sketch follows this list).
- Use HTTP/2 multiplexing; avoid keep-alive timeouts shorter than 30 s so connections are reused instead of churned.
- Edge compute: run WebAssembly filters on Fastly for auth and routing decisions to shed invalid traffic early.
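A minimal sketch of the write-batching rule, assuming a hypothetical `flush` callback (for example, a multi-row INSERT); batches go out at 16 rows or on a short timer, whichever comes first:

```go
package batcher

import "time"

const batchSize = 16 // flush threshold from the performance rules

// Batcher coalesces rows so the datastore sees one write per batch
// instead of one coordination round-trip per row.
type Batcher struct {
    rows  chan []byte
    flush func(batch [][]byte) // hypothetical sink, e.g. a multi-row INSERT
}

func New(flush func([][]byte)) *Batcher {
    b := &Batcher{rows: make(chan []byte, 1024), flush: flush}
    go b.loop()
    return b
}

// Add enqueues a row; the background loop amortizes the write cost.
func (b *Batcher) Add(row []byte) { b.rows <- row }

func (b *Batcher) loop() {
    ticker := time.NewTicker(50 * time.Millisecond) // illustrative flush interval
    defer ticker.Stop()
    var pending [][]byte
    for {
        select {
        case row := <-b.rows:
            pending = append(pending, row)
            if len(pending) >= batchSize {
                b.flush(pending)
                pending = nil
            }
        case <-ticker.C:
            if len(pending) > 0 {
                b.flush(pending)
                pending = nil
            }
        }
    }
}
```

The timer bounds the extra latency a row can pick up while waiting for its batch to fill.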
Security
- Validate client-supplied keys against length and charset limits to blunt hash-flooding attacks (a validation sketch follows this list).
- Encrypt all inter-service traffic with mTLS (SPIRE or cert-manager); rotate every 24 h.
- Limit public ingress to only required paths; expose admin endpoints on cluster-IP only.
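A minimal sketch of the key-validation rule above; the 64-character bound and the allowed charset are illustrative assumptions:

```go
package validate

import "regexp"

// keyPattern allows only short, URL-safe keys: lowercase alphanumerics,
// hyphens, and underscores, 1 to 64 characters. Anything else is rejected
// before it reaches hashing or storage.
var keyPattern = regexp.MustCompile(`^[a-z0-9_-]{1,64}$`)

// ValidKey reports whether a client-supplied key is safe to hash and store.
func ValidKey(key string) bool {
    return keyPattern.MatchString(key)
}
```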
Common Pitfalls
- Sequential UUIDs or timestamp prefixes ➜ immediate range hot spot.
- Sticky sessions on L7 LB ➜ single pod hot spot.
- Region-specific data affinity without correct multi-region replicas ➜ cross-region hot spot.
Checklist (Before Merge)
[ ] Key hashing logic includes randomness.
[ ] Rate-limiting and circuit breaker configs committed and tested.
[ ] Alert rules for per-node QPS and tail latency defined.
[ ] k6/locust performance profile attached in PR.
[ ] Helm chart topologySpreadConstraints reviewed.