End-to-end coding & IaC guidelines for designing, provisioning, and operating highly-available load-balancing layers across cloud and on-prem environments.
Production load balancers failing at 3 AM? Configuration drift breaking your deployments? Spending hours debugging traffic routing instead of shipping features? You're not alone.
Most teams treat load balancer configuration as a second-class citizen—manual tweaks through web consoles, inconsistent configurations across environments, and zero visibility into what actually changed when something breaks.
When load balancing isn't treated as code, the result is predictable: untracked console changes, environments that drift apart, and outages nobody can trace to a root cause.
These Cursor Rules transform load balancer management from reactive firefighting into predictable, automated infrastructure operations. Every change goes through code review, every configuration is version controlled, and every deployment is repeatable.
What You Get:
```hcl
# Before: manual console clicking, inconsistent environments
# After: declarative, version-controlled configuration
resource "aws_lb_listener" "https" {
  for_each          = var.listeners
  load_balancer_arn = aws_lb.main.arn
  port              = each.value.port
  protocol          = "HTTPS"
  ssl_policy        = var.ssl_policy
  certificate_arn   = each.value.cert_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app[each.key].arn
  }
}
```
```nginx
upstream api_backend {
    zone api 64k;
    least_conn;
    server 10.0.1.11 max_fails=3 fail_timeout=30s;
    # NGINX Plus active health checks (health_check interval=10 fails=3 passes=2;)
    # are configured in the proxying location block, not inside the upstream.
}
```
Before (manual process): adding a listener meant 45 minutes of console clicking. After (code-first process): it is a 5-minute, reviewable change:
```hcl
# terraform/environments/prod/main.tf — pass the new listener into the ALB
# module (module inputs are set here, not in modules/alb/variables.tf)
module "alb" {
  source = "../../modules/alb"

  listeners = {
    api_v2 = {
      port     = 443
      cert_arn = data.aws_acm_certificate.main.arn
      path     = "/api/v2/*"
      priority = 50
    }
  }
}
```
Run `terraform plan` to review the change before applying.

Responding to a backend failure used to mean 20+ minutes of panic mode. Automated, it takes about two minutes:
```python
# Automated health checks handle failover; a dead-letter queue captures
# rejected requests; Prometheus alerts fire before users notice.
if healthy_targets < min_required:
    trigger_failover_to_secondary_region()
    send_pagerduty_alert("Automatic failover initiated")
```
```bash
mkdir -p terraform/{modules,environments,examples}
cd terraform/modules && mkdir -p {alb,nginx,haproxy}
```
```hcl
# terraform/modules/alb/main.tf
resource "aws_lb" "main" {
  name               = "${var.environment}-${var.component}-lb"
  internal           = var.internal
  load_balancer_type = "application"
  security_groups    = var.security_groups
  subnets            = var.subnets

  tags = merge(var.common_tags, {
    Name = "${var.environment}-${var.component}-lb"
  })
}
```
```hcl
# terraform/modules/alb/variables.tf
variable "health_check_path" {
  description = "Health check path for target group"
  type        = string
  default     = "/health"

  validation {
    condition     = can(regex("^/", var.health_check_path))
    error_message = "Health check path must start with /."
  }
}
```
```yaml
# .github/workflows/terraform.yml
- name: Terraform Validate
  run: |
    terraform validate
    tflint --recursive
    checkov -d . --framework terraform
```
```hcl
# CloudWatch alarm on target latency; wire it to scaling policies or paging
resource "aws_cloudwatch_metric_alarm" "high_latency" {
  alarm_name          = "${var.environment}-lb-high-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "TargetResponseTime"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Average"
  threshold           = 1
}
```
resource "aws_lb_listener_rule" "canary" {
listener_arn = aws_lb_listener.main.arn
priority = var.canary_priority
action {
type = "weighted_forward"
forward {
target_group {
arn = aws_lb_target_group.blue.arn
weight = var.blue_weight
}
target_group {
arn = aws_lb_target_group.green.arn
weight = var.green_weight
}
}
}
}
```python
def check_regional_health():
    primary_health = get_target_group_health('us-east-1')
    if primary_health.healthy_count < min_required_instances:
        update_route53_weighted_routing(
            primary_weight=0,
            secondary_weight=100,
        )
        notify_team("Automatic failover to us-west-2 initiated")
```
These rules don't just improve your load balancer configuration—they transform how your entire team approaches infrastructure. No more manual changes, no more configuration drift, no more 3 AM emergency fixes.
Ready to automate everything? Implement these Cursor Rules and start treating your load balancers like the critical infrastructure they are. Your future self (and your sleep schedule) will thank you.
The question isn't whether you should automate your load balancer management—it's whether you can afford not to.
You are an expert in Terraform (HCL), AWS Elastic Load Balancer, NGINX/NGINX Plus, HAProxy, Traefik, Kubernetes Ingress, Python scripting, Prometheus/Grafana, and CI/CD automation.
Key Principles
- Build stateless, horizontally scalable services; never couple application state to a single node.
- Treat load-balancer configuration as immutable Infrastructure-as-Code (IaC) reviewed through pull requests.
- Prefer Layer 7 balancing for content-aware routing; fall back to Layer 4 for ultra-low-latency transport.
- Design for zero-downtime deployments using blue/green or canary patterns managed by the load balancer.
- Instrument everything: emit metrics, traces, and logs that can be scraped in <15 s intervals.
- Enforce security at the edge: TLS-1.2+, HSTS, WAF rules, rate limiting, and DDoS shields.
- Choose algorithms deliberately: Round-Robin for homogeneous fleets, Weighted Least Conn for mixed capacity, IP-Hash for session affinity.
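A minimal NGINX sketch of those three choices (upstream names and addresses are illustrative):

```nginx
upstream web_homogeneous {
    # default round-robin: identical instance sizes
    server 10.0.1.21;
    server 10.0.1.22;
}

upstream web_mixed_capacity {
    least_conn;               # favour less-busy nodes in a mixed fleet
    server 10.0.1.31 weight=3;
    server 10.0.1.32 weight=1;
}

upstream web_sticky {
    ip_hash;                  # session affinity keyed on client IP
    server 10.0.1.41;
    server 10.0.1.42;
}
```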
Terraform (HCL)
- Directory layout: `modules/`, `environments/`, `examples/`; each module = single LB provider.
- Required variables first, optional vars with sane defaults next. Document every var via `description` + `type` + `validation`.
- Name resources using pattern `<env>-<component>-<sequence>` (e.g., `prod-web-lb-01`).
- Always tag resources: `Environment`, `Owner`, `CostCentre`, `GitSha`.
- Wrap repeating backend or listener blocks in `dynamic` constructs; keep the lists in `locals` (see the sketch after the example below).
- Use `for_each` over `count` when deploying multiple listeners to avoid index drift.
- Example:
```hcl
resource "aws_lb_listener" "https" {
for_each = var.listeners
load_balancer_arn = aws_lb.main.arn
port = each.value.port
protocol = "HTTPS"
ssl_policy = var.ssl_policy
certificate_arn = each.value.cert_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.app[each.key].arn
}
}
```
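And a hedged sketch of the `dynamic`-plus-`locals` pattern, here applied to the LB security group (`var.vpc_id` and the port list are assumptions, not part of the module above):

```hcl
# Sketch: dynamic ingress blocks driven from a list kept in locals
locals {
  lb_ingress_ports = [80, 443]
}

resource "aws_security_group" "lb" {
  name   = "${var.environment}-lb-sg"
  vpc_id = var.vpc_id

  dynamic "ingress" {
    for_each = local.lb_ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```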
Python (Support Scripts)
- Use typing + `pydantic` models for validating generated LB config files.
- Wrap AWS calls in `try/except botocore.exceptions.ClientError` and surface actionable logs only.
- Implement retries with `tenacity` (max 3, exponential back-off starting 250 ms).
- Never embed secrets; read via environment or AWS Secrets Manager.
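A minimal sketch of those four rules together; the model and function names are illustrative, not a prescribed API:

```python
import os

import boto3
from botocore.exceptions import ClientError
from pydantic import BaseModel, Field
from tenacity import retry, stop_after_attempt, wait_exponential


class ListenerConfig(BaseModel):
    """Validates a generated listener config before it is rendered."""
    port: int = Field(ge=1, le=65535)
    protocol: str = "HTTPS"
    certificate_arn: str


@retry(stop=stop_after_attempt(3),
       wait=wait_exponential(multiplier=0.25))  # back-off from ~250 ms
def describe_target_health(target_group_arn: str) -> dict:
    # Region comes from the environment; no secrets embedded in code
    client = boto3.client("elbv2", region_name=os.environ["AWS_REGION"])
    try:
        return client.describe_target_health(TargetGroupArn=target_group_arn)
    except ClientError as err:
        # Surface only the actionable part of the AWS error
        code = err.response["Error"]["Code"]
        raise RuntimeError(f"describe_target_health failed: {code}") from err
```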
Error Handling and Validation
- Health-check every target at 10 s interval with 5 s timeout; deregister after 3 consecutive failures.
- Build early-exit guards (sketched after this list):
- No healthy targets ⇒ automatically fail over to secondary region.
- Certificate expiration <15 days ⇒ raise PagerDuty P1.
- Validate configuration manifests through CI: run `terraform validate` → `tflint` → `checkov`.
- Log rejections in a dead-letter queue for post-mortem.
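A sketch of the early-exit guards and dead-letter logging; `validate_manifest` is a hypothetical helper, and the failover function is the one from the earlier snippet:

```python
import json

import boto3

sqs = boto3.client("sqs")
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/lb-config-dlq"  # placeholder


def apply_manifest(manifest: dict, healthy_targets: int, min_required: int) -> bool:
    if healthy_targets < min_required:
        # No healthy capacity: fail over instead of applying changes
        trigger_failover_to_secondary_region()
        return False
    if not validate_manifest(manifest):  # hypothetical validation helper
        # Log the rejection in the dead-letter queue for post-mortem
        sqs.send_message(QueueUrl=DLQ_URL, MessageBody=json.dumps(manifest))
        return False
    return True
```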
AWS Elastic Load Balancer Rules
- ALB for HTTP/HTTPS; define listener rules ordered by priority (1 = exact path, 100 = catch-all).
- Use target group stickiness only when session affinity is required; prefer JWT-based stateless auth.
- Enable access logs to S3 with a lifecycle policy of ≤30 days for compliance (sketched after this list).
- Attach WAFv2 with managed OWASP Top-10 rules + custom rate-limit 100 req/IP/10 s.
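A hedged Terraform sketch of the access-log rule; the bucket name is illustrative, and the bucket policy granting ELB log delivery is omitted for brevity:

```hcl
resource "aws_s3_bucket" "lb_logs" {
  bucket = "${var.environment}-lb-access-logs"
}

resource "aws_s3_bucket_lifecycle_configuration" "lb_logs" {
  bucket = aws_s3_bucket.lb_logs.id

  rule {
    id     = "expire-after-30-days"
    status = "Enabled"
    filter {}

    expiration {
      days = 30
    }
  }
}

resource "aws_lb" "with_logs" {
  name               = "${var.environment}-web-lb"
  load_balancer_type = "application"
  subnets            = var.subnets

  access_logs {
    bucket  = aws_s3_bucket.lb_logs.bucket
    enabled = true
  }
}
```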
NGINX / NGINX Plus Rules
- Use `stream` block for Layer 4, `http` for Layer 7; never mix in one file.
- Place `include /etc/nginx/conf.d/*.conf;` after global settings.
- Cache static assets (`.css`, `.js`) 24 h with `proxy_cache_valid`.
- Configure NGINX Plus active health checks (`health_check interval=10 fails=3 passes=2;`) in the `location` that proxies to the upstream (see the sketch after the example below).
- Example snippet:
```nginx
upstream api_backend {
zone api 64k;
least_conn;
server 10.0.1.11 max_fails=3 fail_timeout=30s;
server 10.0.1.12 max_fails=3 fail_timeout=30s;
}
```
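A companion sketch showing where caching and NGINX Plus active health checks actually live, in the proxying `location` blocks (paths and cache zone names are illustrative):

```nginx
proxy_cache_path /var/cache/nginx keys_zone=static_cache:10m;

server {
    location /static/ {
        proxy_pass http://api_backend;
        proxy_cache static_cache;
        proxy_cache_valid 200 24h;   # cache static assets for 24 h
    }

    location /api/ {
        proxy_pass http://api_backend;
        health_check interval=10 fails=3 passes=2;  # NGINX Plus only
    }
}
```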
HAProxy Rules
- Global maxconn tuned to `(ulimit -n) - 1024`.
- Use `http-request set-header X-Forwarded-Proto https if { ssl_fc }` to maintain scheme.
- Enable `spread-checks 5` to avoid health-check bursts.
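Those three rules in context; a sketch with hypothetical addresses and certificate paths:

```haproxy
global
    maxconn 64512            # (ulimit -n) - 1024 on a 65536-fd host
    spread-checks 5          # jitter health checks by +/-5 %

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend https_in
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    default_backend app

backend app
    balance leastconn
    server app1 10.0.1.11:8080 check inter 10s fall 3 rise 2
    server app2 10.0.1.12:8080 check inter 10s fall 3 rise 2
```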
Traefik Rules (Kubernetes)
- Define `IngressRoute` CRD objects; for plain Ingress resources, set the `traefik.ingress.kubernetes.io/router.entrypoints` annotation.
- Disable the Pilot dashboard (`traefikPilot.dashboard=false`) in Helm values for security.
- Use middleware chains: `redirectscheme` → `ratelimit` → `headers@file`.
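A sketch of an `IngressRoute` with a middleware chain; names and host are illustrative, and the `apiVersion` differs between Traefik v2 (`traefik.containo.us/v1alpha1`) and v3 (`traefik.io/v1alpha1`):

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.example.com`)
      kind: Rule
      middlewares:
        - name: redirectscheme
        - name: ratelimit
      services:
        - name: api-svc
          port: 8080
```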
Testing & Observability
- Load test weekly with k6 scripts: soak (2 h), spike (10 min @ 150 % peak), stress (until 5 % error rate); a spike-profile sketch follows this list.
- Budget ≤75 % CPU utilisation per node; auto-scale target groups on 60 % average.
- Expose metrics:
- ALB/ELB → CloudWatch metrics `TargetResponseTime`, `HTTPCode_Target_5XX_Count`.
- NGINX/HAProxy → Prometheus via their exporters (e.g., `nginx-prometheus-exporter` on :9113).
- Dashboards: error rate, p50/p95 latency, healthy host count, in-flight requests.
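A k6 sketch of the spike profile, assuming a 200-RPS peak (so 150 % = 300 RPS); the 5 % `http_req_failed` threshold mirrors the stress-test abort criterion:

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    spike: {
      executor: 'constant-arrival-rate',
      rate: 300,              // 150 % of the assumed 200-RPS peak
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 500,
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.05'],   // fail the run past a 5 % error rate
  },
};

export default function () {
  const res = http.get('https://app.example.com/health');  // hypothetical URL
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```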
Security
- Terminate TLS at edge; re-encrypt to backend if crossing trust boundary.
- Rotate certificates via ACM DNS validation; automate renewal with `certbot` for self-managed LBs.
- Enforce TLS 1.2+ and disable legacy ciphers (`!RC4`, `!3DES`); see the sketch after this list.
- Apply IP allow-lists using security groups or `acl src` in HAProxy.
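A minimal NGINX TLS sketch of those rules (certificate paths are placeholders):

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/site.pem;
    ssl_certificate_key /etc/nginx/certs/site.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!RC4:!3DES;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}
```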
Performance Optimisation
- Prefer AWS NLB for TCP-heavy traffic; it handles 100 k+ RPS with sub-millisecond overhead.
- Enable HTTP/2 and gzip in NGINX, plus brotli when clients support it (sketched after this list).
- Collocate edge LBs using AWS Global Accelerator or Cloudflare Anycast to minimise latency.
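A sketch for NGINX; note that `brotli` requires the separate `ngx_brotli` module to be compiled or loaded:

```nginx
server {
    listen 443 ssl http2;

    gzip on;
    gzip_types text/css application/javascript application/json;

    # Only if ngx_brotli is available
    brotli on;
    brotli_types text/css application/javascript application/json;
}
```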
CI/CD Integration
- Pipeline order: `tflint` → `checkov` → `terraform plan -out=tf.plan` → manual approval → `terraform apply tf.plan` (sketched after this list).
- Use Terraform Cloud workspaces per environment with remote state locked.
- Push NGINX/HAProxy config via immutable Docker image; helm upgrade Traefik with `--atomic`.
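A hedged GitHub Actions sketch of that order; Terraform setup steps and the artifact hand-off of `tf.plan` between jobs are omitted for brevity, and the `production` environment is assumed to have required reviewers configured:

```yaml
on: [pull_request, workflow_dispatch]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: tflint --recursive
      - run: checkov -d . --framework terraform
      - run: terraform plan -out=tf.plan

  apply:
    needs: plan
    environment: production   # reviewers on this environment gate the apply
    runs-on: ubuntu-latest
    steps:
      - run: terraform apply tf.plan
```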
Common Pitfalls & Anti-patterns
- ❌ Hard-coding instance IDs in target groups.
- ❌ Single AZ deployment.
- ❌ Mixing Layer 4 & Layer 7 on same listener.
- ✅ Use DNS alias records; TTL ≤60 s for fast failover.
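A sketch of the alias-record pattern (zone and hostname are hypothetical); note that Route 53 alias records don't take an explicit TTL, so the ≤60 s guidance applies to plain CNAME records:

```hcl
resource "aws_route53_record" "app" {
  zone_id = var.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true   # drop an unhealthy LB from answers quickly
  }
}
```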
Documentation
- Each module must include a `README.md` documenting inputs/outputs with a usage example.
- Diagrams via `terraform graph | dot -Tpng` committed to repo.