Opinionated rules for designing, implementing, testing, and operating HTTP health checks (liveness, readiness, startup) in Node.js/TypeScript microservices running on Kubernetes and observed with Prometheus/Grafana.
Your Node.js microservices are failing silently, and you're finding out from angry customers instead of monitoring dashboards. These Cursor Rules transform your health checking from reactive firefighting to proactive system observability.
You've probably been there: your service passes its basic "hello world" health check while the database connection pool is exhausted, Redis is timing out, and users are getting 500 errors. Standard health checks are theater—they tell you the process is running, not whether it can actually serve traffic.
Here's what's wrong with most Node.js health check implementations:
Everything gets crammed into a single /health endpoint, so the probe can't distinguish "restart this process" from "stop routing traffic to it," and a passing check says nothing about whether your dependencies are actually reachable.

These Cursor Rules implement the three-probe pattern that actually works in production Kubernetes environments:
```text
// What you get: properly separated concerns
GET /health/liveness   // "Is my process alive?"     (K8s decides whether to restart)
GET /health/readiness  // "Can I serve traffic?"     (K8s decides whether to route traffic)
GET /health/startup    // "Am I done initializing?"  (K8s waits before running other probes)
```
Each probe tests exactly what it needs to test, completes in under 100ms, and returns actionable status information that both Kubernetes and your monitoring stack can act on immediately.
Eliminate Production Debugging Sessions
Accelerate Deployment Confidence
Reduce Mean Time to Resolution
Before: Your service returns 200 OK on /health while throwing database timeout errors on actual requests. You discover the issue when users report problems, then spend 20 minutes figuring out it's a connection pool issue.
After: Your readiness probe fails when the database indicator can't complete SELECT 1 within 100ms. Kubernetes stops routing traffic immediately, Prometheus fires an alert, and you're investigating before users are affected.
```ts
// Readiness probe catches this immediately
export const dbIndicator: HealthIndicator = {
  name: 'postgres',
  async check() {
    try {
      await pgPool.query('SELECT 1');
      return { status: 'UP' };
    } catch (e) {
      return { status: 'DOWN', details: { error: 'Connection pool exhausted' } };
    }
  },
};
```
Before: Your application starts responding slowly because Redis is timing out, but health checks pass. Performance degrades gradually until someone notices the dashboard.
After: Your readiness probe includes Redis connectivity. When Redis starts timing out, the service becomes unready, traffic shifts to healthy instances, and you get an immediate alert with specific failure details.
Before: Your pipeline deploys successfully, but the new version can't connect to a required service. The deployment "succeeds" but serves errors until you manually investigate.
After: Your pipeline's health-check job runs immediately after container start:
```yaml
- name: Verify service health before deployment
  run: |
    docker compose up -d
    npx wait-on http://localhost:3000/health/readiness -t 10000
    # Deployment only proceeds if readiness passes
```
Install the dependencies the generated code and tests rely on:

```bash
npm install express @types/express
npm install -D jest supertest @types/jest @types/supertest
```
The rules generate this file structure automatically:
```text
src/health/
├── router.ts        # Express routes for all health endpoints
├── indicators.ts    # Individual health check functions
├── types.ts         # TypeScript interfaces
└── __tests__/       # Jest test suite
```
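The interfaces in `types.ts` are small; the sketch below shows one plausible shape, with `HealthReport` mirroring the JSON response documented further down (status, uptimeSeconds, timestamp, checks):

```ts
// types.ts - a minimal sketch of the shared interfaces
export type HealthStatus = 'UP' | 'DOWN';

export interface HealthIndicator {
  name: string;
  check(): Promise<{ status: HealthStatus; details?: unknown }>;
}

export interface HealthReport {
  status: HealthStatus;
  uptimeSeconds: number;
  timestamp: string; // ISO-8601
  checks: Record<string, HealthStatus>;
}
```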
```ts
// indicators.ts - Cursor Rules generate this pattern
export const dbIndicator: HealthIndicator = {
  name: 'postgres',
  async check() {
    try {
      await pgPool.query('SELECT 1');
      return { status: 'UP' };
    } catch (e) {
      return { status: 'DOWN', details: { error: (e as Error).message } };
    }
  },
};

export const redisIndicator: HealthIndicator = {
  name: 'redis',
  async check() {
    try {
      await redisClient.ping();
      return { status: 'UP' };
    } catch (e) {
      return { status: 'DOWN', details: { error: 'Redis unreachable' } };
    }
  },
};
```
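A matching `router.ts` could look like the sketch below. It is illustrative rather than generated output: it assumes the indicators above, aggregates them with `Promise.allSettled`, returns HTTP 500 when any dependency is DOWN, and sets the `Cache-Control: no-store` header the rules call for. The startup endpoint follows the same pattern and is omitted for brevity.

```ts
// router.ts - illustrative aggregator, assuming the dbIndicator/redisIndicator exports above
import { Router, type Response } from 'express';
import { dbIndicator, redisIndicator } from './indicators';
import type { HealthIndicator } from './types';

const dependencyIndicators: HealthIndicator[] = [dbIndicator, redisIndicator];

async function runChecks(indicators: HealthIndicator[]) {
  const settled = await Promise.allSettled(indicators.map((i) => i.check()));
  const checks: Record<string, 'UP' | 'DOWN'> = {};
  settled.forEach((result, idx) => {
    checks[indicators[idx].name] =
      result.status === 'fulfilled' && result.value.status === 'UP' ? 'UP' : 'DOWN';
  });
  const status: 'UP' | 'DOWN' = Object.values(checks).every((s) => s === 'UP') ? 'UP' : 'DOWN';
  return { status, checks };
}

function respond(res: Response, status: 'UP' | 'DOWN', checks: Record<string, 'UP' | 'DOWN'>) {
  res
    .set('Cache-Control', 'no-store')
    .status(status === 'UP' ? 200 : 500)
    .json({
      status,
      uptimeSeconds: process.uptime(),
      timestamp: new Date().toISOString(),
      checks,
    });
}

export const healthRouter = Router();

// Liveness: process-level only, never touches downstream dependencies.
healthRouter.get('/health/liveness', (_req, res) => respond(res, 'UP', {}));

// Readiness: every downstream dependency must report UP.
healthRouter.get('/health/readiness', async (_req, res) => {
  const { status, checks } = await runChecks(dependencyIndicators);
  respond(res, status, checks);
});
```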
```yaml
# The rules provide this exact configuration
livenessProbe:
  httpGet:
    path: /health/liveness
    port: 3000
  initialDelaySeconds: 10
  timeoutSeconds: 1
  periodSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/readiness
    port: 3000
  initialDelaySeconds: 5
  timeoutSeconds: 1
  periodSeconds: 5
```
```yaml
# GitHub Actions job that prevents unhealthy deployments
jobs:
  health-check:
    runs-on: ubuntu-latest
    steps:
      - name: Start service and verify health
        run: |
          docker compose up -d
          npx wait-on http://localhost:3000/health/readiness -t 10000
  deploy:
    needs: health-check   # Only deploy if health passes
    runs-on: ubuntu-latest
    steps:
      # deployment steps here
```
Week 1: Stop getting surprised by "healthy" services that can't serve traffic. Your readiness probes catch dependency failures before users do.
Week 2: Deployment confidence increases dramatically. Failed deployments get caught in CI/CD instead of production, reducing rollback frequency by 60-80%.
Month 1: Mean time to resolution drops significantly because health responses tell you exactly which dependency is failing. No more "service is down, let me check everything" debugging sessions.
Ongoing: Your Prometheus/Grafana dashboards actually correlate health status with user-facing metrics. You start predicting issues instead of just reacting to them.
The most significant change? You'll stop learning about service issues from user reports and start catching them from monitoring alerts. That alone makes these rules worth implementing immediately.
These Cursor Rules implement battle-tested health check patterns used by major Node.js applications running at scale. Stop debugging production health issues—start preventing them.
You are an expert in Node.js (TypeScript), Express, Docker, Kubernetes, Prometheus, Grafana, GitHub Actions, and AWS.
Key Principles
- Every deployable unit MUST expose deterministic, idempotent health-check endpoints.
- Separate concerns: liveness ≠ readiness ≠ startup. Never multiplex them into a single probe.
- Health checks MUST complete in <100 ms and perform no heavy computation.
- Return machine-parsable JSON with a single word status ("UP" | "DOWN").
- Fail fast & loudly: unhealthy results MUST return HTTP 500 and trigger alerting.
- Never leak secrets or PII in health payloads—surface only minimal diagnostics.
- Do not introduce additional dependencies in health-check code paths (e.g. templating engines).
- All checks run in parallel; any DOWN result makes the overall status DOWN, and the slowest check bounds total latency.
- Add health verification as the very first step of every CI/CD stage.
JavaScript / TypeScript Rules
- Use TypeScript 5.x with strict mode; `tsconfig.json` sets `"strict": true` and `"target": "ES2022"`.
- File layout (example):
src/
└─ health/
├─ router.ts // Express Router exposing /health/*
├─ indicators.ts // Individual indicator functions
├─ types.ts // `HealthIndicator` & `HealthReport` interfaces
└─ __tests__/ // Jest tests
- Naming conventions:
• Main combined endpoint: GET /health (overall)
• Liveness probe: GET /health/liveness (process-level)
• Readiness probe: GET /health/readiness (down-stream deps)
• Startup probe: GET /health/startup (lengthy init tasks)
- Sample indicator implementation (typescript):
```ts
// indicators.ts
export interface HealthIndicator {
name: string;
check(): Promise<{ status: 'UP' | 'DOWN'; details?: unknown }>;
}
export const dbIndicator: HealthIndicator = {
name: 'postgres',
async check() {
try {
await pgPool.query('SELECT 1');
return { status: 'UP' };
} catch (e) {
return { status: 'DOWN', details: { error: (e as Error).message } };
}
},
};
```
- Aggregator (router.ts) runs all indicators with `Promise.allSettled`; if any indicator reports `DOWN`, the overall status is `DOWN` and the response is HTTP 500.
- Response shape:
```json
{
"status": "UP",
"uptimeSeconds": 1234.56,
"timestamp": "2024-03-17T15:04:05.123Z",
"checks": {
"postgres": "UP",
"redis": "UP"
}
}
```
- Export `healthRouter` and mount at root of Express app: `app.use('/', healthRouter)`.
- Always set `Cache-Control: no-store` header on health endpoints.
Error Handling & Validation
- Perform parameterless GET; fail if any indicator rejects or times out > 1 s.
- Indicators MUST wrap their logic in try/catch and return `{ status: 'DOWN' }`—never throw.
- Use `asyncHandler` middleware to propagate unhandled failures to global error handler.
- Instrument timeouts with `AbortController` to avoid hanging probes (see the sketch after this list).
- Log unhealthy details at WARN level; emit no details in HTTP body when `NODE_ENV === 'production'`.
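For the timeout bullet above, a wrapper along these lines keeps a hanging dependency from stalling the probe. It is a sketch: `withTimeout` is an illustrative name, and it relies on `AbortSignal.timeout` (Node 17.3+).

```ts
// Sketch: wrap an indicator so it reports DOWN instead of hanging past a deadline.
import type { HealthIndicator } from './types';

export function withTimeout(indicator: HealthIndicator, ms = 1000): HealthIndicator {
  return {
    name: indicator.name,
    async check() {
      const signal = AbortSignal.timeout(ms); // fires 'abort' after ms milliseconds
      const timedOut = new Promise<{ status: 'DOWN'; details: unknown }>((resolve) => {
        signal.addEventListener('abort', () =>
          resolve({ status: 'DOWN', details: { error: 'TIMEOUT' } }),
        );
      });
      // Whichever settles first wins; a slow underlying check is simply ignored.
      return Promise.race([indicator.check(), timedOut]);
    },
  };
}
```

Indicators can then be registered as, for example, `withTimeout(dbIndicator, 100)` to enforce the 100 ms budget.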
Framework-Specific Rules
Express
- Use `express.Router()`; avoid app-level route pollution.
- Do not register body-parsing middleware on health routes; they are GET only.
Kubernetes
- Example probe spec:
```yaml
livenessProbe:
httpGet:
path: /health/liveness
port: 3000
initialDelaySeconds: 10
timeoutSeconds: 1
periodSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/readiness
port: 3000
initialDelaySeconds: 5
timeoutSeconds: 1
periodSeconds: 5
startupProbe:
httpGet:
path: /health/startup
port: 3000
failureThreshold: 30
periodSeconds: 5
```
GitHub Actions (CI)
- Insert a reusable job named `health-check` before testing & deployment:
```yaml
- name: Verify local health endpoint
run: |
docker compose up -d
npx wait-on http://localhost:3000/health -t 10000
```
Additional Sections
Testing
- Use Jest + Supertest: simulate an unhealthy dependency with test doubles and assert HTTP 500 (see the sketch below).
- Every indicator MUST have a failure-mode unit test.
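A failure-mode test per the bullets above might look like this sketch; the mock paths and `healthRouter` import assume the `src/health/` layout shown earlier.

```ts
// __tests__/readiness.test.ts - sketch of a failure-mode test
import request from 'supertest';
import express from 'express';

// Force the postgres indicator to report DOWN; module path assumes the layout above.
jest.mock('../indicators', () => ({
  dbIndicator: {
    name: 'postgres',
    check: jest.fn().mockResolvedValue({ status: 'DOWN', details: { error: 'DB_UNREACHABLE' } }),
  },
  redisIndicator: {
    name: 'redis',
    check: jest.fn().mockResolvedValue({ status: 'UP' }),
  },
}));

import { healthRouter } from '../router';

describe('GET /health/readiness', () => {
  it('returns 500 and marks postgres DOWN when the DB check fails', async () => {
    const app = express();
    app.use('/', healthRouter);

    const res = await request(app).get('/health/readiness');

    expect(res.status).toBe(500);
    expect(res.body.checks.postgres).toBe('DOWN');
  });
});
```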
Performance
- Run indicators in parallel via `Promise.allSettled` to keep total latency low.
- Warn if aggregate duration > 50 ms in production; include `X-Health-Time` header.
Security
- Protect `/health` with network-layer controls (e.g., only internal LB), not application auth.
- Strip stack traces; return generic reason codes ("DB_UNREACHABLE", "CACHE_TIMEOUT").
Monitoring
- Expose `/metrics` Prometheus endpoint alongside `/health`.
- Record a `health_status` gauge (1 for UP, 0 for DOWN) with a label per indicator (see the sketch below).
- Create Grafana dashboard panel visualizing readiness status over time.
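One way to record that gauge with prom-client (assuming prom-client already backs the `/metrics` endpoint; file and function names are illustrative):

```ts
// metrics.ts - sketch of the health_status gauge wired from indicator results
import { Gauge } from 'prom-client';

export const healthStatusGauge = new Gauge({
  name: 'health_status',
  help: '1 if the indicator is UP, 0 if DOWN',
  labelNames: ['indicator'] as const,
});

// Call after each readiness aggregation, e.g. from the router handler.
export function recordHealth(checks: Record<string, 'UP' | 'DOWN'>): void {
  for (const [name, status] of Object.entries(checks)) {
    healthStatusGauge.set({ indicator: name }, status === 'UP' ? 1 : 0);
  }
}
```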
Alerting & Remediation
- Configure PrometheusRule:
```yaml
- alert: ServiceDown
expr: health_status == 0
for: 1m
labels:
severity: critical
annotations:
summary: "{{ $labels.service }} is unhealthy"
```
- Use K8s deployment `maxUnavailable: 0` to ensure zero downtime during rollout.
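The corresponding Deployment strategy might look like this sketch: with `maxUnavailable: 0`, a new pod must pass its readiness probe before an old one is removed.

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take a pod down before its replacement is Ready
      maxSurge: 1
```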
Common Pitfalls & Anti-patterns
- DO NOT connect to third-party services with rate limits (e.g., payment gateways) in health checks.
- DO NOT execute migrations or heavy computations.
- DO NOT reuse main business logic code paths that load templates or large configs.