Comprehensive guideline rules for designing, implementing, and operating a production-grade error handling and reporting pipeline in modern backend services.
Turn invisible failures into actionable insights with production-grade error handling that scales with your backend services.
You're shipping features fast, but every production incident costs hours of detective work. Stack traces disappear into log files. Critical errors get buried in noise. Users hit walls with generic 500s while you scramble through distributed logs trying to piece together what went wrong.
The real problem isn't the errors—it's that you can't see them coming or respond fast enough when they hit.
Modern backend services fail in complex ways: network timeouts cascade through microservices, database connection pools are exhausted during traffic spikes, and third-party APIs return unexpected responses. Without structured error handling, these failures become expensive mysteries that damage user experience and team productivity.
These rules transform your TypeScript/Go backend into an observable, self-healing system that catches problems before they become incidents. Instead of hunting through logs after the fact, you get structured, correlated errors that are classified and reported the moment they happen.
Example transformation:
Before:
```ts
app.post('/users', async (req, res) => {
  try {
    const user = await createUser(req.body);
    res.json(user);
  } catch (err) {
    console.log('Error creating user:', err);
    res.status(500).json({ error: 'Something went wrong' });
  }
});
```
After:
```ts
app.post('/users', wrapAsync(async (req, res) => {
  assertValidEmail(req.body.email);
  const user = await createUser(req.body);
  res.json(user);
}));

// Central error handler automatically:
// - Enriches with trace ID and context
// - Reports to Sentry with impact classification
// - Returns proper HTTP status with domain error code
// - Emits metrics for monitoring dashboards
```
Stop jumping between terminals, log aggregators, and monitoring dashboards. Correlation IDs automatically connect every log entry, trace span, and error report across your entire service mesh.
Pre-classified error types with enriched context mean you know exactly what broke and where—before your users report it.
Whether you're running a monolith or 50 microservices, centralized error handling adapts without duplicating code across services.
Circuit breakers, exponential backoff, and retry logic become first-class citizens in your error handling pipeline, making your services self-healing.
Before: Service starts returning 500s. You ssh into production, grep through logs, discover connection pool warnings buried 200 lines up, then spend 30 minutes correlating the timeline.
After:
```ts
// Connection wrapper with automatic reporting
const withConnection = wrapRetryable(async <T>(fn: (conn: Connection) => Promise<T>) => {
  const conn = await pool.acquire();
  if (!conn) {
    throw new OperationalError('CONNECTION_POOL_EXHAUSTED', {
      retryable: false,
      severity: 'critical'
    });
  }
  try {
    return await fn(conn);
  } finally {
    pool.release(conn); // always return the connection, even when fn throws
  }
});
```
Your dashboard immediately shows the CONNECTION_POOL_EXHAUSTED spike, circuit breaker activates to protect the database, and you get a Slack alert with the exact service and correlation ID.
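The circuit breaker itself is not spelled out by these rules. A minimal sketch of the idea, using a hypothetical `circuitBreaker` helper and the `OperationalError` class from the snippet above: after a threshold of consecutive failures the breaker opens and fails fast for a cooldown window instead of piling more work onto the exhausted pool.
```ts
// Hypothetical circuit-breaker sketch (illustrative, not a prescribed implementation)
export function circuitBreaker<T>(fn: () => Promise<T>, threshold = 5, cooldownMs = 30_000) {
  let consecutiveFailures = 0;
  let openedAt = 0;

  return async (): Promise<T> => {
    if (openedAt && Date.now() - openedAt < cooldownMs) {
      // Open state: fail fast without touching the struggling dependency
      throw new OperationalError('CIRCUIT_OPEN', { retryable: true, severity: 'warning' });
    }
    try {
      const result = await fn();
      consecutiveFailures = 0; // success closes the breaker again
      openedAt = 0;
      return result;
    } catch (err) {
      consecutiveFailures += 1;
      if (consecutiveFailures >= threshold) openedAt = Date.now();
      throw err;
    }
  };
}
```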
Before: API gateway times out, but you don't know if it's the auth service, user service, or payment service failing downstream.
After:
```ts
// Every service call automatically enriched
const user = await withTracing('user-service-call', () =>
  userService.getById(userId, { traceId: req.traceId })
);
```
OpenTelemetry traces show the exact service chain, Temporal workflows handle compensation logic, and intelligent error grouping separates the root cause from cascading symptoms.
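The grouping behavior is configurable; one common approach with the Sentry Node SDK is to fingerprint events by the stable domain error code, so cascading symptoms collapse under their root cause. A sketch, assuming the `AppError` base class defined later in this guide:
```ts
import * as Sentry from '@sentry/node';
import { AppError } from './errors/base';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  beforeSend(event, hint) {
    const err = hint.originalException;
    if (err instanceof AppError) {
      // Group by stable code rather than message text; tag retryability for triage
      event.fingerprint = ['{{ default }}', err.code];
      event.tags = { ...event.tags, error_code: err.code, retryable: String(err.retryable) };
    }
    return event;
  },
});
```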
Before: Invalid data gets processed, stored, and corrupts downstream calculations. You discover it days later during data analysis.
After:
```ts
export function assertValidUser(user: unknown): asserts user is User {
  // Narrow the unknown payload before touching its fields
  const email = (user as Partial<User> | null)?.email;
  if (typeof email !== 'string' || !isValidEmail(email)) {
    throw new ValidationError('INVALID_EMAIL', {
      field: 'email',
      value: redact(email)
    });
  }
  // Fails fast at the service boundary
}
```
Guards validate every input immediately. Structured validation errors include field-level context. Bad data never enters your system.
Install the cursor rules and create your error handling foundation:
```ts
// errors/base.ts - Domain-specific error classes
export abstract class AppError extends Error {
  abstract readonly code: string;
  abstract readonly status: number;
  abstract readonly retryable: boolean;

  constructor(message: string, public readonly context: Record<string, unknown> = {}) {
    super(message);
    this.name = this.constructor.name;
  }
}

export class ValidationError extends AppError {
  readonly code = 'VALIDATION_FAILED';
  readonly status = 400;
  readonly retryable = false;
}
```
```ts
// middleware/error-handler.ts
import type { NextFunction, Request, Response } from 'express';

// The 4-argument signature is what marks this as Express error-handling middleware
app.use((err: Error, req: Request, res: Response, _next: NextFunction) => {
  const enriched = enrichError(err, {
    traceId: req.traceId,
    service: 'user-api',
    endpoint: req.path
  });

  logger.error(enriched, err.message);
  errorReporter.capture(enriched);
  metrics.increment('errors_total', { code: enriched.code });

  res.status(enriched.status).json({
    error: enriched.code,
    message: enriched.userMessage
  });
});
```
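`enrichError` is assumed above but not shown. A minimal sketch of what it could look like: it normalizes anything thrown into the flat, machine-readable shape the handler expects, mapping unknown errors to a generic 500.
```ts
// observability/enrich.ts - sketch of the assumed enrichError helper
import { AppError } from '../errors/base';

export interface ErrorMeta {
  code: string;
  status: number;
  retryable: boolean;
  userMessage: string;
  context: Record<string, unknown>;
  stack?: string;
}

export function enrichError(err: unknown, ctx: Record<string, unknown> = {}): ErrorMeta {
  if (err instanceof AppError) {
    return {
      code: err.code,
      status: err.status,
      retryable: err.retryable,
      userMessage: err.message,
      context: { ...err.context, ...ctx },
      stack: err.stack,
    };
  }
  // Programmer/unknown errors: report internally, never leak details to clients
  return {
    code: 'INTERNAL_ERROR',
    status: 500,
    retryable: false,
    userMessage: 'Internal server error',
    context: ctx,
    stack: err instanceof Error ? err.stack : undefined,
  };
}
```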
```ts
// observability/tracing.ts
import { SpanStatusCode, trace } from '@opentelemetry/api';

const tracer = trace.getTracer('user-api');

export const withTracing = <T>(operationName: string, fn: () => Promise<T>) => {
  const span = tracer.startSpan(operationName);
  return fn()
    .catch(err => {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    })
    .finally(() => span.end());
};
```
```ts
// utils/retry.ts
export const withRetry = async <T>(
  fn: () => Promise<T>,
  options: RetryOptions = {}
): Promise<T> => {
  const { maxAttempts = 3, backoff = 'exponential' } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt === maxAttempts) {
        throw err;
      }
      await delay(calculateBackoff(attempt, backoff));
    }
  }
  // Unreachable: the final attempt either returned or re-threw above,
  // but the explicit throw satisfies the compiler's return-path check.
  throw new Error('withRetry: attempts exhausted');
};
```
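`RetryOptions`, `delay`, `calculateBackoff`, and `isRetryable` are assumed above; here is one minimal sketch (same `utils/retry.ts` file) that pairs exponential backoff with full jitter and treats only explicitly retryable `AppError`s as worth another attempt.
```ts
import { AppError } from '../errors/base';

export interface RetryOptions {
  maxAttempts?: number;
  backoff?: 'exponential' | 'fixed';
}

const delay = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Full jitter: sleep a random duration up to the (capped) exponential ceiling,
// so synchronized clients don't retry in lock-step and re-create the original spike.
const calculateBackoff = (attempt: number, strategy: 'exponential' | 'fixed'): number => {
  const ceiling = strategy === 'exponential' ? 100 * 2 ** (attempt - 1) : 100;
  return Math.random() * Math.min(ceiling, 30_000);
};

// Only errors explicitly marked retryable deserve another attempt
const isRetryable = (err: unknown): boolean => err instanceof AppError && err.retryable;
```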
The bottom line: These rules don't just handle errors better—they eliminate entire categories of production problems while giving you the observability to prevent new ones. Your backend becomes antifragile: it gets stronger under stress instead of breaking down.
Stop hunting bugs in production. Start building systems that tell you exactly what's wrong and fix themselves when possible.
You are an expert in TypeScript (Node.js), Go, structured logging (pino, logrus, zap), distributed tracing (OpenTelemetry), monitoring (Prometheus, Grafana), error aggregation (Sentry, Rollbar), and durable execution (Temporal).
Key Principles
- Fail-Fast & Surface Early: validate inputs immediately; abort on invalid state.
- Single Source of Truth: centralize error capture, enrichment, and dispatch.
- Structured, Machine-Readable Output: emit logs and errors as JSON with consistent shape.
- Correlate Everything: attach correlation/request IDs to every log, trace, and error.
- Distinguish Error Classes: separate Operational (recoverable) vs. Programmer (bug) errors.
- Intelligent Noise Reduction: group, de-duplicate, rank by impact before alerting.
- Observability-Driven Development: treat telemetry as a first-class feature—test it.
TypeScript (Node.js)
- Use `never`-returning guard functions (or `asserts` assertion functions) for pre-condition checks.
- Create domain-specific error classes that extend `AppError` (base) and embed:
• `code` – stable string key (e.g., "USER_NOT_FOUND").
• `status` – HTTP status for web contexts.
• `retryable` – boolean hint for callers / queues.
- Prefer `Result<T, E>` (fp-ts `Either`) for library APIs; throw only in the HTTP/handler layer (see the sketch after this list).
- `try { await fn(); } catch (err) { next(wrap(err)); }` – always forward to central middleware.
- Always `await` Promises; add `.catch()` for fire-and-forget jobs to avoid unhandled rejections.
- Enforce `strictNullChecks`; every possibly-undefined value must be handled.
- Lint rule: no bare `throw "string"`; throw instances of `Error` or subclasses only.
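A short sketch of the guard and `Result` patterns, assuming the `AppError` hierarchy above plus a hypothetical `UserNotFoundError` subclass, `User` type, and `userRepo`:
```ts
import { Either, left, right } from 'fp-ts/Either';

// never-returning guard: the call site is narrowed because execution cannot continue past it
function fail(code: string, context?: Record<string, unknown>): never {
  throw new ValidationError(code, context);
}

export function assertDefined<T>(value: T | null | undefined, code: string): asserts value is T {
  if (value == null) fail(code);
}

// Library layer returns a Result instead of throwing; only the HTTP handler converts it
export async function findUser(id: string): Promise<Either<UserNotFoundError, User>> {
  const user = await userRepo.findById(id);
  return user ? right(user) : left(new UserNotFoundError('USER_NOT_FOUND', { id }));
}
```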
Go
- Return `(T, error)` from every function that can fail—never panic in library code.
- Wrap with `%w` (fmt.Errorf) to keep traceable error chains.
- Inspect errors with `errors.Is/As` to branch behavior.
- Define sentinel vars (`var ErrTimeout = errors.New("timeout")`).
- Use contexts religiously; propagate `ctx` as the first parameter for deadlines & trace IDs.
- Log only at boundary layer (HTTP/gRPC handler); inner funcs bubble errors upward unchanged.
Error Handling & Validation
- Guards first, happy path last:
```ts
export async function createUser(dto: CreateUserDTO): Promise<User> {
  assertEmail(dto.email);
  if (await userRepo.exists(dto.email)) {
    throw new ConflictError('EMAIL_TAKEN');
  }
  // happy path
  return userRepo.insert(dto);
}
```
- Central Express/Fastify middleware pattern:
```ts
app.use((err, _req, res, _next) => {
  const meta = enrich(err);
  logger.error(meta, err.message);
  report(meta); // Sentry/Rollbar
  res.status(meta.status).json({ error: meta.code });
});
```
- Standard log schema keys: `timestamp`, `level`, `msg`, `service`, `env`, `traceId`, `spanId`, `error.code`, `error.stack` (see the logger sketch after this list).
- Always attach `traceId` (from W3C Trace-Context header) to logs & errors.
- Implement exponential back-off + jitter for retries. Abort on non-retryable errors.
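A sketch of a pino logger that emits this schema and pulls `traceId`/`spanId` from the active OpenTelemetry span; the service name and env sourced from environment variables are assumptions:
```ts
// observability/logger.ts
import pino from 'pino';
import { trace } from '@opentelemetry/api';

export const logger = pino({
  messageKey: 'msg',
  timestamp: pino.stdTimeFunctions.isoTime,
  base: { service: process.env.SERVICE_NAME, env: process.env.NODE_ENV },
  // Attach trace context to every log line so it correlates with spans and error reports
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const { traceId, spanId } = span.spanContext();
    return { traceId, spanId };
  },
});
```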
Framework-Specific Rules
Node.js (Express / Fastify)
- One global error handler; no per-route `try/catch` duplication.
- Wrap async handlers with a helper `wrapAsync(fn)` to forward rejections (see the sketch after this list).
- Health-check endpoints must never throw—return degraded status codes instead.
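A minimal `wrapAsync` sketch: any rejection is forwarded to the global error handler via `next`, so individual routes never need their own `try/catch`.
```ts
import type { NextFunction, Request, Response, RequestHandler } from 'express';

export const wrapAsync =
  (fn: (req: Request, res: Response, next: NextFunction) => Promise<unknown>): RequestHandler =>
  (req, res, next) => {
    // Forward rejections to the central error-handling middleware
    fn(req, res, next).catch(next);
  };
```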
Go (net/http & gRPC)
- Middleware chain: `recover`, `zap logger`, `otel trace`, `validator`, handler.
- Use `grpc/status` & `codes.*` for rich, typed error feedback to clients.
- Wire up the `otelgrpc` instrumentation (server interceptor or stats handler) to auto-record spans and errors.
Temporal Workflows
- Model long-running or retry-heavy processes as workflows; let Temporal manage retries & compensation (see the sketch after this list).
- Treat workflow failures as critical and page immediately; activity retries are informational.
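A sketch using the Temporal TypeScript SDK; the activity names, timeouts, and retry settings are illustrative assumptions:
```ts
// workflows/payment.ts
import { proxyActivities } from '@temporalio/workflow';
import type * as activities from '../activities';

const { chargeCard, refundCard } = proxyActivities<typeof activities>({
  startToCloseTimeout: '30 seconds',
  retry: {
    initialInterval: '1 second',
    backoffCoefficient: 2,
    maximumAttempts: 5,
    nonRetryableErrorTypes: ['ValidationError'], // bad input will never succeed on retry
  },
});

export async function paymentWorkflow(orderId: string): Promise<void> {
  try {
    await chargeCard(orderId);
  } catch (err) {
    // Activity retries are exhausted: run compensation, then fail the workflow (page-worthy)
    await refundCard(orderId);
    throw err;
  }
}
```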
Additional Sections
Testing
- Unit test custom error types: assert `instanceof` behavior and the `code`/`status` fields (example below).
- Chaos tests: inject timeouts/network cuts; verify circuit breakers & retries.
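An example unit test for the `ValidationError` class above (vitest shown; jest is equivalent):
```ts
import { describe, expect, it } from 'vitest';
import { AppError, ValidationError } from '../errors/base';

describe('ValidationError', () => {
  it('keeps the prototype chain and carries a stable code', () => {
    const err = new ValidationError('email is invalid', { field: 'email' });

    expect(err).toBeInstanceOf(Error);
    expect(err).toBeInstanceOf(AppError);
    expect(err.code).toBe('VALIDATION_FAILED');
    expect(err.status).toBe(400);
    expect(err.retryable).toBe(false);
  });
});
```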
Observability & Monitoring
- Export metrics: `errors_total{service,code}`, `panic_total`, `retry_attempts` (see the counter sketch below).
- Alert rules: page on >1% error rate over 5m or any uncaught panic.
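A sketch of the error counter with prom-client; the metric name matches the list above, and registry/exposition setup is assumed elsewhere:
```ts
// observability/metrics.ts
import client from 'prom-client';

export const errorsTotal = new client.Counter({
  name: 'errors_total',
  help: 'Total errors by service and domain error code',
  labelNames: ['service', 'code'] as const,
});

// In the central error handler:
// errorsTotal.inc({ service: 'user-api', code: enriched.code });
```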
Performance
- Avoid try/catch in tight loops; validate beforehand.
- Batch logs to reduce IO; use asynchronous writers with back-pressure.
Security
- Never log PII. Redact with a utility like `redact(obj, allowedFields)` before logging (sketch below).
- Sanitize external error messages; expose internal traces only behind auth.
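A minimal sketch of such a `redact` utility: it keeps only allow-listed fields and masks everything else so PII never reaches logs or error reports.
```ts
export function redact(
  obj: Record<string, unknown>,
  allowedFields: readonly string[] = []
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [
      key,
      allowedFields.includes(key) ? value : '[REDACTED]',
    ])
  );
}
```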
Common Pitfalls & Guards
- ❌ Swallowing errors (`catch {}`) → Always log or propagate.
- ❌ Returning generic 500 for business rule error → Map to 4xx with domain code.
- ❌ Creating new DB connection in error handler → use existing pool; error paths must be lightweight.
File & Directory Naming
- `errors/` – custom classes & guards
- `middleware/` – `error-handler.ts`, `request-context.ts`
- `observability/` – `logger.ts`, `tracing.ts`, `metrics.ts`
Quick Reference Codes
- 400 – `VALIDATION_FAILED`
- 401 – `UNAUTHENTICATED`
- 403 – `FORBIDDEN`
- 404 – `RESOURCE_NOT_FOUND`
- 409 – `CONFLICT`
- 500 – `INTERNAL_ERROR`
Follow these rules to build highly observable, resilient, and maintainable backend services with first-class error handling and reporting.