Comprehensive Rules for producing, managing, and validating structured logs in distributed, cloud-native systems.
Your application throws an exception at 3 AM. You wake up to alerts, SSH into production, and start the familiar dance: grepping through gigabytes of unstructured logs, reconstructing request flows across microservices, and trying to piece together what actually happened. Twenty minutes later, you're still hunting for context that should have been obvious.
This scenario plays out thousands of times daily across development teams. Unstructured logging isn't just inefficient—it's actively sabotaging your ability to understand, debug, and optimize your systems.
Most logging implementations suffer from critical flaws that compound in distributed systems.
These Cursor Rules implement battle-tested logging patterns used by companies processing billions of requests daily. Every log becomes a structured, searchable event with complete request context—no more grep archaeology.
Automatic Request Correlation: Every request gets a UUID that flows through your entire stack
```json
{
  "timestamp": "2024-01-15T10:30:15.123Z",
  "level": "INFO",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000",
  "service": "payment-service",
  "event": "payment_processed",
  "order_id": "ord_12345",
  "amount": 99.99,
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"
}
```
Built-in Security: Automatic PII redaction and secret masking with compliance-ready controls
Zero-Latency Logging: Asynchronous appenders that never block your request threads
Cost-Optimized Storage: Smart retention policies and compression that cut storage costs by 60%
Before: "Let me check 6 different log files and correlate timestamps..."
After: Single query finds complete request flow with full context
Query by correlation ID reveals the exact failure point, request parameters, and downstream effects instantly
Built-in PII redaction, encryption, and audit trails handle GDPR/SOC2 requirements automatically
Async logging eliminates request thread blocking while providing richer data
Before (Traditional Logging):

```shell
# SSH into production
grep -r "payment failed" /var/log/app/*.log | head -20
grep -r "order_12345" /var/log/payment/*.log
# Check 3 more services, correlate timestamps
# 15 minutes later, still reconstructing the flow
```

After (Structured Rules):

```
# Single query in Kibana/Grafana
correlation_id:"550e8400-e29b-41d4-a716-446655440000"
# Instant complete request timeline across all services
```
Before: Manual log scraping, hoping sensitive data wasn't logged
After: Automated compliance reports with guaranteed PII redaction
Before: Parsing timestamps from text logs, manual correlation
After: OpenTelemetry integration provides request timing with log context automatically
Java (Spring Boot):
```java
// Automatic MDC correlation ID injection
MDC.put("correlation_id", UUID.randomUUID().toString());
log.info("Payment processed",
    kv("order_id", orderId),
    kv("amount", amount));
```
Node.js (Express):
```js
// Child logger inherits correlation context
const logger = pino().child({
  correlation_id: req.correlationId
});
logger.info({ orderId, amount }, 'payment processed');
```
Python (FastAPI):
```py
# Structured context binding
logger.bind(
    correlation_id=correlation_id,
    user_id=user_id
).info("payment_processed", order_id=order_id)
```
No more blocking I/O killing your request performance:
```xml
<!-- Logback async configuration: bounded queue, no caller data -->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
  <queueSize>8192</queueSize>
  <discardingThreshold>5</discardingThreshold>
  <includeCallerData>false</includeCallerData>
  <appender-ref ref="JSON"/>
</appender>
```
Automatic PII redaction prevents compliance violations:
```java
// Sensitive fields automatically masked
log.info("User login",
    kv("email", redactor.mask(email)),
    kv("session_id", sessionId));
```
Fluent Bit configuration for Kubernetes environments:
```yaml
# Automatic log forwarding to your observability stack
outputs:
  - name: elasticsearch
    host: elasticsearch.logging.svc.cluster.local
    port: 9200
    index: logs-production
```
Automatic trace ID correlation connects logs with APM tools:
```json
{
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "correlation_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Smart tiering reduces storage costs while maintaining searchability:
These patterns hold up for services processing 100k+ RPS.
Time Investment: 2-4 hours initial setup
Payback Period: First production incident (usually < 1 week)
These rules transform logging from a debugging afterthought into a strategic advantage. Your future self will thank you when that 3 AM incident becomes a 3-minute diagnosis instead of a 30-minute investigation.
You are an expert in production-grade logging and observability for Java, Node.js, Python, Go, and .NET systems, familiar with SLF4J, Logback, Log4j2, Pino, Winston, Structlog, Serilog, zerolog, OpenTelemetry, Fluent Bit, and the ELK/Grafana stacks.
Key Principles
- Emit structured, machine-readable logs (JSON) only. Never rely on grep-friendly plain text.
- Treat logs as immutable events: never edit in place; enrich by adding fields.
- Every request or job gets a globally unique correlation_id (UUID v4) propagated end-to-end.
- Prefer INFO for business events, DEBUG for diagnostics, WARN for recoverable issues, ERROR for failures, FATAL for unrecoverable crashes.
- Log at the call-site, not the catch-site: include enough context to act without code dive.
- Never log secrets, tokens, passwords, or PII. Apply automatic redaction and allow-list what IS logged.
- Use asynchronous, non-blocking appenders/sinks to avoid request latency.
- Retain just enough: set explicit rotation & retention based on compliance and cost.
- Integrate logs with metrics & traces using OpenTelemetry semantic conventions.
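The correlation principle above can be sketched in stdlib Python. This is an illustrative, framework-agnostic sketch: `handle_request`, `CorrelationFilter`, and `JsonFormatter` are names chosen here, not part of any library.

```python
# Sketch: propagate a correlation_id via contextvars so every log record
# in the request's context carries it. Names are illustrative.
import contextvars
import json
import logging
import uuid

correlation_id = contextvars.ContextVar("correlation_id", default=None)

class CorrelationFilter(logging.Filter):
    """Inject the current correlation_id into every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

class JsonFormatter(logging.Formatter):
    """Render records as single-line JSON events."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "correlation_id": getattr(record, "correlation_id", None),
            "event": record.getMessage(),
        })

def handle_request(incoming_id=None):
    # Reuse an upstream ID if one was propagated; otherwise mint a UUID v4.
    cid = incoming_id or str(uuid.uuid4())
    correlation_id.set(cid)
    logging.getLogger("app").info("request_started")
    return cid
```

A real middleware would read the incoming ID from a header (e.g. `X-Correlation-ID`) and echo it back in the response.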
Language-Specific Rules
Java
- Use the SLF4J façade with Logback or Log4j2 implementation.
- Inject correlation_id via MDC (Mapped Diagnostic Context):
```java
MDC.put("correlation_id", correlationId);
log.info("User login", kv("user_id", userId));
```
- Never concatenate strings in log messages; use parameterized style: `log.debug("Processed {} orders", count);`
- Configure async appenders with bounded queues to avoid OOM.
Node.js
- Prefer Pino for throughput; fall back to Winston only when transport flexibility outweighs perf.
- Use child loggers to inherit context:
```js
const logger = pino().child({ correlation_id })
logger.info({ orderId }, 'order placed')
```
- Stream Pino output through a worker-thread transport (`pino.transport()`) or to Fluent Bit.
Python
- Use structlog with JSONRenderer.
- Chain processors: add correlation_id, timestamp, level, message, stack.
- Example:
```py
logger.bind(correlation_id=cid).info("payment_received", order_id=oid)
```
Go
- Use zerolog for zero-alloc JSON logging.
- Instantiate once per request with context:
```go
log := zerolog.Ctx(ctx).With().Str("correlation_id", cid).Logger()
log.Info().Str("order_id", id).Msg("order completed")
```
.NET (C#)
- Use Serilog with `WriteTo.Async()` sinks.
- Enrich with `LogContext.PushProperty("correlation_id", cid);`
- Configure JSON output via `Serilog.Formatting.Json.JsonFormatter`.
Error Handling and Validation
- Validate inputs first; throw early, log once at boundary layer (controller, handler).
- Include stack traces on ERROR level but trim to 10 deepest frames to reduce noise.
- Mask sensitive fields with a redaction utility before logging: `***REDACTED***`.
- Unit-test redaction: assert that password/token substrings are absent in captured logs.
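A minimal redaction utility satisfying these rules might look like the following sketch; the `SENSITIVE_KEYS` set is an illustrative starting point, not an exhaustive compliance list.

```python
# Sketch of a redaction utility: mask sensitive values before logging.
MASK = "***REDACTED***"
SENSITIVE_KEYS = {"password", "token", "secret", "email", "ssn", "credit_card"}

def redact(fields: dict) -> dict:
    """Return a copy of `fields` with sensitive values replaced."""
    return {
        key: MASK if key.lower() in SENSITIVE_KEYS else value
        for key, value in fields.items()
    }
```

The unit test is then exactly the rule above: capture the output and assert the secret substring is absent.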
Framework-Specific Rules
OpenTelemetry
- Use the OpenTelemetry log data model fields (e.g., `SeverityText`, `SeverityNumber`, instrumentation scope) and semantic-convention attribute names.
- Export logs, traces, and metrics through the same OTLP exporter to your backend.
- Propagate W3C Trace-Context headers; map `trace_id` and `span_id` into every log record.
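The trace-context mapping can be illustrated with a minimal stdlib Python parser for the W3C `traceparent` header; a real service would use an OpenTelemetry propagator, so treat this as a sketch of the format only.

```python
# Minimal W3C Trace-Context parser. The traceparent header has the form
# "<version>-<trace_id>-<span_id>-<flags>", e.g.
# "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
def parse_traceparent(header: str) -> dict:
    """Extract trace_id and span_id for inclusion in every log record."""
    version, trace_id, span_id, flags = header.split("-")
    # Per the spec: fixed-width lowercase hex, and all-zero IDs are invalid.
    if len(trace_id) != 32 or int(trace_id, 16) == 0:
        raise ValueError("invalid trace_id")
    if len(span_id) != 16 or int(span_id, 16) == 0:
        raise ValueError("invalid span_id")
    return {"trace_id": trace_id, "span_id": span_id}
```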
SLF4J / Logback
- Configure `AsyncAppender` with `discardingThreshold=5` and `includeCallerData=false`.
- Use separate loggers per bounded context (e.g., `com.acme.payments.*`).
Pino
- Enable `timestamp: 'unix'` for low overhead.
- In AWS Lambda, route output through `pino-lambda` so records are formatted for CloudWatch Logs.
Structlog
- Use `structlog.processors.ExceptionRenderer()` only at ERROR level.
- Integrate with AWS Lambda via `aws_lambda_powertools.logging`.
Serilog
- Add sinks: `Seq`, `ElasticSearch`, `Console` (dev), `File` (when local fallback needed).
- Enable `AuditTo` sink for security-critical events.
Additional Sections
Testing
- Use log capture fixtures (e.g., `caplog` in pytest, `ListAppender` in Logback) to assert:
• Presence of correlation_id
• Correct level selection
• Redaction of secrets
- Add smoke tests verifying that log volume < expected threshold under load.
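In plain Python the capture-fixture idea reduces to a list-backed handler, a stdlib analogue of pytest's `caplog` or Logback's `ListAppender`; `capture` is an illustrative helper name.

```python
# Stdlib log-capture fixture: collect records in memory, then assert on
# their level and structured fields.
import logging

class ListHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

def capture(logger_name):
    """Attach a capturing handler to the named logger and return it."""
    handler = ListHandler()
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler
```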
Performance
- All appenders/transports must be asynchronous and back-pressure-aware.
- Use batching: flush every 5 k records or 1 s, whichever comes first.
- In Kubernetes, sidecar Fluent Bit buffer memory ≤ 15 % container RAM.
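The batching rule can be sketched as a small buffer that flushes on either bound; `sink` stands in for whatever transport you use, and a real implementation would also flush from a background timer rather than only on `add`.

```python
# Sketch: flush a batch after max_records entries or max_age_s seconds,
# whichever comes first (checked on each add; no timer thread here).
import time

class Batcher:
    def __init__(self, sink, max_records=5000, max_age_s=1.0):
        self.sink = sink
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.buffer = []
        self.started = time.monotonic()

    def add(self, record):
        if not self.buffer:
            self.started = time.monotonic()
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_records
                or time.monotonic() - self.started >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
```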
Security & Compliance
- Encrypt logs in flight (TLS 1.2+) and at rest (KMS/AES-256).
- Apply RBAC: developers can read, operators can purge; only compliance can export.
- Automatic PII redaction list must cover: email, phone, SSN, credit card, geo.
Storage & Retention
- Hot storage (7 days) in Elasticsearch/OpenSearch, warm (30 days) in S3/Glacier via ILM.
- Apply index templates: `logs-<env>-yyyy.MM.dd` and set `number_of_shards` based on RPS.
- Compress logs with ZSTD or gzip level 3 for balance of CPU/cost.
Alerting & Monitoring
- Define log-based alerts: ERROR rate > 1% per 5 min, string match `"OutOfMemoryError"`, pattern `status:5xx`.
- Route alerts through PagerDuty or Opsgenie with dedup by correlation_id.
- Dashboard key indicators in Grafana: log volume, ERROR/WARN ratio, top slow queries.
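The ERROR-rate rule above reduces to a ratio check over a window; the counts in this sketch stand in for whatever your log backend's aggregation query returns for the last 5 minutes.

```python
# Sketch of the ERROR-rate alert rule: fire when ERROR records exceed
# `threshold` (default 1%) of total records in the evaluation window.
def error_rate_alert(error_count: int, total_count: int,
                     threshold: float = 0.01) -> bool:
    """Return True when the alert should fire."""
    if total_count == 0:
        return False  # no traffic: nothing to alert on
    return error_count / total_count > threshold
```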
Example Directory Structure
```
logging/
java/ # Logback.xml, LogEnricher.java
node/ # logger.ts, transports/
python/ # logging.py, processors.py
infra/ # fluent-bit/, elasticsearch/, grafana/
```
Common Pitfalls
- Double-logging the same exception in nested catch blocks.
- Blocking I/O in synchronous appenders causing tail latency spikes.
- Forgetting to propagate correlation_id in background jobs.
- Over-logging at DEBUG in production leading to 2× storage cost.
Adopt these rules consistently to achieve high-fidelity, secure, and cost-effective logging across your services.