Actionable coding standards for designing, implementing, and observing resilient error-handling flows in modern Python services.
You're tired of debugging production failures that could have been caught earlier. Your APIs return cryptic 500 errors. Network timeouts bring down entire request flows. Exception handling feels like an afterthought—until something breaks.
Every production Python service faces the same brutal reality: external dependencies fail, user input is malformed, and network calls time out. Without systematic error handling, you're playing whack-a-mole with production incidents.
The real problem isn't just crashes—it's the debugging nightmare that follows. Stack traces get swallowed, context disappears, and you're left guessing what actually went wrong. Meanwhile, your users see generic error messages that provide zero value.
These Cursor Rules establish a comprehensive error-handling architecture that makes failures predictable, debuggable, and recoverable. You'll catch issues at the right boundaries, preserve debugging context, and provide meaningful responses to both users and operators.
Instead of generic exception handling, you get:
Exception chaining with raise ... from ... preserves the complete failure context. Instead of hunting through logs, you see the exact failure chain from business logic down to the underlying network timeout.
Structured error responses mean frontend teams know exactly what to expect. No more guessing whether a 500 error is retryable or represents a permanent failure.
Circuit breakers and retry logic handle transient failures automatically. Network hiccups don't cascade into service outages.
Explicit exception types make unit tests deterministic. You can assert exact error conditions instead of catching generic exceptions.
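The circuit-breaker behaviour mentioned above can be illustrated with a small standard-library sketch. This is not PyBreaker's API or implementation; `CircuitBreaker`, `CircuitOpenError`, `fail_max`, and `reset_timeout` are hypothetical names chosen for illustration, and the example assumes a simple "open after N consecutive failures" policy.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after `fail_max` consecutive failures."""

    def __init__(self, fail_max=3, reset_timeout=30.0, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests need not sleep
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.fail_max - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.fail_max:
                self._opened_at = self._clock()
            raise
        else:
            self._failures = 0
            return result
```

While the circuit is open, callers fail immediately with `CircuitOpenError` instead of waiting on a dependency that is already known to be down, which is what keeps a local outage from cascading.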
Before: Silent failures and resource leaks
```python
def get_user(user_id: int):
    try:
        conn = get_db_connection()
        user = conn.execute("SELECT * FROM users WHERE id = ?", user_id)
        return user
    except:
        return None  # What went wrong? Connection? Query? Missing user?
```
After: Explicit failure modes with guaranteed cleanup
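```python
from sqlalchemy import select
from sqlalchemy.exc import NoResultFound, SQLAlchemyError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


# Retry only database failures; a missing user is permanent and is raised immediately.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0.5),
    retry=retry_if_exception_type(DatabaseError),
    reraise=True,  # surface DatabaseError instead of tenacity.RetryError
)
async def get_user(user_id: int) -> User:
    try:
        async with get_db_session() as session:
            result = await session.execute(
                select(User).where(User.id == user_id)
            )
            return result.scalar_one()
    except NoResultFound as exc:
        raise UserNotFoundError(f"User {user_id} not found") from exc
    except SQLAlchemyError as exc:
        raise DatabaseError("Failed to fetch user") from exc
```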
Before: Cryptic error responses
```python
@app.post("/payments")
def create_payment(request):
    try:
        data = json.loads(request.body)
        payment = process_payment(data)
        return {"payment_id": payment.id}
    except Exception as e:
        return {"error": "Something went wrong"}, 500
```
After: Typed errors with actionable responses
```python
@app.post("/payments")
async def create_payment(payment_request: PaymentRequest) -> PaymentResponse:
    try:
        payment = await process_payment(payment_request)
        return PaymentResponse(payment_id=payment.id)
    except PaymentValidationError as exc:
        # Client fault: report which field failed so the caller can fix it.
        raise HTTPException(
            status_code=422,
            detail={
                "message": str(exc),
                "meta": {"field": exc.field_name},
                "status_code": 422,
            },
        ) from exc
    except PaymentGatewayTimeoutError as exc:
        # Log once, at the catch point, with the context operators need.
        logger.exception("Payment gateway timeout", extra={
            "payment_amount": payment_request.amount,
            "trace_id": get_trace_id(),
        })
        raise HTTPException(
            status_code=503,
            detail="Payment service temporarily unavailable",
        ) from exc
```
Before: Timeouts block entire request flows
```python
def fetch_user_profile(user_id: str):
    response = requests.get(f"https://api.service.com/users/{user_id}")
    return response.json()
```
After: Timeouts and retries
```python
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0.5),
    retry=retry_if_exception_type(TransientExternalError),
    reraise=True,  # re-raise the last TransientExternalError instead of tenacity.RetryError
)
async def fetch_user_profile(user_id: str) -> UserProfile:
    try:
        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=5)) as session:
            async with session.get(f"https://api.service.com/users/{user_id}") as response:
                if response.status == 404:
                    raise UserNotFoundError(f"User {user_id} not found")
                response.raise_for_status()
                data = await response.json()
                return UserProfile.model_validate(data)  # Pydantic v2; use parse_obj on v1
    except asyncio.TimeoutError as exc:
        raise TransientExternalError("User service timeout") from exc
    except aiohttp.ClientError as exc:
        raise TransientExternalError("User service unavailable") from exc
```
```bash
pip install tenacity pybreaker opentelemetry-api pydantic
```
```python
# exceptions/domain.py
class DomainError(Exception):
    """Base exception for all domain-specific errors."""

    def __init__(self, message: str, meta: dict | None = None):
        self.message = message
        self.meta = meta or {}
        super().__init__(message)


class ValidationError(DomainError):
    """Client input validation failed."""


class TransientExternalError(DomainError):
    """Temporary external service failure; retryable."""


class InternalError(DomainError):
    """Internal system error; not retryable."""
```
```python
# middleware/error_handler.py
from fastapi import FastAPI
from fastapi.responses import JSONResponse

from exceptions.domain import TransientExternalError, ValidationError

app = FastAPI()


@app.exception_handler(ValidationError)
async def handle_validation_error(request, exc: ValidationError):
    return JSONResponse(
        status_code=422,
        content={
            "message": exc.message,
            "meta": exc.meta,
            "status_code": 422,
        },
    )


@app.exception_handler(TransientExternalError)
async def handle_transient_error(request, exc: TransientExternalError):
    # Hide upstream details from the client; tell it when to retry.
    return JSONResponse(
        status_code=503,
        content={
            "message": "Service temporarily unavailable",
            "meta": {"retry_after": "30s"},
            "status_code": 503,
        },
    )
```
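```python
import functools

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode


def trace_exceptions(func):
    """Record exceptions from an async function on the current span, then re-raise."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    async def wrapper(*args, **kwargs):
        span = trace.get_current_span()
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR))
            raise

    return wrapper
```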
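```python
import asyncio
from unittest import mock

import pytest


# Assumes process_payment (the coroutine called by the endpoint) raises the domain
# errors; the endpoint itself converts them to HTTPException.
def test_payment_validation_error():
    with pytest.raises(PaymentValidationError, match="Invalid amount"):
        asyncio.run(process_payment(PaymentRequest(amount=-100)))


def test_payment_gateway_timeout():
    with mock.patch("aiohttp.ClientSession.post", side_effect=asyncio.TimeoutError):
        with pytest.raises(PaymentGatewayTimeoutError):
            asyncio.run(process_payment(PaymentRequest(amount=100)))
```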
Immediate improvements:
Long-term benefits:
Team productivity gains:
Your error handling transforms from a debugging nightmare into a systematic advantage. Instead of chasing production fires, you're building resilient systems that fail gracefully and recover automatically.
You are an expert in Python 3.12, FastAPI, Pydantic, SQLAlchemy, asyncio, Tenacity (retry), PyBreaker (circuit-breaker), and OpenTelemetry tracing.
Key Principles
- Fail fast and loudly: raise early, never silently swallow errors.
- Catch only what you can meaningfully handle; re-raise or propagate everything else.
- Keep try-blocks minimal; the line that may raise should be inside, nothing more.
- Preserve context with "raise … from …" when adding information.
- Prefer built-in exceptions; create custom ones only for domain semantics.
- Treat error handling as a first-class concern: design, test, log, and trace it.
- All externally visible errors must be deterministic, typed, and documented.
Python
- Structure
• try/except/else/finally is the canonical pattern; use all four when needed.
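• Group related exceptions in a single `except` clause using a parenthesized tuple, e.g. `except (ConnectionError, TimeoutError):`. Note that `IOError` is only an alias of `OSError` in Python 3, so listing both is redundant.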
• Use explicit exception names, never a bare `except:`. Use `except Exception as exc:` only at the process boundary (CLI entrypoint, ASGI middleware, Celery worker).
• Use context managers (`with`) for files, locks, DB sessions; implement `__exit__` to translate/propagate errors.
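The structure rules above can be sketched in one small function. This is a minimal illustration, not a prescribed implementation: `InvalidRequestError` and `parse_request` mirror names used elsewhere in these rules, while `request_stats` is a hypothetical counter added only to give `finally` something to do.

```python
import json
from collections import Counter


class InvalidRequestError(Exception):
    """Client sent a body that cannot be used."""


request_stats = Counter()  # hypothetical in-process counter, for illustration only


def parse_request(raw: bytes) -> dict:
    try:
        # Only the call that is expected to raise lives inside `try`.
        payload = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        # Related failures grouped in one clause; the original error is chained.
        raise InvalidRequestError("Malformed JSON") from exc
    else:
        # Runs only when parsing succeeded, outside the guarded line.
        if not isinstance(payload, dict):
            raise InvalidRequestError("Expected a JSON object")
        return payload
    finally:
        # Runs on every path: success, handled failure, or unexpected error.
        request_stats["seen"] += 1
```

Keeping the type check in `else` means a bug in that check can never be mistaken for a JSON decoding failure.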
- Custom Exceptions
• Always inherit from `Exception`, never `BaseException`.
• Naming: <Domain><Error> (e.g., `PaymentGatewayTimeoutError`).
• Provide `__str__` with actionable message and relevant IDs.
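As a sketch of the custom-exception rules above, the class below inherits from `Exception`, follows the `<Domain><Error>` naming pattern, and renders a message that carries the relevant IDs. The constructor fields (`payment_id`, `gateway`, `timeout_s`) and the message wording are assumptions made for illustration.

```python
class PaymentGatewayTimeoutError(Exception):
    """The payment gateway did not answer in time."""

    def __init__(self, payment_id: str, gateway: str, timeout_s: float):
        # Keep the identifiers as attributes so handlers can log or branch on them.
        self.payment_id = payment_id
        self.gateway = gateway
        self.timeout_s = timeout_s
        super().__init__(payment_id, gateway, timeout_s)

    def __str__(self) -> str:
        return (
            f"Payment {self.payment_id} timed out after {self.timeout_s:g}s "
            f"at gateway {self.gateway!r}; retry later"
        )
```

In a codebase that uses the error taxonomy from these rules, this class would more likely derive from the retryable base error than directly from `Exception`.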
- Chaining / Wrapping
```python
try:
    payload = json.loads(raw)
except json.JSONDecodeError as exc:
    raise InvalidRequestError("Malformed JSON") from exc
```
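What chaining buys you at debugging time: the original exception stays reachable through `__cause__`. The sketch below walks that chain; `failure_chain` and `parse` are hypothetical helper names, not part of any library.

```python
import json


class InvalidRequestError(Exception):
    """Client sent a body that cannot be parsed."""


def parse(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidRequestError("Malformed JSON") from exc


def failure_chain(exc: BaseException) -> list:
    """Return exception type names from the outermost error down to the root cause."""
    names = []
    current = exc
    while current is not None:
        names.append(type(current).__name__)
        current = current.__cause__  # set by `raise ... from ...`
    return names
```

For a malformed body, `failure_chain` reports the domain error first and the underlying `JSONDecodeError` second, which is the same order the traceback prints them in reverse.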
- Retry Logic (Tenacity)
```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=0.5), reraise=True)
async def fetch_with_retry(url: str) -> str:
    ...
```
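To make the retry policy concrete without depending on Tenacity, here is a standard-library sketch of bounded retries with exponential backoff. It is not how Tenacity is implemented and its delays are not guaranteed to match `wait_exponential`; `retry_async`, `attempts`, `base_delay`, and the injectable `sleep` are hypothetical names for illustration.

```python
import asyncio


class TransientExternalError(Exception):
    """Temporary external failure; safe to retry."""


async def retry_async(func, *, attempts=3, base_delay=0.5, sleep=asyncio.sleep):
    """Call the zero-argument coroutine function `func`, retrying transient errors only."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except TransientExternalError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            # Backoff doubles each time: base_delay, 2 * base_delay, ...
            await sleep(base_delay * 2 ** (attempt - 1))
```

Any exception other than `TransientExternalError` escapes on the first attempt, so permanent failures are never retried.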
- Timeouts
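• Always bound network and I/O calls with a timeout: `asyncio.wait_for` (or `asyncio.timeout` on Python 3.11+) for coroutines, and the `timeout=` argument for `requests`.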
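A minimal sketch of the timeout rule, assuming the caller wants a timeout translated into a retryable domain error. `call_with_timeout` is a hypothetical helper name.

```python
import asyncio


class TransientExternalError(Exception):
    """Temporary external failure; safe to retry."""


async def call_with_timeout(awaitable, seconds: float):
    """Await `awaitable`, converting a timeout into a typed, chained domain error."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError as exc:
        raise TransientExternalError(f"Call exceeded {seconds}s") from exc
```

`asyncio.wait_for` cancels the inner awaitable when the deadline passes, so a slow dependency cannot hold the request open indefinitely.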
Error Handling & Validation
- Error Taxonomy
• ValidationError → 4xx (client fault)
• TransientExternalError (network) → retry then 503
• InternalError → 500
- Standard error payload for APIs
```json
{
  "message": "string",
  "meta": {"field": "email"},
  "status_code": 400
}
```
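The taxonomy and payload above can be tied together in one place. The sketch below is an assumption about how that mapping might look: each error class carries its own status code, and `to_payload` (a hypothetical helper) produces the standard body. The 422 for validation errors is one choice within the 4xx range.

```python
class DomainError(Exception):
    status_code = 500

    def __init__(self, message, meta=None):
        self.message = message
        self.meta = meta or {}
        super().__init__(message)


class ValidationError(DomainError):
    status_code = 422  # client fault


class TransientExternalError(DomainError):
    status_code = 503  # returned after retries are exhausted


class InternalError(DomainError):
    status_code = 500


def to_payload(exc: Exception) -> dict:
    """Build the standard error body; unknown errors get a generic 500."""
    if isinstance(exc, DomainError):
        return {
            "message": exc.message,
            "meta": exc.meta,
            "status_code": exc.status_code,
        }
    # Never echo details of unexpected exceptions to the client.
    return {"message": "Internal server error", "meta": {}, "status_code": 500}
```

Because the status code lives on the class, adding a new error type does not require touching the response builder.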
- Logging
• Log at catch-point, never at raise-point, to avoid duplicates.
• Use `logger.exception()` to capture stack-trace.
• Add the request/trace ID to structured (JSON) logs via `extra`, e.g. `extra={"trace_id": format(trace.get_current_span().get_span_context().trace_id, "032x")}`.
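The logging rules above, sketched with the standard library only. `JsonFormatter` and `configure_logger` are hypothetical names, the output field names are an assumption, and the trace ID is passed in as a plain string rather than read from a tracing SDK.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including trace_id and stack trace."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "trace_id": getattr(record, "trace_id", None),
        }
        if record.exc_info:
            entry["stack"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logger(name: str, stream) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

Calling `logger.exception("Payment gateway timeout", extra={"trace_id": trace_id})` inside an `except` block then emits a single JSON line with the message, the trace ID, and the full stack trace.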
- Finally/cleanup
• Release locks, close cursors, cancel tasks in finally.
• Always protect finally blocks themselves with try/except to avoid secondary failures.
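Why the cleanup itself needs guarding: an exception raised inside `finally` replaces the one already in flight, hiding the real failure. A minimal sketch, with `run_with_cleanup` as a hypothetical helper name:

```python
import logging

logger = logging.getLogger(__name__)


def run_with_cleanup(work, cleanup):
    """Run `work()`, then `cleanup()`, without letting cleanup mask work's outcome."""
    try:
        return work()
    finally:
        try:
            cleanup()
        except Exception:
            # Log and continue: the original exception (or return value) from
            # work() must win over a secondary cleanup failure.
            logger.exception("Cleanup failed")
```

The trade-off is that cleanup failures become log entries rather than raised errors, so they need an alert or metric if they matter operationally.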
FastAPI (Framework Specific)
- Global Handler
```python
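from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse

app = FastAPI()

class DomainError(HTTPException):
    """HTTP-aware base error: carries status_code and detail from HTTPException."""

@app.exception_handler(DomainError)
async def handle_domain_error(_, exc: DomainError):
    return JSONResponse(status_code=exc.status_code, content=exc.detail)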
```
- Middleware order: Tracing → Metrics → ExceptionMiddleware → Router.
- Validation
• Use Pydantic models; never manually parse dicts.
• Raise `HTTPException(status_code=422)` for semantic errors.
- Background tasks
• Wrap tasks with `asyncio.shield` and local try/except; propagate failures through logging and tracing.
Additional Sections
Testing
- Unit tests must assert exception type and message:
```python
with pytest.raises(InvalidRequestError, match="Malformed JSON"):
    parse_request(b"{bad json}")
```
- Property tests ensure no unexpected exceptions for valid input range (Hypothesis).
Observability
- Trace every external call with OpenTelemetry; tag spans that end with error using `span.record_exception(exc)` and `span.set_status(Status(StatusCode.ERROR))`.
- Correlate log lines with tracing via trace_id/span_id injection.
Performance
- Prefer exception-free control flow for hot paths; validate early.
- Avoid using exceptions for normal control flow (e.g., `dict.get` over `try/except KeyError` inside loops).
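A small illustration of the point about `dict.get`: when a missing key is an ordinary outcome rather than an error, a lookup with a default keeps the loop free of exception handling. `count_known` and its arguments are hypothetical.

```python
def count_known(codes, names: dict) -> int:
    """Count how many codes have a known name, treating misses as normal."""
    hits = 0
    for code in codes:
        # A missing key is expected here, so no try/except KeyError is needed.
        if names.get(code) is not None:
            hits += 1
    return hits
```

Exceptions remain the right tool when a missing key means the input is actually invalid; the distinction is whether the miss is routine.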
Security
- Never leak internal stack traces to clients.
- Scrub PII from logs (`logger.exception("error", extra={"email": obfuscated_email})`).
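A sketch of how the `obfuscated_email` value in the rule above might be produced. `obfuscate_email` is a hypothetical helper, and keeping the first character plus the domain is an assumption; stricter policies may require hashing or dropping the address entirely.

```python
def obfuscate_email(email: str) -> str:
    """Mask the local part of an address, keeping only its first character."""
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        # Not a recognizable address: reveal nothing.
        return "***"
    return f"{local[0]}***@{domain}"
```

The masked value can then be passed as `extra={"email": obfuscate_email(user.email)}` so operators can still correlate by domain without seeing the full address.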
Common Pitfalls & Remedies
- Swallowing Errors → REMOVE bare except.
- Over-broad try block → Narrow down; move logic outside.
- Missing context → Use `from` to chain.
- Double Logging → Choose a single boundary for logging.
File & Folder Naming
- exceptions/
• __init__.py (re-exports)
• domain.py (custom domain errors)
- middleware/error_handler.py
- services/* (business logic raising domain errors)