Comprehensive Rules for establishing a culture of continuous improvement through CI/CD, GitOps, Infrastructure-as-Code, automated testing, and observability.
Transform chaotic deployments into predictable, scalable delivery pipelines that deploy safely hundreds of times per day.
You know the drill: merge requests pile up while you're debugging a failed deployment at 2 AM. Your team spends more time firefighting production issues than building features. Every release becomes a white-knuckle event where "works on my machine" meets production reality.
The hidden productivity killers:
These Cursor Rules establish a comprehensive DevOps culture through automated CI/CD pipelines, GitOps workflows, and observability-driven continuous improvement. Instead of cobbling together scripts and crossing your fingers, you get production-ready automation patterns that scale.
What makes this different:
Before: Manual deployments taking 2-3 hours with 20% failure rate
After: Automated deployments in 8 minutes with 99.5% success rate
# Automated deployment validation with retry logic
@retry_with_backoff(max_attempts=3)
def deploy_canary(service_name: str, version: str) -> DeploymentResult:
"""Deploy with automatic rollback on failure"""
result = kubectl_apply(f"manifests/{service_name}-canary.yaml")
if not health_check_passes(service_name, timeout=300):
rollback_deployment(service_name)
raise DeploymentError(f"Health checks failed for {service_name}")
return result
Before: Finding bugs days after merge through manual testing
After: Issues caught within 5 minutes of commit with automated test suites
# GitLab CI pipeline with parallel testing
stages:
- lint
- build
- unit_test
- integration_test
- security
- deploy
unit_test:
stage: unit_test
script:
- pytest -q --cov=src --cov-fail-under=90
parallel:
matrix:
- PYTHON_VERSION: ["3.11", "3.12"]
Before: 3-person ops team bottlenecking 20 developers
After: Self-service deployments with automated guardrails
Old workflow: "I'm waiting for ops to deploy my branch to staging"
New workflow: "My feature is live in canary, reducing checkout abandonment by 12%"
Your morning standup transforms from coordination overhead to impact measurement. When deployments are automated and observable, teams focus on business outcomes instead of process blockers.
# Automated security scanning catches issues before human review
def process_payment(amount: Decimal, card_token: str) -> PaymentResult:
# Type hints + mypy --strict catches type errors
# bandit catches security issues
# Performance tests validate under load
if amount <= 0:
raise ValueError("Amount must be positive") # Fail fast validation
return payment_gateway.charge(amount, card_token)
Impact: Code reviews become strategic discussions about architecture and business logic, not hunting for basic security issues or type errors.
When your error budget is exhausted (>1% 5xx errors for 5 minutes), automatic rollback kicks in before anyone gets paged:
# Prometheus alert triggers automated rollback
def handle_error_budget_breach(service: str, deployment_id: str):
rollback_deployment(service, deployment_id)
notify_team(f"Auto-rollback executed for {service}")
create_incident_timeline(service, deployment_id)
Add the cursor rules to your .cursorrules file
Create pipeline structure:
.gitlab-ci.yml # or .github/workflows/ci.yml
scripts/
├── deploy.py # Click CLI deployment tool
├── health_check.py # Service validation
└── rollback.py # Automated rollback
Initialize configuration:
poetry init --dependency click
poetry add --group dev pytest mypy bandit safety
Configure your CI/CD platform with the provided templates:
# GitLab CI example with caching and parallel execution
variables:
PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
cache:
paths:
- .cache/pip
- .pytest_cache
security:
stage: security
script:
- bandit -r src
- safety check
- truffleHog --regex --entropy=False .
Add monitoring that feeds back into your improvement cycle:
# Prometheus metrics in your application
from prometheus_client import Counter, Histogram
deployment_counter = Counter('deployments_total', 'Total deployments', ['service', 'result'])
deployment_duration = Histogram('deployment_duration_seconds', 'Deployment duration')
@deployment_duration.time()
def deploy_service(service_name: str):
try:
result = execute_deployment(service_name)
deployment_counter.labels(service=service_name, result='success').inc()
return result
except Exception as e:
deployment_counter.labels(service=service_name, result='failure').inc()
raise
Your deployment anxiety transforms into deployment confidence. Your team shifts from reactive firefighting to proactive feature development. Your infrastructure becomes as reliable and predictable as your version control system.
The question isn't whether you need automated DevOps pipelines—it's whether you can afford to keep deploying manually while your competitors ship faster, safer, and more frequently.
You are an expert in Python, Bash, YAML, Docker, Kubernetes, Terraform, Ansible, GitLab CI/CD, Jenkins, GitHub Actions, Azure DevOps, Spinnaker, Prometheus and Grafana.
Key Principles
- Foster a culture of collaboration and trust across development, operations, security and business stakeholders.
- Everything is code: application, infrastructure, pipelines, monitoring dashboards.
- Maintain trunk-based development with short-lived feature branches guarded by feature toggles.
- Keep the main branch deployable at all times; every merge triggers the full CI/CD pipeline.
- Automate *all* repetitive tasks: build, test, security scan, package, release, rollback and infrastructure provisioning.
- Shift-left on quality & security: integrate tests, SAST, DAST and secret scanning early.
- Prefer small, incremental, reversible changes; use Blue-Green or Canary strategies for production rollouts.
- Observe → Learn → Improve: instrument code & infrastructure, collect metrics, analyse feedback and feed lessons back into the pipeline.
Python
- Use Python 3.12+ with strict type hints; run `mypy --strict` in the test stage.
- Scripts must be idempotent. External calls (e.g. kubectl, terraform) must be wrapped in retry-with-back-off helpers.
- Package CLI tools with `click` and expose a single entry point function.
- Always return non-zero exit codes on failure so CI detects errors immediately.
- Never `print()` operational data; use the standard `logging` module with JSON formatters for structured logs.
- Separate pure functions (in `/src`) from side-effect wrappers (in `/ops`).
- Naming:
• Functions: snake_case verbs, e.g. `deploy_canary()`
• Constants / env vars: UPPER_SNAKE_CASE.
- Use `pyproject.toml` + Poetry; lock dependencies; scan with `safety` in the security stage.
YAML (Pipeline & IaC configuration)
- Use 2-space indentation; no tabs.
- Externalise common fragments via YAML anchors/aliases (`&build`, `<<: *build`) to stay DRY.
- Variable names: SCREAMING_SNAKE_CASE; prefix secrets with `SECRET_` to improve audit filters.
- Keep one pipeline definition per repo (`.gitlab-ci.yml`, `.github/workflows/ci.yml`).
- Validate YAML with `yamllint -d relaxed` before committing (pre-commit hook).
Error Handling & Validation
- Fail fast: validate inputs at the top of every script/function; raise custom `UserError` or `SystemError` subclasses.
- Mask sensitive variables in logs (`CI_DEBUG_TRACE=false`, `add_mask` in GitHub Actions).
- Centralised error collector: send Python exceptions to Sentry and pipeline failures to Slack/Teams via webhooks.
- Add `allow_failure: false` (GitLab) / `continue-on-error: false` (GitHub Actions) to critical jobs.
- Use typed results (`Result[T, E]`) for non-fatal validations so pipelines can surface warnings without breaking flow.
Framework-Specific Rules
GitLab CI/CD
- Stages order: `lint` → `build` → `unit_test` → `integration_test` → `security` → `package` → `deploy` → `post_deploy`.
- Use `only: [merge_requests, main]` and protect `main` from direct pushes.
- Cache `~/.cache/pip` and `.pytest_cache` keyed by `CI_COMMIT_REF_SLUG` to speed up builds.
- Use `environment:` plus `on_stop:` jobs to support manual rollback.
GitHub Actions
- One workflow file per responsibility (`ci.yml`, `deploy.yml`).
- Pin action versions (`actions/checkout@v4`) to avoid breaking changes.
- Use `strategy.matrix` for parallel test shards; rely on `actions/cache` for pip & terraform.
Jenkins
- Declarative pipelines (`pipeline {}`) stored in `Jenkinsfile` at repo root.
- Enforce shared library (`@Library('ci-shared')`) for common stages; keep credentials in Jenkins Credentials Manager only.
Azure DevOps
- YAML pipelines stored in repo; use variable groups for shared secrets.
- Enforce SNI for HTTPS endpoints.
Spinnaker
- Store deployment pipelines in Git (GitOps); trigger on new container tags pushed by CI.
- Use kayenta for automated canary analysis; threshold `95%` confidence before promoting.
Additional Sections
Testing
- Unit tests: ≥ 90 % coverage; run with `pytest -q --cov=src`.
- Integration tests spin up disposable environments via Docker Compose; tear down with `--abort-on-container-exit`.
- End-to-End tests executed against canary environment only.
- Add contract tests between microservices to detect breaking changes early.
Security
- SAST: run `bandit -r src` in the `security` stage.
- DAST: OWASP ZAP container against the review environment.
- Secret scanning: built-in `truffleHog` pre-commit and CI job.
- Enable branch protection + required status checks.
Performance
- Use build cache & layer caching for Docker (`--cache-from` multi-stage builds).
- Parallelise long-running tests with `pytest-xdist -n auto`.
- Use Bazel for monorepos: specify `--remote_cache` for distributed caching.
Infrastructure as Code (Terraform/Kubernetes)
- Terraform modules pinned to version; validate with `terraform validate` and security scan with `tfsec`.
- Kubernetes manifests templated with Helm; lint with `helm lint`; dry-run installs to CI namespace.
- GitOps: Argo CD or Flux sync every 2 min; drift detection fails the pipeline.
Monitoring & Observability
- Export app metrics via Prometheus client; include build info (`build_info{version="v1.2.3"}`).
- Dashboards as code (Grafana JSON) stored in `/monitoring/grafana/` folder.
- Alert rules: page on 3 consecutive deployment failures or 5 × error-rate spike.
Release Strategies
- Blue-Green: run smoke tests on *green* before switching traffic.
- Canary: route 5 % → 25 % → 100 % traffic with 10-minute bake periods.
- Automatic rollback on error budget exhaustion (> 1 % 5xx for 5 min).
Branching & Versioning
- Main branch = production; all commits must pass CI.
- Feature flags via `configcat` SDK; stale flags purged every quarter.
- Semantic versioning; tag releases (`vX.Y.Z`) automatically in release stage.
Documentation
- Keep `docs/ci-cd/` with architecture diagrams (draw.io) and run `mkdocs gh-deploy` on merge.
- Update run-books after every incident before closing the post-mortem.
Checklist Before Merging to Main
1. All pipeline jobs green (incl. security & performance).
2. Coverage ≥ 90 %.
3. Changelog entry added.
4. Terraform plan reviewed & approved.
5. Feature flag defaults validated.
6. Documentation updated and link added to MR.