Actionable rules for fostering a modern DevOps culture in large organizations, including CI/CD pipeline design, Infrastructure-as-Code, GitOps, security, testing, observability, and team collaboration patterns.
Your enterprise development cycles are stuck in committee hell. Features take months to deploy, security reviews block releases, and your teams spend more time coordinating than coding. Meanwhile, your competitors ship daily.
Enterprise DevOps isn't just about tools—it's about dismantling the organizational patterns that slow you down:
The Approval Bottleneck: Every change requires sign-offs from security, operations, and compliance teams, creating 2-3 week delays for simple deployments.
The Environment Drift Crisis: Your staging environment hasn't matched production in months. Bugs that pass testing fail spectacularly in prod because no one knows what's actually running where.
The Incident Blame Game: When production breaks, teams point fingers instead of fixing root causes. Engineers become risk-averse, shipping less frequently and with more manual verification.
The Knowledge Silos: Your database expert is on vacation, and now nobody can deploy the customer service fix. Critical knowledge lives in individual heads, not in systems.
These Cursor Rules implement the CALMS framework (Culture, Automation, Lean, Measurement, Sharing) through enforceable practices that reshape how your teams work—not through lengthy culture workshops, but through concrete workflow changes that prove their value immediately.
Automated Trust Building: Replace manual approvals with automated evidence. Every deployment includes security scans, test results, and compliance checks that satisfy auditors without human bottlenecks.
Blame-Free Incident Response: When issues occur, the system automatically creates post-mortem templates, gathers context, and focuses teams on learning rather than finger-pointing.
Self-Service Infrastructure: Developers provision environments through GitOps workflows instead of submitting tickets. Platform teams enable rather than gatekeep.
Continuous Knowledge Sharing: Documentation auto-generates from code, runbooks stay current, and tribal knowledge becomes searchable team assets.
Before: Feature deployment requires a security review meeting, operations approval, and compliance sign-off.
After: A merge to main automatically triggers security scans, compliance checks, and staged rollouts with automated rollback.
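# Auto-generated deployment evidence
name: Automated Compliance Gate
on:
  pull_request:
    branches: [main]
jobs:
  security_gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # NOTE: `codeql scan` and `generate-compliance-report` are placeholder
      # commands; substitute your scanner's real CLI or action.
      - name: SAST Scan
        run: codeql scan --fail-on-severity=high
      - name: Dependency Check
        run: snyk test --severity-threshold=high
      - name: Compliance Evidence
        run: generate-compliance-report --format=json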
Before: The staging environment drifts from production; deployments fail unpredictably.
After: Non-production environments are destroyed and recreated nightly via Infrastructure as Code, so drift cannot accumulate.
# Terraform enforces identical environments
resource "aws_ecs_cluster" "app_cluster" {
  name = "${var.environment}-app-cluster"

  tags = {
    Environment = var.environment
    Team        = var.team_name
    Purpose     = "application-runtime"
    CostCenter  = var.cost_center
  }
}
Before: Production issues are discovered by angry customers; teams scramble to understand what broke.
After: Automated observability detects issues before users notice; incident response follows standardized playbooks.
# Service annotated so Prometheus scrapes its metrics endpoint
# (the metrics feed the automated alerting and incident response)
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  selector:
    app: payment-service
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
Teams using these patterns report:
Deployment Frequency: From monthly releases to daily deployments within 8 weeks
Lead Time Reduction: Cut feature delivery time by 60-70% through automation
Incident Response: Reduce mean time to recovery (MTTR) by 50% with automated playbooks
Developer Satisfaction: Eliminate 80% of manual coordination tasks that frustrate engineers
Enterprise DevOps culture change happens through repeated success with better workflows, not through mandates. These Cursor Rules give you the automation patterns that make collaboration natural rather than forced.
Your teams will adopt DevOps practices because they make work easier, not because they're required to. That's how real culture change happens—through tools that prove their value every day.
The question isn't whether you need better DevOps practices. It's whether you'll implement them before your competitors do.
Ready to transform your enterprise development workflow? Copy these rules into your project and watch your teams shift from coordination overhead to shipping value.
You are an expert in DevOps, CI/CD, GitOps, Kubernetes, Terraform, Jenkins, GitLab, GitHub Actions, Docker, cloud-native observability, and enterprise collaboration tooling.
Key Principles
- Embrace CALMS (Culture, Automation, Lean, Measurement, Sharing) in every decision.
- Favour **blameless post-mortems** and transparent RCA documents stored in a searchable wiki.
- Automate everything twice: once in code, once in documentation; docs must auto-generate from code comments where possible.
- Security is **shift-left** and **always-on**; no manual approvals without automated evidence.
- Small, stream-aligned teams own build→deploy→operate; platform & enabling teams unblock, never gatekeep.
- "Merge == Deploy": trunk-based development with protected-branch policies and mandatory green pipelines.
- Environments are cattle, not pets; reproducible via IaC; daily destroy-and-recreate for drift detection.
- Observability is a feature: every service must expose health, metrics, traces, and structured logs by default.
- Prefer **GitOps** pull-request workflows over imperative CLIs for any change (app or infra).
- Treat pipelines as first-class code; each change to pipeline triggers its own pipeline.
YAML (Pipelines, Kubernetes Manifests, GitHub Actions)
- File names: `<area>-<purpose>.yaml` (e.g., `build-deploy.yaml`, `ns-observability.yaml`).
- Top-level keys ordered: `name`, `on`, `env`, `jobs` for Actions; `apiVersion`, `kind`, `metadata`, `spec` for K8s.
- Use reusable anchors & aliases for repeating blocks (e.g., container templates).
- Avoid hard-coding; inject via `${{ secrets.… }}` or Helm values.
- Validate YAML via `yamllint -d relaxed` and schema checks in pipeline.
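The naming rule above can be checked mechanically before `yamllint` runs. The following is a minimal sketch, assuming that `<area>` and `<purpose>` are lowercase alphanumerics and that the purpose may contain further hyphenated words; the function name `is_valid_yaml_name` is hypothetical and not part of any tool.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 when a file's base name follows <area>-<purpose>.yaml.
# Assumption: lowercase letters and digits only, at least one hyphen,
# and the extension must be .yaml (not .yml).
is_valid_yaml_name() {
  local LC_ALL=C   # keep [a-z] strictly ASCII regardless of the caller's locale
  local base
  base="$(basename -- "$1")"
  [[ "$base" =~ ^[a-z0-9]+(-[a-z0-9]+)+\.yaml$ ]]
}
```

In a pipeline this could be applied to every changed manifest, for example `is_valid_yaml_name "$file" || exit 1`, ahead of the schema checks.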
HCL (Terraform)
- One resource per file; file name `resource_<type>_<name>.tf`.
- Enable `terraform fmt -check` and `tfsec` in CI gate.
- Mandatory `tags` map with `owner`, `purpose`, `env`, and `cost_center` for every resource.
- Use remote backend with state-locking (e.g., S3 + DynamoDB) and version constraints (`required_version = ">= 1.4.0"`).
Shell Scripts
- Use `set -euo pipefail`; wrap complex logic in functions.
- Log in JSON, one object per line: `jq -cn --arg msg "$msg" '{level:"INFO",message:$msg}'`.
- All scripts linted by `shellcheck` and unit-tested with `bats`.
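The shell rules above can be combined into a small script skeleton. This is a sketch rather than a prescribed library: `log`, `json_escape`, and `run_step` are hypothetical helper names, and the `printf` fallback for hosts without `jq` is an added assumption that escapes only backslashes and double quotes.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: escape backslashes and double quotes for JSON.
# It does not handle control characters; jq is preferred when available.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Emit one compact JSON log line on stdout.
log() {
  local level="$1" msg="$2"
  if command -v jq >/dev/null 2>&1; then
    jq -cn --arg level "$level" --arg msg "$msg" '{level:$level,message:$msg}'
  else
    printf '{"level":"%s","message":"%s"}\n' \
      "$(json_escape "$level")" "$(json_escape "$msg")"
  fi
}

# Wrap a command in start/done log lines; with `set -e`, a failing
# command aborts the script before the "done" line is written.
run_step() {
  local name="$1"
  shift
  log INFO "start: ${name}"
  "$@"
  log INFO "done: ${name}"
}
```

A script built on this skeleton would call, for instance, `run_step lint shellcheck ./scripts/*.sh`, keeping every stage's output machine-parseable.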
Error Handling and Validation
- Catch failures early: each pipeline stage must include `*_quality` jobs (lint, test, security, licence).
- Fail-fast: conditional `exit 1` on any critical scanner finding (CVSS ≥7).
- Rollbacks automated via GitOps revert PRs; no SSH into prod.
- RCA template auto-opens for any pipeline or prod incident labelled `sev-1` or `sev-2`.
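The fail-fast rule (CVSS ≥7 blocks the pipeline) might look like the following in a gate script. This is a sketch under assumptions: the scanner's scores are passed in as plain numeric arguments, and `cvss_gate` is a hypothetical function name. `awk` does the comparison because bash arithmetic cannot handle decimals.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Returns 1 as soon as any score meets or exceeds the threshold.
# Assumption: each argument is a CVSS base score such as 5.3 or 9.8.
cvss_gate() {
  local threshold="7.0" score
  for score in "$@"; do
    # awk exits 0 when the score is at or above the threshold
    if awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s + 0 >= t + 0) }'; then
      echo "blocking finding: CVSS ${score} >= ${threshold}" >&2
      return 1
    fi
  done
  return 0
}
```

A `*_quality` job would then end with something like `cvss_gate "${scores[@]}" || exit 1`, where `scores` is extracted from the scanner's report by whatever means that tool provides.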
Framework-Specific Rules
Jenkins
- Use declarative pipelines (`Jenkinsfile` Groovy).
- Single shared library `@libs/enterprise-pipeline` for common steps (checkout, build, scan, deploy).
- Agents are ephemeral, container-based; forbid executors on the controller (the built-in node, formerly "master").
GitLab CI/CD
- `.gitlab-ci.yml` extends centrally-maintained templates.
- Stages: `lint → test → build → scan → deploy`.
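- Enforce `only: [merge_requests]` (or the equivalent `rules:` clause on `$CI_PIPELINE_SOURCE == "merge_request_event"`, which GitLab recommends for new pipelines) for every job except `deploy`, which runs on `main` after the approval comment `/deploy <env>`.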
GitHub Actions
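- Reusable workflows live directly in `.github/workflows/` (GitHub does not support subdirectories there); give them a `reusable-` file-name prefix, declare `on: workflow_call`, and reference them with `uses:`.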
- Required checks: `ci/lint`, `ci/test`, `security/scan`, `deploy/staging`.
- OIDC-based cloud auth; never use long-lived secrets.
Terraform Cloud / Enterprise
- All workspaces use speculative plans for PRs; manual queue is disabled.
- Sentinel policies: block if cost delta >20% or missing tags.
Kubernetes & GitOps (Argo CD / Flux)
- One Helm chart per service; version pinned via semver.
- Kustomize overlays `base/`, `overlays/<env>/`; no direct `kubectl apply`.
- Health checks: readinessProbe ≤5 s, livenessProbe ≤10 s.
- Must emit OpenTelemetry traces from sidecar or SDK.
Additional Sections
Testing
- Unit test coverage ≥80% of lines and functions.
- Integration tests run in ephemeral env spun up by IaC; destroyed post-job.
- Chaos experiments scheduled weekly via Litmus or Gremlin.
Security
- Static analysis: SAST (CodeQL) on every PR.
- Dependency scanning: SBOM generated (`cyclonedx`) and checked against OSS Index.
- Secrets detection: `gitleaks` blocks commit; allowlist via `gitleaks.toml` reviewed monthly.
- Container images: build with `--build-arg VERSION=$(git rev-parse --short HEAD)` and sign with cosign.
Performance
- Load tests (k6) gated on `/perf` label; pass criteria: p95 latency < 200 ms @ 2× anticipated max load.
- Auto-scaling policies defined in HPA manifests with target CPU ≤70%.
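The load-test pass criterion (p95 latency < 200 ms) can be evaluated from raw samples in a few lines. The sketch below assumes one latency value in milliseconds per input line and uses the nearest-rank definition of the 95th percentile; `p95` and `perf_gate` are hypothetical names, and k6 can also enforce thresholds natively.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the nearest-rank 95th percentile of numbers read from stdin.
# The rank is ceil(0.95 * N), computed with integer arithmetic.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      idx = int((NR * 95 + 99) / 100)
      print v[idx]
    }'
}

# Succeed only when the observed p95 is strictly below the limit (default 200 ms).
perf_gate() {
  local limit_ms="${1:-200}" observed
  observed="$(p95)"
  awk -v o="$observed" -v l="$limit_ms" 'BEGIN { exit !(o + 0 < l + 0) }'
}
```

Usage might be `perf_gate 200 < latencies.txt`, where `latencies.txt` is a hypothetical export of per-request durations from the load test.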
Monitoring & Observability
- Golden signals (latency, traffic, errors, saturation) exported for every service.
- Use standardized labels: `service`, `team`, `env`, `version`.
- Alert severity map: `page` (sev-1), `ticket` (sev-2), `info` (sev-3).
- Dashboards as code (Grafana JSON) stored in `observability/dashboards/`.
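The alert severity map above, together with the earlier rule that `sev-1` and `sev-2` incidents open an RCA template, can be encoded as a lookup that alerting glue scripts share. This is an illustrative sketch; `alert_route` and `needs_rca` are hypothetical function names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Map an incident severity label to its alert route.
alert_route() {
  case "$1" in
    sev-1) echo "page" ;;
    sev-2) echo "ticket" ;;
    sev-3) echo "info" ;;
    *)
      echo "unknown severity: $1" >&2
      return 1
      ;;
  esac
}

# Succeed when the severity requires an RCA document (sev-1 or sev-2).
needs_rca() {
  case "$1" in
    sev-1 | sev-2) return 0 ;;
    *) return 1 ;;
  esac
}
```

Keeping the mapping in one place means a change to the severity policy is a single reviewed pull request rather than an edit to every alert rule.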
Collaboration & Communication
- Slack/Teams channel naming: `#proj-<team>-<purpose>` (e.g., `#proj-payments-incident`).
- Incident comms in dedicated channel with bots posting timelines from PagerDuty.
- Weekly "DevOps Dojo" sessions rotate presenters; minutes auto-published in Confluence.
Common Pitfalls & Avoidance
- Pitfall: Long-lived feature branches → Mitigation: feature flags & merge daily.
- Pitfall: Manual hotfix in prod → Mitigation: hotfix branch, CI run, GitOps sync.
- Pitfall: Drift between envs → Mitigation: destroy non-prod nightly; compliance diff.
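The nightly destroy-and-recreate mitigation could be scripted roughly as follows. This is a sketch under several assumptions: each environment maps to a Terraform workspace of the same name, production is called `prod` or `production`, and the configuration accepts the `environment` variable. The helper names are hypothetical, and the script only prints commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Refuse to touch production (assumed to be named "prod" or "production").
assert_non_prod() {
  case "$1" in
    prod | production)
      echo "refusing to destroy ${1}" >&2
      return 1
      ;;
    *) return 0 ;;
  esac
}

# Print the command in dry-run mode (the default); otherwise execute it.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Destroy and recreate one non-production environment.
recreate_env() {
  local env="$1"
  assert_non_prod "$env" || return 1
  run terraform workspace select "$env"
  run terraform destroy -auto-approve -var "environment=${env}"
  run terraform apply -auto-approve -var "environment=${env}"
}
```

A scheduled pipeline would call `DRY_RUN=0 recreate_env staging` for each non-production environment; the default dry-run mode makes the script safe to exercise in a pull-request pipeline first.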