Comprehensive Rules for consistent, auditable, and automated ML model versioning across the full MLOps lifecycle.
When your ML models hit production, versioning chaos kills productivity. You're manually tracking experiments, losing model lineage, and spending hours debugging "which version is actually running?" Instead of shipping features, you're digging through Git commits and MLflow runs like an archaeologist, trying to reconstruct what broke.
Your current workflow probably looks like the "Before" scenarios below: forgotten tags, manual UI clicking, and crossed fingers at deploy time.
Sound familiar? Every minute spent on versioning archaeology is a minute not spent on model improvement.
These Cursor Rules transform your chaotic model lifecycle into a bulletproof, automated versioning system. They implement enterprise-grade MLOps patterns that Fortune 500 companies use to ship hundreds of models safely.
What you get: Semantic versioning for models, automated CI/CD integration, complete lineage tracking, one-command rollbacks, and audit-ready provenance—all automated through your existing tools.
Before: 2-3 hours per week manually tracking model versions across MLflow, Git, and deployment systems
After: Zero manual versioning—everything automated through CI/CD
Before: 30-45 minutes to identify and redeploy previous stable model
After: Single command rollback to any previous version (kubectl patch inferenceservice/classifier --type merge --patch '{"spec":{"predictor":{"model":{"storageUri":"gs://models/classifier/v1.2.1"}}}}')
Before: "Which data trained this model?" requires detective work across multiple systems
After: Every prediction traces back to exact code commit, data version, and training run
Before: Manual documentation for model audits
After: Automated provenance manifests with cryptographic signatures
Before: The Manual Nightmare
# Developer needs to remember 15+ manual steps
git tag model/classifier/v1.3.0 # Often forgotten
# promote "Classifier" to Staging by clicking through the MLflow UI # Manual, easy to forget
docker build -t classifier:1.3.0 . # Hope the tag matches
kubectl apply -f manifests/ # Pray configs are updated
# Result: 45 minutes of error-prone manual work
After: Automated Perfection
# Developer just increments version in code
# File: model/__init__.py
__version__ = "1.3.0" # Single source of truth
# CI/CD handles everything else automatically:
# - Semantic version validation
# - MLflow registration with full lineage
# - Docker image build with signed artifacts
# - Kubernetes deployment with rollback safety
# Result: Git push triggers entire release pipeline
Before: Production Fire Drill
# Model accuracy drops, team panics
# 30 minutes of frantic debugging to find last good version
# Manual MLflow UI navigation
# Hope the Docker image still exists
# Cross fingers on deployment
After: One-Command Recovery
# Automated monitoring detects accuracy drop
# CI/CD automatically triggers rollback
make rollback-to VERSION=v1.2.8
# Complete rollback in under 2 minutes with full audit trail
Before: Configuration Hell
# Configs scattered across notebooks and local files
# No way to reproduce training
# Manual parameter tracking in spreadsheets
# "It worked on my machine" syndrome
After: Reproducible Pipeline
# params.yaml - versioned with code
model:
  learning_rate: 0.001
  batch_size: 32
  architecture: "resnet50"
data:
  version: "2023-10-15-abc123"  # DVC tracked
# Full reproducibility: dvc repro recreates identical model
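On the training side, a minimal sketch (assuming the params.yaml above and an illustrative train.py entry point) of loading the versioned config and hashing it so each run can be tagged with the exact parameters it used:

# train.py -- illustrative sketch: load the versioned params.yaml and fingerprint it
import hashlib
import yaml  # PyYAML

def load_params(path: str = "params.yaml") -> tuple[dict, str]:
    """Return the parsed config and a short hash that uniquely identifies it."""
    with open(path, "rb") as fh:
        raw = fh.read()
    config = yaml.safe_load(raw)
    config_hash = hashlib.sha256(raw).hexdigest()[:12]  # tag MLflow runs with this
    return config, config_hash

if __name__ == "__main__":
    params, params_hash = load_params()
    print(f"training with config {params_hash}: {params['model']}")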
# Add to your ML project
mkdir -p model/ artifacts/ infra/
echo '__version__ = "0.1.0"' > model/__init__.py
# .github/workflows/model-release.yml
name: Model Release
on:
  push:
    paths: ['model/__init__.py']
env:
  MODEL_NAME: classifier        # adjust to your model
  REGISTRY: ghcr.io/your-org    # adjust to your container registry
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Extract Version
        run: echo "VERSION=$(python -c 'import model; print(model.__version__)')" >> $GITHUB_ENV
      - name: Train & Register
        run: |
          dvc repro
          # project helper that wraps mlflow.register_model for the new $VERSION
          python scripts/register_model.py --name "$MODEL_NAME" --version "$VERSION"
      - name: Deploy
        run: |
          docker build -t $REGISTRY/$MODEL_NAME:$VERSION .
          docker push $REGISTRY/$MODEL_NAME:$VERSION
          kubectl set image deployment/$MODEL_NAME model=$REGISTRY/$MODEL_NAME:$VERSION
# training/validate.py
class ModelPerformanceError(Exception):
    """Raised when a candidate model regresses against the production baseline."""

def validate_model_performance(current_metrics, baseline_version):
    """Block promotion if accuracy drops > 5% vs. the baseline version."""
    # load_metrics_for_version: project helper that fetches metrics for a registered version
    baseline_metrics = load_metrics_for_version(baseline_version)
    threshold = baseline_metrics['accuracy'] * 0.95
    if current_metrics['accuracy'] < threshold:
        raise ModelPerformanceError(
            f"Accuracy drop detected: {current_metrics['accuracy']:.3f} < {threshold:.3f}"
        )
# monitoring/model_drift.py
# `app`, `model`, and `logger` are the service's Flask app, model package, and structured logger
import os
import time
import uuid

from flask import jsonify, request

@app.route('/predict', methods=['POST'])
def predict():
    start = time.perf_counter()
    prediction = model.predict(request.json)
    # Log prediction with full provenance
    logger.info({
        'model_version': model.__version__,
        'git_sha': os.environ['GIT_SHA'],
        'prediction_id': str(uuid.uuid4()),
        'latency_ms': (time.perf_counter() - start) * 1000,
    })
    return jsonify(prediction)
"We went from 2-hour emergency rollbacks to 90-second automated recovery. Our model deployment confidence went from 60% to 95%." — ML Engineering Lead, Financial Services
"These rules eliminated our 'which model is running?' Slack channels. Everything is traceable and automated." — Senior Data Scientist, E-commerce
# Handles complex model dependencies
from typing import Dict

def deploy_model_ensemble(models: Dict[str, str]):
    """Deploy multiple model versions with traffic splitting."""
    for model_name, version in models.items():
        # project-specific helpers: schema compatibility check + canary rollout
        validate_model_compatibility(model_name, version)
        deploy_with_traffic_split(model_name, version, traffic_percent=10)
# Traffic splitting configuration
traffic_split:
  model_v1_2_8: 80%
  model_v1_3_0: 20%  # Canary deployment
rollback_threshold:
  error_rate: 0.05
  latency_p95: 200ms
# Cryptographically signed model artifacts
cosign sign $REGISTRY/model:$VERSION
cosign verify $REGISTRY/model:$VERSION --certificate-identity=$CI_EMAIL --certificate-oidc-issuer=$CI_OIDC_ISSUER
Start implementing these rules today and transform your ML versioning from chaos to competitive advantage. Your models deserve the same engineering rigor as your application code—these rules make it automatic.
You are an expert in Python, Git, Docker, Kubernetes, MLflow, DVC, Kubeflow, CI/CD (GitHub Actions, GitLab CI), and cloud ML registries (AWS SageMaker, Azure ML, GCP Vertex AI).
Key Principles
- Version **every** mutable artefact: model weights, training/evaluation code, configuration, data, environment, and infra manifests.
- Apply Semantic Versioning (MAJOR.MINOR.PATCH) to models; increment PATCH for retrains, MINOR for architecture/tuning changes, MAJOR for breaking I/O-contract changes.
- Automate version bumps and tagging inside CI/CD; never bump versions manually on local machines (a version-gate sketch follows this list).
- Immutable artefacts only: once a version tag is published, never mutate or delete it—create a new version instead.
- Enable single-command rollback to any prior production model.
- Maintain full lineage (code ⇄ data ⇄ model ⇄ metrics) to guarantee reproducibility and auditability.
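A minimal sketch of the CI version gate these principles imply, assuming `__version__` lives in `model/__init__.py` and releases are tagged `model/<name>/vX.Y.Z`; the model name and script path are illustrative:

# ci/check_version.py -- fail the pipeline unless __version__ is valid SemVer
# and strictly greater than the last released tag (illustrative sketch)
import re
import subprocess
import sys

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def parse(version: str) -> tuple:
    match = SEMVER.match(version)
    if not match:
        sys.exit(f"'{version}' is not MAJOR.MINOR.PATCH")
    return tuple(int(part) for part in match.groups())

def latest_released_version(model_name: str = "classifier") -> str:
    """Most recent model/<name>/vX.Y.Z tag, or 0.0.0 if none exist."""
    tags = subprocess.run(
        ["git", "tag", "--list", f"model/{model_name}/v*"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    versions = [tag.rsplit("/v", 1)[1] for tag in tags]
    return max(versions, key=parse, default="0.0.0")

if __name__ == "__main__":
    from model import __version__
    if parse(__version__) <= parse(latest_released_version()):
        sys.exit(f"version {__version__} must be bumped above the last release")
    print(f"version gate passed: {__version__}")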
Python
- Store the model version string in a single source of truth (e.g. `__version__` in `model/__init__.py`).
- Pin all dependencies in `requirements.txt` or `poetry.lock`; disallow open-ended specifiers (`pandas>=1.5` is forbidden, use an exact pin).
- Persist training parameters with `yaml`/`json` config files and commit them to Git; hash configs and add to MLflow run tags.
- Prefer functional style for training/serving scripts; avoid hidden global state.
- Use type hints (`typing`, `pydantic`) to lock model I/O schema; breaking schema ⇒ MAJOR bump.
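As an illustration of locking the I/O schema with pydantic (v2 API; field names are made up for the example), a contract like this makes any breaking change explicit and reviewable:

# model/schema.py -- illustrative I/O contract; breaking changes here => MAJOR bump
from pydantic import BaseModel, Field

class PredictionRequest(BaseModel):
    feature_vector: list[float] = Field(..., min_length=1)
    customer_id: str

class PredictionResponse(BaseModel):
    score: float = Field(..., ge=0.0, le=1.0)
    model_version: str  # echoed back for traceability

def validate_request(payload: dict) -> PredictionRequest:
    """Validate at the serving boundary; raises pydantic.ValidationError on bad input."""
    return PredictionRequest.model_validate(payload)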
Error Handling & Validation
- Fail fast: validate presence & checksum of model, data, and config before training or serving (see the pre-flight sketch after this list).
- Wrap critical steps (data load, model load, prediction) in `try/except` blocks that raise domain-specific exceptions (`ModelNotFoundError`, `DataVersionMismatchError`).
- In CI/CD, abort pipeline if model accuracy/regression metrics drop > defined threshold vs. previous production version.
- Implement automated rollback stage: if post-deploy smoke tests fail, tag build as `failed-<build-id>` and redeploy last stable tag.
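A pre-flight sketch of the fail-fast rule, using the domain exceptions named above and an assumed `artifact_checksums.json` manifest:

# serving/preflight.py -- verify artefacts exist and match expected checksums before load
import hashlib
import json
from pathlib import Path

class ModelNotFoundError(Exception):
    """Model artefact missing for the requested version."""

class DataVersionMismatchError(Exception):
    """Artefact on disk does not match the recorded checksum."""

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def preflight_check(model_path: str, manifest_path: str = "artifact_checksums.json") -> None:
    path = Path(model_path)
    if not path.exists():
        raise ModelNotFoundError(f"missing artefact: {path}")
    manifest = json.loads(Path(manifest_path).read_text())
    expected = manifest.get(path.name)
    if expected is None or sha256(path) != expected:
        raise DataVersionMismatchError(f"{path.name}: checksum missing or mismatched")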
Framework-Specific Rules
MLflow
- Track every experiment/run; log artefacts, metrics, params, and a code snapshot (pass `code_paths=[...]` to `mlflow.pyfunc.log_model` or the flavour-specific `log_model`).
- Use MLflow Model Registry as the canonical source; promote versions through `Staging → Production → Archived` states via automation, never by UI clicks.
- Attach the SemVer tag to registered model versions (`MlflowClient().set_model_version_tag(name, version, 'semver', '1.3.2')`).
- Block Production transition if model is not linked to a Git commit SHA and DVC data version.
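One possible automation of this registry flow with the MLflow client, assuming the training run has already logged the model and carries `git_sha` and `dvc_data_version` tags:

# ci/promote_model.py -- register, tag with SemVer, and gate the Production transition
import mlflow
from mlflow.tracking import MlflowClient

def register_and_promote(run_id: str, name: str, semver: str) -> None:
    client = MlflowClient()
    version = mlflow.register_model(f"runs:/{run_id}/model", name).version
    client.set_model_version_tag(name, version, "semver", semver)

    run_tags = client.get_run(run_id).data.tags
    if not (run_tags.get("git_sha") and run_tags.get("dvc_data_version")):
        raise RuntimeError("refusing Production transition: missing lineage tags")
    client.transition_model_version_stage(name, version, stage="Production")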
DVC
- Store large datasets and model binaries under DVC control; push to remote storage on every merge to `main`.
- Use DVC params files (`params.yaml`) and pipelines (`dvc.yaml`) to codify training; lock model version output path as `models/<model_name>/<semver>/model.pkl`.
- Gates: PR merge fails if `dvc repro` diverges from committed artefacts.
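A small merge-gate sketch for the locked output path, assuming the `models/<model_name>/<semver>/model.pkl` layout and the package `__version__` as its only inputs:

# ci/check_artifact_path.py -- ensure the committed artefact path matches the package version
import sys
from pathlib import Path

from model import __version__

def check_artifact_path(model_name: str = "classifier") -> Path:
    expected = Path("models") / model_name / __version__ / "model.pkl"
    if not expected.exists():
        sys.exit(f"expected artefact at {expected}; run `dvc repro` and commit the outputs")
    return expected

if __name__ == "__main__":
    print(f"artefact found: {check_artifact_path()}")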
Kubeflow & Kubernetes
- Package each model as a Docker image tagged with the exact SemVer plus Git SHA (`ghcr.io/org/model:1.4.0+abc1234`).
- Deploy via KServe; route traffic with percentage-based canary rollout; rollback == point the InferenceService back at the previous revision (see the rollback sketch after this list).
- Store Kustomize/Helm manifests in `infra/` directory, versioned under Git.
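A rollback sketch in the spirit of the one-command recovery shown earlier; the InferenceService name, namespace, and bucket layout are assumptions:

# ops/rollback.py -- roll a KServe InferenceService back to a previous model version
import json
import subprocess

def rollback(service: str, semver: str, namespace: str = "models") -> None:
    patch = {"spec": {"predictor": {"model": {
        "storageUri": f"gs://models/{service}/v{semver}"}}}}
    subprocess.run(
        ["kubectl", "patch", f"inferenceservice/{service}",
         "-n", namespace, "--type", "merge", "--patch", json.dumps(patch)],
        check=True,
    )

if __name__ == "__main__":
    rollback("classifier", "1.2.8")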
CI/CD Pipeline Rules
- Trigger on PR and on merge; generate version from `git describe --tags --abbrev=0` + bump script.
- Stages: `lint → unit-test → dvc repro --dry → train → evaluate → register → deploy`.
- Publish provenance manifest (`provenance.json`) containing: SemVer, commit SHA, data hash, metrics, Docker digest.
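A sketch of the provenance stage; the data hash, metrics, and Docker digest are handed in from earlier pipeline steps, and field names follow the list above:

# ci/write_provenance.py -- emit provenance.json linking the release to its inputs
import json
import subprocess
from datetime import datetime, timezone

def write_provenance(semver: str, data_hash: str, metrics: dict, docker_digest: str) -> None:
    commit_sha = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    manifest = {
        "semver": semver,
        "commit_sha": commit_sha,
        "data_hash": data_hash,
        "metrics": metrics,
        "docker_digest": docker_digest,
        "built_at": datetime.now(timezone.utc).isoformat(),
    }
    with open("provenance.json", "w") as fh:
        json.dump(manifest, fh, indent=2)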
Testing
- Unit tests: ≥ 90 % coverage for data preprocessing & model wrapper functions.
- Integration tests: load the model by version tag and run a canonical prediction set; assert numerical tolerance vs. expected outputs (see the test sketch after this list).
- Canary tests: after deploy, run real-time shadow traffic for ≥ N requests before routing production traffic.
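An integration-test sketch that loads a pinned registry version and checks a canonical prediction set within tolerance; the registry name, version, and fixture paths are assumptions:

# tests/test_model_integration.py -- canonical-prediction check against a pinned version
import mlflow.pyfunc
import numpy as np
import pandas as pd

MODEL_URI = "models:/classifier/12"  # registry version under test
CANONICAL_INPUT = "tests/fixtures/canonical_input.csv"
EXPECTED_OUTPUT = "tests/fixtures/expected_output.npy"

def test_canonical_predictions_match_baseline():
    model = mlflow.pyfunc.load_model(MODEL_URI)
    batch = pd.read_csv(CANONICAL_INPUT)
    expected = np.load(EXPECTED_OUTPUT)
    predictions = np.asarray(model.predict(batch))
    np.testing.assert_allclose(predictions, expected, rtol=1e-5, atol=1e-8)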
Performance & Monitoring
- Log inference latency histograms per model version to Prometheus; alert if P95 latency increases > X % vs. the previous version (see the instrumentation sketch after this list).
- Store model-specific feature drift stats; trigger retraining pipeline when drift score > threshold.
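A per-version latency instrumentation sketch with prometheus_client; bucket boundaries and label values are illustrative:

# monitoring/latency.py -- per-model-version inference latency histogram for Prometheus
import time

from prometheus_client import Histogram

INFERENCE_LATENCY = Histogram(
    "model_inference_latency_seconds",
    "Inference latency per model version",
    labelnames=["model_name", "model_version"],
    buckets=(0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1.0),
)

def timed_predict(model, features, model_name: str, model_version: str):
    start = time.perf_counter()
    try:
        return model.predict(features)
    finally:
        INFERENCE_LATENCY.labels(model_name, model_version).observe(
            time.perf_counter() - start
        )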
Security & Compliance
- Sign Docker images with Cosign; verify the signature before deployment (a verification-gate sketch follows this list).
- Encrypt model artefacts at rest; restrict registry access via least privilege.
- Maintain audit trail linking each production prediction to model version & commit SHA (regulatory compliance).
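A deployment-gate sketch that shells out to Cosign for keyless verification; the GitHub Actions OIDC issuer default is an assumption, adjust to your CI provider:

# ci/verify_signature.py -- refuse to deploy an image whose signature cannot be verified
import subprocess
import sys

def verify_image(image_ref: str, identity: str,
                 issuer: str = "https://token.actions.githubusercontent.com") -> None:
    # keyless verification: cosign checks the signature against the signer identity
    result = subprocess.run(
        ["cosign", "verify", image_ref,
         "--certificate-identity", identity,
         "--certificate-oidc-issuer", issuer],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        sys.exit(f"signature verification failed for {image_ref}:\n{result.stderr}")

if __name__ == "__main__":
    verify_image(sys.argv[1], sys.argv[2])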
Common Pitfalls & Guards
- ❌ Don’t overwrite `latest` tag; always use immutable tags.
- ❌ Don’t version data and model code in separate, unsynchronised repos; use submodules or a monorepo with DVC to keep their versions in lockstep.
- ✅ Always link CI run, MLflow run, and Git commit together; if any link missing → fail pipeline.
Directory/Branch Naming
- Git branches: `feature/<ticket-id>`, `bugfix/<ticket-id>`, `experiment/<model-name>/<short-desc>`.
- Model artefact path: `artifacts/<model-name>/<semver>/<timestamp>/`.
- Tags: `model/<model-name>/vX.Y.Z`.
Checklist Before Production Release
1. SemVer bump committed.
2. All tests pass; evaluation metrics meet the promotion gate.
3. Artefacts pushed to DVC remote & MLflow registry.
4. Provenance manifest updated & signed.
5. Change log (`CHANGELOG.md`) entry added.
6. CI/CD auto-promotes model to `Production` and records deployment in monitoring.