Comprehensive Rules for designing, training, deploying, and maintaining fair machine-learning systems.
Stop shipping biased AI systems. Your models impact real people's lives—and your organization's reputation—every day. These production-tested Cursor Rules implement fairness as a first-class requirement throughout your entire ML pipeline, not an afterthought.
Your current ML workflow probably looks like this: train model → evaluate accuracy → deploy → hope for the best. Meanwhile, your models are making decisions that systematically disadvantage entire demographic groups, creating legal liability and eroding user trust.
The real problem: without fairness checks built into the pipeline, those disparities stay invisible until customer complaints, audits, or legal action surface them.
These Cursor Rules establish fairness as a quantifiable, testable requirement—just like performance or security. You get standardized workflows across IBM AIF360, Fairlearn, and modern MLOps tooling that catch bias early and maintain fairness throughout your model's lifecycle.
What you get:
Instead of manual analysis across demographic slices, automated MetricFrame evaluation gives you comprehensive fairness metrics in seconds:
```python
# Before: hours of manual demographic analysis
# After: a comprehensive fairness evaluation in a few lines
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, equalized_odds_difference

metrics = MetricFrame(metrics={'accuracy': accuracy_score},
                      y_true=y_test, y_pred=y_pred,
                      sensitive_features=sensitive_feature['test'])
print(metrics.by_group)  # instant per-group breakdown
print(equalized_odds_difference(  # overall disparity summary
    y_test, y_pred, sensitive_features=sensitive_feature['test']))
```
Continuous fairness testing fails CI when metrics deviate >2% from baseline—catching bias before it reaches users instead of discovering it through customer complaints or legal action.
Transform fragmented bias mitigation into standardized three-step pipelines:
Before these rules:
```python
# Train a model with no fairness considerations
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions)}")
# Deploy, then discover demographic disparities in production
```
With fairness rules:
```python
# Fairness-first development with automated mitigation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds
from fairlearn.metrics import MetricFrame, equalized_odds_difference

base_estimator = LogisticRegression(max_iter=1000, n_jobs=-1)
mitigator = ExponentiatedGradient(base_estimator, constraints=EqualizedOdds())
mitigator.fit(X_train, y_train, sensitive_features=race_train)
y_pred = mitigator.predict(X_test)

metrics = MetricFrame(metrics={'accuracy': accuracy_score},
                      y_true=y_test, y_pred=y_pred, sensitive_features=race_test)
eq_odds_diff = equalized_odds_difference(y_test, y_pred, sensitive_features=race_test)
# Automatic compliance documentation and audit trails
```
Before: Manual bias audits months after deployment reveal systematic discrimination.
With fairness rules: Real-time Prometheus metrics trigger alerts when the demographic parity difference exceeds 0.05, enabling immediate intervention:
```python
# Automated fairness monitoring in production
from prometheus_client import Gauge
fairness_gauge = Gauge('model_demographic_parity', 'Demographic parity difference',
                       ['model_version', 'slice'])
if demographic_parity_diff > SLA_THRESHOLD:  # e.g., 0.05
    trigger_retraining_pipeline()
```
```bash
mkdir your-fair-ml-project && cd your-fair-ml-project
mkdir -p src/{data,models,metrics,mitigation,notebooks}
pip install fairlearn aif360 pandas numpy scikit-learn torch
pip install pydantic mypy ruff black isort
pip install shap lime mlflow dvc prometheus-client
```
Copy the rules into `.cursor-rules` in your project root, and Cursor scaffolds fairness-aware components like this:

```python
# src/models/fair_classifier.py - auto-generated with the rules
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from sklearn.metrics import accuracy_score


class FairClassifier:
    def __init__(self, base_estimator, fairness_constraint=EqualizedOdds()):
        self.mitigator = ExponentiatedGradient(base_estimator, constraints=fairness_constraint)

    def fit_with_fairness(self, X_train, y_train, sensitive_features):
        """Fit model with automatic fairness constraints."""
        self.mitigator.fit(X_train, y_train, sensitive_features=sensitive_features)
        return self

    def evaluate_fairness(self, X_test, y_test, sensitive_features):
        """Generate per-group metrics plus an overall disparity score."""
        y_pred = self.mitigator.predict(X_test)
        frame = MetricFrame(metrics={'accuracy': accuracy_score},
                            y_true=y_test, y_pred=y_pred,
                            sensitive_features=sensitive_features)
        disparity = demographic_parity_difference(y_test, y_pred,
                                                  sensitive_features=sensitive_features)
        return {'by_group': frame.by_group, 'demographic_parity_difference': disparity}
```
```python
# tests/test_fairness_continuous.py
def test_fairness_regression():
    """Fail CI if fairness metrics degrade"""
    current_metrics = evaluate_model_fairness()
    baseline_metrics = load_baseline_metrics()
    assert abs(current_metrics.demographic_parity - baseline_metrics.demographic_parity) < 0.02
    assert abs(current_metrics.equalized_odds - baseline_metrics.equalized_odds) < 0.02
```
Your next model deployment could perpetuate systemic bias—or help eliminate it. These Cursor Rules make fairness-first development as natural as writing tests.
Copy the rules, open Cursor, and start building AI systems that work fairly for everyone.
The difference between biased and fair AI isn't complexity—it's having the right development patterns built into your workflow from day one.
You are an expert in Responsible AI, Python, Scikit-Learn, PyTorch, TensorFlow, Fairlearn, IBM AIF360, Pandas, NumPy, and modern MLOps tooling.
Key Principles
- Fairness is a first-class, quantifiable, and testable requirement.
- Address bias at every stage: data → model → deployment → monitoring.
- Prefer transparent, explainable models when performance trade-off is acceptable.
- Always isolate and measure metrics per demographic slice and intersectional groups.
- Document choices, assumptions, and mitigation steps in model cards and data sheets.
- Automate reproducibility with notebooks as experiments, Python packages as production code.
- Version data, code, and trained artefacts (e.g., with DVC or MLflow).
- Never deploy without a rollback strategy and scheduled bias audits.
Python
- Use Python 3.11+ with type hints (PEP 484) and static analysis (mypy, Ruff).
- Adopt black for formatting (line length 88) and isort for import ordering.
- Pure functions for data transformations; limit side-effects.
- Naming
- df_* for DataFrames, X_/y_ for features/labels.
- fairness_*, bias_*, parity_* for mitigation utilities.
- use snake_case for modules & functions; PascalCase for classes when unavoidable.
- Directory layout
- src/
- data/: raw, processed, synthetic generators
- models/: training scripts & saved weights
- metrics/: fairness metric implementations
- mitigation/: preprocessing, in-processing, post-processing techniques
- notebooks/: exploratory analysis (never imported by prod code)
- Never keep secrets or PII in code or notebooks; read from env vars or vault.
Error Handling and Validation
- Validate input schema with pydantic.BaseModel at API & training boundaries.
- Begin every function that consumes external data with a call to the `assert_not_null(df)` helper (see the sketch after this list).
- On missing protected attributes:
- raise `ProtectedAttributeMissingError` with guidance string.
- Wrap third-party mitigation calls in try/except; on failure, log `fairness.error` event and re-raise custom `BiasMitigationError`.
- Use early returns; avoid nesting > 2 levels.
- Maintain structured logs (JSON) including: model_version, dataset_hash, slice_id, metric_name, metric_value.
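A minimal sketch of these conventions, assuming hypothetical names (`TrainingBatch`, `check_protected_attributes`, `run_mitigation`, `log_metric`) that are illustrative rather than part of any library:

```python
import json
import logging

import pandas as pd
from pydantic import BaseModel


class ProtectedAttributeMissingError(ValueError):
    """Raised when a required protected attribute is absent from the input."""


class BiasMitigationError(RuntimeError):
    """Raised when a third-party mitigation call fails."""


class TrainingBatch(BaseModel):  # validated at API / training boundaries
    feature_columns: list[str]
    label_column: str
    protected_attributes: list[str]


def assert_not_null(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast if the incoming frame contains nulls."""
    if df.isnull().any().any():
        raise ValueError("Input DataFrame contains null values")
    return df


def check_protected_attributes(df: pd.DataFrame, batch: TrainingBatch) -> None:
    missing = [c for c in batch.protected_attributes if c not in df.columns]
    if missing:  # raise immediately instead of nesting further
        raise ProtectedAttributeMissingError(
            f"Missing protected attribute(s) {missing}; "
            "add them to the dataset or update the schema."
        )


def run_mitigation(mitigator, X, y, sensitive):
    try:
        mitigator.fit(X, y, sensitive_features=sensitive)
    except Exception as exc:
        # Structured JSON log entry, then re-raise as a domain-specific error.
        logging.error(json.dumps({"event": "fairness.error", "detail": str(exc)}))
        raise BiasMitigationError("Mitigation step failed") from exc


def log_metric(model_version, dataset_hash, slice_id, metric_name, metric_value):
    """Emit the structured JSON log fields required for audit."""
    logging.info(json.dumps({
        "model_version": model_version, "dataset_hash": dataset_hash,
        "slice_id": slice_id, "metric_name": metric_name, "metric_value": metric_value,
    }))
```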
Fairness Toolkits / Frameworks
Fairlearn
- Always split data into train/val/test BEFORE calling `ExponentiatedGradient` or `GridSearch` mitigation.
- Use `fairlearn.metrics.MetricFrame` to compute overall & group metrics in one call.
- Register the following mandatory metrics: demographic_parity_difference, equalized_odds_difference, equal_opportunity_difference, selection_rate.
- Persist MetricFrame results to Parquet with a timestamp for audit (see the sketch after these bullets).
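A minimal sketch of that workflow, assuming `y_test`, `y_pred`, and an aligned `sensitive_test` series already exist, an `audit/` directory for output, and a fairlearn version recent enough to expose `equal_opportunity_difference`:

```python
from datetime import datetime, timezone

from sklearn.metrics import accuracy_score
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
    equal_opportunity_difference,
    selection_rate,
)

# Overall and per-group metrics in a single call.
frame = MetricFrame(
    metrics={'accuracy': accuracy_score, 'selection_rate': selection_rate},
    y_true=y_test, y_pred=y_pred, sensitive_features=sensitive_test,
)

# Scalar disparity metrics for the audit record (called directly, not inside MetricFrame).
disparities = {
    'demographic_parity_difference': demographic_parity_difference(
        y_test, y_pred, sensitive_features=sensitive_test),
    'equalized_odds_difference': equalized_odds_difference(
        y_test, y_pred, sensitive_features=sensitive_test),
    'equal_opportunity_difference': equal_opportunity_difference(
        y_test, y_pred, sensitive_features=sensitive_test),
}

# Persist the per-group table to Parquet with a timestamp for the audit trail.
stamp = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
frame.by_group.to_parquet(f'audit/metricframe_{stamp}.parquet')
```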
IBM AIF360
- Convert Pandas DataFrame → `BinaryLabelDataset` with explicit `protected_attribute_names` and `privileged_classes`.
- Chain: pre-processing (e.g., Reweighing) → classification → post-processing (e.g., CalibratedEqOdds).
- Store original, mitigated, and delta metrics side-by-side in `evaluation_report.json` (a sketch of this chain follows).
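One hedged sketch of the pre-processing and reporting steps in that chain (the classifier and CalibratedEqOdds stages are elided for brevity); the binary `race` encoding, the group definitions, and the output path are placeholder assumptions:

```python
import json

from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
from aif360.metrics import BinaryLabelDatasetMetric

# df holds features plus binary 'label' and 'race' columns.
dataset = BinaryLabelDataset(
    df=df, label_names=['label'], protected_attribute_names=['race'],
    favorable_label=1, unfavorable_label=0,
)
privileged = [{'race': 1}]
unprivileged = [{'race': 0}]

# Pre-processing mitigation: Reweighing adjusts instance weights.
reweigher = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
dataset_rw = reweigher.fit_transform(dataset)

# Compare dataset-level bias before and after mitigation.
before = BinaryLabelDatasetMetric(dataset, unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
after = BinaryLabelDatasetMetric(dataset_rw, unprivileged_groups=unprivileged,
                                 privileged_groups=privileged)

spd_before = before.statistical_parity_difference()
spd_after = after.statistical_parity_difference()
report = {
    'statistical_parity_difference': {
        'original': spd_before,
        'mitigated': spd_after,
        'delta': spd_after - spd_before,
    },
}
with open('evaluation_report.json', 'w') as fh:
    json.dump(report, fh, indent=2)
```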
Explainability (SHAP/LIME)
- Generate per-slice SHAP summaries; flag features whose attributions differ by more than 25% between groups (see the sketch after this list).
- Include explainer artifacts in the model registry entry.
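A minimal sketch of the per-group attribution check, assuming a fitted tree-based `model`, a feature DataFrame `X_test`, and an aligned `sensitive_test` series; the 25% rule is implemented here as a relative gap in mean absolute SHAP value, which is one reasonable reading of the guideline:

```python
import numpy as np
import pandas as pd
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
if isinstance(shap_values, list):        # older SHAP: one array per class
    shap_values = shap_values[1]
elif shap_values.ndim == 3:              # newer SHAP: (samples, features, classes)
    shap_values = shap_values[:, :, 1]

abs_shap = pd.DataFrame(np.abs(shap_values), columns=X_test.columns,
                        index=X_test.index)

# Mean |SHAP| per feature for each demographic slice.
per_group = abs_shap.groupby(sensitive_test).mean()

# Flag features whose attribution differs by more than 25% across groups
# (zero baselines become NaN to avoid divide-by-zero and are not flagged).
spread = (per_group.max() - per_group.min()) / per_group.min().replace(0, np.nan)
flagged = spread[spread > 0.25].index.tolist()
print("Features with >25% attribution gap between groups:", flagged)
```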
Testing
- Unit: pytest with 90% coverage; include edge cases of empty, all-null, and single-class datasets.
- Property tests (hypothesis) for mitigation methods; invariants include no new NaNs and unchanged group counts (sketched after this list).
- Continuous fairness tests: fail CI if any registered fairness metric deviates >2% from last accepted baseline.
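A minimal hypothesis sketch of those invariants, written against a placeholder `mitigate_preprocess` function that stands in for whichever pre-processing mitigation is under test:

```python
import pandas as pd
from hypothesis import given, settings, strategies as st


def mitigate_preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder for the real pre-processing mitigation under test."""
    return df.copy()


@st.composite
def small_frames(draw):
    n = draw(st.integers(min_value=1, max_value=50))
    groups = draw(st.lists(st.sampled_from(['a', 'b']), min_size=n, max_size=n))
    feature = draw(st.lists(st.floats(allow_nan=False, allow_infinity=False,
                                      min_value=-1e6, max_value=1e6),
                            min_size=n, max_size=n))
    return pd.DataFrame({'group': groups, 'x': feature})


@settings(max_examples=50)
@given(small_frames())
def test_mitigation_invariants(df):
    out = mitigate_preprocess(df)
    assert not out.isnull().any().any()                 # no new NaNs
    counts_before = df['group'].value_counts().sort_index()
    counts_after = out['group'].value_counts().sort_index()
    assert counts_after.equals(counts_before)           # group counts unchanged
```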
Performance & Scalability
- Use batch processing for mitigation when dataset >1M rows; leverage Spark + AIF360-spark wrappers.
- Cache intermediate datasets with parquet & Zstandard compression.
- Profile the mitigation pipeline with PyInstrument and optimize hotspots before deployment (caching and profiling are sketched below).
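A short sketch of the caching and profiling steps, assuming pandas with the pyarrow engine for Zstandard-compressed Parquet, an existing `intermediate` DataFrame, and a placeholder `run_mitigation_pipeline` entry point:

```python
import pandas as pd
from pyinstrument import Profiler

# Cache an intermediate dataset as Zstandard-compressed Parquet.
intermediate.to_parquet('cache/intermediate.parquet', compression='zstd')
reloaded = pd.read_parquet('cache/intermediate.parquet')

# Profile the mitigation pipeline and inspect hotspots before deployment.
profiler = Profiler()
profiler.start()
run_mitigation_pipeline(reloaded)   # placeholder for the real pipeline entry point
profiler.stop()
print(profiler.output_text(unicode=True, color=True))
```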
Security & Compliance
- Hash all personal identifiers with salted SHA-256 BEFORE storage (see the sketch after this list).
- Apply differential privacy noise when exporting aggregate metrics externally.
- Maintain GDPR/CCPA compliance logs: data_origin, consent_status.
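A minimal sketch of the salted-hash step, with the salt supplied through an assumed `PII_HASH_SALT` environment variable (differential privacy and consent logging are out of scope here):

```python
import hashlib
import os


def hash_identifier(identifier: str) -> str:
    """Return a salted SHA-256 digest of a personal identifier before storage."""
    salt = os.environ["PII_HASH_SALT"]   # never keep the salt in code
    return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()


df["user_id"] = df["user_id"].map(hash_identifier)   # hash before the frame is persisted
```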
Monitoring & Observability
- Deploy Prometheus exporters exposing fairness metrics per slice (sketched after this list).
- Trigger alert if demographic_parity_difference > configured SLA (e.g., 0.05).
- Retrain trigger: concept drift OR fairness drift beyond threshold.
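A compact sketch of a per-slice exporter and alert check using prometheus_client; the gauge name, port, label values, and the `slices`, `compute_dp_difference`, and `trigger_retraining_pipeline` helpers are illustrative assumptions:

```python
from prometheus_client import Gauge, start_http_server

dp_gauge = Gauge('model_demographic_parity_difference',
                 'Demographic parity difference per slice',
                 ['model_version', 'slice'])

start_http_server(9108)   # scrape target for Prometheus

SLA_THRESHOLD = 0.05
for slice_name, (y_true_s, y_pred_s, sensitive_s) in slices.items():
    dp = compute_dp_difference(y_true_s, y_pred_s, sensitive_s)  # placeholder metric fn
    dp_gauge.labels(model_version='2024.06.0', slice=slice_name).set(dp)
    if dp > SLA_THRESHOLD:
        trigger_retraining_pipeline()   # or rely on Alertmanager rules for the alert
```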
Documentation
- Produce a Model Card v0.5 covering intended use, metrics, and ethical considerations (a minimal sketch follows this list).
- Data Sheet for Datasets v1.0 — include sampling and balancing description.
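A minimal sketch of a model card emitted as a JSON artifact; the schema, field values, and output path are placeholders to be replaced by whatever Model Card template the team standardizes on:

```python
import json

model_card = {
    "model_details": {"name": "fair-credit-classifier", "version": "0.5"},
    "intended_use": "Decision support only; never the sole basis for a decision.",
    "metrics": {
        "accuracy": None,                         # fill from the latest MetricFrame run
        "demographic_parity_difference": None,
        "equalized_odds_difference": None,
    },
    "ethical_considerations": [
        "Protected attributes are used for evaluation and mitigation, not scoring.",
        "Scheduled bias audits and rollback strategy documented separately.",
    ],
}

with open("docs/model_card.json", "w") as fh:
    json.dump(model_card, fh, indent=2)
```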
Common Pitfalls & Anti-patterns
- DO NOT drop protected attributes early; keep them for analysis, then remove them only at scoring time (see the sketch after this list).
- Avoid single global metric; always review intersectional results.
- Never hard-code threshold; derive per group if justified, otherwise uniform.
- Resist over-mitigating to the point of harming accuracy for all groups; justify trade-offs.
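A small sketch of that pattern, assuming a DataFrame `df` with protected `race` and `sex` columns, a binary `label` column, and a plain scikit-learn classifier; the same MetricFrame call also covers the intersectional review mentioned above:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame

PROTECTED = ['race', 'sex']

# Keep protected attributes alongside the data for analysis...
sensitive = df[PROTECTED]
y = df['label']
# ...but remove them from the feature matrix used for scoring.
X = df.drop(columns=PROTECTED + ['label'])

model = GradientBoostingClassifier().fit(X, y)
y_pred = model.predict(X)

# Intersectional review: pass both attributes so groups are crossed, not pooled.
frame = MetricFrame(metrics={'accuracy': accuracy_score},
                    y_true=y, y_pred=y_pred, sensitive_features=sensitive)
print(frame.by_group)   # one row per (race, sex) combination
```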
Example Snippet: enforcing equalized odds with Fairlearn
```python
from fairlearn.reductions import ExponentiatedGradient, EqualizedOdds
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, equalized_odds_difference
# X_train, X_test, y_train, y_test, sensitive_feature defined earlier
base_estimator = LogisticRegression(max_iter=1000, n_jobs=-1)
mitigator = ExponentiatedGradient(base_estimator, constraints=EqualizedOdds())
mitigator.fit(X_train, y_train, sensitive_features=sensitive_feature['train'])
y_pred = mitigator.predict(X_test)
metrics = MetricFrame(metrics={'accuracy': accuracy_score},
                      y_true=y_test, y_pred=y_pred,
                      sensitive_features=sensitive_feature['test'])
eq_odds_diff = equalized_odds_difference(
    y_test, y_pred, sensitive_features=sensitive_feature['test'])
print(f"Equalized odds difference: {eq_odds_diff:.3f}")
print(metrics.by_group)
```
This configuration serves as an immediately applicable guide for building, testing, and maintaining fairness-aware AI systems across the entire ML lifecycle.