Comprehensive Rules for building, auditing, and maintaining bias-aware ML systems
Your ML model shipped with great performance metrics, but three months later stakeholders are reporting concerning disparities in outcomes across demographic groups. Sound familiar? Fairness drift is the silent productivity killer that turns successful deployments into emergency firefighting sessions.
Traditional ML workflows treat fairness as a post-deployment afterthought. You're stuck with manual disparity calculations buried in notebooks, multi-day audit cycles that block releases, and stakeholder escalations that end in emergency model rollbacks.
These aren't just operational headaches; they're business risks that can derail entire ML initiatives.
These Cursor Rules transform bias detection from reactive damage control into proactive engineering discipline. You get a complete fairness-first ML development framework that:
- Embeds fairness into your existing ML pipeline instead of bolting it on afterward. Every sklearn Pipeline automatically includes fairness constraints and monitoring hooks.
- Automates continuous bias monitoring with configurable thresholds and alerting. Catch fairness drift before stakeholders do.
- Enforces reproducible fairness metrics with standardized implementations across AI Fairness 360, Fairlearn, and AWS SageMaker Clarify.
- Generates compliance documentation automatically through Model Cards and Dataset Datasheets that satisfy regulatory requirements.
Here's the difference in practice. The status quo looks like this:
```python
# Scattered bias checks after model training
model = LogisticRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Manual disparity calculation
male_mask = gender == "M"
female_mask = gender == "F"
male_precision = precision_score(y_test[male_mask], predictions[male_mask])
female_precision = precision_score(y_test[female_mask], predictions[female_mask])
disparity = abs(male_precision - female_precision)

if disparity > 0.05:  # Threshold buried in code
    print("Bias detected!")  # No automated response
```
With the Rules applied, the same workflow becomes:
```python
# Fairness constraints built into the pipeline
pipeline = build_fairness_pipeline(
    fairness_constraints=DemographicParity(),
    protected_attributes=['gender', 'age_group'],
    threshold=0.02
)

# Automatic bias monitoring with alerting
monitor = FairnessMonitor(
    model=pipeline,
    metrics=['demographic_parity', 'equalized_odds'],
    alert_channels=['#ml-ops', '#legal-compliance']
)

# Production deployment with continuous monitoring
pipeline.fit(X_train, y_train)
monitor.deploy_with_monitoring(schedule="0 */6 * * *")
```
Before: 3-day bias audit cycles blocking releases, manual metric calculations prone to errors, reactive stakeholder escalations requiring emergency model rollbacks.
After: Sub-hour fairness validation during CI/CD, automated drift detection preventing production issues, self-documenting compliance artifacts ready for regulatory review.
Build fairness constraints directly into sklearn Pipelines. No separate tools or workflow disruption—just enhanced pipelines that happen to be bias-aware.
Automated bias regression tests fail your build before unfair models reach production. Fix fairness issues during development when they're cheap to address.
Continuous monitoring jobs run fairness audits automatically. Get Slack alerts when bias thresholds are breached instead of discovering issues through stakeholder complaints.
Auto-generated Model Cards and Dataset Datasheets mean regulatory documentation writes itself. No more scrambling to create audit trails after the fact.
Replace your existing sklearn pipelines with fairness-aware versions:
```python
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.pipeline import Pipeline

def build_fairness_pipeline(base_estimator, protected_attrs):
    fair_classifier = ExponentiatedGradient(
        base_estimator,
        constraints=DemographicParity(),
        eps=0.02  # Configurable fairness tolerance
    )
    return Pipeline([
        # build_preprocessor is a project-level helper (a ColumnTransformer that keeps
        # the protected attributes out of the model's feature set)
        ('preprocessor', build_preprocessor(protected_attrs)),
        ('classifier', fair_classifier)
    ])
```
Add continuous bias monitoring to your MLOps pipeline:
```python
# Deploy with built-in fairness monitoring
monitor = FairnessMonitor(
    model_name="credit_model_v2",
    fairness_metrics=["demographic_parity", "equalized_odds"],
    protected_attributes=["race", "gender"],
    threshold=0.05,
    alert_config={
        "slack_webhook": "https://hooks.slack.com/...",
        "escalation_threshold": 0.10
    }
)
monitor.schedule_monitoring(cron="0 */6 * * *")
```
Generate audit-ready documentation automatically:
```python
from model_card_toolkit import ModelCard

# Auto-generate compliance docs. Note: from_pipeline/export_to_pdf are convenience
# wrappers assumed by these Rules, not part of the toolkit's public API.
card = ModelCard.from_pipeline(
    pipeline=fairness_pipeline,
    training_data=train_dataset,
    fairness_metrics=computed_metrics
)
card.export_to_pdf("model_cards/credit_model_v2.pdf")
```
Your team shifts from reactive bias firefighting to proactive fairness engineering. Models ship with confidence, stakeholders trust your fairness processes, and regulatory compliance becomes a byproduct of good engineering practice.
Ready to eliminate fairness debt from your ML systems? These Cursor Rules provide the complete framework for production-grade bias detection and monitoring. Your future self—and your stakeholders—will thank you.
You are an expert in Bias-aware Machine-Learning pipelines using Python, Jupyter, scikit-learn, PyTorch, TensorFlow, AI Fairness 360, Fairlearn, SHAP/LIME, AWS SageMaker Clarify, and Holistic AI.
Key Principles
- Prioritise fairness, transparency and reproducibility across the entire ML lifecycle.
- Treat bias as context-specific; define metrics with domain stakeholders before coding.
- Automate continuous bias monitoring ("fairness drift") in production.
- Prefer immutable, version-controlled datasets; log all data lineage and model artefacts.
- Employ interpretable models first; graduate to complex models only with documented justification.
- Engage multidisciplinary reviews (legal, domain, ethics) at every milestone.
- Never ship a model without a completed Model Card and Datasheet for Datasets.
Python
- Use Python 3.11+ with strict type hints (PEP-484) and "from __future__ import annotations".
- Follow PEP-8/PEP-257; 120-char max line length for notebooks & scripts.
- Use dataclasses or pydantic BaseModel for structured configs (see the config sketch after the layout). Avoid mutable default args.
- Keep functions < 40 lines; favour pure functions & pipeline composition over large classes.
- File layout:
├── data/ (immutable raw)
├── notebooks/ (EDA & prototype)
├── src/
│ ├── pipelines/ (sklearn Pipeline objects)
│ ├── metrics/ (custom fairness metrics)
│ ├── monitors/ (drift + bias detection jobs)
│ └── reports/ (datasheets, model cards)
└── tests/
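A minimal sketch of the structured-config rule above, assuming pydantic v2; the field names (protected_attributes, disparity_threshold, and so on) are illustrative choices, not a fixed schema:
```python
from __future__ import annotations

from pydantic import BaseModel, Field


class FairnessConfig(BaseModel):
    """Central, validated configuration shared by pipelines, monitors, and tests."""

    protected_attributes: list[str]
    fairness_metrics: list[str] = Field(default_factory=lambda: ["demographic_parity", "equalized_odds"])
    disparity_threshold: float = Field(0.02, ge=0.0, le=1.0)
    random_state: int = 42


config = FairnessConfig(protected_attributes=["gender", "age_group"])
```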
Error Handling and Validation
- Validate inputs with pydantic; raise ValidationError with context-rich messages.
- Catch ML library errors (e.g. sklearn.exceptions.NotFittedError) and wrap in custom FairnessError hierarchy.
- Always fail fast when fairness metrics breach thresholds; emit alerts and roll back the model (see the sketch below).
- Log fairness events to central store (e.g. OpenTelemetry) with correlation IDs.
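A minimal sketch of the fail-fast rule above; the exception names and the alert/rollback hooks are project-level conventions, not a library API:
```python
class FairnessError(Exception):
    """Base class for fairness-related failures."""


class FairnessThresholdBreached(FairnessError):
    """Raised when a monitored disparity exceeds its configured threshold."""


def enforce_threshold(metric_name: str, value: float, threshold: float) -> None:
    """Fail fast when a fairness metric breaches its threshold."""
    if value > threshold:
        # emit_alert()/rollback_model() would be the project's alerting and registry hooks
        raise FairnessThresholdBreached(
            f"{metric_name}={value:.3f} breached threshold {threshold:.3f}"
        )
```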
Framework-Specific Rules
AI Fairness 360 / Fairlearn
- Wrap preprocessing, in-processing, post-processing debiasers in sklearn-compatible Pipelines.
- Store interim transformed datasets for audit.
- Compute at least one group fairness metric (Demographic Parity, Equalised Odds) and one individual metric (Counterfactual Fairness).
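For the group metrics above, Fairlearn provides ready-made scorers. A minimal sketch, where y_true, y_pred, and sensitive are assumed to be the holdout labels, predictions, and protected-attribute column:
```python
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
)
from sklearn.metrics import precision_score, recall_score

# Gap metrics across groups (0.0 means perfect parity)
dp_gap = demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive)
eo_gap = equalized_odds_difference(y_true, y_pred, sensitive_features=sensitive)

# Per-group breakdown, stored alongside the model artefacts for the audit trail
frame = MetricFrame(
    metrics={"precision": precision_score, "recall": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(frame.by_group)
```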
scikit-learn
- Use ColumnTransformer to isolate sensitive attributes from the model's feature set; keep them in the dataset for constraints and evaluation rather than dropping them before training (see the sketch after this list).
- Parameterise random_state everywhere for reproducibility.
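A minimal sketch of the isolation rule above, reusing the column names from the example snippet further down and assuming X_train is the training DataFrame; the protected column stays available for metrics and constraints but is never fed to the model:
```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

num_cols = ["age", "salary"]
cat_cols = ["department"]
protected = ["gender"]  # retained for fairness evaluation, excluded from model inputs

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ],
    remainder="drop",  # drops the protected column from the feature matrix only
)

sensitive = X_train[protected]  # passed separately to metrics and reduction constraints
```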
PyTorch / TensorFlow
- Isolate model architecture in src/models; no data access inside model class.
- Register custom loss functions that include fairness regularisers (e.g., equalised odds penalty).
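A minimal PyTorch sketch of such a regulariser, assuming binary labels, a 0/1 protected-group tensor, and model logits; the penalty (gap in group-wise mean scores, conditioned on the true label) is one simple proxy for an equalised-odds violation, and the weight lam is a tuning choice:
```python
import torch
import torch.nn.functional as F


def fairness_regularised_loss(
    logits: torch.Tensor, labels: torch.Tensor, group: torch.Tensor, lam: float = 0.1
) -> torch.Tensor:
    """Binary cross-entropy plus a soft equalised-odds penalty."""
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    probs = torch.sigmoid(logits)

    penalty = logits.new_tensor(0.0)
    for y in (0, 1):  # condition on the true label, as equalised odds requires
        mask_a = (labels == y) & (group == 0)
        mask_b = (labels == y) & (group == 1)
        if mask_a.any() and mask_b.any():
            penalty = penalty + (probs[mask_a].mean() - probs[mask_b].mean()).abs()

    return bce + lam * penalty
```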
AWS SageMaker Clarify
- Configure pre-training and post-training bias jobs as CI pipeline stages (see the sketch after this list).
- Persist Clarify reports to S3; expose via Model Card.
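A minimal sketch of a pre-training Clarify job wired into a CI stage (the role ARN, S3 paths, and column names are placeholders); post-training checks follow the same pattern via run_post_training_bias with a ModelConfig:
```python
from sagemaker import Session
from sagemaker.clarify import BiasConfig, DataConfig, SageMakerClarifyProcessor

session = Session()
processor = SageMakerClarifyProcessor(
    role="arn:aws:iam::123456789012:role/clarify-ci",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = DataConfig(
    s3_data_input_path="s3://my-bucket/promotion/train.csv",         # placeholder path
    s3_output_path="s3://my-bucket/clarify-reports/promotion_2024",  # persisted, then linked from the Model Card
    label="promotion",
    headers=["age", "salary", "department", "gender", "promotion"],
    dataset_type="text/csv",
)

bias_config = BiasConfig(
    label_values_or_threshold=[1],
    facet_name="gender",
    facet_values_or_threshold=["F"],
)

processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
    methods="all",
)
```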
Additional Sections
Testing
- Unit: pytest with 90% coverage. Include tests for fairness metrics edge cases (e.g., zero positives).
- Integration: run end-to-end pipeline in GitHub Actions using small synthetic dataset.
- Fairness regression tests: lock baseline disparity scores; fail build on regression.
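A minimal pytest sketch of such a regression test; the trained_pipeline and holdout_data fixtures and the baseline JSON path are project conventions assumed for illustration:
```python
import json

from fairlearn.metrics import demographic_parity_difference

BASELINE_PATH = "tests/baselines/fairness_baseline.json"
TOLERANCE = 0.01


def test_demographic_parity_does_not_regress(trained_pipeline, holdout_data):
    X, y, sensitive = holdout_data  # features, labels, protected column
    y_pred = trained_pipeline.predict(X)

    gap = demographic_parity_difference(y, y_pred, sensitive_features=sensitive)
    with open(BASELINE_PATH) as fh:
        baseline = json.load(fh)["demographic_parity_difference"]

    assert gap <= baseline + TOLERANCE, (
        f"Fairness regression: demographic parity gap {gap:.3f} "
        f"exceeds locked baseline {baseline:.3f} (+{TOLERANCE})"
    )
```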
Performance & Scalability
- Optimise data pipelines with the pandas API on Spark when row counts exceed 10M.
- Cache SHAP computations; sample background dataset to <= 1k rows.
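A minimal sketch of the SHAP rule above, assuming a fitted pipeline and pandas DataFrames; the cache directory and batch size are project choices:
```python
import shap
from joblib import Memory

memory = Memory("artifacts/shap_cache", verbose=0)

background = shap.sample(X_train, 1000)            # background set capped at 1k rows
explainer = shap.Explainer(pipeline.predict, background)


@memory.cache
def explain(batch):
    """Compute (and cache) SHAP values for a batch of rows."""
    return explainer(batch)


shap_values = explain(X_test[:500])
```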
Security & Privacy
- Remove PII before logging; encrypt sensitive attributes at rest.
- Keep the differential-privacy budget at ε ≤ 1 for any published metrics (see the sketch below).
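A minimal sketch of publishing a DP-noised disparity figure within the ε ≤ 1 budget, using diffprivlib's Laplace mechanism; the sensitivity value is a placeholder that must be derived for the actual metric:
```python
from diffprivlib.mechanisms import Laplace

mechanism = Laplace(epsilon=1.0, sensitivity=0.05)  # placeholder sensitivity
published_gap = mechanism.randomise(raw_demographic_parity_gap)
```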
Documentation
- Generate Model Card (modelcard-toolkit) and Dataset Datasheet automatically in docs/.
- Use Sphinx-autodoc; enforce complete docstrings in CI.
Common Pitfalls & Mitigations
- Pitfall: Dropping protected attributes ➜ Instead, retain for evaluation; mask only at inference.
- Pitfall: Single snapshot audit ➜ Schedule cron-based bias monitors.
- Pitfall: Proxy variables leaking sensitive attributes ➜ Flag any feature whose correlation with a sensitive column exceeds 0.8 (see the check below).
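A minimal proxy-variable check matching the 0.8 threshold above, assuming a numerically encoded DataFrame df with a label column and the sensitive columns listed:
```python
PROXY_THRESHOLD = 0.8
sensitive_cols = ["gender", "age_group"]
feature_cols = [c for c in df.columns if c not in sensitive_cols + ["label"]]

# Correlation of every candidate feature against each sensitive column
corr = df[feature_cols + sensitive_cols].corr().loc[feature_cols, sensitive_cols]
proxies = corr[corr.abs().gt(PROXY_THRESHOLD).any(axis=1)]

if not proxies.empty:
    raise ValueError(f"Potential proxy variables for sensitive attributes:\n{proxies}")
```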
Example Snippet (Bias-Aware Pipeline)
```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from aif360.metrics import BinaryLabelDatasetMetric  # for the post-fit audit step (not shown here)

num_cols = ["age", "salary"]
cat_cols = ["department"]
protected = ["gender"]  # kept in the dataset for fairness evaluation, not fed to the model
label = "promotion"

def build_pipeline():
    pre = ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)
    ])
    unfair_clf = LogisticRegression(max_iter=1000, n_jobs=-1)
    fair_clf = ExponentiatedGradient(
        unfair_clf,
        constraints=DemographicParity(),
        eps=0.02,
    )
    # Fit with: build_pipeline().fit(X, y, clf__sensitive_features=X[protected])
    return Pipeline([
        ("pre", pre),
        ("clf", fair_clf)
    ])
```
Continuous Fairness Monitoring Skeleton
```python
from fairnow import Monitor

monitor = Monitor(
    model_name="promotion_2024Q1",
    fairness_metrics=["demographic_parity", "equalized_odds"],
    threshold=0.05,
    alerting={"slack_channel": "#fairness-alerts"}
)
monitor.run_schedule(cron="0 */6 * * *")
```