Comprehensive coding rules for building privacy-centric AI systems in Python with FastAPI & modern ML frameworks.
Privacy breaches in AI systems aren't just expensive—they're existential threats to your business. A single GDPR violation can cost up to 4% of annual global revenue, while a data breach in ML systems exposes not just individual records but entire model architectures and training patterns.
Most development teams bolt privacy onto AI systems as an afterthought, creating fundamental vulnerabilities: PII leaking into logs and API responses, training runs with no privacy budget accounting, and models deployed without data lineage or consent checks.
These aren't edge cases—they're systematic failures that occur when privacy isn't architected from day one.
These Cursor Rules embed enterprise-grade privacy controls directly into your development workflow. Instead of retrofitting compliance, you build it into every function, API endpoint, and model training loop.
What You Get:
Compliance Automation: Cut privacy review cycles from weeks to hours with automated compliance reporting and audit trails built into your CI/CD pipeline.
Safe Model Training: Eliminate accidental privacy budget exhaustion with hard limits (ε ≤ 8, δ ≤ 1e-5) and jurisdiction-aware parameter management through HashiCorp Vault.
Breach Prevention: Stop data leaks before they happen with structured logging that automatically redacts PII and API responses that exclude private attributes by default.
Regulatory Confidence: Deploy models with complete data lineage documentation, privacy budget accounting, and automated model cards that satisfy auditor requirements.
```python
# Before: risky manual privacy controls
model.fit(raw_user_data, epochs=10)  # No privacy guarantees

# After: built-in differential privacy with budget tracking
@privacy_budget_check(epsilon=2.0, delta=1e-5)
def train_private_model(encrypted_features: EncryptedDataset):
    dp_optimizer = DPAdamGaussianOptimizer(
        l2_norm_clip=1.0,
        noise_multiplier=1.1,
        num_microbatches=250,
    )
    ...  # Training loop with automatic budget accounting
```
```python
# Automatic PII exclusion and consent validation
@app.post("/train", dependencies=[Depends(require_consent)])
async def train_endpoint(
    request: TrainingRequest,
    user: User = Depends(get_current_user),
):
    # Structured logging with field redaction
    logger.info("training_request", extra={
        "user_id": user.id,
        "fields_redacted": True,
        "privacy_budget_requested": request.epsilon,
    })
```
```python
# Keep sensitive data on-premise while enabling collaborative training
@federated_strategy(min_clients=3, fraction_fit=0.3)
class PrivacyPreservingStrategy(fl.server.strategy.FedAvg):
    def configure_fit(self, server_round, parameters, client_manager):
        # Automatic client selection with privacy constraints
        clients = client_manager.sample(
            num_clients=self.min_clients,
            min_num_clients=self.min_clients,
        )
        fit_ins = fl.common.FitIns(parameters, {})
        return [(client, fit_ins) for client in clients]
```
```bash
# Set up the privacy-first directory structure
mkdir -p app/{api/v1/endpoints,ml/{training,inference},core} tests/privacy docs/model_cards
```
```python
# core/config.py - Load privacy parameters from Vault
class PrivacySettings(BaseSettings):
    dp_epsilon_eu: float = Field(..., le=8.0)   # GDPR-compliant budget
    dp_epsilon_us: float = Field(..., le=10.0)  # CCPA-compliant budget
    data_retention_days: int = Field(default=365)

    class Config:
        vault_url = "https://vault.company.com"
        vault_path = "privacy/ai-service"
```
```python
# tests/privacy/test_privacy_regression.py
def test_no_new_pii_fields():
    """Ensure no new PII fields are added to API schemas."""
    current_schemas = extract_schema_fields()
    approved_schemas = load_approved_schemas()
    new_pii_fields = detect_pii_fields(
        set(current_schemas) - set(approved_schemas)
    )
    assert not new_pii_fields, f"New PII fields detected: {new_pii_fields}"
```
```yaml
# .github/workflows/privacy-audit.yml
- name: Privacy Budget Audit
  run: |
    python -m privacy_audit.check_budget_usage
    python -m privacy_audit.generate_compliance_report
- name: Upload Compliance Artifact
  uses: actions/upload-artifact@v3
  with:
    name: privacy-compliance-report
    path: compliance-report.pdf
```
Week 1: Complete setup with automated privacy controls, differential privacy training pipeline, and GDPR-compliant API endpoints.
Month 1: Full federated learning deployment for sensitive datasets, automated compliance reporting, and zero privacy-related security incidents.
Quarter 1: 90%+ reduction in privacy review overhead, complete audit trail for all model training, and demonstrated compliance with multiple jurisdictions.
Beyond: Your AI systems become the privacy compliance benchmark for your organization, with built-in protections that scale automatically as you add new models and data sources.
Stop treating privacy as a post-deployment problem. These rules block privacy violations before they reach production, turning your development workflow into a continuous compliance engine that builds trust with every commit.
You are an expert in Python, FastAPI, TensorFlow Privacy, Opacus, Flower (federated learning), PostgreSQL, Docker, Kubernetes, HashiCorp Vault, and modern DevSecOps tooling.
Key Principles
- Adopt Privacy-by-Design and Security-by-Design from the first commit.
- Collect and process only data that is strictly necessary (data minimisation).
- Treat all personal data as toxic: encrypt in transit & at rest, redact in logs, purge when no longer needed.
- Prefer stateless, functional, immutable code; side-effects must be explicit.
- Fail fast & loud on privacy-relevant errors; never silently degrade into insecure modes.
- Keep the “happy path” last—handle edge cases and validation first.
- Automate everything: reproducible builds, automated compliance tests, continuous privacy audits.
Python
- Follow PEP 8 + Black formatting. Enforce with pre-commit.
- Use type hints everywhere; run mypy in strict mode.
- Prefer dataclasses or pydantic.BaseModel over unstructured dicts.
- Never log raw user input or model features; use structured logging with field redaction:
```python
logger.info("train_request", extra={"user_id": user.id, "fields_redacted": True})
```
- Use context managers for all I/O (files, DB, network) to guarantee closing and zeroising of buffers (see the sketch after this list).
- Naming: snake_case for functions/vars, CapWords for classes, UPPER_SNAKE for constants with clear intent (e.g., GDPR_ERASURE_DAYS).
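The zeroising requirement above can be met with a small helper; a minimal sketch, assuming sensitive bytes are read into a mutable buffer (`secure_tmp_read` is an illustrative name, not a stdlib or project API):
```python
import contextlib
import ctypes

@contextlib.contextmanager
def secure_tmp_read(path: str):
    """Read a sensitive file into a mutable buffer, zeroising it on exit."""
    with open(path, "rb") as fh:
        buf = bytearray(fh.read())
    try:
        yield buf
    finally:
        # Best-effort zeroisation: overwrite the buffer in place before release
        arr = (ctypes.c_char * len(buf)).from_buffer(buf)
        ctypes.memset(ctypes.addressof(arr), 0, len(buf))

# Usage: the plaintext never outlives the with-block
# with secure_tmp_read("/dev/shm/features.bin") as data:
#     train_on(data)
```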
Error Handling & Validation
- Validate external data at the API boundary with pydantic validators; reject unknown fields (`config.extra = "forbid"`), as shown in the sketch after this list.
- Centralised error handler returns privacy-preserving messages:
• 4xx: “Invalid input”,
• 5xx: generic “Service error; reference ID=XYZ”.
Log the full traceback internally, tagged with a correlation_id.
- Implement breach-detection hooks:
```python
try:
sensitive_op()
except UnauthorizedAccess as exc:
alert_security(exc, severity="high")
raise HTTPException(403, detail="Access denied")
```
- Use early returns instead of nested if/else chains to keep code flat and auditable.
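A minimal sketch combining the boundary-validation and early-return rules above (pydantic v1 `Config` style, matching `config.extra = "forbid"`); `has_consent` is a stand-in for a real consent-store lookup:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()

class TrainingRequest(BaseModel):
    epsilon: float = Field(..., gt=0, le=8.0)
    dataset_id: str

    class Config:
        extra = "forbid"  # unknown fields are rejected at the boundary

def has_consent(dataset_id: str) -> bool:
    return dataset_id in {"consented-ds-1"}  # stub for the consent store

@app.post("/v1/train")
async def train(req: TrainingRequest):
    if not has_consent(req.dataset_id):  # edge case first, early return
        raise HTTPException(403, detail="Access denied")
    return {"status": "queued"}          # happy path last
```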
FastAPI
- Every endpoint must declare a Scope enum {public, user, admin}. Deny by default.
- Require explicit dependency injection for auth (OAuth2 + PKCE) and rate limiting.
- Pydantic response_model must exclude private attributes (`response_model_exclude={"ssn", "email"}`).
- Version APIs (/v1/) and freeze contracts; breaking changes → new version.
- Add `X-Data-Processing-Consent: true` header to all mutating requests; middleware denies if absent.
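One way to enforce that header is a small HTTP middleware that rejects mutating requests lacking consent; a sketch, with the method set and response shape as assumptions:
```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

@app.middleware("http")
async def require_consent_header(request: Request, call_next):
    # Deny mutating requests that lack explicit processing consent
    if (request.method in MUTATING_METHODS
            and request.headers.get("X-Data-Processing-Consent") != "true"):
        return JSONResponse(status_code=403, content={"detail": "Consent required"})
    return await call_next(request)
```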
Machine Learning (TensorFlow Privacy / Opacus)
- Training: default to DP-SGD with ε ≤ 8, δ ≤ 1e-5. Hard-fail the pipeline if the budget is exceeded (see the sketch after this list).
- Parameterise privacy budget per jurisdiction (EU, US) via config file stored in Vault.
- Always store raw data encrypted; decrypt in a secure enclave or tmpfs, destroy after epoch.
- Publish a model card including: data sources, ε, fairness metrics, known limitations.
- Use Flower for federated learning when data must remain on-premise; orchestrate with TLS 1.3 + mTLS.
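A minimal Opacus sketch of the hard-fail rule in the first bullet; the model, data, and loop are toy stand-ins, and `MAX_EPSILON`/`DELTA` are illustrative policy constants:
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

MAX_EPSILON, DELTA = 8.0, 1e-5  # EU budget ceiling per the config above

# Toy stand-ins for decrypted features/labels
loader = DataLoader(
    TensorDataset(torch.randn(512, 32), torch.randint(0, 2, (512,))),
    batch_size=64,
)
model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = torch.nn.CrossEntropyLoss()

engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.1, max_grad_norm=1.0,
)

for epoch in range(3):
    for features, labels in loader:
        optimizer.zero_grad()
        criterion(model(features), labels).backward()
        optimizer.step()
    epsilon = engine.get_epsilon(delta=DELTA)
    if epsilon > MAX_EPSILON:  # hard-fail instead of silently overspending
        raise RuntimeError(f"Privacy budget exhausted: eps={epsilon:.2f}")
```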
Database (PostgreSQL)
- Column-level encryption (pgcrypto) for PII (see the migration sketch after this list).
- Enable row-level security; policies must reference user.jurisdiction.
- Schedule automatic deletion jobs (GDPR Right to Erasure) using pg_cron.
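The three bullets above might be wired together from a single migration script; a sketch in which table and column names are illustrative, the pgcrypto and pg_cron extensions are assumed available, and the symmetric key would come from Vault rather than a literal:
```python
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Column-level encryption for PII (illustrative column)
ALTER TABLE users ADD COLUMN IF NOT EXISTS email_enc bytea;
UPDATE users SET email_enc = pgp_sym_encrypt(email, %(key)s);

-- Row-level security keyed on the caller's jurisdiction
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY jurisdiction_isolation ON users
    USING (jurisdiction = current_setting('app.user_jurisdiction'));

-- GDPR Right to Erasure: nightly deletion of expired rows via pg_cron
SELECT cron.schedule('gdpr-erasure', '0 3 * * *',
    $$DELETE FROM users WHERE erase_after < now()$$);
"""

with psycopg2.connect("dbname=privacy_ai") as conn, conn.cursor() as cur:
    cur.execute(DDL, {"key": "fetch-me-from-vault"})  # never hard-code keys
```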
DevSecOps & Infrastructure
- Docker images: start FROM distroless python, non-root user, COPY only wheel + entrypoint.
- Kubernetes: enforce PodSecurity Standards baseline + restricted; secrets via Vault CSI driver.
- Enable NetworkPolicies; block egress except whitelisted domains.
- Use OPA/Gatekeeper policy preventing containers with CAP_SYS_ADMIN.
Testing & Auditing
- Unit tests require ≥ 90% coverage; run with pytest-cov.
- Privacy regression tests: verify no new PII fields added to schemas (snapshot test).
- Fuzz test all API endpoints with schemathesis (see the harness after this list).
- Continuous privacy audit pipeline:
1. Static analysis (Bandit, Semgrep privacy ruleset).
2. Differential privacy accounting check.
3. Compliance report artifact (PDF) uploaded to Confluence.
- Human-in-the-loop review for model drift & bias every sprint.
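The fuzzing harness referenced above, as a minimal schemathesis sketch; the OpenAPI URL and test location are illustrative:
```python
# tests/privacy/test_fuzz_api.py
import schemathesis

schema = schemathesis.from_uri("http://localhost:8000/openapi.json")

@schema.parametrize()
def test_api_contract(case):
    # Sends each generated case to the running app and validates the
    # response against the declared schema, flagging undeclared fields
    case.call_and_validate()
```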
Performance & Observability
- Add OpenTelemetry traces, tagging `data_retention` and `privacy_budget_used` (see the sketch after this list).
- Grafana dashboard: ε spending over time, DP noise scale, federated round latency.
- Cap API p95 latency at ≤ 200 ms, but refuse optimisations that weaken privacy guarantees.
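A sketch of the tracing bullet above; the span and attribute names follow this document's conventions rather than OpenTelemetry semantic conventions:
```python
from opentelemetry import trace

tracer = trace.get_tracer("privacy_ai")

with tracer.start_as_current_span("dp_training_round") as span:
    # Tag the trace with the privacy telemetry named above
    span.set_attribute("privacy_budget_used", 0.25)  # epsilon spent this round
    span.set_attribute("data_retention", "365d")
```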
Security
- Implement Zero Trust: mutual TLS everywhere, short-lived JWTs (see the sketch after this list), and key rotation at least daily.
- AI Firewall (e.g., ProtectAI) in-line to inspect prompts & outputs for policy violations.
- XDR integration for anomaly detection; map alerts to MITRE ATT&CK.
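Minting the short-lived JWTs from the first bullet might look like this with PyJWT; the 15-minute TTL is an illustrative choice, and the signing key should come from Vault under the daily rotation policy:
```python
import datetime
import jwt  # PyJWT

def mint_token(subject: str, signing_key: str) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": subject,
        "iat": now,
        "exp": now + datetime.timedelta(minutes=15),  # short-lived by design
    }
    return jwt.encode(claims, signing_key, algorithm="HS256")
```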
Documentation
- Each module must include a PRIVACY.md explaining data flows and legal basis (GDPR Art. 6).
- README badges: build, coverage, ε budget, last audit date.
Directory Structure (example)
```
app/
  api/
    v1/
      endpoints/      # FastAPI routers
      schemas.py      # Pydantic models
  ml/
    training/
      dp_sgd.py       # Differential privacy trainers
    inference/
      service.py      # Model serving logic
  core/
    config.py         # Settings, loaded from environment + Vault
    security.py       # Auth & encryption helpers
tests/
  privacy/            # Privacy regression tests
docs/
  model_cards/
```
Common Pitfalls & Guardrails
- Never disable DP noise for “just one debug run”. Use synthetic data for debugging.
- Avoid mutual information leaks through logging/metrics—hash or bucket sensitive values.
- Do not parallelise DP accounting incorrectly—use the library-provided accountant (see the sketch below).
- Respect jurisdictional boundaries: no cross-region data replication without anonymisation.
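For the accounting pitfall above, a minimal sketch of leaning on Opacus's built-in accountant rather than hand-rolled composition; the step count, noise, and sampling rate are illustrative:
```python
from opacus.accountants import RDPAccountant

accountant = RDPAccountant()
for _ in range(1_000):  # 1,000 DP-SGD steps at fixed noise and sample rate
    accountant.step(noise_multiplier=1.1, sample_rate=0.01)

print(f"eps after 1,000 steps: {accountant.get_epsilon(delta=1e-5):.2f}")
```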