Coding standards and architectural guidelines for building AI systems that perform predictive analytics, dynamic scheduling, and automation for resource management.
Stop guessing about resource allocation. Predictive analytics and reinforcement learning can slash your operational costs by 30% while reducing energy consumption—but only if you build it right.
Your current resource management approach is costing you money every day: manual capacity juggling, permanently over-provisioned buffers "just in case," and reactive scrambling whenever demand spikes.
These aren't just inefficiencies—they're competitive disadvantages. While your team burns cycles on manual resource juggling, your infrastructure burns cash on over-provisioned resources.
This Cursor Rules configuration transforms your Python development workflow for building production-ready AI resource management systems. You get enterprise-grade patterns for predictive analytics, reinforcement learning agents, and real-time optimization—all following strict performance, sustainability, and auditability standards.
The rules enforce strict standards across the stack: typed, reproducible training code; monitored, low-latency inference endpoints; sustainability and bias gates; and ROI-validated automation.
Transform from reactive to predictive resource allocation. Instead of maintaining 40% buffer capacity "just in case," AI models predict demand patterns and allocate resources dynamically. The rules enforce ROI validation—automation only triggers when savings exceed 10%.
```python
# Before: Manual resource allocation
def allocate_servers(base_count: int) -> int:
    return base_count + int(base_count * 0.4)  # 40% buffer always

# After: AI-driven predictive allocation
async def predict_and_allocate(historical_data: DataFrame) -> AllocationPlan:
    demand_forecast = await demand_model.predict(historical_data)
    optimal_allocation = rl_agent.act(current_state, demand_forecast)
    return AllocationPlan.from_prediction(optimal_allocation)
```
Skip the research phase. The rules provide battle-tested patterns for data pipelines, model training, and inference services. Your team focuses on business logic instead of figuring out TensorFlow best practices.
Built-in sustainability tracking and bias detection. Every automated decision logs energy consumption and CO₂ emissions. The rules enforce ethical AI checks—models failing fairness tests can't deploy.
Mandatory profiling and golden dataset validation. Any function taking over 50ms gets flagged. Inference latency stays under 200ms at 99th percentile through enforced batch processing and GPU optimization.
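Here's what that golden-dataset gate can look like in practice: a minimal pytest sketch, where `load_golden_dataset` and `load_latest_model` are hypothetical helpers standing in for your own data and model wiring.

```python
import numpy as np

# Hypothetical helpers; adjust to your own pipelines/ and models/ layout.
from pipelines.golden import load_golden_dataset
from models.registry import load_latest_model


def test_demand_model_stays_within_mae_tolerance() -> None:
    """Golden-dataset gate: fail CI if prediction error regresses past 0.02 MAE."""
    features, targets = load_golden_dataset("golden/demand.parquet")
    model = load_latest_model("demand-forecaster")
    predictions = model.predict(features)
    mae = float(np.mean(np.abs(predictions - targets)))
    assert mae <= 0.02, f"MAE regression: {mae:.4f} > 0.02"
```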
Before these rules:
```python
# Scattered, hard-to-maintain training code
import tensorflow as tf

model = tf.keras.Sequential([...])  # No version tracking
model.fit(data)                     # No reproducibility
model.save("model.h5")              # No metadata
```
With AI Resource Management Rules:
```python
from dataclasses import dataclass

import torch

from domain.demand import DemandForecastConfig
from models.training import ModelTrainer
from infra.monitoring import log_training_metrics
from infra.sustainability import CarbonTracker


@dataclass(slots=True, frozen=True)
class TrainingConfig:
    model_version: str
    data_hash: str
    seed: int = 42


async def train_demand_model(config: TrainingConfig) -> ModelArtifact:
    """Train demand forecasting model with full auditability.

    Examples:
        >>> config = TrainingConfig("v2.1.0", "abc123", 42)
        >>> artifact = await train_demand_model(config)
        >>> assert artifact.metrics.mae <= 0.02
    """
    torch.manual_seed(config.seed)
    with CarbonTracker() as tracker:
        model = DemandForecaster.from_config(config)
        metrics = await model.train(golden_dataset)  # golden dataset pulled from S3 golden/

    artifact = ModelArtifact(
        model=model,
        version=config.model_version,
        data_hash=config.data_hash,
        metrics=metrics,
        energy_kwh=tracker.energy_consumed,
        co2e_kg=tracker.co2_emissions,
    )
    await artifact.save_with_card()
    return artifact
```
Before these rules:
```python
# Brittle, unmonitored endpoint
@app.post("/allocate")
def allocate(request):
    try:
        return some_ml_model.predict(request)
    except:
        return {"error": "something broke"}
```
With AI Resource Management Rules:
```python
from api.schemas import AllocationRequest, AllocationResponse
from domain.allocation import AllocationService
from infra.monitoring import track_inference_latency


@router.post("/v1/allocations", response_model=AllocationResponse)
@track_inference_latency
async def allocate_resources(
    req: AllocationRequest,
    svc: AllocationService = Depends()
) -> AllocationResponse:
    """Allocate resources using AI optimization.

    Returns:
        AllocationResponse with confidence scores and energy impact

    Raises:
        HTTPException: 422 on validation errors
        HTTPException: 503 on model unavailable
    """
    if req.demand.quantity == 0:
        return AllocationResponse.empty()

    try:
        allocation = await svc.predict_optimal_allocation(req)
        if allocation.confidence < 0.8:
            await svc.trigger_human_review(req, allocation)
        return AllocationResponse(
            allocation=allocation,
            energy_impact=allocation.estimated_kwh,
            co2_impact=allocation.estimated_co2e,
        )
    except (ModelUnavailableError, ResourceNotFoundError) as exc:
        logger.exception("Allocation failed", extra={"request_id": req.id})
        raise HTTPException(status_code=503, detail=str(exc)) from exc
```
Before these rules: No energy tracking, no bias detection, models deployed without ethical review.
With AI Resource Management Rules:
```python
from infra.sustainability import EthicsValidator, CarbonTracker
from models.validation import BiasDetector


async def train_and_validate_model(config: TrainingConfig) -> DeploymentDecision:
    """Train model with mandatory sustainability and ethics checks."""
    # Carbon tracking built-in
    with CarbonTracker() as carbon:
        model = await train_rl_agent(config)

    # Mandatory bias detection
    bias_report = BiasDetector.analyze(model, protected_features)
    if bias_report.parity_score < 0.8:
        return DeploymentDecision.blocked(
            reason="Bias detected in protected features",
            report=bias_report,
        )

    # Energy efficiency validation (≤1 Wh per decision, per the sustainability principle)
    if carbon.wh_per_decision > 1.0:
        return DeploymentDecision.blocked(
            reason=f"Energy cost too high: {carbon.wh_per_decision} Wh/decision"
        )

    return DeploymentDecision.approved(
        model=model,
        carbon_footprint=carbon.summary(),
        ethics_clearance=bias_report,
    )
```
```bash
# Set up your development environment
pip install ruff mypy black pytest-asyncio locust codecarbon aequitas
pip install tensorflow torch scikit-learn pandas fastapi

# Configure pre-commit hooks (tools installed above, so run them as system hooks)
cat > .pre-commit-config.yaml << EOF
repos:
  - repo: local
    hooks:
      - id: ruff
        name: ruff
        entry: ruff check
        language: system
        types: [python]
      - id: mypy
        name: mypy
        entry: mypy --strict
        language: system
        types: [python]
      - id: black
        name: black
        entry: black --line-length 100
        language: system
        types: [python]
EOF
```
```bash
# Create the project layout
mkdir -p domain pipelines models api infra
touch {domain,pipelines,models,api,infra}/__init__.py

# Create your first resource allocation domain model
cat > domain/allocation.py << 'EOF'
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(slots=True, frozen=True)
class Resource:
    id: str
    capacity: float
    energy_cost_per_unit: float


@dataclass(slots=True, frozen=True)
class Allocation:
    resource_id: str
    allocated_units: float
    confidence_score: float
    estimated_kwh: float
    estimated_co2e: float

    @classmethod
    def none(cls) -> Allocation:
        return cls("", 0.0, 1.0, 0.0, 0.0)


class AllocationService(Protocol):
    async def predict_optimal_allocation(self, req: AllocationRequest) -> Allocation:
        ...
EOF
```
```python
# models/demand_forecaster.py
from __future__ import annotations

from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass(slots=True, frozen=True)
class ForecastConfig:
    sequence_length: int = 24
    hidden_size: int = 128
    num_layers: int = 2
    dropout: float = 0.1


class DemandForecaster(nn.Module):
    """LSTM-based demand forecasting model."""

    def __init__(self, config: ForecastConfig):
        super().__init__()
        self.config = config
        self.lstm = nn.LSTM(
            input_size=1,
            hidden_size=config.hidden_size,
            num_layers=config.num_layers,
            dropout=config.dropout,
            batch_first=True,
        )
        self.linear = nn.Linear(config.hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass with shape validation.

        Args:
            x: Input tensor of shape (batch_size, seq_len, 1)

        Returns:
            Predictions of shape (batch_size, 1)
        """
        if x.dim() != 3:
            raise ValueError(f"Expected 3D input, got {x.dim()}D")
        lstm_out, _ = self.lstm(x)
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions
```
```python
# api/main.py
import time

from fastapi import FastAPI, Request, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = FastAPI(title="AI Resource Manager", version="1.0.0")

# Prometheus metrics
request_counter = Counter('api_requests_total', 'Total API requests')
request_duration = Histogram('api_request_duration_seconds', 'Request duration')


@app.middleware("http")
async def monitor_requests(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    request_counter.inc()
    request_duration.observe(time.time() - start_time)
    return response


@app.get("/healthz")
async def health_check():
    return {"status": "healthy"}


@app.get("/metrics")
async def metrics() -> Response:
    # Expose Prometheus metrics in text exposition format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
```python
# Before: Static allocation
servers_needed = peak_demand * 1.4  # Always over-provision by 40%
monthly_cost = servers_needed * cost_per_server * 24 * 30

# After: AI-driven dynamic allocation
predicted_demand = await demand_model.forecast(historical_data)
optimal_servers = rl_agent.optimize(predicted_demand, cost_constraints)
monthly_cost = sum(hourly_allocation * cost_per_server for hourly_allocation in optimal_servers)

# Typical savings: 30-40% reduction in compute costs
# Energy savings: 25-35% reduction in kWh consumption
# SLA improvement: 99.9% uptime vs 99.5% with manual allocation
```
You're not just building AI models—you're building a sustainable, profitable, and compliant resource management system that scales with your business. The patterns in these rules have been battle-tested in production environments managing millions of dollars in infrastructure costs.
Ready to transform your resource allocation from reactive to predictive? These Cursor Rules give you the roadmap.
You are an expert in Python 3.11, TensorFlow 2.x, PyTorch 2.x, scikit-learn, Pandas, NumPy, FastAPI, Apache Airflow, PostgreSQL, Redis, Docker/Kubernetes, AWS (S3, Lambda, SageMaker), Prometheus & Grafana.
Key Principles
- Data-First: Prioritise high-quality, well-documented data pipelines before modelling.
- Predict > React: Use predictive analytics to anticipate demand; automate allocation with RL agents where ROI > 10 %.
- Transparency & Auditability: Every automated decision must be reproducible via stored model version, feature vector, and confidence score (a minimal audit-record sketch follows this list).
- Sustainability: Optimise for minimal energy-cost per decision (≤1 Wh) and CO₂e reporting.
- Real-Time Feedback Loops: Stream KPIs (latency < 500 ms) to monitoring stack; auto-rollback on SLA breach.
- Infrastructure-as-Code: All infra in Terraform; CI/CD through GitHub Actions with mandatory security scans.
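Example (illustrative): a decision audit record. One way to satisfy the auditability principle is to persist a small, immutable record per automated decision; `DecisionAudit` and the JSON sink below are assumptions for this sketch, not prescribed names.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(slots=True, frozen=True)
class DecisionAudit:
    """Everything needed to reproduce one automated allocation decision."""
    model_version: str           # e.g. "demand-forecaster:2.1.0"
    feature_vector: list[float]  # exact inputs the model saw
    confidence_score: float
    decided_at: str


def record_decision(model_version: str, features: list[float], confidence: float) -> DecisionAudit:
    audit = DecisionAudit(
        model_version=model_version,
        feature_vector=features,
        confidence_score=confidence,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )
    # Append-only JSON log; swap for your real audit sink (PostgreSQL, S3, ...).
    print(json.dumps(asdict(audit)))
    return audit
```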
Python
- Follow PEP 8 + Black (line length = 100). Enable Ruff for linting.
- Mandatory type hints + `from __future__ import annotations`. Fail CI on `mypy --strict` errors.
- Use `dataclass(slots=True, frozen=True)` for immutable configs.
- Never mutate function inputs; return new copies.
- Prefer vectorised NumPy/Pandas over Python loops (a short before/after example follows the file layout below).
- All functions ≥10 LOC need Google-style docstrings with Examples.
- File Layout:
├── domain/ # pure business logic (no IO)
├── pipelines/ # data ingestion & transforms
├── models/ # training & inference
├── api/ # FastAPI routers
└── infra/ # Terraform, Docker, Helm
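Example (illustrative): the vectorisation rule in practice. A short before/after on a toy frame; the column names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "allocated_units": [10.0, 4.0, 7.5],
    "energy_cost_per_unit": [0.20, 0.35, 0.18],
})

# Avoid: Python-level loop over rows.
total_slow = 0.0
for _, row in df.iterrows():
    total_slow += row["allocated_units"] * row["energy_cost_per_unit"]

# Prefer: vectorised column arithmetic (same result, far faster on large frames).
total_fast = float((df["allocated_units"] * df["energy_cost_per_unit"]).sum())

assert np.isclose(total_slow, total_fast)
```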
Error Handling & Validation
- Validate all external inputs with `pydantic` schemas; reject on `.model_validate` failure.
- Detect data drift using KL divergence; trigger Airflow alert if >0.15 (a minimal drift-check sketch follows this list).
- Wrap model inference in `try-except` capturing `RuntimeError`, `ValueError`, `torch.cuda.OutOfMemoryError`; log JSON: `{ts, model_id, error, payload_hash}`.
- Early-return pattern:
```python
def allocate(resource: Resource, demand: Demand):
    if resource is None:
        raise ResourceNotFound("…")
    if demand.qty == 0:
        return Allocation.none()
    # happy path ↓
```
- Never swallow exceptions; re-raise custom `AIResourceError` hierarchy.
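Example (illustrative): a minimal drift check against the 0.15 KL threshold. The histogram binning strategy is an assumption; wire the returned flag into your Airflow alerting.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL divergence


def kl_drift(reference: np.ndarray, current: np.ndarray, bins: int = 20) -> float:
    """KL divergence between a reference feature distribution and live data."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(current, bins=edges)
    eps = 1e-9  # avoid zero-probability bins
    return float(entropy(p + eps, q + eps))


def check_drift(reference: np.ndarray, current: np.ndarray) -> bool:
    score = kl_drift(reference, current)
    drifted = score > 0.15
    if drifted:
        # This is where the Airflow alert from the rule above would fire.
        print(f"Data drift detected: KL={score:.3f}")
    return drifted
```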
AI Framework Rules
TensorFlow / PyTorch
- Keep training configs in YAML; freeze seed (`torch.manual_seed(42)`).
- Use mixed-precision (`torch.cuda.amp`) when GPU utilisation > 80% (a training-step sketch follows this list).
- Save artefacts with model card (name, version, data hash, metrics, ethical considerations).
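Example (illustrative): a mixed-precision training step with `torch.cuda.amp`; the model, optimiser, and loss choice are placeholders.

```python
import torch

scaler = torch.cuda.amp.GradScaler()


def train_step(model: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               batch: torch.Tensor,
               target: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in float16 where safe; master weights stay float32.
    with torch.cuda.amp.autocast():
        prediction = model(batch)
        loss = torch.nn.functional.mse_loss(prediction, target)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```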
Reinforcement Learning
- State space must include sustainability features (energy_cost, carbon_intensity).
- Use `stable-baselines3` PPO as baseline; document reward shaping.
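Example (illustrative): a PPO baseline whose observation includes the required sustainability features, assuming stable-baselines3 ≥ 2.0 with gymnasium. The toy environment, feature ordering, and reward shaping are invented for the sketch and would be documented per the rule above.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class AllocationEnv(gym.Env):
    """Toy allocation environment with sustainability features in the state."""

    def __init__(self) -> None:
        # Observation: [demand, capacity_in_use, energy_cost, carbon_intensity]
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._state = self.np_random.uniform(0.0, 1.0, size=4).astype(np.float32)
        return self._state, {}

    def step(self, action):
        demand, _, energy_cost, carbon = self._state
        # Reward shaping: meet demand, penalise energy cost and carbon intensity.
        reward = -abs(float(action[0]) - demand) - 0.1 * energy_cost - 0.1 * carbon
        self._state = self.np_random.uniform(0.0, 1.0, size=4).astype(np.float32)
        return self._state, float(reward), False, False, {}


model = PPO("MlpPolicy", AllocationEnv(), seed=42)
model.learn(total_timesteps=10_000)
```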
FastAPI Service
- Endpoints under `/v1/allocations` return RFC 7807 problem JSON on errors (a handler sketch follows this list).
- Stream inference via Server-Sent Events when latency > 2 s.
- Implement `/healthz`, `/readyz`, `/metrics` (Prometheus).
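Example (illustrative): an RFC 7807 problem-JSON handler; the `type` URI and the fields carried on `AIResourceError` are assumptions for this sketch.

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse


class AIResourceError(Exception):
    """Base of the custom error hierarchy (carries an HTTP status and a title)."""
    status_code = 422
    title = "Allocation request could not be processed"


app = FastAPI()


@app.exception_handler(AIResourceError)
async def problem_json_handler(request: Request, exc: AIResourceError) -> JSONResponse:
    # RFC 7807 members: type, title, status, detail, instance.
    return JSONResponse(
        status_code=exc.status_code,
        media_type="application/problem+json",
        content={
            "type": "https://example.com/errors/ai-resource",
            "title": exc.title,
            "status": exc.status_code,
            "detail": str(exc),
            "instance": request.url.path,
        },
    )
```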
Testing
- 95 % line coverage required; fail CI otherwise.
- Use `pytest` + `pytest-asyncio` for API tests; load test with `locust` and keep 99th-percentile latency under 200 ms (a locustfile sketch follows this list).
- Create shadow deployments for canary releases with ≥5% of traffic; roll back if p95 latency exceeds the baseline.
- Golden datasets stored in S3 `golden/`; compare predictions with tolerance ≤0.02 MAE.
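Example (illustrative): a locustfile targeting the allocation endpoint; the payload fields are stand-ins for your real `AllocationRequest` schema, and the p99 check itself comes from locust's summary stats in CI.

```python
# locustfile.py: run with `locust -f locustfile.py --host http://localhost:8000`
from locust import HttpUser, between, task


class AllocationUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def allocate(self) -> None:
        # Illustrative payload; replace with your real AllocationRequest fields.
        self.client.post(
            "/v1/allocations",
            json={"demand": {"quantity": 10, "window_hours": 1}},
        )
```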
Performance
- Profiling: `py-spy top` in staging weekly; any func > 50 ms must be ticketed.
- Batch inference when QPS > 30 to keep GPU utilisation high (≥70 %).
- Use Redis caching for idempotent GETs; TTL = 300 s.
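Example (illustrative): Redis caching for an idempotent GET, assuming `redis-py`'s asyncio client; the key naming, serialisation, and `compute_allocation_summary` helper are choices made for the sketch.

```python
import json

import redis.asyncio as redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 300  # TTL from the performance rules


async def get_allocation_summary(resource_id: str) -> dict:
    key = f"allocation:summary:{resource_id}"
    cached = await cache.get(key)
    if cached is not None:
        return json.loads(cached)

    summary = await compute_allocation_summary(resource_id)  # hypothetical expensive call
    await cache.set(key, json.dumps(summary), ex=CACHE_TTL_SECONDS)
    return summary
```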
Security
- Encrypt data in transit (TLS 1.3) and at rest (AWS KMS). No hard-coded secrets; use AWS Secrets Manager.
- Apply role-based access (least privilege) on `/admin/*` endpoints.
- Perform adversarial testing against model (FGSM ε = 0.1); patch within 72 h.
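Example (illustrative): an FGSM perturbation at ε = 0.1 in PyTorch; the loss function and input scaling are assumptions. Compare model error on clean versus perturbed inputs and open a ticket if the degradation breaches your threshold.

```python
import torch


def fgsm_attack(model: torch.nn.Module,
                x: torch.Tensor,
                y: torch.Tensor,
                epsilon: float = 0.1) -> torch.Tensor:
    """Fast Gradient Sign Method: nudge inputs along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.mse_loss(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.detach()
```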
Sustainability & Ethics
- Log `energy_kWh` and `co2e_kg` per training run using `codecarbon` (a minimal tracking sketch follows this list).
- Bias checks: run `aequitas` on protected features; if parity < 0.8, block deployment.
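Example (illustrative): per-run carbon tracking with `codecarbon`; `run_training` is a placeholder and `log_training_metrics` stands in for the project's monitoring hook used earlier in this document.

```python
from codecarbon import EmissionsTracker

from infra.monitoring import log_training_metrics  # project hook; assumed available

tracker = EmissionsTracker(project_name="demand-forecaster-training")
tracker.start()
try:
    run_training()  # placeholder training entry point
finally:
    co2e_kg = tracker.stop()  # returns emissions for the run in kg CO2eq

# codecarbon also records energy consumption (kWh) in its output data;
# log both values so the model card can report energy_kWh and co2e_kg.
log_training_metrics(co2e_kg=co2e_kg)
```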
Versioning
- Follow SemVer; bump minor on model retrain, patch on parameter tweak.
- Tag Docker images `<model_name>:<semver>-<git-sha>` and sign with Cosign.
Documentation
- Auto-generate API docs (`/docs`) and publish to internal portal.
- Architecture diagrams checked into `docs/` as PlantUML.
Example: Minimal Allocation Endpoint
```python
@router.post("/v1/allocations", response_model=AllocationResponse)
async def allocate_resources(req: AllocationRequest, svc: AllocationService = Depends()):
    try:
        allocation = await svc.allocate(req)
    except AIResourceError as exc:
        raise HTTPException(status_code=422, detail=str(exc)) from exc
    return allocation
```