Opinionated rules for systematic, reproducible hyperparameter tuning in Python with scikit-learn, Optuna, Hyperopt and Ray Tune.
You've been there: hours of manual parameter tweaking, inconsistent results across runs, and that sinking feeling when your "optimized" model performs worse in production. Traditional hyperparameter tuning is broken—it's time-consuming, unreproducible, and leaves performance on the table.
Every data scientist faces the same workflow bottlenecks, and the real cost isn't just lost time: it's the performance gap between your current models and what's actually achievable with systematic optimization.
These Cursor Rules transform hyperparameter tuning from guesswork into a reproducible, intelligent process. You get battle-tested patterns for progressive search strategies, robust validation, and unified APIs across all major tuning frameworks.
The rules enforce a proven methodology: start with simple grid/random search to identify promising regions, then deploy Bayesian optimization or advanced schedulers like ASHA for fine-tuning. Every trial is logged, every search space is validated, and every result is reproducible.
Key Implementation Principles:
```python
def tune(model_fn: Callable[[], BaseEstimator],
         search_space: dict[str, Any],
         X: NDArray, y: NDArray) -> dict:
    # Pure functions, no global state
    # Structured search spaces, not loose dicts
    # Built-in validation and error handling
    ...
```
Before: Manual parameter tweaking, inconsistent results
```python
# Typical ad-hoc approach
for lr in [0.01, 0.1, 1.0]:
    for depth in [3, 5, 10]:
        model = XGBClassifier(learning_rate=lr, max_depth=depth)
        # No CV, no logging, no persistence
        score = model.fit(X_train, y_train).score(X_test, y_test)
        print(f"LR: {lr}, Depth: {depth}, Score: {score}")
```
After: Systematic, reproducible optimization
```python
# Rules-compliant approach
search = RandomizedSearchCV(
    estimator=pipeline,
    param_distributions=search_space,
    n_iter=80,
    scoring="roc_auc",
    cv=StratifiedKFold(5, shuffle=True, random_state=SEED),
    n_jobs=-1,
)
# Automatic logging, persistence, and validation
```
Impact: 3-5x faster hyperparameter discovery with measurably better model performance
Challenge: Training neural networks with early stopping while systematically exploring architecture and learning rate combinations.
Solution: Optuna with MedianPruner integration
```python
import optuna

def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-5, 1e-1)
    n_layers = trial.suggest_int('n_layers', 2, 6)
    model = create_model(lr=lr, n_layers=n_layers)
    for epoch in range(50):
        score = train_epoch(model)
        trial.report(score, epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    storage="sqlite:///optimization.db",
)
study.optimize(objective, n_trials=50)  # trial budget is illustrative
```
Result: 60% reduction in training time through intelligent pruning, with automated persistence and resumable studies.
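Because the study is persisted in SQLite, an interrupted search can be picked up again later. A minimal sketch of a resumed run, assuming the study was originally created with study_name="nn_tuning" and that the same objective function is importable:

```python
import optuna

# Reattach to the persisted study and continue where the last run stopped
study = optuna.create_study(
    study_name="nn_tuning",                  # assumption: name used when the study was created
    storage="sqlite:///optimization.db",
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    load_if_exists=True,
)
study.optimize(objective, n_trials=50)       # new trials are appended to the stored ones
```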
Challenge: Scaling hyperparameter optimization across multiple GPUs for compute-intensive models.
Solution: Ray Tune with ASHA scheduler
```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

def train_model(config):
    model = create_model(**config)
    # Training logic with tune.report() for metrics
    ...

search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([16, 32, 64, 128]),
    "dropout": tune.uniform(0.1, 0.5),
}

scheduler = ASHAScheduler(
    max_t=100,
    grace_period=10,
    reduction_factor=3,
)

tune.run(
    train_model,
    config=search_space,
    scheduler=scheduler,
    resources_per_trial={"cpu": 2, "gpu": 0.5},
)
```
Result: Linear scaling across available hardware with intelligent resource allocation and early termination of unpromising trials.
Challenge: Comparing optimization results across different tuning frameworks while maintaining consistent evaluation protocols.
Solution: Unified configuration and logging patterns
```python
from dataclasses import dataclass
from typing import Any

from sklearn.model_selection import StratifiedKFold

@dataclass
class TuningConfig:
    search_space: dict[str, Any]
    n_trials: int
    cv_folds: int
    random_seed: int

def run_tuning_study(config: TuningConfig, framework: str):
    # Consistent CV setup across all frameworks
    cv = StratifiedKFold(
        n_splits=config.cv_folds,
        shuffle=True,
        random_state=config.random_seed,
    )
    # Framework-specific implementation with unified logging
    results = framework_dispatch[framework](config, cv)
    # Standardized artifact persistence
    persist_results(results, f"./artifacts/hp_tuning/{timestamp}/")
```
Result: Fair comparisons between frameworks with full reproducibility and standardized reporting.
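The persist_results helper above is left undefined; a minimal interpretation, assuming results is a JSON-serializable summary dictionary (best parameters, scores, timings):

```python
import json
from pathlib import Path

def persist_results(results: dict, out_dir: str) -> None:
    # Standardized, framework-agnostic artifact: one summary.json per run
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    with (path / "summary.json").open("w") as fh:
        json.dump(results, fh, indent=2, default=str)  # default=str copes with numpy types
```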
Add these rules to your project's .cursorrules file, then install the tuning stack:

```bash
pip install scikit-learn optuna hyperopt "ray[tune]" mlflow
```
Create a structured tuning script:
```python
from __future__ import annotations

import logging
from pathlib import Path
from dataclasses import dataclass

from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

@dataclass
class SearchConfig:
    n_trials: int = 50
    cv_folds: int = 5
    random_seed: int = 42

def setup_logging():
    logging.basicConfig(level=logging.INFO)
    return logging.getLogger(__name__)

def tune_model(X, y, search_space, config: SearchConfig):
    logger = setup_logging()
    cv = StratifiedKFold(
        n_splits=config.cv_folds,
        shuffle=True,
        random_state=config.random_seed,
    )
    search = RandomizedSearchCV(
        estimator=your_pipeline,
        param_distributions=search_space,
        n_iter=config.n_trials,
        cv=cv,
        n_jobs=-1,
        verbose=2,
    )
    results = search.fit(X, y)
    # Automatic persistence
    artifact_dir = Path("./artifacts/hp_tuning") / f"run_{timestamp}"
    artifact_dir.mkdir(parents=True, exist_ok=True)
    return results
```
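The snippet above stops at creating the artifact directory. A sketch of the persistence step itself, meant to sit inside tune_model just before the return (results and artifact_dir come from the code above):

```python
import json

import joblib

# Persist the refit best estimator, the full CV table, and the winning parameters
joblib.dump(results.best_estimator_, artifact_dir / "best_estimator.joblib")
joblib.dump(results.cv_results_, artifact_dir / "cv_results.joblib")
with (artifact_dir / "best_params.json").open("w") as fh:
    json.dump(results.best_params_, fh, indent=2, default=str)
```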
Structure your hyperparameter spaces for optimal exploration:
```python
from scipy.stats import loguniform, randint

# Log-uniform for learning rates, categorical for discrete choices
search_space = {
    'model__learning_rate': loguniform(1e-4, 1e-1),
    'model__max_depth': randint(3, 15),
    'model__n_estimators': [50, 100, 200, 500],
    'preprocessor__scaler': ['standard', 'robust', 'minmax'],
}
```
Upgrade to Bayesian optimization once you've identified promising regions:
```python
import optuna
from sklearn.model_selection import StratifiedKFold, cross_val_score

def objective(trial):
    params = {
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-4, 1e-1),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'n_estimators': trial.suggest_categorical('n_estimators', [50, 100, 200, 500]),
    }
    # Cross-validated evaluation of the suggested configuration
    scores = cross_val_score(
        create_model(**params), X, y,
        cv=StratifiedKFold(5, shuffle=True, random_state=42),
        scoring='roc_auc',
    )
    return scores.mean()

study = optuna.create_study(
    direction="maximize",
    storage="sqlite:///study.db",
    study_name="model_optimization",
)
study.optimize(objective, n_trials=100)
```
The payoff shows up across four dimensions: time savings, model performance, development quality, and production reliability.
These rules don't just optimize your models—they systematize your entire approach to hyperparameter tuning, transforming it from a time sink into a competitive advantage. Your models will perform better, your experiments will be reproducible, and your tuning process will scale efficiently across any framework or compute environment.
You are an expert in Python, scikit-learn, Optuna, Hyperopt, Ray Tune and modern MLOps tooling.
Key Principles
- Start simple (grid/random search), then move to smarter search (Bayesian, ASHA) once promising regions are found.
- Tune high-impact knobs first: learning-rate, regularisation strength, model depth/width, batch size.
- Always evaluate with K-fold cross-validation (≥5 folds) or stratified variants for imbalanced data.
- Stop overfitting early: use early-stopping callbacks or pruning and monitor validation metrics in real time.
- Reproducibility is non-negotiable: fix random seeds, persist search space, results and environment versions.
- Automate, log and version every trial via MLflow/W&B; never tune interactively without persistence (see the logging sketch after this list).
- Prefer interpretable search spaces (log-uniform for LR, categorical for activations) to avoid skewed sampling.
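A minimal sketch of the "automate, log and version every trial" principle with MLflow; the experiment name and metric key are assumptions:

```python
import mlflow

mlflow.set_experiment("hp_tuning")            # assumed experiment name

def log_trial(trial_no: int, params: dict, score: float) -> None:
    # One MLflow run per trial: parameters and the CV metric stay queryable later
    with mlflow.start_run(run_name=f"trial_{trial_no}"):
        mlflow.log_params(params)
        mlflow.log_metric("cv_score", score)
```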
Python
- Use Python 3.10+ with strict typing (from __future__ import annotations).
- Encapsulate tuning logic in pure functions; avoid global state:
```python
def tune(model_fn: Callable[[], BaseEstimator], search_space: dict[str, Any], X: NDArray, y: NDArray) -> dict:
...
```
- Represent search spaces as structured configs (dataclass/YAML), not loose dicts (see the sketch after this section).
- Use pathlib for file IO, never raw strings.
- Log every trial with logging.Logger at INFO level; avoid print.
- Name variables with intent: lr, n_layers, trial_no, best_score.
- Persist all artefacts to a ./artifacts/hp_tuning/ timestamped directory.
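A minimal sketch of a structured search space (dataclass instead of a loose dict); the model and field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TreeModelSpace:
    # Bounds are explicit, typed and validated once, instead of scattered dict keys
    learning_rate: tuple[float, float] = (1e-4, 1e-1)    # sampled log-uniformly
    max_depth: tuple[int, int] = (3, 15)
    n_estimators: list[int] = field(default_factory=lambda: [100, 200, 500])
```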
Error Handling and Validation
- Validate search space before run: raise ValueError if bounds overlap or are illogical (e.g. max_depth < min_depth).
- Fail fast: wrap the objective in try/except and return np.inf (minimisation) on any exception so the optimiser continues (see the sketch after this section).
- Use Optuna pruning or Tune’s ASHAScheduler to terminate hopeless trials early.
- Detect and log CV folds whose class balance varies by more than 20%; warn the developer.
- Use sklearn.model_selection.check_cv to validate custom CV objects.
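A sketch combining the first two rules above, search-space validation and the fail-fast objective wrapper; objective is a placeholder for the project's real objective:

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)

def validate_bounds(low: float, high: float, name: str) -> None:
    # Reject illogical ranges before any compute is spent
    if low >= high:
        raise ValueError(f"{name}: lower bound {low} must be < upper bound {high}")

def safe_objective(params: dict) -> float:
    # Any failure becomes the worst possible score so the optimiser keeps running
    try:
        return objective(params)          # placeholder objective
    except Exception:
        logger.exception("Trial failed for params=%s", params)
        return np.inf                     # assumes minimisation, per the rule above
```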
Framework-Specific Rules
scikit-learn (GridSearchCV / RandomizedSearchCV)
- Always pass n_jobs=-1, verbose=2, return_train_score=True.
- Prefer RandomizedSearchCV with ≥50 iterations before grid search on narrowed space.
- Wrap estimators in sklearn.pipeline.Pipeline and include preprocessing so CV is honest (see the pipeline sketch after the example below).
- Example:
```python
search = RandomizedSearchCV(
estimator=pipeline,
param_distributions=space,
n_iter=80,
scoring="roc_auc",
cv=StratifiedKFold(5, shuffle=True, random_state=SEED),
n_jobs=-1,
)
```
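The pipeline passed as the estimator above is assumed to exist; one minimal construction that keeps preprocessing inside each CV fold (SEED as in the example, model choice illustrative):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Preprocessing is part of the pipeline, so every CV fold fits its own imputer/scaler
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", GradientBoostingClassifier(random_state=SEED)),
])
```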
Optuna
- Use study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner(n_startup_trials=5)).
- Seed studies with known-good parameter sets via study.enqueue_trial for reproducible restarts.
- Store study in SQLite (study = create_study(storage="sqlite:///hpt.db", ...)).
- Define search space inside objective with suggest_* API; keep it deterministic w.r.t. trial.number.
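A sketch tying the Optuna rules above together; the enqueued baseline and study name are illustrative, and objective is assumed to exist:

```python
import optuna

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    storage="sqlite:///hpt.db",
    study_name="hpt",
    load_if_exists=True,                      # restarts reuse the stored trials
)
# Seed the search with a known-good configuration so restarts are reproducible
study.enqueue_trial({"lr": 1e-3, "n_layers": 3})
study.optimize(objective, n_trials=100)
```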
Ray Tune
- Use tune.with_parameters to inject data, not global vars.
- Configure resources per trial (cpus, gpus) explicitly; avoid default scheduling.
- Attach ASHAScheduler for deep models:
```python
scheduler = tune.schedulers.ASHAScheduler(max_t=50, grace_period=5, reduction_factor=3)
```
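A minimal sketch of the data-injection rule above, assuming in-memory arrays X and y and a trainable defined as train_model(config, X=None, y=None) that reports a val_score metric via tune.report:

```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

# Data is attached once and shipped to workers, instead of being captured as globals
trainable = tune.with_parameters(train_model, X=X, y=y)

tune.run(
    trainable,
    config=search_space,
    scheduler=ASHAScheduler(max_t=50, grace_period=5, reduction_factor=3),
    resources_per_trial={"cpu": 2, "gpu": 1},     # explicit resources, no default scheduling
    metric="val_score",                           # assumed metric key
    mode="max",
)
```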
Hyperopt
- Use hp.loguniform for learning rates, hp.choice for categorical.
- Set max_evals to at least 20 × the number of hyperparameters.
- Persist Trials() object with pickle after each run.
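The Hyperopt rules above as one short end-to-end sketch; objective is a placeholder that returns a loss to minimise:

```python
import pickle

from hyperopt import Trials, fmin, hp, tpe

space = {
    "lr": hp.loguniform("lr", -9.2, -2.3),        # exp of the bounds ≈ 1e-4 .. 1e-1
    "activation": hp.choice("activation", ["relu", "gelu", "tanh"]),
}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=40, trials=trials)           # ≥ 20 × 2 hyperparameters

with open("trials.pkl", "wb") as fh:               # persist Trials() after the run
    pickle.dump(trials, fh)
```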
Additional Sections
Testing
- Unit-test objective functions with toy data to ensure they return finite metrics.
- Use pytest-parametrize to test boundary hyperparameter values.
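A sketch of both testing rules with pytest; the objective signature and import path are assumptions:

```python
import math

import numpy as np
import pytest

from tuning import objective          # hypothetical module under test

@pytest.mark.parametrize("lr", [1e-5, 1e-1])          # boundary values of the range
@pytest.mark.parametrize("max_depth", [3, 15])
def test_objective_returns_finite_metric(lr, max_depth):
    rng = np.random.default_rng(0)
    X_toy = rng.normal(size=(40, 5))                  # tiny synthetic dataset
    y_toy = rng.integers(0, 2, size=40)
    score = objective({"lr": lr, "max_depth": max_depth}, X_toy, y_toy)
    assert math.isfinite(score)
```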
Performance
- Downsample large datasets during prototyping; switch to full data only after space is refined.
- Parallelise CV via joblib; never set n_jobs in both estimator and searcher simultaneously.
Reproducibility & Reporting
- Record: dataset hash, git commit, search space, metric, and wall-clock time (see the manifest sketch after the example CLI).
- Auto-generate markdown report with top-10 configs and metric distributions.
- Example CLI:
```bash
python tune.py --config conf/resnet50.yaml --seed 42 --report reports/resnet50.md
```
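A sketch of the "Record:" rule, collecting the dataset hash and git commit into a run manifest; git is assumed to be available on PATH:

```python
import hashlib
import subprocess
import time
from pathlib import Path

def run_manifest(data_path: str, search_space: dict, metric: str) -> dict:
    """Reproducibility metadata for one tuning run, ready to dump as JSON."""
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    return {
        "dataset_sha256": digest,
        "git_commit": commit,
        "search_space": {k: str(v) for k, v in search_space.items()},  # stringify distributions
        "metric": metric,
        "started_at_unix": time.time(),   # pair with an end timestamp for wall-clock time
    }
```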
Security & Privacy
- Mask sensitive features during logging (hash or redact).
- Do not serialize raw data inside study objects; store only indices or hashes.
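A sketch of the masking rule: hash identifiers before they reach any log or study object; the salt handling is an assumption (keep it out of version control):

```python
import hashlib

SALT = "replace-with-project-secret"      # assumption: per-project salt, never committed

def mask_value(value: str) -> str:
    # Stable, irreversible token: safe to log, still usable for joins and de-duplication
    return hashlib.sha256(f"{SALT}:{value}".encode()).hexdigest()[:16]

masked_ids = [mask_value(v) for v in ("user_42", "user_43")]   # store these, never raw IDs
```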