Cursor rules for building secure, automated, and test-driven backup & restore solutions in Python across on-prem and cloud environments.
Your infrastructure generates terabytes of critical data daily. When disaster strikes—ransomware, hardware failure, or that dreaded "oops, I deleted production"—you need bulletproof backup and restore systems that actually work under pressure.
Most organizations practice "backup theater": they run backups religiously, then discover during an emergency that files are corrupted, restore chains are broken, and no one has ever rehearsed a recovery.
You're not just backing up data—you're building the foundation that determines whether your organization survives a catastrophic failure.
These Cursor Rules transform your Python backup and restore development into a systematic, test-driven process that eliminates human error and ensures reliable recovery. Instead of cobbling together scripts, you'll build enterprise-grade solutions that follow the 3-2-1 rule and actually work when everything goes wrong.
- **Production-Ready Architecture:** Automated backup orchestration with proper error classification, retry logic, and monitoring integration that enterprises depend on.
- **Security-First Design:** Built-in encryption, immutable storage, credential management, and audit logging that passes compliance audits.
- **Verified Reliability:** Automated restore testing, checksum validation, and performance monitoring that proves your backups work before you need them.
Before: Wrestling with boto3 for S3, then rewriting everything for Azure Blob, hardcoding credentials, and hoping restore works.
After: Clean abstraction layer that handles any storage backend:
```python
@app.command()
def backup(path: Path, target: str, incremental: bool = True):
    """Back up <path> to <target> (s3://, azure://, file://)."""
    backend = get_storage_backend(target)  # Auto-detects backend from URL scheme
    job = BackupJob(source=path, target=backend, strategy=incremental)
    asyncio.run(job.execute_with_verification())  # Built-in checksum validation
```
Impact: Write once, deploy anywhere. Swap storage providers in configuration, not code.
Before: Running backups nightly, discovering during disaster recovery that 30% of files are corrupted and restore chains are broken.
After: Automated weekly restore testing with failure notifications:
```python
async def weekly_restore_test():
    """Test random sample restores and validate integrity."""
    samples = select_random_backup_samples(count=5)
    for sample in samples:
        try:
            restored_path = await sample.restore_to_temp()
            await verify_checksum_match(sample.original, restored_path)
            metrics.increment("restore_test.success")
        except ChecksumMismatchError:
            alert_slack(f"Restore verification failed: {sample.path}")
            metrics.increment("restore_test.failure")
```
Impact: Catch backup corruption proactively instead of during emergencies.
Before: Ransomware encrypts your "backup" directories because they're just network shares with write access.
After: Immutable, time-locked storage that attackers cannot modify:
```python
async def upload_immutable(path: Path, bucket: str, key: str, days: int):
    """Upload with governance lock - cannot be deleted for the specified days."""
    checksum = await calculate_sha256(path)  # base64-encoded digest, as S3 expects
    await s3_client.put_object(
        Bucket=bucket, Key=key, Body=path.read_bytes(),
        ChecksumSHA256=checksum,
        ObjectLockMode="GOVERNANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=days),
    )
```
Impact: Backups become truly immutable and air-gapped from attackers.
```bash
mkdir backup-service && cd backup-service
python -m venv venv && source venv/bin/activate
pip install typer boto3 azure-storage-blob structlog pytest-asyncio
```
Add the rules to a `.cursor/rules` file and use `main.py` as your CLI entry point. The rules guide you to create pluggable storage backends:
```python
# src/backends/protocol.py
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    async def upload(self, local_path: Path, remote_key: str) -> str:
        """Upload file and return checksum."""
        ...

    async def download(self, remote_key: str, local_path: Path) -> None:
        """Download and verify checksum."""
        ...
```
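A concrete backend then just satisfies the protocol. Here is a minimal local-filesystem sketch (the class name and its blocking I/O are illustrative; production code would stream via `aiofiles`):

```python
# src/backends/local.py - illustrative sketch, not part of the rules
import hashlib
import shutil
from pathlib import Path


class LocalBackend:
    """Stores backups under a root directory on a mounted filesystem."""

    def __init__(self, root: Path) -> None:
        self.root = root

    async def upload(self, local_path: Path, remote_key: str) -> str:
        dest = self.root / remote_key
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_path, dest)  # copy2 preserves timestamps and permissions
        return hashlib.sha256(dest.read_bytes()).hexdigest()

    async def download(self, remote_key: str, local_path: Path) -> None:
        source = self.root / remote_key
        shutil.copy2(source, local_path)
```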
The rules emphasize testing because backups without verified restores are worthless:
```bash
pytest tests/ --cov=src --cov-report=html
# Achieve 90%+ coverage with mocked cloud services
```
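For example, a restore round-trip can be exercised entirely in-process against moto's fake S3 (sketch assumes moto ≥ 5, which exposes the `mock_aws` decorator):

```python
import boto3
from moto import mock_aws


@mock_aws
def test_backup_restore_roundtrip(tmp_path):
    # Fake S3 lives in memory - no credentials or network required.
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="backups")

    source = tmp_path / "data.txt"
    source.write_bytes(b"payload")
    s3.put_object(Bucket="backups", Key="data.txt", Body=source.read_bytes())

    restored = tmp_path / "restored.txt"
    restored.write_bytes(s3.get_object(Bucket="backups", Key="data.txt")["Body"].read())
    assert restored.read_bytes() == source.read_bytes()
```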
Your backup and restore systems become the foundation that lets you sleep well at night, knowing your organization can survive any disaster. These rules don't just help you write backup scripts—they help you build the infrastructure that keeps businesses running when everything else fails.
You are an expert in Python, Bash, PowerShell, AWS S3, Azure Blob Storage, Veeam Backup & Replication, Acronis Cyber Protect, Linux/Windows filesystems, and containerised workloads.
Key Principles
- Follow the 3-2-1 rule: 3 copies, 2 media types, 1 off-site/immutable.
- Everything is code: Infrastructure-as-Code (IaC) for storage, Policy-as-Code for retention, Backup-as-Code for jobs.
- Automate end-to-end (scheduling, validation, reporting) to remove human error.
- Encrypt in transit (TLS 1.2+) and at rest (AES-256 or KMS-managed keys).
- Define and document RTO/RPO; implement monitoring to alert on SLA breaches.
- Treat restores as the primary objective: test restores weekly; record metrics.
- Prefer incremental-forever with periodic synthetic fulls to optimise bandwidth.
- Use immutable, air-gapped storage (e.g., S3 Object Lock, Azure Immutable Blob) for ransomware defence.
Python
- Use Python ≥3.10. Enable `from __future__ import annotations` and `typing` for clarity.
- Follow PEP-8 plus:
• snake_case for variables/functions, PascalCase for classes, UPPER_SNAKE for constants.
• Place public CLI entry point in `main.py`; logic lives in `/src` packages.
- Use `argparse` or `typer` to expose backup/restore CLI commands. Example:
```python
@app.command()
def backup(path: Path, target: str, incremental: bool = True):
"""Back up <path> to <target> (s3://, azure://, file://)."""
```
- Always stream file reads (chunk_size ≤16 MiB) to avoid RAM spikes.
- Interface with cloud via official SDKs (`boto3`, `azure-storage-blob`). Abstract to `StorageBackend` protocol for pluggability.
- Calculate and persist SHA-256 checksums after upload; verify during restore.
- Use `asyncio` + `aiofiles` for parallel uploads/downloads; cap concurrency via `asyncio.Semaphore(cpu_count*4)` (see the sketch after this list).
- Logging: use `structlog`; emit JSON lines for centralised ingestion (ELK/CloudWatch/Log Analytics).
- Tests: employ `pytest`, `pytest-asyncio`, and `moto`/`azurite` for cloud mocks. Achieve ≥90% coverage.
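The streaming and concurrency bullets above combine into a pattern like this sketch (`backend.upload` follows the `StorageBackend` protocol; the chunk size and verification step are illustrative):

```python
import asyncio
import hashlib
from os import cpu_count
from pathlib import Path

import aiofiles

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB chunks, under the 16 MiB ceiling


async def sha256_streaming(path: Path) -> str:
    """Hash a file without loading it into memory."""
    digest = hashlib.sha256()
    async with aiofiles.open(path, "rb") as f:
        while chunk := await f.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


async def upload_all(paths: list[Path], backend) -> None:
    semaphore = asyncio.Semaphore((cpu_count() or 1) * 4)

    async def guarded(path: Path) -> None:
        async with semaphore:  # at most cpu_count*4 transfers in flight
            expected = await sha256_streaming(path)
            actual = await backend.upload(path, path.name)
            if actual != expected:
                raise ValueError(f"Checksum mismatch for {path}")

    await asyncio.gather(*(guarded(p) for p in paths))
```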
Error Handling & Validation
- Validate CLI arguments early (destination reachability, free space, credentials).
- Wrap every transfer in `try/except Exception as exc`; classify errors (see the sketch after this list):
• `TransientError` (network timeout) – retry with exponential backoff (max 5 attempts).
• `FatalError` (403, checksum mismatch) – abort job, raise.
- Use context managers to ensure file handles are closed even on failure.
- Abort restore if checksum mismatch after download; do NOT auto-overwrite corrupted local data.
- Surface human-readable errors plus machine parseable codes for orchestration pipelines.
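A minimal sketch of this classification-and-backoff pattern (`TransientError` and `FatalError` are application-defined exceptions, not library types):

```python
import asyncio


class TransientError(Exception):
    """Recoverable failure, e.g. a network timeout."""


class FatalError(Exception):
    """Unrecoverable failure, e.g. a 403 or checksum mismatch."""


async def transfer_with_retry(transfer, max_attempts: int = 5):
    for attempt in range(1, max_attempts + 1):
        try:
            return await transfer()
        except TransientError:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(2 ** attempt)  # exponential backoff: 2s, 4s, 8s...
        except FatalError:
            raise  # abort immediately; never retry a 403 or checksum mismatch
```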
Framework-Specific Rules
Veeam Backup & Replication
- Define jobs via PowerShell module; store scripts in VCS.
- Enable "SureBackup" automatic restore verification nightly; fail pipeline if any VM test boot fails.
- Activate GFS retention (Weekly, Monthly, Yearly) and Hardened Repository immutability (min 7 days).
Acronis Cyber Protect
- Use policy templates stored as JSON; apply via REST API.
- Enable advanced anti-malware scanning on backup archives.
- Tag backups with `prod` / `test` metadata; purge test after 30 days via automated cleanup.
Additional Sections
Testing
- Perform quarterly full-scale DR test: restore entire stack into isolated VPC; measure RTO vs target.
- Automate weekly sample restore (1 random VM, 5 random files).
Performance
- Rotate incremental chain: synthetic full every 30 increments to avoid long restore chains.
- Use multi-part upload (5–64 MiB parts) for S3 to maximise throughput; see the `TransferConfig` sketch below.
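With boto3 this tuning lives in a `TransferConfig`; the part size, concurrency, and object names below are illustrative:

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Parts are uploaded in parallel once the file crosses the multipart threshold.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # switch to multipart above 8 MiB
    multipart_chunksize=32 * 1024 * 1024,  # 32 MiB parts, within the 5-64 MiB range
    max_concurrency=8,
)

s3 = boto3.client("s3")
s3.upload_file("backup.tar.zst", "my-backup-bucket", "2024/backup.tar.zst", Config=config)
```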
Security & Compliance
- Store credentials only in Secrets Manager/Azure Key Vault; never hard-code (see the sketch after this list).
- Enable MFA-Delete (S3) or Immutable Blob (Azure) on off-site buckets.
- Log every backup and restore operation with user identity and IP for audit.
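Resolving credentials at runtime keeps them out of code and config files; a sketch using AWS Secrets Manager (the secret name is a placeholder):

```python
import json

import boto3


def get_storage_credentials(secret_id: str = "backup-service/storage") -> dict:
    """Fetch storage credentials from AWS Secrets Manager at runtime."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```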
Monitoring & Reporting
- Push metrics (duration, bytes transferred, success flag) to Prometheus; export Grafana dashboards (see the sketch after this list).
- Notify via Slack/Teams on failure or SLA breach using webhooks.
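For a batch job, metrics can be pushed to a Prometheus Pushgateway on completion; the gateway address, job name, and values below are placeholders:

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
duration = Gauge("backup_duration_seconds", "Wall-clock backup time", registry=registry)
transferred = Gauge("backup_bytes_transferred", "Bytes uploaded", registry=registry)
success = Gauge("backup_success", "1 on success, 0 on failure", registry=registry)

duration.set(312.4)
transferred.set(42 * 1024 ** 3)
success.set(1)
push_to_gateway("pushgateway.internal:9091", job="nightly-backup", registry=registry)
```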
Documentation
- Maintain `/docs/runbooks/backup-restore.md` with step-by-step restore procedures and screenshots.
- Embed version table mapping schema versions to backup client versions.
Directory Structure Example
```
backup-service/
├── main.py # Typer CLI entry
├── src/
│ ├── backends/
│ │ ├── s3.py
│ │ ├── azure.py
│ │ └── local.py
│ ├── models.py # pydantic data models
│ ├── orchestrator.py # schedules, retention logic
│ └── utils.py
├── tests/
├── scripts/ # Veeam / Acronis scripts
└── docs/
```
Common Pitfalls
- Skipping restore verification – always test.
- Long incremental chains (>60) – cause slow restores; plan synthetic fulls.
- Overlooking permissions on restored files – preserve ACLs/NTFS/S3 metadata during backup.
- Assuming region durability – replicate off-region.
Ready-to-Use Snippet: S3 Immutable Upload
```python
import base64
import hashlib
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path

import aioboto3


async def upload_immutable(path: Path, bucket: str, key: str, days: int) -> None:
    body = path.read_bytes()  # fine for small files; stream + multipart for large ones
    # S3 expects the base64-encoded SHA-256 digest, not a hex string.
    checksum = base64.b64encode(hashlib.sha256(body).digest()).decode()
    session = aioboto3.Session()
    async with session.client("s3", endpoint_url=os.getenv("S3_URL")) as s3:
        await s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=body,
            ChecksumSHA256=checksum,
            ObjectLockMode="GOVERNANCE",
            ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=days),
        )
```
Follow these rules to deliver resilient, secure, and auditable backup & restore solutions.