Opinionated coding, design, and operational rules for building robust monitoring & alerting solutions on macOS.
Your DevOps team is drowning in alerts, your macOS fleet is a black box, and you're still discovering critical issues from user complaints instead of proactive monitoring. Sound familiar?
Most macOS monitoring solutions fall into two categories: consumer apps that look pretty but lack enterprise features, or enterprise tools that treat macOS as an afterthought. You end up with:
The result? Your macOS infrastructure operates in the dark while your SRE team burns out from constant reactive work.
These Cursor Rules implement a complete monitoring architecture that transforms your macOS fleet from a liability into a well-instrumented, proactively managed asset. Instead of cobbling together consumer tools and hoping for the best, you get:
Native Menu Bar Agent: A sandboxed Swift app that exposes critical metrics (CPU temperature, memory pressure, network throughput) directly in the menu bar while streaming telemetry to your central monitoring stack.
Unified Incident Pipeline: Every alert flows through a single, configurable escalation path—no more missed pages or duplicate notifications across multiple channels.
Production-Grade Observability: OpenTelemetry integration means your macOS metrics live alongside your Kubernetes cluster data in Prometheus/Grafana or New Relic.
Instead of SSH-ing into machines and running diagnostic commands, your on-call engineer sees the problem immediately in their monitoring dashboard. Critical alerts include context and suggested remediation steps.
Smart thresholds and alert consolidation mean you only get paged for issues that actually need human attention. CPU spikes that recover in 30 seconds don't wake you up at 3 AM.
Catch FileVault misconfigurations, certificate expirations, and resource exhaustion before they impact users. The rules include automated compliance checks that surface security issues as high-priority alerts.
New macOS devices automatically register with your monitoring stack and inherit policy-driven alerting rules. No manual agent installation or configuration drift.
[03:17] PagerDuty: Production MacBook fleet high CPU usage
[03:18] You: SSH into random machines, check Activity Monitor
[03:35] You: Find Bitcoin miner in /tmp, but on how many machines?
[04:20] You: Manually check 50+ machines, clean infections
[06:00] You: Finally back to sleep, no root cause analysis
[03:17] Opsgenie: CRITICAL - Unauthorized process detected on 12 hosts
[03:17] Automated response: Processes killed, hosts quarantined
[03:18] Grafana dashboard: Full timeline, affected users, remediation status
[03:19] You: Review incident summary, approve automated remediation
[03:20] You: Back to sleep while system handles cleanup
Developer: "My MacBook has been slow for weeks"
You: Can you send me a screenshot of Activity Monitor?
Developer: [Sends screenshot showing 2GB memory usage]
You: That doesn't show the problem...
[30 minutes of back-and-forth debugging]
You: Finally discover it's thermal throttling from dust buildup
[Dashboard Alert] MacBook-Dev-042: CPU temperature >85°C for 10+ minutes
[Auto-created ticket] Hardware maintenance required - thermal throttling detected
[Slack notification] @developer042 - Your MacBook needs cleaning, see attached guide
git clone [your-monitoring-repo]
cd MacOSMonitoring
# Update Config/monitoring.plist with your endpoints
vim Sources/Core/Services/Config.swift
# Set Prometheus endpoint, Opsgenie API key, alert thresholds
# The rules include complete CI/CD pipeline
./scripts/build-release.sh
# Automatically: builds, tests, notarizes, creates .dmg
# Output: MacOSMonitoring-v1.0.0.dmg ready for deployment
# Push via your MDM (Jamf, Kandji, etc.)
# Or use the included Homebrew cask
brew install --cask your-org/macos-monitoring
# Agents auto-register and start streaming metrics
# Check Grafana dashboard - you should see immediate data
The rules include pre-built alert templates for common scenarios:
Import the provided Grafana dashboard and Opsgenie alert policies. Customize thresholds for your environment.
The rules implement sophisticated patterns that separate this from basic monitoring tools:
Fail-Safe Alerting: When the monitoring agent itself fails, it sends a final "dead man's switch" alert before terminating.
Smart Batching: Metrics are batched and compressed before transmission, reducing network overhead by 60% compared to naive implementations.
Privacy-First Design: All telemetry is anonymized and encrypted. The agent runs in a hardened sandbox with minimal privileges.
Chaos Engineering: Built-in network fault injection for testing alert reliability under adverse conditions.
Your macOS monitoring infrastructure doesn't have to be an afterthought. These rules give you the same observability sophistication you expect from your cloud infrastructure—native, performant, and designed for production scale.
Stop flying blind. Start monitoring like the production environment your macOS fleet actually is.
You are an expert in Swift, SwiftUI, Combine, AppKit, Network.framework, Prometheus, Grafana, Opsgenie, New Relic, Apple Push Notification service (APNs), and SIEM/EDR tooling.
Key Principles
- Build native, sandbox-hardened macOS menu-bar or agent apps that run with the least privileges necessary.
- Prefer functional, declarative Swift (value types + protocols) over classes and mutable state.
- Centralize telemetry: forward logs, metrics, and traces via OpenTelemetry → Prometheus/Grafana or New Relic.
- Use agentless techniques first (SSH, SNMP) before installing local daemons.
- All alerts must be routed through a single incident pipeline (Opsgenie → Slack/Teams/on-call rota).
- Fail safe: when monitoring fails, surface a critical alert rather than silently degrading.
- Automate everything (build, tests, lint, release) via GitHub Actions + fastlane + notarization.
Swift (macOS)
- Use Swift 5.9+ with strict concurrency (`@MainActor`, `@Sendable`, `async/await`).
- Only structs/enums in models. Use `protocol` + `associatedtype` for abstractions.
- File naming: `<Feature>/<Feature>View.swift`, `<Feature>/<Feature>ViewModel.swift`, `Services/<Name>Service.swift`.
- Logging: use `os.Logger` with static categories (e.g. `Logger.monitoring`). No `print()` in production.
- Use SPM for dependency management. Pin versions with `exact` requirement.
- Opt-in to Hardened Runtime, App Sandbox, and Disable Library Validation unless specifically needed.
- Provide a CLI companion tool using the same core library; ship via a dedicated target.
Error Handling and Validation
- All public async APIs return `Result<Value, MonitoringError>`.
- Categorise errors (`enum MonitoringError: LocalizedError { case snmpTimeout, sshFailure, authDenied, serialization }`).
- Early-exit on error; never nest more than one `guard`.
- Validate input from external systems (JSON, SNMP OIDs) with Codable + `JSONDecoder.keyDecodingStrategy = .convertFromSnakeCase`.
- Capture unhandled errors with `NSSetUncaughtExceptionHandler` and forward to SIEM.
Prometheus + Grafana Rules
- Expose metrics locally on `localhost:9090/metrics` in text exposition format.
- Prefix custom metric names with `macos_` and use `snake_case` (`macos_cpu_temperature_celsius`).
- Label cardinality: ≤10 distinct values per label. Never join user IDs or hostnames that can explode.
- Push gateway only for ephemeral agents (e.g. menu-bar apps); prefer pull model otherwise.
- Construct dashboards with time-series over 30 d retention; keep 1 m resolution ≤ 15 d, then downsample.
Opsgenie / Incident Routing
- One alert = one issue. Use `source=macOS-<host>` and a unique `alias` per alert fingerprint.
- Auto-close alerts when the recovery metric has been stable for 5 × check interval.
- Escalation policies: Tier-1 on-call (10 min) → Tier-2 SRE (20 min) → Engineering Manager (40 min).
Menu-Bar UI/SwiftUI Rules
- Only show critical KPIs (CPU, memory, battery, network throughput). Hide advanced stats behind a pop-over.
- Use `@AppStorage("isFirstRun")` to display onboarding about permissions (Full Disk Access, Accessibility).
- All views must be previewable with mock data (implement `Previewable` protocol providing `static var preview: Self`).
- Respect Dark Mode and Accessibility sizes; supply SF Symbols (`thermometer.medium`, `bolt.fill`).
Networking & Security
- Default to HTTPS/TLS1.3 with pinned SHA-256 certificates for outbound telemetry.
- Never ship raw credentials; use Keychain + `kSecAttrAccessibleAfterFirstUnlock`.
- Sign binaries and notarize (`xcrun notarytool submit --wait`).
- Enforce FileVault via MDM and verify at agent start (`fdesetup status`). If disabled, raise priority-P1 alert.
Testing
- Unit: 90 %+ coverage on business logic. Use dependency injection and `XCTest` expectations.
- Integration: Spin up `snmpd` and local Prometheus in CI using Docker for macOS.
- UI: Use `XCTestCase` + `XCUIElementQuery` to validate menu-bar popover rendering.
- Chaos: Randomly drop network packets during CI (`sudo ipfw add drop tcp from any to any 22`). Ensure graceful retries.
Performance
- Profile every release with Instruments: watch for excessive energy impact (>20), memory leaks, main-thread blocking.
- Offload heavy polling to background QoS (`Task.detached(priority: .background)`).
- Sleep polling loops using `Task.sleep(nanoseconds:)` rather than `Timer` for better cancellation semantics.
CI/CD & Deployment
- SemVer tags: `v<major>.<minor>.<patch>`; auto-increment build number via `agvtool`.
- Lint with SwiftFormat + SwiftLint in strict mode. Fail build on any warnings.
- On merge to main: build → test → notarize → upload .dmg to GitHub Releases → publish Homebrew Cask.
Common Pitfalls & Edge Cases
- Menu-bar apps terminate when the last window closes if `LSUIElement` is missing.
- SNMP on macOS Ventura+ requires enabling the system extension—document this.
- APNs token invalidation causes silent alert failures; refresh tokens every 20 h.
- When the system sleeps, timers pause: schedule a `NSWorkspace.willSleepNotification` observer to flush metrics before sleep.
Example: Minimal Metric Exposer
```swift
import Vapor // SPM dependency
struct MetricsRoute: RouteCollection {
func boot(routes: RoutesBuilder) throws {
routes.get("metrics", use: metrics)
}
func metrics(req: Request) async throws -> Response {
let cpuTemp = HardwareMonitor.shared.cpuTemperature()
let body = """
# HELP macos_cpu_temperature_celsius Current CPU temperature
# TYPE macos_cpu_temperature_celsius gauge
macos_cpu_temperature_celsius \(cpuTemp)\n
"""
return Response(status: .ok, body: .init(string: body))
}
}
```
Directory Layout
```
Sources/
MonitoringApp/
AppDelegate.swift
MenuBarApp.swift
Views/
DashboardView.swift
...
ViewModels/
DashboardVM.swift
Core/
Services/
SNMPService.swift
SSHService.swift
MetricsExporter.swift
Models/
Metric.swift
CLI/
main.swift
Tests/
CoreTests/
SNMPServiceTests.swift
UITests/
MenuBarUITests.swift
```
Use these rules as your baseline; adjust only with explicit architectural approval.