Comprehensive Rules for designing, implementing, and automating tests in Message Communication Protocol (MCP) architectures.
You're already using MCP architectures in production. Your Universal Connectors and Protocol Translators are handling real traffic. But here's the brutal truth: most MCP implementations are running on hope and prayer when it comes to testing.
Message communication protocols demand surgical precision. One malformed frame, one capability negotiation failure, one timeout edge case, and your entire distributed system becomes a house of cards. Yet most teams are still testing MCP architectures like they're simple REST APIs.
Your current testing approach is silently bleeding productivity.
You need a systematic approach that treats MCP architectures as the complex, stateful, distributed systems they are.
These Cursor Rules transform your MCP testing from reactive debugging to proactive quality assurance. Instead of discovering protocol violations in production, you catch them at compile time. Instead of brittle integration tests, you get deterministic, isolated validation of every protocol interaction.
The Core Philosophy: Test the contract first, implementation second. Every MCP request/response gets a versioned, type-safe schema before you write a single line of business logic.
# Before: Hope your frames are valid
@pytest.mark.asyncio
async def test_capability_exchange():
    response = await client.send_capability_request()
    assert response  # Pray it works

# After: Validate every boundary
@pytest.mark.asyncio
async def test_when_capability_request_sent_then_valid_schema_returned():
    request = CapabilityRequest.model_validate({
        "method": "capabilities/list",
        "correlation_id": "test-123",
    })
    response = await connector.send(request)

    # Schema validation at the boundary
    validated_response = CapabilityResponse.model_validate(response)
    assert validated_response.capabilities == ["resources", "tools"]
    assert validated_response.correlation_id == "test-123"
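The test above assumes `CapabilityRequest` and `CapabilityResponse` models that this page never shows. Here is a minimal sketch of what such contracts could look like. It uses standard-library dataclasses as a stand-in for Pydantic so it runs without dependencies; in a real project the rules below call for `pydantic.BaseModel` and `.model_validate()`. The field names come from the examples on this page, while `SchemaError`, `_build`, and the `validate` classmethod are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Any


class SchemaError(ValueError):
    """Payload does not match the contract (hypothetical error name)."""


def _build(cls, payload: dict[str, Any]):
    """Reject missing or unknown keys, then construct the dataclass."""
    expected = {f.name for f in fields(cls)}
    missing = expected - set(payload)
    unknown = set(payload) - expected
    if missing or unknown:
        raise SchemaError(
            f"{cls.__name__}: missing={sorted(missing)} unknown={sorted(unknown)}"
        )
    return cls(**payload)


@dataclass(frozen=True)
class CapabilityRequest:
    method: str
    correlation_id: str

    @classmethod
    def validate(cls, payload: dict[str, Any]) -> "CapabilityRequest":
        obj = _build(cls, payload)
        if not isinstance(obj.method, str) or not isinstance(obj.correlation_id, str):
            raise SchemaError("CapabilityRequest: method and correlation_id must be strings")
        return obj


@dataclass(frozen=True)
class CapabilityResponse:
    capabilities: list[str]
    protocol_version: str
    correlation_id: str

    @classmethod
    def validate(cls, payload: dict[str, Any]) -> "CapabilityResponse":
        obj = _build(cls, payload)
        caps = obj.capabilities
        if not isinstance(caps, list) or not all(isinstance(c, str) for c in caps):
            raise SchemaError("CapabilityResponse: capabilities must be a list of strings")
        return obj
```

Rejecting unknown keys as well as missing ones is a design choice in this sketch: it makes a silently added field fail the contract test instead of slipping through.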
Unit tests: Pure-function validation with zero external dependencies. Your message mappers, serializers, and protocol parsers get bulletproof coverage.
Contract tests: Generated from your schemas, these validate that your Protocol Translator speaks the same language as your Universal Connector before integration.
Integration tests: Real containers, real protocol handshakes. Your docker-compose.test.yml spins up the full stack: Universal Connector, Protocol Translator, Redis, Postgres. Health checks complete in under 250 ms.
End-to-end tests: Full user workflows with browser automation. Intercept MCP frames with page.route('mcp://**') and validate schema conformance in real time.
Chaos and performance tests: Toxiproxy injects network failures. Locust generates load while the suite checks that 95th-percentile latency stays within budget. Your system proves it can handle the worst-case scenarios.
Before: Write code → Deploy → Hope it works → Debug production issues → Hotfix → Repeat
After: Define contract → Generate tests → Implement → Validate locally → Deploy with confidence
Here's what your daily workflow looks like:
# Your colleague changed the capability negotiation logic
git pull origin main
# Contract tests catch the breaking change immediately
pytest tests/contract/ --mcp-endpoint=localhost:8080
# FAIL: CapabilityResponse missing 'protocol_version' field
# Fix the contract, regenerate tests, implement - all before lunch
# Your PR gets automatic validation
@pytest.fixture(scope="session")
def universal_connector():
    """Spin up containerized Universal Connector for integration tests"""
    with docker_compose_up("test") as containers:
        yield containers["universal-connector"]

# The test awaits the connector, so it must be a coroutine marked for pytest-asyncio
@pytest.mark.asyncio
async def test_when_invalid_capability_then_raises_capabilityerror(universal_connector):
    """Contract violation surfaces immediately with actionable context"""
    invalid_request = {"method": "invalid_capability"}
    with pytest.raises(CapabilityError) as exc_info:
        await universal_connector.send(invalid_request)
    assert exc_info.value.error_code == "CAPABILITY_NOT_SUPPORTED"
    assert "invalid_capability" in str(exc_info.value)
Your Prometheus dashboard shows green across all Protocol Translators. Why? Because your chaos tests already proved your system handles injected latency, packet loss, and abrupt disconnects.
pip install pytest pytest-asyncio pydantic mypy
npm install jest ts-jest typescript zod @playwright/test
Copy these rules into your .cursorrules file. Your AI assistant now understands MCP testing patterns and will generate contract-first tests automatically.
# docker-compose.test.yml
version: '3.8'
services:
  universal-connector:
    build: ./connector
    environment:
      - MCP_ENDPOINT=protocol-translator:8080
    depends_on:
      - protocol-translator
  protocol-translator:
    build: ./translator
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 5s
      timeout: 250ms
# tests/contract/test_capability_exchange.py
# CapabilityRequest and CapabilityResponse are your project's Pydantic models.
def test_capability_request_schema():
    """Contract test: validate request schema"""
    request_data = {
        "method": "capabilities/list",
        "correlation_id": "test-correlation-123",
    }
    # This will fail if schema changes without migration
    request = CapabilityRequest.model_validate(request_data)
    assert request.method == "capabilities/list"

def test_capability_response_schema():
    """Contract test: validate response schema"""
    response_data = {
        "capabilities": ["resources", "tools"],
        "protocol_version": "1.0.0",
        "correlation_id": "test-correlation-123",
    }
    # Schema validation catches breaking changes
    response = CapabilityResponse.model_validate(response_data)
    assert len(response.capabilities) > 0
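# .github/workflows/mcp-test.yml
name: MCP Test Pipeline
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install test dependencies
        run: pip install pytest pytest-asyncio pydantic mypy
      - name: Run Test Pipeline
        run: |
          # Contract-first pipeline
          pytest tests/contract/ -v
          docker compose -f docker-compose.test.yml up -d
          pytest tests/integration/ -v
          # --video is provided by the pytest-playwright plugin
          pytest tests/e2e/ -v --video=retain-on-failure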
Sarah (Backend Engineer): "I used to spend half my day debugging MCP handshake failures. Now the contract tests catch schema mismatches before I even commit."
Mike (DevOps Engineer): "Our deployment rollbacks dropped 80%. The chaos tests prove our resilience before we hit production traffic."
Lisa (QA Lead): "Finally, test evidence instead of manual protocol validation. Our release cycles went from 2 weeks to same-day deploys."
import re

@pytest.mark.chaos
@pytest.mark.asyncio
async def test_protocol_translator_survives_network_partition():
    """Verify auto-recovery after network failure"""
    with toxiproxy.network_partition(duration=30):
        # System should recover within 45 seconds
        await wait_for_health_check(max_wait=45)
    # Assert distributed trace shows recovery span
    trace = get_trace_by_correlation_id("chaos-test-123")
    assert trace.status == "OK"

@pytest.mark.asyncio
async def test_95th_percentile_under_750ms():
    """SLA enforcement via automated testing"""
    with locust_load_test(users=100, spawn_rate=10):
        metrics = await collect_prometheus_metrics()
    assert metrics.p95_response_time <= 750  # ms
    assert metrics.error_rate <= 0.01  # 1%

@pytest.mark.asyncio
async def test_secrets_never_logged():
    """Prevent credential leakage in protocol logs"""
    with log_capture() as logs:
        await connector.authenticate(api_key="secret-key-123")
    # Regex detector ensures no secrets in logs
    for log_line in logs:
        assert not re.search(r'secret-key-\d+', log_line)
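The chaos test above relies on a `wait_for_health_check` helper that is not defined anywhere on this page. One way it could work is a deadline-bounded polling loop. In this sketch the `probe` and `interval` parameters are assumptions (the original call passes only `max_wait`), and connection errors during the partition are treated as "not healthy yet".

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_health_check(
    probe: Callable[[], Awaitable[bool]],
    max_wait: float = 45.0,
    interval: float = 0.5,
) -> float:
    """Poll `probe` until it returns True and return the seconds waited.

    Raises TimeoutError if the service is still unhealthy at the deadline.
    """
    start = time.monotonic()
    deadline = start + max_wait
    while True:
        try:
            if await probe():
                return time.monotonic() - start
        except OSError:
            # Connection refused or reset while the network is partitioned.
            pass
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"service not healthy after {max_wait}s")
        await asyncio.sleep(interval)
```

Using `time.monotonic()` instead of wall-clock time keeps the deadline stable if the system clock is adjusted mid-test.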
Stop accepting brittle protocol implementations. Stop debugging MCP failures in production. Stop worrying about capability negotiation edge cases.
Your MCP architecture deserves systematic, contract-first testing that scales with your team and catches issues before they reach users.
Next Steps:
The difference between teams that struggle with MCP reliability and teams that deploy with confidence isn't talent - it's having the right testing strategy. These rules give you that strategy, battle-tested and ready for production workloads.
Your future self will thank you when the next protocol change deploys flawlessly instead of breaking half your downstream services.
You are an expert in MCP (Message Communication Protocol) architectures, Python 3.11+, TypeScript 5+/JavaScript (ES2023), Go 1.21+, Docker, Google Cloud Run, Prometheus + Grafana, OpenTelemetry, Locust, and Apache JMeter.
Key Principles
- Contract-first: define versioned, type-safe schemas for every MCP request/response before coding.
- Validate all inputs & outputs at every boundary; never trust upstream payloads.
- Build an automation-first test pyramid: Unit ➜ Contract ➜ Integration ➜ End-to-End ➜ Chaos/Perf; each layer must run independently.
- Keep tests isolated, deterministic, and idempotent; state is reset on every run.
- Favour early returns and guard clauses to keep “happy path” last.
- Name tests using GIVEN/WHEN/THEN semantics (e.g., `test_when_invalid_capability_then_raises_capabilityerror`).
- Fail fast & loud: surface actionable errors with context and remediation hints.
- Instrument every component with structured logs, metrics, and distributed traces that tests can assert against.
- Continuous feedback: integrate all suites into CI/CD; block merges on red pipelines.
Python
- Use `pytest` for all new tests; migrate legacy `unittest` modules during refactors.
- Type-check with `mypy --strict`; every function must declare types.
- Model messages with `pydantic.BaseModel`; call `.model_validate()` on ingress; `.model_dump()` on egress.
- Prefer `pytest.fixture(scope="session")` for Universal Connector & Protocol Translator containers; parameterise with env vars.
- Assertions: always provide a custom message, e.g. `assert frame.opcode == 0x03, "Opcode 0x03 (PING_ACK) expected"`.
- Mark async flows with `@pytest.mark.asyncio`; never block the event loop (use `await asyncio.to_thread(...)` for blocking calls, and a process pool via `loop.run_in_executor()` for CPU-bound work).
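A minimal sketch of the "never block the event loop" rule: a blocking call is pushed to a worker thread with `asyncio.to_thread` while a heartbeat coroutine keeps running. `blocking_decode` and `heartbeat` are hypothetical names used only for illustration.

```python
import asyncio
import time


def blocking_decode(frame: bytes) -> str:
    """Stand-in for a synchronous call that would otherwise stall the loop."""
    time.sleep(0.2)
    return frame.decode("utf-8")


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    """Record a timestamp every 20 ms until asked to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main() -> tuple[str, int]:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    # The loop stays free to run `heartbeat` while the thread sleeps.
    decoded = await asyncio.to_thread(blocking_decode, b"PING_ACK")
    stop.set()
    await hb
    return decoded, len(ticks)
```

If `blocking_decode` were called directly inside `main`, the heartbeat would record no ticks during the 200 ms call. That difference is something a test can assert on.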
TypeScript / JavaScript
- Use `ts-jest` with `isolatedModules: true`; compile target ES2023.
- Validate payloads with `zod`; throw `ZodError` immediately on failure.
- Keep tests flat: no nested `describe`; one protocol scenario per file.
- Avoid `any`; use `unknown` + refinement via `zod`.
Go
- Table-driven tests in `_test.go` files; each case includes `name`, `in`, `want`, `wantErr`.
- Run `go vet` for static checks on message structs and `go test -race` to detect data races at runtime.
- Use `context.Context` everywhere; deadline & cancel propagation is asserted in tests.
Error Handling and Validation
- All components must raise/return one of the canonical MCP errors: `ValidationError`, `CapabilityError`, `TimeoutError`, `UpstreamError`, `InternalError`.
- Validate at ingress: size limits, schema, capability negotiation, authentication.
- For integration tests, assert both the error type and the `error_code` field in the MCP frame.
- Use a global exception catcher that maps internal errors to user-friendly codes; tests must confirm mapping.
- Always include `correlation_id` in logs and error frames; test that it propagates end-to-end.
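The error rules above can be sketched as a small exception hierarchy plus one mapping function. This is an illustration under stated assumptions: only `CAPABILITY_NOT_SUPPORTED` appears on this page, so every other `error_code` value, the frame layout, and the `to_error_frame` name are hypothetical. The timeout class is named `McpTimeoutError` here to avoid shadowing Python's built-in `TimeoutError`.

```python
class MCPError(Exception):
    """Base class for the canonical MCP errors."""
    error_code = "INTERNAL_ERROR"


class ValidationError(MCPError):
    error_code = "VALIDATION_FAILED"  # assumed code


class CapabilityError(MCPError):
    error_code = "CAPABILITY_NOT_SUPPORTED"  # code used in the tests on this page


class McpTimeoutError(MCPError):
    error_code = "TIMEOUT"  # assumed code


class UpstreamError(MCPError):
    error_code = "UPSTREAM_FAILURE"  # assumed code


class InternalError(MCPError):
    error_code = "INTERNAL_ERROR"  # assumed code


def to_error_frame(exc: BaseException, correlation_id: str) -> dict[str, str]:
    """Map any exception to an error frame that carries the correlation_id.

    Canonical errors keep their message; anything else is reported as a
    generic internal error so implementation details do not leak.
    """
    if isinstance(exc, MCPError):
        code, message = exc.error_code, str(exc)
    else:
        code, message = InternalError.error_code, "internal error"
    return {
        "type": "error",
        "error_code": code,
        "message": message,
        "correlation_id": correlation_id,
    }
```

A test for the mapping can then assert both the code and the propagated `correlation_id`, for example `to_error_frame(CapabilityError("invalid_capability is not supported"), "test-123")`.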
Framework-Specific Rules
Playwright MCP
- Always run browsers with `headless: true` and `devtools: false` for CI speed.
- Intercept MCP frames with `page.route('mcp://**', handler)` and assert schema conformance via `zod`.
- Use `test.use({ storageState: 'e2e/.auth.json' })` to avoid re-auth in every test.
LambdaTest Automation MCP Server
- Configure via env vars (`LT_USERNAME`, `LT_ACCESS_KEY`); never hard-code.
- Tag every run with `build=<commit-sha>`; tests assert the build tag via LambdaTest API.
- Clean up orphan sessions in an `afterAll` hook.
pytest
- Keep `conftest.py` lean; no implicit fixtures.
- Implement `pytest_addoption` to allow `--mcp-endpoint` CLI switch.
- Mark long-running tests with `@pytest.mark.slow`; exclude from default run.
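A lean `conftest.py` along these lines might look as follows. `pytest_addoption`, `pytest_configure`, `parser.addoption`, `config.addinivalue_line`, and `config.getoption` are standard pytest hooks and methods. The default value mirrors the `localhost:8080` endpoint used earlier on this page, and `mcp_endpoint_from` is a hypothetical helper.

```python
# conftest.py (sketch)

def pytest_addoption(parser):
    """Register the --mcp-endpoint switch used by the contract tests."""
    parser.addoption(
        "--mcp-endpoint",
        action="store",
        default="localhost:8080",
        help="host:port of the MCP endpoint under test",
    )


def pytest_configure(config):
    """Declare custom markers so `--strict-markers` accepts them."""
    config.addinivalue_line(
        "markers", "slow: long-running test, excluded from the default run"
    )
    config.addinivalue_line("markers", "chaos: fault-injection test")


def mcp_endpoint_from(config) -> tuple[str, int]:
    """Split the option value into (host, port)."""
    host, _, port = config.getoption("--mcp-endpoint").rpartition(":")
    return host, int(port)
```

To keep slow tests out of the default run, one option is `addopts = -m "not slow"` in `pytest.ini`; a full run then passes `-m ""` or an explicit marker expression.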
Docker & Cloud Run
- Provide `docker-compose.test.yml` bringing up Universal Connector, Protocol Translator, Redis, Postgres.
- Health-check endpoints (`/healthz`) must respond within 250 ms; integration tests assert this.
- CI: `docker compose -f docker-compose.test.yml up -d && pytest -q`.
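The 250 ms health-check budget can be asserted with a few lines of standard-library code. This is a sketch, not the project's actual helper: `check_health` is a hypothetical name, and it measures a single request, so a real suite would probably sample several times before failing.

```python
import time
import urllib.request


def check_health(url: str, budget_ms: float = 250.0) -> tuple[bool, float]:
    """GET `url` once; return (healthy_within_budget, elapsed_ms)."""
    start = time.perf_counter()
    try:
        # The socket timeout keeps a stalled endpoint from hanging the test.
        with urllib.request.urlopen(url, timeout=budget_ms / 1000) as resp:
            resp.read()
            status = resp.status
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts derive from OSError.
        return False, (time.perf_counter() - start) * 1000
    elapsed_ms = (time.perf_counter() - start) * 1000
    return status == 200 and elapsed_ms <= budget_ms, elapsed_ms
```

An integration test would call it against the container, for example `ok, ms = check_health("http://localhost:8080/healthz")`, and include `ms` in the assertion message so a slow response is easy to diagnose.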
Additional Sections
Testing Strategy
- Unit: 90%+ statement coverage on pure functions & message mappers.
- Contract: Generated from OpenAPI/Protobuf; use Pydantic schema tests or `jest-openapi`.
- Integration: Spin up real containers; assert full protocol handshake (CAPS, AUTH, HEARTBEAT) and error paths.
- End-to-End: Via Playwright MCP simulating a full user workflow; video & trace artifacts saved.
- Chaos/Resilience: Use Toxiproxy to inject latency, packet loss, abrupt disconnects; verify auto-recovery & retries.
Performance & Load
- Locust scripts simulate peak (95th-percentile) traffic; P95 response time must stay ≤ 750 ms.
- Apache JMeter for spike tests; Protocol Translator must sustain 5× baseline for 10 min.
- Export Prometheus metrics; Grafana dashboard IDs `12345-uc` (Universal Connector) & `67890-pt` (Protocol Translator) must stay green.
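To make the P95 threshold concrete, here is a small sketch of how a test could compute it from raw latency samples. It uses the nearest-rank definition of a percentile, which is one of several common definitions; Prometheus histograms estimate quantiles differently. `percentile` and `check_sla` are hypothetical helpers, with the 750 ms and 1% limits taken from this page.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def check_sla(
    latencies_ms: list[float],
    errors: int,
    p95_budget_ms: float = 750.0,
    max_error_rate: float = 0.01,
) -> list[str]:
    """Return human-readable SLA violations; an empty list means pass."""
    violations = []
    p95 = percentile(latencies_ms, 95)
    if p95 > p95_budget_ms:
        violations.append(f"P95 {p95:.0f} ms exceeds {p95_budget_ms:.0f} ms")
    error_rate = errors / len(latencies_ms)
    if error_rate > max_error_rate:
        violations.append(f"error rate {error_rate:.2%} exceeds {max_error_rate:.2%}")
    return violations
```

With 100 samples, the nearest-rank P95 is the 95th value in sorted order, so a single slow outlier does not fail the check, but ten slow requests out of 100 do.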
Security
- Run `pip-audit`, `npm audit`, and `govulncheck` in pipeline.
- Integration tests assert that secrets are never logged; use regex detector.
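A regex detector for leaked secrets can be sketched with the standard `logging` module. The first pattern is the one used in the test earlier on this page; the second pattern, `ListHandler`, `find_leaks`, and `redact` are assumptions added for illustration.

```python
import logging
import re

SECRET_PATTERNS = [
    re.compile(r"secret-key-\d+"),  # pattern from the test on this page
    re.compile(r"(?i)(api[_-]?key|authorization)\s*[=:]\s*\S+"),  # assumed extra pattern
]


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory so tests can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def find_leaks(lines: list[str]) -> list[str]:
    """Return every log line that matches a secret pattern."""
    return [line for line in lines if any(p.search(line) for p in SECRET_PATTERNS)]


def redact(value: str) -> str:
    """Keep a short prefix for debugging and mask the rest."""
    return value[:4] + "***" if len(value) > 4 else "***"
```

In a test, attach `ListHandler` to the component's logger, run the authentication flow, and assert `find_leaks(handler.lines) == []`.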
CI/CD
- GitHub Actions workflow: lint ➜ unit ➜ contract ➜ integration (docker) ➜ e2e (cloud-run) ➜ perf-smoke (> PR) ➜ deploy.
- Require branch protection: tests must pass on `push` and `pull_request`.
Observability
- Use OpenTelemetry SDK (v1) with B3 propagation.
- Integration tests assert presence of root span `mcp.session` with status `OK`.
Common Pitfalls & Guards
- "Flaky ≈ Fail" — quarantine flaky tests within 24 h or block merges.
- Never share mutable global state across tests; use fixture scopes and container isolation.
- Always close WebSocket connections in `finally` blocks; tests assert `CLOSE` opcode.