Comprehensive Rules for designing, implementing, and maintaining high-performance, well-governed NoSQL data models.
Most backend developers are still modeling NoSQL databases like relational ones – and wondering why their apps hit performance walls at scale. These Cursor Rules implement battle-tested data modeling patterns used by engineering teams at companies processing millions of daily transactions.
Your application architecture might be cloud-native and your APIs might be blazingly fast, but if your data model fights against your database's strengths, you're building on quicksand. Here's what's probably happening in your codebase right now:
The Relational Mindset Trap: You're normalizing everything, creating excessive references, and forcing expensive joins across collections – exactly what NoSQL databases weren't designed for.
Hot Partition Hell: Your partition keys cluster around obvious patterns (like timestamps), creating bottlenecks that no amount of horizontal scaling can fix.
Query Pattern Mismatch: Your schema looks clean in MongoDB Compass, but your most critical user flows require multiple round trips and complex aggregations.
Consistency Confusion: You're using strong consistency everywhere because "data integrity," not realizing you're sacrificing availability and performance for use cases that don't need it.
These Cursor Rules transform how you approach NoSQL data modeling by implementing query-first design patterns that align with how your application actually accesses data. Instead of fighting your database, you'll work with its strengths.
Query-Pattern Driven Design: Model your data around actual read/write patterns, not theoretical relationships. The rules guide you to analyze your access patterns first, then structure your schema to serve them efficiently.
Smart Embedding vs. Referencing: Get clear, implementable guidelines for when to embed data (1:1 or bounded 1:n relationships queried together) versus when to reference (n:n, unbounded, or independently updated datasets).
Production-Grade TypeScript Integration: Generate type-safe interfaces that enforce your data model invariants at compile time, preventing schema drift and runtime errors.
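```ts
// Supporting types are assumptions; the original snippet does not define them.
type ISODateString = string; // e.g. "2024-01-15T14:00:00.000Z"
interface OrderLineItem { sku: string; quantity: number } // hypothetical shape

// This interface enforces partition key patterns and prevents schema drift
export interface OrderDoc {
  pk: `ORDER#${string}`; // Template literal type enforces the key prefix
  sk: `METADATA#${string}`; // Sort key for DynamoDB-style access
  customerId: string;
  status: 'PENDING' | 'PAID' | 'SHIPPED' | 'CLOSED'; // String literal union restricts allowed values
  createdAt: ISODateString;
  items: ReadonlyArray<OrderLineItem>; // Read-only at compile time
}
```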
Eliminate Cross-Collection Queries: By correctly modeling relationships, you can reduce complex aggregation pipelines and multi-step queries by up to 80%. API response times can drop from hundreds of milliseconds to sub-50ms when a hot read path is served by a single document lookup.
Prevent Hot Partition Issues: The rules include specific patterns for partition key design that ensure even data distribution. No more mysterious performance degradation as your dataset grows.
Reduce Development Debugging Time: Type-safe schemas with runtime validation catch data model violations at development time, not in production. You'll spend less time debugging mysterious data inconsistencies.
Accelerate Feature Development: Clear patterns for common scenarios (user profiles, order systems, time-series data) mean you're not reinventing data access patterns for every new feature.
```ts
// Multiple queries for a simple user dashboard
const user = await users.findOne({ _id: userId });
// Result is named recentOrders so it does not shadow the `orders` collection;
// find() returns a cursor, so toArray() is needed to materialize the documents
const recentOrders = await orders.find({ userId }).sort({ createdAt: -1 }).limit(10).toArray();
const preferences = await userPreferences.findOne({ userId });
const loyaltyPoints = await loyaltyAccounts.findOne({ userId });
// 4 database round trips, complex error handling, potential consistency issues
```
```ts
// Single query with embedded data for common access patterns
const userDashboard = await users.findOne({
  pk: `USER#${userId}`,
  sk: 'PROFILE'
});
// Contains embedded recent orders, preferences, and loyalty data
// 1 database round trip, type-safe, consistent view
```
Multi-Collection Approach: 4 database queries, 200-400ms response time, complex error handling for partial failures, and a combined view that can be inconsistent because the reads are not atomic across collections.
Embedded Document Approach: 1 database query, 15-30ms response time, a consistent view from a single atomic document read, simpler error handling.
Instead of fighting MongoDB's document size limits with time-series data:
```ts
// Anti-pattern: Growing documents that hit the 16 MB limit
export interface UserActivityDoc {
  userId: string;
  events: Event[]; // This grows unbounded - bad!
}
```
The rules guide you to bucket patterns:
```ts
// Pattern: Time-bucketed documents with predictable size
export interface UserActivityBucketDoc {
  pk: `USER_ACTIVITY#${string}`;
  sk: `BUCKET#${string}`; // BUCKET#2024-01-15-14 (hourly buckets)
  userId: string;
  bucketStart: ISODateString;
  events: Event[]; // Bounded to ~1000 events per bucket
  eventCount: number;
}
```
This pattern prevents document growth issues while maintaining query performance for time-range queries.
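As a minimal sketch of how a writer could pick the bucket for an event, the helpers below derive the hourly sort key in the `BUCKET#2024-01-15-14` format from a timestamp. The function names `bucketSortKey` and `activityPartitionKey` are hypothetical, and UTC bucketing is an assumption.

```ts
// Hypothetical helpers: derive the keys of the hourly bucket an event belongs to.
// Assumes UTC bucketing and the BUCKET#YYYY-MM-DD-HH format.
function bucketSortKey(timestamp: Date): `BUCKET#${string}` {
  // toISOString() always yields UTC, e.g. "2024-01-15T14:37:09.000Z"
  const iso = timestamp.toISOString();
  const day = iso.slice(0, 10); // "2024-01-15"
  const hour = iso.slice(11, 13); // "14"
  return `BUCKET#${day}-${hour}`;
}

function activityPartitionKey(userId: string): `USER_ACTIVITY#${string}` {
  return `USER_ACTIVITY#${userId}`;
}
```

A time-range query then only needs the sort keys of the first and last hour in the range, since the keys sort lexicographically in time order.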
```bash
npm install zod
npm install --save-dev @types/node
# Install only the driver for the database you use
npm install mongodb                   # MongoDB
npm install @aws-sdk/client-dynamodb  # DynamoDB
```
Create /src/data/models/OrderDoc.ts:
```ts
import { z } from 'zod';

// Hypothetical line-item shape; the original does not define OrderLineItemSchema.
const OrderLineItemSchema = z.object({
  sku: z.string(),
  quantity: z.number().int().positive(),
});

export const OrderDocSchema = z.object({
  pk: z.string().regex(/^ORDER#[a-zA-Z0-9-]+$/),
  sk: z.string().regex(/^METADATA#[a-zA-Z0-9-]+$/),
  customerId: z.string(),
  status: z.enum(['PENDING', 'PAID', 'SHIPPED', 'CLOSED']),
  createdAt: z.string().datetime(),
  items: z.array(OrderLineItemSchema).readonly(),
});

export type OrderDoc = z.infer<typeof OrderDocSchema>;
```
Create /src/data/orders.ts:
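```ts
import { randomUUID } from 'node:crypto';
import type { Collection } from 'mongodb';
import { OrderDoc, OrderDocSchema } from './models/OrderDoc';

// Assumptions: the caller passes in the MongoDB collection handle, and
// randomUUID() stands in for the undefined generateOrderId() helper.
export async function createOrder(
  ordersCollection: Collection<OrderDoc>,
  doc: Omit<OrderDoc, 'pk' | 'sk'>,
): Promise<OrderDoc> {
  const orderDoc: OrderDoc = {
    ...doc,
    // Keys come last so caller-supplied fields can never override them
    pk: `ORDER#${randomUUID()}`,
    sk: `METADATA#${doc.customerId}`,
  };

  // Runtime validation; throws a ZodError on invalid input
  const validated = OrderDocSchema.parse(orderDoc);

  // Write is acknowledged by a majority of replica set members
  await ordersCollection.insertOne(validated, {
    writeConcern: { w: 'majority' },
  });

  return validated;
}
```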
Copy the complete ruleset into your Cursor Rules configuration. The rules will immediately start guiding your code completion and suggestions toward these patterns.
Week 1: You'll notice fewer runtime data validation errors as the TypeScript interfaces catch schema mismatches during development.
Week 2: API response times improve by 40-60% as you eliminate unnecessary queries and optimize access patterns.
Month 1: Your team stops debating data modeling decisions because the rules provide clear, proven patterns for common scenarios.
Month 3: New feature development accelerates as you build on established data access patterns instead of architecting from scratch each time.
These rules don't just change how you write code – they change how you think about data in distributed systems. You'll start designing schemas that serve your application's actual needs instead of following outdated relational patterns that work against NoSQL databases' strengths.
The difference between developers who struggle with NoSQL performance and those who build scalable systems often comes down to understanding these modeling patterns. These Cursor Rules put that knowledge directly into your development workflow.
You are an expert in NoSQL data systems (MongoDB, DynamoDB, Cassandra, Couchbase, RedisJSON), TypeScript/Node.js, and cloud-native architectures.
Key Principles
- Model for read/write access patterns, not for storage resemblance.
- Prefer embedding for 1:1 or 1:n (bounded) relationships queried together; prefer referencing for n:n, unbounded, or independently updated datasets.
- Embrace strategic denormalization to minimise joins and network hops.
- Balance consistency, availability, and partition-tolerance per workload; use tunable consistency where supported.
- Treat the partition/shard key as a first-class design decision; ensure even data distribution and predictable routing.
- Use secondary indexes only when an access pattern cannot be satisfied by the primary key.
- Capture schema intent in code via TypeScript interfaces and JSON Schema; enforce with CI validation gates.
- Evolve schemas with immutable versioned records; never perform in-place destructive mutations.
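One common way to keep a time-based key from becoming a hot partition is write sharding: a deterministic shard suffix spreads one logical key over several physical partitions. The sketch below is illustrative only; `SHARD_COUNT`, `shardedPartitionKey`, and `allShardKeys` are hypothetical names, and the hash is an FNV-1a style function over UTF-16 code units.

```ts
// Hypothetical write-sharding helper: spreads one logical key over N physical partitions.
const SHARD_COUNT = 8; // assumption: tune to the write throughput you need

// Small deterministic string hash (FNV-1a style, 32-bit) so the same id
// always maps to the same shard.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// The shard suffix is derived from the item id, not random,
// so a point read can recompute the exact partition key.
function shardedPartitionKey(entity: string, day: string, itemId: string): string {
  const shard = fnv1a(itemId) % SHARD_COUNT;
  return `${entity}#${day}#${shard}`;
}

// Range readers fan out over every shard of the logical key and merge the results.
function allShardKeys(entity: string, day: string): string[] {
  return Array.from({ length: SHARD_COUNT }, (_, shard) => `${entity}#${day}#${shard}`);
}
```

The trade-off is that reads for a whole day now cost `SHARD_COUNT` queries, so the shard count should stay as small as the write load allows.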
TypeScript (Node.js)
- Represent documents with PascalCase interfaces suffixed with `Doc`:
```ts
export interface OrderDoc {
pk: `ORDER#${string}`; // Partition Key pattern
sk: `METADATA#${string}`; // Sort Key pattern (DynamoDB style)
customerId: string;
status: 'PENDING' | 'PAID' | 'SHIPPED' | 'CLOSED';
createdAt: ISODateString;
items: ReadonlyArray<OrderLineItem>;
}
```
- Use template-literal types to codify key-pattern invariants.
- Keep embedded arrays < 10 k elements; otherwise split into child collections/table‐items.
- Expose all data-access utilities through pure functions in `/data/*` folders; no classes or ORM-style models.
- Use Zod or Yup for runtime validation; throw `DataValidationError` with context-rich payloads.
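Template-literal types only constrain values at compile time, so a string arriving at runtime still needs a check before it can be treated as a key. The sketch below shows one way to pair a type guard with a `DataValidationError`; the class shape, `isOrderPk`, and `toOrderPk` are assumptions for illustration, and a plain regex is used here in place of Zod or Yup to keep the example dependency-free.

```ts
type OrderPk = `ORDER#${string}`;

// Hypothetical error type carrying enough context to debug a rejected document.
class DataValidationError extends Error {
  readonly context: Record<string, unknown>;
  constructor(message: string, context: Record<string, unknown>) {
    super(message);
    this.name = 'DataValidationError';
    this.context = context;
  }
}

const ORDER_PK_PATTERN = /^ORDER#[a-zA-Z0-9-]+$/; // assumed charset for order ids

// Type guard: narrows an arbitrary string to the template-literal key type.
function isOrderPk(value: string): value is OrderPk {
  return ORDER_PK_PATTERN.test(value);
}

// Converts untrusted input to a key or throws with the offending field and value.
function toOrderPk(value: string): OrderPk {
  if (!isOrderPk(value)) {
    throw new DataValidationError('Invalid order partition key', { field: 'pk', value });
  }
  return value;
}
```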
Error Handling and Validation
- Validate incoming DTOs at API boundaries; reject early.
- Verify partition-key invariants (length, charset, prefix) before issuing write commands.
- Wrap driver errors into domain-level errors (`DuplicateKeyError`, `ConditionalCheckFailedError`).
- Log validation and driver errors with a structured logger and correlation id.
- Apply idempotent write patterns (conditional put/update) to avoid duplicate creates on retries.
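The wrapping and idempotency rules above could be sketched as follows. The mapping relies on two driver facts: MongoDB reports duplicate keys with error code 11000, and DynamoDB names a failed condition expression `ConditionalCheckFailedException`. The error classes, `wrapDriverError`, and `createOnce` are hypothetical names.

```ts
// Hypothetical domain errors that hide driver-specific details from callers.
class DuplicateKeyError extends Error {
  readonly original: unknown;
  constructor(original: unknown) {
    super('Document with this key already exists');
    this.name = 'DuplicateKeyError';
    this.original = original;
  }
}

class ConditionalCheckFailedError extends Error {
  readonly original: unknown;
  constructor(original: unknown) {
    super('Write condition was not met');
    this.name = 'ConditionalCheckFailedError';
    this.original = original;
  }
}

// Maps raw driver errors onto domain errors so callers never branch on driver internals.
function wrapDriverError(err: unknown): Error {
  const e = err as { code?: unknown; name?: unknown } | null | undefined;
  if (e?.code === 11000) return new DuplicateKeyError(err);
  if (e?.name === 'ConditionalCheckFailedException') return new ConditionalCheckFailedError(err);
  return err instanceof Error ? err : new Error(String(err));
}

// Idempotent create: a retry that hits the duplicate-key path is treated as success.
async function createOnce(insert: () => Promise<void>): Promise<'created' | 'already-exists'> {
  try {
    await insert();
    return 'created';
  } catch (err) {
    const wrapped = wrapDriverError(err);
    if (wrapped.name === 'DuplicateKeyError') return 'already-exists';
    throw wrapped;
  }
}
```

Treating a duplicate as success is only safe when the key is derived from the request (for example a client-supplied idempotency key), not generated fresh on every attempt.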
Framework-Specific Rules
MongoDB (native / Mongoose)
- Store documents ≤ 16 MB; for large binaries use GridFS or external object storage.
- Always create a compound index that satisfies the most critical query.
- Use `readConcern: 'majority'` + `writeConcern: { w: 'majority' }` for monetary workflows.
- Use change streams for event-driven integrations; resume tokens must be checkpointed.
DynamoDB (AWS SDK v3)
- Model all access patterns in a single table using composite keys + GSI/LSI instead of multiple tables.
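- Prefix keys with entity tags (`USER#`, `ORDER#`) to keep entity types distinct in a shared table; avoid hot partitions by choosing high-cardinality partition keys or adding a shard suffix.
- Project only the attributes a GSI query needs (`KEYS_ONLY` or `INCLUDE`); every projected attribute adds storage and write cost, and items are capped at 400 KB.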
- Configure `ReturnValues: 'ALL_NEW'` in updates to keep code stateless.
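A minimal sketch of single-table keys and request inputs, built as plain objects so it runs without the SDK. The object shapes are meant to match what `PutCommand` and `QueryCommand` from `@aws-sdk/lib-dynamodb` accept; using that document-client package is an assumption, and the key layout and function names are hypothetical.

```ts
// Hypothetical single-table key builders: a user and their orders share one partition.
const userKey = (userId: string) => ({ pk: `USER#${userId}`, sk: 'PROFILE' });
const orderKey = (userId: string, orderId: string) => ({
  pk: `USER#${userId}`,
  sk: `ORDER#${orderId}`,
});

// Input for a conditional put. attribute_not_exists(pk) makes the write fail
// instead of silently overwriting an item that already has this key.
function buildCreateOrderInput(
  tableName: string,
  userId: string,
  orderId: string,
  attrs: Record<string, unknown>,
) {
  return {
    TableName: tableName,
    Item: { ...attrs, ...orderKey(userId, orderId) }, // keys last so attrs cannot override them
    ConditionExpression: 'attribute_not_exists(pk)',
  };
}

// Input for "all orders of a user": one Query on the partition key, no GSI needed.
function buildListOrdersInput(tableName: string, userId: string) {
  return {
    TableName: tableName,
    KeyConditionExpression: 'pk = :pk AND begins_with(sk, :prefix)',
    ExpressionAttributeValues: { ':pk': `USER#${userId}`, ':prefix': 'ORDER#' },
  };
}
```

With the document client, these objects would be passed as `new PutCommand(input)` and `new QueryCommand(input)`.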
Cassandra (DataStax driver)
- Define a clustering column that supports your primary sort requirement; avoid ALLOW FILTERING.
- Use lightweight transactions (`IF NOT EXISTS` and other `IF` conditions) sparingly; they require extra Paxos round trips and add significant latency.
- Size partitions to ≤ 100 MB on disk; larger partitions degrade read latencies.
Additional Sections
Testing
- Provide deterministic fixtures with JSON files stored under `/fixtures/{model}/{version}.json`.
- Use Jest with DynamoDB Local (for example the `amazon/dynamodb-local` Docker image) or `mongodb-memory-server` for integration tests.
- Validate eventual consistency with retry loops (exponential backoff ≤ 3 attempts).
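The retry rule above can be sketched as a small polling helper for tests. The name `eventually` and its default delays are assumptions; the helper retries a read until a predicate accepts the result, doubling the wait between attempts and giving up after three.

```ts
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `read` until `isReady` accepts the result or the attempts run out.
// The delay doubles after each failed attempt: baseDelayMs, then 2x, then 4x, ...
async function eventually<T>(
  read: () => Promise<T>,
  isReady: (value: T) => boolean,
  maxAttempts = 3,
  baseDelayMs = 50,
): Promise<T> {
  let last: T | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    last = await read();
    if (isReady(last)) return last;
    // No sleep after the final attempt; fail immediately instead.
    if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw new Error(
    `Condition not met after ${maxAttempts} attempts; last value: ${JSON.stringify(last)}`,
  );
}
```

In an integration test this might wrap a GSI query or a secondary read, for example `await eventually(() => findByEmail(email), (doc) => doc !== null)`, where `findByEmail` is a hypothetical data-access function.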
Performance
- Monitor P99 latency, throughput, and hot-partition metrics; automate alarms.
- Run weekly index usage reports; drop unused secondary indexes.
- Benchmark critical queries with production-like dataset snapshots before roll-outs.
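For benchmark runs that collect raw latency samples, P99 can be computed directly. The sketch below uses the nearest-rank definition (the smallest sample such that at least p% of samples are less than or equal to it); the function name `percentile` is hypothetical, and production monitoring systems usually approximate this with histograms instead.

```ts
// Nearest-rank percentile over raw samples, e.g. latencies in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('No samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = samples.slice().sort((a, b) => a - b); // copy so the input is not mutated
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1] as number;
}
```

With only a handful of samples, P99 simply returns the maximum, which is why benchmarks need enough requests for the tail to be meaningful.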
Security
- Never store secrets or PII in unencrypted attributes; use KMS or client-side crypto.
- Enforce least-privilege IAM policies; separate read-only and write roles.
- Enable at-rest encryption and TLS 1.2+ in transit for all clusters.
Migration & Versioning
- Employ forward-compatible writes: only add fields, never remove/rename directly.
- Use background scripts or lambda‐driven backfill jobs to migrate old documents.
- Keep a schema version field (`_v`) in every record; update consumers before producers so every reader understands a new version before anything writes it.
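One way to apply these rules is to upcast documents on read, so consumers accept every stored version while a backfill job rewrites old records in the background. The document shapes and the `toLatest` function below are hypothetical; version 2 only adds a field, in line with the add-only rule.

```ts
// Hypothetical two-version history of a user document; v2 only adds a field.
interface UserDocV1 {
  _v: 1;
  email: string;
}
interface UserDocV2 {
  _v: 2;
  email: string;
  emailVerified: boolean;
}
type AnyUserDoc = UserDocV1 | UserDocV2;

// Upcast on read: consumers handle every stored version before any producer writes v2.
function toLatest(doc: AnyUserDoc): UserDocV2 {
  switch (doc._v) {
    case 1:
      // Assumed default for the new field: old accounts start unverified.
      return { ...doc, _v: 2, emailVerified: false };
    case 2:
      return doc;
  }
}
```

The same function can drive the backfill job: read each document, call `toLatest`, and write the result back only if `_v` changed.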
Observability
- Add OpenTelemetry tracing to every driver call with span attributes: `collection`, `operation`, `partitionKey`.
- Correlate driver metrics with API request ids for drill-down analysis.
Governance
- Schedule quarterly data-model reviews; include SRE, security, and product architects.
- Document every access pattern in `/docs/data-models/{model}.md` with ERD‐like diagrams.