MCP (Model Context Protocol) server for interacting with dbt Docs artifacts (manifest.json, catalog.json) and exposing dbt graph, lineage and metadata search APIs.
https://github.com/mattijsdp/dbt-docs-mcp

Stop hunting through dbt docs and lineage graphs. This MCP server gives Claude, Cursor, and other AI assistants direct access to your dbt project metadata, so you can ask questions about your data models in natural language and get instant, accurate answers.
You know the drill: you're debugging a data issue at 2 AM, trying to figure out which upstream model is causing problems. You're clicking through the dbt docs interface, tracing lineage manually, grepping through SQL files, and cross-referencing the catalog. What should take 30 seconds stretches into 20 minutes of hunting.
Or you're onboarding a new team member who keeps asking "Where does this column come from?" and "What models depend on this table?" - questions that require you to stop your work and become a human search engine for your own data warehouse.
This MCP server bridges that gap by exposing your dbt artifacts through the Model Context Protocol. Here is what that looks like in practice:
Debugging Data Issues: "Which models feed data into the customer_metrics table?" Instead of clicking through lineage graphs, get an instant list of every upstream dependency (see the sketch after this list for the graph query behind it).
Impact Analysis: "What breaks if I change the schema of raw_orders?" Trace all downstream models that depend on specific columns before making breaking changes.
Code Discovery: "Find all models that calculate customer lifetime value." Search across compiled SQL to locate business logic scattered throughout your project.
Column Lineage Investigation: "Where does the total_revenue column in monthly_reports ultimately come from?" Trace back through all transformations to the original source tables.
New Team Member Onboarding: Instead of explaining your data architecture repeatedly, team members can ask the AI assistant directly about model relationships, business logic, and data flow.
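Under the hood, these answers come straight from manifest.json, which already ships a parent_map/child_map describing model-level lineage. Below is a minimal sketch of that idea using networkx (one of the server's dependencies); it is not the server's actual tool code, and the manifest path and node id are placeholders:

import json

import networkx as nx

# Load the manifest produced by dbt docs generate.
with open("target/manifest.json") as f:
    manifest = json.load(f)

# parent_map maps each node id to the ids it depends on;
# build a directed graph with edges pointing parent -> child.
graph = nx.DiGraph()
for child, parents in manifest["parent_map"].items():
    for parent in parents:
        graph.add_edge(parent, child)

# Everything upstream of customer_metrics (placeholder node id).
node_id = "model.my_project.customer_metrics"
print(sorted(nx.ancestors(graph, node_id)))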
The server reads the artifacts dbt already produces (manifest.json and catalog.json, written to your target directory by dbt docs generate), so no additional setup is required in your dbt project:
git clone https://github.com/mattijsdp/dbt-docs-mcp.git
cd dbt-docs-mcp
uv sync
Point it at your dbt artifacts and add the MCP configuration to your AI client:
{
  "mcpServers": {
    "DBT Docs MCP": {
      "command": "uv",
      "args": ["run", "--with", "networkx,mcp[cli],rapidfuzz,dbt-core,python-decouple,sqlglot,tqdm", "mcp", "run", "/path/to/dbt-docs-mcp/src/mcp_server.py"],
      "env": {
        "MANIFEST_PATH": "/path/to/your/target/manifest.json",
        "SCHEMA_MAPPING_PATH": "/path/to/schema_mapping.json",
        "MANIFEST_CL_PATH": "/path/to/manifest_column_lineage.json"
      }
    }
  }
}
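Before launching your client, it can save a debugging round-trip to confirm those three paths point at valid JSON files (the last two are produced by the preprocessing step below). A throwaway check, not part of the repo; substitute your own paths:

import json

# Placeholder paths; use the same values as in the MCP config above.
paths = {
    "MANIFEST_PATH": "/path/to/your/target/manifest.json",
    "SCHEMA_MAPPING_PATH": "/path/to/schema_mapping.json",
    "MANIFEST_CL_PATH": "/path/to/manifest_column_lineage.json",
}

for name, path in paths.items():
    try:
        with open(path) as f:
            json.load(f)  # fails if the file is missing or not valid JSON
        print(f"{name}: OK")
    except (OSError, json.JSONDecodeError) as err:
        print(f"{name}: {err}")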
For column-level lineage tracing, run the included preprocessing script once:
python scripts/create_manifest_cl.py \
--manifest-path /path/to/manifest.json \
--catalog-path /path/to/catalog.json \
--schema-mapping-path ./schema_mapping.json \
--manifest-cl-path ./manifest_column_lineage.json
This parses your entire dbt project to build column dependency graphs. Depending on project size, this can take a while (potentially hours for large projects), but you only run it when your schema changes significantly.
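The heavy lifting is SQL parsing of each model's compiled query; sqlglot (one of the listed dependencies) can trace an output column back to the physical tables it reads from. A toy illustration of that kind of column tracing follows, with made-up SQL; the script's actual internals may differ:

from sqlglot import exp
from sqlglot.lineage import lineage

# A single compiled model query (made-up example).
sql = """
SELECT o.amount - o.discount AS total_revenue
FROM raw_orders AS o
"""

# Build the lineage tree for one output column.
root = lineage("total_revenue", sql)

# Leaf nodes attached to a physical table are the original sources.
for node in root.walk():
    if isinstance(node.expression, exp.Table):
        print(f"total_revenue <- {node.name} (table {node.expression.name})")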
Instead of context-switching between your IDE, dbt docs, and terminal to understand your data warehouse, you maintain flow state by asking natural language questions. Your AI assistant becomes your dbt project expert, giving you instant access to the institutional knowledge trapped in your manifest files.
For teams managing complex dbt projects, this eliminates the bottleneck of senior engineers constantly fielding "where does this data come from?" questions. Knowledge becomes self-service and searchable.
The server works with any MCP-compatible AI client, so you can integrate it into your existing development workflow without changing tools.