Official Model Context Protocol (MCP) server that lets AI agents query a DataHub instance for metadata, search, lineage traversal, and SQL query history.
https://github.com/acryldata/mcp-server-datahub
Stop context-switching between your AI conversations and DataHub when you need metadata, lineage information, or dataset details. This official MCP server from Acryl Data connects your AI agents directly to your DataHub instance, whether you're running DataHub Core or using DataHub Cloud.
You know the drill: you're working on a data pipeline, discussing schema changes with Claude, or debugging lineage issues, and you constantly need to tab over to DataHub to grab URNs, check column definitions, or trace upstream dependencies. That friction kills your flow.
This MCP server eliminates those interruptions. Your AI agent can now search your data catalog, fetch schemas, traverse lineage graphs, and even pull SQL query history - all without you leaving the conversation.
Smart Search Across Everything: Search datasets, dashboards, charts, and any other entities with arbitrary filters. No more hunting through the DataHub UI when you need to find "all Snowflake tables with 'customer' in the name."
Complete Schema Context: Pull full column-level schemas for any dataset. Perfect for schema evolution discussions, data validation planning, or understanding data types when writing transformations.
Lineage Traversal: Navigate both upstream and downstream dependencies programmatically. Your AI can trace data flow from source to consumption, support impact analysis, or help with root cause analysis.
SQL Query History: Access historical queries associated with datasets. Invaluable for understanding usage patterns, optimization opportunities, or seeing how other teams interact with specific tables.
Data Pipeline Development: "Show me the schema for the customer_events table and trace its downstream dependencies." Your agent fetches the schema, maps the lineage, and you can discuss transformation logic with full context.
Impact Analysis: "If I modify this field in the user_profiles dataset, what will break?" The agent traverses downstream lineage and identifies all affected dashboards, pipelines, and consumers.
Schema Discovery: "Find all tables in our data warehouse that contain PII fields." The agent searches across datasets, examines their schemas, and returns the matching tables for you to review.
Query Optimization: "What are the most common queries run against our product_analytics table?" Access historical query patterns to inform indexing strategies or schema optimizations.
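The impact-analysis scenario above comes down to a breadth-first walk over downstream lineage. The sketch below illustrates that idea on a small in-memory graph. The graph, the dataset names other than user_profiles, and the downstream_impact helper are all hypothetical and exist only for illustration; this is not the server's API or DataHub's lineage implementation.

```python
from collections import deque

# Hypothetical toy lineage graph (illustration only): each key feeds
# the entities listed in its value.
DOWNSTREAM = {
    "user_profiles": ["user_features", "profile_dashboard"],
    "user_features": ["churn_model", "weekly_report"],
    "profile_dashboard": [],
    "churn_model": [],
    "weekly_report": [],
}


def downstream_impact(graph, start):
    """Return every entity reachable downstream of `start`, breadth-first."""
    seen = {start}          # guards against cycles and duplicate visits
    order = []              # affected entities, nearest consumers first
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream_impact(DOWNSTREAM, "user_profiles"))
# ['user_features', 'profile_dashboard', 'churn_model', 'weekly_report']
```

Breadth-first order lists direct consumers before more distant ones, which is usually the order you want when deciding who to notify about a change.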
Works with both DataHub Core and DataHub Cloud - just provide your GMS URL and a personal access token. The server integrates seamlessly with Claude Desktop, Cursor, or any MCP-compatible client.
uvx mcp-server-datahub
Configure your environment variables:
DATAHUB_GMS_URL: Your DataHub instance URL
DATAHUB_GMS_TOKEN: Your personal access token
That's it. Your AI agent now has the same metadata access you do, but programmatically accessible during your conversations.
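For clients such as Claude Desktop or Cursor, the launch command and the two environment variables are typically registered in a JSON config file. The sketch below builds such an entry in Python, assuming the common mcpServers layout those clients use. The "datahub" label, the build_mcp_server_entry helper, and the placeholder URL and token are illustrative assumptions; check your client's documentation for the exact file location and format.

```python
import json


def build_mcp_server_entry(gms_url: str, gms_token: str) -> dict:
    """Build an `mcpServers` config fragment that launches the server via uvx."""
    return {
        "mcpServers": {
            "datahub": {  # arbitrary label chosen for this sketch
                "command": "uvx",
                "args": ["mcp-server-datahub"],
                "env": {
                    "DATAHUB_GMS_URL": gms_url,
                    "DATAHUB_GMS_TOKEN": gms_token,
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values: substitute your own GMS URL and access token.
    config = build_mcp_server_entry(
        "http://localhost:8080",
        "<your-personal-access-token>",
    )
    print(json.dumps(config, indent=2))
```

Merge the printed JSON into your client's existing configuration rather than overwriting the file, and keep the token out of version control.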
Data context is everything when building pipelines, debugging issues, or planning schema changes. Instead of bouncing between tools to gather that context, you can have informed discussions about your data architecture with all the metadata right in the conversation.
This is particularly powerful for data engineers working on complex systems where understanding relationships between datasets, tracking schema evolution, and maintaining data quality requires constant reference to your catalog.
The server is officially maintained by Acryl Data, so you're getting first-class support for DataHub's metadata, search, lineage, and query-history features through MCP.