MCP server that gives an LLM full GUI control of the local device: screen capture and analysis, mouse and keyboard automation, scrolling, and element detection.
https://github.com/Mtehabsim/ScreenPilot
Stop writing brittle UI automation scripts. ScreenPilot gives your LLM direct control over your desktop (screen capture, mouse clicks, keyboard input, and visual element detection) through a single MCP server.
You know the pain: fragile XPath selectors, timing issues, and automation scripts that break when the UI changes. Traditional tools like Selenium work great for web apps, but desktop automation still feels like you're fighting the OS rather than working with it.
ScreenPilot takes a different approach. Instead of trying to reverse-engineer application internals, it works much as a human would: it looks at the screen and interacts through ordinary mouse and keyboard actions.
Visual-First Automation: Your LLM can actually see what's happening on screen, not just guess based on DOM selectors or accessibility trees. It captures screenshots, analyzes them, and makes decisions based on visual feedback.
Natural Language Control: Instead of writing automation scripts, you describe what you want: "Find the Submit button and click it" or "Type my email address into the login field." The LLM handles the implementation details.
Cross-Application: Works with any desktop application—legacy enterprise software, games, native apps, or web browsers. If you can see it and click it, ScreenPilot can automate it.
UI Testing Without Brittle Scripts: Test your desktop applications by describing user workflows in natural language. The LLM adapts to minor UI changes that would break traditional automation.
Batch Processing Desktop Tasks: "Open these 50 images in Photoshop and apply the same filter to each one." The LLM handles the repetitive clicking and navigation while you focus on other work.
Legacy System Integration: Automate interactions with old enterprise applications that don't have APIs. The LLM can fill forms, extract data, and navigate through complex legacy interfaces.
Development Workflow Automation: "Set up my development environment—open VS Code, terminal, and browser, then arrange them in my preferred layout." Perfect for onboarding new team members or standardizing setups.
Works directly with Claude Desktop through MCP. Clone the repo and install its dependencies, then add the server to your Claude config:
git clone https://github.com/Mtehabsim/ScreenPilot.git
cd ScreenPilot
python -m venv venv && source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install -r requirements.txt
Add the server to your Claude Desktop config (claude_desktop_config.json) and restart Claude Desktop so it picks up the change. There is no new API to learn: you describe the task in natural language and the LLM drives the desktop.
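As a rough illustration of what the Claude Desktop config entry might look like, the sketch below builds an `mcpServers` block and prints it as JSON. The server key `screenpilot`, the placeholder path `/path/to/ScreenPilot`, and the entry-point name `main.py` are all assumptions made for illustration; check the repository for the real script name before copying anything.

```python
import json
from pathlib import PurePosixPath


def screenpilot_entry(repo_dir: str, script: str = "main.py") -> dict:
    """Build an mcpServers entry that launches the server with the venv's Python.

    `script` is a hypothetical entry-point name, not confirmed by the project.
    On Windows the interpreter would live at venv\\Scripts\\python.exe instead.
    """
    repo = PurePosixPath(repo_dir)
    return {
        "mcpServers": {
            "screenpilot": {  # arbitrary label shown in Claude Desktop
                "command": str(repo / "venv" / "bin" / "python"),
                "args": [str(repo / script)],
            }
        }
    }


# Placeholder path: replace with the directory you cloned into.
config = screenpilot_entry("/path/to/ScreenPilot")
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into claude_desktop_config.json, which Claude Desktop keeps under ~/Library/Application Support/Claude/ on macOS and %APPDATA%\Claude\ on Windows.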
ScreenPilot uses PyAutoGUI for cross-platform mouse, keyboard, and screenshot control and OpenCV for computer vision. It's Python-based, so you can extend it or integrate it into your existing automation workflows.
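To make the PyAutoGUI plus OpenCV combination concrete, here is a minimal sketch of a "see, locate, click" step using OpenCV template matching. It is not ScreenPilot's actual implementation; the function names `box_center`, `find_template`, and `click_template` are hypothetical, and the 0.9 match threshold is an arbitrary starting point.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, width, height in pixels


def box_center(box: Box) -> Tuple[int, int]:
    """Return the pixel at the middle of a bounding box."""
    left, top, width, height = box
    return left + width // 2, top + height // 2


def find_template(screen_bgr, template_bgr, threshold: float = 0.9) -> Optional[Box]:
    """Locate the best match of template_bgr inside screen_bgr, or None."""
    import cv2  # imported lazily so the pure helpers work without OpenCV

    result = cv2.matchTemplate(screen_bgr, template_bgr, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None  # nothing on screen looks enough like the template
    height, width = template_bgr.shape[:2]
    return (max_loc[0], max_loc[1], width, height)


def click_template(template_path: str, threshold: float = 0.9) -> bool:
    """Take a screenshot, find the template image, and click its center."""
    import cv2
    import numpy as np
    import pyautogui

    template = cv2.imread(template_path)
    if template is None:
        raise FileNotFoundError(template_path)
    # PyAutoGUI returns a PIL image; OpenCV expects a BGR array.
    shot = pyautogui.screenshot().convert("RGB")
    screen = cv2.cvtColor(np.array(shot), cv2.COLOR_RGB2BGR)
    box = find_template(screen, template, threshold)
    if box is None:
        return False
    # Caveat: on high-DPI displays screenshot pixels may not map 1:1
    # to pointer coordinates, so a scale factor may be needed here.
    pyautogui.click(*box_center(box))
    return True
```

Calling `click_template("submit_button.png")` with a cropped image of the target would click it if a close enough match is on screen. Template matching is sensitive to scale and theme changes, which is one reason an LLM looking at the full screenshot can be more forgiving.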
Because it speaks MCP, it plays well with other tools in your development environment. Combine it with file system access, API calls, or code generation for complete workflow automation.
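For readers curious how desktop actions can be exposed over MCP, the sketch below registers two tools with `FastMCP` from the official MCP Python SDK (the `mcp` package). The tool names `click_at` and `type_text`, the server name `screen-demo`, and the bounds check are illustrative assumptions; ScreenPilot's real tool set may look quite different.

```python
def in_bounds(x: int, y: int, width: int, height: int) -> bool:
    """True if (x, y) lies inside a width-by-height screen."""
    return 0 <= x < width and 0 <= y < height


def build_server():
    """Create an MCP server exposing two hypothetical desktop tools."""
    # Lazy imports: both packages are third-party dependencies.
    from mcp.server.fastmcp import FastMCP
    import pyautogui

    server = FastMCP("screen-demo")

    @server.tool()
    def click_at(x: int, y: int) -> str:
        """Click at screen coordinates (x, y)."""
        width, height = pyautogui.size()
        if not in_bounds(x, y, width, height):
            raise ValueError(f"({x}, {y}) is outside the {width}x{height} screen")
        pyautogui.click(x, y)
        return f"clicked at ({x}, {y})"

    @server.tool()
    def type_text(text: str) -> str:
        """Type text into whichever window currently has focus."""
        pyautogui.write(text, interval=0.02)
        return f"typed {len(text)} characters"

    return server


# To serve over stdio (the transport Claude Desktop uses for local servers):
#     build_server().run()
```

Validating coordinates before acting gives the LLM a clear error to react to instead of a silent misclick, which matters when the model is choosing pixel positions from a screenshot.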
Repository: github.com/Mtehabsim/ScreenPilot
Note: This project is currently unlicensed—contact the maintainer if you plan to use it in production or commercial projects.