Nano Banana Pro
Agent skill for nano-banana-pro
You are helping with the **Darbot Windows Agent**, a powerful Windows automation agent that bridges the gap between AI language models and the Windows operating system at the GUI layer.
Darbot Windows Agent is a Python 3.12+ automation framework that enables any LLM to perform computer automation tasks on Windows without relying on traditional computer vision models. It uses Windows UI Automation APIs, Win32 APIs, and PyAutoGUI for reliable GUI control.
- `uiautomation` library for GUI element detection

```
darbot_windows_agent/
├── agent/              # Main agent implementation
│   ├── service.py      # Core agent service class
│   ├── tools/          # Automation tools
│   ├── prompt/         # System prompts
│   └── registry/       # Tool registry
├── desktop/            # Windows desktop interaction
├── github/             # GitHub integration & model selection
└── tree/               # UI element tree utilities
```
- `tests/` directory

When creating new tools, follow this pattern:
```python
from langchain.tools import tool
from darbot_windows_agent.agent.tools.views import YourToolSchema

@tool('Your Tool Name', args_schema=YourToolSchema)
def your_tool(param: str, desktop: Desktop = None) -> str:
    '''Tool description for the agent'''
    # Tool implementation
    return "Tool result message"
```
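The `desktop` argument in the pattern above defaults to `None`, which suggests the caller supplies it at run time. The following is a minimal sketch of how such injection could work, using only the standard library. `ToolRegistry`, `FakeDesktop`, and both example tools are hypothetical names invented for illustration; they are not the project's actual registry or `Desktop` class.

```python
import inspect
from typing import Any, Callable, Dict


class FakeDesktop:
    """Stand-in for the project's Desktop class (hypothetical, for illustration)."""

    def active_window(self) -> str:
        return "Notepad"


class ToolRegistry:
    """Hypothetical registry that injects `desktop` into tools that accept it."""

    def __init__(self, desktop: Any) -> None:
        self._desktop = desktop
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        self._tools[name] = func

    def execute(self, name: str, **params: Any) -> str:
        func = self._tools[name]
        # Only pass `desktop` when the tool declares that parameter.
        if "desktop" in inspect.signature(func).parameters:
            params["desktop"] = self._desktop
        return func(**params)


def window_title_tool(prefix: str, desktop: Any = None) -> str:
    """Example tool: reports the active window using the injected desktop."""
    return f"{prefix}: {desktop.active_window()}"


def echo_tool(text: str) -> str:
    """Example tool that does not need desktop access."""
    return text


registry = ToolRegistry(desktop=FakeDesktop())
registry.register("Window Title Tool", window_title_tool)
registry.register("Echo Tool", echo_tool)

print(registry.execute("Window Title Tool", prefix="Active"))  # Active: Notepad
print(registry.execute("Echo Tool", text="hello"))             # hello
```

Checking the signature before injecting keeps tools that do not touch the desktop free of an unused parameter.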
The project supports multiple LLM providers through `ModelSelector`:
- `OPENAI_API_KEY`
- `GOOGLE_API_KEY`
- `GROQ_API_KEY`

```python
from darbot_windows_agent.agent import Agent
from langchain_google_genai import ChatGoogleGenerativeAI

# Standard initialization
llm = ChatGoogleGenerativeAI(model='gemini-2.0-flash')
agent = Agent(
    llm=llm,
    browser='chrome',        # 'edge', 'chrome', 'firefox'
    use_vision=False,        # Optional screenshot capability
    max_steps=100,           # Safety limit
    consecutive_failures=3
)
```
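Before constructing an LLM client, it can help to check which of the API key variables named above are actually set. The helper below is a small sketch under that assumption. `configured_keys` is a hypothetical function and does not reflect how `ModelSelector` itself detects providers.

```python
import os
from typing import List, Mapping, Optional

# Variable names come from the provider list; the helper itself is hypothetical.
API_KEY_VARS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "GROQ_API_KEY")


def configured_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the API key variable names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in API_KEY_VARS if env.get(name, "").strip()]


found = configured_keys()
print("Configured keys:", ", ".join(found) if found else "none")
```

Accepting an optional mapping instead of always reading `os.environ` makes the check easy to exercise in tests without touching the real environment.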
Tools are automatically injected with the `desktop` parameter. The agent maintains state through:
- `consecutive_failures` limit to prevent infinite loops

The agent supports three browsers with full automation: Edge, Chrome, and Firefox.
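As a rough illustration of how a `consecutive_failures` limit can work alongside a `max_steps` cap, the sketch below runs a step function until either bound is reached. `run_steps` and its return format are assumptions made for this example; the agent's real loop is not shown in this document and may differ.

```python
from typing import Callable, Tuple


def run_steps(step: Callable[[int], bool], max_steps: int = 100,
              consecutive_failures: int = 3) -> Tuple[int, str]:
    """Run `step` until a limit is hit; returns (steps_taken, stop_reason)."""
    failures = 0
    for i in range(max_steps):
        if step(i):
            failures = 0  # any success resets the failure streak
        else:
            failures += 1
            if failures >= consecutive_failures:
                return i + 1, "consecutive_failures"
    return max_steps, "max_steps"


# A step that always fails stops after three attempts.
print(run_steps(lambda i: False))                     # (3, 'consecutive_failures')

# Alternating success and failure never builds a streak, so max_steps applies.
print(run_steps(lambda i: i % 2 == 0, max_steps=10))  # (10, 'max_steps')
```

Resetting the counter on success means only an unbroken run of failures ends the loop early, while `max_steps` still bounds the total work.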
- `uiautomation.WindowControl` for window management
- `pyautogui` for mouse/keyboard simulation
- `subprocess` for PowerShell command execution
- `psutil` for process management

The agent can execute any GitHub CLI command:
```python
# Examples of GitHub CLI integration
agent.invoke("gh auth status")
agent.invoke("gh repo list --limit 10")
agent.invoke("gh pr create --title 'New Feature' --body 'Description'")
```
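For context, one way a `gh` command string like those above could be turned into a process call is sketched here with the standard library. `parse_gh_command` and `run_gh` are hypothetical helpers; how the agent actually dispatches these commands is not described in this document.

```python
import shlex
import shutil
import subprocess
from typing import List


def parse_gh_command(command: str) -> List[str]:
    """Split a command string into arguments and check that it targets `gh`."""
    args = shlex.split(command)
    if not args or args[0] != "gh":
        raise ValueError("expected a command starting with 'gh'")
    return args


def run_gh(command: str) -> str:
    """Run the command without a shell and return its standard output."""
    args = parse_gh_command(command)
    if shutil.which("gh") is None:
        raise FileNotFoundError("GitHub CLI (gh) was not found on PATH")
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


print(parse_gh_command("gh pr create --title 'New Feature' --body 'Description'"))
# ['gh', 'pr', 'create', '--title', 'New Feature', '--body', 'Description']
```

Passing an argument list to `subprocess.run` rather than a shell string keeps quoted values such as the pull request title intact as single arguments.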
Interactive model selection similar to VSCode GitHub Copilot:
- `use_vision=True`

- `python -m pytest tests/`
- `ruff check .`
- `black .`
- `python test_integration.py`

- `darbot_windows_agent/agent/tools/service.py`

When helping with this project, prioritize:
The agent is designed to make Windows automation accessible to any LLM, so focus on clear, reliable, and well-documented implementations.