This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
GreenDish (formerly ConvergeFi) is a microservices-based Restaurant Menu Vegetarian Dish Analyzer that processes menu photos to identify and calculate total prices of vegetarian dishes. The system uses:
- Parser output as `{name, price, raw_text}` JSON for each dish
- LLM classification via Groq (`openai/gpt-oss-20b` default) with OpenRouter fallback, surfaced through the shared `api.llm` router utilities

Current Status: Phase 9+ Complete - All core features implemented and tested
The system is split into three microservices that communicate over HTTP:
- API Service (`/api`) - FastAPI REST API; produces the `parsed_menu` payload and runs the menu-processor agent for dish classification and RAG fallback
- MCP Server (`/mcp-server`) - Calculation service
- Streamlit UI (`/streamlit-ui`) - Testing interface; displays the `parsed_menu` JSON for verification

CRITICAL: All imports within the `/api` directory use relative imports, NOT absolute imports with the `api.` prefix:
```python
# Correct
from config import settings
from models import OCRResult
from services import OCRService

# Wrong
from api.config import settings
from api.models import OCRResult
```
This is because the API runs from within the
/api directory. The same pattern applies to mcp-server and streamlit-ui. Subpackages should use explicit relative imports (from ..config import settings) so they remain importable from outside the service (e.g., the Streamlit chat page).
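For illustration, a hypothetical subpackage module written this way might look like the following (the path, class contents, and method name are assumptions, not the repository's actual code):

```python
# api/services/ocr_service.py - hypothetical layout for illustration only.
# Explicit relative imports keep the module working both when the service runs
# from inside /api and when it is imported as api.services.* from outside.
from ..config import settings
from ..models import OCRResult


class OCRService:
    """Wraps Tesseract OCR; the binary path comes from centralized settings."""

    def __init__(self) -> None:
        self.tesseract_cmd = settings.TESSERACT_CMD  # field name assumed

    def extract_text(self, image_bytes: bytes) -> OCRResult:
        ...  # Tesseract invocation omitted in this sketch
```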
This project uses uv for fast, reliable Python package management. All dependencies are defined in
pyproject.toml files (one per service).
```bash
# Via curl (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Via pip
pip install uv

# Via Homebrew (macOS)
brew install uv
```
```bash
# Method 1: Using uv pip install (for existing environments)
cd api  # or streamlit-ui or mcp-server
uv pip install -r pyproject.toml

# Method 2: Using uv sync (recommended, creates/updates .venv)
cd api  # or streamlit-ui or mcp-server
uv sync
```
```bash
# Edit pyproject.toml and add to the dependencies array, then run:
uv sync

# Or use uv to add directly:
uv add package-name
```
IMPORTANT: Do NOT create
requirements.txt files. All dependencies are managed via pyproject.toml.
The LLM clients live in `api/llm/` (`groq_client.py`, `openrouter_client.py`, and `router_client.py`) and are re-exported via `api/llm/__init__.py`.

- Inside the API service, import them with relative-style imports (`from llm import GroqClient, LLMRouter`). From external services (Streamlit), use `from api.llm import GroqClient` (or `OpenRouterClient`) after ensuring the repo root is on `sys.path`.
- `complete_json(...)` enforces schema-validated responses for classification nodes; `chat(...)` streams plain text output for playgrounds (see the sketch after the run commands below).

Running the services:

```bash
# API Service (from /api directory)
cd api
uv sync            # Install/update dependencies first
python main.py     # Runs on http://localhost:8005

# Alternative: Using uvicorn directly with auto-reload
uv run uvicorn main:app --reload --port 8005

# Streamlit UI (from /streamlit-ui directory)
cd streamlit-ui
uv sync            # Install/update dependencies first
streamlit run app.py   # Runs on http://localhost:8501
# Or using uv run:
uv run streamlit run app.py

# MCP Server (from /mcp-server directory) - Phase 4+
cd mcp-server
uv sync            # Install/update dependencies first
python server.py   # Runs on http://localhost:8001
```
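As referenced above, here is a sketch of how a classifier node might call the router's `complete_json(...)`. The parameter names and return shape are assumptions; only the class and method names come from this document, and the real signature lives in `api/llm/router_client.py`:

```python
# Hypothetical usage of the shared LLM router from inside /api (import style
# per the relative-import rules above). Signature details are assumed.
from llm import LLMRouter

CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "is_vegetarian": {"type": "boolean"},
        "confidence": {"type": "number"},
        "reasoning": {"type": "string"},
    },
    "required": ["is_vegetarian", "confidence", "reasoning"],
}

router = LLMRouter()
result = router.complete_json(
    prompt="Is 'Grilled Halloumi Salad' vegetarian? Answer as JSON.",
    schema=CLASSIFICATION_SCHEMA,
)
# Expected to return a dict matching {is_vegetarian, confidence, reasoning}.
```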
```bash
# Build and run all services
docker-compose up --build

# Run in detached mode
docker-compose up -d

# Stop services
docker-compose down

# View logs
docker-compose logs -f api
docker-compose logs -f streamlit
```
```bash
# Quick OCR test with sample menus
python scripts/test_ocr.py

# Run pytest suite (Phase 9+)
pytest tests/

# Run specific test file
pytest tests/test_ocr.py

# Run with coverage
pytest --cov=api --cov=mcp-server tests/
```
Verify the Tesseract installation:

```bash
tesseract --version  # Should show version 5.x.x
```
All configuration is centralized in
/api/config.py using Pydantic Settings. Environment variables are loaded from .env (copy from .env.example):
```bash
# Create .env from example
cp .env.example .env
```
Key settings:
- `DEBUG`: Enable debug mode and auto-reload
- `MAX_IMAGES`: Maximum images per request (default: 5)
- `TESSERACT_CMD`: Path to Tesseract binary (auto-detect if None)
- `MCP_SERVER_URL`: MCP server endpoint (Phase 4+)
- `OPENROUTER_API_KEY`: For LLM classification (Phase 5+)
- `OPENROUTER_PRIMARY_MODEL`: Default `deepseek/deepseek-chat-v3.1`
- `OPENROUTER_FALLBACK_MODEL`: Secondary model identifier (optional; blank disables fallback)
- `LANGCHAIN_TRACING_V2`: Enable LangSmith tracing (Phase 7+)
- `CONFIDENCE_THRESHOLD`: Classification confidence threshold (default: 0.7)

This project was built in 10 phases. See `docs/phases/PHASE_WISE_PLAN.MD` for the complete roadmap.
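As a rough sketch of how the settings listed above might be declared (assuming pydantic-settings v2; types and defaults are inferred from the list and may differ from the real `/api/config.py`):

```python
# Illustrative sketch of /api/config.py using pydantic-settings v2.
# Field names mirror the env vars above; exact types/defaults may differ.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    DEBUG: bool = False
    MAX_IMAGES: int = 5
    TESSERACT_CMD: str | None = None           # auto-detect when None
    MCP_SERVER_URL: str = "http://localhost:8001"
    OPENROUTER_API_KEY: str | None = None
    OPENROUTER_PRIMARY_MODEL: str = "deepseek/deepseek-chat-v3.1"
    OPENROUTER_FALLBACK_MODEL: str = ""        # blank disables fallback
    LANGCHAIN_TRACING_V2: bool = False
    CONFIDENCE_THRESHOLD: float = 0.7


settings = Settings()
```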
All API endpoints are prefixed with
/api/v1/ to allow future versioning.
- `GET /health`
- `POST /api/v1/extract-text`
- `POST /api/v1/process-menu` - returns the `parsed_menu` JSON (dishes + stats)

All API responses use Pydantic models defined in `/api/models/schemas.py`:
- `HealthResponse`
- `OCRResult` - Single image OCR result
- `Dish` - Structured dish with name, price, classification
- `ParsedDish` / `ParsedMenu` - Canonical parsed menu payload returned post-OCR
- `ProcessMenuResponse` - Complete menu processing result

Business logic is separated into service classes in `/api/services/`:

- `OCRService` - Tesseract OCR integration, image preprocessing
- `ParserService` - Text parsing (Phase 2)
- `ClassifierService` - Keyword/LLM classification (Phases 3-5)
- `RAGService` - Vector search (Phase 6)

Each service is instantiated once at module level in the router and reused across requests.
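A minimal sketch of that module-level pattern (the router file name, endpoint wiring, and the service method name are assumptions):

```python
# Illustrative router module inside /api; the real routers may differ.
from fastapi import APIRouter, UploadFile

# Bare import works for modules at the /api top level; a router living in a
# subpackage would instead use `from ..services import OCRService`.
from services import OCRService

router = APIRouter(prefix="/api/v1")
ocr_service = OCRService()  # instantiated once at module level, reused per request


@router.post("/extract-text")
async def extract_text(image: UploadFile):
    contents = await image.read()
    # Method name is an assumption; the point is reusing the shared instance.
    return ocr_service.extract_text(contents)
```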
Agent workflows are defined in
/api/agents/:
- `menu_processor.py` builds the LangGraph state machine that coordinates parsing output, LLM classification, RAG lookups, and MCP tool calls (see the sketch below).
- Node modules (`classifier_node.py`, `rag_node.py`, `calculator_node.py`) should remain small and composable.
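A rough illustration of how such a graph could be wired. The state fields, node signatures, and edges here are assumptions for orientation, not the repository's actual implementation:

```python
# Illustrative LangGraph wiring; the real graph lives in api/agents/menu_processor.py
# and may route differently (e.g., to RAG only when classification confidence is low).
from typing import TypedDict

from langgraph.graph import StateGraph, END


class MenuState(TypedDict, total=False):
    parsed_menu: dict          # parser output
    dishes: list[dict]         # dishes annotated with classification results
    vegetarian_total: float    # sum returned by the MCP calculator tool


def classifier_node(state: MenuState) -> MenuState:
    return state  # LLM classification of each dish happens here


def rag_node(state: MenuState) -> MenuState:
    return state  # vector-search fallback for low-confidence dishes


def calculator_node(state: MenuState) -> MenuState:
    return state  # calls the MCP calculate_vegetarian_total tool


graph = StateGraph(MenuState)
graph.add_node("classify", classifier_node)
graph.add_node("rag", rag_node)
graph.add_node("calculate", calculator_node)
graph.set_entry_point("classify")
graph.add_edge("classify", "rag")
graph.add_edge("rag", "calculate")
graph.add_edge("calculate", END)
menu_processor = graph.compile()
```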
The MCP server implements tools following the Model Context Protocol standard:

Tools:
- `calculate_vegetarian_total` - Sums prices of vegetarian dishes (primary tool)

Communication between the API/LangGraph and the MCP server uses HTTP transport with JSON-RPC style requests.
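For orientation only, a call from the API side might look roughly like this. The route, envelope fields, and argument names are assumptions; only the tool name and port come from this document:

```python
# Hypothetical JSON-RPC style call to the MCP server's calculator tool.
# The actual route and payload shape are defined by /mcp-server/server.py.
import httpx

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "calculate_vegetarian_total",
        "arguments": {
            "dishes": [
                {"name": "Garden Salad", "price": 8.99, "is_vegetarian": True},
                {"name": "Steak", "price": 24.99, "is_vegetarian": False},
            ]
        },
    },
}

response = httpx.post("http://localhost:8001/mcp", json=payload, timeout=30)
print(response.json())  # expected to contain the summed vegetarian total
```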
The Streamlit UI uses a multi-page app structure where each page corresponds to a development phase:
```
streamlit-ui/
├── app.py                 # Main dashboard/overview
└── pages/
    ├── 1_OCR_Test.py      # Phase 1 testing
    ├── 2_Parser_Test.py   # Phase 2 testing
    └── ...                # (more pages added per phase)
```
Each page should be self-contained and demonstrate the capabilities added in that phase.
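A hypothetical page following that convention (the widgets, upload field name, and request shape are assumptions; the port and endpoint come from this document):

```python
# streamlit-ui/pages/1_OCR_Test.py - illustrative only; the real page may differ.
import requests
import streamlit as st

st.title("Phase 1: OCR Test")

uploaded = st.file_uploader("Upload a menu image", type=["jpeg", "jpg", "png", "webp"])
if uploaded is not None:
    # Send the image to the API service's OCR endpoint (field name assumed).
    response = requests.post(
        "http://localhost:8005/api/v1/extract-text",
        files={"images": (uploaded.name, uploaded.getvalue())},
    )
    st.json(response.json())  # show raw OCR output for verification
```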
Sample menu images are provided in
tests/fixtures/images/:
- `menu1.jpeg` - Applebee's menu (complex, multi-column)
- `menu2.png` - Simple menu with prices
- `menu3.webp` - Cafe menu with descriptions
- `image_4.webp`, `image_6.png` - Additional menus for parser regression tests

Use `scripts/test_ocr.py` for quick validation that OCR is working. The full pytest suite is available in Phase 9+.
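As a hedged illustration of how the fixtures could be exercised (this is not the repository's actual test code; it assumes pytesseract and Pillow are installed and that pytest runs from the repo root):

```python
# Illustrative smoke test over the sample menus, not the repo's real tests.
from pathlib import Path

import pytest
import pytesseract
from PIL import Image

FIXTURES = Path("tests/fixtures/images")


@pytest.mark.parametrize("name", ["menu1.jpeg", "menu2.png", "menu3.webp"])
def test_sample_menus_produce_text(name):
    text = pytesseract.image_to_string(Image.open(FIXTURES / name))
    assert text.strip()  # every sample menu should yield some OCR output
```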
When implementing LLM classification:
- Use `openai/gpt-oss-20b` as the default model, falling back to OpenRouter (`deepseek/deepseek-chat-v3.1`) when configured via env / the Streamlit UI
- Classification responses follow the `{is_vegetarian: bool, confidence: float, reasoning: str}` schema

ChromaDB setup (invoked from the LangGraph agent within the API service):
- `sentence-transformers/all-MiniLM-L6-v2` for embeddings (fast, good quality)
- Vector store persisted under `/api/rag_db/`
- Seed data in `/api/data/vegetarian_db.json`

Reminders: use relative imports (never `from api.module`), respect the `MAX_FILE_SIZE_MB` limit, and run the service from within the `/api` directory.

Each phase completion should result in a clear commit:
- Update `PHASE_WISE_PLAN.MD` to mark the phase as complete

When adding LangSmith tracing:
- Use the `@traceable` decorator on key functions
- Include the `request_id` so traces can be correlated with API requests
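A minimal sketch of that pattern (the function name, arguments, and the way `request_id` is surfaced are assumptions):

```python
# Hypothetical tracing setup; the actual decorated functions and metadata
# handling depend on the repository's implementation.
from langsmith import traceable


@traceable(name="classify_dish")
def classify_dish(dish: dict, request_id: str) -> dict:
    # Passing request_id as an argument makes it visible in the trace inputs,
    # so traces can be correlated with API requests.
    # ... call the LLM router here ...
    return {"is_vegetarian": True, "confidence": 0.9, "reasoning": "stub"}
```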