# Markdown Converter
Agent skill for markdown-converter
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
IMPORTANT: This project uses `uv` as the package manager. Never use `pip` directly. All Python commands must be run with the `uv run` prefix.
```bash
# Install dependencies
uv sync

# Install dev dependencies (for linting, formatting, type checking)
uv sync --group dev

# Environment variables required
# Create .env file in root with:
ANTHROPIC_API_KEY=your_anthropic_api_key_here
```
```bash
# Quick start using shell script
chmod +x run.sh
./run.sh

# Manual start
cd backend && uv run uvicorn app:app --reload --port 8000

# Access points:
# - Web Interface: http://localhost:8000
# - API Documentation: http://localhost:8000/docs
```
```bash
# Run all tests
uv run pytest

# Run specific test file
uv run pytest backend/tests/test_rag_system.py

# Run tests by marker
uv run pytest -m unit           # Unit tests only
uv run pytest -m integration    # Integration tests only
uv run pytest -m api            # API tests only
uv run pytest -m "not slow"     # Skip slow tests

# Run single test
uv run pytest backend/tests/test_rag_system.py::test_query_with_session
```
```bash
# Auto-fix formatting and imports (modifies files)
./scripts/format.sh

# Check code quality without modifying (for CI/pre-commit)
./scripts/lint.sh
```
Always use the `uv run` prefix for Python commands:

```bash
uv run python script.py
uv run pytest
```
```bash
# Production dependency (DO NOT use pip install)
uv add package_name

# Development dependency (DO NOT use pip install)
uv add --group dev package_name
```
This is a Retrieval-Augmented Generation (RAG) system for course materials. The key architectural pattern is tool-based search: Claude decides when to search vs. use general knowledge.
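The request flow traced below starts with a plain HTTP call. A minimal client example, using `requests` purely for illustration; the exact response shape is whatever `backend/app.py` returns:

```python
import requests

# First request: session_id can be omitted/null and the backend creates a session.
resp = requests.post(
    "http://localhost:8000/api/query",
    json={"query": "What is MCP in lesson 3?", "session_id": None},
)
print(resp.json())  # response shape is defined in backend/app.py
```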
Request flow:

1. Frontend (`frontend/script.js`) → POST `/api/query` with `{query, session_id}`
2. FastAPI app (`backend/app.py`) → Create/retrieve session → Call `rag_system.query()`
3. RAG system (`backend/rag_system.py`) → Retrieve conversation history → Pass tools to the AI
4. AI generator (`backend/ai_generator.py`) → Claude API with up to 2 rounds of tool calls
5. Search tools (`backend/search_tools.py`) → `search_course_content` with `query`, optional `course_name`, `lesson_number`
6. Vector store (`backend/vector_store.py`) → `course_catalog` collection and `course_content` collection with embeddings

Why Two Collections? The system uses separate collections for metadata and content to enable intelligent course resolution:
- `course_catalog`: Stores course metadata for semantic name matching
  - Fields: `title`, `instructor`, `course_link`, `lesson_count`, `lessons_json`
- `course_content`: Stores chunked lesson content for retrieval
  - Document IDs: `{course_title}_{chunk_index}`
  - Metadata: `course_title`, `lesson_number`, `chunk_index`

Search Flow Example:
User: "What is MCP in lesson 3?" ↓ course_catalog.query("MCP") → course_title = "Introduction to MCP" ↓ course_content.query("What is MCP", where={ "course_title": "Introduction to MCP", "lesson_number": 3 })
Sessions are in-memory only and cleared on server restart:
- A session is created automatically if no `session_id` is provided
- History is limited to the most recent exchanges (`MAX_HISTORY` in config)
- Exchanges are stored as `"User: {query}\nAssistant: {response}"`
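A minimal sketch of an in-memory store with that behavior; the class and method names are illustrative, not the project's actual session API:

```python
import uuid

MAX_HISTORY = 2  # conversation pairs kept per session (from config)

class SessionStore:
    """Illustrative in-memory session store; cleared whenever the process restarts."""

    def __init__(self):
        self._sessions: dict[str, list[str]] = {}

    def get_or_create(self, session_id: str | None) -> str:
        if not session_id or session_id not in self._sessions:
            session_id = str(uuid.uuid4())
            self._sessions[session_id] = []
        return session_id

    def add_exchange(self, session_id: str, query: str, response: str) -> None:
        history = self._sessions[session_id]
        history.append(f"User: {query}\nAssistant: {response}")
        # Keep only the most recent MAX_HISTORY exchanges.
        del history[:-MAX_HISTORY]

    def history(self, session_id: str) -> str:
        return "\n".join(self._sessions.get(session_id, []))
```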
When documents are added to `docs/` and the server starts:
1. Parse Document (`document_processor.py`):
   Expected format:

   ```
   Course Title: [title]
   Course Link: [url]
   Course Instructor: [name]

   Lesson 1: [title]
   Lesson Link: [url]
   [content...]
   ```
2. Chunk Text: Sentence-based splitting (800 chars, 100 overlap); a chunking sketch follows this list.
"Lesson {N} content:"Store in ChromaDB:
   - Course metadata → `course_catalog`
   - Content chunks → `course_content`

Deduplication: Existing course titles (by exact match) are skipped on reload.
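A sketch of the sentence-based chunker referenced in step 2, using the 800/100 numbers and the `"Lesson {N} content:"` prefix from above; the real implementation in `document_processor.py` may differ in detail:

```python
import re

def chunk_lesson(text: str, lesson_number: int,
                 chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split lesson text into ~chunk_size chunks on sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > chunk_size:
            chunks.append(current)
            # Carry the tail of the previous chunk forward as overlap.
            current = current[-overlap:]
        current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    # Prefix each chunk so retrieved text carries its lesson context.
    return [f"Lesson {lesson_number} content: {chunk}" for chunk in chunks]
```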
Multi-Round Tool Use:
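The loop is not spelled out here, but with the Anthropic Python SDK a two-round tool loop typically looks like the sketch below; `run_tool` and the `answer()` wrapper are illustrative stand-ins, not the actual `ai_generator.py` API:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(query: str, tool_definitions: list[dict], run_tool) -> str:
    """run_tool(name, args) is a placeholder for dispatching to the project's tools."""
    messages = [{"role": "user", "content": query}]
    for round_ in range(3):  # at most 2 tool rounds, then a final text answer
        # Withhold tools on the last pass so Claude must answer in text.
        kwargs = {"tools": tool_definitions} if round_ < 2 else {}
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=messages,
            **kwargs,
        )
        if response.stop_reason != "tool_use":
            break  # Claude answered from general knowledge or finished searching
        # Execute every tool call in this round and send the results back.
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": run_tool(block.name, block.input)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    return "".join(block.text for block in response.content if block.type == "text")
```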
Available Tools:
- `search_course_content` (CourseSearchTool):
  - Parameters: `query` (required), `course_name` (optional), `lesson_number` (optional)
  - Results are formatted with `[Course - Lesson N]` headers
  - Tracks `last_sources` and `last_source_links` for the UI
- `get_course_outline` (CourseOutlineTool):
  - Parameters: `course_name` (required)
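Expressed as Anthropic tool definitions, those two tools might be declared roughly as follows; descriptions are paraphrased from the notes above, and the authoritative versions live in `search_tools.py`:

```python
SEARCH_TOOL = {
    "name": "search_course_content",
    "description": "Search chunked course content; results carry [Course - Lesson N] headers.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for"},
            "course_name": {"type": "string", "description": "Optional course filter (semantic match)"},
            "lesson_number": {"type": "integer", "description": "Optional lesson filter"},
        },
        "required": ["query"],
    },
}

OUTLINE_TOOL = {
    "name": "get_course_outline",
    "description": "Return the lesson outline of a course.",
    "input_schema": {
        "type": "object",
        "properties": {
            "course_name": {"type": "string", "description": "Course to outline"},
        },
        "required": ["course_name"],
    },
}

TOOL_DEFINITIONS = [SEARCH_TOOL, OUTLINE_TOOL]
```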
Configuration is located in `backend/config.py`:
- `CHUNK_SIZE`: 800 characters
- `CHUNK_OVERLAP`: 100 characters
- `MAX_RESULTS`: 5 search results per query
- `MAX_HISTORY`: 2 conversation pairs
- `ANTHROPIC_MODEL`: claude-sonnet-4-20250514
- `EMBEDDING_MODEL`: all-MiniLM-L6-v2 (SentenceTransformers)
- `CHROMA_PATH`: ./chroma_db
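A hypothetical sketch of those settings as a config object; the actual `backend/config.py` may be structured differently:

```python
from dataclasses import dataclass

@dataclass
class Config:
    CHUNK_SIZE: int = 800
    CHUNK_OVERLAP: int = 100
    MAX_RESULTS: int = 5
    MAX_HISTORY: int = 2
    ANTHROPIC_MODEL: str = "claude-sonnet-4-20250514"
    EMBEDDING_MODEL: str = "all-MiniLM-L6-v2"
    CHROMA_PATH: str = "./chroma_db"

config = Config()
```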
Test files mirror the backend structure:

- `conftest.py`: Shared fixtures (temp directories, mock data, test courses)
- Markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.api`, `@pytest.mark.slow`
- API tests use `TestClient` from FastAPI

When changing tool behavior:

- Tool definitions (parameters, description) live in `search_tools.py`
- Update `AIGenerator.SYSTEM_PROMPT` if tool usage guidelines change

Data and embeddings:

- Course documents go in the `docs/` directory
- The embedding model is set via `EMBEDDING_MODEL` in config
- Vector data persists in `backend/chroma_db/`

Source tracking (sketched below):

- `CourseSearchTool.execute()` stores sources in `self.last_sources` and `self.last_source_links`
- `ToolManager.get_last_sources()` retrieves them after AI generation
- `RAGSystem.query()` collects and returns them to the API

Frontend:

- `marked.js` (CDN) for markdown rendering
- `currentSessionId` tracked globally
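A sketch of the source-tracking handoff listed above; only the attribute and method names come from these notes, and the bodies are assumptions about how the handoff could work:

```python
class CourseSearchTool:
    """Illustrative only; the real tool also runs the vector search (omitted here)."""

    def __init__(self):
        self.last_sources: list[str] = []
        self.last_source_links: list[str | None] = []

    def execute(self, query: str, course_name: str | None = None,
                lesson_number: int | None = None) -> str:
        hits = []  # placeholder for search results carrying source metadata
        self.last_sources = [hit["source"] for hit in hits]
        self.last_source_links = [hit.get("link") for hit in hits]
        return "\n\n".join(hit["text"] for hit in hits)

class ToolManager:
    def __init__(self, *tools):
        self._tools = list(tools)

    def get_last_sources(self) -> list[str]:
        # RAGSystem.query() calls this after the AI response and returns the
        # sources to the API layer, which forwards them to the frontend.
        return [src for tool in self._tools
                for src in getattr(tool, "last_sources", [])]
```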