Markdown Converter
Agent skill for markdown-converter
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## KEY NOTES

- If you run into any missing Python dependency errors, try running your command with `source backend/.venv/bin/activate` to activate the Python venv.
- To make tests work, check the `.env` file at the root of the project to find an OpenAI key.
- If using `playwright` to explore the frontend, you can usually log in with username `[email protected]` and password `a`. The app can be accessed at `http://localhost:3000`.
- You should assume that all Onyx services are running. To verify, you can check the `backend/log` directory to make sure we see logs coming out from the relevant service.
- To connect to the Postgres database, use: `docker exec -it onyx-relational_db-1 psql -U postgres -c "<SQL>"`
- When making calls to the backend, always go through the frontend, e.g. call `http://localhost:3000/api/persona`, not `http://localhost:8080/api/persona` (see the sketch after this list).
- Put ALL db operations under the `backend/onyx/db` / `backend/ee/onyx/db` directories. Don't run queries outside of those directories.
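As a quick illustration of the routing rule above, here is a sketch that assumes the `requests` package and an auth cookie taken from a logged-in browser session (no real credentials are shown):

```python
import requests

# Always hit the frontend proxy on port 3000, never the backend on 8080.
resp = requests.get(
    "http://localhost:3000/api/persona",
    cookies={},  # supply the auth cookie from a logged-in browser session
    timeout=10,
)
print(resp.status_code)
```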
## Project Overview

**Onyx** (formerly Danswer) is an open-source Gen-AI and Enterprise Search platform that connects to company documents, apps, and people. It features a modular architecture with both Community Edition (MIT licensed) and Enterprise Edition offerings.
### Background Workers (Celery)

Onyx uses Celery for asynchronous task processing with multiple specialized workers:

#### Worker Types
1. **Primary Worker** (`celery_app.py`)
   - Coordinates core background tasks and system-wide operations
   - Handles connector management, document sync, pruning, and periodic checks
   - Runs with 4 threads concurrency
   - Tasks: connector deletion, vespa sync, pruning, LLM model updates, user file sync

2. **Docfetching Worker** (`docfetching`)
   - Fetches documents from external data sources (connectors)
   - Spawns docprocessing tasks for each document batch
   - Implements watchdog monitoring for stuck connectors
   - Configurable concurrency (default from env)

3. **Docprocessing Worker** (`docprocessing`)
   - Processes fetched documents through the indexing pipeline:
     - Upserts documents to PostgreSQL
     - Chunks documents and adds contextual information
     - Embeds chunks via the model server
     - Writes chunks to the Vespa vector database
     - Updates document metadata
   - Configurable concurrency (default from env)

4. **Light Worker** (`light`)
   - Handles lightweight, fast operations
   - Tasks: vespa operations, document permissions sync, external group sync
   - Higher concurrency for quick tasks

5. **Heavy Worker** (`heavy`)
   - Handles resource-intensive operations
   - Primary task: document pruning operations
   - Runs with 4 threads concurrency

6. **KG Processing Worker** (`kg_processing`)
   - Handles Knowledge Graph processing and clustering
   - Builds relationships between documents
   - Runs clustering algorithms
   - Configurable concurrency

7. **Monitoring Worker** (`monitoring`)
   - System health monitoring and metrics collection
   - Monitors Celery queues, process memory, and system status
   - Single thread (monitoring doesn't need parallelism)
   - Cloud-specific monitoring tasks

8. **Beat Worker** (`beat`)
   - Celery's scheduler for periodic tasks
   - Uses DynamicTenantScheduler for multi-tenant support (a plain-Celery sketch of the schedule format follows this list)
   - Schedules tasks like:
     - Indexing checks (every 15 seconds)
     - Connector deletion checks (every 20 seconds)
     - Vespa sync checks (every 20 seconds)
     - Pruning checks (every 20 seconds)
     - KG processing (every 60 seconds)
     - Monitoring tasks (every 5 minutes)
     - Cleanup tasks (hourly)
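For orientation, the schedule entries above map onto plain Celery beat configuration roughly like the sketch below. The task names are illustrative, and Onyx actually layers DynamicTenantScheduler on top of this to fan tasks out per tenant:

```python
from celery import Celery

app = Celery("example")  # illustrative app, not Onyx's celery_app.py

# Plain-Celery equivalent of what the beat worker schedules.
app.conf.beat_schedule = {
    "check-for-indexing": {
        "task": "check_for_indexing",  # hypothetical task name
        "schedule": 15.0,              # every 15 seconds
    },
    "check-for-connector-deletion": {
        "task": "check_for_connector_deletion",  # hypothetical task name
        "schedule": 20.0,                        # every 20 seconds
    },
}
```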
#### Key Features

- **Thread-based Workers**: All workers use thread pools (not processes) for stability
- **Tenant Awareness**: Multi-tenant support with per-tenant task isolation. There is a middleware layer that automatically finds the appropriate tenant ID when sending tasks via Celery Beat.
- **Task Prioritization**: High, Medium, Low priority queues
- **Monitoring**: Built-in heartbeat and liveness checking
- **Failure Handling**: Automatic retry and failure recovery mechanisms
- **Redis Coordination**: Inter-process communication via Redis
- **PostgreSQL State**: Task state and metadata stored in PostgreSQL

#### Important Notes

**Defining Tasks**:

- Always use `@shared_task` rather than `@celery_app` (a minimal sketch follows this list)
- Put tasks under `background/celery/tasks/` or `ee/background/celery/tasks`
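A minimal sketch of that pattern follows; the task name, module path, and signature are illustrative, not actual Onyx tasks:

```python
# backend/onyx/background/celery/tasks/example/tasks.py (hypothetical path)
from celery import shared_task


@shared_task(
    name="example_cleanup_task",  # illustrative name, not a real Onyx task
    bind=True,
    ignore_result=True,
)
def example_cleanup_task(self, tenant_id: str | None = None) -> None:
    """Sketch of the @shared_task pattern preferred over @celery_app.task."""
    # ... task body would go here ...
```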
**Defining APIs**: When creating new FastAPI APIs, do NOT use the `response_model` field. Instead, just type the function.
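For illustration, a typed endpoint without `response_model` might look like this sketch; the router, path, and model are hypothetical, not actual Onyx endpoints:

```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()


class ExamplePersonaSnapshot(BaseModel):  # hypothetical model for illustration
    id: int
    name: str


# Return type annotation on the function instead of response_model=...
@router.get("/example/persona")
def get_example_persona() -> ExamplePersonaSnapshot:
    return ExamplePersonaSnapshot(id=1, name="Example")
```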
**Testing Updates**: If you make any updates to a celery worker and you want to test these changes, you will need to ask me to restart the celery worker. There is no auto-restart on code-change mechanism.
### Code Quality

```bash
# Install and run pre-commit hooks
pre-commit install
pre-commit run --all-files
```
NOTE: Always make sure everything is strictly typed (both in Python and TypeScript).
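On the Python side, "strictly typed" means explicit parameter and return annotations everywhere, roughly like this made-up helper:

```python
# Fully annotated: arguments, return type, no implicit Any. Illustrative only.
def count_chunks_per_document(documents: list[str], chunk_size: int = 512) -> dict[str, int]:
    return {doc[:32]: max(1, len(doc) // chunk_size) for doc in documents}
```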
## Architecture Overview

### Technology Stack

- **Backend**: Python 3.11, FastAPI, SQLAlchemy, Alembic, Celery
- **Frontend**: Next.js 15+, React 18, TypeScript, Tailwind CSS
- **Database**: PostgreSQL with Redis caching
- **Search**: Vespa vector database
- **Auth**: OAuth2, SAML, multi-provider support
- **AI/ML**: LangChain, LiteLLM, multiple embedding models

### Directory Structure

```
backend/
├── onyx/
│   ├── auth/                  # Authentication & authorization
│   ├── chat/                  # Chat functionality & LLM interactions
│   ├── connectors/            # Data source connectors
│   ├── db/                    # Database models & operations
│   ├── document_index/        # Vespa integration
│   ├── federated_connectors/  # External search connectors
│   ├── llm/                   # LLM provider integrations
│   └── server/                # API endpoints & routers
├── ee/                        # Enterprise Edition features
├── alembic/                   # Database migrations
└── tests/                     # Test suites

web/
├── src/app/                   # Next.js app router pages
├── src/components/            # Reusable React components
└── src/lib/                   # Utilities & business logic
```
## Database & Migrations

### Running Migrations

```bash
# Standard migrations
alembic upgrade head

# Multi-tenant (Enterprise)
alembic -n schema_private upgrade head
```
### Creating Migrations

```bash
# Auto-generate migration
alembic revision --autogenerate -m "description"

# Multi-tenant migration
alembic -n schema_private revision --autogenerate -m "description"
```
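For orientation, an auto-generated migration module generally has this shape; this is a generic Alembic sketch with made-up revision IDs and table, not an actual Onyx migration:

```python
"""add example table

Revision ID: abc123
Revises: def456
"""
import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic (made-up values)
revision = "abc123"
down_revision = "def456"


def upgrade() -> None:
    op.create_table(
        "example_table",
        sa.Column("id", sa.Integer(), primary_key=True),
        sa.Column("name", sa.String(), nullable=False),
    )


def downgrade() -> None:
    op.drop_table("example_table")
```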
## Testing Strategy

There are 4 main types of tests within Onyx:
### Unit Tests

These should not assume any Onyx/external services are available to be called. Interactions with the outside world should be mocked using `unittest.mock`. Generally, only write these for complex, isolated modules, e.g. `citation_processing.py`.
To run them:

```bash
python -m dotenv -f .vscode/.env run -- pytest -xv backend/tests/unit
```
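To illustrate the style, here is a self-contained sketch; the function under test is a toy stand-in rather than real Onyx code, and only the mocking pattern is the point:

```python
from unittest.mock import patch

import requests


def fetch_title(url: str) -> str:
    """Toy function standing in for real Onyx code; it calls the outside world."""
    return requests.get(url, timeout=5).text.strip().title()


def test_fetch_title_mocks_the_network() -> None:
    # Unit tests must not touch external services, so the HTTP call is mocked.
    with patch("requests.get") as mock_get:
        mock_get.return_value.text = "  hello onyx  "
        assert fetch_title("http://example.com") == "Hello Onyx"
        mock_get.assert_called_once()
```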
### External Dependency Unit Tests

These tests assume that all external dependencies of Onyx are available and callable (e.g. Postgres, Redis, MinIO/S3, and Vespa are running, OpenAI can be called, any request to the internet is fine, etc.). However, the actual Onyx containers are not running, and with these tests we call the function under test directly. We can also mock components/calls at will.
The goal with these tests is to minimize mocking while giving some flexibility to mock things that are flaky, need strictly controlled behavior, or need to have their internal behavior validated (e.g. verify a function is called with certain args, something that would be impossible with proper integration tests).
A great example of this type of test is `backend/tests/external_dependency_unit/connectors/confluence/test_confluence_group_sync.py`.
To run them:

```bash
python -m dotenv -f .vscode/.env run -- pytest backend/tests/external_dependency_unit
```
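A minimal sketch of the pattern, assuming a `POSTGRES_URL` environment variable that points at the running Postgres instance; the table and logic are illustrative, not Onyx code, and flaky external calls could still be mocked with `unittest.mock` where needed:

```python
import os

from sqlalchemy import create_engine, text


def test_can_write_and_read_a_row() -> None:
    # External services are really reachable; Onyx containers are not running,
    # and the code under test would be called directly in the same way.
    engine = create_engine(os.environ["POSTGRES_URL"])  # assumed env var
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE IF NOT EXISTS example_docs (id int, name text)"))
        conn.execute(text("INSERT INTO example_docs VALUES (1, 'doc')"))
        count = conn.execute(text("SELECT count(*) FROM example_docs")).scalar_one()
    assert count >= 1
```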
### Integration Tests

Standard integration tests. Every test in `backend/tests/integration` runs against a real Onyx deployment. We cannot mock anything in these tests. Prefer writing integration tests (or External Dependency Unit Tests if mocking/internal verification is necessary) over any other type of test.
Tests are parallelized at a directory level.
When writing integration tests, make sure to check the root `conftest.py` for useful fixtures and the `backend/tests/integration/common_utils` directory for utilities. Prefer calling the appropriate Manager class in the utils (if one exists) over directly calling the APIs with a library like `requests`. Prefer using fixtures rather than calling the utilities directly (e.g. do NOT create admin users with `admin_user = UserManager.create(name="admin_user")`; instead, use the `admin_user` fixture).
A great example of this type of test is `backend/tests/integration/dev_apis/test_simple_chat_api.py`.
To run them:

```bash
python -m dotenv -f .vscode/.env run -- pytest backend/tests/integration
```
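A rough sketch of the preferred shape, where `admin_user` is the fixture mentioned above and the Manager import path and calls are placeholders to look up in `common_utils`:

```python
# from tests.integration.common_utils.managers... import UserManager  # check the real path


def test_admin_and_basic_user_setup(admin_user) -> None:
    # `admin_user` comes from the shared fixture -- do not build it via UserManager.
    # Additional, non-admin users and other setup go through the Manager utilities:
    # basic_user = UserManager.create(name="basic_user")
    # ... drive the API via the appropriate Manager helpers and assert on responses ...
    ...
```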
### Playwright (E2E) Tests

These tests are an even more complete version of the Integration Tests mentioned above: all services of Onyx are running, *including* the Web Server. Use these tests for anything that requires significant frontend <-> backend coordination.

Tests are located at `web/tests/e2e` and are written in TypeScript.
To run them:

```bash
npx playwright test <TEST_NAME>
```
## Logs

When (1) writing integration tests or (2) doing live tests (e.g. curl / playwright), you can get access to logs via the `backend/log/<service_name>_debug.log` file. All Onyx services (api_server, web_server, celery_X) will be tailing their logs to this file.
## Security Considerations

- Never commit API keys or secrets to the repository
- Use encrypted credential storage for connector credentials
- Follow RBAC patterns for new features
- Implement proper input validation with Pydantic models
- Use parameterized queries to prevent SQL injection

## AI/LLM Integration

- Multiple LLM providers supported via LiteLLM
- Configurable models per feature (chat, search, embeddings)
- Streaming support for real-time responses
- Token management and rate limiting
- Custom prompts and agent actions

## UI/UX Patterns

- Tailwind CSS with design system in `web/src/components/ui/`
- Radix UI and Headless UI for accessible components
- SWR for data fetching and caching
- Form validation with react-hook-form
- Error handling with popup notifications

## Creating a Plan

When creating a plan in the `plans` directory, make sure to include at least these elements:
**Issues to Address** What the change is meant to do.

**Important Notes** Things you come across in your research that are important to the implementation.

**Implementation strategy** How you are going to make the changes happen. High level approach.

**Tests** What unit (use rarely), external dependency unit, integration, and playwright tests you plan to write to verify the correct behavior. Don't overtest. Usually, a given change only needs one type of test.

Do NOT include these: *Timeline*, *Rollback plan*
This is a minimal list - feel free to include more. Do NOT write code as part of your plan. Keep it high level. You can reference certain files or functions though.
Before writing your plan, make sure to do research. Explore the relevant sections in the codebase.