# Markdown Converter
Agent skill for markdown-converter
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Scrapix is an enterprise-grade web crawling and content extraction platform built as a TypeScript monorepo. It provides intelligent web scraping with AI-powered extraction, designed for integration with the Meilisearch search engine.
```bash
# Development
yarn dev          # Run all apps in development mode with hot-reload
yarn dev:build    # Build in watch mode

# Building & Running
yarn build        # Build all packages
yarn scrape       # Run the CLI scraper (works from anywhere)
yarn server       # Run the API server
yarn server:dev   # Run server in development mode

# Code Quality
yarn lint         # Run ESLint across all packages
yarn lint:fix     # Auto-fix linting errors
yarn test         # Run Jest tests

# Specific Apps
cd apps/scraper/core && yarn dev     # Work on core library
cd apps/scraper/server && yarn dev   # Work on API server with hot-reload
cd apps/proxy && yarn dev            # Work on proxy server
```
```bash
# Start development services (Meilisearch, Redis, apps)
docker-compose up

# Services available:
# - Meilisearch: http://localhost:7700 (master key: masterKey)
# - Redis: localhost:6379
# - Scraper API: http://localhost:8080
# - Playground: http://localhost:3000
```
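To confirm the stack is reachable before crawling, a quick script along these lines works. This is a minimal sketch: Meilisearch does expose a `GET /health` route, but the scraper API's local `/health` route is an assumption mirroring the deployed health endpoint mentioned under deployment below.

```typescript
// check-services.ts - verify the docker-compose stack is reachable (sketch).
// Assumes Node 18+ (global fetch). The scraper API /health route is an
// assumption based on the deployed scrapix.fly.dev/health endpoint.
const services = [
  { name: "Meilisearch", url: "http://localhost:7700/health" },
  { name: "Scraper API", url: "http://localhost:8080/health" },
];

async function main(): Promise<void> {
  for (const { name, url } of services) {
    try {
      const res = await fetch(url);
      console.log(`${name}: ${res.ok ? "up" : `HTTP ${res.status}`}`);
    } catch {
      console.log(`${name}: unreachable at ${url}`);
    }
  }
}

main();
```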
```bash
# Quick scraper usage (works from anywhere in the project)
yarn scrape -p misc/tests/meilisearch/simple.json

# With inline config
yarn scrape -c '{"start_urls":["https://example.com"],"meilisearch_url":"http://localhost:7700","meilisearch_api_key":"masterKey","meilisearch_index_uid":"my_index"}'

# With custom browser
yarn scrape -p misc/config_examples/default-simple.json -b "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
```
```bash
# Start server on default port 8080
yarn server

# Custom port
yarn server -p 3000

# With Redis for job queue
yarn server -r redis://localhost:6379

# With custom environment file
yarn server -e .env.production

# Development mode with hot-reload
yarn server:dev
```
```
apps/
├── scraper/
│   ├── core/        # Core crawling library (Crawlee-based)
│   ├── server/      # REST API with Bull queue
│   └── cli/         # Command-line interface
├── proxy/           # Proxy server for enterprise proxies
├── playground/      # Next.js test application
└── docs/            # Mintlify documentation site
```
**Crawler System** (`apps/scraper/core/src/`):

- `Crawler` - Factory for creating crawler instances
- `BaseCrawler` - Abstract base implementing crawling logic
- `CheerioCrawler` - Fast static HTML parsing
- `PuppeteerCrawler` - Chrome automation for JS sites
- `PlaywrightCrawler` - Cross-browser automation

**Feature Pipeline** (`apps/scraper/core/src/scrapers/features/`):

- `block_split`, `metadata`, `ai_extraction`, `ai_summary`, `markdown`, `schema`, `custom_selectors`

**Document Flow:** crawled pages pass through the feature pipeline, and the resulting documents are indexed into Meilisearch.
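For programmatic use of the core library, something along these lines should work. This is a sketch only: the package name, config shape, and the exact `Crawler.create()` signature are assumptions (only the factory itself is documented here); the config fields mirror the CLI example above.

```typescript
// Hypothetical programmatic usage of the crawler factory (sketch only).
// The import path and create() signature are assumptions; the config
// fields come from the inline CLI example earlier in this document.
import { Crawler } from "@scrapix/core"; // assumed package name

async function run(): Promise<void> {
  const crawler = await Crawler.create({
    crawler_type: "cheerio", // assumed key; "puppeteer"/"playwright" for JS-heavy sites
    start_urls: ["https://example.com"],
    meilisearch_url: "http://localhost:7700",
    meilisearch_api_key: "masterKey",
    meilisearch_index_uid: "my_index",
  });
  await crawler.run(); // assumed method name
}

run();
```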
**Server API** (`apps/scraper/server/src/`):

- Endpoints: `/crawl`, `/crawl/sync`, `/job/:id/status`, `/job/:id/events`
- Test configurations: `misc/tests/`

Environment variables:

```bash
# Required for AI features
OPENAI_API_KEY=sk-...

# Production deployment
REDIS_URL=redis://...    # Upstash Redis
WEBHOOK_URL=https://...  # Monitoring webhooks
WEBHOOK_TOKEN=...        # Webhook auth
```
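Driving the server from code looks roughly like this. The endpoints are the ones listed above; the response shapes (the job `id` field and the `state` values) are assumptions.

```typescript
// Submit a crawl job and poll until it settles (sketch).
// /crawl and /job/:id/status are documented endpoints; the response
// fields ("id", "state") and the status values are assumptions.
const base = "http://localhost:8080";

async function submitAndWait(): Promise<void> {
  const res = await fetch(`${base}/crawl`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      start_urls: ["https://example.com"],
      meilisearch_url: "http://localhost:7700",
      meilisearch_api_key: "masterKey",
      meilisearch_index_uid: "my_index",
    }),
  });
  const { id } = await res.json(); // assumed response field

  for (;;) {
    const status = await (await fetch(`${base}/job/${id}/status`)).json();
    console.log(status);
    if (status.state === "completed" || status.state === "failed") break; // assumed states
    await new Promise((r) => setTimeout(r, 2000));
  }
}

submitAndWait();
```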
**Adding a New Feature:** create a module under `apps/scraper/core/src/scrapers/features/` (a hypothetical sketch follows).
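The existing features (`markdown`, `metadata`, and so on) presumably share a common contract; the interface below is entirely hypothetical and only illustrates the shape a new module might take. Match the real interface used by the existing feature modules.

```typescript
// apps/scraper/core/src/scrapers/features/word_count.ts (hypothetical)
// The feature contract and document shape are assumptions, shown only
// to illustrate where such code would live and what it might do.
interface ScrapedDocument {
  url: string;
  content: string;
  [key: string]: unknown;
}

// Annotate each document with a simple word count before indexing.
export function wordCountFeature(doc: ScrapedDocument): ScrapedDocument {
  const word_count = doc.content.split(/\s+/).filter(Boolean).length;
  return { ...doc, word_count };
}
```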
**Modifying Crawler Behavior:** extend `BaseCrawler` for shared logic; crawlers are instantiated through `Crawler.create()`.

**API Changes:** add or edit routes in `apps/scraper/server/src/routes/`.

**Deployment:** check the `scrapix.fly.dev/health` endpoint; see `deploy-commands.md` for deployment scripts.

**Publishing:**

```bash
# In apps/scraper/core
npm version patch/minor/major
yarn build
npm publish
```