Markdown Converter
Agent skill for markdown-converter
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
    source venv/bin/activate
    uvicorn src.api.main:app --reload --host 127.0.0.1 --port 8000

    python -m pytest tests/ -v                    # All tests
    python -m pytest tests/test_endpoints.py -v   # Specific test file
    python tests/test_with_real_urls.py           # Real URL classification tests

    python src/train.py \
        --legitimate data/raw/legitimate.csv \
        --phishing data/raw/phishing.csv \
        --output data/processed \
        --models xgboost rf gb \
        --workers 5 \
        --batch-size 100
Username: testuser / Password: TestPassword123!

URL → Content Fetching (aiohttp/BeautifulSoup) → Feature Extraction → Feature Pipeline (Imputer + StandardScaler) → Ensemble Classifier → Threat Intelligence (VirusTotal) → Risk Scoring → Classification
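The "Feature Pipeline (Imputer + StandardScaler)" stage can be sketched with scikit-learn as below. This is a minimal illustration, not the repository's code: the median imputation strategy, the three-column feature matrix, and the name `feature_pipeline` are all assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical pipeline: fill missing values, then standardize each column.
# The "median" strategy is an assumption; the source only says "Imputer".
feature_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Made-up feature rows (e.g. URL length, HTTPS flag, subdomain count).
# np.nan marks a feature that could not be extracted.
X = np.array([
    [54.0, 1.0, 2.0],
    [120.0, 0.0, np.nan],
    [33.0, 1.0, 0.0],
])

# fit_transform learns the medians, means, and variances, then applies them.
X_scaled = feature_pipeline.fit_transform(X)
```

At prediction time the already fitted pipeline would call `transform` rather than `fit_transform`, so that new URLs are scaled with the statistics learned during training.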
The ensemble classifier (src/models/ensemble_classifier.py) uses a VotingClassifier combining XGBoost, Random Forest, and Gradient Boosting models. Models are loaded from pickle files in data/processed/models/.
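Loading the pickled models from a directory might look like the sketch below. The helper name `load_models` and the `.pkl` file extension are assumptions; the actual loader in src/models/ensemble_classifier.py may differ.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def load_models(model_dir: str = "data/processed/models") -> Dict[str, Any]:
    """Load every *.pkl file in model_dir, keyed by file name without extension.

    Hypothetical helper: the ".pkl" extension is an assumption.
    """
    models: Dict[str, Any] = {}
    for path in sorted(Path(model_dir).glob("*.pkl")):
        # Only unpickle files you trust: pickle can run arbitrary code on load.
        with path.open("rb") as fh:
            models[path.stem] = pickle.load(fh)
    return models
```

An empty or missing directory yields an empty dict, which lets the caller decide whether to fail fast or fall back to retraining.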
- src/api/main.py - All FastAPI endpoints (authentication, classification, admin)
- src/api/auth.py - JWT authentication with 30-minute expiry, OAuth2PasswordBearer
- src/api/database.py - MongoDB async operations via Motor driver
- src/predict.py - Orchestrates ML predictions with threat intelligence integration

- … (access_token)
- … get_user_from_cookie() or Authorization header
- is_admin flag for role-based access

- POST /classify - Single URL analysis (public)
- POST /classify-batch - Batch processing (requires Bearer token in Authorization header)
- GET /scan-history - User's scan history (requires Authorization header)
- GET /api/admin/dashboard - Admin analytics (cookie-based auth, requires is_admin)

Collections: users, scan_history
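The endpoints above accept a token either in the Authorization header or in the access_token cookie. A minimal, framework-free sketch of that lookup follows. The function name `extract_token`, the header-before-cookie precedence, and the handling of a "Bearer " prefix inside the cookie are assumptions, not the behavior of get_user_from_cookie().

```python
from typing import Mapping, Optional


def extract_token(headers: Mapping[str, str],
                  cookies: Mapping[str, str]) -> Optional[str]:
    """Return the raw JWT from the Authorization header, else from the cookie.

    Hypothetical helper; checking the header first is an assumption.
    """
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()

    cookie = cookies.get("access_token", "")
    # Assumption: the cookie may hold "Bearer <token>"; strip the prefix if so.
    if cookie.lower().startswith("bearer "):
        cookie = cookie[len("bearer "):]
    return cookie.strip() or None
```

The returned string would still need to be decoded and verified (signature and the 30-minute expiry) before any user or is_admin check is trusted.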
- … MONGO_URL environment variable

- src/features/content_features.py - HTML analysis: forms, scripts, iframes, DOM structure
- src/features/enhanced_url_features.py - URL analysis: length, HTTPS, subdomains, IP detection, suspicious patterns

Required in .env:

- MONGO_URL - MongoDB connection string
- DB_NAME - Database name (default: phishing_detector)
- VIRUSTOTAL_API_KEY - For threat intelligence lookups
- GOOGLE_SAFE_BROWSING_API_KEY - Optional

The codebase uses Pydantic v2. Custom types like PyObjectId in src/api/utils.py use __get_pydantic_core_schema__ instead of the deprecated __get_validators__.
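The URL-level signals listed for src/features/enhanced_url_features.py (length, HTTPS, subdomains, IP detection) can be illustrated with the standard library alone. This is a rough sketch under stated assumptions: the function name `url_features`, the feature keys, the naive subdomain count, and the "@" check as an example of a suspicious pattern are all hypothetical and not taken from the repository.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Extract a few simple URL features (hypothetical names and rules)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # IP detection: the host parses as an IPv4 or IPv6 literal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    # Naive subdomain count: labels beyond the last two.
    # This ignores multi-part suffixes such as "co.uk" (known limitation).
    labels = [label for label in host.split(".") if label]
    subdomain_count = 0 if host_is_ip else max(len(labels) - 2, 0)

    return {
        "url_length": len(url),
        "uses_https": parts.scheme == "https",
        "subdomain_count": subdomain_count,
        "host_is_ip": host_is_ip,
        # Assumed example of a "suspicious pattern": "@" anywhere in the URL.
        "has_at_symbol": "@" in url,
    }
```

For example, `url_features("http://192.168.0.1/admin")` flags an IP host without HTTPS, the kind of combination a classifier can learn to weight.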
Models were trained with scikit-learn 1.2.2. The current environment may show InconsistentVersionWarning when loading pickled models. If models fail to load, retrain using src/train.py.
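One way to surface the version mismatch instead of letting it scroll past in the logs is to record warnings while unpickling. The sketch below uses only the standard library and matches the warning by class name so it does not depend on a particular scikit-learn release; the helper names `load_with_warnings` and `needs_retraining` are hypothetical.

```python
import pickle
import warnings
from typing import Any, List, Tuple


def load_with_warnings(path: str) -> Tuple[Any, List[str]]:
    """Unpickle path and return (object, names of warning classes raised)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that filters would normally hide.
        warnings.simplefilter("always")
        # Only unpickle files you trust.
        with open(path, "rb") as fh:
            obj = pickle.load(fh)
    return obj, [w.category.__name__ for w in caught]


def needs_retraining(warning_names: List[str]) -> bool:
    """True if the load reported a scikit-learn version mismatch."""
    return "InconsistentVersionWarning" in warning_names
```

If `pickle.load` raises instead of warning, the exception propagates to the caller, which is the case where retraining with src/train.py is the documented fix.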