Markdown Converter
Agent skill for markdown-converter
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Sign in to like and favorite skills
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This repository contains experiments and assignments from Harvard CS 2881: AI Safety, which I am auditing to deepen my understanding of AI safety risks and mitigation strategies. The course focuses on:
Each subdirectory represents a distinct experiment or homework assignment, with self-contained code and documentation.
ai-alignment-research/ ├── CLAUDE.md # This file - general guidance ├── .env.example # API key template (shared across experiments) ├── .env # Your API keys (gitignored, create from .env.example) ├── check_env.py # Environment verification script (shared) ├── harvard-cs-2881-hw0/ # HW0: Emergent Misalignment replication │ ├── README_EXPERIMENT.md # Experiment-specific documentation │ ├── train.py # Training scripts │ ├── generate.py # Generation scripts │ └── eval/ # Evaluation utilities ├── harvard-cs-2881-hw1-RL/ # HW1: Prompt Prefix Optimization via RL │ ├── README_EXPERIMENT.md # Experiment-specific documentation │ ├── notable_people_10k.csv # Dataset of notable people │ ├── src/ # Core modules (policy, benchmarks, training) │ └── scripts/ # Training and analysis scripts └── [future experiments]/ # Additional course experiments
When creating new experiments, consider reusing these patterns from existing work:
The
ModelQueryInterface class provides a clean abstraction for:
Reuse this for: Any experiment requiring model inference, comparative evaluation, or response generation.
Structured evaluation using LLMs to score responses on multiple dimensions:
Reuse this for: Experiments requiring qualitative assessment of model outputs, alignment metrics, or comparative studies.
Complete pipeline for parameter-efficient finetuning:
Reuse this for: Finetuning experiments, behavioral studies, preference learning tasks.
Standardize on JSONL with chat messages for training data:
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
This format is:
Use CSV with standard fields for evaluation results:
id,question,response,[additional_scoring_columns]
This enables:
When starting a new experiment:
Create experiment directory:
mkdir harvard-cs-2881-[assignment]/
Copy reusable utilities: Instead of rewriting from scratch, copy and adapt:
# Copy model interface if you need inference cp harvard-cs-2881-hw0/eval/query_utils.py new-experiment/utils/ # Copy evaluation framework if using LLM judges cp harvard-cs-2881-hw0/eval/judge.py new-experiment/eval/ # Copy training scaffold if finetuning cp harvard-cs-2881-hw0/train.py new-experiment/train.py
Document experiment specifics: Create
README_EXPERIMENT.md with:
Maintain separation: Keep experiments self-contained (dependencies, data, models) to avoid cross-contamination.
CRITICAL: This repository uses external APIs (OpenAI, HuggingFace) that require API keys. Follow these practices to prevent accidental exposure:
Use .env files for local development:
# Copy template at repository root (shared across all experiments) cp .env.example .env # Edit with vim and add your API keys (never commit this file!) vim .env # Add these keys: # OPENAI_API_KEY=sk-your-key-here # HF_TOKEN=hf_your-token-here
Automatic loading in Python scripts:
# Add to top of scripts that need API access try: from dotenv import load_dotenv load_dotenv() except ImportError: pass # Fallback to manually set env vars
Verify setup before running experiments:
python check_env.py # Run from repository root
The repository
.gitignore protects:
.env and all variants (*.env, .env.local, etc.)secrets.json, credentials.json)Before each commit:
# Verify no sensitive files are staged git status # Check for hardcoded keys (should return nothing) grep -r "sk-[a-zA-Z0-9]" --include="*.py" --include="*.md" . grep -r "hf_[a-zA-Z0-9]" --include="*.py" --include="*.md" . # Verify .env is ignored git check-ignore .env # Should output: .env
If you accidentally commit a key:
os.getenv("KEY_NAME") to access keys.env.example templates (without real keys) for documentationcheck_env.py to verify setup.env files# Most experiments use PyTorch + Transformers + PEFT pip install torch transformers peft datasets accelerate bitsandbytes pip install openai python-dotenv # For LLM-as-judge evaluation and env management
# For memory-constrained environments, use 4-bit quantization python train.py --use_4bit # Monitor GPU usage watch -n 1 nvidia-smi
# Most experiments include a sandbox.py for interactive testing python sandbox.py # Modify paths/prompts as needed
Harvard CS 2881: AI Safety (Auditing)
See individual experiment READMEs for specific assignment details and findings.