This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a DuckDB Vector Search benchmarking project focused on systematically analyzing the performance characteristics of DuckDB's VSS (Vector Similarity Search) extension for text vector search scenarios. The project follows a functional programming paradigm with immutable data structures, pure functions, and explicit effect handling.
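That style — immutable records plus side-effect-free functions — can be sketched as follows (the class and function names here are illustrative, not the project's actual definitions):

```python
from dataclasses import dataclass
from typing import List

# Immutable record: assigning to a field afterwards raises FrozenInstanceError
@dataclass(frozen=True)
class ExperimentConfig:
    dimension: int
    data_scale: str

# Pure function: the result depends only on its inputs, with no side effects
def scale_vector(vector: List[float], factor: float) -> List[float]:
    return [x * factor for x in vector]

config = ExperimentConfig(dimension=128, data_scale="small")
print(scale_vector([1.0, 2.0], 2.0))  # [2.0, 4.0]
```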
The project uses a layered functional architecture; see the plan/ directory for the detailed design.

The experiment matrix covers 48 combinations of experimental parameters.
```bash
# Install Python dependencies (recommended: use uv)
uv sync

# Alternative: pip install
pip install duckdb faker pandas numpy psutil matplotlib seaborn plotly pyrsistent

# Install and verify DuckDB VSS extension
python test_duckdb_vss_installation.py

# Verify VSS installation
python -c "import duckdb; conn = duckdb.connect(); print(conn.execute('SELECT * FROM duckdb_extensions() WHERE extension_name = \'vss\'').fetchall())"

# Check DuckDB version (VSS requires compatible version)
python -c "import duckdb; print(f'DuckDB version: {duckdb.__version__}')"

# Run tests to verify setup (87 tests, 99% success rate)
pytest tests/ -v
```
```bash
# Run all 48 experiment combinations (sequential)
python -m src.runners.experiment_runner --all

# Run all experiments in parallel (recommended)
python -m src.runners.experiment_runner --all --parallel

# Run with custom parallel settings
python -m src.runners.experiment_runner --all --parallel --workers 6 --max-memory 8000

# Run specific experiment matrix with parallel execution
python -m src.runners.experiment_runner --data-scale small --dimensions 128,256 --parallel

# Resume from checkpoint
python -m src.runners.experiment_runner --resume --checkpoint-dir checkpoints/

# Monitor experiment progress
python -m src.tools.monitor --experiment-dir experiments/

# Run tests (87 unit tests, 99% success rate)
python -m pytest tests/ -v
python -m pytest tests/pure/ -v       # Test only pure functions
python -m pytest tests/runners/ -v    # Test runners including parallel execution
python -m pytest tests/effects/ -v --db-mode=mock  # Test effects with mocks
```
```sql
-- Create HNSW index
CREATE INDEX idx_name ON table_name USING HNSW (vector_column)
WITH (ef_construction = 128, ef_search = 64, M = 16, metric = 'cosine');

-- Vector similarity search
SELECT * FROM table_name
ORDER BY array_distance(vector_column, query_vector::FLOAT[n])
LIMIT k;

-- Hybrid search (vector + text)
WITH vector_results AS (
    SELECT id, array_distance(vector, ?::FLOAT[n]) AS v_score
    FROM table_name
    ORDER BY v_score
    LIMIT 100
),
text_results AS (
    SELECT id, fts_main_table.match_bm25(id, ?) AS t_score
    FROM table_name
    WHERE text LIKE '%' || ? || '%'
)
SELECT v.id, (0.7 * (1 - v.v_score)) + (0.3 * t.t_score) AS score
FROM vector_results v
JOIN text_results t ON v.id = t.id
ORDER BY score DESC
LIMIT k;
```
- `@dataclass(frozen=True)` for all data structures
- Function composition via `compose` and `pipe`

```python
# Example type definitions
Vector = NewType('Vector', List[float])
DocumentId = NewType('DocumentId', str)

ExperimentConfig  # @dataclass(frozen=True) configuration record
IO[T]             # Effect wrapper for side effects
Either[E, T]      # Error handling without exceptions
```
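The `compose` and `pipe` combinators mentioned above can be implemented generically in a few lines (a sketch; the project's own versions may differ in typing details):

```python
from functools import reduce
from typing import Any, Callable

def pipe(value: Any, *funcs: Callable[[Any], Any]) -> Any:
    # Thread a value left-to-right through a sequence of functions
    return reduce(lambda acc, f: f(acc), funcs, value)

def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    # compose(f, g)(x) == f(g(x)): right-to-left application
    return lambda value: reduce(lambda acc, f: f(acc), reversed(funcs), value)

print(pipe(3, lambda x: x + 1, lambda x: x * 2))     # 8
print(compose(lambda x: x * 2, lambda x: x + 1)(3))  # 8
```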
Use the `--parallel` flag for faster benchmarking.

```
src/
├── types/        # Type definitions (frozen dataclasses)
├── pure/         # Pure functions (no side effects)
│   ├── generators/    # Data generation
│   ├── transformers/  # Data transformations
│   └── calculators/   # Metrics and analysis
├── effects/      # Side effect management
│   ├── db/            # Database IO operations
│   ├── io/            # File IO operations
│   └── metrics/       # Performance monitoring
├── pipelines/    # Function composition pipelines
└── runners/      # Main entry points
    ├── experiment_runner.py  # CLI experiment runner with parallel support
    ├── parallel_runner.py    # Parallel execution engine (Phase 4B)
    ├── checkpoint.py         # Checkpoint management
    └── monitoring.py         # Resource monitoring

plan/             # Functional design documentation
├── 01-functional-architecture.md
├── 02-type-definitions.md
├── 03-pure-functions.md
├── 04-effect-management.md
├── 05-pipeline-composition.md
├── 06-experiment-workflow.md
├── 07-implementation-guide.md
└── 08-current-status.md      # Current implementation status
```
The project follows a structured experiment workflow with checkpointing:
Each stage supports checkpointing for resumability. See plan/06-experiment-workflow.md for the detailed workflow design.
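A checkpoint layer of this kind can be as simple as serializing per-experiment state to JSON (a hypothetical sketch; the actual format used by checkpoint.py is not shown in this document):

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

def save_checkpoint(checkpoint_dir: Path, experiment_id: str, state: dict) -> None:
    # Persist completed-stage state so a --resume run can skip finished work
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    (checkpoint_dir / f"{experiment_id}.json").write_text(json.dumps(state))

def load_checkpoint(checkpoint_dir: Path, experiment_id: str) -> Optional[dict]:
    # Return the saved state, or None when the experiment never checkpointed
    path = checkpoint_dir / f"{experiment_id}.json"
    return json.loads(path.read_text()) if path.exists() else None

ckpt_dir = Path(tempfile.mkdtemp())
save_checkpoint(ckpt_dir, "small-128-cosine", {"stage": "index_built"})
print(load_checkpoint(ckpt_dir, "small-128-cosine"))  # {'stage': 'index_built'}
```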
- `generate_vector(seed, dimension)`: Create normalized vectors deterministically
- `calculate_recall_at_k(retrieved, relevant, k)`: Accuracy metrics
- `batch_documents(documents, batch_size)`: Split data for processing
- `with_db_connection(config, f)`: Managed database connections
- `measure_io(io)`: Performance measurement wrapper
- `parallel_map_io(f, items)`: Concurrent IO execution
- `single_experiment_pipeline(config)`: Complete experiment execution
- `data_preparation_pipeline(config)`: Generate test data
- `analysis_pipeline(results)`: Aggregate and visualize results
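As an illustration, `calculate_recall_at_k` could look like this (a plausible implementation of the standard recall@k metric, not necessarily the project's exact code):

```python
from typing import Sequence

def calculate_recall_at_k(retrieved: Sequence[str], relevant: Sequence[str], k: int) -> float:
    # recall@k = |top-k retrieved ∩ relevant| / |relevant|
    if not relevant:
        return 0.0
    hits = set(retrieved[:k]) & set(relevant)
    return len(hits) / len(relevant)

# Top-2 retrieved = {a, b}; of the 3 relevant ids only "a" is found
print(calculate_recall_at_k(["a", "b", "c"], ["a", "c", "d"], k=2))  # 0.333...
```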