AGENTS.md

Guide for AI Agents working on the FRAME project

Project Overview

FRAME is a tool suite for refining multimodal measurements through materials digital twins. The project combines generative models with virtual instrumentation to help researchers and scientists better characterize material structures.

Core Concept

Generate structural ensembles using a latent diffusion model (digital twin)
Compute virtual instrument data from these structures
Refine the ensemble so virtual measurements match real experimental data

Initial Focus

Material System: Lipid nanoparticles (LNPs)
Measurements: SAXS, SANS, and cryo-EM
Training Data: Parametric "cartoons" generated by sampling statistical priors of structural parameters

Workspace Structure

This is a uv workspace with three core packages:

```text
frame/
├── apps/                    # Runnable CLIs/services
├── config/                  # Workspace configuration
│   └── config.toml         # FRAME core configuration
├── packages/
│   ├── frame-core/         # Data models, storage, experiment management, unified CLI
│   ├── frame-geo/          # Stochastic geometry ("cartoon generator")
│   └── frame-twin/         # Diffusion model (training & inference)
├── frame_data/             # Default data root (libraries and experiments)
│   ├── libraries/          # UUID-tracked data libraries
│   └── experiments/        # UUID-tracked experiments and checkpoints
├── dev/                    # Local development sandbox (gitignored)
├── scripts/                # Utility scripts
├── docs/                   # MkDocs documentation
├── tests/                  # Cross-package E2E tests
└── pyproject.toml          # Workspace configuration
```

IMPORTANT: `frame-voxel` was deprecated as of commit 8634814. All voxel functionality has been fully integrated into `frame-core`. Always import from `frame.*` instead of `frame_voxel.*`.

Package Responsibilities

frame-core

✅ IMPLEMENTED

Purpose: Foundation for data representation, I/O, and experiment management

Base Data Model: Multi-channel 3D voxel grids
- PyTorch tensor-based `VoxelGrid` class with shape `(C, D, H, W)`
- Default: 128³ grid, 1 nm³ per voxel (configurable)
- Channels: 10-20 different material/property channels
- Support for arbitrary grid sizes and voxel dimensions
Data Management:
- ✅ `LibraryManager` - UUID-based library tracking and management
- ✅ `ExperimentManager` - Experiment tracking with checkpoint management
- ✅ `CheckpointManager` - Model checkpoint versioning
- ✅ Immutable libraries with lineage tracking
- ✅ Tag-based search and filtering
- ✅ Write protection for data integrity
Storage & I/O:
- ✅ `VoxelGrid` data model with validation and metadata
- ✅ `VoxelLibrary` for Zarr-based storage with lazy loading
- ✅ `VoxelDataset` for PyTorch training integration
- ✅ Parameter-based filtering and batch loading
- ✅ Efficient memory management with lazy loading
Visualization:
- ✅ Three visualization backends:
  - `visualize_napari.py` - Interactive 3D viewer
  - `visualize_pyvista.py` - High-quality 3D rendering
  - `visualize_batch.py` - Batch visualization utilities
Unified CLI:
- ✅ `uv run frame` - Central command integrating all packages
- ✅ Library management: `list`, `show`, `search`, `tag`, `untag`
- ✅ Experiment management: `list`, `show`, `tag`, `untag`, `stop`
- ✅ Checkpoint management: `list`, `show`, `set-best`
- ✅ Visualization: `view` - Interactive napari visualization
- ✅ TensorBoard launcher
- ✅ Migration tools for legacy data
- ✅ Integrates frame-geo and frame-twin subcommands
Migration Tools:
- ✅ Automatic migration of old data structures
- ✅ Specialized migration for lnp_5k_10ch
- ✅ Validation with error reporting
- ✅ Dry-run mode for testing

Module: `from frame import ...` (not `frame_core`)

frame-geo

✅ IMPLEMENTED

Purpose: Stochastic geometry generator for training data

Configuration-Driven: All parameters specified via TOML files
PyMC Integration: Samples from statistical priors with deterministic derived parameters
LNP Structure Generator: Complete implementation for lipid nanoparticles
- Shell1 (outer bilayer, head outward) - always present
- Shell2 (optional inner bilayer, tail outward)
- Payloads (spherical particles placed via Poisson disc sampling)
- Blebs (surface protrusions on Shell1)
Implemented Features:
- ✅ TOML-based configuration system
- ✅ PyMC prior builder with conditional logic
- ✅ Geometric primitives (Sphere, Shell)
- ✅ Poisson disc sampling (3D volumes + sphere surfaces)
- ✅ 7 validators for physical constraints:
  - Grid bounds, shell nesting, payload clearance
  - Bleb placement, minimum thickness, volume conservation, geometric feasibility
- ✅ Hybrid voxelization (analytical + sampling)
- ✅ Multi-channel volume fraction support
- ✅ Zarr storage for parametric + voxelized structures
- ✅ Statistics computation and quality control
- ✅ PyVista/Matplotlib visualization
- ✅ CLI: `frame-geo generate`, `validate-config`, `list-types`
- ✅ Rejection sampling with detailed tracking
- ✅ Parallel processing with multiprocessing (3-8x speedup)
- ✅ 25 passing tests with full coverage
- ✅ Extensible architecture for new structure types
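Poisson disc sampling on a sphere surface can be illustrated with simple dart throwing. The method draws uniform random points on the sphere and keeps a candidate only if it is far enough from every point already kept. The sketch below is a minimal, hypothetical illustration: the function names and parameters are not frame-geo's API, and frame-geo may use a different scheme. Separation is measured as straight-line (chord) distance in nanometers.

```python
from __future__ import annotations

import math
import random


def random_point_on_sphere(radius: float, rng: random.Random) -> tuple[float, float, float]:
    """Uniformly distributed point on a sphere surface (normalized isotropic Gaussian vector)."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:  # guard against the (vanishingly rare) zero vector
            return (radius * x / norm, radius * y / norm, radius * z / norm)


def poisson_disc_on_sphere(
    radius_nm: float,
    min_dist_nm: float,
    n_target: int,
    max_attempts: int = 10_000,
    seed: int | None = None,
) -> list[tuple[float, float, float]]:
    """Hypothetical helper (not frame-geo's API): dart throwing on a sphere surface.

    A random candidate is kept only if it lies at least min_dist_nm (chord distance)
    from every accepted point. Stops after n_target points or max_attempts candidates,
    so fewer than n_target points may come back if the surface is crowded.
    """
    rng = random.Random(seed)
    points: list[tuple[float, float, float]] = []
    for _ in range(max_attempts):
        if len(points) == n_target:
            break
        candidate = random_point_on_sphere(radius_nm, rng)
        if all(math.dist(candidate, p) >= min_dist_nm for p in points):
            points.append(candidate)
    return points


# e.g. candidate bleb sites on a 50 nm outer shell, at least 15 nm apart
sites = poisson_disc_on_sphere(radius_nm=50.0, min_dist_nm=15.0, n_target=20, seed=42)
```

Dart throwing costs O(n) distance checks per candidate. That is fine for tens of payloads or blebs. Grid-accelerated variants such as Bridson's algorithm scale better for dense 3D volumes.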

frame-twin

✅ IMPLEMENTED

Purpose: Diffusion-based digital twin for 3D voxel structure generation

Two-Stage Architecture: VAE compression + DDPM generation in latent space
VAE Model: 3D convolutional encoder/decoder with configurable compression ratio
- Input: (B, 9, 128, 128, 128) → Latent: (B, 32, 16, 16, 16)
- Reconstruction + KL divergence loss
- Separate training pipeline with comprehensive checkpointing
DDPM Model: 3D U-Net operating in latent space with parameter conditioning
- Three conditioning strategies: concatenation, cross-attention, adaptive normalization
- Configurable noise schedules (linear/cosine)
- Support for classifier-free guidance
Training Infrastructure:
- Base trainer with shared functionality (DDP, logging, checkpointing)
- VAE trainer with reconstruction metrics
- DDPM trainer with conditioning integration
- Time-based and epoch-based checkpointing
- TensorBoard logging and metric tracking
Data Handling:
- Integration with `frame.VoxelLibrary` (formerly frame-voxel)
- Random and stratified data splitting
- Custom collate functions for (voxels, parameters) pairs
- Distributed data loading for DDP
Configuration System: TOML-based configs with Pydantic validation
CLI Interface: Commands for training, inference, and evaluation
Python API: High-level and low-level interfaces
Inference Pipeline: Parameter masking for partial conditioning
Status: Core implementation complete, ready for training
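The two noise schedules mentioned above differ in how fast they destroy signal. The sketch below computes the cumulative signal fraction ᾱ_t for each. It uses the standard DDPM linear beta schedule (betas from 1e-4 to 0.02) and the cosine schedule of Nichol and Dhariwal (offset s = 0.008). These constants are common literature defaults and are assumptions here, not values read from frame-twin's configs.

```python
import math


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) with linearly spaced betas (num_steps >= 2)."""
    alpha_bar, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> list[float]:
    """alpha_bar_t = f(t) / f(0) with f(t) = cos^2(((t / T) + s) / (1 + s) * pi / 2)."""
    def f(t: float) -> float:
        return math.cos(((t / num_steps) + s) / (1 + s) * math.pi / 2) ** 2

    return [f(t) / f(0) for t in range(1, num_steps + 1)]


T = 1000
linear = linear_alpha_bar(T)
cosine = cosine_alpha_bar(T)
mid = T // 2 - 1
print(f"alpha_bar halfway: linear {linear[mid]:.3f}, cosine {cosine[mid]:.3f}")
```

The cosine schedule keeps ᾱ_t higher through the middle of the trajectory, so less information is destroyed early. This is the usual reason for preferring it over the linear schedule.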

Virtual Instruments 🚧 NOT YET IMPLEMENTED

Planned: SAXS, SANS, and cryo-EM forward models
Will be added as separate packages in `packages/`
Status: Awaiting implementation

Critical Principles

1. Performance First

Memory and compute are precious.

Voxel grids are memory-intensive (at float32, one dense 128³ grid with 10-20 channels takes 80-160 MiB)
Visualization can be computationally expensive
Always consider:
- Memory footprint of data structures
- Vectorization and batch operations
- Lazy loading and streaming where applicable
- GPU acceleration opportunities (PyTorch tensors)
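The arithmetic below puts numbers on the memory warning. It assumes dense storage at 4 bytes per value for float32 and 2 bytes for float16. It counts only raw tensor data, with no Python or framework overhead. The last line applies the same arithmetic to the VAE shapes listed under frame-twin.

```python
MIB = 1024**2
GIB = 1024**3


def grid_bytes(channels: int, size: int = 128, bytes_per_value: int = 4) -> int:
    """Raw tensor bytes for one dense (C, D, H, W) grid with D = H = W = size."""
    return channels * size**3 * bytes_per_value


for c in (10, 20):
    fp32 = grid_bytes(c) / MIB
    fp16 = grid_bytes(c, bytes_per_value=2) / MIB
    print(f"{c} channels: {fp32:.0f} MiB as float32, {fp16:.0f} MiB as float16")
# 10 channels: 80 MiB as float32, 40 MiB as float16
# 20 channels: 160 MiB as float32, 80 MiB as float16

# One batch of 16 float32 grids with 20 channels, before activations and gradients:
batch_gib = 16 * grid_bytes(20) / GIB  # 2.5 GiB

# frame-twin's VAE maps (9, 128, 128, 128) inputs to (32, 16, 16, 16) latents:
compression = (9 * 128**3) / (32 * 16**3)  # 144.0
```

A 144× smaller latent is what makes running the DDPM in latent space affordable.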

2. PyTorch as the Core Framework

Use PyTorch tensors for voxel grids and computations
Leverage GPU acceleration whenever possible
Ensure data structures are compatible with PyTorch workflows
Follow PyTorch best practices (avoid in-place ops that break autograd, use `torch.no_grad()` for inference, etc.)

3. Modern Python Conventions

Use modern Python (3.10+) features and idioms
Type hints everywhere (`typing`, `numpy.typing`, etc.)
Follow PEP 8 and community standards
Use modern tooling (see Development Workflow below)

4. Scientific Rigor

Physical constraints matter (e.g., volume conservation, material boundaries)
Document units explicitly
Validate generated structures
Ensure reproducibility (random seeds, version pinning)

5. Experiment Tracking

All computational experiments MUST be tracked with `ExperimentManager`.

frame-twin: ✅ Fully integrated
- Training runs automatically create experiments
- Checkpoints are versioned and linked to experiments
- Resuming training creates derived experiments with lineage
- Continue training with modified configs creates new experiments with bidirectional tracking
frame-geo: ⚠️ Partially integrated (LibraryManager only)
- Libraries are registered and tracked
- TODO: Generation runs should create experiments
- Configuration, parameters, and validation stats should be tracked
- Enable reproducibility and lineage from libraries to generation configs

Best practices:

Always use `ExperimentManager.create_experiment()` for new runs
Link experiments to their input libraries via `library_uuid`
Track dependencies (e.g., VAE experiment for DDPM training)
Use tags for organization (e.g., "preliminary", "production", "failed")
Never manually edit experiment metadata files

Continuing training (as of 2025-10-27):

Use `uv run frame twin continue <exp_uuid> --config <modified.toml>` to continue from a checkpoint.

Creates a NEW experiment (rather than resuming in place) with full provenance tracking
Original experiment records: which new experiments continued from it and which checkpoint was used
New experiment records: which experiment/checkpoint it continued from
Modified configs are stored in both experiments for full reproducibility
View lineage with `uv run frame experiment show <exp_uuid>`
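The bidirectional lineage described above can be pictured as two linked records. The sketch below is a hypothetical in-memory model for illustration only. The classes `Link` and `ExperimentRecord`, their fields, the function `continue_experiment`, and the checkpoint name `"best"` are not ExperimentManager's actual API or metadata format.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Link:
    """One edge of the lineage graph (hypothetical structure)."""
    experiment_id: str
    checkpoint: str
    config: dict | None = None


@dataclass
class ExperimentRecord:
    """Hypothetical stand-in for an experiment's lineage metadata."""
    config: dict
    exp_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    continued_from: Link | None = None
    continued_by: list[Link] = field(default_factory=list)


def continue_experiment(parent: ExperimentRecord, checkpoint: str, modified_config: dict) -> ExperimentRecord:
    """Start a NEW experiment from a parent checkpoint and record the link in both directions."""
    child = ExperimentRecord(
        config=dict(modified_config),
        continued_from=Link(parent.exp_id, checkpoint),
    )
    # The parent's own config stays untouched; the modified config is stored with the link.
    parent.continued_by.append(Link(child.exp_id, checkpoint, dict(modified_config)))
    return child


vae = ExperimentRecord(config={"learning_rate": 1e-4})
vae_ft = continue_experiment(vae, checkpoint="best", modified_config={"learning_rate": 5e-5})
```

Creating a child record instead of mutating the parent in place keeps the original run reproducible.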

Development Workflow

Package Manager: `uv`

Always use `uv` for all package and environment operations.

```bash
# Add dependencies to a package
cd frame/packages/frame-core
uv add numpy torch

# Add dev dependencies
uv add --dev pytest pytest-cov

# Sync the workspace (from the workspace root, frame/)
cd ../..
uv sync

# Run commands in the environment
uv run pytest
uv run python scripts/train.py
```

Testing: `pytest`

Write tests where possible and useful
Use `pytest` as the test framework
Cross-package E2E tests go in `tests/`
Package-specific tests go in `packages/<name>/tests/`
Aim for:
- Unit tests for core logic
- Integration tests for package interactions
- Property-based tests for geometry validation (consider `hypothesis`)

Common Commands

Unified CLI (`frame` replaces individual package CLIs):

```bash
# Library management
uv run frame library list
uv run frame library show <uuid>
uv run frame library search --tag production

# Experiment management
uv run frame experiment list --model-type vae
uv run frame experiment show <uuid>
uv run frame tensorboard <experiment_uuid>

# Migration
uv run frame migrate output/lnp_5k_10ch --tags production,lnp,10ch

# Geometry generation (frame-geo integration)
uv run frame geo generate config.toml
uv run frame geo list-types

# Twin training (frame-twin integration)
uv run frame twin train-vae config.toml
uv run frame twin train-ddpm config.toml
uv run frame twin generate config.toml
```

Legacy CLIs (still available):

```bash
# Individual package CLIs still work
uv run frame-geo generate config.toml
uv run frame-twin train-vae config.toml
```

Testing:

```bash
# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=frame --cov=frame_geo --cov=frame_twin

# Format/lint (configure in root pyproject.toml)
uv run ruff check .
uv run ruff format .

# Type checking
uv run mypy packages/
```

Common Pitfalls & Gotchas

1. Memory Explosion

Problem: Naive operations on 128³ × 20-channel grids can exhaust RAM
Solution:
- Use views and slicing instead of copying
- Process in batches
- Use `torch.cuda.empty_cache()` when appropriate (it only returns cached, unused GPU blocks to the driver; first drop references to tensors you no longer need)
- Consider `float16` or quantization for storage
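The first and last points above can be checked directly. This sketch uses NumPy only to stay dependency-light. torch tensors behave the same way: basic slicing returns a view, `.clone()` copies, and `.half()` converts to float16. The array size is arbitrary and smaller than a real grid.

```python
import numpy as np

grid = np.zeros((20, 64, 64, 64), dtype=np.float32)  # (C, D, H, W), small for the demo

head = grid[0]            # basic slicing: a view, no new allocation
copied = grid[0].copy()   # explicit copy: new allocation
fancy = grid[[0, 1]]      # fancy (list) indexing always copies

print(np.shares_memory(grid, head), np.shares_memory(grid, copied), np.shares_memory(grid, fancy))
# True False False

stored = grid.astype(np.float16)  # half the bytes, roughly 3 significant decimal digits
print(grid.nbytes // stored.nbytes)  # 2
```

Fancy indexing is the easy one to miss. Selecting channels with a list silently duplicates them.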

2. Voxelization Edge Cases

Problem: Continuous geometry → discrete voxels has ambiguity at boundaries
Solution:
- Document rounding/aliasing behavior
- Validate volume conservation
- Use consistent conventions across `frame-geo` and `frame-core`
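One way to make the boundary convention explicit is to compute each voxel's volume fraction by supersampling, then compare the summed fractions with the analytical volume. This is a minimal NumPy sketch of generic supersampling, not frame-geo's hybrid voxelizer. The function name, the grid-centered origin, and the 4×4×4 samples per voxel are assumptions.

```python
import math

import numpy as np


def shell_volume_fraction(n: int, voxel_nm: float, r_in: float, r_out: float, sub: int = 4) -> np.ndarray:
    """Hypothetical helper: per-voxel fraction of an (n, n, n) grid covered by r_in <= r <= r_out.

    The shell is centered on the grid. Each voxel is split into sub**3 sample points
    at sub-voxel centers; the fraction is the share of samples inside the shell.
    """
    step = voxel_nm / sub
    coords = (np.arange(n * sub) + 0.5) * step - n * voxel_nm / 2  # sample centers, nm
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij", sparse=True)
    r = np.sqrt(x**2 + y**2 + z**2)
    inside = ((r >= r_in) & (r <= r_out)).astype(np.float32)
    # Collapse each sub x sub x sub block of samples into its parent voxel.
    return inside.reshape(n, sub, n, sub, n, sub).mean(axis=(1, 3, 5))


n, dx = 32, 1.0
frac = shell_volume_fraction(n, dx, r_in=8.0, r_out=12.0)
voxelized_nm3 = float(frac.sum()) * dx**3
analytic_nm3 = 4.0 / 3.0 * math.pi * (12.0**3 - 8.0**3)
rel_error = abs(voxelized_nm3 - analytic_nm3) / analytic_nm3
print(f"{voxelized_nm3:.1f} nm^3 voxelized vs {analytic_nm3:.1f} nm^3 analytic ({rel_error:.2%})")
```

The volume error shrinks as `sub` grows, while memory and compute grow as `sub`³. The choice of `sub` is therefore worth documenting alongside the rounding convention.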

3. Physical Validation

Problem: Generated structures may violate physical constraints
Solution:
- Implement validators in `frame-geo` before voxelization
- Make validation system flexible and extensible
- Log validation failures for debugging

4. Reproducibility

Problem: Stochastic sampling and neural network training need to be reproducible
Solution:
- Always accept and respect random seeds
- Document random number generator state
- Pin PyTorch CUDA/cuDNN settings for determinism when needed
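A helper along these lines seeds every RNG a run touches. `seed_everything` is a hypothetical name, and whether FRAME already wraps these calls is not stated here. The torch calls themselves (`torch.manual_seed`, `torch.use_deterministic_algorithms`, and the cuDNN flags) are standard PyTorch. NumPy and torch are imported lazily, so the sketch also runs without them. Deterministic kernels can be slower, and some CUDA ops additionally require the `CUBLAS_WORKSPACE_CONFIG` environment variable.

```python
import os
import random


def seed_everything(seed: int, deterministic: bool = False) -> None:
    """Hypothetical helper: seed Python, NumPy and PyTorch; optionally force deterministic kernels."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started after this call
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global RNG; prefer passing np.random.default_rng(seed) explicitly
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    if deterministic:
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)


seed_everything(42)
```

Record the seed in the experiment's metadata as well, so a run can be replayed without reading the launch command.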

5. Cross-Package Dependencies

Problem: Circular dependencies or tight coupling between packages
Solution:
- `frame-core` has no dependencies on `frame-geo` or `frame-twin` ✅
- `frame-geo` uses PyTorch tensors compatible with `frame-core` VoxelGrid ✅
- `frame-twin` depends on `frame-core` for data models (`frame.VoxelLibrary`) ✅
- Keep interfaces clean and minimal ✅

Current Dependency Graph:

```text
frame-core (standalone)
    ↑
    │ (uses compatible tensor format)
    │
frame-geo (PyTorch, PyMC, Zarr, PyVista)
    ↑
    │ (consumes voxel libraries, loaded via frame.VoxelLibrary)
    │
frame-twin (PyTorch, Pydantic, TensorBoard)
```
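The rule "frame-core has no dependencies on frame-geo or frame-twin" can be guarded by a test that parses imports with the standard-library `ast` module. This is a hypothetical sketch. The forbidden names `frame_geo` and `frame_twin` follow the import names used elsewhere in this guide. The source path in `check_frame_core_is_standalone` is an assumption about the package layout.

```python
import ast
from pathlib import Path

FORBIDDEN = ("frame_geo", "frame_twin")


def forbidden_imports(source: str) -> list[str]:
    """Return absolute imports in `source` whose top-level package frame-core must not use."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        found += [name for name in names if name.split(".")[0] in FORBIDDEN]
    return found


def check_frame_core_is_standalone(src_root: Path = Path("packages/frame-core/src")) -> None:
    """Call from a pytest test; the default src_root is a hypothetical layout, adjust as needed."""
    assert src_root.is_dir(), f"{src_root} not found"
    offenders = {
        str(path): hits
        for path in src_root.rglob("*.py")
        if (hits := forbidden_imports(path.read_text(encoding="utf-8")))
    }
    assert not offenders, offenders
```

Parsing with `ast` instead of importing the modules keeps the check fast and free of side effects. Relative imports are skipped because they cannot leave the package.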

Domain Knowledge

Lipid Nanoparticles (LNPs)

Self-assembled structures with complex internal organization
Multiple components (lipids, nucleic acids, etc.) → multiple channels
Nanometer-scale structures (hence 1 nm³ voxels)
Subject to physical constraints (volume conservation, packing, etc.)

Measurement Techniques

SAXS/SANS: Small-angle scattering (X-ray/neutron)
- Probes structure in reciprocal space
- Sensitive to contrasts in electron density (SAXS) or neutron scattering length density (SANS)
Cryo-EM: Cryo-electron microscopy
- Real-space imaging
- 2D projections of 3D structures
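Both measurement types map naturally onto voxel grids. The sketch below is a crude illustration, not a planned FRAME virtual instrument. It treats one channel as a contrast map and takes I(q) ∝ |F(q)|² averaged over spherical shells in q. It ignores instrument resolution, solvent subtraction and realistic contrast values. It forms a cryo-EM-like image as a line integral along the first (D) axis, with no CTF or noise. The function names and the binning scheme are assumptions. NumPy is used only to keep the sketch dependency-light; `torch.fft.fftn` offers the same transform for tensors.

```python
import numpy as np


def radial_intensity(contrast: np.ndarray, voxel_nm: float, n_bins: int = 30) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: spherically averaged |F(q)|^2 of a cubic contrast map (arbitrary units).

    Returns (q, I) with q in nm^-1 at the centers of the non-empty q bins.
    """
    n = contrast.shape[0]
    power = np.abs(np.fft.fftn(contrast) * voxel_nm**3) ** 2   # |F(q)|^2 on the FFT grid
    q1d = 2 * np.pi * np.fft.fftfreq(n, d=voxel_nm)              # angular wavenumber, nm^-1
    qx, qy, qz = np.meshgrid(q1d, q1d, q1d, indexing="ij", sparse=True)
    qmag = np.sqrt(qx**2 + qy**2 + qz**2).ravel()
    edges = np.linspace(0.0, qmag.max(), n_bins + 1)
    idx = np.clip(np.digitize(qmag, edges) - 1, 0, n_bins - 1)    # bin index of every q point
    sums = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    keep = counts > 0
    q_centers = 0.5 * (edges[1:] + edges[:-1])
    return q_centers[keep], sums[keep] / counts[keep]


def projection_image(contrast: np.ndarray, voxel_nm: float, axis: int = 0) -> np.ndarray:
    """Hypothetical helper: line integral along one axis (noise-free, CTF-free projection)."""
    return contrast.sum(axis=axis) * voxel_nm


# Toy object: a uniform sphere of radius 8 nm on a 64^3 grid of 1 nm voxels.
n, dx, radius = 64, 1.0, 8.0
c = (np.arange(n) + 0.5 - n / 2) * dx
z, y, x = np.meshgrid(c, c, c, indexing="ij")
sphere = (x**2 + y**2 + z**2 <= radius**2).astype(np.float64)

q, intensity = radial_intensity(sphere, dx)
image = projection_image(sphere, dx)  # shape (64, 64)
```

For a uniform sphere the lowest-q bin carries the most intensity, as expected for forward scattering. The projection's brightest pixel equals the chord through the center.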

Diffusion Models (DDPM)

Generative models that learn to denoise data
Training: Add noise progressively, learn to reverse the process
Inference: Start from noise, iteratively denoise
Conditioning: Guide generation with experimental data or constraints
VAE: Compresses high-dim voxel grids to latent space for efficiency

Quick Reference

File Locations

Workspace config: `frame/pyproject.toml`
Package configs: `frame/packages/*/pyproject.toml`
Tests: `frame/tests/` (E2E), `frame/packages/*/tests/` (unit)
Docs: `frame/docs/`
Scratch work: `frame/dev/` (gitignored)

Key Dependencies (Current)

frame-core:

- `torch>=2.0` - PyTorch tensors for voxel grids
- `numpy>=1.24` - array operations
- `zarr>=2.16` - efficient array storage
- `napari>=0.4` - interactive visualization
- `pyvista>=0.42` - 3D rendering
- `matplotlib>=3.7` - plotting
- `pandas>=2.0` - parameter tables
- `pytest>=8.4` - testing

frame-geo:

- `torch>=2.0` - PyTorch tensors
- `pymc>=5.0` - probabilistic modeling
- `pytensor>=2.0` - PyMC backend
- `numpy>=1.24` - numerical operations
- `zarr>=2.16` - storage
- `pyvista>=0.42` - 3D visualization
- `matplotlib>=3.7` - 2D cross-sections
- `pandas>=2.0` - parameter export
- `tomli>=2.0` - TOML parsing
- `tqdm>=4.65` - progress bars
- `pytest>=8.4` - testing

Future (for frame-twin and virtual instruments):

- `diffusers` - diffusion model components
- `scipy` - scattering calculations
- Domain-specific libraries (SAXS/SANS/cryo-EM)

Typical Workflow

Currently Implemented (Steps 1-4):

1. ✅ Generate cartoons: `uv run frame-geo generate config.toml`
   - Samples from PyMC priors
   - Constructs parametric LNP structures
   - Validates physical constraints
   - Voxelizes to multi-channel grids
   - Saves to Zarr format
2. ✅ Load and visualize:

```python
from frame import VoxelLibrary

lib = VoxelLibrary("output/lnp_example/voxels.zarr")
grid = lib[0]  # Load first structure
```

3. ✅ Train twin: `frame-twin` learns the distribution from voxelized cartoons (`uv run frame twin train-vae`, then `train-ddpm`)
4. ✅ Generate ensemble: Sample from the trained diffusion model (`uv run frame twin generate`)

Future Steps (Not Yet Implemented):

5. 🚧 Compute virtual data: Virtual instruments compute SAXS/SANS/cryo-EM
6. 🚧 Refine: Adjust ensemble to match experimental data

Implementation Status & Usage

Current Implementation (as of 2025-10-06)

Two packages are fully implemented and ready to use:

frame-core: Data Models & I/O

```python
import torch

from frame import VoxelDataset, VoxelGrid, VoxelLibrary

# Create a voxel grid
data = torch.zeros(10, 128, 128, 128)  # 10 channels
grid = VoxelGrid(
    data=data,
    voxel_size=1.0,  # nanometers
    channels={'shell1_head': 0, 'shell1_tail': 1},  # ... remaining channels
    metadata={'structure_id': 'lnp_001'},
)

# Open a voxel library
library = VoxelLibrary("path/to/library.zarr", mode='r')
n_structures = len(library)
single_grid = library[0]  # Lazy load

# Filter by parameters
filtered = library.filter(lambda p: p['shell1_radius_nm'] > 50.0)

# PyTorch dataset for training
dataset = VoxelDataset(library, device='cuda')
loader = torch.utils.data.DataLoader(dataset, batch_size=16)
```

frame-geo: Structure Generation

```bash
# 1. Create a configuration file (see examples/lnp_example_config.toml)

# 2. Generate structures
uv run frame-geo generate config.toml

# 3. Check output
ls output/lnp_example/
#   structures.zarr/       - Parametric representations
#   voxels.zarr/           - Voxelized grids
#   parameters.csv         - All sampled parameters
#   statistics.json        - Summary statistics
#   validation_log.json    - Rejection tracking
```

Python API:

```python
from frame_geo import generate_from_config
from frame_geo.config import FrameGeoConfig
from frame_geo.generator import StructureGenerator

# High-level API
generate_from_config("config.toml")

# Or use the advanced API for more control
config = FrameGeoConfig.from_toml("config.toml")
config.validate()
generator = StructureGenerator(config)
generator.generate_batch()

# Access statistics
stats = generator.validation_stats
print(f"Accepted: {stats['total_accepted']}")
print(f"Rejected: {stats['total_attempts'] - stats['total_accepted']}")
```

Configuration Example (TOML):

```toml
[metadata]
random_seed = 42

[structure]
type = "lnp"

[grid]
nx = 128
ny = 128
nz = 128
dx_nm = 1.0
dy_nm = 1.0
dz_nm = 1.0

[generation]
num_samples = 1000

[output]
base_path = "./output/my_lnps"
mode = "overwrite"

[priors.shell1_radius_nm]
distribution = "Uniform"
lower = 40.0
upper = 80.0

# ... more priors ...

[voxelization.channels]
shell1_head = 0
shell1_tail = 1
# ... more channels ...
```
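A quick way to inspect such a file before a long generation run is to parse it with the standard-library `tomllib`. It requires Python 3.11+; on 3.10, `tomli` (already a frame-geo dependency) provides the same `loads` API. This sketch reads only fields shown in the example above and is not a substitute for `FrameGeoConfig`, which performs the real validation. The arithmetic assumes the particle is centered and that `shell1_radius_nm` is its outermost extent (blebs would push it further). Under those assumptions, only radii up to 64 nm fit the 128 nm grid, so about 60% of Uniform(40, 80) draws could pass a grid-bounds check.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10
    import tomli as tomllib

# Only the fields from the example config that this check needs.
CONFIG = """
[metadata]
random_seed = 42

[grid]
nx = 128
ny = 128
nz = 128
dx_nm = 1.0
dy_nm = 1.0
dz_nm = 1.0

[priors.shell1_radius_nm]
distribution = "Uniform"
lower = 40.0
upper = 80.0
"""

cfg = tomllib.loads(CONFIG)
grid = cfg["grid"]
extent_nm = (grid["nx"] * grid["dx_nm"], grid["ny"] * grid["dy_nm"], grid["nz"] * grid["dz_nm"])
prior = cfg["priors"]["shell1_radius_nm"]

# Assumes a centered particle whose outermost extent is shell1_radius_nm.
max_fit_radius = min(extent_nm) / 2
lo, hi = prior["lower"], prior["upper"]
accept_frac = min(max(max_fit_radius - lo, 0.0), hi - lo) / (hi - lo)  # valid for a Uniform prior only
print(f"radii above {max_fit_radius} nm overflow the grid; ~{accept_frac:.0%} of draws fit")
```

A low fraction here predicts heavy rejection sampling. Narrowing the prior or enlarging the grid avoids that wasted work.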

End-to-End Example

```bash
# 1. Generate training data
cd /path/to/FRAME
uv run frame-geo generate packages/frame-geo/examples/lnp_example_config.toml

# 2. Load and inspect in Python
uv run python
```

```python
import json

import pandas as pd

from frame import VoxelLibrary

# Load voxelized structures
lib = VoxelLibrary("output/lnp_example/voxels.zarr")
print(f"Generated {len(lib)} structures")

# Load parameters
params = pd.read_csv("output/lnp_example/parameters.csv")
print(params.describe())

# Load statistics
with open("output/lnp_example/statistics.json") as f:
    stats = json.load(f)
print(f"Mean shell1 radius: {stats['shell1_radius_nm']['mean']:.2f} nm")

# Inspect a structure
grid = lib[0]
print(f"Grid shape: {grid.shape}")
print(f"Channels: {grid.n_channels}")
print(f"Physical size: {grid.physical_size} nm")
```

Testing

```bash
# Run all tests
uv run pytest

# Run package-specific tests
uv run pytest packages/frame-core/tests/ -v
uv run pytest packages/frame-geo/tests/ -v

# With coverage (frame-core's import package is `frame`)
uv run pytest --cov=frame --cov=frame_geo --cov-report=html
```

What's Next: frame-twin

`frame-twin` is now implemented (see its entry under Package Responsibilities). It:

Loads voxel libraries generated by `frame-geo`
Trains a latent diffusion model (VAE + DDPM)
Generates new structures with parameter conditioning

Still to come: generation conditioned on experimental data, and an interface with virtual instruments for refinement.

When in Doubt

Check performance impact - memory and compute matter
Use PyTorch - tensors, not raw numpy arrays
Use `uv` - for all dependency and environment operations
Write tests - especially for geometry validation and data I/O
Document units - nanometers, channels, physical constraints
Ask about physics - when structural constraints are unclear

Package Implementation Summary

| Package | Status | Lines of Code | Tests | Key Features |
|---|---|---|---|---|
| frame-core | ✅ Complete | ~3500 | 0 | VoxelGrid, VoxelLibrary, LibraryManager, ExperimentManager, unified CLI, migration |
| frame-geo | ✅ Complete | ~2000 | 25 | LNP generator, PyMC priors, 7 validators, hybrid voxelization, CLI integration |
| frame-twin | ✅ Complete | ~1500 | 3 | VAE+DDPM models, 3 conditioning strategies, training infrastructure, CLI integration |
| Virtual Instruments | 🚧 Pending | 0 | 0 | Awaiting implementation |

Current Capabilities:

✅ Data Management: UUID-based library and experiment tracking
✅ Unified CLI: Central `frame` command integrating all packages
✅ Library Management: Create, list, search, and tag data libraries
✅ Experiment Tracking: Track training experiments, checkpoints, and dependencies
✅ Migration Tools: Automatic migration of legacy data with validation
✅ Generate synthetic LNP structures from statistical priors
✅ Validate physical constraints with 7 validators
✅ Voxelize to multi-channel 3D grids (volume fractions)
✅ Store efficiently in Zarr format with UUID tracking
✅ Visualize structures (napari, PyVista)
✅ PyTorch dataset integration for training
✅ Train VAE models for voxel compression
✅ Train DDPM models with parameter conditioning
✅ Generate new structures with partial parameter specification
✅ TensorBoard integration for monitoring training

Next Steps:

🚧 Add virtual instrument packages (SAXS, SANS, cryo-EM)
🚧 Implement refinement algorithms
🚧 Add comprehensive test coverage for frame-core and frame-twin
🚧 Complete migration of lnp_5k_10ch data

Last Updated: 2025-10-13
Workspace Manager: `uv`
Primary Framework: PyTorch
Primary CLI: `uv run frame` (unified command)
Initial Target: Lipid nanoparticles × (SAXS, SANS, cryo-EM)
Implementation Progress: 3/4 core packages complete (75%)
Note: `frame-voxel` has been replaced by `frame-core` with expanded functionality

AGENTS.md

Guide for AI Agents working on the FRAME project

Project Overview

Core Concept

Generate structural ensembles using a latent diffusion model (digital twin)
Compute virtual instrument data from these structures
Refine the ensemble so virtual measurements match real experimental data

Initial Focus

Material System: Lipid nanoparticles (LNPs)
Measurements: SAXS, SANS, and cryo-EM
Training Data: Parametric "cartoons" generated by sampling statistical priors of structural parameters

Workspace Structure

This is a uv workspace with three core packages:

frame/
├── apps/                    # Runnable CLIs/services
├── config/                  # Workspace configuration
│   └── config.toml         # FRAME core configuration
├── packages/
│   ├── frame-core/         # Data models, storage, experiment management, unified CLI
│   ├── frame-geo/          # Stochastic geometry ("cartoon generator")
│   └── frame-twin/         # Diffusion model (training & inference)
├── frame_data/             # Default data root (libraries and experiments)
│   ├── libraries/          # UUID-tracked data libraries
│   └── experiments/        # UUID-tracked experiments and checkpoints
├── dev/                    # Local development sandbox (gitignored)
├── scripts/                # Utility scripts
├── docs/                   # MkDocs documentation
├── tests/                  # Cross-package E2E tests
└── pyproject.toml          # Workspace configuration

IMPORTANT:

frame-voxel

was deprecated as of commit 8634814. All voxel functionality has been fully integrated into

frame-core

. Always import from

frame.*

instead of

frame_voxel.*

Package Responsibilities

frame-core

✅ IMPLEMENTED

Purpose: Foundation for data representation, I/O, and experiment management

Base Data Model: Multi-channel 3D voxel grids
- PyTorch tensor-based:
```
VoxelGrid
```
  class with shape
```
(C, D, H, W)
```
- Default: 128³ grid, 1 nm³ per voxel (configurable)
- Channels: 10-20 different material/property channels
- Support for arbitrary grid sizes and voxel dimensions
Data Management:
- ✅
```
LibraryManager
```
  - UUID-based library tracking and management
- ✅
```
ExperimentManager
```
  - Experiment tracking with checkpoint management
- ✅
```
CheckpointManager
```
  - Model checkpoint versioning
- ✅ Immutable libraries with lineage tracking
- ✅ Tag-based search and filtering
- ✅ Write protection for data integrity
Storage & I/O:
- ✅
```
VoxelGrid
```
  data model with validation and metadata
- ✅
```
VoxelLibrary
```
  for Zarr-based storage with lazy loading
- ✅
```
VoxelDataset
```
  for PyTorch training integration
- ✅ Parameter-based filtering and batch loading
- ✅ Efficient memory management with lazy loading
Visualization:
- ✅ Three visualization backends:
  - ```
  visualize_napari.py
```
  - Interactive 3D viewer
- ```
visualize_pyvista.py
```
    - High-quality 3D rendering
  - ```
  visualize_batch.py
```
  - Batch visualization utilities
Unified CLI:
- ✅
```
uv run frame
```
  - Central command integrating all packages
- ✅ Library management:
```
list
```
  ,
```
show
```
  ,
```
search
```
  ,
```
tag
```
  ,
```
untag
```
- ✅ Experiment management:
```
list
```
  ,
```
show
```
  ,
```
tag
```
  ,
```
untag
```
  ,
```
stop
```
- ✅ Checkpoint management:
```
list
```
  ,
```
show
```
  ,
```
set-best
```
- ✅ Visualization:
```
view
```
  - Interactive napari visualization
- ✅ TensorBoard launcher
- ✅ Migration tools for legacy data
- ✅ Integrates frame-geo and frame-twin subcommands
Migration Tools:
- ✅ Automatic migration of old data structures
- ✅ Specialized migration for lnp_5k_10ch
- ✅ Validation with error reporting
- ✅ Dry-run mode for testing

Module:

from frame import ...

(not

frame_core

)

frame-geo

✅ IMPLEMENTED

Purpose: Stochastic geometry generator for training data

Configuration-Driven: All parameters specified via TOML files
PyMC Integration: Samples from statistical priors with deterministic derived parameters
LNP Structure Generator: Complete implementation for lipid nanoparticles
- Shell1 (outer bilayer, head outward) - always present
- Shell2 (optional inner bilayer, tail outward)
- Payloads (spherical particles placed via Poisson disc sampling)
- Blebs (surface protrusions on Shell1)
Implemented Features:
- ✅ TOML-based configuration system
- ✅ PyMC prior builder with conditional logic
- ✅ Geometric primitives (Sphere, Shell)
- ✅ Poisson disc sampling (3D volumes + sphere surfaces)
- ✅ 7 validators for physical constraints:
  - Grid bounds, shell nesting, payload clearance
  - Bleb placement, minimum thickness, volume conservation, geometric feasibility
- ✅ Hybrid voxelization (analytical + sampling)
- ✅ Multi-channel volume fraction support
- ✅ Zarr storage for parametric + voxelized structures
- ✅ Statistics computation and quality control
- ✅ PyVista/Matplotlib visualization
- ✅ CLI:
```
frame-geo generate
```
  ,
```
validate-config
```
  ,
```
list-types
```
- ✅ Rejection sampling with detailed tracking
- ✅ Parallel processing with multiprocessing (3-8x speedup)
- ✅ 25 passing tests with full coverage
- ✅ Extensible architecture for new structure types

frame-twin

✅ IMPLEMENTED

Purpose: Diffusion-based digital twin for 3D voxel structure generation

Two-Stage Architecture: VAE compression + DDPM generation in latent space
VAE Model: 3D convolutional encoder/decoder with configurable compression ratio
- Input: (B, 9, 128, 128, 128) → Latent: (B, 32, 16, 16, 16)
- Reconstruction + KL divergence loss
- Separate training pipeline with comprehensive checkpointing
DDPM Model: 3D U-Net operating in latent space with parameter conditioning
- Three conditioning strategies: concatenation, cross-attention, adaptive normalization
- Configurable noise schedules (linear/cosine)
- Support for classifier-free guidance
Training Infrastructure:
- Base trainer with shared functionality (DDP, logging, checkpointing)
- VAE trainer with reconstruction metrics
- DDPM trainer with conditioning integration
- Time-based and epoch-based checkpointing
- TensorBoard logging and metric tracking
Data Handling:
- Integration with
```
frame.VoxelLibrary
```
  (formerly frame-voxel)
- Random and stratified data splitting
- Custom collate functions for (voxels, parameters) pairs
- Distributed data loading for DDP
Configuration System: TOML-based configs with Pydantic validation
CLI Interface: Commands for training, inference, and evaluation
Python API: High-level and low-level interfaces
Inference Pipeline: Parameter masking for partial conditioning
Status: Core implementation complete, ready for training

Virtual Instruments 🚧 NOT YET IMPLEMENTED

Planned: SAXS, SANS, and cryo-EM forward models
Will be added as separate packages in
```
packages/
```
Status: Awaiting implementation

Critical Principles

1. Performance First

Memory and compute are precious.

Voxel grids are memory-intensive (128³ × 10-20 channels = substantial RAM)
Visualization can be computationally expensive
Always consider:
- Memory footprint of data structures
- Vectorization and batch operations
- Lazy loading and streaming where applicable
- GPU acceleration opportunities (PyTorch tensors)

2. PyTorch as the Core Framework

Use PyTorch tensors for voxel grids and computations
Leverage GPU acceleration whenever possible
Ensure data structures are compatible with PyTorch workflows
Follow PyTorch best practices (avoid in-place ops that break autograd, use
```
torch.no_grad()
```
for inference, etc.)

3. Modern Python Conventions

Use modern Python (3.10+) features and idioms
Type hints everywhere (
```
typing
```
,
```
numpy.typing
```
, etc.)
Follow PEP 8 and community standards
Use modern tooling (see Development Workflow below)

4. Scientific Rigor

Physical constraints matter (e.g., volume conservation, material boundaries)
Document units explicitly
Validate generated structures
Ensure reproducibility (random seeds, version pinning)

5. Experiment Tracking

All computational experiments MUST be tracked with

ExperimentManager

frame-twin: ✅ Fully integrated
- Training runs automatically create experiments
- Checkpoints are versioned and linked to experiments
- Resuming training creates derived experiments with lineage
- Continue training with modified configs creates new experiments with bidirectional tracking
frame-geo: ⚠️ Partially integrated (LibraryManager only)
- Libraries are registered and tracked
- TODO: Generation runs should create experiments
- Configuration, parameters, and validation stats should be tracked
- Enable reproducibility and lineage from libraries to generation configs

Best practices:

Always use
```
ExperimentManager.create_experiment()
```
for new runs
Link experiments to their input libraries via
```
library_uuid
```
Track dependencies (e.g., VAE experiment for DDPM training)
Use tags for organization (e.g., "preliminary", "production", "failed")
Never manually edit experiment metadata files

Continuing training (as of 2025-10-27):

Use

uv run frame twin continue <exp_uuid> --config <modified.toml>

to continue from a checkpoint

Creates a NEW experiment (not resume in-place) with full provenance tracking
Original experiment records: which new experiments continued from it and which checkpoint was used
New experiment records: which experiment/checkpoint it continued from
Modified configs are stored in both experiments for full reproducibility
View lineage with
```
uv run frame experiment show <exp_uuid>
```

Development Workflow

Package Manager:

uv

Always use

uv

for all package and environment operations.

# Add dependencies to a package
cd frame/packages/frame-core
uv add numpy torch

# Add dev dependencies
uv add --dev pytest pytest-cov

# Sync workspace
cd frame/
uv sync

# Run commands in the environment
uv run pytest
uv run python scripts/train.py

Testing: `pytest`

Write tests where possible and useful
Use `pytest` as the test framework
Cross-package E2E tests go in `tests/`
Package-specific tests go in `packages/<name>/tests/`
Aim for:
- Unit tests for core logic
- Integration tests for package interactions
- Property-based tests for geometry validation (consider `hypothesis`)
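As an illustration of a property-based geometry test, the sketch below uses `hypothesis` to generate many random radii and check two invariants of a spherical-shell volume: it is never negative, and splitting a shell at an intermediate radius conserves total volume. `shell_volume_nm3` is a hypothetical stand-in for a real frame-geo function; the pattern (generate valid inputs, assert invariants) is what carries over.

```python
import math

from hypothesis import given, strategies as st


def shell_volume_nm3(r_inner: float, r_outer: float) -> float:
    """Hypothetical helper: volume of a spherical shell in nm^3."""
    if r_outer < r_inner:
        raise ValueError("r_outer must be >= r_inner")
    return 4.0 / 3.0 * math.pi * (r_outer**3 - r_inner**3)


# Radii in nm; NaN excluded so comparisons are meaningful
radii = st.floats(min_value=0.0, max_value=200.0, allow_nan=False)


@given(radii, radii)
def test_shell_volume_is_non_negative(a, b):
    r_in, r_out = sorted((a, b))
    assert shell_volume_nm3(r_in, r_out) >= 0.0


@given(radii, radii, radii)
def test_shell_volume_is_additive(a, b, c):
    # Splitting a shell at an intermediate radius must conserve total volume
    r1, r2, r3 = sorted((a, b, c))
    whole = shell_volume_nm3(r1, r3)
    parts = shell_volume_nm3(r1, r2) + shell_volume_nm3(r2, r3)
    assert math.isclose(whole, parts, rel_tol=1e-9, abs_tol=1e-6)
```

Placed in `packages/<name>/tests/`, `pytest` collects these like any other test; `hypothesis` shrinks any failing input to a minimal counterexample.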

Common Commands

Unified CLI (`frame` replaces individual package CLIs):

```bash
# Library management
uv run frame library list
uv run frame library show <uuid>
uv run frame library search --tag production

# Experiment management
uv run frame experiment list --model-type vae
uv run frame experiment show <uuid>
uv run frame tensorboard <experiment_uuid>

# Migration
uv run frame migrate output/lnp_5k_10ch --tags production,lnp,10ch

# Geometry generation (frame-geo integration)
uv run frame geo generate config.toml
uv run frame geo list-types

# Twin training (frame-twin integration)
uv run frame twin train-vae config.toml
uv run frame twin train-ddpm config.toml
uv run frame twin generate config.toml
```

Legacy CLIs (still available):

```bash
# Individual package CLIs still work
uv run frame-geo generate config.toml
uv run frame-twin train-vae config.toml
```

Testing:

```bash
# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=frame --cov=frame_geo --cov=frame_twin

# Format/lint (configure in root pyproject.toml)
uv run ruff check .
uv run ruff format .

# Type checking
uv run mypy packages/
```

Common Pitfalls & Gotchas

1. Memory Explosion

Problem: Naive operations on 128³ × 20-channel grids can exhaust RAM
Solution:
- Use views and slicing instead of copying
- Process in batches
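- Use `torch.cuda.empty_cache()` when appropriate (it returns cached, unused GPU memory to the driver; it does not free host RAM or tensors that are still referenced)
- Consider `float16` or quantization for storage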
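A quick budget calculation shows why these precautions matter, and a small check confirms that slicing does not copy. `grid_nbytes` is a hypothetical helper; the calls it relies on (`element_size`, `data_ptr`, `clone`) are standard PyTorch tensor methods.

```python
import torch

MiB = 1024**2


def grid_nbytes(channels: int, n: int, dtype: torch.dtype) -> int:
    """Bytes for one dense (channels, n, n, n) grid stored as `dtype`."""
    itemsize = torch.empty(0, dtype=dtype).element_size()
    return channels * n**3 * itemsize


# Budget for the 128^3 x 20-channel case described above
fp32 = grid_nbytes(20, 128, torch.float32)
fp16 = grid_nbytes(20, 128, torch.float16)
print(f"float32: {fp32 / MiB:.0f} MiB per grid, {16 * fp32 / MiB:.0f} MiB per batch of 16")
print(f"float16: {fp16 / MiB:.0f} MiB per grid")

# Indexing/slicing returns a view that shares storage: no new allocation
grid = torch.zeros(20, 32, 32, 32)  # small grid so the demo itself is cheap
channel0 = grid[0]
assert channel0.data_ptr() == grid.data_ptr()

# .clone() allocates a real copy
copied = grid[0].clone()
assert copied.data_ptr() != grid.data_ptr()
```

For a 20 × 128³ grid this works out to 160 MiB per grid in `float32` (2560 MiB for a batch of 16) and 80 MiB in `float16`, before any intermediate activations or copies.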

2. Voxelization Edge Cases

Problem: Continuous geometry → discrete voxels has ambiguity at boundaries
Solution:
- Document rounding/aliasing behavior
- Validate volume conservation
- Use consistent conventions across `frame-geo` and `frame-core`
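One concrete convention, assumed here purely for illustration, is center sampling: a voxel counts as inside if its center lies inside the shape. The sketch measures how well that convention conserves the volume of a sphere on a 1 nm grid. `voxelize_sphere` and `relative_volume_error` are hypothetical helpers; frame-geo's actual hybrid voxelization produces volume fractions and may behave differently.

```python
import math

import torch


def voxelize_sphere(radius_nm: float, n: int, voxel_size_nm: float = 1.0) -> torch.Tensor:
    """Binary occupancy under center sampling, sphere centered in an n^3 box."""
    # Voxel centers sit at (i + 0.5) * dx, shifted so the box center is the origin
    coords = (torch.arange(n, dtype=torch.float64) + 0.5) * voxel_size_nm - n * voxel_size_nm / 2
    x, y, z = torch.meshgrid(coords, coords, coords, indexing="ij")
    return (x**2 + y**2 + z**2 <= radius_nm**2).to(torch.float32)


def relative_volume_error(radius_nm: float, n: int, voxel_size_nm: float = 1.0) -> float:
    """|voxelized volume - analytic volume| / analytic volume."""
    occupancy = voxelize_sphere(radius_nm, n, voxel_size_nm)
    voxel_volume = occupancy.sum().item() * voxel_size_nm**3
    exact = 4.0 / 3.0 * math.pi * radius_nm**3
    return abs(voxel_volume - exact) / exact


for r in (5.0, 10.0, 20.0):
    print(f"r = {r:4.1f} nm: relative volume error = {relative_volume_error(r, 64):.3%}")
```

Relative error generally shrinks as the radius grows relative to the voxel size, so thin shells (a few nm thick) are where a documented convention and a volume-conservation check matter most.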

3. Physical Validation

Problem: Generated structures may violate physical constraints
Solution:
- Implement validators in `frame-geo` before voxelization
- Make validation system flexible and extensible
- Log validation failures for debugging
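A minimal sketch of a flexible, extensible validator system: validators register themselves under a name, every registered check runs on each candidate structure, and failures are logged. The registry, `register`, `validate`, the logger name, and both example checks (including the `box_nm` parameter) are hypothetical and do not mirror frame-geo's actual 7 validators.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("frame_geo.validation")  # hypothetical logger name

# A validator inspects one structure's parameters and returns an error message, or None if valid
Validator = Callable[[dict], Optional[str]]
VALIDATORS: Dict[str, Validator] = {}


def register(name: str) -> Callable[[Validator], Validator]:
    """Decorator that adds a validator to the registry; new checks need no core changes."""
    def decorator(fn: Validator) -> Validator:
        VALIDATORS[name] = fn
        return fn
    return decorator


@register("positive_radius")
def positive_radius(params: dict) -> Optional[str]:
    r = params["shell1_radius_nm"]
    return None if r > 0 else f"shell1_radius_nm must be > 0, got {r}"


@register("fits_in_box")
def fits_in_box(params: dict) -> Optional[str]:
    # Assumes shell1_radius_nm is the outer radius and box_nm is the grid edge length
    d, box = 2 * params["shell1_radius_nm"], params["box_nm"]
    return None if d <= box else f"diameter {d} nm exceeds box {box} nm"


def validate(params: dict) -> List[str]:
    """Run every registered validator, log each failure, and return the failing names."""
    failed = []
    for name, check in VALIDATORS.items():
        message = check(params)
        if message is not None:
            logger.warning("validator %r rejected structure: %s", name, message)
            failed.append(name)
    return failed
```

Returning the names of all failing checks (rather than stopping at the first) makes it easy to aggregate rejection statistics per validator.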

4. Reproducibility

Problem: Stochastic sampling and neural network training need to be reproducible
Solution:
- Always accept and respect random seeds
- Document random number generator state
- Pin PyTorch CUDA/cuDNN settings for determinism when needed
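A common way to cover all three points is a single seeding helper called at the start of every run. `seed_everything` is a hypothetical name; the calls inside (`torch.manual_seed`, the cuDNN flags, `torch.use_deterministic_algorithms`, `numpy.random.default_rng`) are standard PyTorch and NumPy APIs. Deterministic mode is opt-in because it can slow training.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int, deterministic: bool = False) -> np.random.Generator:
    """Seed Python, NumPy and PyTorch; optionally force deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)     # legacy global NumPy state, for code that still uses it
    torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    if deterministic:
        # Needed by cuBLAS for deterministic results on CUDA >= 10.2;
        # must be set before the first cuBLAS call
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)
    # Prefer passing an explicit Generator around over relying on global state
    return np.random.default_rng(seed)


# Same seed -> same random streams
rng = seed_everything(42)
a_torch, a_np = torch.rand(3), rng.random(3)
rng = seed_everything(42)
b_torch, b_np = torch.rand(3), rng.random(3)
```

Recording the seed (and the `deterministic` flag) in the experiment config is what lets a later run reproduce the same random streams.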

5. Cross-Package Dependencies

Problem: Circular dependencies or tight coupling between packages
Solution:
- `frame-core` has no dependencies on `frame-geo` or `frame-twin` ✅
- `frame-geo` uses PyTorch tensors compatible with `frame-core` VoxelGrid ✅
- `frame-twin` will depend on `frame-core` for data models (when implemented)
- Keep interfaces clean and minimal ✅

Current Dependency Graph:

```text
frame-core (standalone)
    ↑
    │ (uses compatible tensor format)
    │
frame-geo (PyTorch, PyMC, Zarr, PyVista)
    ↑
    │ (will consume voxel libraries)
    │
frame-twin (not yet implemented)
```
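The rule that `frame-core` stays standalone can be enforced mechanically. A minimal sketch using the standard-library `ast` module finds absolute imports of `frame_geo` or `frame_twin` in source text; `forbidden_imports` and `check_package` are hypothetical helpers, and the source directory you would pass to `check_package` depends on the package layout.

```python
import ast
from pathlib import Path
from typing import Iterable, List

FORBIDDEN = ("frame_geo", "frame_twin")


def forbidden_imports(source: str, forbidden: Iterable[str] = FORBIDDEN) -> List[str]:
    """Return imported module names in `source` whose top-level package is forbidden."""
    forbidden = set(forbidden)
    found = []
    # ast.walk also visits imports nested inside functions (lazy imports)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # relative imports (level > 0) are skipped
        else:
            continue
        found += [name for name in names if name.split(".")[0] in forbidden]
    return found


def check_package(src_dir: Path) -> dict:
    """Map each .py file under src_dir to its forbidden imports (clean files omitted)."""
    report = {}
    for path in sorted(src_dir.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Wrapped in a test that asserts `check_package(...) == {}` for the `frame-core` sources, this turns the ✅ above into something CI can verify.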

Domain Knowledge

Lipid Nanoparticles (LNPs)

Self-assembled structures with complex internal organization
Multiple components (lipids, nucleic acids, etc.) → multiple channels
Nanometer-scale structures (hence 1 nm³ voxels)
Subject to physical constraints (volume conservation, packing, etc.)

Measurement Techniques

SAXS/SANS: Small-angle scattering (X-ray/neutron)
- Probes structure in reciprocal space
- Sensitive to electron/neutron density contrasts
Cryo-EM: Cryo-electron microscopy
- Real-space imaging
- 2D projections of 3D structures
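To make the reciprocal-space vs real-space distinction concrete, a toy sketch: the analytic form factor of a homogeneous sphere (a standard textbook SAXS/SANS result) and a cryo-EM-style projection obtained by summing a 3D density along one axis. These are illustrations only, not FRAME's (not yet implemented) virtual instruments; real cryo-EM simulation also involves the CTF and noise, and the 50 nm radius is an arbitrary example value.

```python
import torch


def sphere_form_factor(q: torch.Tensor, radius: float) -> torch.Tensor:
    """Normalized form factor P(q) of a homogeneous sphere (P -> 1 as q -> 0); q > 0 required."""
    qr = q * radius
    amplitude = 3.0 * (torch.sin(qr) - qr * torch.cos(qr)) / qr**3
    return amplitude**2


def project_density(density: torch.Tensor, axis: int = 0) -> torch.Tensor:
    """Cryo-EM-like projection: integrate a 3D density along one axis (no CTF, no noise)."""
    return density.sum(dim=axis)


# Reciprocal space: scattering intensity vs q (nm^-1) for a 50 nm sphere
q = torch.linspace(0.01, 0.5, 200, dtype=torch.float64)
intensity = sphere_form_factor(q, radius=50.0)

# Real space: a 2D image from a toy 3D density
density = torch.rand(64, 64, 64, dtype=torch.float64)
image = project_density(density)
```

The first zero of P(q) falls at qR ≈ 4.493, so larger particles push scattering features to lower q; this inverse relation between size and q is what "reciprocal space" refers to.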

Diffusion Models (DDPM)

Generative models that learn to denoise data
Training: Add noise progressively, learn to reverse the process
Inference: Start from noise, iteratively denoise
Conditioning: Guide generation with experimental data or constraints
VAE: Compresses high-dim voxel grids to latent space for efficiency
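In the standard DDPM formulation (Ho et al., 2020), "add noise progressively" has a closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the cumulative product of (1 − β_s) up to step t. A minimal sketch with that paper's linear β schedule; the latent shape, schedule values, and function names are assumptions and may not match frame-twin's configuration.

```python
import torch


def linear_beta_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> torch.Tensor:
    """Linear variance schedule as in the original DDPM paper."""
    return torch.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: torch.Tensor, t: torch.Tensor, alphas_bar: torch.Tensor,
             noise: torch.Tensor) -> torch.Tensor:
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    # Reshape per-sample coefficients so they broadcast over all non-batch dims
    abar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise


betas = linear_beta_schedule()
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# Inputs for one training step: a batch of toy latents and random timesteps
x0 = torch.randn(4, 8, 16, 16, 16)  # stand-in for VAE latents (shape is an assumption)
t = torch.randint(0, len(betas), (x0.shape[0],))
noise = torch.randn_like(x0)
x_t = q_sample(x0, t, alphas_bar, noise)
# A denoiser eps_theta(x_t, t, cond) is then trained to predict `noise` (MSE loss)
```

Running the forward process on VAE latents rather than raw 128³ voxel grids is what makes the latent-diffusion setup tractable.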

Quick Reference

File Locations

Workspace config: `frame/pyproject.toml`
Package configs: `frame/packages/*/pyproject.toml`
Tests: `frame/tests/` (E2E), `frame/packages/*/tests/` (unit)
Docs: `frame/docs/`
Scratch work: `frame/dev/` (gitignored)

Key Dependencies (Current)

frame-core:

`torch>=2.0` - PyTorch tensors for voxel grids
`numpy>=1.24` - array operations
`zarr>=2.16` - efficient array storage
`napari>=0.4` - interactive visualization
`pyvista>=0.42` - 3D rendering
`matplotlib>=3.7` - plotting
`pandas>=2.0` - parameter tables
`pytest>=8.4` - testing

frame-geo:

`torch>=2.0` - PyTorch tensors
`pymc>=5.0` - probabilistic modeling
`pytensor>=2.0` - PyMC backend
`numpy>=1.24` - numerical operations
`zarr>=2.16` - storage
`pyvista>=0.42` - 3D visualization
`matplotlib>=3.7` - 2D cross-sections
`pandas>=2.0` - parameter export
`tomli>=2.0` - TOML parsing
`tqdm>=4.65` - progress bars
`pytest>=8.4` - testing

Future (for frame-twin and virtual instruments):

`diffusers` - diffusion model components
`scipy` - scattering calculations
Domain-specific libraries (SAXS/SANS/cryo-EM)

Typical Workflow

Currently Implemented (Steps 1-2):

✅ Generate cartoons: `uv run frame-geo generate config.toml`
- Samples from PyMC priors
- Constructs parametric LNP structures
- Validates physical constraints
- Voxelizes to multi-channel grids
- Saves to Zarr format

✅ Load and visualize:

```python
from frame_core.storage import VoxelLibrary

lib = VoxelLibrary("output/lnp_example/voxels.zarr")
grid = lib[0]  # Load first structure
```

Future Steps (Not Yet Implemented):

3. 🚧 Train twin: `frame-twin`

frame-core: Data Models and Storage

```python
import torch

from frame_core.voxel_grid import VoxelGrid
from frame_core.storage import VoxelLibrary
from frame_core.dataset import VoxelDataset

# Create a voxel grid
data = torch.zeros(10, 128, 128, 128)  # 10 channels
grid = VoxelGrid(
    data=data,
    voxel_size=1.0,  # nanometers
    channels={
        'shell1_head': 0,
        'shell1_tail': 1,
        # ... one entry per channel
    },
    metadata={'structure_id': 'lnp_001'},
)

# Open a voxel library
library = VoxelLibrary("path/to/library.zarr", mode='r')
n_structures = len(library)
single_grid = library[0]  # Lazy load

# Filter by parameters
filtered = library.filter(lambda p: p['shell1_radius_nm'] > 50.0)

# PyTorch dataset for training
dataset = VoxelDataset(library, device='cuda')
loader = torch.utils.data.DataLoader(dataset, batch_size=16)
```
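The lazy-loading pattern behind `library[0]` and a training dataset can be sketched with a plain `torch.utils.data.Dataset`: only the requested sample is read in `__getitem__`, so memory scales with the batch rather than the whole library. `LazyGridDataset` and `fake_load` are hypothetical; the real `VoxelDataset` internals may differ.

```python
from typing import Callable, Sequence

import torch
from torch.utils.data import DataLoader, Dataset


class LazyGridDataset(Dataset):
    """Hypothetical sketch: loads one grid per __getitem__ instead of the whole library."""

    def __init__(self, keys: Sequence[str], load_fn: Callable[[str], torch.Tensor]):
        self.keys = list(keys)
        self.load_fn = load_fn  # e.g. reads one array from a Zarr store

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Only this sample is materialized; with num_workers > 0,
        # DataLoader worker processes call this in parallel
        return self.load_fn(self.keys[idx])


def fake_load(key: str) -> torch.Tensor:
    """Stand-in loader; a real one would read the grid from disk here."""
    return torch.full((10, 16, 16, 16), float(len(key)))


dataset = LazyGridDataset([f"lnp_{i:03d}" for i in range(8)], fake_load)
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch = next(iter(loader))  # default collation stacks samples along a new batch dim
```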

frame-geo: Structure Generation

```bash
# 1. Create a configuration file (see examples/lnp_example_config.toml)

# 2. Generate structures
uv run frame-geo generate config.toml

# 3. Check output
ls output/lnp_example/
#   structures.zarr/       - Parametric representations
#   voxels.zarr/           - Voxelized grids
#   parameters.csv         - All sampled parameters
#   statistics.json        - Summary statistics
#   validation_log.json    - Rejection tracking
```

Python API:

```python
from frame_geo import generate_from_config
from frame_geo.config import FrameGeoConfig
from frame_geo.generator import StructureGenerator

# High-level API
generate_from_config("config.toml")

# Or use the advanced API for finer control
config = FrameGeoConfig.from_toml("config.toml")
config.validate()
generator = StructureGenerator(config)
generator.generate_batch()

# Access statistics
stats = generator.validation_stats
print(f"Accepted: {stats['total_accepted']}")
print(f"Rejected: {stats['total_attempts'] - stats['total_accepted']}")
```

Configuration Example (TOML):

```toml
[metadata]
random_seed = 42

[structure]
type = "lnp"

[grid]
nx = 128
ny = 128
nz = 128
dx_nm = 1.0
dy_nm = 1.0
dz_nm = 1.0

[generation]
num_samples = 1000

[output]
base_path = "./output/my_lnps"
mode = "overwrite"

[priors.shell1_radius_nm]
distribution = "Uniform"
lower = 40.0
upper = 80.0

# ... more priors ...

[voxelization.channels]
shell1_head = 0
shell1_tail = 1
# ... more channels ...
```
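How these sections fit together can be shown with a toy reader: it parses the config with `tomllib` (falling back to the `tomli` backport), derives the physical box size from `[grid]`, and draws from `Uniform` priors using a generator seeded from `[metadata].random_seed`. This is a sketch only: frame-geo samples priors with PyMC, and `box_size_nm` / `sample_uniform_priors` are hypothetical helpers.

```python
import random

try:
    import tomllib  # Python >= 3.11
except ModuleNotFoundError:  # older Pythons: the tomli backport
    import tomli as tomllib

EXAMPLE = """
[metadata]
random_seed = 42

[grid]
nx = 128
ny = 128
nz = 128
dx_nm = 1.0
dy_nm = 1.0
dz_nm = 1.0

[priors.shell1_radius_nm]
distribution = "Uniform"
lower = 40.0
upper = 80.0
"""


def box_size_nm(cfg: dict) -> tuple:
    """Physical box edge lengths (nm) = voxel count x voxel size, per axis."""
    g = cfg["grid"]
    return (g["nx"] * g["dx_nm"], g["ny"] * g["dy_nm"], g["nz"] * g["dz_nm"])


def sample_uniform_priors(cfg: dict, num_samples: int) -> list:
    """Toy prior sampler: Uniform priors only, seeded from [metadata].random_seed."""
    rng = random.Random(cfg["metadata"]["random_seed"])
    samples = []
    for _ in range(num_samples):
        draw = {}
        for name, prior in cfg["priors"].items():
            if prior["distribution"] != "Uniform":
                raise NotImplementedError(prior["distribution"])
            draw[name] = rng.uniform(prior["lower"], prior["upper"])
        samples.append(draw)
    return samples


cfg = tomllib.loads(EXAMPLE)
box = box_size_nm(cfg)
draws = sample_uniform_priors(cfg, num_samples=5)
```

With this example the box is 128 nm on a side (64 nm half-width), so if `shell1_radius_nm` is an outer radius (an assumption), draws above 64 nm describe structures that cannot fit in the grid; a cross-check of this kind is cheap to run before generation.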

End-to-End Example

```bash
# 1. Generate training data
cd /path/to/FRAME
uv run frame-geo generate packages/frame-geo/examples/lnp_example_config.toml

# 2. Load and inspect in Python
uv run python
```

```python
from frame_core.storage import VoxelLibrary
import pandas as pd
import json

# Load voxelized structures
lib = VoxelLibrary("output/lnp_example/voxels.zarr")
print(f"Generated {len(lib)} structures")

# Load parameters
params = pd.read_csv("output/lnp_example/parameters.csv")
print(params.describe())

# Load statistics
with open("output/lnp_example/statistics.json") as f:
    stats = json.load(f)
print(f"Mean shell1 radius: {stats['shell1_radius_nm']['mean']:.2f} nm")

# Visualize a structure
grid = lib[0]
print(f"Grid shape: {grid.shape}")
print(f"Channels: {grid.n_channels}")
print(f"Physical size: {grid.physical_size} nm")
```

Testing

```bash
# Run all tests
uv run pytest

# Run package-specific tests
uv run pytest packages/frame-core/tests/ -v
uv run pytest packages/frame-geo/tests/ -v

# With coverage
uv run pytest --cov=frame_core --cov=frame_geo --cov-report=html
```

What's Next: frame-twin

The next package to implement will be `frame-twin`, which will:

Load voxel libraries generated by `frame-geo`
Train a latent diffusion model (VAE + DDPM)
Generate new structures conditioned on experimental data
Interface with virtual instruments for refinement

When in Doubt

Check performance impact - memory and compute matter
Use PyTorch - tensors, not raw numpy arrays
Use `uv` - for all dependency and environment operations
Write tests - especially for geometry validation and data I/O
Document units - nanometers, channels, physical constraints
Ask about physics - when structural constraints are unclear

Package Implementation Summary

| Package | Status | Lines of Code | Tests | Key Features |
|---|---|---|---|---|
| frame-core | ✅ Complete | ~3500 | 0 | VoxelGrid, VoxelLibrary, LibraryManager, ExperimentManager, unified CLI, migration |
| frame-geo | ✅ Complete | ~2000 | 25 | LNP generator, PyMC priors, 7 validators, hybrid voxelization, CLI integration |
| frame-twin | ✅ Complete | ~1500 | 3 | VAE+DDPM models, 3 conditioning strategies, training infrastructure, CLI integration |
| Virtual Instruments | 🚧 Pending | 0 | 0 | Awaiting implementation |

Current Capabilities:

✅ Data Management: UUID-based library and experiment tracking
✅ Unified CLI: Central `frame` command integrating all packages
✅ Library Management: Create, list, search, and tag data libraries
✅ Experiment Tracking: Track training experiments, checkpoints, and dependencies
✅ Migration Tools: Automatic migration of legacy data with validation
✅ Generate synthetic LNP structures from statistical priors
✅ Validate physical constraints with 7 validators
✅ Voxelize to multi-channel 3D grids (volume fractions)
✅ Store efficiently in Zarr format with UUID tracking
✅ Visualize structures (napari, PyVista)
✅ PyTorch dataset integration for training
✅ Train VAE models for voxel compression
✅ Train DDPM models with parameter conditioning
✅ Generate new structures with partial parameter specification
✅ TensorBoard integration for monitoring training

Next Steps:

🚧 Add virtual instrument packages (SAXS, SANS, cryo-EM)
🚧 Implement refinement algorithms
🚧 Add comprehensive test coverage for frame-core and frame-twin
🚧 Complete migration of lnp_5k_10ch data

Last Updated: 2025-10-13
Workspace Manager: `uv`
Primary Framework: PyTorch
Primary CLI: `uv run frame` (unified command)
Initial Target: Lipid nanoparticles × (SAXS, SANS, cryo-EM)
Implementation Progress: 3/4 core packages complete (75%)
Note: `frame-voxel` has been replaced by `frame-core` with expanded functionality
