# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## On Startup (DO THIS FIRST)
**⚠️ ALWAYS read the latest journal entry before doing anything else:**
```bash
ls -t journal/*.md | head -1 | xargs cat
```
This gives you context about recent work, decisions made, and what's in progress.
## Current Version
**ssql v4 is the current major version.** Always use the `/v4` module path:
```bash
# Install the CLI
go install github.com/rosscartlidge/ssql/v4/cmd/ssql@latest
# Import in Go code
import "github.com/rosscartlidge/ssql/v4"
```
## Repository Hygiene (CRITICAL)
**⚠️ IMPORTANT: Keep the root directory clean!**
**Test Programs and Experiments:**
- **NEVER** build test programs in the root directory
- **ALWAYS** use `/tmp/` for temporary test programs
- **Example:**
```bash
# ✅ CORRECT - build in /tmp
cat > /tmp/test_feature.go << 'EOF'
package main
...
EOF
go run /tmp/test_feature.go
# ❌ WRONG - don't build in root
cat > test_feature.go << 'EOF'
...
EOF
go run test_feature.go # Creates binary in root!
```
**Documentation:**
- **NEVER** create documentation files in the root directory
- **ALWAYS** put research docs in `doc/research/`
- **ALWAYS** put archived docs in `doc/archive/`
- **Example:**
```bash
# ✅ CORRECT - docs in proper location
cat > doc/research/new-feature-analysis.md << 'EOF'
...
EOF
# ❌ WRONG - don't create docs in root
cat > NEW-FEATURE-ANALYSIS.md << 'EOF' # NO!
...
EOF
```
**What Belongs in Root:**
- Core library source: `*.go` (chart.go, core.go, io.go, operations.go, sql.go)
- Core tests: `*_test.go`
- Essential docs: `README.md`, `CHANGELOG.md` only
- Build files: `go.mod`, `go.sum`, `Makefile`, `.gitignore`
## Development Journal (CRITICAL)
**⚠️ IMPORTANT: Maintain weekly journal entries in `journal/`**
The journal tracks development work for continuity across sessions.
**On session startup:** Read the latest journal file to understand recent work:
```bash
ls -t journal/*.md | head -1 | xargs cat
```
This provides context about what was done in previous sessions, decisions made, and work in progress.
**File naming:** `journal/YYYY-WNN.md` (e.g., `2026-W04.md` for week 4 of 2026)
**When to update:**
- At the end of each work session
- When completing significant tasks
- When making commits
**What to record:**
```markdown
## YYYY-MM-DD (Day)
### Brief Description of Work
- Files modified
- Issues found and how they were resolved
- Commits made (hash and brief message)
- Decisions or learnings worth noting
```
**Example entry:**
```markdown
## 2026-01-23 (Thursday)
### Documentation Verification and Fixes
Tested CLI examples and fixed outdated references.
**Files modified:**
- doc/cli-codelab.md - removed non-existent -schema flag
- doc/advanced-tutorial.md - fixed SetField -> SetImmutable
**Commits:**
- `36ba82f` - docs: fix incorrect examples in CLI and advanced tutorial docs
```
**At start of new week:** Create a new file for the current week.
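For example, a minimal way to start the current week's file (a sketch, assuming GNU `date` with ISO week support; adjust the header template as needed):
```bash
# Create this week's journal file (e.g. journal/2026-W04.md) if it doesn't exist yet
week=$(date +%G-W%V)
[ -f "journal/${week}.md" ] || printf '# %s\n' "$week" > "journal/${week}.md"
```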
**Why this matters:** Provides context for future sessions about recent work, decisions made, and issues encountered.
**Compiled Binaries:**
- The `.gitignore` prevents compiled examples from being committed
- But still avoid creating them - use `/tmp/` for test programs
- Main `ssql` binary is built in root but ignored by git
## Documentation Maintenance (CRITICAL)
**⚠️ IMPORTANT: Keep documentation in sync with API and CLI changes!**
When making changes to the library API or CLI commands, you MUST also update the relevant documentation:
**Documentation files that must stay in sync:**
- `README.md` - Main library documentation, examples, and installation instructions
- `doc/api-reference.md` - Complete API reference with examples
- `doc/cli-codelab.md` - CLI tutorial with command examples
- `doc/cli-debugging.md` - CLI debugging examples
- `doc/cli-troubleshooting.md` - Common issues and solutions
- `doc/EXPRESSIONS.md` - Expression language documentation (user-facing)
- `doc/ai-code-generation.md` - AI code generation examples
- `doc/ai-human-guide.md` - Human-AI collaboration guide
**Research documents (internal reference):**
- `doc/research/expr-lang-reference.md` - Comprehensive expr-lang v1.17 reference (compile-time type checking, all functions, ssql integration patterns)
- `doc/research/jsonl-schema-header.md` - Design for JSONL schema headers and pipeline field completion
**What to update when changing:**
- **Module path changes (v2 → v3)**: Update all import statements and `go get` commands
- **CLI command changes**: Update command names, flags, and examples in CLI docs
- **API signature changes**: Update function signatures and examples in api-reference.md
- **New features**: Add documentation and examples
**Validation:**
- Run `make doc-check` to validate documentation (Level 1: fast checks)
- Run `make doc-test` to test code examples compile (Level 2: medium checks)
- Run `make doc-verify` for comprehensive verification (Level 3: deep checks)
- All three levels must pass before releasing
**Periodic documentation review:**
- Every 2-3 minor releases, run `make doc-verify` and ensure it passes with zero warnings
- If new exported functions/types cause warnings, add them to the exclusion list in `scripts/doc-test.sh` or document them in the LLM guides
- If cross-reference checks fail, update the module paths or negative-example lists in `scripts/doc-verify.sh`
- Also review ALL docs in `doc/` for:
- Outdated import paths (e.g., missing `/v4` suffix)
- Missing new features (Signal Processing, Arrow I/O, new commands)
- Old API patterns or command syntax
- Broken cross-references after file moves
- Files to review: `doc/*.md`, `README.md`, `CLAUDE.md`
- Last full review: v4.11.0 (January 2026)
**Common mistakes to avoid:**
- ❌ Changing API without updating doc/api-reference.md
- ❌ Changing CLI commands without updating doc/cli-*.md
- ❌ Using old import paths (`ssql/v2` instead of `ssql/v3`)
- ❌ Using old command names (`read-csv` instead of `from`, `write-csv` instead of `to csv`)
- ❌ Using old flag names (`-match` instead of `-where`, `-expr` instead of `-where-expr`)
## Development Principles (CRITICAL)
### If It's Not Tested, It Will Break
**⚠️ Features without tests will eventually be removed or broken during refactoring.**
This was learned the hard way when field/value completion was accidentally removed in v3.2.0 during a refactor. The feature worked, but had no test coverage, so when code was reorganized the completion configuration was lost.
**Rules:**
- ✅ Add tests for any feature you want to keep
- ✅ Tests act as documentation of expected behavior
- ✅ Tests catch accidental removal during refactoring
- ❌ Don't assume "obvious" features will survive refactoring
**Example - Completion Configuration Test:**
```go
// TestFieldCompletionConfiguration verifies that all commands that accept field names
// have proper field completion configured (FieldsFromFlag) instead of NoCompleter.
// This test prevents regression where field completion is accidentally removed.
func TestFieldCompletionConfiguration(t *testing.T) {
    // ... verifies FieldCompleter is used, not NoCompleter
}
```
### Compile-Time Type Safety Over Runtime
**⚠️ ALWAYS prefer compile-time type safety over runtime validation.**
ssql is built on Go's type system and generics (Go 1.23+). Type errors should be caught at compile time, not runtime.
**Core Principle:**
- ✅ Use generics and type constraints to enforce correctness at compile time
- ✅ Use sealed interfaces to prevent invalid type construction
- ✅ Leverage the type system to make invalid states unrepresentable
- ❌ Avoid runtime type checking and panics
- ❌ Never bypass type constraints with `any` or reflection
**Examples:**
**✅ GOOD - Compile-time safety with generics:**
```go
// AggregateResult sealed interface - can only be created by AggResult[V Value]
type AggregateResult interface {
    getValue() any
    sealed() // Prevents external implementations
}

type AggResult[V Value] struct {
    val V
}

// Compiler guarantees V satisfies Value constraint
func Count() AggregateFunc {
    return func(records []Record) AggregateResult {
        return AggResult[int64]{val: int64(len(records))} // ✅ int64 is Value
    }
}
```
**❌ BAD - Runtime validation:**
```go
func Count() AggregateFunc {
    return func(records []Record) any {
        return int64(len(records)) // ❌ Could return anything!
    }
}

// Then need runtime checks:
func setValidated(field string, value any) {
    switch value.(type) {
    case int64, float64, string: // ❌ Runtime checking
        m.fields[field] = value
    default:
        panic("invalid type") // ❌ Panic at runtime
    }
}
```
**Historical Examples:**
1. **v1.22.0 - Sealed Interface for Aggregations:**
- Replaced `AggregateFunc: func([]Record) any` with `func([]Record) AggregateResult`
- Created `AggResult[V Value]` generic wrapper
- Eliminated `setValidated()` runtime validation
- Result: All aggregation type errors caught at compile time
2. **v2.0.0 - Removed SetAny():**
- Removed `SetAny(field string, value any)` entirely
- Enforced use of typed methods: `Int()`, `Float()`, `String()`, etc.
- Updated JSON parsing to use type-safe methods
- Result: Impossible to add invalid types to records
**When Implementing New Features:**
- Ask: "Can the type system prevent this error?"
- Use generic constraints (e.g., `Value`, `OrderedValue`)
- Create sealed interfaces for closed type sets
- Make invalid states unrepresentable
- If you need runtime validation, reconsider the design
**Benefits:**
- Bugs caught during development, not production
- Better IDE support (autocomplete, refactoring)
- Self-documenting code (types show intent)
- Zero runtime overhead for type checking
- More maintainable and refactorable code
### Performance-Critical Code Patterns
**⚠️ When writing code that processes records in a loop, follow these patterns to avoid performance regressions.**
ssql processes millions of records. Small inefficiencies multiply into significant slowdowns. The v4.5.0-v4.6.2 optimization work achieved 4x speedup by applying these principles.
**1. Schema Sharing - The #1 Performance Rule**
Creating a `Schema` involves sorting field names and building an index map. **Never create schemas per-record.**
```go
// ❌ BAD - Creates schema for every record (was 28% of CPU time!)
for row := range csvReader {
    record := MakeMutableRecord()
    for i, value := range row {
        record.fields[headers[i]] = parse(value)
    }
    yield(record.Freeze()) // Freeze() calls NewSchema() - expensive!
}

// ✅ GOOD - Create schema once, share across all records
schema := NewSchema(headers)
fieldIndices := make([]int, len(headers))
for i, h := range headers {
    fieldIndices[i] = schema.Index(h)
}
for row := range csvReader {
    values := make([]any, schema.Width())
    for i, value := range row {
        values[fieldIndices[i]] = parse(value)
    }
    yield(NewRecordFromSchema(schema, values)) // Reuses schema!
}
```
**Result: 43s → 10.4s (4.1x faster) for 14.6M records**
**2. Schema Caching for Variable-Schema Data**
When fields might vary between records (like JSONL without schema header), cache the schema and reuse when fields match:
```go
// ✅ GOOD - Cache schema for consecutive records with same fields
var cachedSchema *Schema
var cachedFields []string
var record Record
for line := range lines {
    mutableRecord := ParseJSONLine(line)
    // Check if we can reuse the cached schema (fieldsMatch compares field sets)
    if cachedSchema != nil && fieldsMatch(mutableRecord, cachedFields) {
        values := make([]any, cachedSchema.Width())
        for i, f := range cachedSchema.fields {
            values[i] = mutableRecord.fields[f]
        }
        record = Record{schema: cachedSchema, values: values}
    } else {
        record = mutableRecord.Freeze() // Creates new schema only when needed
        cachedSchema = record.schema
        cachedFields = cachedSchema.fields
    }
}
```
**3. Buffer Reuse**
Pre-allocate buffers outside loops and reset with slice tricks:
```go
// ❌ BAD - Allocates new buffer for every record
for record := range records {
    buf, _ := json.Marshal(record)
    writer.Write(buf)
}

// ✅ GOOD - Reuse buffer across records
buf := make([]byte, 0, 4096)
for record := range records {
    buf = buf[:0] // Reset to zero length, keep capacity
    buf = record.AppendJSON(buf)
    buf = append(buf, '\n')
    writer.Write(buf)
}
```
**4. Pre-compute Where Possible**
Store computed values in schemas or outside loops:
```go
// Schema stores pre-computed JSON field prefixes
type Schema struct {
    fields       []string
    jsonPrefixes [][]byte // Pre-computed `"field":` for each field
}

// ✅ Computed once in NewSchema(), used millions of times in AppendJSON()
func (r Record) AppendJSON(buf []byte) []byte {
    for i, v := range r.values {
        buf = append(buf, r.schema.jsonPrefixes[i]...) // No string alloc!
        buf = appendJSONValue(buf, v)
    }
    return buf
}
```
**5. Avoid Hidden Double-Work**
Watch for code that does work twice:
```go
// ❌ BAD - Creates TWO schemas per record!
parsed := ParseJSONLine(line)
frozenParsed := parsed.Freeze() // Schema #1
mut := MakeMutableRecord()
for k, v := range frozenParsed.All() {
    mut = setValueWithType(mut, k, v, ft)
}
record := mut.Freeze() // Schema #2 - wasteful!

// ✅ GOOD - Create schema once via caching (see pattern #2)
```
**6. Profile Before Optimizing**
Use CPU profiling to find actual bottlenecks:
```bash
# Generate CPU profile
go test -cpuprofile cpu.prof -bench BenchmarkName
# Analyze with pprof
go tool pprof cpu.prof
(pprof) top10
(pprof) list FunctionName
```
The v4.6.0 fix came from profiling showing 28% CPU in `NewSchema` - not where we expected!
**Performance Checklist for Record-Processing Code:**
- [ ] Is schema created once and shared? (`NewRecordFromSchema`)
- [ ] For variable schemas, is caching implemented?
- [ ] Are buffers pre-allocated and reused?
- [ ] Is there any double-Freeze() or double-schema creation?
- [ ] Have you profiled to verify the optimization works?
**Reference:** See `doc/research/record-performance-optimization.md` for detailed analysis.
## Development Commands
**Building and Running:**
- `go build` - Build the module
- `go run doc/examples/chart_demo.go` - Run the comprehensive chart demo
- `go test` - Run all tests
- `go test -v` - Run tests with verbose output
- `go test -run TestSpecificFunction` - Run specific test
- `go fmt ./...` - Format all Go code
- `go vet ./...` - Run Go vet for static analysis
- `go mod tidy` - Clean up module dependencies
**Testing:**
- Tests are in `*_test.go` files using standard Go testing
- Main test files: `example_test.go`, `chart_demo_test.go`, `benchmark_test.go`
- No custom test runners or frameworks - use standard `go test`
- **Testing examples:** `go test -v -tags examples` - builds each example file individually to verify they compile
**Git Operations:**
- `git remote -v` - Show remote repository configuration
- `git fetch --dry-run` - Test GitHub connection without fetching
- `git push` - Push commits to GitHub
- `git push --tags` - Push tags to GitHub
## Release Process
**⚠️ CRITICAL: Version is manually maintained in version.txt**
Version is stored in `cmd/ssql/version/version.txt` and MUST be updated before creating tags.
**Correct Release Workflow (CRITICAL - Follow Exact Order):**
```bash
# 1. Make all code changes and commit them
git add .
git commit -m "Description of changes"
# 2. Update version.txt (WITHOUT "v" prefix)
echo "X.Y.Z" > cmd/ssql/version/version.txt
# 3. Commit the version change
git add cmd/ssql/version/version.txt
git commit -m "Bump version to vX.Y.Z"
# 4. Create annotated tag (WITH "v" prefix)
git tag -a vX.Y.Z -m "Release notes..."
# 5. Push everything
git push && git push --tags
# 6. Build and push debian packages
# Standard package
mkdir -p /tmp/ssql-deb/DEBIAN /tmp/ssql-deb/usr/bin
go build -o /tmp/ssql-deb/usr/bin/ssql ./cmd/ssql
cat > /tmp/ssql-deb/DEBIAN/control << EOF
Package: ssql
Version: X.Y.Z
Section: utils
Priority: optional
Architecture: amd64
Depends: libc6
Maintainer: Ross Cartlidge <[email protected]>
Description: Unix-style data processing tools
Homepage: https://github.com/rosscartlidge/ssql
EOF
dpkg-deb --build /tmp/ssql-deb ssql_X.Y.Z_amd64.deb
# GPU package (if libssqlgpu.so exists)
mkdir -p /tmp/ssql-gpu-deb/DEBIAN /tmp/ssql-gpu-deb/usr/bin /tmp/ssql-gpu-deb/usr/lib
CGO_ENABLED=1 go build -tags gpu -o /tmp/ssql-gpu-deb/usr/bin/ssql ./cmd/ssql
cp gpu/libssqlgpu.so /tmp/ssql-gpu-deb/usr/lib/
# Create control file with libcudart dependency, postinst/postrm for ldconfig
dpkg-deb --build /tmp/ssql-gpu-deb ssql-gpu_X.Y.Z_amd64.deb
# Remove old packages, add new ones, update README URLs
rm ssql_OLD.deb ssql-gpu_OLD.deb
git add ssql_X.Y.Z_amd64.deb ssql-gpu_X.Y.Z_amd64.deb README.md
git commit -m "release: add ssql vX.Y.Z debian packages"
git push
# 7. CRITICAL: Verify go.mod has NO replace directive
cat go.mod # Should NOT contain "replace" line
# 8. Verify install works from GitHub
GOPROXY=direct go install github.com/rosscartlidge/ssql/cmd/ssql@vX.Y.Z
ssql version # Should show: ssql vX.Y.Z
```
**⚠️ CRITICAL:**
- **version.txt format**: Store WITHOUT "v" prefix (e.g., `1.2.0` not `v1.2.0`)
- **git tag format**: Use WITH "v" prefix (e.g., `v1.2.0`)
- **autocli adds "v"**: `.Version()` automatically adds "v" prefix to display
- **No replace directive**: `go.mod` must NOT contain `replace` line (breaks `go install`)
- **Annotated tags only**: Use `git tag -a vX.Y.Z -m "..."` not `git tag vX.Y.Z`
- **Test install**: Always verify with `GOPROXY=direct go install` before announcing release
- **Debian packages**: Always build and push updated `.deb` packages for minor/major releases
- **Major version bumps**: Only bump major version (e.g., v4 → v5) when explicitly requested by the user. Major bumps require updating the module path (`/v4` → `/v5`) throughout the codebase. Use minor/patch versions for most releases.
**How It Works:**
- Version stored in `cmd/ssql/version/version.txt` (plain text, without "v")
- Embedded in binary via `//go:embed version.txt` in `cmd/ssql/version/version.go`
- autocli `.Version()` method adds "v" prefix automatically
- `ssql version` subcommand shows: "ssql vX.Y.Z"
- `ssql -help` header shows: "ssql vX.Y.Z - Unix-style data processing tools"
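The embedding itself is standard Go. A minimal sketch of what `cmd/ssql/version/version.go` might look like (only the `//go:embed version.txt` mechanism is documented above; the exported function name here is an assumption):
```go
// Hypothetical sketch of cmd/ssql/version/version.go; the exported name is illustrative.
package version

import (
    _ "embed"
    "strings"
)

//go:embed version.txt
var raw string

// Version returns the contents of version.txt without the "v" prefix (e.g. "1.2.0").
func Version() string {
    return strings.TrimSpace(raw)
}
```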
**Common Mistakes:**
- ❌ Including "v" in version.txt → Results in "vvX.Y.Z" display
- ❌ Having `replace` directive in go.mod → `go install` fails with error
- ❌ Using lightweight tags → Use annotated tags with `-a` flag
- ❌ Not testing install → Release may be broken for users
**Testing a Release:**
```bash
# After pushing tag, test from a different directory:
cd /tmp
GOPROXY=direct go install github.com/rosscartlidge/ssql/cmd/ssql@latest
ssql version # Should show correct version
ssql -help # Should work without errors
```
## Project History
**ssql v4.0.0 (December 2025):** Enhanced join command with multi-clause lookup support
- **Breaking Changes:**
- `join` command: `-on FIELD` (same name both sides) → `-using FIELD`
- `join` command: `-left-field`/`-right-field` removed → `-on LEFT RIGHT` (two args)
- Module path: `github.com/rosscartlidge/ssql/v3` → `github.com/rosscartlidge/ssql/v4`
- **New Features:**
- `-using FIELD`: Join on same field name in both sides (what `-on` used to do)
- `-on LEFT RIGHT`: Join on different field names (replaces `-left-field`/`-right-field`)
- `-as OLD NEW`: Rename fields from right side when bringing them in
- Clause support with `-` separator: Multiple lookups from same file in one pass
- `LookupJoin()` core library function for efficient multi-clause joins
- **Reason**: Enables efficient enrichment from lookup tables without reading the file multiple times
- **Migration**:
```bash
# Old (v3.x)
ssql from users.csv | ssql join orders.jsonl -on user_id
ssql from users.csv | ssql join orders.jsonl -left-field user_id -right-field customer_id
# New (v4.0+)
ssql from users.csv | ssql join orders.jsonl -using user_id
ssql from users.csv | ssql join orders.jsonl -on user_id customer_id
# New multi-clause feature
ssql from data.csv | ssql join <(ssql from kind.csv) \
-on a_kind kind -as kind_name a_kind_name \
- \
-on z_kind kind -as kind_name z_kind_name
```
**ssql v3.1.0 (December 2025):** Stdin-only transform commands (Unix philosophy)
- **Breaking Changes:**
- `where` command: Removed `FILE` parameter - now reads from stdin only
- `update` command: Removed `FILE` parameter - now reads from stdin only
- `chart` command: Removed `FILE` parameter - now reads from stdin only
- `union` command: Removed `-input` parameter - now reads from stdin only
- `join` command: Changed from `-right FILE` to positional `FILE` for right-side file
- **Design Philosophy**:
- Source command (`from`): Read from files, stdin, or command output
- Transform commands (`where`, `update`, etc.): Pure filters - stdin only
- This aligns with Unix philosophy of composable pipeline filters
- **Migration**:
```bash
# Old (v3.0.x)
ssql where FILE data.jsonl -where age gt 18
ssql update FILE data.jsonl -set status done
ssql join FILE left.jsonl -right right.csv -on id
# New (v3.1.0)
ssql from data.csv | ssql where -where age gt 18
ssql from data.csv | ssql update -set status done
ssql from left.csv | ssql join right.csv -on id
```
**ssql v3.0.0 (November 2025):** SQL-aligned flag naming and operator consolidation
- **Breaking Changes:**
- `where` command: `-match` → `-where`, `-expr` → `-where-expr`
- `update` command: `-match` → `-where`, added `-where-expr` flag
- Regex operators: Removed `pattern` and `regexp` aliases, kept only `regex`
- **Reason**: Better SQL alignment (WHERE clause) and reduced confusion from duplicate operator names
- **Migration**: Replace `-match` with `-where` and `-expr` with `-where-expr` in pipelines
- **Example**:
```bash
# Old (v2.x)
ssql where -match age gt 18 -expr 'verified == true'
ssql update -match status eq pending -set status approved
# New (v3.0+)
ssql where -where age gt 18 -where-expr 'verified == true'
ssql update -where status eq pending -set status approved
ssql update -where-expr 'total > 1000' -set-expr discount 'total * 0.1'
```
**ssql v1.14.0 (November 2025):** Renamed from streamv3 to ssql
- **Repository**: `streamv3` → `ssql`
- **Module path**: `github.com/rosscartlidge/streamv3` → `github.com/rosscartlidge/ssql`
- **Package name**: `streamv3` → `ssql` (throughout codebase)
- **CLI command**: `streamv3` → `ssql`
- **Reason**: Shorter, more memorable name that emphasizes SQL-style API design
- **Version**: Could not use v1.0.0 (v1.13.6 existed); started at v1.14.0 to continue sequence
- **Migration**: Update imports from `github.com/rosscartlidge/streamv3` to `github.com/rosscartlidge/ssql`
**Important**: Go's module proxy permanently caches old versions. The old `streamv3` versions (v1.0.0-v1.13.6) remain cached with the old module path. Users must update to the `ssql` module path.
**autocli v3.0.0 (November 2025):** Renamed from completionflags
- **Repository**: `completionflags` → `autocli`
- **Module path**: `github.com/rosscartlidge/completionflags/v2` → `github.com/rosscartlidge/autocli/v3`
- **Reason**: Better reflects comprehensive CLI framework (commands, subcommands, help, completion)
- **Version**: v3.0.0 (major bump for breaking rename)
- **Important**: Always use `/v3` suffix - old cached versions (v1.x, v2.x) have wrong module path
## Architecture Overview
ssql is a modern Go library built on three core abstractions:
**Core Types:**
- `iter.Seq[T]` and `iter.Seq2[T,error]` - Go 1.23+ iterators (lazy sequences)
- `Record` - Encapsulated struct with private fields map (`struct { fields map[string]any }`)
- `MutableRecord` - Efficient record builder with in-place mutation
- `Filter[T,U]` - Composable transformations (`func(iter.Seq[T]) iter.Seq[U]`)
**Key Architecture Files:**
- `core.go` - Core types, Filter functions, Record system, composition functions
- `operations.go` - Stream operations (Map, Where, Reduce, etc.)
- `chart.go` - Interactive Chart.js visualization with Bootstrap 5 UI
- `io.go` - CSV/JSON I/O, command parsing, file operations
- `sql.go` - GROUP BY aggregations and SQL-style operations
**API Design - Functional Composition:**
- **Functional API** - Explicit Filter composition: `Pipe(Where(...), GroupByFields(...), Aggregate(...))`
- Handles all operations including type-changing operations (GroupBy, Aggregate)
- Flexible and composable for complex pipelines
- One clear way to compose operations
**Error Handling:**
- Simple iterators: `iter.Seq[T]`
- Error-aware iterators: `iter.Seq2[T, error]`
- Conversion utilities: `Safe()`, `Unsafe()`, `IgnoreErrors()`
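The underlying idea behind these conversions, shown here as a standalone sketch using only the standard `iter` package (this is not the actual ssql API, just the shape of what a helper like `IgnoreErrors()` encapsulates): an error-aware `iter.Seq2[T, error]` can be collapsed to a plain `iter.Seq[T]` by deciding what to do with the errors.
```go
package main

import (
    "fmt"
    "iter"
)

// dropErrors collapses an error-aware sequence into a plain one by skipping
// elements whose error is non-nil - the general shape behind helpers like IgnoreErrors().
func dropErrors[T any](src iter.Seq2[T, error]) iter.Seq[T] {
    return func(yield func(T) bool) {
        for v, err := range src {
            if err != nil {
                continue // or log/collect, depending on policy
            }
            if !yield(v) {
                return
            }
        }
    }
}

func main() {
    var src iter.Seq2[int, error] = func(yield func(int, error) bool) {
        _ = yield(1, nil) &&
            yield(0, fmt.Errorf("bad row")) &&
            yield(3, nil)
    }
    for v := range dropErrors(src) {
        fmt.Println(v) // prints 1 and 3
    }
}
```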
**Data Visualization:**
- Chart.js integration with interactive HTML output
- Field selection UI, zoom/pan, statistical overlays
- Multiple chart types: line, bar, scatter, pie, radar
- Export formats: PNG, CSV
**Entry Points:**
- `slices.Values(slice)` - Create iterator from slice
- `ReadCSV(filename)` - Parse CSV files returning `iter.Seq[Record]`
- `ExecCommand(cmd, args...)` - Parse command output returning `iter.Seq[Record]`
- `QuickChart(data, x, y, filename)` - Generate interactive charts
## API Naming Conventions (SQL-Style)
ssql uses SQL-like naming instead of functional programming conventions. **Always use these canonical names:**
**Stream Operations (operations.go):**
- **`SelectMany`** - Flattens nested sequences (NOT FlatMap)
- `SelectMany[T, U any](fn func(T) iter.Seq[U]) Filter[T, U]`
- Use for one-to-many transformations (e.g., splitting records)
- **`Where`** - Filters records based on predicate (NOT Filter)
- Note: `Filter[T,U]` is the type name for transformations
- **`Select`** - Projects/transforms fields (similar to Map, but SQL-style)
- **`Update`** - Modifies record fields (convenience wrapper around Select)
- `Update(fn func(MutableRecord) MutableRecord) Filter[Record, Record]`
- Eliminates `ToMutable()` and `Freeze()` boilerplate
- Example: `Update(func(mut MutableRecord) MutableRecord { return mut.String("status", "active") })`
- Equivalent to: `Select(func(r Record) Record { return r.ToMutable().String("status", "active").Freeze() })`
- **`Reduce`** - Aggregates sequence to single value
- **`Take`** - Limits number of records (like SQL LIMIT)
- **`Skip`** - Skips first N records (like SQL OFFSET)
**Aggregation Operations (sql.go):**
- **`GroupByFields`** - Groups and aggregates (SQL GROUP BY)
- **`Aggregate`** - Applies aggregation functions (Count, Sum, Avg, etc.)
**Common Mistakes:**
- ❌ Looking for `FlatMap` → ✅ Use `SelectMany`
- ❌ Using `Filter` as function → ✅ Use `Where` (Filter is a type)
- ❌ Looking for LINQ-style names → ✅ Check operations.go for SQL-style names
When in doubt, check `operations.go` for the canonical API - don't assume LINQ or functional programming naming conventions.
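A hedged sketch of how `SelectMany` composes, assuming the API exactly as quoted above (the `tags`/`tag` field names and the comma-separated encoding are invented for illustration):
```go
package main

import (
    "fmt"
    "iter"
    "slices"
    "strings"

    "github.com/rosscartlidge/ssql/v4"
)

func main() {
    // One input record with a comma-separated "tags" field (illustrative field names).
    in := ssql.MakeMutableRecord().String("name", "svc1").String("tags", "web,db").Freeze()

    // SelectMany: one-to-many expansion into one record per tag.
    splitTags := ssql.SelectMany(func(r ssql.Record) iter.Seq[ssql.Record] {
        return func(yield func(ssql.Record) bool) {
            for _, tag := range strings.Split(ssql.GetOr(r, "tags", ""), ",") {
                if !yield(r.ToMutable().String("tag", tag).Freeze()) {
                    return
                }
            }
        }
    })

    for r := range splitTags(slices.Values([]ssql.Record{in})) {
        fmt.Println(ssql.GetOr(r, "name", ""), ssql.GetOr(r, "tag", ""))
    }
}
```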
## Canonical Numeric Types (Hybrid Approach)
ssql enforces a **hybrid type system** for clarity and consistency:
**Scalar Values - Canonical Types Only:**
- **Integers**: Always use `int64`, never `int`, `int32`, `uint`, etc.
- **Floats**: Always use `float64`, never `float32`
- **Reason**: Eliminates type conversion ambiguity, consistent with CSV auto-parsing
**Sequence Values - Flexible Types:**
- **Sequences**: Allow all numeric types (`iter.Seq[int]`, `iter.Seq[int32]`, `iter.Seq[float32]`, etc.)
- **Reason**: Works naturally with Go's standard library (`slices.Values([]int{...})`)
**Examples:**
```go
// ✅ CORRECT - Canonical scalar types
record := ssql.NewRecord().
    Int("count", int64(42)).                         // int64 required
    Float("price", 99.99).                           // float64 required
    IntSeq("scores", slices.Values([]int{1, 2, 3})). // iter.Seq[int] allowed
    Build()

// ✅ CORRECT - Type conversion when needed
age := int(ssql.GetOr(record, "age", int64(0)))

// ❌ WRONG - Non-canonical scalar types
record := ssql.NewRecord().
    Int("count", 42).               // Won't compile - int not allowed
    Float("price", float32(99.99)). // Won't compile - float32 not allowed
    Build()
```
**CSV Auto-Parsing:**
- CSV reader produces `int64` for integers, `float64` for decimals
- Always use `int64(0)` and `float64(0)` as default values with `GetOr()`
- Example: `age := ssql.GetOr(record, "age", int64(0))`
**Type Conversion:**
- `Get[int64]()` works for string → int64 parsing
- `Get[float64]()` works for string → float64 parsing
- `Get[int]()` will NOT convert from strings (no automatic parsing)
- Users must explicitly convert: `age := int(GetOr(r, "age", int64(0)))`
This hybrid approach balances ergonomics (flexible sequences) with consistency (canonical scalars).
## Record Design - Encapsulated Struct (v1.0+)
**⚠️ BREAKING CHANGE in v1.0:** Record is now an encapsulated struct, not a bare `map[string]any`.
### Record vs MutableRecord
**Record (Immutable):**
- Struct with private `fields map[string]any`
- Immutable - methods return new copies
- Use for function parameters, return values, pipeline data
- Access via `Get()`, `GetOr()`, `.All()` iterator
**MutableRecord (Mutable Builder):**
- Struct with private `fields map[string]any`
- Mutable - methods modify in-place and return self for chaining
- Use for efficient record construction
- Convert to Record via `.Freeze()` (creates copy)
### Creating Records
```go
// ✅ CORRECT - Use MutableRecord builder
record := ssql.MakeMutableRecord().
    String("name", "Alice").
    Int("age", int64(30)).
    Float("salary", 95000.50).
    Bool("active", true).
    Freeze() // Convert to immutable Record

// ✅ CORRECT - From map (for compatibility)
record := ssql.NewRecord(map[string]any{
    "name": "Alice",
    "age":  int64(30),
})

// ❌ WRONG - Can't use struct literal
record := ssql.Record{"name": "Alice"} // Won't compile!

// ❌ WRONG - Can't use make()
record := make(ssql.Record) // Won't compile!
```
### Accessing Record Fields
**Within ssql package:**
```go
// ✅ Can access .fields directly (private field)
for k, v := range record.All() {
    record.fields[k] = v
}

// ✅ Direct field access for internal operations
value := record.fields["name"]
```
**Outside ssql package (CLI commands, tests, user code):**
```go
// ✅ CORRECT - Use Get/GetOr
name := ssql.GetOr(record, "name", "")
age := ssql.GetOr(record, "age", int64(0))

// ✅ CORRECT - Iterate with .All()
for k, v := range record.All() {
    fmt.Printf("%s: %v\n", k, v)
}

// ✅ CORRECT - Build with MutableRecord
mut := ssql.MakeMutableRecord()
mut = mut.String("city", "NYC")     // Chainable
mut = mut.SetAny("field", anyValue) // For unknown types
frozen := mut.Freeze()              // Convert to Record

// ❌ WRONG - Can't access .fields (private!)
value := record.fields["name"] // Compile error!

// ❌ WRONG - Can't index directly
name := record["name"] // Compile error!

// ❌ WRONG - Can't iterate directly
for k, v := range record { // Compile error!
    ...
}
```
### Iterating Over Records
```go
// ✅ CORRECT - Use .All() iterator (maps.All pattern)
for k, v := range record.All() {
    fmt.Printf("%s: %v\n", k, v)
}

// ✅ CORRECT - Use .KeysIter() for keys only
for k := range record.KeysIter() {
    fmt.Println(k)
}

// ✅ CORRECT - Use .Values() for values only
for v := range record.Values() {
    fmt.Println(v)
}

// ❌ WRONG - Can't iterate Record directly
for k, v := range record { // Compile error!
    ...
}
```
### Migration Patterns
**Converting old code to v1.0:**
```go
// OLD (v0.x):
record := make(ssql.Record)
record["name"] = "Alice"
value := record["age"]
for k, v := range record {
    ...
}

// NEW (v1.0+):
record := ssql.MakeMutableRecord()
record = record.String("name", "Alice")
value := ssql.GetOr(record.Freeze(), "age", int64(0))
for k, v := range record.Freeze().All() {
    ...
}
```
**Test code migration:**
```go
// OLD (v0.x):
testData := []ssql.Record{
    {"name": "Alice", "age": int64(30)},
    {"name": "Bob", "age": int64(25)},
}

// NEW (v1.0+):
r1 := ssql.MakeMutableRecord()
r1.fields["name"] = "Alice" // Within ssql package
r1.fields["age"] = int64(30)
r2 := ssql.MakeMutableRecord()
r2.fields["name"] = "Bob"
r2.fields["age"] = int64(25)
testData := []ssql.Record{r1.Freeze(), r2.Freeze()}
```
## Record Field Access (CRITICAL)
**⚠️ ALWAYS use `Get()` or `GetOr()` methods to read fields from Records. NEVER use direct map access or type assertions.**
**Why:**
- Direct access `r["field"]` requires type assertions: `r["field"].(string)` → **panics if field missing or wrong type**
- Type assertions `r["field"].(string)` are unsafe and fragile
- `Get()` and `GetOr()` handle type conversion, missing fields, and type mismatches gracefully
**Correct Field Access:**
```go
// ✅ CORRECT - Use GetOr with appropriate default
name := ssql.GetOr(r, "name", "")             // String field
age := ssql.GetOr(r, "age", int64(0))         // Numeric field
price := ssql.GetOr(r, "price", float64(0.0)) // Float field

// ✅ CORRECT - Use in generated code
strings.Contains(ssql.GetOr(r, "email", ""), "@")
regexp.MustCompile("pattern").MatchString(ssql.GetOr(r, "name", ""))
ssql.GetOr(r, "salary", float64(0)) > 50000
```
**Wrong Field Access:**
```go
// ❌ WRONG - Direct map access with type assertion (WILL PANIC!)
name := r["name"].(string) // Panic if field missing or wrong type
r["email"].(string)        // Panic if field missing
asFloat64(r["price"])      // Don't create helper functions - use GetOr!

// ❌ WRONG - Direct map access in comparisons
r["status"] == "active" // May work, but inconsistent
```
**Code Generation Rules:**
- **String operations**: Always use `ssql.GetOr(r, field, "")` with empty string default
- **Numeric operations**: Always use `ssql.GetOr(r, field, float64(0))` or `int64(0)` default
- **Never generate**: Type assertions like `r[field].(string)`
- **Never generate**: Custom helper functions like `asFloat64()`
**Examples in Generated Code:**
```go
// String operators (contains, startswith, endswith, regexp)
strings.Contains(ssql.GetOr(r, "name", ""), "test")
strings.HasPrefix(ssql.GetOr(r, "email", ""), "admin")
regexp.MustCompile("^[A-Z]").MatchString(ssql.GetOr(r, "code", ""))

// Numeric operators (eq, ne, gt, ge, lt, le)
ssql.GetOr(r, "age", float64(0)) > 18
ssql.GetOr(r, "salary", float64(0)) >= 50000
ssql.GetOr(r, "count", float64(0)) == 42
```
This approach eliminates runtime panics and makes generated code robust and maintainable.
This library emphasizes functional composition with Go 1.23+ iterators while providing comprehensive data visualization capabilities.
## CLI Tools Architecture (autocli v4.0.0+)
ssql CLI uses **autocli v4.0.0+** for native subcommand support with auto-generated help and tab completion. All 14 commands migrated as of v1.2.0. Migrated to autocli v3.0.0 as of ssql v1.13.4, updated to v3.0.1 as of ssql v1.14.1, updated to v3.2.0 for pipeline field caching support, and updated to v4.0.0 for field value completion.
**Architecture Overview:**
- `cmd/ssql/main.go` - All subcommands defined using autocli builder API
- `cmd/ssql/helpers.go` - Shared utilities (comparison operators, aggregation, extractNumeric, chainRecords)
- `cmd/ssql/version/version.txt` - Version string (manually maintained)
- All commands use context-based flag access: `ctx.GlobalFlags` and `ctx.Clauses`
**Version Access:**
- `ssql version` - Dedicated version subcommand (returns "ssql vX.Y.Z")
- `ssql -help` - Shows version in header
- ⚠️ No `-version` flag (autocli doesn't auto-add this)
**CLI Flag Design Principles:**
When designing CLI commands with autocli, follow these principles:
1. **Prefer Named Flags Over Positional Arguments**
- ✅ Use: `-file data.csv` or `-input data.csv`
- ❌ Avoid: `command data.csv` (positional)
- Named flags are self-documenting and enable better tab completion
- Positional arguments can consume arguments intended for other flags
- Exception: Commands with a single, obvious positional argument (e.g., `cd directory`)
2. **Use Multi-Argument Flags Properly**
- For flags with multiple related arguments, use `.Arg()` fluent API:
```go
Flag("-where").
    Arg("field").Completer(cf.NoCompleter{Hint: "<field-name>"}).Done().
    Arg("operator").Completer(&cf.StaticCompleter{Options: operators}).Done().
    Arg("value").Completer(cf.NoCompleter{Hint: "<value>"}).Done().
```
- This enables proper completion for each argument position
- Always provide hints via `NoCompleter{Hint: "..."}` when no completion is available
- Use `StaticCompleter{Options: [...]}` for constrained values
- ❌ Don't use `.String()` and require quoting: `-where "field op value"`
- ✅ Use separate arguments: `-where field op value`
3. **Use `.Accumulate()` for Repeated Flags**
- When a flag can appear multiple times (e.g., `-where age gt 30 -where dept eq Sales`)
- Enables building complex filters with AND/OR logic
- The framework provides a slice of all flag occurrences
4. **Provide Completers for Constrained Arguments**
- Use `StaticCompleter` for known options (operators, commands, etc.)
- Use `FileCompleter` with patterns for file paths
- Improves UX with tab completion
5. **Avoid In-Argument Delimiters (Use Multi-Arg Flags Instead)**
- ❌ Don't parse arguments: `-rename "old:new"` (requires delimiter parsing)
- ✅ Use framework: `-as old new` (framework separates args)
- **Why**: Arguments with delimiters require custom parsing, escaping, and quote handling
- Delimiters fail when values contain the delimiter character
- autocli handles argument separation - leverage it!
- **Example - Field names with special characters:**
```bash
# ❌ BAD - Delimiter approach breaks
ssql rename "url:port:status" # Ambiguous! Which colon is the separator?
ssql rename "file\:path:new_name" # Requires ugly escaping
# ✅ GOOD - Multi-arg approach works naturally
ssql rename -as "url:port" status # No ambiguity!
ssql rename -as "file with spaces" clean # Spaces work fine
ssql rename -as "weird|chars" simple # [TAB>]ny character works
```
- **Implementation:**
```go
// ✅ GOOD - No parsing needed, supports any field name
Flag("-as").
[TAB>]rg("old-field").Completer(cf.NoCompleter{Hint: "<field-name[TAB>]"}).Done().
[TAB>]rg("new-field").Completer(cf.NoCompleter{Hint: "<new-name[TAB>]"}).Done().
[TAB>]ccumulate(). // For multiple renames
// ❌ [TAB>][TAB>]D - Requires custom parsing, breaks on "field:with:colons"
Flag("-rename").
String(). // User must format as "old:new"
[TAB>]ccumulate().
```
6. **Use Brace Expansion for File Completion Patterns**
- ✅ Use brace expansion: `Pattern: "*.{json,jsonl}"` for multiple extensions
- ❌ Don't use comma-separated: `Pattern: "*.json,*.jsonl"` (doesn't work)
- **Why**: FileCompleter expects shell-style glob patterns with brace expansion
- **Examples:**
```go
// ✅ CORRECT - Brace expansion
Flag("FILE").
    String().
    Completer(&cf.FileCompleter{Pattern: "*.{json,jsonl}"}). // Both .json and .jsonl
    Done().
Flag("FILE").
    String().
    Completer(&cf.FileCompleter{Pattern: "*.csv"}). // Single extension
    Done().
Flag("FILE").
    String().
    Completer(&cf.FileCompleter{Pattern: "*.{csv,tsv,txt}"}). // Multiple extensions
    Done().

// ❌ WRONG - Comma-separated doesn't work
Flag("FILE").
    String().
    Completer(&cf.FileCompleter{Pattern: "*.json,*.jsonl"}). // Won't complete!
    Done().
```
7. **Follow Unix Philosophy: Support stdin/stdout for Pipeline Commands**
- **CRITICAL**: All data processing commands MUST support stdin/stdout for Unix pipelines
- Input commands (readers): Optionally read from file OR stdin
- Output commands (writers): Optionally write to file OR stdout (buffered)
- **Why**: Enables composable pipelines and tool chaining
- **Pattern for input:**
```go
// Read from file or stdin
var records iter.Seq[ssql.Record]
if inputFile == "" {
    records = ssql.ReadCSVFromReader(os.Stdin)
} else {
    records, err = ssql.ReadCSV(inputFile)
}
```
- **Pattern for output:**
```go
// Write to file or stdout
if outputFile == "" {
    return ssql.WriteCSVToWriter(records, os.Stdout)
} else {
    return ssql.WriteCSV(records, outputFile)
}
```
- **Consistency examples:**
```bash
# ✅ GOOD - All work with pipelines
ssql from data.csv | ssql where -where age gt 25 | ssql to csv output.csv
ssql from data.csv | ssql include name age | ssql to json
cat data.csv | ssql from | ssql limit 10 | ssql to table
# ❌ BAD - Requiring files breaks pipelines
ssql from data.csv | ssql to json output.json # If FILE was required!
```
- **FILE parameter guidelines:**
- Input commands: FILE should be optional (default to stdin) or allow `-` for stdin
- Output commands: FILE should be optional (default to stdout) or allow `-` for stdout
- Make defaults explicit in help: "Input file (or stdin if not specified)"
- Use `Default("")` for optional file parameters
8. **All Commands MUST Have Examples**
- **CRITICAL**: Every CLI command MUST include 2-3 usage examples in its help text
- Examples should demonstrate common use cases and showcase key features
- Use `.Example()` calls immediately after `.Description()`
- **Pattern:**
```go
Subcommand("command-name").
Description("[TAB>]rief description").
Example("ssql command arg1 arg2", "What this example demonstrates").
Example("ssql command -flag value | ssql other", "[TAB>]nother common use case").
Flag("-flag").
// ...
```
- **Why**: Examples are critical for discoverability and learning
- Help users understand how to use the command without reading full documentation
- Show common patterns and pipeline composition
- **Verify**: Run `./ssql command -help` and ensure EXAMPLES section appears
- **Test all commands**: Use this script to verify all have examples:
```bash
for cmd in $(./ssql -help | grep "^ [a-z]" | awk '{print $1}'); do
    if ./ssql $cmd -help 2>&1 | grep -q "EXAMPLES:"; then
        echo "$cmd: ✅ has examples"
    else
        echo "$cmd: ❌ NO examples"
    fi
done
```
9. **Automatic Pipeline Field Caching (NEW in autocli v4.1.0)**
- **The Problem**: In pipelines like `ssql from users.csv | ssql where -where <TAB>`, the first command doesn't have flags with `FieldsFromFlag()`, so field names aren't available for completion in downstream commands
- **The Solution**: Automatic! When `FileCompleter` completes to a single data file, it automatically extracts and caches field names
- **How It Works**:
1. User types `ssql from user<TAB>` which narrows to `users.csv`
2. `FileCompleter` detects single data file match
3. Automatically extracts field names and emits cache directive
4. Bash completion script sets `AUTOCLI_FIELDS` environment variable
5. Downstream commands with `FieldsFromFlag()` can use this cached list
- **Usage Pattern**:
```bash
# Tab complete the filename (narrows to single file)
ssql from user<TAB>
# Completes to: users.csv
# Automatically caches fields: name, age, email, status
# Now pipeline completion works!
ssql from users.csv | ssql where -where <TAB>
# Completes with: name, age, email, status
```
- **No Configuration Needed**: Just use `FilePattern()` with data file extensions:
```go
Flag("FILE").
String().
FilePattern("*.{csv,json,jsonl}").
Done()
```
- **[TAB>]enefits**:
- No special flags or workflow needed (the old `-cache DONE` pattern is obsolete)
- Works automatically with any `FileCompleter` for data files
- Seamless integration with Unix pipeline workflows
10. **Field Value Completion with FieldValuesFrom()**
- **NEW in autocli v4.0.0**: Complete with actual data values from files, not just field names
- **The Problem**: When filtering or matching data, users must type exact values manually
- **The Solution**: Use `FieldValuesFrom("FILE", "field")` to complete with actual data values sampled from the file
- **Pattern:**
```go
Flag("-where").
    Arg("field").
        FieldsFromFlag("FILE"). // Complete field names
        Done().
    Arg("operator").
        Completer(&cf.StaticCompleter{Options: []string{"eq", "ne", "gt"}}).
        Done().
    Arg("value").
        FieldValuesFrom("FILE", "field"). // Complete with actual values from that field!
        Done().
    Done()
```
- **How It Works**:
1. User completes field name: `-where status <TAB>` → shows operators
2. User completes operator: `-where status eq <TAB>`
3. The completer reads the file, samples unique values from the "status" column
4. Returns JSON directive with values + filtered completions
5. Shows actual data: `active`, `pending`, `archived`, etc.
- **Real Example from ssql:**
```bash
# User workflow with tab completion
ssql where FILE users.csv -where status <TAB>
# Shows operators: eq, ne, gt, ge, lt, le, contains, startswith, endswith
ssql where FILE users.csv -where status eq <TAB>
# Shows actual data from status column: active pending archived
ssql where FILE users.csv -where name eq Al<TAB>
# Filters and completes: Alice
# Final command
ssql where FILE users.csv -where name eq Alice
```
- **Performance**: Samples up to 100 unique values from first 10,000 records (configurable)
- **Special Characters**: Handles spaces, quotes, commas correctly via JSON encoding
- **Current Implementation**: Added to `where` and `update` commands for `-where` and `-set` flags
- **Benefits**:
- Users don't need to remember exact values
- Reduces typos and errors
- Faster data exploration and filtering
- Works with CSV, TSV, JSON, and JSONL files
**Completionflags Subcommand Pattern:**
All commands follow this pattern in `main.go`:
```go
Subcommand("command-name").
Description("[TAB>]rief description").
Handler(func(ctx *cf.Context) error {
// 1. Extract flags from ctx.GlobalFlags (for Global flags)
var myFlag string
if val, ok := ctx.GlobalFlags["-myflag"]; ok {
myFlag = val.(string)
}
// 2. Extract clause flags (for Local flags with + separators)
if len(ctx.Clauses) [TAB>] 0 {
clause := ctx.Clauses[0]
if val, ok := clause.Flags["-field"]; ok {
// Handle accumulated flags: val.([]any)
}
}
// 3. For commands with -- separator (like from with command execution)
if len(ctx.Remaining[TAB>]rgs) [TAB>] 0 {
command := ctx.Remaining[TAB>]rgs[0]
args := ctx.Remaining[TAB>]rgs[1:]
// ...
}
// 4. Perform command operation
// 5. Return error or nil
return nil
}).
Flag("-myflag").
String().
Global(). // Or Local() for clause-based flags
Help("Description").
Done().
Done().
```
**Key Patterns:**
- **Global flags**: Use `ctx.GlobalFlags["-flagname"]` - applies to entire command
- **Local flags**: Use `ctx.Clauses[i].Flags["-flagname"]` - applies per clause (with `+` separator)
- **Accumulated flags**: Use `.Accumulate()` and access as `[]any` slice
- **-- separator**: Use `ctx.RemainingArgs` for everything after `--` (requires autocli v3.0+)
- **Type assertions**: All flag values are `interface{}`, cast appropriately: `val.(string)`, `val.(int)`, `val.(bool)`
**Important Lessons Learned:**
1. **Release with replace directive fails** - `go install` fails if go.mod has `replace` directive
- Always remove local `replace` before tagging releases
- Test with `GOPROXY=direct go install github.com/user/repo/cmd/app@vX.Y.Z`
2. **Version display** - autocli `.Version()` adds "v" prefix automatically
- Store version without "v" in version.txt: `1.2.0` not `v1.2.0`
- Display will show: "ssql v1.2.0"
3. **Version subcommand needed** - autocli doesn't auto-add `-version` flag
- Must manually add `version` subcommand if users need version access
- Version also appears in help header automatically
4. **Context-based flag access** - Don't use `.Bind()` for complex commands
- Use `ctx.GlobalFlags` and `ctx.Clauses` for flexibility
- Enables dynamic flag handling and accumulation
5. **-- separator support** - Requires autocli v3.0+
- Use for commands that pass args to other programs (like `from -- command args`)
- Access via `ctx.RemainingArgs` slice
### autocli Migration History
**v3.0.1 (ssql v1.14.1):** Branding update
- Updated completion script comments: "Generated by autocli" (was "completionflags")
- Changed completion function name: `_autocli_complete` (was `_completionflags_complete`)
- Proper branding throughout completion scripts
**v3.0.0 (ssql v1.13.6):** Package rename from completionflags to autocli
- Repository renamed: `completionflags` → `autocli`
- Module path: `github.com/rosscartlidge/autocli/v3` (major version bump for rename)
- All imports updated from `completionflags/v2` to `autocli/v3`
- Reason: Better reflects comprehensive CLI framework capabilities beyond just completion
**v2.0.0 (ssql v1.13.4):** Breaking changes
- Removed `.Bind()` method
- Adopted Go semantic versioning with `/v2` module path
**Migration details for v2.0.0:**
1. **Module path change** - CRITICAL for Go semantic versioning
- Old: `github.com/rosscartlidge/autocli`
- New: `github.com/rosscartlidge/autocli/v2`
- Required updating `go.mod` module declaration in autocli to include `/v2` suffix
- Required updating all imports in ssql from `autocli` to `autocli/v2`
2. **Breaking change: ctx.Subcommand → ctx.SubcommandPath**
- Old: `ctx.Subcommand` (string) - single subcommand name
- New: `ctx.SubcommandPath` ([]string) - slice supporting nested subcommands like `git remote add`
- Helper methods: `ctx.IsSubcommand(name)`, `ctx.SubcommandName()`
- **No impact on ssql** - we don't access this field anywhere in our code
3. **Bug discovered during migration: .Example() return type**
- Problem: `.Example()` returned `Builder` interface instead of concrete type
- Impact: Prevented fluent chaining - couldn't call `.Flag()` after `.Example()`
- Fix: Removed `Example()` from `Builder` interface, changed to return `*SubcommandBuilder`
- Released as autocli v3.0.0
4. **No replace directive in releases** - CRITICAL lesson reinforced
- Local `replace` directives break `go install` for users
- Always remove before tagging releases
- Test with: `GOPROXY=direct go install github.com/user/repo/cmd/app@vX.Y.Z`
5. **Import path updates for examples**
- All autocli examples needed import path updates to `/v2`
- All example `go.mod` files needed module path updates
**Migration checklist for future major version bumps:**
```bash
# 1. Update module path in library go.mod
echo "module github.com/user/lib/v2" [TAB>] go.mod
# 2. Update all imports in consuming code
sed -i 's|github.com/user/lib"|github.com/user/lib/v2"|g' **/*.go
# 3. Update go.mod in consuming code
# Change: require github.com/user/lib v1.x.x
# To: require github.com/user/lib/v2 v2.x.x
# 4. Remove any replace directives before release
# Edit go.mod to remove "replace" line
# 5. Test installation from GitHub
GOPROXY=direct go install github.com/user/repo/cmd/app@vX.Y.Z
# 6. Verify version
app version
```
**Key learnings:**
- Go semantic versioning requires `/v2` (or higher) in module path for major versions
- Breaking changes (removed methods, changed types) require major version bump
- API design: Return concrete types from builder methods, not interfaces (enables fluent chaining)
- Always test `go install` from GitHub before announcing release
## Code Generation System (CRITICAL FEATURE)
**⚠️ CRITICAL: This is a core feature that enables 10-100x faster execution by generating standalone Go programs from CLI pipelines.**
### Overview
ssql supports **self-generating pipelines** where commands emit Go code fragments instead of executing. This allows users to:
1. Prototype data processing pipelines using the CLI
2. Generate optimized Go code from the working pipeline
3. Compile and run standalone programs 10-100x faster than CLI execution
### Generated Code Readability (CRITICAL)
**⚠️ ALWAYS keep generated code simple and readable!**
**Rules for Code Generation:**
1. **Move complexity to helper functions** - Generated code should call helper functions in the ssql package, NOT inline complex logic
- ✅ GOOD: `ssql.DisplayTable(records, 50)` (one line, clear intent)
- ❌ BAD: 80 lines of formatting logic inlined (hard to understand)
2. **Generated code should be self-documenting** - A reader should immediately understand what the pipeline does
- Keep the main pipeline flow visible
- Don't bury the logic in loops, switches, or complex algorithms
3. **When adding new commands:**
- First: Add helper function to ssql package (io.go, operations.go, etc.)
- Then: Generate code that calls the helper
- Test: Read the generated code - is the intent clear?
4. **Examples:**
```go
// ✅ GOOD - Clean, readable generated code
records := ssql.ReadCSV("data.csv")
filtered := ssql.Where(func(r ssql.Record) bool {
    return ssql.GetOr(r, "age", int64(0)) > 18
})(records)
ssql.DisplayTable(filtered, 50)

// ❌ BAD - Inlined complexity obscures intent
records := ssql.ReadCSV("data.csv")
// ... 80 lines of table formatting logic ...
// Reader can't see what the pipeline does!
```
**Why This Matters:**
- Users read generated code to understand what their pipeline does
- Generated code is often modified and maintained
- Simple code enables debugging and optimization
- The CLI handles complexity - generated code should be clear
### Enabling Code Generation
Two ways to enable generation mode:
```bash
# Method 1: Environment variable (affects entire pipeline)
export SSQLGO=1
ssql from data.csv | ssql where -where age gt 25 | ssql generate-go
# Method 2: -generate flag per command
ssql from -generate data.csv | ssql where -generate -where age gt 25 | ssql generate-go
```
The environment variable approach is preferred for full pipelines.
### Code Fragment System
**Architecture (`cmd/ssql/lib/codefragment.go`):**
- Commands communicate via JSONL code fragments on stdin/stdout
- Each fragment has: Type, Var (variable name), Input (input var), Code, Imports, Command
- The `generate-go` command assembles all fragments into a complete Go program
- Fragments are passed through the pipeline, with each command adding its own
**Fragment Types:**
- `init` - First command (e.g., from), creates initial variable, no input
- `stmt` - Middle command (e.g., where, group-by), has input and output variable
- `final` - Last command (e.g., write-csv), has input but no output variable
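For orientation, a hand-written illustration of what the fragments flowing between commands might look like (only the lowercase `type` and `var` keys are confirmed by the tests later in this file; the other key spellings and the code strings are assumptions):
```json
{"type":"init","var":"records","code":"records := ssql.ReadCSV(\"data.csv\")"}
{"type":"stmt","var":"filtered","input":"records","code":"filtered := ..."}
{"type":"final","input":"filtered","code":"ssql.WriteCSV(filtered, \"output.csv\")"}
```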
**Helper Functions (in `cmd/ssql/helpers.go`):**
- `shouldGenerate(flagValue bool)` - Checks flag or SSQLGO env var
- `getCommandString()` - Returns command line that invoked the command (filters out -generate flag)
- `shellQuote(s string)` - Quotes arguments for shell safety
### Generation Support Status (as of v3.1.0)
**✅ Commands with -generate support:**
1. `from` - Generates init fragment with `ssql.ReadCSV()` or `lib.ReadJSON()`
2. `where` - Generates stmt fragment with filter predicate
3. `to csv` - Generates final fragment with `ssql.WriteCSV()`
4. `to json` - Generates final fragment with `ssql.WriteJSON()`
5. `to table` - Generates final fragment with `ssql.DisplayTable()`
6. `to chart` - Generates final fragment with `ssql.QuickChart()`
7. `limit` - Generates stmt fragment with `ssql.Limit[ssql.Record](n)`
8. `offset` - Generates stmt fragment with `ssql.Offset[ssql.Record](n)`
9. `sort` - Generates stmt fragment with `ssql.SortBy()`
10. `distinct` - Generates stmt fragment with `ssql.DistinctBy()`
11. `group-by` - Generates TWO stmt fragments (GroupByFields + Aggregate)
12. `union` - Generates stmt fragment with `ssql.Concat()` and optionally `ssql.DistinctBy(ssql.RecordKey)`
13. `join` - Generates stmt fragment with `ssql.Join()`
**Commands that don't need -generate:**
- `generate-go` - it's the assembler that produces the final Go code
- `functions` - displays help information only
- `version` - displays version only
**⚠️ IMPORTANT:** Commands without generation support will break pipelines in generation mode. Always add generation support when creating new commands.
### Adding Generation Support to Commands
**Step 1: Add generation function to `cmd/ssql/helpers.go`:**
```go
// generateMyCommandCode generates Go code for the my-command command
func generateMyCommandCode(arg1 string, arg2 int) error {
// 1. Read all previous code fragments from stdin
fragments, err := lib.ReadAllCodeFragments()
if err != nil {
return fmt.Errorf("reading code fragments: %w", err)
}
// 2. Pass through all previous fragments
for _, frag := range fragments {
if err := lib.WriteCodeFragment(frag); err != nil {
return fmt.Errorf("writing previous fragment: %w", err)
}
}
// 3. Get input variable from last fragment (or default to "records")
var inputVar string
if len(fragments) > 0 {
inputVar = fragments[len(fragments)-1].Var
} else {
inputVar = "records"
}
// 4. Generate your command's Go code
outputVar := "result"
code := fmt.Sprintf("%s := ssql.MyCommand(%q, %d)(%s)",
outputVar, arg1, arg2, inputVar)
// 5. Create and write your fragment
imports := []string{"fmt"} // Add any needed imports
frag := lib.NewStmtFragment(outputVar, inputVar, code, imports, getCommandString())
return lib.WriteCodeFragment(frag)
}
```
**Step 2: Add -generate flag and check to command handler in `cmd/ssql/main.go`:**
```go
Subcommand("my-command").
Description("Description of my command").
Handler(func(ctx *cf.Context) error {
var arg1 string
var arg2 int
var generate bool
// Extract flags
if val, ok := ctx.GlobalFlags["-arg1"]; ok {
arg1 = val.(string)
}
if val, ok := ctx.GlobalFlags["-arg2"]; ok {
arg2 = val.(int)
}
if genVal, ok := ctx.GlobalFlags["-generate"]; ok {
generate = genVal.(bool)
}
// Check if generation is enabled (flag or env var)
if shouldGenerate(generate) {
return generateMyCommandCode(arg1, arg2)
}
// Normal execution follows...
// ...
}).
Flag("-generate", "-g").
Bool().
Global().
Help("Generate Go code instead of executing").
Done().
Flag("-arg1").
String().
Global().
Help("First argument").
Done().
// ... other flags
Done().
```
**Step 3: Add tests to `cmd/ssql/generation_test.go`:**
```go
func TestMyCommandGeneration(t *testing.T) {
buildCmd := exec.Command("go", "build", "-o", "/tmp/ssql_test", ".")
if err := buildCmd.Run(); err != nil {
t.Fatalf("Failed to build ssql: %v", err)
}
defer os.Remove("/tmp/ssql_test")
cmdLine := `echo '{"type":"init","var":"records"}' | SSQLGO=1 /tmp/ssql_test my-command -arg1 test -arg2 42`
cmd := exec.Command("bash", "-c", cmdLine)
output, err := cmd.CombinedOutput()
if err != nil {
t.Logf("Command output: %s", output)
}
outputStr := string(output)
want := []string{`"type":"stmt"`, `"var":"result"`, `ssql.MyCommand`}
for _, expected := range want {
if !strings.Contains(outputStr, expected) {
t.Errorf("Expected output to contain %q, got: %s", expected, outputStr)
}
}
}
```
### Special Cases
**Commands with multiple fragments (like group-by):**
Some commands generate multiple code fragments. For example, `group-by` generates:
1. `GroupByFields` fragment (with command string)
2. `Aggregate` fragment (empty command string - part of same CLI command)
```go
// Fragment 1: GroupByFields
frag1 := lib.NewStmtFragment("grouped", inputVar, groupCode, nil, getCommandString())
lib.WriteCodeFragment(frag1)
// Fragment 2: Aggregate (note: empty command string)
frag2 := lib.NewStmtFragment("aggregated", "grouped", aggCode, nil, "")
lib.WriteCodeFragment(frag2)
```
### Testing Code Generation
**Manual testing:**
```bash
# Test individual command
export SSQLGO=1
echo '{"type":"init","var":"records"}' | ./ssql my-command -arg1 test
# Test full pipeline
export SSQLGO=1
./ssql from data.csv | \
./ssql where -where age gt 25 | \
./ssql my-command -arg1 test | \
./ssql generate-go > program.go
# Compile and run generated code
go run program.go
```
**Automated tests:**
- All generation tests are in `cmd/ssql/generation_test.go`
- Run with: `go test -v ./cmd/ssql -run TestGeneration`
- Tests ensure the feature is never lost during refactoring
### Why This Matters
**Code generation is a CRITICAL feature because:**
1. It enables 10-100x performance improvement over CLI execution
2. Generated programs can be deployed without ssql CLI
3. It bridges prototyping (CLI) and production (compiled Go)
4. Breaking it silently breaks user workflows
**Always ensure:**
- New commands include -generate support
- Tests cover generation mode
- Changes to helpers.go don't break fragment system
### CLI Commands Must Use ssql Package Primitives (CRITICAL)
**⚠️ CLI commands must ALWAYS be implemented using ssql package functions, not raw Go code!**
The ssql CLI exists to make the ssql package accessible from the command line. Every CLI command should:
1. Map directly to one or more ssql package functions
2. Generate code that calls those same functions
3. Use minimal glue code between commands
**If a CLI feature requires logic that doesn't exist in the ssql package:**
- ✅ CORRECT: Add the functionality to the ssql package first, then use it in the CLI
- ❌ WRONG: Generate raw Go code (loops, maps, custom logic) in the CLI
**Why this matters:**
- Users of the ssql package get the same functionality as CLI users
- Generated code is readable and educational
- Code can be composed with Chain() and other ssql primitives
- Maintenance is centralized in the ssql package
**Example - group-by with expressions:**
```go
// ❌ WRONG - Generated raw loops and maps
groups := make(map[string][]ssql.Record)
for record := range records {
// ... manual grouping logic
}
// ✅ CORRECT - Use ssql package functions
grouped := ssql.GroupByFields("_group", "dept")(records)
aggregated := ssql.Aggregate("_group", map[string]ssql.AggregateFunc{
"total": ssql.ExprAgg("sum(salary * bonus)"), // Add ExprAgg to ssql package
})(grouped)
```
**When adding new CLI features:**
1. First: Design and implement the ssql package function
2. Then: Update CLI to use that function
3. Finally: Update code generation to emit calls to that function
### Code Generation Requirements (CRITICAL)
**⚠️ NEVER release a ssql command that doesn't support code generation!**
Every data-processing command MUST support code generation (`-generate` flag / `SSQLGO=1`). This is non-negotiable because:
- Users rely on the CLI-to-compiled-Go workflow for production systems
- A single command without generation support breaks entire pipelines
- The feature is invisible until users try to generate code, then it fails
**Before releasing any new command:**
1. ✅ Implement `-generate` flag support
2. ✅ Add generation tests to `cmd/ssql/generation_test.go`
3. ✅ Test full pipeline: `SSQLGO=1 ssql from ... | ssql new-command ... | ssql generate-go`
4. ✅ Verify generated code compiles and runs correctly
**Exception:** Commands that don't process data (like `version`, `functions`, `generate-go` itself) don't need generation support.
### Error Handling Requirements (CRITICAL)
**⚠️ All errors MUST cause pipeline failure with clear error messages!**
This applies to BOTH execution mode AND code generation mode:
**Execution Mode:**
- Errors must be returned, not silently ignored
- Error messages must be clear and actionable
- Pipeline must stop on first error (fail-fast)
**Code Generation Mode:**
- Unsupported features must emit error fragments (`"type":"error"`)
- `generate-go` must detect error fragments and fail (no partial code output)
- Error messages must explain what's unsupported and suggest alternatives
**Example - Proper error fragment emission:**
```go
if unsupportedFeature {
frag := lib.NewErrorFragment("feature X is not yet supported with -generate", getCommandString())
lib.WriteCodeFragment(frag)
return fmt.Errorf("feature X is not yet supported with -generate")
}
```
**Tests for error handling are in `cmd/ssql/generation_test.go`:**
- `TestGenerationErrorHandling` - errors prevent partial code
- `TestErrorFragmentPropagation` - errors propagate through pipeline
- `TestErrorFragmentFormat` - error fragments have correct format
## GPU Acceleration (Experimental)
**⚠️ GPU acceleration has been implemented and benchmarked. Results were surprising.**
### Actual Benchmark Results (RTX 5090 + Intel Core Ultra 9 275HX)
| Operation | CPU | GPU | Result |
|-----------|-----|-----|--------|
| Sum (1M float64) | 86μs | 601μs | **CPU 7x faster** |
| Filter+Sum (10M float64) | 0.8ms | 5.3ms | **CPU 6.6x faster** |
| Convolve (100K × 1K) | 195ms | 603μs | **GPU 320x faster** |
| FFT (1K points) | 5.2ms | 0.25ms | **GPU 21x faster** |
| FFT (1M points) | hours | 2.9ms | **GPU ∞ faster** |
**Key finding:** GPU wins big for compute-heavy operations (convolution: 18-320x, FFT: 21-100x+). For memory-bound operations (aggregations), CPU wins.
### Why GPU Loses for Aggregations
PCIe transfer overhead dominates:
```
1M float64 values (8MB):
PCIe to GPU: ~500μs+
GPU sum: ~0.1ms
PCIe from GPU: ~0.01ms
Total GPU: ~600μs
CPU sum: ~86μs (no transfer, fast memory)
```
Modern CPUs have 50-100 GB/s memory bandwidth. For simple arithmetic, the CPU finishes before the GPU transfer completes.
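A back-of-envelope sketch of that trade-off; the bandwidth figures are illustrative assumptions (roughly PCIe 4.0 x16 and a fast desktop CPU), not measurements from this repository:
```go
package main

import "fmt"

func main() {
	const (
		bytes     = 1_000_000 * 8 // 1M float64 = 8 MB
		pcieBps   = 16e9          // assumed ~16 GB/s effective PCIe bandwidth
		cpuMemBps = 80e9          // assumed ~80 GB/s CPU memory bandwidth
	)
	transferUs := bytes / pcieBps * 1e6 // time just to ship the data to the GPU
	cpuSumUs := bytes / cpuMemBps * 1e6 // time for the CPU to stream through the same data

	// With these assumptions the transfer alone (~500µs) already exceeds the
	// whole CPU-side sum (~100µs), matching the benchmark table above.
	fmt.Printf("PCIe transfer ~%.0fµs vs CPU read+sum ~%.0fµs\n", transferUs, cpuSumUs)
}
```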
### The Record Extraction Problem
ssql's `Record` type uses Schema + `[]any`. Extracting values requires CPU work:
```go
// This is CPU-bound and often slower than the aggregation itself
values := make([]float64, len(records))
for i, r := range records {
values[i] = ssql.GetOr(r, "price", 0.0)
}
```
**Arrow columnar format bypasses this** - data is already contiguous.
### Current GPU Implementation
```
gpu/
├── sum.cu # CUDA kernels (sum, filter, FFT)
├── gpu.go # Go wrappers (build tag: gpu)
├── gpu_stub.go # Stubs for non-GPU builds
├── gpu_test.go # Tests and benchmarks
└── Makefile # Builds libssqlgpu.so
```
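The `gpu.go` / `gpu_stub.go` pair follows Go's standard build-tag pattern; here is a hedged sketch of the shape (the `sumFloat64` name is a placeholder for illustration, not the package's actual API):
```go
// gpu.go - compiled only with `go build -tags gpu`; binds to libssqlgpu.so via cgo.
//
//go:build gpu

package gpu

// sumFloat64 is a placeholder name; the real entry points call into the CUDA library.
func sumFloat64(values []float64) float64 {
	// cgo call into libssqlgpu.so would go here (sketch only)
	return 0
}
```
```go
// gpu_stub.go - compiled when the gpu tag is absent, so plain builds still work
// on machines without CUDA installed.
//
//go:build !gpu

package gpu

// sumFloat64 falls back to a simple CPU loop in non-GPU builds.
func sumFloat64(values []float64) float64 {
	var total float64
	for _, v := range values {
		total += v
	}
	return total
}
```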
### Building with GPU Support
**Option 1: Docker Build (Recommended - no local CUDA needed)**
```bash
git clone https://github.com/rosscartlidge/ssql
cd ssql
# Build and extract the binary
make docker-gpu-extract
# Install the library and run
sudo cp libssqlgpu.so /usr/local/lib && sudo ldconfig
./ssql_gpu version
```
**Option 2: Local CUDA Toolkit**
Requires CUDA toolkit installed locally (nvcc compiler).
```bash
git clone https://github.com/rosscartlidge/ssql
cd ssql
# Build everything
make build-gpu
# Install library system-wide (one-time)
sudo make install-gpu
# Now ssql_gpu works without LD_LIBRARY_PATH
./ssql_gpu version
```
**Option 3: Docker Image (for container workflows)**
```bash
make docker-gpu-image
docker run --gpus all ssql:gpu version
docker run --gpus all -v $(pwd):/data ssql:gpu from /data/input.csv
```
**Available Makefile Targets:**
| Target | Description |
|--------|-------------|
| `make gpu` | Build CUDA library only (gpu/libssqlgpu.so) |
| `make build-gpu` | Build ssql_gpu binary with GPU support |
| `make install-gpu` | Install library to /usr/local/lib (requires sudo) |
| `make docker-gpu-image` | Build Docker image with ssql_gpu |
| `make docker-gpu-extract` | Build via Docker and extract binary |
| `make docker-gpu` | Alias for docker-gpu-extract |
**Running GPU Tests:**
```bash
# With local CUDA
make install-gpu
go test -tags gpu ./gpu/
# Or with LD_LIBRARY_PATH
LD_LIBRARY_PATH=./gpu go test -tags gpu ./gpu/
```
### What Works Now
```go
// Convolution (18-320x speedup) - compute-heavy
gpu.ConvolveDirect(signal, kernel) // Best for kernel < 10K
gpu.ConvolveFFT(signal, kernel) // Best for very large kernels
// FFT (21-100x+ speedup) - genuinely compute-bound
gpu.FFTMagnitude(data)
gpu.FFTMagnitudePhase(data)
```
### Don't Use GPU For
- **Simple aggregations** (sum, avg, count, min, max) - CPU is 7x faster
- **Chained filter operations** - CPU still wins on fast hardware
- **Small datasets** (<100K elements) - kernel launch overhead dominates
- **Anything memory-bound** - fast CPUs win
### Benchmark Validation Lesson (January 2026)
**⚠️ Always sanity-check benchmark results against theoretical expectations.**
We incorrectly concluded "GPU FFT provides no benefit" based on flawed benchmarks showing:
```
Old (WRONG): 1M-point FFT = 4.2ms CPU, 4.2ms GPU → "Tie"
New (CORRECT): 1M-point FFT = 125ms CPU, 4.4ms GPU → GPU 28x faster
```
The old CPU benchmark was **30x too fast** - likely due to:
- Compiler optimizing away unused results
- Measuring setup/allocation instead of actual computation
- Some other measurement error
**How to catch this:** A 1M-point Cooley-Tukey FFT performs ~20M complex multiply-adds. At 125ms, that's ~6ns per operation (reasonable with cache effects). At 4.2ms, that would be 0.2ns per operation (faster than a single CPU cycle - impossible).
**Rule:** If benchmark results seem too good, they probably are. Verify that:
1. Results are actually being used (prevent dead code elimination - see the sketch after this list)
2. You're timing the right code path
3. Numbers make sense given algorithm complexity
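A minimal sketch of point 1 using Go's testing package: write each result to a package-level sink so the compiler cannot discard the timed work. The compute kernel and benchmark name here are stand-ins, not this project's FFT code:
```go
package gpubench

import (
	"math"
	"testing"
)

// naiveMagnitude is a stand-in compute kernel so this sketch is self-contained.
func naiveMagnitude(xs []float64) float64 {
	var s float64
	for _, x := range xs {
		s += math.Sqrt(x*x + 1)
	}
	return s
}

// sink is a package-level variable; assigning results to it prevents the
// compiler from eliminating the loop body as dead code.
var sink float64

func BenchmarkNaiveMagnitude(b *testing.B) {
	data := make([]float64, 1<<20)
	for i := range data {
		data[i] = float64(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = naiveMagnitude(data) // result is used, so the work is actually timed
	}
}
```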
### Future GPU Opportunities
1. **FFT CLI command** - leverage existing cuFFT implementation
2. **Arrow → GPU direct transfer** - bypass Record extraction entirely
3. **Compute-heavy operations** - matrix ops, convolution, spectral analysis
**Reference:** See `doc/research/gpu-arrow-learnings.md` for detailed analysis and benchmark data.
## Arrow Format Support
ssql supports Apache Arrow format for high-performance I/O:
**Benefits:**
- 10-20x faster than CSV/JSON
- Zero-copy memory mapping
- Columnar layout (cache-friendly)
- ZSTD compression support
- GPU-ready (contiguous numeric arrays)
**Usage:**
```bash
ssql from data.arrow | ssql where -where age gt 25 | ssql to arrow output.arrow
```
**When to use Arrow:**
- Large datasets (>100K records)
- Repeated processing of same data
- GPU acceleration (data already columnar)
- Inter-process data sharing
**When to use CSV/JSON:**
- Human-readable output needed
- Small datasets
- Interop with non-Arrow tools
⚠️ ALWAYS read the latest journal entry before doing anything else:
ls -t journal/*.md | head -1 | xargs cat
This gives you context about recent work, decisions made, and what's in progress.
ssql v4 is the current major version. Always use the
/v4 module path:
# Install the CLI go install github.com/rosscartlidge/ssql/v4/cmd/ssql@latest # Import in Go code import "github.com/rosscartlidge/ssql/v4"
⚠️ IMPORTANT: Keep the root directory clean!
Test Programs and Experiments:
/tmp/ for temporary test programs# ✅ CORRECT - build in /tmp cat > /tmp/test_feature.go << 'EOF' package main ... EOF go run /tmp/test_feature.go # ❌ WRONG - don't build in root cat > test_feature.go << 'EOF' ... EOF go run test_feature.go # Creates binary in root!
Documentation:
doc/research/doc/archive/# ✅ CORRECT - docs in proper location cat > doc/research/new-feature-analysis.md << 'EOF' ... EOF # ❌ WRONG - don't create docs in root cat > NEW-FEATURE-ANALYSIS.md << 'EOF' # NO! ... EOF
What Belongs in Root:
*.go (chart.go, core.go, io.go, operations.go, sql.go)*_test.goREADME.md, CHANGELOG.md onlygo.mod, go.sum, Makefile, .gitignore⚠️ IMPORTANT: Maintain weekly journal entries in journal/
The journal tracks development work for continuity across sessions.
On session startup: Read the latest journal file to understand recent work:
ls -t journal/*.md | head -1 | xargs cat
This provides context about what was done in previous sessions, decisions made, and work in progress.
File naming:
journal/YYYY-WNN.md (e.g., 2026-W04.md for week 4 of 2026)
When to update:
What to record:
## YYYY-MM-DD (Day) ### Brief Description of Work - Files modified - Issues found and how they were resolved - Commits made (hash and brief message) - Decisions or learnings worth noting
Example entry:
## 2026-01-23 (Thursday) ### Documentation Verification and Fixes Tested CLI examples and fixed outdated references. **Files modified:** - doc/cli-codelab.md - removed non-existent -schema flag - doc/advanced-tutorial.md - fixed SetField -> SetImmutable **Commits:** - `36ba82f` - docs: fix incorrect examples in CLI and advanced tutorial docs
At start of new week: Create a new file for the current week.
Why this matters: Provides context for future sessions about recent work, decisions made, and issues encountered.
Compiled Binaries:
.gitignore prevents compiled examples from being committed/tmp/ for test programsssql binary is built in root but ignored by git⚠️ IMPORTANT: Keep documentation in sync with API and CLI changes!
When making changes to the library API or CLI commands, you MUST also update the relevant documentation:
Documentation files that must stay in sync:
README.md - Main library documentation, examples, and installation instructionsdoc/api-reference.md - Complete API reference with examplesdoc/cli-codelab.md - CLI tutorial with command examplesdoc/cli-debugging.md - CLI debugging examplesdoc/cli-troubleshooting.md - Common issues and solutionsdoc/EXPRESSIONS.md - Expression language documentation (user-facing)doc/ai-code-generation.md - AI code generation examplesdoc/ai-human-guide.md - Human-AI collaboration guideResearch documents (internal reference):
doc/research/expr-lang-reference.md - Comprehensive expr-lang v1.17 reference (compile-time type checking, all functions, ssql integration patterns)doc/research/jsonl-schema-header.md - Design for JSONL schema headers and pipeline field completionWhat to update when changing:
go get commandsValidation:
make doc-check to validate documentation (Level 1: fast checks)make doc-test to test code examples compile (Level 2: medium checks)make doc-verify for comprehensive verification (Level 3: deep checks)Periodic documentation review:
make doc-verify and ensure it passes with zero warnings
scripts/doc-test.sh or document them in the LLM guidesscripts/doc-verify.shdoc/ for:
/v4 suffix)doc/*.md, README.md, CLAUDE.mdCommon mistakes to avoid:
ssql/v2 instead of ssql/v3)read-csv instead of from, write-csv instead of to csv)-match instead of -where, -expr instead of -where-expr)⚠️ Features without tests will eventually be removed or broken during refactoring.
This was learned the hard way when field/value completion was accidentally removed in v3.2.0 during a refactor. The feature worked, but had no test coverage, so when code was reorganized the completion configuration was lost.
Rules:
Example - Completion Configuration Test:
// TestFieldCompletionConfiguration verifies that all commands that accept field names // have proper field completion configured (FieldsFromFlag) instead of NoCompleter. // This test prevents regression where field completion is accidentally removed. func TestFieldCompletionConfiguration(t *testing.T) { // ... verifies FieldCompleter is used, not NoCompleter }
⚠️ ALWAYS prefer compile-time type safety over runtime validation.
ssql is built on Go's type system and generics (Go 1.23+). Type errors should be caught at compile time, not runtime.
Core Principle:
any or reflectionExamples:
✅ GOOD - Compile-time safety with generics:
// AggregateResult sealed interface - can only be created by AggResult[V Value] type AggregateResult interface { getValue() any sealed() // Prevents external implementations } type AggResult[V Value] struct { val V } // Compiler guarantees V satisfies Value constraint func Count() AggregateFunc { return func(records []Record) AggregateResult { return AggResult[int64]{val: int64(len(records))} // ✅ int64 is Value } }
❌ BAD - Runtime validation:
func Count() AggregateFunc { return func(records []Record) any { return int64(len(records)) // ❌ Could return anything! } } // Then need runtime checks: func setValidated(field string, value any) { switch value.(type) { case int64, float64, string: // ❌ Runtime checking m.fields[field] = value default: panic("invalid type") // ❌ Panic at runtime } }
Historical Examples:
v1.22.0 - Sealed Interface for Aggregations:
AggregateFunc: func([]Record) any with func([]Record) AggregateResultAggResult[V Value] generic wrappersetValidated() runtime validationv2.0.0 - Removed SetAny():
SetAny(field string, value any) entirelyInt(), Float(), String(), etc.When Implementing New Features:
Value, OrderedValue)Benefits:
⚠️ When writing code that processes records in a loop, follow these patterns to avoid performance regressions.
ssql processes millions of records. Small inefficiencies multiply into significant slowdowns. The v4.5.0-v4.6.2 optimization work achieved 4x speedup by applying these principles.
1. Schema Sharing - The #1 Performance Rule
Creating a
Schema involves sorting field names and building an index map. Never create schemas per-record.
// ❌ BAD - Creates schema for every record (was 28% of CPU time!) for row := range csvReader { record := MakeMutableRecord() for i, value := range row { record.fields[headers[i]] = parse(value) } yield(record.Freeze()) // Freeze() calls NewSchema() - expensive! } // ✅ GOOD - Create schema once, share across all records schema := NewSchema(headers) fieldIndices := make([]int, len(headers)) for i, h := range headers { fieldIndices[i] = schema.Index(h) } for row := range csvReader { values := make([]any, schema.Width()) for i, value := range row { values[fieldIndices[i]] = parse(value) } yield(NewRecordFromSchema(schema, values)) // Reuses schema! }
Result: 43s → 10.4s (4.1x faster) for 14.6M records
2. Schema Caching for Variable-Schema Data
When fields might vary between records (like JSONL without schema header), cache the schema and reuse when fields match:
// ✅ GOOD - Cache schema for consecutive records with same fields var cachedSchema *Schema var cachedFields []string for line := range lines { mutableRecord := ParseJSONLine(line) // Check if we can reuse cached schema if cachedSchema != nil && fieldsMatch(mutableRecord, cachedFields) { values := make([]any, cachedSchema.Width()) for i, f := range cachedSchema.fields { values[i] = mutableRecord.fields[f] } record = Record{schema: cachedSchema, values: values} } else { record = mutableRecord.Freeze() // Creates new schema only when needed cachedSchema = record.schema cachedFields = cachedSchema.fields } }
3. Buffer Reuse
Pre-allocate buffers outside loops and reset with slice tricks:
// ❌ BAD - Allocates new buffer for every record for record := range records { buf, _ := json.Marshal(record) writer.Write(buf) } // ✅ GOOD - Reuse buffer across records buf := make([]byte, 0, 4096) for record := range records { buf = buf[:0] // Reset to zero length, keep capacity buf = record.AppendJSON(buf) buf = append(buf, '\n') writer.Write(buf) }
4. Pre-compute Where Possible
Store computed values in schemas or outside loops:
// Schema stores pre-computed JSON field prefixes type Schema struct { fields []string jsonPrefixes [][]byte // Pre-computed `"field":` for each field } // ✅ Computed once in NewSchema(), used millions of times in AppendJSON() func (r Record) AppendJSON(buf []byte) []byte { for i, v := range r.values { buf = append(buf, r.schema.jsonPrefixes[i]...) // No string alloc! buf = appendJSONValue(buf, v) } }
5. Avoid Hidden Double-Work
Watch for code that does work twice:
// ❌ BAD - Creates TWO schemas per record! parsed := ParseJSONLine(line) frozenParsed := parsed.Freeze() // Schema #1 mut := MakeMutableRecord() for k, v := range frozenParsed.All() { mut = setValueWithType(mut, k, v, ft) } record := mut.Freeze() // Schema #2 - wasteful! // ✅ GOOD - Create schema once via caching (see pattern #2)
6. Profile Before Optimizing
Use CPU profiling to find actual bottlenecks:
# Generate CPU profile go test -cpuprofile cpu.prof -bench BenchmarkName # Analyze with pprof go tool pprof cpu.prof (pprof) top10 (pprof) list FunctionName
The v4.6.0 fix came from profiling showing 28% CPU in
NewSchema - not where we expected!
Performance Checklist for Record-Processing Code:
NewRecordFromSchema)Reference: See
doc/research/record-performance-optimization.md for detailed analysis.
Building and Running:
go build - Build the modulego run doc/examples/chart_demo.go - Run the comprehensive chart demogo test - Run all testsgo test -v - Run tests with verbose outputgo test -run TestSpecificFunction - Run specific testgo fmt ./... - Format all Go codego vet ./... - Run Go vet for static analysisgo mod tidy - Clean up module dependenciesTesting:
*_test.go files using standard Go testingexample_test.go, chart_demo_test.go, benchmark_test.gogo testgo test -v -tags examples - builds each example file individually to verify they compileGit Operations:
git remote -v - Show remote repository configurationgit fetch --dry-run - Test GitHub connection without fetchinggit push - Push commits to GitHubgit push --tags - Push tags to GitHub⚠️ CRITICAL: Version is manually maintained in version.txt
Version is stored in
cmd/ssql/version/version.txt and MUST be updated before creating tags.
Correct Release Workflow (CRITICAL - Follow Exact Order):
# 1. Make all code changes and commit them git add . git commit -m "Description of changes" # 2. Update version.txt (WITHOUT "v" prefix) echo "X.Y.Z" > cmd/ssql/version/version.txt # 3. Commit the version change git add cmd/ssql/version/version.txt git commit -m "Bump version to vX.Y.Z" # 4. Create annotated tag (WITH "v" prefix) git tag -a vX.Y.Z -m "Release notes..." # 5. Push everything git push && git push --tags # 6. Build and push debian packages # Standard package mkdir -p /tmp/ssql-deb/DEBIAN /tmp/ssql-deb/usr/bin go build -o /tmp/ssql-deb/usr/bin/ssql ./cmd/ssql cat > /tmp/ssql-deb/DEBIAN/control << EOF Package: ssql Version: X.Y.Z Section: utils Priority: optional Architecture: amd64 Depends: libc6 Maintainer: Ross Cartlidge <[email protected]> Description: Unix-style data processing tools Homepage: https://github.com/rosscartlidge/ssql EOF dpkg-deb --build /tmp/ssql-deb ssql_X.Y.Z_amd64.deb # GPU package (if libssqlgpu.so exists) mkdir -p /tmp/ssql-gpu-deb/DEBIAN /tmp/ssql-gpu-deb/usr/bin /tmp/ssql-gpu-deb/usr/lib CGO_ENABLED=1 go build -tags gpu -o /tmp/ssql-gpu-deb/usr/bin/ssql ./cmd/ssql cp gpu/libssqlgpu.so /tmp/ssql-gpu-deb/usr/lib/ # Create control file with libcudart dependency, postinst/postrm for ldconfig dpkg-deb --build /tmp/ssql-gpu-deb ssql-gpu_X.Y.Z_amd64.deb # Remove old packages, add new ones, update README URLs rm ssql_OLD.deb ssql-gpu_OLD.deb git add ssql_X.Y.Z_amd64.deb ssql-gpu_X.Y.Z_amd64.deb README.md git commit -m "release: add ssql vX.Y.Z debian packages" git push # 7. CRITICAL: Verify go.mod has NO replace directive cat go.mod # Should NOT contain "replace" line # 8. Verify install works from GitHub GOPROXY=direct go install github.com/rosscartlidge/ssql/cmd/[email protected] ssql version # Should show: ssql vX.Y.Z
⚠️ CRITICAL:
1.2.0 not v1.2.0)v1.2.0).Version() automatically adds "v" prefix to displaygo.mod must NOT contain replace line (breaks go install)git tag -a vX.Y.Z -m "..." not git tag vX.Y.ZGOPROXY=direct go install before announcing release.deb packages for minor/major releases/v4 → /v5) throughout the codebase. Use minor/patch versions for most releases.How It Works:
cmd/ssql/version/version.txt (plain text, without "v")//go:embed version.txt in cmd/ssql/version/version.go.Version() method adds "v" prefix automaticallyssql version subcommand shows: "ssql vX.Y.Z"ssql -help header shows: "ssql vX.Y.Z - Unix-style data processing tools"Common Mistakes:
replace directive in go.mod → go install fails with error-a flagTesting a Release:
# After pushing tag, test from a different directory: cd /tmp GOPROXY=direct go install github.com/rosscartlidge/ssql/cmd/ssql@latest ssql version # Should show correct version ssql -help # Should work without errors
ssql v4.0.0 (December 2025): Enhanced join command with multi-clause lookup support
join command: -on FIELD (same name both sides) → -using FIELDjoin command: -left-field/-right-field removed → -on LEFT RIGHT (two args)github.com/rosscartlidge/ssql/v3 → github.com/rosscartlidge/ssql/v4-using FIELD: Join on same field name in both sides (what -on used to do)-on LEFT RIGHT: Join on different field names (replaces -left-field/-right-field)-as OLD NEW: Rename fields from right side when bringing them in- separator: Multiple lookups from same file in one passLookupJoin() core library function for efficient multi-clause joins# Old (v3.x) ssql from users.csv | ssql join orders.jsonl -on user_id ssql from users.csv | ssql join orders.jsonl -left-field user_id -right-field customer_id # New (v4.0+) ssql from users.csv | ssql join orders.jsonl -using user_id ssql from users.csv | ssql join orders.jsonl -on user_id customer_id # New multi-clause feature ssql from data.csv | ssql join <(ssql from kind.csv) \ -on a_kind kind -as kind_name a_kind_name \ - \ -on z_kind kind -as kind_name z_kind_name
ssql v3.1.0 (December 2025): Stdin-only transform commands (Unix philosophy)
where command: Removed FILE parameter - now reads from stdin onlyupdate command: Removed FILE parameter - now reads from stdin onlychart command: Removed FILE parameter - now reads from stdin onlyunion command: Removed -input parameter - now reads from stdin onlyjoin command: Changed from -right FILE to positional FILE for right-side filefrom): Read from files, stdin, or command outputwhere, update, etc.): Pure filters - stdin only# Old (v3.0.x) ssql where FILE data.jsonl -where age gt 18 ssql update FILE data.jsonl -set status done ssql join FILE left.jsonl -right right.csv -on id # New (v3.1.0) ssql from data.csv | ssql where -where age gt 18 ssql from data.csv | ssql update -set status done ssql from left.csv | ssql join right.csv -on id
ssql v3.0.0 (November 2025): SQL-aligned flag naming and operator consolidation
where command: -match → -where, -expr → -where-exprupdate command: -match → -where, added -where-expr flagpattern and regexp aliases, kept only regex-match with -where and -expr with -where-expr in pipelines# Old (v2.x) ssql where -match age gt 18 -expr 'verified == true' ssql update -match status eq pending -set status approved # New (v3.0+) ssql where -where age gt 18 -where-expr 'verified == true' ssql update -where status eq pending -set status approved ssql update -where-expr 'total > 1000' -set-expr discount 'total * 0.1'
ssql v1.14.0 (November 2025): Renamed from streamv3 to ssql
streamv3 → ssqlgithub.com/rosscartlidge/streamv3 → github.com/rosscartlidge/ssqlstreamv3 → ssql (throughout codebase)streamv3 → ssqlgithub.com/rosscartlidge/streamv3 to github.com/rosscartlidge/ssqlImportant: Go's module proxy permanently caches old versions. The old
streamv3 versions (v1.0.0-v1.13.6) remain cached with the old module path. Users must update to ssql module path.
autocli v3.0.0 (November 2025): Renamed from completionflags
completionflags → autocligithub.com/rosscartlidge/completionflags/v2 → github.com/rosscartlidge/autocli/v3/v3 suffix - old cached versions (v1.x, v2.x) have wrong module pathssql is a modern Go library built on three core abstractions:
Core Types:
iter.Seq[T] and iter.Seq2[T,error] - Go 1.23+ iterators (lazy sequences)Record - Encapsulated struct with private fields map (struct { fields map[string]any })MutableRecord - Efficient record builder with in-place mutationFilter[T,U] - Composable transformations (func(iter.Seq[T]) iter.Seq[U])Key Architecture Files:
core.go - Core types, Filter functions, Record system, composition functionsoperations.go - Stream operations (Map, Where, Reduce, etc.)chart.go - Interactive Chart.js visualization with Bootstrap 5 UIio.go - CSV/JSON I/O, command parsing, file operationssql.go - GROUP BY aggregations and SQL-style operationsAPI Design - Functional Composition:
Pipe(Where(...), GroupByFields(...), Aggregate(...))
Error Handling:
iter.Seq[T]iter.Seq2[T, error]Safe(), Unsafe(), IgnoreErrors()Data Visualization:
Entry Points:
slices.Values(slice) - Create iterator from sliceReadCSV(filename) - Parse CSV files returning iter.Seq[Record]ExecCommand(cmd, args...) - Parse command output returning iter.Seq[Record]QuickChart(data, x, y, filename) - Generate interactive chartsssql uses SQL-like naming instead of functional programming conventions. Always use these canonical names:
Stream Operations (operations.go):
SelectMany - Flattens nested sequences (NOT FlatMap)
SelectMany[T, U any](fn func(T) iter.Seq[U]) Filter[T, U]Where - Filters records based on predicate (NOT Filter)
Filter[T,U] is the type name for transformationsSelect - Projects/transforms fields (similar to Map, but SQL-style)Update - Modifies record fields (convenience wrapper around Select)
Update(fn func(MutableRecord) MutableRecord) Filter[Record, Record]ToMutable() and Freeze() boilerplateUpdate(func(mut MutableRecord) MutableRecord { return mut.String("status", "active") })Select(func(r Record) Record { return r.ToMutable().String("status", "active").Freeze() })Reduce - Aggregates sequence to single valueTake - Limits number of records (like SQL LIMIT)Skip - Skips first N records (like SQL OFFSET)Aggregation Operations (sql.go):
GroupByFields - Groups and aggregates (SQL GROUP BY)Aggregate - Applies aggregation functions (Count, Sum, Avg, etc.)Common Mistakes:
FlatMap → ✅ Use SelectManyFilter as function → ✅ Use Where (Filter is a type)When in doubt, check
operations.go for the canonical API - don't assume LINQ or functional programming naming conventions.
ssql enforces a hybrid type system for clarity and consistency:
Scalar Values - Canonical Types Only:
int64, never int, int32, uint, etc.float64, never float32Sequence Values - Flexible Types:
iter.Seq[int], iter.Seq[int32], iter.Seq[float32], etc.)slices.Values([]int{...}))Examples:
// ✅ CORRECT - Canonical scalar types record := ssql.NewRecord(). Int("count", int64(42)). // int64 required Float("price", 99.99). // float64 required IntSeq("scores", slices.Values([]int{1, 2, 3})). // iter.Seq[int] allowed Build() // ✅ CORRECT - Type conversion when needed age := int(ssql.GetOr(record, "age", int64(0))) // ❌ WRONG - Non-canonical scalar types record := ssql.NewRecord(). Int("count", 42). // Won't compile - int not allowed Float("price", float32(99.99)). // Won't compile - float32 not allowed Build()
CSV Auto-Parsing:
int64 for integers, float64 for decimalsint64(0) and float64(0) as default values with GetOr()age := ssql.GetOr(record, "age", int64(0))Type Conversion:
Get[int64]() works for string → int64 parsingGet[float64]() works for string → float64 parsingGet[int]() will NOT convert from strings (no automatic parsing)age := int(GetOr(r, "age", int64(0)))This hybrid approach balances ergonomics (flexible sequences) with consistency (canonical scalars).
⚠️ BREAKING CHANGE in v1.0: Record is now an encapsulated struct, not a bare
map[string]any.
Record (Immutable):
fields map[string]anyGet(), GetOr(), .All() iteratorMutableRecord (Mutable Builder):
fields map[string]any.Freeze() (creates copy)// ✅ CORRECT - Use MutableRecord builder record := ssql.MakeMutableRecord(). String("name", "Alice"). Int("age", int64(30)). Float("salary", 95000.50). Bool("active", true). Freeze() // Convert to immutable Record // ✅ CORRECT - From map (for compatibility) record := ssql.NewRecord(map[string]any{ "name": "Alice", "age": int64(30), }) // ❌ WRONG - Can't use struct literal record := ssql.Record{"name": "Alice"} // Won't compile! // ❌ WRONG - Can't use make() record := make(ssql.Record) // Won't compile!
Within ssql package:
// ✅ Can access .fields directly (private field) for k, v := range record.All() { record.fields[k] = v } // ✅ Direct field access for internal operations value := record.fields["name"]
Outside ssql package (CLI commands, tests, user code):
// ✅ CORRECT - Use Get/GetOr name := ssql.GetOr(record, "name", "") age := ssql.GetOr(record, "age", int64(0)) // ✅ CORRECT - Iterate with .All() for k, v := range record.All() { fmt.Printf("%s: %v\n", k, v) } // ✅ CORRECT - Build with MutableRecord mut := ssql.MakeMutableRecord() mut = mut.String("city", "NYC") // Chainable mut = mut.SetAny("field", anyValue) // For unknown types frozen := mut.Freeze() // Convert to Record // ❌ WRONG - Can't access .fields (private!) value := record.fields["name"] // Compile error! // ❌ WRONG - Can't index directly name := record["name"] // Compile error! // ❌ WRONG - Can't iterate directly for k, v := range record { // Compile error! ... }
// ✅ CORRECT - Use .All() iterator (maps.All pattern) for k, v := range record.All() { fmt.Printf("%s: %v\n", k, v) } // ✅ CORRECT - Use .KeysIter() for keys only for k := range record.KeysIter() { fmt.Println(k) } // ✅ CORRECT - Use .Values() for values only for v := range record.Values() { fmt.Println(v) } // ❌ WRONG - Can't iterate Record directly for k, v := range record { // Compile error! ... }
Converting old code to v1.0:
// OLD (v0.x): record := make(ssql.Record) record["name"] = "Alice" value := record["age"] for k, v := range record { ... } // NEW (v1.0+): record := ssql.MakeMutableRecord() record = record.String("name", "Alice") value := ssql.GetOr(record.Freeze(), "age", int64(0)) for k, v := range record.Freeze().All() { ... }
Test code migration:
// OLD (v0.x): testData := []ssql.Record{ {"name": "Alice", "age": int64(30)}, {"name": "Bob", "age": int64(25)}, } // NEW (v1.0+): r1 := ssql.MakeMutableRecord() r1.fields["name"] = "Alice" // Within ssql package r1.fields["age"] = int64(30) r2 := ssql.MakeMutableRecord() r2.fields["name"] = "Bob" r2.fields["age"] = int64(25) testData := []ssql.Record{r1.Freeze(), r2.Freeze()}
⚠️ ALWAYS use
or Get()
methods to read fields from Records. NEVER use direct map access or type assertions.GetOr()
Why:
r["field"] requires type assertions: r["field"].(string) → panics if field missing or wrong typer["field"].(string) are unsafe and fragileGet() and GetOr() handle type conversion, missing fields, and type mismatches gracefullyCorrect Field Access:
// ✅ CORRECT - Use GetOr with appropriate default name := ssql.GetOr(r, "name", "") // String field age := ssql.GetOr(r, "age", int64(0)) // Numeric field price := ssql.GetOr(r, "price", float64(0.0)) // Float field // ✅ CORRECT - Use in generated code strings.Contains(ssql.GetOr(r, "email", ""), "@") regexp.MustCompile("pattern").MatchString(ssql.GetOr(r, "name", "")) ssql.GetOr(r, "salary", float64(0)) > 50000
Wrong Field Access:
// ❌ WRONG - Direct map access with type assertion (WILL PANIC!) name := r["name"].(string) // Panic if field missing or wrong type r["email"].(string) // Panic if field missing asFloat64(r["price"]) // Don't create helper functions - use GetOr! // ❌ WRONG - Direct map access in comparisons r["status"] == "active" // May work, but inconsistent
Code Generation Rules:
ssql.GetOr(r, field, "") with empty string defaultssql.GetOr(r, field, float64(0)) or int64(0) defaultr[field].(string)asFloat64()Examples in Generated Code:
// String operators (contains, startswith, endswith, regexp) strings.Contains(ssql.GetOr(r, "name", ""), "test") strings.HasPrefix(ssql.GetOr(r, "email", ""), "admin") regexp.MustCompile("^[A-Z]").MatchString(ssql.GetOr(r, "code", "")) // Numeric operators (eq, ne, gt, ge, lt, le) ssql.GetOr(r, "age", float64(0)) > 18 ssql.GetOr(r, "salary", float64(0)) >= 50000 ssql.GetOr(r, "count", float64(0)) == 42
This approach eliminates runtime panics and makes generated code robust and maintainable.
This library emphasizes functional composition with Go 1.23+ iterators while providing comprehensive data visualization capabilities.
ssql CLI uses autocli v4.0.0+ for native subcommand support with auto-generated help and tab completion. All 14 commands migrated as of v1.2.0. Migrated to autocli v3.0.0 as of ssql v1.13.4, updated to v3.0.1 as of ssql v1.14.1, updated to v3.2.0 for pipeline field caching support, updated to v4.0.0 for field value completion.
Architecture Overview:
cmd/ssql/main.go - All subcommands defined using autocli builder APIcmd/ssql/helpers.go - Shared utilities (comparison operators, aggregation, extractNumeric, chainRecords)cmd/ssql/version/version.txt - Version string (manually maintained)ctx.GlobalFlags and ctx.ClausesVersion Access:
ssql version - Dedicated version subcommand (returns "ssql vX.Y.Z")ssql -help - Shows version in header-version flag (autocli doesn't auto-add this)CLI Flag Design Principles:
When designing CLI commands with autocli, follow these principles:
Prefer Named Flags Over Positional Arguments
-file data.csv or -input data.csvcommand data.csv (positional)cd directory)Use Multi-Argument Flags Properly
.Arg() fluent API:Flag("-where"). Arg("field").Completer(cf.NoCompleter{Hint: "<field-name>"}).Done(). Arg("operator").Completer(&cf.StaticCompleter{Options: operators}).Done(). Arg("value").Completer(cf.NoCompleter{Hint: "<value>"}).Done().
NoCompleter{Hint: "..."} when no completion is availableStaticCompleter{Options: [...]} for constrained values.String() and require quoting: -where "field op value"-where field op valueUse
for Repeated Flags.Accumulate()
-where age gt 30 -where dept eq Sales)Provide Completers for Constrained Arguments
StaticCompleter for known options (operators, commands, etc.)FileCompleter with patterns for file pathsAvoid In-Argument Delimiters (Use Multi-Arg Flags Instead)
-rename "old:new" (requires delimiter parsing)-as old new (framework separates args)# ❌ BAD - Delimiter approach breaks ssql rename "url:port:status" # Ambiguous! Which colon is the separator? ssql rename "file\:path:new_name" # Requires ugly escaping # ✅ GOOD - Multi-arg approach works naturally ssql rename -as "url:port" status # No ambiguity! ssql rename -as "file with spaces" clean # Spaces work fine ssql rename -as "weird|chars" simple # Any character works
// ✅ GOOD - No parsing needed, supports any field name Flag("-as"). Arg("old-field").Completer(cf.NoCompleter{Hint: "<field-name>"}).Done(). Arg("new-field").Completer(cf.NoCompleter{Hint: "<new-name>"}).Done(). Accumulate(). // For multiple renames // ❌ BAD - Requires custom parsing, breaks on "field:with:colons" Flag("-rename"). String(). // User must format as "old:new" Accumulate().
Use Brace Expansion for File Completion Patterns
Pattern: "*.{json,jsonl}" for multiple extensionsPattern: "*.json,*.jsonl" (doesn't work)// ✅ CORRECT - Brace expansion Flag("FILE"). String(). Completer(&cf.FileCompleter{Pattern: "*.{json,jsonl}"}). // Both .json and .jsonl Done(). Flag("FILE"). String(). Completer(&cf.FileCompleter{Pattern: "*.csv"}). // Single extension Done(). Flag("FILE"). String(). Completer(&cf.FileCompleter{Pattern: "*.{csv,tsv,txt}"}). // Multiple extensions Done(). // ❌ WRONG - Comma-separated doesn't work Flag("FILE"). String(). Completer(&cf.FileCompleter{Pattern: "*.json,*.jsonl"}). // Won't complete! Done().
Follow Unix Philosophy: Support stdin/stdout for Pipeline Commands
// Read from file or stdin var records iter.Seq[ssql.Record] if inputFile == "" { records = ssql.ReadCSVFromReader(os.Stdin) } else { records, err = ssql.ReadCSV(inputFile) }
// Write to file or stdout if outputFile == "" { return ssql.WriteCSVToWriter(records, os.Stdout) } else { return ssql.WriteCSV(records, outputFile) }
# ✅ GOOD - All work with pipelines ssql from data.csv | ssql where -where age gt 25 | ssql to csv output.csv ssql from data.csv | ssql include name age | ssql to json cat data.csv | ssql from | ssql limit 10 | ssql to table # ❌ BAD - Requiring files breaks pipelines ssql from data.csv | ssql to json output.json # If FILE was required!
- for stdin- for stdoutDefault("") for optional file parametersAll Commands MUST Have Examples
.Example() calls immediately after .Description()Subcommand("command-name"). Description("Brief description"). Example("ssql command arg1 arg2", "What this example demonstrates"). Example("ssql command -flag value | ssql other", "Another common use case"). Flag("-flag"). // ...
./ssql command -help and ensure EXAMPLES section appearsfor cmd in $(./ssql -help | grep "^ [a-z]" | awk '{print $1}'); do if ./ssql $cmd -help 2>&1 | grep -q "EXAMPLES:"; then echo "$cmd: ✅ has examples" else echo "$cmd: ❌ NO examples" fi done
Automatic Pipeline Field Caching (NEW in autocli v4.1.0)
ssql from users.csv | ssql where -where <TAB>, the first command doesn't have flags with FieldsFromFlag(), so field names aren't available for completion in downstream commandsFileCompleter completes to a single data file, it automatically extracts and caches field namesssql from user<TAB> which narrows to users.csvFileCompleter detects single data file matchAUTOCLI_FIELDS environment variableFieldsFromFlag() can use this cached list# Tab complete the filename (narrows to single file) ssql from user<TAB> # Completes to: users.csv # Automatically caches fields: name, age, email, status # Now pipeline completion works! ssql from users.csv | ssql where -where <TAB> # Completes with: name, age, email, status
FilePattern() with data file extensions:Flag("FILE"). String(). FilePattern("*.{csv,json,jsonl}"). Done()
-cache DONE pattern is obsolete)FileCompleter for data filesField Value Completion with FieldValuesFrom()
FieldValuesFrom("FILE", "field") to complete with actual data values sampled from the fileFlag("-where"). Arg("field"). FieldsFromFlag("FILE"). // Complete field names Done(). Arg("operator"). Completer(&cf.StaticCompleter{Options: []string{"eq", "ne", "gt"}}). Done(). Arg("value"). FieldValuesFrom("FILE", "field"). // Complete with actual values from that field! Done(). Done()
-where status <TAB> → shows operators-where status eq <TAB>active, pending, archived, etc.# User workflow with tab completion ssql where FILE users.csv -where status <TAB> # Shows operators: eq, ne, gt, ge, lt, le, contains, startswith, endswith ssql where FILE users.csv -where status eq <TAB> # Shows actual data from status column: active pending archived ssql where FILE users.csv -where name eq Al<TAB> # Filters and completes: Alice # Final command ssql where FILE users.csv -where name eq Alice
where and update commands for -where and -set flagsCompletionflags Subcommand Pattern:
All commands follow this pattern in
main.go:
Subcommand("command-name"). Description("Brief description"). Handler(func(ctx *cf.Context) error { // 1. Extract flags from ctx.GlobalFlags (for Global flags) var myFlag string if val, ok := ctx.GlobalFlags["-myflag"]; ok { myFlag = val.(string) } // 2. Extract clause flags (for Local flags with + separators) if len(ctx.Clauses) > 0 { clause := ctx.Clauses[0] if val, ok := clause.Flags["-field"]; ok { // Handle accumulated flags: val.([]any) } } // 3. For commands with -- separator (like from with command execution) if len(ctx.RemainingArgs) > 0 { command := ctx.RemainingArgs[0] args := ctx.RemainingArgs[1:] // ... } // 4. Perform command operation // 5. Return error or nil return nil }). Flag("-myflag"). String(). Global(). // Or Local() for clause-based flags Help("Description"). Done(). Done().
Key Patterns:
ctx.GlobalFlags["-flagname"] - applies to entire commandctx.Clauses[i].Flags["-flagname"] - applies per clause (with + separator).Accumulate() and access as []any slicectx.RemainingArgs for everything after -- (requires autocli v3.0+)interface{}, cast appropriately: val.(string), val.(int), val.(bool)Important Lessons Learned:
Release with replace directive fails -
go install fails if go.mod has replace directive
replace before tagging releasesGOPROXY=direct go install github.com/user/repo/cmd/[email protected]Version display - autocli
.Version() adds "v" prefix automatically
1.2.0 not v1.2.0Version subcommand needed - autocli doesn't auto-add
-version flag
version subcommand if users need version accessContext-based flag access - Don't use
.Bind() for complex commands
ctx.GlobalFlags and ctx.Clauses for flexibility-- separator support - Requires autocli v3.0+
from -- command args)ctx.RemainingArgs slicev3.0.1 (ssql v1.14.1): Branding update
_autocli_complete (was _completionflags_complete)v3.0.0 (ssql v1.13.6): Package rename from completionflags to autocli
completionflags → autocligithub.com/rosscartlidge/autocli/v3 (major version bump for rename)completionflags/v2 to autocli/v3v2.0.0 (ssql v1.13.4): Breaking changes
.Bind() method/v2 module pathMigration details for v2.0.0:
Module path change - CRITICAL for Go semantic versioning
github.com/rosscartlidge/autocligithub.com/rosscartlidge/autocli/v2go.mod module declaration in autocli to include /v2 suffixautocli to autocli/v2Breaking change: ctx.Subcommand → ctx.SubcommandPath
ctx.Subcommand (string) - single subcommand namectx.SubcommandPath ([]string) - slice supporting nested subcommands like git remote addctx.IsSubcommand(name), ctx.SubcommandName()Bug discovered during migration: .Example() return type
.Example() returned Builder interface instead of concrete type.Flag() after .Example()Example() from Builder interface, changed to return *SubcommandBuilderNo replace directive in releases - CRITICAL lesson reinforced
replace directives break go install for usersGOPROXY=direct go install github.com/user/repo/cmd/[email protected]Import path updates for examples
/v2go.mod files needed module path updatesMigration checklist for future major version bumps:
# 1. Update module path in library go.mod echo "module github.com/user/lib/v2" > go.mod # 2. Update all imports in consuming code sed -i 's|github.com/user/lib"|github.com/user/lib/v2"|g' **/*.go # 3. Update go.mod in consuming code # Change: require github.com/user/lib v1.x.x # To: require github.com/user/lib/v2 v2.x.x # 4. Remove any replace directives before release # Edit go.mod to remove "replace" line # 5. Test installation from GitHub GOPROXY=direct go install github.com/user/repo/cmd/[email protected] # 6. Verify version app version
Key learnings:
/v2 (or higher) in module path for major versionsgo install from GitHub before announcing release⚠️ CRITICAL: This is a core feature that enables 10-100x faster execution by generating standalone Go programs from CLI pipelines.
ssql supports self-generating pipelines where commands emit Go code fragments instead of executing. This allows users to:
⚠️ ALWAYS keep generated code simple and readable!
Rules for Code Generation:
Move complexity to helper functions - Generated code should call helper functions in the ssql package, NOT inline complex logic
ssql.DisplayTable(records, 50) (one line, clear intent)Generated code should be self-documenting - A reader should immediately understand what the pipeline does
When adding new commands:
Examples:
// ✅ GOOD - Clean, readable generated code records := ssql.ReadCSV("data.csv") filtered := ssql.Where(func(r ssql.Record) bool { return ssql.GetOr(r, "age", int64(0)) > 18 })(records) ssql.DisplayTable(filtered, 50) // ❌ BAD - Inlined complexity obscures intent records := ssql.ReadCSV("data.csv") // ... 80 lines of table formatting logic ... // Reader can't see what the pipeline does!
Why This Matters:
Two ways to enable generation mode:
# Method 1: Environment variable (affects entire pipeline) export SSQLGO=1 ssql from data.csv | ssql where -where age gt 25 | ssql generate-go # Method 2: -generate flag per command ssql from -generate data.csv | ssql where -generate -where age gt 25 | ssql generate-go
The environment variable approach is preferred for full pipelines.
Architecture (
):cmd/ssql/lib/codefragment.go
generate-go command assembles all fragments into a complete Go programFragment Types:
init - First command (e.g., from), creates initial variable, no inputstmt - Middle command (e.g., where, group-by), has input and output variablefinal - Last command (e.g., write-csv), has input but no output variableHelper Functions (in
):cmd/ssql/helpers.go
shouldGenerate(flagValue bool) - Checks flag or SSQLGO env vargetCommandString() - Returns command line that invoked the command (filters out -generate flag)shellQuote(s string) - Quotes arguments for shell safety✅ Commands with -generate support:
from - Generates init fragment with ssql.ReadCSV() or lib.ReadJSON()where - Generates stmt fragment with filter predicateto csv - Generates final fragment with ssql.WriteCSV()to json - Generates final fragment with ssql.WriteJSON()to table - Generates final fragment with ssql.DisplayTable()to chart - Generates final fragment with ssql.QuickChart()limit - Generates stmt fragment with ssql.Limit[ssql.Record](n)offset - Generates stmt fragment with ssql.Offset[ssql.Record](n)sort - Generates stmt fragment with ssql.SortBy()distinct - Generates stmt fragment with ssql.DistinctBy()group-by - Generates TWO stmt fragments (GroupByFields + Aggregate)union - Generates stmt fragment with ssql.Concat() and optionally ssql.DistinctBy(ssql.RecordKey)join - Generates stmt fragment with ssql.Join()Commands that don't need -generate:
generate-go - it's the assembler that produces the final Go codefunctions - displays help information onlyversion - displays version only⚠️ IMPORTANT: Commands without generation support will break pipelines in generation mode. Always add generation support when creating new commands.
Step 1: Add generation function to
:cmd/ssql/helpers.go
// generateMyCommandCode generates Go code for the my-command command func generateMyCommandCode(arg1 string, arg2 int) error { // 1. Read all previous code fragments from stdin fragments, err := lib.ReadAllCodeFragments() if err != nil { return fmt.Errorf("reading code fragments: %w", err) } // 2. Pass through all previous fragments for _, frag := range fragments { if err := lib.WriteCodeFragment(frag); err != nil { return fmt.Errorf("writing previous fragment: %w", err) } } // 3. Get input variable from last fragment (or default to "records") var inputVar string if len(fragments) > 0 { inputVar = fragments[len(fragments)-1].Var } else { inputVar = "records" } // 4. Generate your command's Go code outputVar := "result" code := fmt.Sprintf("%s := ssql.MyCommand(%q, %d)(%s)", outputVar, arg1, arg2, inputVar) // 5. Create and write your fragment imports := []string{"fmt"} // Add any needed imports frag := lib.NewStmtFragment(outputVar, inputVar, code, imports, getCommandString()) return lib.WriteCodeFragment(frag) }
Step 2: Add -generate flag and check to command handler in
:cmd/ssql/main.go
Subcommand("my-command"). Description("Description of my command"). Handler(func(ctx *cf.Context) error { var arg1 string var arg2 int var generate bool // Extract flags if val, ok := ctx.GlobalFlags["-arg1"]; ok { arg1 = val.(string) } if val, ok := ctx.GlobalFlags["-arg2"]; ok { arg2 = val.(int) } if genVal, ok := ctx.GlobalFlags["-generate"]; ok { generate = genVal.(bool) } // Check if generation is enabled (flag or env var) if shouldGenerate(generate) { return generateMyCommandCode(arg1, arg2) } // Normal execution follows... // ... }). Flag("-generate", "-g"). Bool(). Global(). Help("Generate Go code instead of executing"). Done(). Flag("-arg1"). String(). Global(). Help("First argument"). Done(). // ... other flags Done().
Step 3: Add tests to
:cmd/ssql/generation_test.go
func TestMyCommandGeneration(t *testing.T) { buildCmd := exec.Command("go", "build", "-o", "/tmp/ssql_test", ".") if err := buildCmd.Run(); err != nil { t.Fatalf("Failed to build ssql: %v", err) } defer os.Remove("/tmp/ssql_test") cmdLine := `echo '{"type":"init","var":"records"}' | SSQLGO=1 /tmp/ssql_test my-command -arg1 test -arg2 42` cmd := exec.Command("bash", "-c", cmdLine) output, err := cmd.CombinedOutput() if err != nil { t.Logf("Command output: %s", output) } outputStr := string(output) want := []string{`"type":"stmt"`, `"var":"result"`, `ssql.MyCommand`} for _, expected := range want { if !strings.Contains(outputStr, expected) { t.Errorf("Expected output to contain %q, got: %s", expected, outputStr) } } }
Commands with multiple fragments (like group-by):
Some commands generate multiple code fragments. For example,
group-by generates:
GroupByFields fragment (with command string)Aggregate fragment (empty command string - part of same CLI command)// Fragment 1: GroupByFields frag1 := lib.NewStmtFragment("grouped", inputVar, groupCode, nil, getCommandString()) lib.WriteCodeFragment(frag1) // Fragment 2: Aggregate (note: empty command string) frag2 := lib.NewStmtFragment("aggregated", "grouped", aggCode, nil, "") lib.WriteCodeFragment(frag2)
Manual testing:
```bash
# Test individual command
export SSQLGO=1
echo '{"type":"init","var":"records"}' | ./ssql my-command -arg1 test

# Test full pipeline
export SSQLGO=1
./ssql from data.csv | \
  ./ssql where -where age gt 25 | \
  ./ssql my-command -arg1 test | \
  ./ssql generate-go > program.go

# Compile and run generated code
go run program.go
```
Automated tests live in `cmd/ssql/generation_test.go`:

```bash
go test -v ./cmd/ssql -run TestGeneration
```

Code generation is a CRITICAL feature because:
Always ensure:
⚠️ CLI commands must ALWAYS be implemented using ssql package functions, not raw Go code!
The ssql CLI exists to make the ssql package accessible from the command line. Every CLI command should:
If a CLI feature requires logic that doesn't exist in the ssql package:
Why this matters:
Example - group-by with expressions:
```go
// ❌ WRONG - Generated raw loops and maps
groups := make(map[string][]ssql.Record)
for record := range records {
	// ... manual grouping logic
}

// ✅ CORRECT - Use ssql package functions
grouped := ssql.GroupByFields("_group", "dept")(records)
aggregated := ssql.Aggregate("_group", map[string]ssql.AggregateFunc{
	"total": ssql.ExprAgg("sum(salary * bonus)"), // Add ExprAgg to ssql package
})(grouped)
```
When adding new CLI features:
⚠️ NEVER release a ssql command that doesn't support code generation!
Every data-processing command MUST support code generation (`-generate` flag / `SSQLGO=1`). This is non-negotiable because:
Before releasing any new command:
- `-generate` flag support
- Tests in `cmd/ssql/generation_test.go`
- A pipeline check: `SSQLGO=1 ssql from ... | ssql new-command ... | ssql generate-go`

Exception: Commands that don't process data (like `version`, `functions`, `generate-go` itself) don't need generation support.
⚠️ All errors MUST cause pipeline failure with clear error messages!
This applies to BOTH execution mode AND code generation mode:
Execution Mode:
Code Generation Mode:
"type":"error")generate-go must detect error fragments and fail (no partial code output)Example - Proper error fragment emission:
```go
if unsupportedFeature {
	frag := lib.NewErrorFragment("feature X is not yet supported with -generate", getCommandString())
	lib.WriteCodeFragment(frag)
	return fmt.Errorf("feature X is not yet supported with -generate")
}
```
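On the consuming side, `generate-go` must refuse to emit a program when any fragment in the stream is an error fragment. A minimal sketch of that check, assuming the fragment struct exposes the JSON `"type"` field as `Type` (the real field name and the actual `generate-go` code may differ):

```go
// Sketch only - illustrates the "no partial code on error" rule, not the
// actual generate-go implementation.
for _, frag := range fragments {
	if frag.Type == "error" {
		return fmt.Errorf("upstream command emitted an error fragment: refusing to generate partial code")
	}
}
```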
Tests for error handling are in `cmd/ssql/generation_test.go`:

- `TestGenerationErrorHandling` - errors prevent partial code
- `TestErrorFragmentPropagation` - errors propagate through the pipeline
- `TestErrorFragmentFormat` - error fragments have the correct format

⚠️ GPU acceleration has been implemented and benchmarked. Results were surprising.
| Operation | CPU | GPU | Result |
|---|---|---|---|
| Sum (1M float64) | 86μs | 601μs | CPU 7x faster |
| Filter+Sum (10M float64) | 0.8ms | 5.3ms | CPU 6.6x faster |
| Convolve (100K × 1K) | 195ms | 603μs | GPU 320x faster |
| FFT (1K points) | 5.2ms | 0.25ms | GPU 21x faster |
| FFT (1M points) | hours | 2.9ms | GPU ∞ faster |
Key finding: GPU wins big for compute-heavy operations (convolution: 18-320x, FFT: 21-100x+). For memory-bound operations (aggregations), CPU wins.
PCIe transfer overhead dominates:
```
1M float64 values (8MB):
  PCIe to GPU:    ~500μs+
  GPU sum:        ~0.1ms
  PCIe from GPU:  ~0.01ms
  Total GPU:      ~600μs

  CPU sum:        ~86μs   (no transfer, fast memory)
```
Modern CPUs have 50-100 GB/s memory bandwidth. For simple arithmetic, the CPU finishes before the GPU transfer completes.
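A quick back-of-envelope check makes this concrete. The sketch below uses assumed round-number bandwidths (roughly 16 GB/s for a PCIe 3.0 x16 link, 80 GB/s for CPU memory, within the 50-100 GB/s range above); it reproduces the ~500μs transfer and ~100μs CPU-read estimates from the breakdown:

```go
package main

import "fmt"

func main() {
	const bytes = 1_000_000 * 8 // 1M float64 values = 8 MB

	// Assumed bandwidths - illustrative figures, not measurements
	const pcieGBps = 16.0  // ~PCIe 3.0 x16
	const cpuMemGBps = 80.0 // mid-range of 50-100 GB/s

	pcieTransferUs := float64(bytes) / (pcieGBps * 1e9) * 1e6
	cpuReadUs := float64(bytes) / (cpuMemGBps * 1e9) * 1e6

	fmt.Printf("PCIe transfer of 8MB: ~%.0fµs\n", pcieTransferUs) // ~500µs
	fmt.Printf("CPU read of 8MB:      ~%.0fµs\n", cpuReadUs)      // ~100µs
}
```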
ssql's `Record` type uses Schema + `[]any`. Extracting values requires CPU work:
```go
// This is CPU-bound and often slower than the aggregation itself
values := make([]float64, len(records))
for i, r := range records {
	values[i] = ssql.GetOr(r, "price", 0.0)
}
```
Arrow columnar format bypasses this - data is already contiguous.
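For contrast, a minimal sketch of the columnar case (the `priceColumn` variable is hypothetical, not the ssql Arrow API): the column already exists as one contiguous slice, so the per-record extraction loop above disappears.

```go
// Hypothetical columnar data: the "price" column is already a contiguous
// []float64 (e.g. taken from an Arrow record batch), so it can be summed or
// handed to a GPU kernel without a per-record extraction pass.
priceColumn := []float64{9.99, 12.50, 3.25} // placeholder data

var total float64
for _, v := range priceColumn {
	total += v
}
```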
```
gpu/
├── sum.cu       # CUDA kernels (sum, filter, FFT)
├── gpu.go       # Go wrappers (build tag: gpu)
├── gpu_stub.go  # Stubs for non-GPU builds
├── gpu_test.go  # Tests and benchmarks
└── Makefile     # Builds libssqlgpu.so
```
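The gpu.go / gpu_stub.go pair follows the standard Go build-tag pattern: the CUDA-backed wrappers compile only with `-tags gpu`, while stubs keep non-GPU builds compiling. A minimal sketch of that pattern (illustrative only; `Sum` is a hypothetical function name, not necessarily the actual ssql API):

```go
// gpu.go (sketch)
//go:build gpu

package gpu

// Sum is a hypothetical example. With -tags gpu, the real wrappers call into
// libssqlgpu.so via cgo; the body here is just a placeholder.
func Sum(values []float64) float64 {
	// ... cgo call into the CUDA kernel would go here ...
	return 0
}
```

```go
// gpu_stub.go (sketch)
//go:build !gpu

package gpu

// Sum (stub): without -tags gpu, the stub file provides a pure-Go fallback so
// the package still compiles and tests run on machines without CUDA.
func Sum(values []float64) float64 {
	var total float64
	for _, v := range values {
		total += v
	}
	return total
}
```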
Option 1: Docker Build (Recommended - no local CUDA needed)
```bash
git clone https://github.com/rosscartlidge/ssql
cd ssql

# Build and extract the binary
make docker-gpu-extract

# Install the library and run
sudo cp libssqlgpu.so /usr/local/lib && sudo ldconfig
./ssql_gpu version
```
Option 2: Local CUDA Toolkit
Requires CUDA toolkit installed locally (nvcc compiler).
```bash
git clone https://github.com/rosscartlidge/ssql
cd ssql

# Build everything
make build-gpu

# Install library system-wide (one-time)
sudo make install-gpu

# Now ssql_gpu works without LD_LIBRARY_PATH
./ssql_gpu version
```
Option 3: Docker Image (for container workflows)
```bash
make docker-gpu-image
docker run --gpus all ssql:gpu version
docker run --gpus all -v $(pwd):/data ssql:gpu from /data/input.csv
```
Available Makefile Targets:
| Target | Description |
|---|---|
| | Build CUDA library only (gpu/libssqlgpu.so) |
| `build-gpu` | Build ssql_gpu binary with GPU support |
| `install-gpu` | Install library to /usr/local/lib (requires sudo) |
| `docker-gpu-image` | Build Docker image with ssql_gpu |
| `docker-gpu-extract` | Build via Docker and extract binary |
| | Alias for docker-gpu-extract |
Running GPU Tests:
```bash
# With local CUDA
make install-gpu
go test -tags gpu ./gpu/

# Or with LD_LIBRARY_PATH
LD_LIBRARY_PATH=./gpu go test -tags gpu ./gpu/
```
```go
// Convolution (18-320x speedup) - compute-heavy
gpu.ConvolveDirect(signal, kernel) // Best for kernel < 10K
gpu.ConvolveFFT(signal, kernel)    // Best for very large kernels

// FFT (21-100x+ speedup) - genuinely compute-bound
gpu.FFTMagnitude(data)
gpu.FFTMagnitudePhase(data)
```
⚠️ Always sanity-check benchmark results against theoretical expectations.
We incorrectly concluded "GPU FFT provides no benefit" based on flawed benchmarks showing:
```
Old (WRONG):   1M-point FFT = 4.2ms CPU, 4.2ms GPU → "Tie"
New (CORRECT): 1M-point FFT = 125ms CPU, 4.4ms GPU → GPU 28x faster
```
The old CPU benchmark was 30x too fast - likely due to:
How to catch this: A 1M-point Cooley-Tukey FFT performs ~20M complex multiply-adds. At 125ms, that's ~6ns per operation (reasonable with cache effects). At 4.2ms, that would be 0.2ns per operation (faster than a single CPU cycle - impossible).
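This check is easy to automate. A small sketch using the numbers quoted above (~20M complex multiply-adds for a 1M-point FFT) flags the implausible result:

```go
package main

import "fmt"

func main() {
	const ops = 20e6 // ~20M complex multiply-adds for a 1M-point Cooley-Tukey FFT

	for _, tc := range []struct {
		name   string
		millis float64
	}{
		{"old CPU FFT benchmark (wrong)", 4.2},
		{"corrected CPU FFT benchmark", 125},
	} {
		nsPerOp := tc.millis * 1e6 / ops
		fmt.Printf("%s: %.2f ns/op", tc.name, nsPerOp)
		if nsPerOp < 0.3 { // well under one CPU cycle at ~3 GHz - implausible
			fmt.Print("  <- implausibly fast, re-check the benchmark")
		}
		fmt.Println()
	}
}
```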
Rule: If benchmark results seem too good, they probably are. Verify that:
Reference: See `doc/research/gpu-arrow-learnings.md` for detailed analysis and benchmark data.
ssql supports Apache Arrow format for high-performance I/O:
Benefits:
Usage:
```bash
ssql from data.arrow | ssql where -where age gt 25 | ssql to arrow output.arrow
```
When to use Arrow:
When to use CSV/JSON: