Markdown Converter
Agent skill for markdown-converter
This document helps AI agents work effectively in this codebase.
Parakeet ASR Server - A Go-based automatic speech recognition (ASR) server using NVIDIA's Parakeet TDT 0.6B model in ONNX format. Provides an OpenAI Whisper-compatible API for audio transcription.

    # Build
    make build                # Build to ./bin/parakeet

    # Run
    make run                  # Build and run with debug mode
    make run-dev              # Run with custom port (5092) for development
    ./bin/parakeet            # Run binary directly
    ./bin/parakeet -port 8080 -models /path/to/models -debug=true

    # Download models
    make models               # Download int8 models (default, ~670MB)
    make models-int8          # Download int8 quantized models
    make models-fp32          # Download full precision models (~2.5GB)

    # Test
    make test                 # Run tests
    make test-coverage        # Run with coverage

    # Code quality
    make fmt                  # Format code
    make vet                  # Run go vet
    make lint                 # Run all linters (vet + fmt)

    # Dependencies
    make deps                 # Download Go dependencies
    make deps-tidy            # Tidy dependencies
    make deps-onnxruntime     # Install ONNX Runtime library

    # Docker
    make docker-build-int8    # Build image with int8 models
    make docker-build-fp32    # Build image with fp32 models
    make docker-run-int8      # Run container with int8 models
    make docker-run-fp32      # Run container with fp32 models

    # Release
    make release              # Build all platforms
    make release-linux        # Build Linux binaries (amd64/arm64)
    make release-darwin       # Build macOS binaries (amd64/arm64)
    make release-windows      # Build Windows binary (amd64)


    parakeet/
    ├── main.go                    # Entry point, CLI flags, server initialization
    ├── internal/
    │   ├── asr/
    │   │   ├── transcriber.go     # ONNX inference pipeline, TDT decoding
    │   │   ├── mel.go             # Mel filterbank feature extraction (FFT, windowing)
    │   │   └── audio.go           # WAV parsing, resampling to 16kHz
    │   └── server/
    │       ├── server.go          # HTTP server, route setup, lifecycle management
    │       ├── handlers.go        # API endpoint handlers, response formatting
    │       └── types.go           # Request/response type definitions
    ├── models/                    # ONNX models (downloaded separately)
    ├── bin/                       # Build output directory
    ├── Makefile                   # Build recipes
    ├── Dockerfile                 # Multi-stage container build
    ├── .github/
    │   └── workflows/
    │       ├── ci.yaml            # CI pipeline (lint, test, build)
    │       └── release.yaml       # Release pipeline (binaries, docker)
    └── README.md

main.go (Entry Point)

- CLI flags: `-port`, `-models`, `-debug`
- Models directory: `./models`

internal/server/ (HTTP Server Package)

server.go

- `Config` struct: Port, ModelsDir, Debug settings
- `Server` struct: wraps config, transcriber, and HTTP mux
- `New()` - Initializes transcriber and routes
- `Run()` - Starts HTTP listener
- `Close()` - Releases resources

handlers.go

- `handleTranscription()` - Main endpoint, parses multipart form, returns transcription
- `handleTranslation()` - Delegates to transcription (Parakeet is English-focused)
- `handleModels()` - Returns available models (`parakeet-tdt-0.6b`, `whisper-1` alias)
- `handleHealth()` - Health check endpoint
- `formatSRTTime()`, `formatVTTTime()`

types.go

- `TranscriptionResponse` - Simple JSON response with text
- `VerboseTranscriptionResponse` - Detailed response with segments, timing
- `Segment` - Transcription segment with timing info
- `ErrorResponse`, `ErrorDetail` - OpenAI-compatible error format
- `ModelInfo`, `ModelsResponse` - Model listing types

internal/asr/ (ASR Package)

transcriber.go

- `DebugMode` - Global flag for verbose logging
- `Config` - Model configuration (`features_size`, `subsampling_factor`)
- `Transcriber` - Main inference struct
- `NewTranscriber()` - Loads config, vocab, initializes ONNX Runtime
- `Transcribe()` - Main entry: audio -> mel -> encoder -> TDT decode -> text
- `loadAudio()` - Format detection and parsing
- `runInference()` - Encoder ONNX session execution
- `tdtDecode()` - TDT greedy decoding loop with state management
- `tokensToText()` - Token IDs to text with cleanup

mel.go

- `MelFilterbank` - Mel-scale filterbank feature extractor
- `NewMelFilterbank()` - Creates filterbank with NeMo defaults (128 mels, 512 FFT)
- `Extract()` - Computes mel features with Hann windowing
- `normalize()` - Per-utterance mean/variance normalization
- `fft()` - Radix-2 Cooley-Tukey FFT implementation

audio.go

- `parseWAV()` - WAV parser supporting multiple chunk layouts
- `convertToFloat32()` - Supports 8/16/24/32-bit PCM and 32-bit float
- `resample()` - Linear interpolation resampling to 16kHz

| Method | Path | Description |
|---|---|---|
| POST | | Transcribe audio (OpenAI-compatible) |
| POST | | Translate audio (delegates to transcription) |
| GET | | List available models |
| GET | | Health check |
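
As a rough illustration of how a client might talk to the transcription endpoint, the sketch below builds a multipart form request carrying the `file` and `response_format` fields this document names. The helper name `buildTranscriptionRequest` and the placeholder URL are assumptions made for illustration only; the real route path is not shown here and should be taken from the route setup in `internal/server/server.go`.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildTranscriptionRequest is a hypothetical client helper (not part of the
// codebase). It packs audio bytes and a response format into a multipart
// form using the `file` and `response_format` field names.
func buildTranscriptionRequest(url, filename string, audio []byte, responseFormat string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, fmt.Errorf("create file part: %w", err)
	}
	if _, err := part.Write(audio); err != nil {
		return nil, fmt.Errorf("write audio: %w", err)
	}
	if err := w.WriteField("response_format", responseFormat); err != nil {
		return nil, fmt.Errorf("write response_format: %w", err)
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, fmt.Errorf("close multipart writer: %w", err)
	}

	req, err := http.NewRequest(http.MethodPost, url, &body)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Placeholder URL: substitute the server's real host, port, and route.
	req, err := buildTranscriptionRequest("http://localhost:8080/PLACEHOLDER", "sample.wav", []byte("RIFF"), "srt")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.Header.Get("Content-Type"))
}
```

Sending the request is then a matter of passing it to `http.DefaultClient.Do(req)` and reading the response body in the format that was requested.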
Request parameters:

- `file` (required) - Audio file (multipart form, max 25MB)
- `model` - Accepted but ignored (only one model)
- `language` - ISO-639-1 code (default: "en")
- `response_format` - `json`, `text`, `srt`, `vtt`, `verbose_json` (default: "json")
- `prompt`, `temperature` - Accepted but ignored

Code conventions:

- Function names: `parseWAV`, `convertToFloat32`, `tdtDecode`
- Variable names: `inputTensor`, `outputTensor`, `lengthTensor`
- Error wrapping: `fmt.Errorf("context: %w", err)`
- Cleanup with `defer` (`tensor.Destroy()`, `file.Close()`)
- Tensor creation: `ort.NewTensor(shape, data)`
- `ort.NewAdvancedSession()` for named inputs/outputs
- `.Destroy()` on tensors and sessions after use
- Struct tags: `json:"field_name"` with `omitempty` where appropriate

Vocabulary:

- `▁token 123`
- `▁` (U+2581) as word boundary marker
- `<blk>` (blank) at index 8192

ONNX Runtime:

- `ONNXRUNTIME_LIB` env var if not in standard paths
- `make deps-onnxruntime` to install (requires sudo)

Model files:

- `encoder-model.int8.onnx` (~652MB) or `encoder-model.onnx` (~2.5GB)
- `decoder_joint-model.int8.onnx` (~18MB) or `decoder_joint-model.onnx` (~72MB)
- `config.json`, `vocab.txt`, `nemo128.onnx`
- `make models` or manually from HuggingFace

API behavior:

- `model` parameter accepted but ignored (only one model)
- `prompt` and `temperature` parameters accepted but ignored
- `language` defaults to "en" if not specified

| Variable | Description | Default |
|---|---|---|
| `ONNXRUNTIME_LIB` | Path to libonnxruntime.so | Auto-detect |
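
The "env var first, then auto-detect" behavior in the table can be sketched roughly as follows. This is an assumption-laden illustration, not the project's actual lookup code: `resolveRuntimeLib`, `fileExists`, and the candidate paths are hypothetical, and only the `ONNXRUNTIME_LIB` variable name comes from this document.

```go
package main

import (
	"fmt"
	"os"
)

// candidatePaths are illustrative guesses at common install locations.
// They are assumptions, not paths taken from the project.
var candidatePaths = []string{
	"/usr/local/lib/libonnxruntime.so",
	"/usr/lib/libonnxruntime.so",
}

// resolveRuntimeLib is a hypothetical helper. It prefers ONNXRUNTIME_LIB
// and otherwise returns the first candidate that exists, or "" if none do.
// getenv and exists are injected so the logic can be exercised without
// touching the real environment or filesystem.
func resolveRuntimeLib(getenv func(string) string, exists func(string) bool, candidates []string) string {
	if p := getenv("ONNXRUNTIME_LIB"); p != "" {
		return p
	}
	for _, p := range candidates {
		if exists(p) {
			return p
		}
	}
	return ""
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	path := resolveRuntimeLib(os.Getenv, fileExists, candidatePaths)
	if path == "" {
		fmt.Println("no ONNX Runtime library found; set ONNXRUNTIME_LIB")
		return
	}
	// With github.com/yalue/onnxruntime_go, a path like this would typically
	// be handed to ort.SetSharedLibraryPath before the environment is
	// initialized.
	fmt.Println("using", path)
}
```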
From go.mod:

    go 1.25.5
    github.com/yalue/onnxruntime_go v1.19.0

No other external Go dependencies. Standard library used for HTTP, JSON, audio processing, FFT.
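
Since FFT is done with the standard library alone, and `fft()` in `mel.go` is described as radix-2 Cooley-Tukey, a minimal recursive sketch of that technique may help orient readers. The names `fftRadix2` and `maxAbsDiff` are illustrative, and the project's own implementation may be structured differently.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// fftRadix2 returns the discrete Fourier transform of x using a recursive
// radix-2 Cooley-Tukey decomposition. len(x) must be a power of two.
func fftRadix2(x []complex128) []complex128 {
	n := len(x)
	out := make([]complex128, n)
	if n <= 1 {
		copy(out, x)
		return out
	}

	// Split into even-indexed and odd-indexed samples.
	even := make([]complex128, n/2)
	odd := make([]complex128, n/2)
	for i := 0; i < n/2; i++ {
		even[i] = x[2*i]
		odd[i] = x[2*i+1]
	}
	e := fftRadix2(even)
	o := fftRadix2(odd)

	// Butterfly: combine the half-size transforms with twiddle factors.
	for k := 0; k < n/2; k++ {
		tw := cmplx.Rect(1, -2*math.Pi*float64(k)/float64(n)) * o[k]
		out[k] = e[k] + tw
		out[k+n/2] = e[k] - tw
	}
	return out
}

// maxAbsDiff returns the largest element-wise magnitude difference between
// two equal-length slices.
func maxAbsDiff(a, b []complex128) float64 {
	worst := 0.0
	for i := range a {
		if d := cmplx.Abs(a[i] - b[i]); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	// A unit impulse has a flat spectrum: every bin equals 1.
	impulse := []complex128{1, 0, 0, 0, 0, 0, 0, 0}
	flat := []complex128{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println("max deviation from flat spectrum:", maxAbsDiff(fftRadix2(impulse), flat))
}
```

A recursive form is the easiest to read; an iterative in-place variant avoids the per-level allocations and is usually preferred when the transform runs once per audio frame.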
CI/CD:

- CI pipeline (`.github/workflows/ci.yaml`)
- Release pipeline (`.github/workflows/release.yaml`)

Audio input:

- `internal/asr/transcriber.go:loadAudio()`
- `internal/asr/audio.go`
- `[]float32` normalized to [-1, 1] at 16kHz

Server types, handlers, and routes:

- `internal/server/types.go`
- `internal/server/handlers.go`
- `internal/server/handlers.go`
- `internal/server/server.go:setupRoutes()`
- `internal/server/types.go` if needed

Model and feature constants:

- `internal/asr/transcriber.go:247` (`encoderDim := int64(1024)`)
- `internal/asr/transcriber.go:314-315` (`stateDim`, `numLayers`)
- `internal/asr/transcriber.go:39` (`maxTokensPerStep: 10`)
- `internal/asr/mel.go:25-27` (`nFFT`, `hopLength`, `winLength`)

Makefile conventions:

- `## Description` comment for help
- `@` prefix for silent commands
- `.PHONY` if not a file target

Release tags:

- `git tag v1.0.0`
- `git push origin v1.0.0`
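
The audio notes above call for samples delivered as `[]float32` normalized to [-1, 1] at 16kHz, and `resample()` is described as linear interpolation. The sketch below shows one way a parser for 16-bit PCM could meet that contract. `pcm16ToFloat32` and `resampleLinear` are hypothetical names, and the scaling constant is an assumption rather than the project's confirmed choice.

```go
package main

import "fmt"

// pcm16ToFloat32 scales signed 16-bit samples into [-1, 1).
// Dividing by 32768 is a common convention and is assumed here.
func pcm16ToFloat32(in []int16) []float32 {
	out := make([]float32, len(in))
	for i, s := range in {
		out[i] = float32(s) / 32768
	}
	return out
}

// resampleLinear converts in from srcRate to dstRate by linearly
// interpolating between neighboring input samples.
func resampleLinear(in []float32, srcRate, dstRate int) []float32 {
	if srcRate == dstRate || len(in) == 0 {
		return append([]float32(nil), in...)
	}
	outLen := int(float64(len(in)) * float64(dstRate) / float64(srcRate))
	out := make([]float32, outLen)
	ratio := float64(srcRate) / float64(dstRate)
	for i := range out {
		pos := float64(i) * ratio // fractional position in the input
		idx := int(pos)
		frac := float32(pos - float64(idx))
		if idx+1 < len(in) {
			out[i] = in[idx]*(1-frac) + in[idx+1]*frac
		} else {
			// Past the last pair of samples: hold the final value.
			out[i] = in[len(in)-1]
		}
	}
	return out
}

func main() {
	// 441 samples at 44.1kHz is 10ms of audio.
	pcm := make([]int16, 441)
	for i := range pcm {
		pcm[i] = int16(i * 10)
	}
	samples := resampleLinear(pcm16ToFloat32(pcm), 44100, 16000)
	fmt.Println("output samples at 16kHz:", len(samples))
}
```

Linear interpolation is cheap and simple but applies no anti-aliasing filter, so downsampling content with energy above 8kHz can introduce artifacts.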