Markdown Converter
Agent skill for markdown-converter
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Russian Text-to-Speech (TTS) system trained on 228 Russian audio files containing phrases from an Agatha Christie story about Hercule Poirot. The text content is encoded in the audio filenames using underscores as word separators.
The TTS system uses a Tacotron2-inspired architecture with Russian-specific text processing.
python3 -m venv venv
source venv/bin/activate  # macOS/Linux
pip install -r requirements.txt
python prepare_dataset.py
Creates training metadata from the audio filenames in the input/ directory, generating data/train.csv, data/val.csv, and dataset statistics.
python train.py --epochs 100 --batch_size 8 --device auto
Device selection automatically prioritizes Apple Silicon MPS, then CUDA, then CPU. Training saves checkpoints to the checkpoints/ directory.
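The priority order described above can be sketched as a small helper. This is an illustration only, not the code in train.py: the function name `pick_device` and its parameters are hypothetical, and the availability flags are passed in explicitly so the logic can be read and tested without PyTorch installed.

```python
def pick_device(requested: str = "auto",
                mps_available: bool = False,
                cuda_available: bool = False) -> str:
    """Return the device name to use, following the MPS > CUDA > CPU order.

    In a PyTorch script the two flags would typically come from
    torch.backends.mps.is_available() and torch.cuda.is_available().
    """
    if requested != "auto":
        # Assumption: an explicit --device value bypasses auto-detection.
        return requested
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    # On a machine where both backends report availability, MPS wins.
    print(pick_device("auto", mps_available=True, cuda_available=True))
```

Keeping the decision in a pure function makes the fallback order easy to unit-test separately from the hardware it runs on.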
python test_setup.py
Runs a comprehensive check of device support, text processing, model creation, dataset loading, and inference setup.
# Single text
python inference.py --text "Привет мир" --output hello.wav
# Interactive mode
python inference.py
# Batch processing
python inference.py --text_file texts.txt --output_dir generated/
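As a rough illustration of how the three invocation styles above could be told apart, here is a minimal argparse sketch. The flag names come from the commands above; the helper names `build_parser` and `select_mode`, the default values, and the dispatch order are assumptions and may not match inference.py.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Russian TTS inference (sketch)")
    parser.add_argument("--text", help="single text to synthesize")
    # Default values below are placeholders, not taken from the repository.
    parser.add_argument("--output", default="output.wav",
                        help="output WAV path used with --text")
    parser.add_argument("--text_file", help="file with one text per line")
    parser.add_argument("--output_dir", default="generated",
                        help="directory for batch output")
    return parser


def select_mode(argv: List[str]) -> str:
    """Map command-line arguments to 'single', 'batch', or 'interactive'."""
    args = build_parser().parse_args(argv)
    if args.text is not None:
        return "single"
    if args.text_file is not None:
        return "batch"
    # No text source given: fall back to the interactive prompt.
    return "interactive"


if __name__ == "__main__":
    print(select_mode(["--text", "Привет мир", "--output", "hello.wav"]))
```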
The codebase is optimized for Apple Silicon with automatic MPS (Metal Performance Shaders) detection. Training scripts check for MPS availability before falling back to CUDA or CPU.
Russian text is converted to lowercase and mapped to character IDs. Unknown characters are replaced with spaces. The vocabulary includes all Cyrillic letters (both cases), punctuation, and spaces.
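The mapping described above can be sketched as follows. The vocabulary here is a stand-in: its ordering, its punctuation set, and the names `VOCAB`, `CHAR_TO_ID`, and `text_to_ids` are assumptions. Because the text is lowercased first, this sketch lists only lowercase letters; the real vocabulary may differ.

```python
# Hypothetical vocabulary: space first, then punctuation, then lowercase Cyrillic.
PUNCTUATION = ".,!?-:;"
CYRILLIC_LOWER = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
VOCAB = " " + PUNCTUATION + CYRILLIC_LOWER
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}


def text_to_ids(text: str) -> list:
    """Lowercase the text and map each character to its vocabulary ID.

    Characters outside the vocabulary are replaced with a space.
    """
    ids = []
    for ch in text.lower():
        if ch not in CHAR_TO_ID:
            ch = " "  # unknown character -> space
        ids.append(CHAR_TO_ID[ch])
    return ids


if __name__ == "__main__":
    print(text_to_ids("Привет мир"))
```

One consequence of replacing unknown characters with spaces is that Latin letters and digits in the input are silently dropped from pronunciation, so text with numbers would need to be spelled out beforehand.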
Audio files in the input/ directory use the filename format {number}_{russian_text_with_underscores}.wav
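Recovering the transcript from a filename in this format can be sketched as below. The function name `parse_filename` and the example filename are hypothetical, and the sketch assumes the leading number consists only of digits; prepare_dataset.py may handle edge cases differently.

```python
from pathlib import Path


def parse_filename(filename: str) -> tuple:
    """Split '{number}_{text_with_underscores}.wav' into (number, text)."""
    stem = Path(filename).stem  # drop directory and the .wav extension
    number, _, text = stem.partition("_")  # split at the first underscore only
    if not number.isdigit() or not text:
        raise ValueError(f"unexpected filename format: {filename!r}")
    # Remaining underscores are word separators.
    return int(number), text.replace("_", " ")


if __name__ == "__main__":
    # Hypothetical filename, used only to show the format.
    print(parse_filename("input/001_привет_мир.wav"))
```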
Processed metadata is stored in the data/ directory, split 90%/10% into training and validation sets.
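A 90%/10% split of this kind can be sketched as follows. The function name `split_train_val`, the shuffling, the fixed seed, and the rounding rule are all assumptions; the actual script may order or round the split differently.

```python
import random


def split_train_val(items, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle items reproducibly and return (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    train, val = split_train_val(range(228))
    print(len(train), len(val))
```

With 228 items this particular rounding gives 205 training and 23 validation entries.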
The model architecture expects encoder output dimension (256) to match the encoder_dim parameter in the Attention module. The decoder dimension (1024) is separate and configurable.
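The constraint above can be made explicit with a small configuration check. This is a sketch under assumptions: the class `ModelDims` and its field names are hypothetical and are not taken from the repository; only the values 256 and 1024 come from the text.

```python
from dataclasses import dataclass


@dataclass
class ModelDims:
    encoder_output_dim: int = 256     # width of the encoder's output
    attention_encoder_dim: int = 256  # encoder_dim given to the Attention module
    decoder_dim: int = 1024           # independent of the two values above

    def validate(self) -> None:
        """Raise ValueError if the encoder and attention widths disagree."""
        if self.encoder_output_dim != self.attention_encoder_dim:
            raise ValueError(
                "encoder output dimension "
                f"({self.encoder_output_dim}) must equal the Attention "
                f"encoder_dim ({self.attention_encoder_dim})"
            )


if __name__ == "__main__":
    ModelDims().validate()                 # defaults are consistent
    ModelDims(decoder_dim=512).validate()  # decoder width can change freely
```

Running a check like this before constructing the model turns a shape mismatch deep inside the attention layer into an immediate, readable error.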
When adding new audio files, ensure filenames follow the existing pattern, with the Russian text encoded using underscores as word separators. Re-run prepare_dataset.py to update the metadata files.
The inference system runs with an untrained model (it emits warnings), but quality speech synthesis requires a trained checkpoint file.