Markdown Converter
Agent skill for markdown-converter
This document provides comprehensive instructions for AI agents implementing kata exercises for the spaced repetition system.
Each kata must be atomic - focused on implementing exactly one function or concept.
❌ Bad (Multi-concept kata):
```python
# template.py - TOO BROAD
class MultiHeadAttention:
    def __init__(self, d_model, n_heads):
        # TODO: initialize weight matrices
        ...

    def split_heads(self, x):
        # TODO: reshape for multi-head
        ...

    def attention(self, Q, K, V):
        # TODO: scaled dot-product attention
        ...

    def forward(self, x):
        # TODO: full multi-head attention
        ...
```
Problems:

- Bundles four separate concepts into a single exercise
- A failure doesn't reveal which concept needs review
- The spaced repetition scheduler can't track each concept independently
✅ Good (Atomic katas):
```python
# kata: attention_scores
def attention_scores(Q: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Compute Q @ K.T / sqrt(d_k)"""
    # BLANK_START
    raise NotImplementedError
    # BLANK_END


# kata: attention_weights (depends on softmax)
def attention_weights(scores: torch.Tensor) -> torch.Tensor:
    """Apply softmax to attention scores"""
    # BLANK_START
    raise NotImplementedError
    # BLANK_END


# kata: attention_output
def attention_output(weights: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Apply attention weights to values: weights @ V"""
    # BLANK_START
    raise NotImplementedError
    # BLANK_END


# kata: multihead_reshape (depends on attention_output)
def multihead_reshape(x: torch.Tensor, n_heads: int) -> torch.Tensor:
    """Reshape (batch, seq, d_model) → (batch, seq, n_heads, d_head)"""
    # BLANK_START
    raise NotImplementedError
    # BLANK_END
```
Benefits:

- Each kata drills exactly one concept
- Dependencies make the learning progression explicit
- The scheduler can repeat each concept on its own schedule
Every kata must have exactly five files in `exercises/<kata_name>/`:
```
exercises/
└── attention_scores/
    ├── __init__.py     # Empty file (required for Python module)
    ├── manifest.toml   # Metadata (name, category, difficulty, dependencies)
    ├── template.py     # User-facing code with BLANK_START/BLANK_END
    ├── reference.py    # Your complete solution
    └── test_kata.py    # 5-10 pytest tests
```
`__init__.py`: Always empty. Required for Python module imports.

```python
# Empty file
```
`manifest.toml`: Metadata defining the kata:
```toml
[kata]
name = "attention_scores"   # Must match directory name (snake_case)
category = "transformers"   # Grouping: algorithms, transformers, pytorch, etc.
base_difficulty = 3         # 1-5 scale (start conservative)
description = """
Compute scaled attention scores from query and key matrices.

Implement the core attention scoring mechanism:
    scores = Q @ K.T / sqrt(d_k)

This forms the foundation of the attention mechanism.
"""
dependencies = []                   # List of prerequisite kata names
tags = ["attention", "matrix-ops"]  # Optional: for searching/filtering
```
Guidelines:
`template.py`: User-facing starter code with exactly ONE `BLANK_START`/`BLANK_END` pair.
```python
import torch


def attention_scores(
    Q: torch.Tensor,
    K: torch.Tensor,
    scale: bool = True
) -> torch.Tensor:
    """
    Compute scaled dot-product attention scores.

    Args:
        Q: Query matrix (batch, seq_q, d_k)
        K: Key matrix (batch, seq_k, d_k)
        scale: Whether to scale by sqrt(d_k)

    Returns:
        Attention scores (batch, seq_q, seq_k)

    Example:
        >>> Q = torch.randn(2, 10, 64)
        >>> K = torch.randn(2, 15, 64)
        >>> scores = attention_scores(Q, K)
        >>> scores.shape
        torch.Size([2, 10, 15])
    """
    # BLANK_START
    raise NotImplementedError("Compute Q @ K.transpose(-2, -1), then scale if needed")
    # BLANK_END
```
Requirements:
- Exactly one `BLANK_START`/`BLANK_END` pair (per file)
- `raise NotImplementedError` with a helpful hint

Type Hint Guidelines:

- Framework types where applicable: `torch.Tensor`, `jnp.ndarray`, `np.ndarray`
- `Optional[T]` for optional args (not `T | None`, for broader compatibility)
- Built-in generics: `list`, `dict`, `tuple`

`reference.py`: Your complete, correct solution. This is used:

- as the fallback import in `test_kata.py` when no user implementation is present
- to verify that the test suite passes against a known-correct solution
```python
import torch


def attention_scores(
    Q: torch.Tensor,
    K: torch.Tensor,
    scale: bool = True
) -> torch.Tensor:
    """
    Compute scaled dot-product attention scores.

    Args:
        Q: Query matrix (batch, seq_q, d_k)
        K: Key matrix (batch, seq_k, d_k)
        scale: Whether to scale by sqrt(d_k)

    Returns:
        Attention scores (batch, seq_q, seq_k)
    """
    # Compute Q @ K^T
    scores = Q @ K.transpose(-2, -1)

    # Scale by sqrt(d_k) if requested
    if scale:
        d_k = Q.shape[-1]
        scores = scores / (d_k ** 0.5)

    return scores
```
Requirements:
`test_kata.py`: Comprehensive pytest tests (5-10 tests covering correctness and edge cases).
```python
import pytest
import torch
from framework import assert_shape, assert_close

# Import user implementation or fall back to reference
try:
    from user_kata import attention_scores
except ImportError:
    from .reference import attention_scores


def test_output_shape():
    """Verify output has correct shape (batch, seq_q, seq_k)"""
    Q = torch.randn(2, 10, 64)
    K = torch.randn(2, 15, 64)
    scores = attention_scores(Q, K)
    assert_shape(scores, (2, 10, 15))


def test_scaling():
    """Scores should be scaled by sqrt(d_k)"""
    Q = torch.randn(1, 5, 32)
    K = torch.randn(1, 5, 32)
    scores_scaled = attention_scores(Q, K, scale=True)
    scores_unscaled = attention_scores(Q, K, scale=False)
    expected_scale = 32 ** 0.5
    assert_close(scores_scaled * expected_scale, scores_unscaled, rtol=1e-5)


def test_single_sequence():
    """Handle single sequence (batch=1, seq=1)"""
    Q = torch.randn(1, 1, 16)
    K = torch.randn(1, 1, 16)
    scores = attention_scores(Q, K)
    assert_shape(scores, (1, 1, 1))


def test_different_seq_lengths():
    """Q and K can have different sequence lengths"""
    Q = torch.randn(1, 10, 64)
    K = torch.randn(1, 20, 64)
    scores = attention_scores(Q, K)
    assert_shape(scores, (1, 10, 20))


def test_correctness_simple():
    """Verify correctness on simple example"""
    # Simple case: Q=K, so scores should be symmetric
    x = torch.tensor([[[1.0, 0.0], [0.0, 1.0]]])  # (1, 2, 2)
    scores = attention_scores(x, x, scale=False)
    expected = torch.tensor([[[1.0, 0.0], [0.0, 1.0]]])  # Identity
    assert_close(scores, expected, rtol=1e-5)


def test_batch_dimension():
    """Verify batching works correctly"""
    Q = torch.randn(8, 10, 64)
    K = torch.randn(8, 10, 64)
    scores = attention_scores(Q, K)
    assert_shape(scores, (8, 10, 10))


def test_large_dimensions():
    """Handle realistic large dimensions"""
    Q = torch.randn(16, 512, 768)
    K = torch.randn(16, 512, 768)
    scores = attention_scores(Q, K)
    assert_shape(scores, (16, 512, 512))
```
Test Structure Guidelines:
Import pattern (first 4 lines):
```python
try:
    from user_kata import <function_name>
except ImportError:
    from .reference import <function_name>
```
Test count: 5-10 tests (prefer 7-8)
Test categories (include at least one of each):
Test naming: Use descriptive names with the `test_` prefix.
Assertions: Use framework helpers when possible

- `assert_shape(tensor, expected_shape)` for shapes
- `assert_close(actual, expected, rtol, atol)` for numerical comparison
- plain `assert` for boolean conditions

Docstrings: One-line explanation of what the test verifies
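The `assert_shape` and `assert_close` helpers come from the project's `framework` module. Their actual implementation isn't shown in this document; as a rough mental model only (an assumption, not the real code), they behave roughly like:

```python
# Hypothetical sketch of the framework helpers -- the real framework module
# may differ; this only illustrates the expected behavior.
import torch


def assert_shape(tensor: torch.Tensor, expected_shape: tuple) -> None:
    """Fail with a readable message if the tensor's shape differs."""
    assert tuple(tensor.shape) == tuple(expected_shape), (
        f"expected shape {tuple(expected_shape)}, got {tuple(tensor.shape)}"
    )


def assert_close(actual, expected, rtol: float = 1e-5, atol: float = 1e-8) -> None:
    """Element-wise numerical comparison within relative/absolute tolerances."""
    torch.testing.assert_close(actual, expected, rtol=rtol, atol=atol)
```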
Use dependencies to create structured learning progressions.
Direct dependencies only: List immediate prerequisites, not transitive
```toml
# Good
[kata]
name = "attention_output"
dependencies = ["attention_weights"]  # Only direct dependency

# Bad
[kata]
name = "attention_output"
dependencies = ["attention_weights", "softmax"]  # softmax is transitive
```
Minimal dependencies: Only require what's actually needed
Avoid circular dependencies: Graph must be acyclic
```
softmax (difficulty: 1, deps: [])
  ↓
attention_weights (difficulty: 2, deps: [softmax])
  ↓
attention_scores (difficulty: 2, deps: [])
  ↓
attention_output (difficulty: 3, deps: [attention_weights])
  ↓
multihead_split (difficulty: 3, deps: [])
  ↓
multihead_attention (difficulty: 4, deps: [attention_output, multihead_split])
```
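This document doesn't specify a validator for the dependency graph, but an acyclicity check is easy to sketch. The following is illustrative only (it assumes Python 3.11+ for `tomllib`, the `exercises/` layout above, and hypothetical helper names), not part of the kata framework:

```python
# Hedged sketch: one way to confirm the kata dependency graph is acyclic.
# load_dependency_graph and assert_acyclic are hypothetical helper names.
import tomllib
from pathlib import Path


def load_dependency_graph(exercises_dir: str = "exercises") -> dict[str, list[str]]:
    """Map each kata name to its declared dependencies from manifest.toml."""
    graph = {}
    for manifest_path in Path(exercises_dir).glob("*/manifest.toml"):
        with open(manifest_path, "rb") as f:
            kata = tomllib.load(f)["kata"]
        graph[kata["name"]] = kata.get("dependencies", [])
    return graph


def assert_acyclic(graph: dict[str, list[str]]) -> None:
    """Raise ValueError if any kata (transitively) depends on itself."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {name: WHITE for name in graph}

    def visit(node: str) -> None:
        color[node] = GRAY
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                raise ValueError(f"circular dependency involving '{dep}'")
            if color.get(dep, WHITE) == WHITE and dep in graph:
                visit(dep)
        color[node] = BLACK

    for name in graph:
        if color[name] == WHITE:
            visit(name)
```

Running `assert_acyclic(load_dependency_graph())` before committing a new kata would catch accidental cycles early.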
When implementing a kata, verify:
- Directory name is snake_case and matches the kata name (e.g. `attention_scores`)
- All five files exist: `__init__.py`, `manifest.toml`, `template.py`, `reference.py`, `test_kata.py`
- `manifest.toml` has the correct name matching the directory
- `template.py` contains exactly one `BLANK_START`/`BLANK_END` pair
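If you want to automate these checks, a hypothetical helper (illustrative only; the function name, paths, and error messages are not part of the framework) might look like:

```python
# Hedged sketch of an automated version of the checklist above.
import tomllib
from pathlib import Path

REQUIRED_FILES = ["__init__.py", "manifest.toml", "template.py", "reference.py", "test_kata.py"]


def check_kata(kata_dir: Path) -> list[str]:
    """Return a list of problems found in one kata directory (empty list = OK)."""
    problems = []

    # All five files must exist.
    for filename in REQUIRED_FILES:
        if not (kata_dir / filename).exists():
            problems.append(f"missing {filename}")

    # The manifest name must match the directory name.
    manifest_path = kata_dir / "manifest.toml"
    if manifest_path.exists():
        with open(manifest_path, "rb") as f:
            name = tomllib.load(f)["kata"]["name"]
        if name != kata_dir.name:
            problems.append(f"manifest name '{name}' != directory '{kata_dir.name}'")

    # The template must contain exactly one BLANK_START/BLANK_END pair.
    template_path = kata_dir / "template.py"
    if template_path.exists():
        source = template_path.read_text()
        if source.count("BLANK_START") != 1 or source.count("BLANK_END") != 1:
            problems.append("template.py must have exactly one BLANK_START/BLANK_END pair")

    return problems
```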
For mathematical functions (softmax, layer_norm, etc.):

```python
# template.py
import torch


def softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """
    Compute softmax: exp(x) / sum(exp(x))

    Args:
        x: Input tensor (any shape)
        dim: Dimension to normalize over

    Returns:
        Probabilities summing to 1.0 along dim
    """
    # BLANK_START
    raise NotImplementedError("Use torch.exp and normalization")
    # BLANK_END
```
Tests should verify, at a minimum, that the output sums to 1.0 along `dim` and that the output shape matches the input.
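For example, a few softmax tests in the style used above might look like this (a sketch only; the exact properties you check are up to the kata author):

```python
# Illustrative softmax tests -- a sketch, not a canonical test_kata.py.
import torch
from framework import assert_shape, assert_close

try:
    from user_kata import softmax
except ImportError:
    from .reference import softmax


def test_sums_to_one():
    """Probabilities sum to 1.0 along the normalized dimension"""
    x = torch.randn(4, 7)
    y = softmax(x, dim=-1)
    assert_close(y.sum(dim=-1), torch.ones(4), rtol=1e-5)


def test_matches_torch():
    """Agrees with torch.softmax on random input"""
    x = torch.randn(2, 5, 9)
    assert_close(softmax(x, dim=-1), torch.softmax(x, dim=-1), rtol=1e-5)


def test_shape_preserved():
    """Output shape matches input shape"""
    x = torch.randn(3, 8)
    assert_shape(softmax(x, dim=-1), (3, 8))
```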
For nn.Module implementations:
```python
# template.py
import torch
import torch.nn as nn


class LayerNorm(nn.Module):
    def __init__(self, normalized_shape: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # BLANK_START
        raise NotImplementedError("Initialize gamma and beta parameters")
        # BLANK_END

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Normalize over last dimension and apply affine transform"""
        # Implementation goes in reference.py, not template
        return x  # Placeholder
```
Note: For classes, put the BLANK in `__init__` and provide the complete `forward` in the reference only.
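For orientation, the matching `reference.py` could be something like the sketch below (the `gamma`/`beta` parameter names and the exact normalization code are assumptions, not a prescribed solution):

```python
# Hedged sketch of a possible reference.py for the LayerNorm kata.
import torch
import torch.nn as nn


class LayerNorm(nn.Module):
    def __init__(self, normalized_shape: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # This is what the template's BLANK covers: learnable affine parameters.
        self.gamma = nn.Parameter(torch.ones(normalized_shape))
        self.beta = nn.Parameter(torch.zeros(normalized_shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Normalize over last dimension and apply affine transform"""
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```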
For complex tensor manipulations (einsum, advanced indexing):
```python
# template.py
import torch


def batch_outer_product(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """
    Compute outer product for each batch: x[i] ⊗ y[i]

    Args:
        x: Tensor (batch, n)
        y: Tensor (batch, m)

    Returns:
        Outer products (batch, n, m)
    """
    # BLANK_START
    raise NotImplementedError("Use broadcasting or einsum")
    # BLANK_END
```
Tests should verify the output shape (batch, n, m) and numerical agreement with a naive per-batch computation.
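As an illustration (a sketch under the assumption that correctness is checked against an explicit per-batch `torch.outer` loop):

```python
# Illustrative tests for batch_outer_product -- hedged, not canonical.
import torch
from framework import assert_shape, assert_close

try:
    from user_kata import batch_outer_product
except ImportError:
    from .reference import batch_outer_product


def test_output_shape():
    """Output is (batch, n, m)"""
    x = torch.randn(4, 3)
    y = torch.randn(4, 5)
    assert_shape(batch_outer_product(x, y), (4, 3, 5))


def test_matches_naive_loop():
    """Matches an explicit per-batch torch.outer computation"""
    x = torch.randn(2, 3)
    y = torch.randn(2, 4)
    expected = torch.stack([torch.outer(x[i], y[i]) for i in range(2)])
    assert_close(batch_outer_product(x, y), expected, rtol=1e-5)
```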
Good descriptions:
Bad descriptions:
Start conservative with `base_difficulty` - SM-2 will adjust based on user performance.
In `NotImplementedError` messages, provide direction without giving away the solution:
Good hints:
Bad hints:
Prioritize:
Avoid:
Here's a complete example of a well-designed atomic kata:
Directory: `exercises/relu/`
`manifest.toml`:
```toml
[kata]
name = "relu"
category = "fundamentals"
base_difficulty = 1
description = """
Implement ReLU activation: max(0, x)

Rectified Linear Unit - the most common activation function.
"""
dependencies = []
```
`template.py`:
```python
import torch


def relu(x: torch.Tensor) -> torch.Tensor:
    """
    Apply ReLU activation element-wise.

    Args:
        x: Input tensor (any shape)

    Returns:
        Activated tensor (same shape as x)
    """
    # BLANK_START
    raise NotImplementedError("Return max(0, x) element-wise")
    # BLANK_END
```
`reference.py`:
```python
import torch


def relu(x: torch.Tensor) -> torch.Tensor:
    """
    Apply ReLU activation element-wise.

    Args:
        x: Input tensor (any shape)

    Returns:
        Activated tensor (same shape as x)
    """
    return torch.maximum(x, torch.zeros_like(x))
```
`test_kata.py`:
```python
import pytest
import torch
from framework import assert_shape, assert_close

try:
    from user_kata import relu
except ImportError:
    from .reference import relu


def test_output_shape():
    """Output shape matches input shape"""
    x = torch.randn(10, 20)
    y = relu(x)
    assert_shape(y, (10, 20))


def test_positive_unchanged():
    """Positive values pass through unchanged"""
    x = torch.tensor([1.0, 2.0, 3.0])
    y = relu(x)
    assert_close(y, x)


def test_negative_zeroed():
    """Negative values become zero"""
    x = torch.tensor([-1.0, -2.0, -3.0])
    expected = torch.tensor([0.0, 0.0, 0.0])
    y = relu(x)
    assert_close(y, expected)


def test_mixed_values():
    """Correctly handles mix of positive and negative"""
    x = torch.tensor([-1.0, 0.0, 1.0, -2.0, 2.0])
    expected = torch.tensor([0.0, 0.0, 1.0, 0.0, 2.0])
    y = relu(x)
    assert_close(y, expected)


def test_zero_threshold():
    """Zero maps to zero (boundary case)"""
    x = torch.tensor([0.0])
    y = relu(x)
    assert_close(y, torch.tensor([0.0]))


def test_multidimensional():
    """Works with multidimensional tensors"""
    x = torch.randn(4, 5, 6)
    y = relu(x)
    assert_shape(y, (4, 5, 6))
    assert torch.all((y >= 0) & ((y == 0) | (y == x)))
```
This kata demonstrates all the principles:

- Atomic scope: exactly one function (`relu`)
- Exactly one `BLANK_START`/`BLANK_END` pair
- Complete manifest, template, reference, and tests
- Tests cover shape, correctness, edge cases, and multidimensional input
Q: What if a concept naturally requires multiple functions?
A: Split into multiple katas. Example: Instead of "attention mechanism" kata, create:
- `attention_scores` (Q @ K.T)
- `attention_weights` (softmax)
- `attention_output` (weights @ V)

Q: Should I include multiple BLANK_START/BLANK_END pairs?
A: No. Exactly one pair per template. If you need multiple, split into multiple katas.
Q: What about class-based implementations (nn.Module)?
A: Put the BLANK in `__init__` for parameter initialization. Provide the complete `forward` in the reference only. Or split into separate katas (one for init, one for forward).
Q: How do I handle optional advanced features?
A: Use optional arguments with defaults. Test both basic and advanced usage.
Q: Should every kata have dependencies?
A: No. Many fundamental katas (softmax, relu, etc.) have no dependencies. Only add dependencies when the concept genuinely builds on another.
When creating katas, remember:
- Keep each kata atomic: exactly one function or concept
- Exactly one `BLANK_START`/`BLANK_END` pair per template
- All five files, with the manifest name matching the directory
- 5-10 descriptive tests covering shape, correctness, and edge cases
- Direct, acyclic dependencies only

Following these guidelines ensures consistent, high-quality katas that optimize the spaced repetition learning experience.