General

FAQ

FlashAttention is an optional dependency that accelerates training and inference. It is supported only on NVIDIA GPUs with the Turing, Ampere, Ada, or Hopper architectures (e.g., T4, RTX 2080, A100, RTX 3090, H100). **You can use our models without installing it.**
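
If you do have a supported GPU and the `flash-attn` package installed, you can opt in when loading the model. Below is a minimal sketch with Hugging Face Transformers, assuming a recent version that accepts the `attn_implementation` argument; the model ID is only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"  # illustrative model ID; substitute your own

# Prefer FlashAttention 2 when the package is installed; otherwise fall back to
# PyTorch's built-in scaled dot-product attention ("sdpa").
try:
    import flash_attn  # noqa: F401
    attn_implementation = "flash_attention_2"
except ImportError:
    attn_implementation = "sdpa"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation=attn_implementation,
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

The try/except fallback means the same loading code works whether or not FlashAttention is installed, which matches the point above: the models run either way, just faster with it.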
