General

Low-Cost Deployment

By default, the model is loaded with FP16 precision, running the above code requires about 13GB of VRAM. If your GPU's VRAM is limited, you can try loading the model quantitatively, as follows:

promptBeginner5 min to valuemarkdown
0 views
Feb 1, 2026

Sign in to like and favorite skills