Due to projects like [Explore the LLMs](https://llm.extractum.io/) specializing in model indexing, the custom list has been removed.
| Model | Link | Description | Date added |
|---|---|---|---|
| BitNet b1.58 2B4T | https://huggingface.co/microsoft/bitnet-b1.58-2B-4T | the first native 1-bit LLM at the 2-billion-parameter scale, achieving performance comparable to full-precision models of similar size with lower computational cost (memory, energy, latency) | 2025-04-25 |
| OpenHands-LM | https://huggingface.co/all-hands | openhands 1.5b, 7b and 32b coding models with verified strong performance on SWE-Bench using the OpenHands Coding-Agent | 2025-04-25 |
| OpenThinker2-32B | https://huggingface.co/open-thoughts/OpenThinker2-32B | fine-tuned version of Qwen2.5-32B-Instruct on the OpenThoughts2-1M dataset with increased quality compared to base model | 2025-04-25 |
| Cogito-V1 | https://huggingface.co/collections/deepcogito/cogito-v1-preview-67eb105721081abe4ce2ee53 | Cogito model family with Qwen and Llama fine-tunes using Iterated Distillation and Amplification (IDA) to improve coding, STEM and instruction-following (IF) quality compared to their base models | 2025-04-25 |
| DeepCoder-14B-Preview | https://huggingface.co/agentica-org/DeepCoder-14B-Preview | code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using RL to scale up to long context lengths using DeepScaleR and GRPO+ | 2025-04-25 |
| ZR1-1.5B | https://huggingface.co/Zyphra/ZR1-1.5B | Fine Tuned DeepSeek-R1-Distill-Qwen-1.5B trained extensively on both verified coding and mathematics problems with reinforcement learning | 2025-04-25 |
| Skywork-OR1 | https://huggingface.co/Skywork/Skywork-OR1-32B-Preview | 7B, 7B-math and 32B reasoning models with open-sourced weights, training data and training code | 2025-04-25 |
| GLM-4 | https://huggingface.co/collections/THUDM/glm-4-0414-67f3cbcb34dd9d252707cb2e | three 32B models in general, reasoning and deep-reasoning variants, as well as a 9B SLM | 2025-04-25 |
| MAI-DS-R1 | https://huggingface.co/microsoft/MAI-DS-R1 | post-trained DeepSeek-R1 reasoning model by Microsoft AI that enhances responsiveness on blocked topics while maintaining strong reasoning capabilities | 2025-04-25 |
| Gemma 3 QAT | https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf | Quantization Aware Training models from Google, regaining bf16 quality in int4 quants and slashing memory footprint | 2025-04-25 |
| Sky-T1-7B-Mini | https://huggingface.co/NovaSky-AI/Sky-T1-mini | Trained with simple RL applied on DeepSeek-R1-Distill-Qwen-7B model, achieving close to OpenAI o1-mini performance on math benchmarks | 2025-02-21 |
| OmniParser-v2 | https://huggingface.co/microsoft/OmniParser-v2.0 | A VLM converting screenshots of Phone and Desktop UIs into structured list of interactable elements for Computer-Use | 2025-02-21 |
| R1-1776 | https://huggingface.co/perplexity-ai/r1-1776 | Deepseek-R1 671B Param model with removed Chinese Communist Party Censorship | 2025-02-21 |
| Step-Audio-Chat | https://huggingface.co/stepfun-ai/Step-Audio-Chat | Multimodal Large Language Model with 130B parameters for speech recognition, semantic understanding, dialogue management, voice cloning, and speech generation | 2025-02-21 |
| Qwen2.5-VL | https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct | 3B, 7B and 72B Vision Text Multimodal Model with support for bounding boxes, structured output, OCR for tables, forms etc, long video understanding, agentic computer and phone use, visual and text understanding | 2025-02-21 |
| Arcee-Maestro-7B | https://huggingface.co/arcee-ai/Arcee-Maestro-7B-Preview | RL-trained reasoning model based on DeepSeek-R1-Distill-Qwen-7B with further GRPO training for reasoning, math and coding | 2025-02-21 |
| Arcee-Blitz | https://huggingface.co/arcee-ai/Arcee-Blitz | Mistral-Small-24B-Instruct base distilled with DeepSeek-R1 for fast and efficient reasoning with 32k context | 2025-02-21 |
| OpenThinker-32B | https://huggingface.co/open-thoughts/OpenThinker-32B | fine-tuned reasoning model of Qwen/Qwen2.5-32B-Instruct on the DeepSeek-R1 distilled OpenThoughts-114k dataset | 2025-02-21 |
| MiniCPM-o | https://huggingface.co/openbmb/MiniCPM-o-2_6 | GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone | 2025-02-21 |
| DeepSeek-R1 | https://huggingface.co/deepseek-ai/DeepSeek-R1 | groundbreaking reasoning model from DeepSeek trained with a novel method that reduces RLHF effort, with distilled variants of various sizes | 2025-02-01 |
| Sky-T1 | https://huggingface.co/NovaSky-AI/Sky-T1-32B-Preview | UC Berkeley's reasoning model with 32B parameters | 2025-01-15 |
| QwQ | https://huggingface.co/Qwen/QwQ-32B-Preview | Qwen's reasoning model with 32B parameters | 2025-01-15 |
| Moxin LLM | https://huggingface.co/moxin-org/moxin-llm-7b | fully open data, open training 7B base and chat fine-tuned models | 2024-12-20 |
| Bamba-9b | https://huggingface.co/blog/bamba | hybrid Mamba2 model by IBM, Princeton, CMU and UIUC trained on open data with 2.5x throughput, available for vLLM, TRL, llama.cpp and transformers | 2024-12-20 |
| Command R7B | https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024 | open weights research 7B model with reasoning, summarization, question answering, coding, tool use and RAG capabilities | 2024-12-20 |
| DeepSeek-V2.5-1210-236B | https://huggingface.co/deepseek-ai/DeepSeek-V2.5-1210 | the 1210 update of the original V2.5 with math, coding and reasoning improvements | 2024-12-20 |
| QwQ-32b | https://qwenlm.github.io/blog/qwq-32b-preview/ | Apache 2 licensed LLM from Alibaba Cloud's Qwen team, inspired by OpenAI's o1 reasoning model for test time compute via reasoning tokens to improve performance | 2024-12-02 |
| Sparse-Llama-3.1-8B-2of4 | https://neuralmagic.com/blog/24-sparse-llama-smaller-models-for-efficient-gpu-inference/ | 2:4 Sparse Llama: Smaller Models for Efficient GPU Inference | 2024-12-02 |
| CursorCore | https://huggingface.co/collections/TechxGenus/cursorcore-series-6706618c38598468866b60e2 | Coding LLMs for use within CursorCore and CursorWeb | |
| ichigo | https://huggingface.co/homebrewltd | an open research project extending text-based llama3 to have native "listening" ability, using an early fusion technique, with improved multiturn capabilities and refusal to process inaudible queries | |
| Zamba2 | https://www.zyphra.com/post/zamba2-7b | a 7B SOTA SLM for running on-device, with 25% faster time-to-first-token and 20% higher tokens-per-second than comparable architectures, using Mamba2 blocks interleaved with shared attention blocks and a LoRA-adapted shared MLP block | |
| reader-lm | https://jina.ai/news/reader-lm-small-language-models-for-cleaning-and-converting-html-to-markdown | Jina AI's LLM to convert HTML to Markdown, making heuristics, cleanup and content identification an LLM task | |
| Pixtral | https://huggingface.co/mistralai/Pixtral-12B-2409 | 12B LLM with a 400M vision encoder for multi modal image and text inference and 128k sequence length by Mistral | |
| llama-3.2 | https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/ | small and medium-sized vision LLMs (11B and 90B) and text-only 1B and 3B models by Meta | |
| gemma2 2b | https://huggingface.co/bartowski/gemma-2-2b-it-GGUF | 2B small language model by Google achieving SOTA performance among sub-3B models on LLM Leaderboard 2 | |
| DeepSeekCoderv2 | https://github.com/deepseek-ai/DeepSeek-Coder-V2?tab=readme-ov-file#2-model-downloads | 16b and 236b mixture of experts coding models with 128k context length | |
| codegemma | https://huggingface.co/google/codegemma-7b | google's coding models from 2b base, 7b base and 7b instruct | |
| codeqwen1.5 | https://huggingface.co/Qwen/CodeQwen1.5-7B | base and chat models with 7B parameters and good quality | |
| Qwen2 | https://huggingface.co/collections/Qwen/qwen2-6659360b33528ced941e557f | English and Chinese models in 0.5B, 1.5B, 7B and 72B sizes with strong performance and 128k context windows for the 7B and 72B models | |
| Phi | https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3 | Microsoft's small language and vision models with small and medium parameter sizes, short and long context lengths and great performance | |
| Yi-1.5 | https://huggingface.co/01-ai/Yi-1.5-34B-Chat | model focusing on multilingual text understanding, available in 9B and 34B variants | |
| InternLM2.5 | https://huggingface.co/internlm/internlm2_5-7b-chat | 7B base and chat models focusing on reasoning, math and tool use, with a 1M context window | |
| Mistral-Large | https://huggingface.co/mistralai/Mistral-Large-Instruct-2407 | a 123B sized model beating llama-3.1 and gpt-4o in several categories with a focus on multilinguality, coding, agentic tasks and reasoning. | |
| Llama-3.1 | https://ai.meta.com/blog/meta-llama-3-1/ | Meta's most advanced model family, providing 8B, 70B and 405B base and instruction-tuned models with a 128k context window and quality on par with current SOTA closed-source models | |
| Nuextract | https://huggingface.co/numind/NuExtract | a structured-extraction model based on phi-3-mini; it is instructed with a JSON template that the model fills in from the provided unstructured text | |
| Mistral Nemo | https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407 | a 12B model by Mistral and NVIDIA with a 128k context window, offered as instruct and base models | |
| CodeGeeX4 | https://huggingface.co/THUDM/codegeex4-all-9b | 9B multilingual code generation model for chat and instruct with a 128k context length | |
| Mamba-Codestral | https://huggingface.co/mistralai/Mamba-Codestral-7B-v0.1 | coding model by Mistral based on the Mamba2 architecture, performing on par with SOTA transformer-based code models | |
| Aya-23 | https://huggingface.co/CohereForAI/aya-23-35B | 8B and 35B instruction-tuned multilingual models focusing on 23 languages | |
| Mistral-7b-instruct-v0.3 | https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 | with function calling, new tokenizer and 32k max context | |
| CodeStral-22B | https://huggingface.co/mistralai/Codestral-22B-v0.1 | Coding model trained on 80+ languages with instruct and Fill in the Middle tasks, 32k max context | |
| Yuan2-M32 | https://huggingface.co/IEITYuan/Yuan2-M32-hf | Mixture of Experts with Attention Router: 32 experts, 2 active, 40B total parameters, 3.7B active, and a max length of 16K | |
| DeepSeek-V2 | https://github.com/deepseek-ai/DeepSeek-V2#2-model-downloads | strong, economical and efficient Mixture-of-Experts language model with 21B activated parameters | |
| Granite | https://huggingface.co/ibm-granite | family of Code Models from IBM with 3b, 8b, 20b, 34b, base and instruct models for code completion and chat | |
| GemMoE | https://huggingface.co/Crystalcareai/GemMoE-Base-Random | An 8x8 Mixture Of Experts based on Gemma | |
| wavecoder-ultra-6.7b | https://huggingface.co/microsoft/wavecoder-ultra-6.7b | covering four general code-related tasks: code generation, code summary, code translation, and code repair | |
| Mixtral-8x22B-Instruct-v0.1 | https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1 | an instruct fine-tuned version of the Mixtral-8x22B-v0.1 | |
| WizardLM-2-8x22B | https://huggingface.co/alpindale/WizardLM-2-8x22B | Microsoft's WizardLM 2 8x22B beating gpt-4-0314 on MT-Bench | |
| WizardLM-2-7B | https://huggingface.co/microsoft/WizardLM-2-7B | Microsoft's WizardLM 2 7B; a 70B release is coming up | |
| aiXcoder | https://huggingface.co/aiXcoder/aixcoder-7b-base | 7B Code LLM for code completion, comprehension, generation | |
| Mixtral-8x22B-v0.1 | https://huggingface.co/v2ray/Mixtral-8x22B-v0.1 | Sparse MoE model with 176B total and 44B active parameters, 65k context size | |
| grok-1 | https://huggingface.co/xai-org/grok-1 | 314b MoE model by xAI | |
| DBRX | https://huggingface.co/databricks/dbrx-base | base and instruct MoE models from databricks with 132B total parameters and a larger number of smaller experts supporting RoPE and 32K context size | |
| command-r-plus | https://huggingface.co/CohereForAI/c4ai-command-r-plus | a 104B model with highly advanced capabilities including RAG and tool use for English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese | |
| StarCoder2 | https://huggingface.co/bigcode/starcoder2-15b | 15B, 7B and 3B code completion models trained on The Stack v2 | |
| command-r | https://www.maginative.com/article/cohere-launches-command-r-scalable-ai-model-for-enterprise-rag-and-tool-use/ | 35B model optimized for retrieval-augmented generation (RAG) and tool use, supporting Embed and Rerank methodology | |
| AI21 Jamba | https://huggingface.co/ai21labs/Jamba-v0.1 | production-grade Mamba-based hybrid SSM-Transformer model licensed under Apache 2.0 with 256K context and a 52B MoE with 12B active parameters | |
| Smaug-72B | https://huggingface.co/abacusai/Smaug-72B-v0.1 | based on Qwen-72B and MoMo-72B-Lora, then fine-tuned by Abacus.AI; the best-performing open LLM on the HF leaderboard as of Feb 2024 | |
| SLIM Model Family | https://huggingface.co/llmware | Small Specialized Function-Calling Models for Multi-Step Automation, focused on enterprise RAG workflows | |
| aya-101 | https://huggingface.co/CohereForAI/aya-101 | 13B fine-tuned open-access multilingual LLM from Cohere For AI | |
| seamlessM4T v2 | https://huggingface.co/docs/transformers/en/model_doc/seamless_m4t_v2 | Multimodal Audio and Text Translation between many languages | |
| SeaLLM | https://huggingface.co/SeaLLMs/SeaLLM-7B-v2 | multilingual LLM for Southeast Asian (SEA) languages 🇬🇧 🇨🇳 🇻🇳 🇮🇩 🇹🇭 🇲🇾 🇰🇭 🇱🇦 🇲🇲 🇵🇭 | |
| meditron | https://github.com/epfLLM/meditron | 7B and 70B Llama2-based LLMs fine-tuned and adapted for the medical domain | |
| Mixtral of experts | https://mistral.ai/news/mixtral-of-experts/ | A high quality Sparse Mixture-of-Experts. | |
| Poro | https://huggingface.co/LumiOpen/Poro-34B | SiloGen model checkpoints of a family of multilingual open-source LLMs covering all official European languages and code | |
| deepseek-coder | https://github.com/deepseek-ai/DeepSeek-Coder | code language models, trained on 2T tokens, 87% code 13% English / Chinese, up to 33B with 16K context size achieving SOTA performance on coding benchmarks | |
| openchat | https://github.com/imoneoi/openchat | Advancing Open-source Language Models with Mixed-Quality Data | |
| llmware RAG models | https://huggingface.co/llmware | small LLMs and sentence transformer embedding models specifically fine-tuned for RAG workflows | |
| HelixNet | https://huggingface.co/migtissera/HelixNet | Mixture of Experts built from 3 Mistral-7B models with LoRA; HelixNet-LMoE is an optimized version | |
| Mistral-7B-german-assistant-v3 | https://huggingface.co/flozi00/Mistral-7B-german-assistant-v3 | fine-tuned for German instructions and conversations in Alpaca style ("### Assistant:", "### User:"), trained with a context length of 8k tokens; the dataset used is deduplicated and cleaned, contains no code, and focuses on instruction following and conversational tasks | |
| WizardMath-70B-V1.0 | https://huggingface.co/WizardLM/WizardMath-70B-V1.0 | SOTA Mathematical Reasoning | |
| leo-hessianai-13b-chat-bilingual | https://huggingface.co/LeoLM/leo-hessianai-13b-chat-bilingual | chat fine-tune of the llama-2-13b-based leo-hessianai-13b base model | |
| em_german_leo_mistral | https://huggingface.co/jphme/em_german_leo_mistral | fine-tune of the Mistral-based LeoLM with German instructions | |
| SauerkrautLM-13B-v1 | https://huggingface.co/VAGOsolutions/SauerkrautLM-13b-v1 | llama-2 13b fine-tuned on a mix of German data augmentation and translations; SauerkrautLM-7b-v1-mistral is the German SauerkrautLM-7b fine-tuned using QLoRA on one A100 80GB with Axolotl | |
| CodeShell | https://github.com/WisdomShell/codeshell/blob/main/README_EN.md | code LLM with 7B parameters trained on 500B tokens with a context length of 8k, outperforming CodeLlama and StarCoder on HumanEval | |
| sqlcoder | https://github.com/defog-ai/sqlcoder | 15B parameter model that outperforms gpt-3.5-turbo for natural language to SQL generation tasks | |
| ChatGLM2-6B | https://github.com/THUDM/ChatGLM2-6B | v2 of the GLM 6B open bilingual EN/CN model | |
| baichuan-7b | https://github.com/baichuan-inc/baichuan-7B | Baichuan Intelligent Technology developed baichuan-7B, an open-source language model with 7 billion parameters trained on 1.2 trillion tokens. Supporting Chinese and English, it achieves top performance on authoritative benchmarks (C-EVAL, MMLU) | |
| salesforce/CodeT5 | https://github.com/salesforce/codet5 | code assistant; Salesforce has released CodeT5+ in 16B and other model sizes | |
| VPGTrans | https://vpgtrans.github.io/ | transfers a Visual Prompt Generator across LLMs; the VL-Vicuna model is a novel VL-LLM | |
| replit-code | https://huggingface.co/replit/ | focused on code completion, trained on a subset of the Stack Dedup v1.2 dataset | |
| Visual-med-alpaca | https://github.com/cambridgeltl/visual-med-alpaca | llama-7b fine-tuned with self-instruct data for the biomedical domain; models are gated behind a request form | |
| Multimodal-GPT | https://github.com/open-mmlab/Multimodal-GPT | multi-modal visual/language chatbot, using llama with custom LoRA weights and openflamingo-9B. | |
| mPLUG-Owl | https://github.com/X-PLUG/mPLUG-Owl | Multimodal finetuned model for visual/language tasks | |
| MOSS by Fudan University | https://github.com/OpenLMLab/MOSS | a 16B Chinese/English custom foundation model with additional variants fine-tuned with SFT and for plugin usage | |
| BigCode | https://huggingface.co/bigcode | open scientific collaboration to train a coding LLM | |
| CodeGeeX 13B | https://huggingface.co/spaces/THUDM/CodeGeeX | multilingual code generation model | |
| RWKV | https://github.com/BlinkDL/RWKV-LM | parallelizable RNN with Transformer-level LLM performance | |
| GeoV/GeoV-9b | https://huggingface.co/GeoV/GeoV-9b | 9B parameter model, in-progress training to 300B tokens (33:1) | |
| LAION OpenFlamingo | https://github.com/mlfoundations/open_flamingo | multimodal model and training architecture | |
| Cerebras GPT-13b | https://huggingface.co/cerebras | open 13B GPT-style model by Cerebras | |
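
Most rows above point to standard Hugging Face repositories, so as a rough usage note (not part of the original list) the sketch below shows how a text-only entry from the table might be loaded with the `transformers` library. The chosen repo id, dtype and prompt are illustrative assumptions; multimodal or custom-architecture entries need their model-specific classes instead.

```python
# Minimal sketch, assuming the `transformers` and `torch` packages are installed
# and hardware with enough memory is available. The repo id below is one example
# row from the table; most text-only entries can be swapped in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mistralai/Mistral-7B-Instruct-v0.3"  # illustrative choice from the table

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # assumption: hardware with bf16 support
    device_map="auto",
)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Briefly explain what a Mixture-of-Experts LLM is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```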