Image Generation

Models

text2image:

karlo text2image model
DeepFloyd IF by Stability AI, an open-source text-to-image model with photorealism and language understanding, code
Kandinsky multilingual text2image latent diffusion model
stable diffusion 1.5
stable diffusion 2.0
stable diffusion 2.1
stable diffusion xl (SDXL) base 0.9 & refiner 0.9
AnimateDiff Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
PixArt-alpha Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis, paper
Latent Consistency Models LoRAs for high quality few step image generation
OnnxStream runs Stable Diffusion XL 1.0 Base in 298MB of RAM
StreamDiffusion A Pipeline-Level Solution for Real-Time Interactive Generation
AnyText Code and Model for a diffusion pipeline covering a latent module and text embedding to generate and manipulate text in images
InstantID Zero-shot Identity-Preserving Generation in Seconds, ComfyUI plugin
PhotoMaker Rapid customization within seconds with no additional LoRA training, preserving ID with high fidelity and text controllability; can serve as an adapter for other models
StableCascade successor to Stable Diffusion by Stability AI with smaller latent space, higher speeds and better quality
IDM-VTON Virtual Try-on for clothes and fashion
ConsistentID Portrait Generation with Multimodal Fine-Grained Identity Preservation
Flux by Black Forest Labs, a team of former Stability AI staff: SOTA text-to-image models Flux and Flux schnell, a 12B parameter transformer capable of writing text and following complex prompts, with Flux schnell released under the Apache 2.0 license
Lumina-mGPT multimodal autoregressive LLMs capable of generating flexible and photorealistic images from text descriptions

text to 3d:

OpenAI shap-E a text/image to 3D model
shap-e local run text-to-3d locally
stable-dreamfusion A PyTorch implementation of the text-to-3D model Dreamfusion using the Stable Diffusion text-to-2D model

image to 3d:

Wonder3D A cross-domain diffusion model for 3D reconstruction from a single image
DreamCraft3D Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Spann3R is a transformer-based model for dense 3D reconstruction from images, with spatial memory to track and predict 3D structures and capable of real-time processing

image to text (OCR):

pix2tex LaTeX OCR

other:

facebookresearch/segment-anything image segmentation
- YOLOv8 SOTA object detection, segmentation, classification and tracking
- DINOv2 1B-parameter ViT model that generates robust all-purpose visual features, outperforming OpenCLIP on image-level and pixel-level benchmarks
- segment-anything-fast A batched offline inference oriented version of segment-anything
Final2x Image super-resolution through interpolation supporting multiple models like RealCUGAN, ESRGAN, Waifu2x, SRMD
text-to-room generates textured 3D room meshes from a text prompt
DragGAN Interactive Point-based Manipulation on Generative Images, demo
DragDiffusion Harnessing Diffusion Models for Interactive Point-based Image Editing
HQTrack Tracking Anything in High Quality (HQTrack) is a framework for high performance video object tracking and segmentation
CoTracker It is Better to Track Together. A fast transformer-based model that can track any point in a video
ZeroNVS Zero-shot 360-degree view synthesis from single images
x-stable-diffusion Real-time inference for Stable Diffusion - 0.88s latency
Depth-Anything Better depth estimation including a ControlNet for ComfyUI and ONNX and TensorRT versions
SUPIR Super Resolution and Image Restoration
RMBG BRIA Background Removal model, hf demo space

Wrappers & GUIs

ComfyUI powerful and modular stable diffusion pipelines using a graph/nodes/flowchart based interface, runs SDXL 0.9, SD2.1, SD2.0, SD1.5
- ComfyUI-Manager installs missing custom nodes automatically
- SeargeSDXL Custom SDXL Node for easier SDXL usage and img2img workflow that utilizes base & refiner
- Sytan ComfyUI SDXL workflow with txt2img using base and refiner
Automatic1111/stable-diffusion-webui well known UI for Stable Diffusion
- sd-webui-cloud-inference extension via omniinfer.io
- stable-diffusion-webui-forge platform on top of SDWebUI to make development easier, optimize resource management, and speed up inference
SD.Next (vladmandic/automatic) fork of automatic1111's web UI with seemingly more active development than the original repo
Fooocus Midjourney-like GUI for SDXL that focuses on prompting and generating
- RuinedFooocus A Fooocus fork
- Fooocus-MRE A Fooocus fork
stable-diffusion-xl-demo runs SDXL 0.9 in a basic interface
imaginAIry a Stable Diffusion UI
InvokeAI Alternative, polished stable diffusion UI with fewer features than automatic1111
mlc-ai/web-stable-diffusion
anapnoe/stable-diffusion-webui-ux Redesigned from automatic1111's UI, adding mobile and desktop layouts and UX improvements
refacer One-Click Deepfake Multi-Face Swap Tool
stable-diffusion.cpp CPU inference of Stable Diffusion in pure C/C++ with huge performance gains, supporting ggml, 16/32 bit float, 4/5/8 bit quantization, AVX/AVX2/AVX512, SD1.x, SD2.x, txt2img/img2img
FaceFusion Next generation face swapper and enhancer
OneFlow Backend for diffusers and ComfyUI
StabilityMatrix is a portable package manager and UI for GUIs like Forge, SD.Next, ComfyUI and more, supporting multiple packages, offering built-in Git and Python dependencies, and features like syntax highlighting, workspace management, and model browsing
OneDiff is a PyTorch-based acceleration library for diffusion models, offering out-of-the-box speedups, GPU optimization, and broad model and NVIDIA GPU support
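
The 4/5/8 bit quantization listed for stable-diffusion.cpp above can be illustrated with a small sketch. This is not the ggml on-disk format or stable-diffusion.cpp's code; it is a generic symmetric block quantizer, and the names `quantize_block` and `dequantize_block` are hypothetical. It only shows the general idea: each block of weights is stored as one float scale plus small signed integers, trading a bounded rounding error for a smaller footprint.

```python
def quantize_block(values, bits=8):
    """Symmetric quantization of one block: a float scale plus small signed ints.

    Illustrative only; real formats pack the integers and store the scale compactly.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bit, 7 for 4 bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0  # avoid division by zero for all-zero blocks
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale, quantized):
    """Recover approximate floats from the scale and the stored integers."""
    return [scale * q for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
scale, q = quantize_block(weights, bits=8)
restored = dequantize_block(scale, q)
```

With fewer bits the scale grows, so the worst-case rounding error per weight (half the scale) grows with it.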

Fine Tuning

https://github.com/JoePenna/Dreambooth-Stable-Diffusion
fast-stable-diffusion TheLastBen's Repo for SD, SDXL fine-tuning and DreamBooth on RunPod, Paperspace, Colab and others
https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth
https://github.com/cloneofsimo/lora
OneTrainer all in one training for SD, SDXL and inpainting models supporting fine-tuning, LoRA, embeddings
sd-scripts by kohya-ss
- LoRA Easy Training Scripts GUI for Kohya's Scripts
- Kohya_ss Windows-focused Gradio GUI for Kohya's Stable Diffusion trainers, experimental sdxl support, reddit thread
Fine tuning concepts explained visually
text2image-gui a Stable Diffusion GUI by NMKD
sd-webui-EasyPhoto / easyphoto plugin for generating AI portraits that can be used to train digital doppelgangers with 5-10 photos and a quick LoRA fine tune, paper
StableTuner Windows GUI for Finetuning / Dreambooth Stable Diffusion models (abandoned)
SimpleTuner fine-tuning for StableDiffusion, PixArt, Flux with LoRA and full U-Net training, multi GPU support, DeepSpeed
x-flux LoRA and ControlNet training scripts for Flux model by Black Forest Labs using DeepSpeed
ai-toolkit Flux LoRA training on local and runpod

Research

Speed Is All You Need up to 50% speed increase for Latent Diffusion Models
ORCa converts glossy objects into radiance-field cameras, enabling depth estimation and novel-view synthesis, project, code
cocktail Mixing Multi-Modality Controls for Text-Conditional Image Generation, project, code
SnapFusion Fast text-to-image diffusion on mobile phones in 2 seconds
Objaverse-xl dataset of 10 million annotated high quality 3D objects, hf
LightGlue Local Feature Matching at Light Speed, a lightweight feature matcher with high accuracy and blazing fast inference. It takes as input a set of keypoints and descriptors for each image and returns the indices of corresponding points
ml-mgie Guiding Instruction-based Image Editing via Multimodal Large Language Models
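VAR Visual Autoregressive Modeling, GPT-style next-scale prediction for image generation that its authors report beats diffusion models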
InstantStyle towards Style-Preserving in Text-to-Image Generation
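
The LightGlue entry above describes a matcher that takes keypoint descriptors for two images and returns the indices of corresponding points. The sketch below illustrates that input/output contract only. It is not LightGlue's algorithm or API: it swaps in plain mutual nearest-neighbor matching, and the function name `mutual_nearest_matches` and the toy descriptors are hypothetical.

```python
import math


def mutual_nearest_matches(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's nearest neighbor.

    A classical baseline, used here only to show the descriptors-in, indices-out shape.
    """
    if not desc_a or not desc_b:
        return []

    def nearest(src, dst):
        # For each source descriptor, index of the closest destination descriptor.
        return [min(range(len(dst)), key=lambda j: math.dist(s, dst[j])) for s in src]

    a_to_b = nearest(desc_a, desc_b)
    b_to_a = nearest(desc_b, desc_a)
    # Keep a pair only if the choice is mutual.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Toy 2-D descriptors; real descriptors have many more dimensions.
desc_a = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.5]]
desc_b = [[0.9, 0.1], [0.1, 0.9]]
matches = mutual_nearest_matches(desc_a, desc_b)
```

Each returned pair indexes into the two keypoint lists, so unmatched keypoints (here the third descriptor of the first image) simply do not appear in the result.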
