Video

Text to video generation

ModelScope Text to video synthesis
- zeroscope v2 xl Watermark-free, ModelScope-based video model generating high-quality video at 1024x576 (16:9), to be used with the text2video extension for automatic1111
Nvidia VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
Potat1, colab
Phenaki multi-minute text-to-video generation from prompts with scene changes, project page
StableVideo Text-driven Consistency-aware Diffusion Video Editing, code, paper
Rerender A Video Zero-Shot Text-Guided Video-to-Video Translation, paper
VideoCrafter1 Open Diffusion Models for High-Quality Video Generation
i2vgen-xl a holistic video generation ecosystem built on diffusion models
pixeldance High-Dynamic Video Generation
Open-Sora-Plan aims to reproduce Sora
StoryDiffusion Consistent Long-Range Image and Video Generation
Open-Sora Open implementation approach for video generation
CogVideo SOTA video generation with strong consistency, producing 6 seconds of video at 8 fps and 720x480 using 18-36 GB of VRAM
Pyramid-Flow an efficient autoregressive video generation method based on flow matching, capable of generating high-quality 10-second videos at 768p resolution and 24 FPS, with image-to-video support
HunyuanVideo Tencent's open-weight video-generation model
mochi-1 state of the art video generation model with high-fidelity motion and strong prompt adherence by Genmo
Wan2.1 an open large-scale video generation model that handles multiple tasks, including text-to-video and video editing, with reported SOTA performance and support for consumer-grade GPUs
SkyReels-V2 a video generation model available in 1.3B and 14B variants, capable of producing unlimited-duration videos for both text-to-video and image-to-video tasks, and reported to outperform leading models such as HunyuanVideo-13B and Wan2.1-14B
Magi-1 an autoregressive video generation model that enables unlimited-duration video creation with precise control over timing and dynamics, supporting text-to-video, image-to-video, and video-to-video tasks, and reported to lead the Physics-IQ Benchmark

Frame Interpolation (Temporal Interpolation)

https://github.com/google-research/frame-interpolation
https://github.com/ltkong218/ifrnet
https://github.com/megvii-research/ECCV2022-RIFE
Framer Interactive User Guided Flow Frame Interpolation

Segmentation & Tracking

Segment and Track Anything, code. A framework that combines the Segment Anything Model (SAM) with the DeAOT tracking model to enable precise, multimodal object tracking in video, with strong reported benchmark performance
Track Anything, code. Extends the Segment Anything Model (SAM) to achieve high-performance, interactive tracking and segmentation in videos with minimal human intervention, addressing SAM's limitations in consistent video segmentation
MAGVIT Single model for multiple video synthesis tasks, reported to outperform existing methods in quality and inference time, code and models, paper
FastSAM Fast Segment Anything, a CNN-based model achieving performance comparable to the SAM method at 50× higher run-time speed.
SAM-PT Extending SAM to zero-shot video segmentation with point-based tracking, paper
DEVA Tracking Anything with Decoupled Video Segmentation, paper
Cutie Putting the Object Back into Video Object Segmentation, paper
YOLOv10 Real-Time End-to-End Object Detection
SAM2 enables fast, precise selection of any object in any video or image

Super Resolution (Spatial Interpolation)

NeRF

Instant-ngp Train NeRFs in under 5 seconds on Windows/Linux with support for GPUs
NeRFstudio A collaboration-friendly studio for NeRFs that simplifies creating, training, and testing NeRFs, with a web-based visualizer, benchmarks, and pipeline support.
Threestudio A Framework for 3D Content Creation from Text Prompts, Single Images, and Few-Shot Images, including single images created by text-to-image models
Zero-1-to-3 Zero-shot One Image to 3D Object for novel view synthesis and 3D reconstruction
localrf NeRFs for reconstructing large-scale stabilized scenes from shaky videos, paper, project page
gaussian-splatting reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering", paper
4d-gaussian-splatting Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting, paper

Deepfakes

roop one-click deepfake (face swap)
- rope GUI-focused roop
streamv2v Official PyTorch implementation of StreamV2V
MusePose Pose-driven image-to-video framework for generating virtual humans
V-Express generates a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images
Deep-Live-Cam real-time face swap and one-click video deepfake with only a single image

Benchmarking

MSU Benchmarks collection of video processing benchmarks developed by the Video Processing Group at Moscow State University
Video Super Resolution Benchmarks
Video Generation Benchmarks
Video Frame Interpolation Benchmarks

Inpainting Outpainting

ProPainter Improving Propagation and Transformer for Video Inpainting, paper
