Image Generation
Hidden in the Noise: Two-Stage Robust Watermarking for Images
·3984 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 New York University
WIND: A novel, distortion-free image watermarking method leveraging diffusion models’ initial noise for robust AI-generated content authentication.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
·3014 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
AnyDressing: Customizable multi-garment virtual dressing via a novel latent diffusion model!
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
·3265 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
NVComposer: A novel generative NVS model boosts synthesis quality by implicitly inferring spatial relationships from multiple sparse, unposed images, eliminating reliance on external alignment.
MV-Adapter: Multi-view Consistent Image Generation Made Easy
·3888 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 School of Software, Beihang University
MV-Adapter easily transforms existing image generators into multi-view consistent image generators, improving efficiency and adaptability.
Imagine360: Immersive 360 Video Generation from Perspective Anchor
·2648 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Chinese University of Hong Kong
Imagine360: Generating immersive 360° videos from perspective videos, improving quality and accessibility of 360° content creation.
CleanDIFT: Diffusion Features without Noise
·3337 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 CompVis @ LMU Munich, MCML
CleanDIFT revolutionizes diffusion feature extraction by leveraging clean images and a lightweight fine-tuning method, significantly boosting performance across various tasks without noise or timestep…
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
·4159 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 VinAI Research
SNOOPI supercharges one-step diffusion model distillation with enhanced guidance, achieving state-of-the-art performance by stabilizing training and enabling negative prompt control.
Scaling Image Tokenizers with Grouped Spherical Quantization
·7140 words·34 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Jülich Supercomputing Centre
GSQ-GAN, a novel image tokenizer, achieves superior reconstruction quality with 16x downsampling using grouped spherical quantization, enabling efficient scaling for high-fidelity image generation.
TinyFusion: Diffusion Transformers Learned Shallow
·4225 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 National University of Singapore
TinyFusion, a novel learnable depth pruning method, crafts efficient shallow diffusion transformers with superior post-fine-tuning performance, achieving a 2x speedup with less than 7% of the original…
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
·3884 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Yandex Research
SWITTI: a novel scale-wise transformer achieves 7x faster text-to-image generation than state-of-the-art diffusion models, while maintaining competitive image quality.
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
·2333 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 SketchX, CVSSP, University of Surrey
NitroFusion achieves high-fidelity single-step image generation using a dynamic adversarial training approach with a specialized discriminator pool, dramatically improving speed and quality.
Negative Token Merging: Image-based Adversarial Feature Guidance
·2311 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Washington
NegToMe: Image-based adversarial guidance improves image generation diversity and reduces similarity to copyrighted content without training, simply by using images instead of negative text prompts.
Open-Sora Plan: Open-Source Large Video Generation Model
·4618 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Open-Sora Plan introduces an open-source large video generation model capable of producing high-resolution videos with long durations, based on various user inputs.
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
·6566 words·31 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Machine Learning Group, CITEC, Bielefeld University
TryOffDiff reconstructs realistic garment images from single photos of clothed people, tackling the Virtual Try-Off task that complements conventional virtual try-on.
ROICtrl: Boosting Instance Control for Visual Generation
·3855 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Show Lab, National University of Singapore
ROICtrl boosts instance control in visual generation via ROI-Align and a new ROI-Unpool operation, delivering precise regional control with high efficiency.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
·5402 words·26 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Cambridge
FAM Diffusion: Generate high-res images seamlessly from pre-trained diffusion models, solving structural and texture inconsistencies without retraining!
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
·3637 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
Omegance: One parameter precisely controls image detail in diffusion models, enabling flexible granularity adjustments without model changes or retraining.
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
·2775 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
ConsisID achieves high-quality, identity-preserving text-to-video generation using a tuning-free diffusion transformer model that leverages frequency decomposition for effective identity control.
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
·2489 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Dalian University of Technology
DreamMix enhances image inpainting by disentangling object attributes for precise editing, enabling both identity preservation and flexible text-driven modifications.
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
·3048 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Samsung R&D Institute UK
DreamCache enables efficient, high-quality personalized image generation without finetuning by caching reference image features and using lightweight conditioning adapters.