Image Generation

X-Dyna: Expressive Dynamic Human Image Animation

17 January 2025·3011 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Southern California

X-Dyna: a novel diffusion-based pipeline generates realistic human image animation using a zero-shot approach by integrating a Dynamics-Adapter for dynamic detail preservation, exceeding state-of-the-…

Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions

17 January 2025·2057 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group

Textoon: Generating vivid 2D cartoon characters from text descriptions in under a minute, revolutionizing animation workflow.

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

16 January 2025·2347 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yale University

SynthLight: A novel diffusion model relights portraits realistically by learning to re-render synthetic faces, generalizing remarkably well to real photographs.

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

16 January 2025·4248 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta

Scaling visual tokenizers dramatically improves image and video generation, achieving state-of-the-art results and outperforming existing methods with less computation by focusing on decoder scaling…

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

16 January 2025·5585 words·27 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NYU

Boosting diffusion model performance at inference time, this research introduces a novel framework that goes beyond simply increasing denoising steps. By cleverly searching for better noise candidates…

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

16 January 2025·2125 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Tongyi Lab

AnyStory: A unified framework enables high-fidelity personalized image generation for single and multiple subjects, addressing subject fidelity challenges in existing methods.

The GAN is dead; long live the GAN! A Modern GAN Baseline

9 January 2025·2531 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Brown University

R3GAN: A modernized GAN baseline achieves state-of-the-art results with a simple, stable loss function and modern architecture, debunking the myth that GANs are hard to train.

On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis

8 January 2025·285 words·2 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University

This paper unveils critical thresholds for efficient visual autoregressive model computation, proving sub-quartic time is impossible beyond a certain input matrix norm while establishing efficient app…

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

6 January 2025·3304 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta

Through-The-Mask uses mask-based motion trajectories to generate realistic videos from images and text, overcoming limitations of existing methods in handling complex multi-object motion.

MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control

4 January 2025·3209 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Center for Machine Vision and Signal Analysis, Faculty of Information Technology and Electrical Engineering, University of Oulu

MagicFace achieves high-fidelity facial expression editing via AU control, preserving identity and background using a diffusion model and ID encoder, significantly outperforming existing methods.

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

2 January 2025·3436 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Huazhong University of Science and Technology

LightningDiT resolves the optimization dilemma in latent diffusion models by aligning latent space with pre-trained vision models, achieving state-of-the-art ImageNet 256x256 generation with over 21x …

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

30 December 2024·8988 words·43 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University

VisionReward, a novel reward model, surpasses existing methods by precisely capturing multi-dimensional human preferences for image and video generation, enabling more accurate and stable model optimi…

Edicho: Consistent Image Editing in the Wild

30 December 2024·2565 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

Edicho: a novel training-free method for consistent image editing across diverse images, achieving precise consistency by leveraging explicit correspondence.

Bringing Objects to Life: 4D generation from 3D objects

29 December 2024·2761 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Bar-Ilan University

3to4D: Animate any 3D object with text prompts, preserving visual quality and achieving realistic motion!

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

27 December 2024·4442 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

VideoMaker achieves high-fidelity zero-shot customized video generation by cleverly harnessing the inherent power of video diffusion models, eliminating the need for extra feature extraction and injec…

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

22 December 2024·3841 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University

Distilled Decoding (DD) drastically speeds up image generation from autoregressive models by using flow matching to enable one-step sampling, achieving significant speedups while maintaining acceptabl…

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

20 December 2024·4398 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore

CLEAR: Conv-Like Linearization boosts pre-trained Diffusion Transformers, achieving 6.3x faster 8K image generation with minimal quality loss.

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency

19 December 2024·3351 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ETH Zurich

UIP2P: Unsupervised instruction-based image editing achieves high-fidelity edits by enforcing Cycle Edit Consistency, eliminating the need for ground-truth data.

Parallelized Autoregressive Visual Generation

19 December 2024·4274 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University

Boosting autoregressive visual generation speed by 3.6-9.5x, this research introduces parallel processing while preserving model simplicity and generation quality.

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

19 December 2024·3907 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University

Affordance-Aware Object Insertion uses a novel Mask-Aware Dual Diffusion model and the SAM-FB dataset to place objects realistically in scenes, taking contextual relationships into account.