Image Generation
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
2754 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Science and Technology of China
RelaCtrl: Relevance-guided control boosts diffusion transformer efficiency, cutting parameters by intelligently allocating resources.
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
2525 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Diffusion-Sharpening enhances diffusion model fine-tuning by optimizing sampling trajectories, achieving faster convergence and high inference efficiency without extra NFEs, leading to improved alignm…
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
3389 words · 16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Fudan University
VidCRAFT3 enables high-quality image-to-video generation with precise control over camera movement, object motion, and lighting, pushing the boundaries of visual content creation.
MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
2884 words · 14 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
MRS, a novel training-free sampler for Mean Reverting Diffusion, speeds up controllable image generation by 10-20x across various tasks.
Magic 1-For-1: Generating One Minute Video Clips within One Minute
1947 words · 10 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Magic141 generates one-minute video clips in under a minute by cleverly factorizing the generation task and employing optimization techniques.
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
2569 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
CustomVideoX: Zero-shot personalized video generation, exceeding existing methods in quality & consistency via 3D reference attention and dynamic adaptation.
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
1752 words · 9 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab, Alibaba Group
Animate Anyone 2 creates high-fidelity character animations by incorporating environmental context, resulting in seamless character-environment integration and more realistic object interactions.
Dual Caption Preference Optimization for Diffusion Models
4961 words · 24 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Arizona State University
Dual Caption Preference Optimization (DCPO) significantly boosts diffusion model image quality by using paired captions to resolve data distribution conflicts and irrelevant prompt issues.
Goku: Flow Based Video Generative Foundation Models
3430 words · 17 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
Goku, a novel family of joint image-and-video generation models, uses rectified flow Transformers to achieve industry-leading performance, supported by a robust data pipeline and training infrastructure.
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
3325 words · 16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Ulsan National Institute of Science and Technology
On-device Sora makes high-quality, diffusion-based text-to-video generation possible on smartphones, overcoming computational and memory limitations through novel techniques.
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
2129 words · 10 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
OmniHuman-1: Scaling up one-stage conditioned human animation through novel mixed-condition training.
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
2423 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Show Lab, National University of Singapore
LayerTracer innovatively synthesizes cognitive-aligned layered SVGs via diffusion transformers, bridging the gap between AI and professional design standards by learning from a novel dataset of sequen…
Inverse Bridge Matching Distillation
4522 words · 22 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Skolkovo Institute of Science and Technology
Boosting Diffusion Bridge Models: a new distillation technique accelerates inference by 4x to 100x, sometimes even improving image quality.
Improved Training Technique for Latent Consistency Models
3409 words · 17 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Rutgers University
Researchers significantly enhance latent consistency models’ performance by introducing Cauchy loss, mitigating outlier effects, and employing novel training strategies, thus bridging the gap with dif…
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
3227 words · 16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
DiffSplat repurposes 2D image diffusion models to natively generate high-quality 3D Gaussian splats, overcoming limitations in existing 3D generation methods.
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
2900 words · 14 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Researchers significantly enhanced autoregressive image generation by integrating chain-of-thought reasoning strategies, achieving a 24% improvement on the GenEval benchmark.
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
4649 words · 22 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Google DeepMind
TokenVerse: Extract & combine visual concepts from multiple images for creative image generation!
GPS as a Control Signal for Image Generation
3156 words · 15 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Michigan
GPS-guided image generation is here! This paper leverages GPS data to create highly realistic images reflecting specific locations, even reconstructing 3D models from 2D photos.
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
2205 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Alibaba Group
EMO2 achieves realistic audio-driven avatar video generation by employing a two-stage framework: first generating hand poses directly from audio and then using a diffusion model to synthesize full-bod…
X-Dyna: Expressive Dynamic Human Image Animation
3011 words · 15 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Southern California
X-Dyna: a novel diffusion-based pipeline generates realistic human image animation using a zero-shot approach by integrating a Dynamics-Adapter for dynamic detail preservation, exceeding state-of-the-…