Image Generation

ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
·2259 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST
ORIGEN: The first zero-shot method for 3D orientation grounding in text-to-image generation.
Optimal Stepsize for Diffusion Sampling
·3204 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Chinese Academy of Sciences
Optimal Stepsize Distillation accelerates diffusion sampling by distilling knowledge from reference trajectories, achieving 10x speedup with minimal performance loss.
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
·3416 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai AI Laboratory
Lumina-Image 2.0: A unified & efficient image generative framework, outperforming previous models with only 2.6B parameters.
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
·2412 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Waterloo
LOCATEdit refines cross-attention maps with graph Laplacian regularization, achieving precise & localized text-guided image editing without artifacts.
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
·2431 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai AI Laboratory
LeX-Art: High-quality text-to-image generation via scalable data synthesis.
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model
·1950 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group
ChatAnyone: Stylized real-time portrait video generation with hierarchical motion diffusion model.
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
·393 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST
Fixing fine-tuned diffusion models: with richer unconditional priors, they generate better images and videos.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
·10790 words·51 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
BizGen: Article-level visual text rendering for infographics generation.
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models
·2885 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Central South University
LongTextAR advances long-text image generation via a novel tokenizer, enabling accurate, controllable, and high-fidelity text rendering in images.
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
·2020 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 KAIST
Inference-time scaling for flow models enhances alignment with user preferences via stochastic generation and budget allocation.
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
·3412 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ARC Lab, Tencent PCG
Visually perfect generations aren’t always optimal! GenHancer finds that subtly imperfect generations can greatly improve vision-centric tasks.
Training-free Diffusion Acceleration with Bottleneck Sampling
·3305 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
Bottleneck Sampling: Accelerate diffusion models without retraining by cleverly using low-resolution priors for efficient inference!
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
·1777 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yonsei University
LSRNA: Super-resolution in latent space enhances image generation with diffusion models, achieving faster speeds and improved detail.
Equivariant Image Modeling
·3413 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
Aligning image generation subtasks: Equivariant modeling boosts efficiency and generalization by leveraging natural visual signal invariance.
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
·3661 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Beihang University
Diffusion-4K: Synthesizing ultra-high-resolution images with a new benchmark dataset and wavelet-based fine-tuning that makes 4K image creation more detailed and accessible!
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
·3380 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 S-Lab, Nanyang Technological University
CFG-Zero*: A better Classifier-Free Guidance to improve the image quality and text alignment in Flow Matching models.
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
·3176 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Pattern Recognition Center, WeChat AI, Tencent
RDTF: Efficient animated sticker generation via dual-mask training, outperforming parameter-efficient tuning under constrained resources.
When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO
·1831 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University
Aligning diffusion models with Minority-Aware Adaptive DPO when preferences diverge.
Ultra-Resolution Adaptation with Ease
·2457 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore
URA: Ultra-resolution adaptation made easy! Uses synthetic data & minor weight tuning for efficient, high-res text-to-image diffusion models.
Tokenize Image as a Set
·3037 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
TokenSet: Tokenizing images as unordered sets for dynamic capacity allocation and robust generation, breaking from fixed-position latent codes.