Image Generation

Tokenize Image as a Set
·3037 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China
TokenSet: Tokenizing images as unordered sets for dynamic capacity allocation and robust generation, breaking from fixed-position latent codes.
Scale-wise Distillation of Diffusion Models
·3863 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yandex Research
SWD: Scale-wise distillation of diffusion models achieves faster image generation by upscaling resolution during denoising, outperforming counterparts with similar computation.
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
·4169 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University
MagicMotion: A controllable video generation framework enabling precise object motion control through dense-to-sparse trajectory guidance.
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
·1572 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Intelligent Creation
InfU: A new framework for flexible photo recrafting that preserves identity, built on Diffusion Transformers (DiTs).
Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction
·2606 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore
Coarse-to-Fine Token Prediction improves autoregressive image generation by assigning the same coarse label to similar tokens, balancing generation quality and computational efficiency.
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts
·4277 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Seed
Expert Race: A flexible routing strategy for scaling diffusion transformer with mixture of experts.
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
·3405 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong
TokenBridge bridges continuous and discrete tokens for autoregressive visual generation, achieving high-quality synthesis with simple autoregressive modeling.
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
·2233 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Artificial Intelligence Laboratory
FakeVLM: A multimodal model and an artifact-annotated dataset for detecting synthetic images with interpretable explanations, setting a new benchmark.
LEGION: Learning to Ground and Explain for Synthetic Image Detection
·3727 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Jiao Tong University
LEGION: Grounding and explaining synthetic image detection and refinement via multimodal learning.
Efficient Personalization of Quantized Diffusion Model without Backpropagation
·6238 words·30 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Seoul National University
Personalize diffusion models efficiently on devices without backpropagation.
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
·365 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
DiffMoE: Dynamically selects tokens for scalable diffusion transformers, unlocking new efficiency levels in image generation.
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
·4257 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NVIDIA
Cosmos-Transfer1: An adaptable conditional world generation model using multimodal control.
Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation
·2806 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
Rewards Are Enough!
Edit Transfer: Learning Image Editing via Vision In-Context Relations
·3168 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Communication University of China
Edit Transfer: Learns an image edit from a single example and applies it to new images, surpassing text/reference-based methods!
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
·3109 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Zhejiang University
DreamRenderer: Taming attribute control in large-scale text-to-image models with a plug-and-play, training-free approach for enhanced content creation.
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
·2181 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
BlobCtrl: Precisely edit images at the element level with a unified, flexible framework, bridging the gap between generation and editing.
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
·3299 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 UCLA
Reflect-DiT: Scaling Text-to-Image Diffusion Transformers via In-Context Reflection!
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
·2532 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 CUHK MMLab
GoT: Reasoning guides vivid image generation and editing!
Distilling Diversity and Control in Diffusion Models
·4046 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Northeastern University
Distilling diffusion models? 💡 This paper shows how to retain the base model’s diversity while keeping the distilled model’s speed!
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
·5298 words·25 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Maryland, College Park
CoSTA*: A cost-sensitive agent that navigates AI tools to edit images, balancing quality against cost according to user preferences!