Image Generation

Tokenize Image as a Set

20 March 2025·3037 words·15 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Science and Technology of China

TokenSet: Tokenizing images as unordered sets for dynamic capacity allocation and robust generation, breaking from fixed-position latent codes.

Scale-wise Distillation of Diffusion Models

20 March 2025·3863 words·19 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Yandex Research

SWD: Scale-wise distillation of diffusion models achieves faster image generation by upscaling resolution during denoising, outperforming counterparts with similar computation.

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

20 March 2025·4169 words·20 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University

MagicMotion: A controllable video generation framework enabling precise object motion control through dense-to-sparse trajectory guidance.

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

20 March 2025·1572 words·8 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Intelligent Creation

InfU: A new framework for flexible photo re-creation while preserving identity using Diffusion Transformers(DiTs).

Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction

20 March 2025·2606 words·13 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore

Coarse-to-Fine Token Prediction improves autoregressive image generation by assigning the same coarse label for similar tokens, balancing generation quality and computational efficiency.

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

20 March 2025·4277 words·21 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Seed

Expert Race: A flexible routing strategy for scaling diffusion transformer with mixture of experts.

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

20 March 2025·3405 words·16 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

TokenBridge bridges continuous and discrete tokens for autoregressive visual generation, achieving high-quality synthesis with simple autoregressive modeling.

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

19 March 2025·2233 words·11 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Artificial Intelligence Laboratory

FakeVLM: A multimodal model & artifact-annotated dataset for detecting synthetic images with interpretable explanations, setting a new benchmark.

LEGION: Learning to Ground and Explain for Synthetic Image Detection

19 March 2025·3727 words·18 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Jiao Tong University

LEGION: Grounding and explaining synthetic image detection and refinement via multimodal learning.

Efficient Personalization of Quantized Diffusion Model without Backpropagation

19 March 2025·6238 words·30 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Seoul National University

Personalize diffusion models efficiently on devices without backpropagation.

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

18 March 2025·365 words·2 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University

DiffMoE: Dynamically selects tokens for scalable diffusion transformers, unlocking new efficiency levels in image generation.

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

18 March 2025·4257 words·20 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NVIDIA

Cosmos-Transfer1: An adaptable conditional world generation model using multimodal control.

Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation

17 March 2025·2806 words·14 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

Rewards Are Enough!

Edit Transfer: Learning Image Editing via Vision In-Context Relations

17 March 2025·3168 words·15 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Communication University of China

Edit Transfer: Learns image edits from a single example and applies it to new images, surpassing text/reference-based methods!

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

17 March 2025·3109 words·15 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Zhejiang University

DreamRenderer: Taming attribute control in large-scale text-to-image models with a plug-and-play, training-free approach for enhanced content creation.

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

17 March 2025·2181 words·11 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University

BlobCtrl: Precisely edit images at the element level with a unified, flexible framework, bridging the gap between generation and editing.

Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection

15 March 2025·3299 words·16 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 UCLA

Reflect-DiT: Scaling Text-to-Image Diffusion Transformers via In-Context Reflection!

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

13 March 2025·2532 words·12 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 CUHK MMLab

GoT: Reasoning guides vivid image generation and editing!

Distilling Diversity and Control in Diffusion Models

13 March 2025·4046 words·19 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Northeastern University

Distilling diffusion models?💡 This paper shows you how to retain base model diversity while keeping the distilled model’s speed!

CoSTA$st$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

13 March 2025·5298 words·25 mins· loading · loading

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Maryland, College Park

COSTA*: A cost-effective agent that smartly navigates AI tools to edit images with high quality and low cost, balancing user preferences!