Image Generation
MagicQuill: An Intelligent Interactive Image Editing System
·4923 words·24 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 HKUST
MagicQuill: an intelligent interactive image editing system that enables intuitive, precise edits via brushstrokes, with a multimodal LLM predicting editing intent in real time.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
·3438 words·17 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Waterloo
OmniEdit, a novel instruction-based image editing model, surpasses existing methods by leveraging specialist supervision and high-quality data, achieving superior performance across diverse editing tasks.
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
·3087 words·15 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 NVIDIA Research
Edify Image: high-quality photorealistic image generation using cascaded pixel-space diffusion models with a novel Laplacian diffusion process, enabling diverse downstream applications.
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
·3359 words·16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 NVIDIA Research
Add-it: training-free object insertion in images using pretrained diffusion models, cleverly balancing information from the scene, the text prompt, and the generated image to achieve state-of-the-art results.
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
·2454 words·12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
StdGEN: Generate high-quality, semantically decomposed 3D characters from a single image in minutes, enabling flexible customization for various applications.
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
·4041 words·19 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 MIT
SVDQuant boosts 4-bit diffusion models by absorbing outliers via low-rank components, achieving 3.5x memory reduction and 3x speedup on 12B parameter models.
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
·3777 words·18 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Toronto
SG-I2V: Zero-shot controllable image-to-video generation using a self-guided approach that leverages pre-trained models for precise object and camera motion control.
Training-free Regional Prompting for Diffusion Transformers
·1817 words·9 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
Training-free Regional Prompting for FLUX boosts compositional text-to-image generation by cleverly manipulating attention mechanisms, achieving fine-grained control without retraining.
Randomized Autoregressive Visual Generation
·4145 words·20 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
Randomized Autoregressive Modeling (RAR) sets a new state of the art in image generation by cleverly introducing randomness during training to improve the model's ability to learn from bidirectional contexts.
Constant Acceleration Flow
·3289 words·16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Korea University
Constant Acceleration Flow (CAF) dramatically speeds up diffusion model generation by modeling trajectories with a constant acceleration equation, outperforming state-of-the-art methods in accuracy and few-step generation quality.
In-Context LoRA for Diffusion Transformers
·392 words·2 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab
In-Context LoRA empowers existing text-to-image models for high-fidelity multi-image generation by simply concatenating images and using minimal task-specific LoRA tuning.
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
·2152 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
HelloMeme enhances text-to-image models by integrating spatial knitting attentions, enabling high-fidelity meme video generation while preserving model generalization.