Image Generation
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
·3351 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ETH Zurich
UIP2P is an unsupervised instruction-based image editing method that achieves high-fidelity edits by enforcing Cycle Edit Consistency, eliminating the need for ground-truth data.
Parallelized Autoregressive Visual Generation
·4274 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
This research introduces parallel processing that speeds up autoregressive visual generation by 3.6-9.5x while preserving model simplicity and generation quality.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3907 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Harvard University
Affordance-Aware Object Insertion uses a novel Mask-Aware Dual Diffusion model and the SAM-FB dataset to place objects realistically in scenes, taking contextual relationships into account.
FashionComposer: Compositional Fashion Image Generation
·2265 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
FashionComposer revolutionizes fashion image creation through flexible composition of garments, faces, and poses.
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1458 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab
ChatDiT enables zero-shot, multi-turn image generation using pretrained diffusion transformers and a novel multi-agent framework.
ColorFlow: Retrieval-Augmented Image Sequence Colorization
·2655 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
ColorFlow, a new AI model, accurately colorizes black-and-white image sequences while preserving character identity.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
·2021 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of British Columbia
New attack fools breast ultrasound AI using subtle text prompts.
BrushEdit: All-In-One Image Inpainting and Editing
·3281 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
BrushEdit combines instruction-based editing and inpainting for interactive image editing.
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
·2785 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ETH Zurich
LoRACLR merges multiple LoRA models for high-fidelity multi-concept image generation, using a contrastive objective to ensure concept distinctiveness and prevent interference.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
·2401 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
FreeScale generates stunning 8K images and high-fidelity videos without retraining.
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
·2812 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Virginia Tech
Edit images precisely with AI, no masks needed!
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
·3185 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 CUHK MMLab
EasyRef uses multimodal LLMs to generate images from multiple references, overcoming limitations of prior methods by capturing consistent visual elements and offering improved zero-shot generalization…
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
·3252 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
DisPose disentangles pose guidance for controllable human image animation, generating diverse animations while preserving appearance consistency using only sparse skeleton pose input, eliminating the …
Arbitrary-steps Image Super-resolution via Diffusion Inversion
·3889 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
InvSR: a novel image super-resolution technique using diffusion inversion, enabling flexible sampling steps for efficient and high-fidelity results.
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
·3117 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
UniReal: a universal framework for image generation and editing, unifying diverse tasks via learning real-world dynamics from video data, achieving highly realistic and versatile results.
STIV: Scalable Text and Image Conditioned Video Generation
·5285 words·25 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Apple
STIV: A novel, scalable method for text and image-conditioned video generation, systematically improving model architectures, training, and data curation for superior performance.
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
·3186 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Stanford University
FiVA dataset and its adaptation framework enable unprecedented fine-grained control over visual attributes in text-to-image generation, empowering users to craft highly customized images.
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
·3317 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Institute of Automation, Chinese Academy of Sciences
FireFlow makes editing images faster and better.
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
·4676 words·22 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai Artificial Intelligence Laboratory
Introducing Evaluation Agent, a faster, more flexible human-like framework for evaluating visual generative AI.
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
·3918 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
DiffSensei is a new framework that generates customized manga with dynamic multi-character control using multi-modal LLMs and diffusion models, outperforming existing methods.