
Computer Vision

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
·276 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏒 Fudan University
LiFT leverages human feedback, including reasoning, to effectively align text-to-video models with human preferences, significantly improving video quality.
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
·2050 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 Zhejiang University
ZipAR accelerates autoregressive image generation by up to 91% through parallel decoding leveraging spatial locality in images, making high-resolution image generation significantly faster.
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
·3379 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 VinAI Research
SwiftEdit achieves lightning-fast, high-quality text-guided image editing in just 0.23 seconds via a novel one-step diffusion process.
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
·5538 words·26 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 ByteDance
Infinity, a novel bitwise autoregressive model, sets new records in high-resolution image synthesis, outperforming top diffusion models in speed and quality.
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
·7357 words·35 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 Peking University
HumanEdit: A new human-rewarded dataset revolutionizes instruction-based image editing by providing high-quality, diverse image pairs with detailed instructions, enabling precise model evaluation and …
Hidden in the Noise: Two-Stage Robust Watermarking for Images
·3984 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 New York University
WIND: A novel, distortion-free image watermarking method leveraging diffusion models’ initial noise for robust AI-generated content authentication.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
·3014 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 ByteDance
AnyDressing enables customizable multi-garment virtual dressing via a novel latent diffusion model.
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
·3265 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 Tencent AI Lab
NVComposer: A novel generative NVS model boosts synthesis quality by implicitly inferring spatial relationships from multiple sparse, unposed images, eliminating reliance on external alignment.
MV-Adapter: Multi-view Consistent Image Generation Made Easy
·3888 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 School of Software, Beihang University
MV-Adapter easily transforms existing image generators into multi-view consistent image generators, improving efficiency and adaptability.
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
·3868 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏒 School of Artificial Intelligence, Shanghai Jiao Tong University
MRGen, a novel diffusion-based data engine, controllably synthesizes MRI data for unannotated modalities, boosting segmentation model performance.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
·3398 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏒 Ant Group
Mimir: A novel framework harmonizes LLMs and video diffusion models for precise text understanding in video generation, producing high-quality videos with superior text comprehension.
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
·2260 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏒 Tsinghua University
MIDI: a novel multi-instance diffusion model generates compositional 3D scenes from single images by simultaneously creating multiple 3D instances with accurate spatial relationships and high generali…
Imagine360: Immersive 360 Video Generation from Perspective Anchor
·2648 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 Chinese University of Hong Kong
Imagine360: Generating immersive 360° videos from perspective videos, improving quality and accessibility of 360° content creation.
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
·4118 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏒 Zhejiang University
ScoreLiDAR: Distilling diffusion models for 5x faster, higher-quality 3D LiDAR scene completion!
CleanDIFT: Diffusion Features without Noise
·3337 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 CompVis @ LMU Munich, MCML
CleanDIFT revolutionizes diffusion feature extraction by leveraging clean images and a lightweight fine-tuning method, significantly boosting performance across various tasks without noise or timestep…
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constraints for High-Fidelity Indoor Scene Reconstruction
·2645 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏒 Tsinghua University
2DGS-Room: Seed-guided 2D Gaussian splatting with geometric constraints achieves state-of-the-art high-fidelity indoor scene reconstruction.
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
·2511 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏒 Hong Kong University of Science and Technology
VideoGen-of-Thought (VGoT) creates high-quality, multi-shot videos by collaboratively generating scripts, keyframes, and video clips, ensuring narrative consistency and visual coherence.
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
·4159 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 VinAI Research
SNOOPI supercharges one-step diffusion model distillation with enhanced guidance, achieving state-of-the-art performance by stabilizing training and enabling negative prompt control.
Scaling Image Tokenizers with Grouped Spherical Quantization
·7140 words·34 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏒 Jülich Supercomputing Centre
GSQ-GAN, a novel image tokenizer, achieves superior reconstruction quality with 16x downsampling using grouped spherical quantization, enabling efficient scaling for high-fidelity image generation.
OmniCreator: Self-Supervised Unified Generation with Universal Editing
·5399 words·26 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏒 Hong Kong University of Science and Technology
OmniCreator: Self-supervised unified image+video generation & universal editing.