Computer Vision

Structured 3D Latents for Scalable and Versatile 3D Generation
·4249 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
Unified 3D latent representation (SLAT) enables versatile high-quality 3D asset generation, significantly outperforming existing methods.
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
·4589 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Mohamed Bin Zayed University of Artificial Intelligence
PhysGame benchmark unveils video LLMs’ weaknesses in understanding physical commonsense from gameplay videos, prompting the creation of PhysVLM, a knowledge-enhanced model that outperforms existing mo…
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
·2297 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China
One-shot image to realistic, animatable talking avatar! Novel pipeline uses diffusion models and a hybrid 3DGS-mesh representation, achieving seamless generalization and precise control.
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
·2333 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 SketchX, CVSSP, University of Surrey
NitroFusion achieves high-fidelity single-step image generation using a dynamic adversarial training approach with a specialized discriminator pool, dramatically improving speed and quality.
Negative Token Merging: Image-based Adversarial Feature Guidance
·2311 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Washington
NegToMe: Image-based adversarial guidance improves image generation diversity and reduces similarity to copyrighted content without training, simply by using images instead of negative text prompts.
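The summary only names the mechanism, so here is a minimal sketch of what image-based adversarial feature guidance could look like, assuming transformer tokens and a simple push-apart extrapolation; the matching rule, the strength, and where it is applied inside the diffusion model are assumptions rather than NegToMe's exact recipe.

```python
import torch
import torch.nn.functional as F

def negative_token_merging(src_tokens, ref_tokens, strength=0.1):
    """Push each source token slightly *away* from its most similar token in a
    reference image (the opposite of ordinary token merging).
    src_tokens: (N, D) tokens of the image being generated
    ref_tokens: (M, D) tokens of the reference image (e.g. a copyrighted asset)."""
    sim = F.normalize(src_tokens, dim=-1) @ F.normalize(ref_tokens, dim=-1).T  # (N, M) cosine similarities
    matched = ref_tokens[sim.argmax(dim=-1)]                 # best-matching reference token per source token
    return src_tokens + strength * (src_tokens - matched)    # extrapolate away from the match

# toy usage on random features
src = torch.randn(77, 64)
ref = torch.randn(77, 64)
print(negative_token_merging(src, ref).shape)  # torch.Size([77, 64])
```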
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
·1734 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 01.AI
Presto, a novel video diffusion model, generates 15-second, high-quality videos with unparalleled long-range coherence and rich content via a segmented cross-attention mechanism and the L…
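The blurb points at segmented cross-attention without detail; a minimal sketch of the idea, assuming the video tokens are split along time and each segment attends only to its own sub-caption (segment count, weight sharing, and how outputs are fused are assumptions):

```python
import torch
import torch.nn as nn

class SegmentedCrossAttention(nn.Module):
    """Split video tokens into temporal segments; each segment cross-attends to the
    text embedding of its own sub-caption, so different parts of a long video can
    follow different descriptions."""
    def __init__(self, dim, heads=8, segments=4):
        super().__init__()
        self.segments = segments
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video_tokens, text_tokens_per_segment):
        # video_tokens: (B, T, D), T divisible by `segments`
        # text_tokens_per_segment: list of `segments` tensors, each (B, L_i, D)
        chunks = video_tokens.chunk(self.segments, dim=1)
        out = [self.attn(q, kv, kv)[0] for q, kv in zip(chunks, text_tokens_per_segment)]
        return torch.cat(out, dim=1)

# toy usage
layer = SegmentedCrossAttention(dim=64, heads=4, segments=4)
video = torch.randn(2, 16, 64)
texts = [torch.randn(2, 8, 64) for _ in range(4)]
print(layer(video, texts).shape)  # torch.Size([2, 16, 64])
```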
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
·3029 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Waterloo
VISTA synthesizes long-duration, high-resolution video instruction data, creating VISTA-400K and HRVideoBench to significantly boost video LMM performance.
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
·5447 words·26 mins
AI Generated 🤗 Daily Papers Computer Vision Action Recognition 🏢 Yonsei University
DisCoRD: Rectified flow decodes discrete motion tokens into continuous, natural movement, balancing faithfulness and realism.
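A rough sketch of the decoding idea the summary hints at: integrate a token-conditioned rectified-flow velocity field from noise to a continuous motion frame. The network, the motion dimensionality, and the Euler step count below are placeholders, not DisCoRD's actual architecture.

```python
import torch
import torch.nn as nn

class TokenConditionedVelocity(nn.Module):
    """Stand-in decoder: predicts a rectified-flow velocity conditioned on
    embeddings of the discrete motion tokens."""
    def __init__(self, motion_dim=66, token_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim + token_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, motion_dim),
        )

    def forward(self, x, cond, t):
        t = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, cond, t], dim=-1))

def decode_with_rectified_flow(velocity, token_emb, steps=10):
    """Euler-integrate the flow from Gaussian noise to continuous motion."""
    x = torch.randn(token_emb.shape[0], 66)       # start from noise
    for i in range(steps):
        t = torch.full((1, 1), i / steps)
        x = x + velocity(x, token_emb, t) / steps
    return x

vel = TokenConditionedVelocity()
tokens = torch.randn(4, 32)                        # embeddings of 4 discrete motion tokens
print(decode_with_rectified_flow(vel, tokens).shape)  # torch.Size([4, 66])
```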
AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos
·2678 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tsinghua University
AlphaTablets: A novel 3D plane representation enabling accurate, consistent, and flexible 3D planar reconstruction from monocular videos, achieving state-of-the-art results.
Video Depth without Video Models
·3150 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Carnegie Mellon University
RollingDepth: Achieving state-of-the-art video depth estimation without using complex video models, by cleverly extending a single-image depth estimator.
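A hedged NumPy sketch of the rolling idea, assuming a snippet-level depth estimator (here the hypothetical `snippet_depth_fn`) and scale-shift least-squares alignment on overlapping frames; RollingDepth's actual snippet dilation and global optimization are not reproduced.

```python
import numpy as np

def align_scale_shift(pred, ref):
    """Least-squares scale s and shift b so that s*pred + b ≈ ref on the overlap."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    s, b = np.linalg.lstsq(A, ref.ravel(), rcond=None)[0]
    return s, b

def rolling_video_depth(frames, snippet_depth_fn, window=3, stride=1):
    """Run a single-image/snippet depth estimator over overlapping windows, then
    stitch the affine-invariant depth maps into one consistent sequence."""
    depths = [None] * len(frames)
    for start in range(0, len(frames) - window + 1, stride):
        snip = snippet_depth_fn(frames[start:start + window])     # (window, H, W)
        overlap = [i for i in range(start, start + window) if depths[i] is not None]
        if overlap:  # align this snippet to what is already stitched
            pred = np.stack([snip[i - start] for i in overlap])
            ref = np.stack([depths[i] for i in overlap])
            s, b = align_scale_shift(pred, ref)
            snip = s * snip + b
        for i in range(window):
            if depths[start + i] is None:
                depths[start + i] = snip[i]
    return np.stack(depths)

# toy usage with a fake depth estimator (pretend depth = gray level)
fake = lambda f: np.stack([fr.mean(-1) for fr in f])
frames = [np.random.rand(4, 4, 3) for _ in range(6)]
print(rolling_video_depth(frames, fake, window=3).shape)  # (6, 4, 4)
```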
Trajectory Attention for Fine-grained Video Motion Control
·4421 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
Trajectory Attention enhances video motion control by injecting trajectory information, improving precision and long-range consistency in video generation.
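A minimal sketch of what attending along trajectories could look like: gather the feature under each tracked point in every frame and run attention over that sequence, so information flows along motion paths rather than fixed grid positions. How the result is fused back into the video branch is an assumption left out here.

```python
import torch
import torch.nn as nn

class TrajectoryAttention(nn.Module):
    """Attention along point trajectories across frames."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats, trajs):
        # feats: (T, H, W, D) per-frame feature maps
        # trajs: (N, T, 2) integer (y, x) positions of N tracked points over T frames
        T, H, W, D = feats.shape
        flat = feats.reshape(T, H * W, D)
        idx = trajs[..., 0] * W + trajs[..., 1]           # (N, T) flattened positions
        gathered = flat[torch.arange(T), idx, :]          # (N, T, D): one feature per point per frame
        out, _ = self.attn(gathered, gathered, gathered)  # attend along each trajectory
        return out

# toy usage
feats = torch.randn(8, 16, 16, 64)
trajs = torch.randint(0, 16, (10, 8, 2))
print(TrajectoryAttention(64)(feats, trajs).shape)  # torch.Size([10, 8, 64])
```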
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
·271 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Alibaba Group
TeaCache: a training-free method boosts video diffusion model speed by up to 4.41x with minimal quality loss by cleverly caching intermediate outputs.
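A hedged sketch of timestep-embedding-driven caching: skip the expensive denoiser forward pass whenever the timestep embedding has barely moved since the last computed step. The raw accumulated-distance threshold below is an assumption; the paper's calibrated criterion and exact cached quantity may differ.

```python
import torch

class TeaCacheLikeWrapper:
    """Reuse the cached denoiser output when the timestep embedding suggests the
    output would change very little; otherwise run the full forward pass."""
    def __init__(self, model, embed_fn, threshold=0.05):
        self.model, self.embed_fn, self.threshold = model, embed_fn, threshold
        self.prev_emb, self.cached_out, self.accum = None, None, 0.0

    def __call__(self, x, t):
        emb = self.embed_fn(t)
        if self.prev_emb is not None and self.cached_out is not None:
            rel_l1 = (emb - self.prev_emb).abs().mean() / (self.prev_emb.abs().mean() + 1e-8)
            self.accum += rel_l1.item()
            if self.accum < self.threshold:      # outputs unlikely to have changed much
                self.prev_emb = emb
                return self.cached_out           # skip the expensive forward pass
        self.accum = 0.0
        self.prev_emb = emb
        self.cached_out = self.model(x, t)       # full (slow) forward pass
        return self.cached_out

# toy usage with stand-in model and embedding
model = lambda x, t: x * 0.9
embed = lambda t: torch.full((128,), float(t) / 1000.0)
wrapper = TeaCacheLikeWrapper(model, embed)
x = torch.randn(1, 4, 8, 8)
for t in range(1000, 990, -1):
    x = wrapper(x, t)
print(x.shape)
```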
Open-Sora Plan: Open-Source Large Video Generation Model
·4618 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
Open-Sora Plan introduces an open-source large video generation model capable of producing long, high-resolution videos from a variety of user inputs.
Efficient Track Anything
·2319 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Meta AI
EfficientTAMs achieve comparable video object segmentation accuracy to SAM 2 with ~2x speedup using lightweight ViTs and efficient cross-attention.
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
·6566 words·31 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Machine Learning Group, CITEC, Bielefeld University
TryOffDiff reconstructs realistic garment images from single photos of clothed people, tackling virtual try-off to address the limitations of virtual try-on.
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
·4076 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tsinghua University
TAPTRv3 achieves state-of-the-art long-video point tracking by cleverly using spatial and temporal context to enhance feature querying, surpassing previous methods and demonstrating strong performance…
ROICtrl: Boosting Instance Control for Visual Generation
·3855 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Show Lab, National University of Singapore
ROICtrl boosts instance control in visual generation by combining ROI-Align with a new ROI-Unpool operation for regional instance control, achieving precise control over each instance with high efficiency.
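A small sketch of the crop-process-paste pattern the summary describes: `roi_align` is torchvision's real operator, while `roi_unpool` below is a hypothetical stand-in that simply resizes and pastes features back into the box, not ROICtrl's actual ROI-Unpool implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def roi_unpool(feat, rois, boxes):
    """Hypothetical inverse of ROI-Align: resize each processed ROI feature back to
    its box size and paste it into the full feature map."""
    out = feat.clone()
    for roi, (b, x1, y1, x2, y2) in zip(rois, boxes.tolist()):
        b, x1, y1, x2, y2 = int(b), int(x1), int(y1), int(x2), int(y2)
        h, w = max(y2 - y1, 1), max(x2 - x1, 1)
        out[b, :, y1:y1 + h, x1:x1 + w] = F.interpolate(
            roi[None], size=(h, w), mode="bilinear", align_corners=False)[0]
    return out

# toy usage: crop instance regions, process them, paste them back
feat = torch.randn(1, 16, 32, 32)                     # (B, C, H, W) feature map
boxes = torch.tensor([[0, 4.0, 4.0, 20.0, 20.0],      # (batch_idx, x1, y1, x2, y2)
                      [0, 10.0, 12.0, 30.0, 28.0]])
rois = roi_align(feat, boxes, output_size=(8, 8))     # fixed-size per-instance features
rois = rois * 1.1                                     # stand-in for per-instance conditioning
print(roi_unpool(feat, rois, boxes).shape)            # torch.Size([1, 16, 32, 32])
```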
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
·4458 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG
Make-It-Animatable: Instantly create animation-ready 3D characters, regardless of pose or shape, using a novel data-driven framework.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
·5402 words·26 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Cambridge
FAM Diffusion: Generate high-res images seamlessly from pre-trained diffusion models, solving structural and texture inconsistencies without retraining!
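A hedged sketch of the frequency-modulation half of the idea: keep low frequencies (global structure) from an upsampled native-resolution result and high frequencies (texture) from the high-resolution denoising state. The FFT cutoff and hard circular mask are assumptions, and the attention-modulation part is not shown.

```python
import torch
import torch.nn.functional as F

def frequency_modulate(highres_latent, lowres_latent, cutoff=0.25):
    """Blend low frequencies of the upsampled low-res latent with high frequencies
    of the high-res latent in Fourier space."""
    guide = F.interpolate(lowres_latent, size=highres_latent.shape[-2:],
                          mode="bilinear", align_corners=False)
    hf = torch.fft.fftshift(torch.fft.fft2(highres_latent), dim=(-2, -1))
    lf = torch.fft.fftshift(torch.fft.fft2(guide), dim=(-2, -1))
    H, W = highres_latent.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    mask = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).float()   # 1 inside the low-frequency disk
    blended = lf * mask + hf * (1 - mask)
    return torch.fft.ifft2(torch.fft.ifftshift(blended, dim=(-2, -1))).real

# toy usage
low = torch.randn(1, 4, 64, 64)     # latent denoised at native resolution
high = torch.randn(1, 4, 128, 128)  # latent being denoised at high resolution
print(frequency_modulate(high, low).shape)  # torch.Size([1, 4, 128, 128])
```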
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
·3896 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Google DeepMind
CAT4D: Create realistic 4D scenes from single-view videos using a novel multi-view video diffusion model.