Computer Vision

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
·3702 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
WISE: Evaluates world knowledge in text-to-image generation.
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
·3772 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance
Seedream 2.0: A native Chinese-English bilingual image generation model that understands cultural nuances and excels in text rendering.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
·2040 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Inc.
RayFlow: Accelerating diffusion with instance-aware adaptive flow, boosting speed & quality!
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
·4256 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Samsung Research
PLADIS: Sparsity boosts attention for diffusion models, enhancing text-to-image generation at inference time!
PE3R: Perception-Efficient 3D Reconstruction
·2061 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 National University of Singapore
PE3R: Achieves fast and accurate 3D scene reconstruction from 2D images through enhanced perception and efficiency.
Effective and Efficient Masked Image Generation Models
·4167 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Renmin University of China
eMIGM: A unified, efficient masked image generation model achieving state-of-the-art performance with fewer resources.
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer
·2653 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tiamat AI
EasyControl: Efficient & flexible control for Diffusion Transformers, enabling sophisticated image generation.
DreamRelation: Relation-Centric Video Customization
·2731 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Fudan University
DreamRelation: Personalize videos by customizing relationships between subjects, generalizing to new domains.
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
·2686 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 CUHK
Seg-Zero: Cognitive Reinforcement for Reasoning-Chain Guided Segmentation!
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
·4283 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
TDM: a new diffusion distillation paradigm unifying trajectory distillation and distribution matching, surpassing teachers in a data-free manner with state-of-the-art performance and low training cost…
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
·4887 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 IEIT System Co., Ltd.
DropletVideo: A dataset and approach to explore integral spatio-temporal consistent video generation.
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
·3223 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Chinese University of Hong Kong
VideoPainter: Edit any video, any length, with user-guided instructions!
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
·2590 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Chinese University of Hong Kong
TrajectoryCrafter: Precisely control camera movement in monocular videos with a novel diffusion model for coherent 4D content generation.
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice
·1539 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hedra Inc.
MagicInfinite: Infinite talking videos from words and voice!
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
·4656 words·22 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Yonsei University
AnyAnomaly: LVLM for customizable zero-shot video anomaly detection, adapting to diverse environments without retraining.
ProReflow: Progressive Reflow with Decomposed Velocity
·1902 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
ProReflow: Improves diffusion model efficiency via progressive training and direction-focused velocity alignment.
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification
·2593 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
RectifiedHR: Enables training-free high-resolution image generation via energy rectification, boosting both efficiency and effectiveness.
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content
·3985 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Jiao Tong University
Q-Eval-100K: A new, large dataset for evaluating visual quality and text alignment in AI-generated content.
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
·2689 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKUST(GZ)
Kiss3DGen generates 3D assets by repurposing 2D diffusion models, enabling efficient 3D editing and enhancement.
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
·2905 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NVIDIA Research
Likelihood-based generative models get a GAN-like boost via Direct Discriminative Optimization, without the complexity of joint training.