Video Understanding
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
·3977 words·19 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Shanghai Artificial Intelligence Laboratory
VBench-2.0: A new benchmark suite advancing video generation evaluation with intrinsic faithfulness metrics.
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
·3260 words·16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Huazhong University of Science and Technology
This survey explores the evolution of physics cognition in video generation, addressing the gap between visual realism and physical accuracy.
Synthetic Video Enhances Physical Fidelity in Video Synthesis
·4236 words·20 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 ByteDance Seed
Synthetic data can enhance the physical realism of video synthesis, paving the way for more believable generated content.
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
·4505 words·22 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Stanford University
Opt-CWM: Self-supervised motion learning via counterfactual optimization, achieving state-of-the-art without labels!
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
·2413 words·12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Beihang University
AccVideo accelerates video diffusion by 8.5x with a synthetic dataset and trajectory-based distillation, maintaining quality and enabling higher resolution video generation.
Video-T1: Test-Time Scaling for Video Generation
·3231 words·16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Tsinghua University
Video-T1 enhances video generation through test-time scaling, improving quality and consistency by viewing generation as a search for optimal video trajectories.
AMD-Hummingbird: Towards an Efficient Text-to-Video Model
·739 words·4 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Advanced Micro Devices, Inc.
Hummingbird: An efficient text-to-video model that balances quality and computational efficiency via pruning and visual feedback learning.
OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models
·2382 words·12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 OriginAI, Tel-Aviv, Israel
OmnimatteZero: Real-time omnimatte using pre-trained video diffusion, no training needed!
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
·4361 words·21 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 King Abdullah University of Science and Technology
4D-Bench: The first benchmark assessing MLLMs on 4D object understanding, revealing notably weak temporal understanding.
Temporal Regularization Makes Your Video Generator Stronger
·3350 words·16 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
FluxFlow: Make your video generator stronger via temporal regularization!
Make Your Training Flexible: Towards Deployment-Efficient Video Models
·5609 words·27 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Shanghai Jiao Tong University
FluxViT: Flexible video models via adaptive token selection for efficient deployment!
MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation
·3052 words·15 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Peking University
MagicComp: Dual-Phase Refinement Enables Training-Free Compositional Video Generation
Impossible Videos
·4228 words·20 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 National University of Singapore
Impossible videos expose the limits of AI video generation and understanding!
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
·2138 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Gaoling School of AI, Renmin University of China
Concat-ID: A universal, scalable framework for identity-preserving video synthesis, balancing consistency and editability.
Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait
·2626 words·13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 University of Liverpool
KDTalker: Accurate & efficient audio-driven talking portrait via implicit keypoint-based spatiotemporal diffusion, unlocking diverse & realistic animations.
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
·2743 words·13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Zhejiang University
MagicID: ID-consistent & dynamic-preserved video customization via hybrid preference optimization.
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
·2617 words·13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Zhejiang University
ReCamMaster: Re-shoots videos via generative rendering, controlling camera movement from a single source, for novel perspectives and enhanced video creation.
MTV-Inpaint: Multi-Task Long Video Inpainting
·3551 words·17 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 City University of Hong Kong
MTV-Inpaint: A unified framework for multi-task long video inpainting, enabling versatile object insertion, scene completion, editing, and removal.
Long Context Tuning for Video Generation
·2260 words·11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 The Chinese University of Hong Kong
LCT: Fine-tunes single-shot video diffusion models for coherent multi-shot video generation without extra parameters!
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
·1806 words·9 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 ByteDance Intelligent Creation
CINEMA: MLLM-guided coherent multi-subject video generation for consistent and controllable content creation.