Video Understanding

MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction
·3398 words·16 mins
AI Generated Computer Vision Video Understanding 🏢 Idiap Research Institute
MTGS: a unified framework jointly predicts gaze and social gaze (shared attention, mutual gaze) for multiple people in videos, achieving state-of-the-art results using a temporal transformer model and…
Moving Off-the-Grid: Scene-Grounded Video Representations
·2151 words·11 mins
Video Understanding 🏢 Google DeepMind
MooG: Self-supervised video model learns off-the-grid representations, enabling consistent scene element tracking even with motion; outperforming grid-based baselines on various vision tasks.
MotionCraft: Physics-Based Zero-Shot Video Generation
·2646 words·13 mins
Computer Vision Video Understanding 🏢 Politecnico di Torino
MotionCraft: Physics-based zero-shot video generation creates realistic videos with complex motion dynamics by cleverly warping the noise latent space of an image diffusion model using optical flow fr…
Motion Graph Unleashed: A Novel Approach to Video Prediction
·2948 words·14 mins
Computer Vision Video Understanding 🏢 Microsoft
Motion Graph unleashes efficient and accurate video prediction by transforming video frames into interconnected graph nodes, capturing complex motion patterns with minimal computational cost.
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
·2797 words·14 mins
AI Generated Computer Vision Video Understanding 🏢 Microsoft
Boosting video diffusion: Motion Consistency Model (MCM) disentangles motion and appearance learning for high-fidelity, fast video generation using few sampling steps.
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
·2123 words·10 mins
Computer Vision Video Understanding 🏢 Tongji University
MoTE: A novel framework harmonizes generalization and specialization for visual-language video knowledge transfer, achieving state-of-the-art results.
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
·2752 words·13 mins
Video Understanding 🏢 Shanghai Jiao Tong University
MECD: A new task and dataset unlocks multi-event causal discovery in videos, enabling a novel framework that outperforms existing models by efficiently identifying causal relationships between chronol…
MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging
·3150 words·15 mins
AI Generated Computer Vision Video Understanding 🏢 Harbin Institute of Technology (Shenzhen)
MambaSCI: Efficient, novel deep learning model reconstructs high-quality quad-Bayer video from compressed snapshots, surpassing existing methods.
Long-Range Feedback Spiking Network Captures Dynamic and Static Representations of the Visual Cortex under Movie Stimuli
·2020 words·10 mins
Computer Vision Video Understanding 🏢 Peking University
Long-range feedback spiking network (LoRaFB-SNet) surpasses other models in capturing dynamic and static visual cortical representations under movie stimuli, advancing our understanding of visual syst…
Learning Truncated Causal History Model for Video Restoration
·2473 words·12 mins
Computer Vision Video Understanding 🏢 University of Alberta
TURTLE: a novel video restoration framework that learns a truncated causal history model for efficient and high-performing video restoration, achieving state-of-the-art results on various benchmark ta…
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
·4260 words·20 mins
AI Generated Computer Vision Video Understanding 🏢 University of Texas at Austin
HOI-Swap: a novel diffusion model flawlessly swaps objects in videos while intelligently preserving natural hand interactions, producing high-quality edits.
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
·3808 words·18 mins
AI Generated Computer Vision Video Understanding 🏢 Wuhan University (School of Computer Science, National Engineering Research Center for Multimedia Software, and Institute of Artificial Intelligence)
GoMatching, a novel video text spotting baseline, enhances tracking efficiency while maintaining strong recognition by integrating long- and short-term matching via a Transformer-based module and a re…
GenRec: Unifying Video Generation and Recognition with Diffusion Models
·2342 words·11 mins
Computer Vision Video Understanding 🏢 Fudan University
GenRec: One diffusion model to rule both video generation and recognition!
Generalizable Implicit Motion Modeling for Video Frame Interpolation
·2114 words·10 mins
Computer Vision Video Understanding 🏢 Nanyang Technological University
Generalizable Implicit Motion Modeling (GIMM) revolutionizes video frame interpolation by accurately predicting optical flows at any timestep, surpassing existing methods and achieving state-of-the-ar…
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
·1815 words·9 mins
Computer Vision Video Understanding 🏢 Zhejiang University
FreeLong: Generate high-fidelity long videos without retraining using spectral blending of global and local video features!
FIFO-Diffusion: Generating Infinite Videos from Text without Training
·3112 words·15 mins
Computer Vision Video Understanding 🏢 Seoul National University
FIFO-Diffusion generates infinitely long, high-quality videos from text prompts using a pretrained model, solving the challenge of long video generation without retraining.
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
·3474 words·17 mins
AI Generated Computer Vision Video Understanding 🏢 Northeastern University
Streamlined Inference, a novel training-free framework, dramatically reduces the computation and memory costs of video diffusion models without sacrificing quality, enabling high-resolution video gene…
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
·3096 words·15 mins
AI Generated Computer Vision Video Understanding 🏢 Department of Computer Science, University College London
FactorizePhys leverages Non-negative Matrix Factorization for a novel multidimensional attention mechanism (FSAM) to improve remote PPG signal extraction from videos.
Extending Video Masked Autoencoders to 128 Frames
·2466 words·12 mins
Computer Vision Video Understanding 🏢 Google Research
Long-video masked autoencoders (LVMAE) achieve state-of-the-art performance by using an adaptive masking strategy that prioritizes important video tokens, enabling efficient training on 128 frames.
Exocentric-to-Egocentric Video Generation
·2698 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 National University of Singapore
Exo2Ego-V generates realistic egocentric videos from sparse exocentric views, significantly outperforming state-of-the-art methods on a challenging benchmark.