Video Understanding

MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction
·3398 words·16 mins
AI Generated Computer Vision Video Understanding 🏢 Idiap Research Institute
MTGS: a unified framework jointly predicts gaze and social gaze (shared attention, mutual gaze) for multiple people in videos, achieving state-of-the-art results using a temporal transformer model and…
Moving Off-the-Grid: Scene-Grounded Video Representations
·2151 words·11 mins
Video Understanding 🏢 Google DeepMind
MooG: Self-supervised video model learns off-the-grid representations, enabling consistent scene element tracking even with motion; outperforming grid-based baselines on various vision tasks.
MotionCraft: Physics-Based Zero-Shot Video Generation
·2646 words·13 mins
Computer Vision Video Understanding 🏢 Politecnico di Torino
MotionCraft: Physics-based zero-shot video generation creates realistic videos with complex motion dynamics by cleverly warping the noise latent space of an image diffusion model using optical flow fr…
Motion Graph Unleashed: A Novel Approach to Video Prediction
·2948 words·14 mins
Computer Vision Video Understanding 🏢 Microsoft
Motion Graph unleashes efficient and accurate video prediction by transforming video frames into interconnected graph nodes, capturing complex motion patterns with minimal computational cost.
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
·2797 words·14 mins
AI Generated Computer Vision Video Understanding 🏢 Microsoft
Boosting video diffusion: Motion Consistency Model (MCM) disentangles motion and appearance learning for high-fidelity, fast video generation using few sampling steps.
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
·2123 words·10 mins
Computer Vision Video Understanding 🏢 Tongji University
MoTE: A novel framework harmonizes generalization and specialization for visual-language video knowledge transfer, achieving state-of-the-art results.
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
·2752 words·13 mins
Video Understanding 🏢 Shanghai Jiao Tong University
MECD: A new task and dataset unlocks multi-event causal discovery in videos, enabling a novel framework that outperforms existing models by efficiently identifying causal relationships between chronol…
MambaSCI: Efficient Mamba-UNet for Quad-Bayer Patterned Video Snapshot Compressive Imaging
·3150 words·15 mins
AI Generated Computer Vision Video Understanding 🏢 Harbin Institute of Technology (Shenzhen)
MambaSCI: Efficient, novel deep learning model reconstructs high-quality quad-Bayer video from compressed snapshots, surpassing existing methods.
Long-Range Feedback Spiking Network Captures Dynamic and Static Representations of the Visual Cortex under Movie Stimuli
·2020 words·10 mins
Computer Vision Video Understanding 🏢 Peking University
Long-range feedback spiking network (LoRaFB-SNet) surpasses other models in capturing dynamic and static visual cortical representations under movie stimuli, advancing our understanding of visual syst…
Learning Truncated Causal History Model for Video Restoration
·2473 words·12 mins
Computer Vision Video Understanding 🏢 University of Alberta
TURTLE: a novel video restoration framework that learns a truncated causal history model for efficient and high-performing video restoration, achieving state-of-the-art results on various benchmark ta…
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness
·4260 words·20 mins
AI Generated Computer Vision Video Understanding 🏢 University of Texas at Austin
HOI-Swap: a novel diffusion model flawlessly swaps objects in videos while intelligently preserving natural hand interactions, producing high-quality edits.
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
·3808 words·18 mins
AI Generated Computer Vision Video Understanding 🏢 Wuhan University (School of Computer Science, National Engineering Research Center for Multimedia Software, and Institute of Artificial Intelligence)
GoMatching, a novel video text spotting baseline, enhances tracking efficiency while maintaining strong recognition by integrating long- and short-term matching via a Transformer-based module and a re…
GenRec: Unifying Video Generation and Recognition with Diffusion Models
·2342 words·11 mins
Computer Vision Video Understanding 🏢 Fudan University
GenRec: One diffusion model to rule both video generation and recognition!
Generalizable Implicit Motion Modeling for Video Frame Interpolation
·2114 words·10 mins
Computer Vision Video Understanding 🏢 Nanyang Technological University
Generalizable Implicit Motion Modeling (GIMM) revolutionizes video frame interpolation by accurately predicting optical flows at any timestep, surpassing existing methods and achieving state-of-the-ar…
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
·1815 words·9 mins
Computer Vision Video Understanding 🏢 Zhejiang University
FreeLong: Generate high-fidelity long videos without retraining using spectral blending of global and local video features!
FIFO-Diffusion: Generating Infinite Videos from Text without Training
·3112 words·15 mins
Computer Vision Video Understanding 🏢 Seoul National University
FIFO-Diffusion generates infinitely long, high-quality videos from text prompts using a pretrained model, solving the challenge of long video generation without retraining.
Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
·3474 words·17 mins
AI Generated Computer Vision Video Understanding 🏢 Northeastern University
Streamlined Inference, a novel training-free framework, dramatically reduces the computation and memory costs of video diffusion models without sacrificing quality, enabling high-resolution video gene…
FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing
·3096 words·15 mins
AI Generated Computer Vision Video Understanding 🏢 Department of Computer Science, University College London
FactorizePhys leverages Non-negative Matrix Factorization for a novel multidimensional attention mechanism (FSAM) to improve remote PPG signal extraction from videos.
Extending Video Masked Autoencoders to 128 Frames
·2466 words·12 mins
Computer Vision Video Understanding 🏢 Google Research
Long-video masked autoencoders (LVMAE) achieve state-of-the-art performance by using an adaptive masking strategy that prioritizes important video tokens, enabling efficient training on 128 frames.
Exocentric-to-Egocentric Video Generation
·2698 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 National University of Singapore
Exo2Ego-V generates realistic egocentric videos from sparse exocentric views, significantly outperforming state-of-the-art methods on a challenging benchmark.