Video Understanding

Video Token Merging for Long Video Understanding

26 September 2024·2290 words·11 mins

Computer Vision Video Understanding 🏢 Korea University

Researchers boost long-form video understanding efficiency by 6.89x and reduce memory usage by 84% using a novel learnable video token merging algorithm.

Video Diffusion Models are Training-free Motion Interpreter and Controller

26 September 2024·2252 words·11 mins

Computer Vision Video Understanding 🏢 Peking University

Training-free video motion control achieved via novel Motion Feature (MOFT) extraction from existing video diffusion models, offering architecture-agnostic insights and high performance.

VFIMamba: Video Frame Interpolation with State Space Models

26 September 2024·2179 words·11 mins

Computer Vision Video Understanding 🏢 Tencent AI Lab

VFIMamba uses state-space models for efficient and dynamic video frame interpolation, achieving state-of-the-art results by introducing a novel Mixed-SSM Block and curriculum learning.

TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation

26 September 2024·2140 words·11 mins

Video Understanding 🏢 KAIST

TrackIME enhances video point tracking by using instance motion estimation to prune the search space, improving accuracy and efficiency.

Towards Multi-Domain Learning for Generalizable Video Anomaly Detection

26 September 2024·2936 words·14 mins

Computer Vision Video Understanding 🏢 Kyung Hee University

Researchers propose Multi-Domain learning for Video Anomaly Detection (MDVAD) to create generalizable models handling conflicting abnormality criteria across diverse datasets, improving accuracy and a…

Temporally Consistent Atmospheric Turbulence Mitigation with Neural Representations

26 September 2024·1994 words·10 mins

Computer Vision Video Understanding 🏢 University of Maryland

ConVRT: A novel framework restores turbulence-distorted videos by decoupling spatial and temporal information in a neural representation, achieving temporally consistent mitigation.

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

26 September 2024·1868 words·9 mins

Computer Vision Video Understanding 🏢 South China University of Technology

TAPTRv2 enhances point tracking by introducing an attention-based position update, eliminating cost-volume reliance for improved accuracy and efficiency.

SyncVIS: Synchronized Video Instance Segmentation

26 September 2024·2160 words·11 mins

Computer Vision Video Understanding 🏢 University of Hong Kong

SyncVIS: A new framework for video instance segmentation achieves state-of-the-art results by synchronously modeling video and frame-level information, overcoming limitations of asynchronous approache…

StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences

26 September 2024·2803 words·14 mins

AI Generated Computer Vision Video Understanding 🏢 Peking University

StreamFlow accelerates video optical flow estimation by 44% via a streamlined in-batch multi-frame pipeline and innovative spatiotemporal modeling, achieving state-of-the-art results.

Splatter a Video: Video Gaussian Representation for Versatile Processing

26 September 2024·2610 words·13 mins

Computer Vision Video Understanding 🏢 University of Hong Kong

Researchers introduce Video Gaussian Representation (VGR) for versatile video processing, embedding videos into explicit 3D Gaussians for intuitive motion and appearance modeling.

Slot State Space Models

26 September 2024·2613 words·13 mins

Computer Vision Video Understanding 🏢 Rutgers University

SlotSSMs: a novel framework for modular sequence modeling, achieving significant performance gains by incorporating independent mechanisms and sparse interactions into State Space Models.

SF-V: Single Forward Video Generation Model

26 September 2024·1607 words·8 mins

Computer Vision Video Understanding 🏢 Snap Inc.

Researchers developed SF-V, a single-step image-to-video generation model, achieving a 23x speedup compared to existing models without sacrificing quality, paving the way for real-time video synthesis…

ReVideo: Remake a Video with Motion and Content Control

26 September 2024·2423 words·12 mins

Computer Vision Video Understanding 🏢 Peking University

ReVideo enables precise local video editing by independently controlling content and motion, overcoming limitations of existing methods and paving the way for advanced video manipulation.

OPEL: Optimal Transport Guided ProcedurE Learning

26 September 2024·2652 words·13 mins

Computer Vision Video Understanding 🏢 Purdue University

OPEL: a novel optimal transport framework for procedure learning, significantly outperforms SOTA methods by aligning similar video frames and relaxing strict temporal assumptions.

OnlineTAS: An Online Baseline for Temporal Action Segmentation

26 September 2024·2736 words·13 mins

AI Generated Computer Vision Video Understanding 🏢 National University of Singapore

OnlineTAS, a novel framework, achieves state-of-the-art performance in online temporal action segmentation by using an adaptive memory and a post-processing method to mitigate over-segmentation.

On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection

26 September 2024·2133 words·11 mins

Computer Vision Video Understanding 🏢 Shanghai Jiao Tong University

MM-Det, a novel algorithm, uses multimodal learning and spatiotemporal attention to detect diffusion-generated videos, achieving state-of-the-art performance on the new DVF dataset.

NVRC: Neural Video Representation Compression

26 September 2024·1996 words·10 mins

Computer Vision Video Understanding 🏢 Visual Information Lab, University of Bristol, UK

NVRC: A novel end-to-end neural video codec achieves 23% coding gain over VVC VTM by optimizing representation compression.

NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

26 September 2024·2374 words·12 mins

Video Understanding 🏢 Tongji University

NeuroClips reconstructs high-fidelity, smooth video of up to 6 s at 8 FPS from fMRI by decoding both high-level semantics and low-level perception flows.

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

26 September 2024·2217 words·11 mins

Computer Vision Video Understanding 🏢 National Yang Ming Chiao Tung University

NaRCan: High-quality video editing via diffusion priors and hybrid deformation fields.

Multi-view Masked Contrastive Representation Learning for Endoscopic Video Analysis

26 September 2024·2187 words·11 mins

Computer Vision Video Understanding 🏢 Xiangtan University

Multi-view Masked Contrastive Representation Learning (M²CRL) significantly boosts endoscopic video analysis by using a novel multi-view masking strategy and contrastive learning, achieving state-of-t…