Video Understanding

End-to-End Video Semantic Segmentation in Adverse Weather using Fusion Blocks and Temporal-Spatial Teacher-Student Learning
·2581 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 National University of Singapore
Optical-flow-free video semantic segmentation excels in adverse weather by merging adjacent frame information via a fusion block and a novel temporal-spatial teacher-student learning strategy.
Efficient Temporal Action Segmentation via Boundary-aware Query Voting
·3348 words·16 mins
AI Generated Computer Vision Video Understanding 🏢 Stony Brook University
BaFormer: a novel boundary-aware Transformer network achieves efficient and accurate temporal action segmentation by using instance and global queries for segment classification and boundary predictio…
EEG2Video: Towards Decoding Dynamic Visual Perception from EEG Signals
·2189 words·11 mins
Computer Vision Video Understanding 🏢 Microsoft Research
EEG2Video reconstructs dynamic videos from EEG signals, achieving 79.8% accuracy in semantic classification and 0.256 SSIM in video reconstruction.
E-Motion: Future Motion Simulation via Event Sequence Diffusion
·4535 words·22 mins
AI Generated Computer Vision Video Understanding 🏢 Xidian University
E-Motion: Simulating future motion by combining event-camera sequences with video diffusion models.
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
·2231 words·11 mins
Computer Vision Video Understanding 🏢 Carnegie Mellon University
DreamScene4D generates realistic 3D dynamic multi-object scenes from monocular videos via novel view synthesis, addressing limitations of existing methods with a novel decompose-recompose approach.
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
·1863 words·9 mins
Video Understanding 🏢 Carnegie Mellon University
Run-Length Tokenization (RLT) dramatically speeds up video transformer training and inference by efficiently removing redundant video tokens, matching baseline model performance with significant time …
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
·4229 words·20 mins
Video Understanding 🏢 University of Catania
This paper introduces a novel differentiable framework for learning task graphs from video demonstrations of procedural activities. By directly optimizing the weights of a task graph’s edges, the mod…
DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations
·2888 words·14 mins
AI Generated Computer Vision Video Understanding 🏢 ETH Zurich
DeltaDEQ accelerates deep equilibrium model inference by 73-84% by exploiting a novel 'heterogeneous convergence' phenomenon, while maintaining accuracy.
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
·3396 words·16 mins
AI Generated Computer Vision Video Understanding 🏢 Tencent AI Lab
CV-VAE: A compatible video VAE enabling efficient, high-quality latent video generation by bridging the gap between image and video latent spaces.
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
·2236 words·11 mins
Computer Vision Video Understanding 🏢 Tsinghua University
COVE: Consistent high-quality video editing achieved by leveraging diffusion feature correspondence for temporal consistency.
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
·2588 words·13 mins
Computer Vision Video Understanding 🏢 Stanford University
Collaborative Video Diffusion (CVD) generates multiple consistent videos from various camera angles using a novel cross-video synchronization module, significantly improving consistency compared to ex…
bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction
·3565 words·17 mins
Computer Vision Video Understanding 🏢 Case Western Reserve University
bit2bit reconstructs high-quality videos from sparse, binary quanta image sensor data using self-supervised photon location prediction, significantly improving resolution and usability.
Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection
·2466 words·12 mins
Computer Vision Video Understanding 🏢 Chongqing University of Posts and Telecommunications
Moving beyond Euclidean spaces, Dual-Space Representation Learning (DSRL) enhances weakly supervised video violence detection by integrating Euclidean and hyperbolic geometries for superior discrimi…
Beyond Accuracy: Tracking more like Human via Visual Search
·2966 words·14 mins
Computer Vision Video Understanding 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
CPDTrack: Human-like Visual Search Boosts Object Tracking!
AverNet: All-in-one Video Restoration for Time-varying Unknown Degradations
·2558 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 College of Computer Science, Sichuan University, China
AverNet: All-in-one video restoration that handles time-varying, unknown degradations.
ActAnywhere: Subject-Aware Video Background Generation
·1990 words·10 mins
Computer Vision Video Understanding 🏢 Stanford University
ActAnywhere, a novel video diffusion model, seamlessly integrates foreground subjects into new backgrounds by generating realistic video backgrounds tailored to subject motion, significantly reducing …
A Motion-aware Spatio-temporal Graph for Video Salient Object Ranking
·2245 words·11 mins
Computer Vision Video Understanding 🏢 School of Computer Science and Engineering, Southeast University
A motion-aware spatio-temporal graph model surpasses existing methods in video salient object ranking by jointly optimizing multi-scale spatial and temporal features, accurately prioritizin…
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
·1721 words·9 mins
Computer Vision Video Understanding 🏢 Snap Inc.
4Real: Photorealistic 4D scene generation from text prompts using video diffusion models, surpassing object-centric approaches in realism and efficiency.