Video Understanding

End-to-End Video Semantic Segmentation in Adverse Weather using Fusion Blocks and Temporal-Spatial Teacher-Student Learning
·2581 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 National University of Singapore
Optical-flow-free video semantic segmentation excels in adverse weather by merging adjacent frame information via a fusion block and a novel temporal-spatial teacher-student learning strategy.
Efficient Temporal Action Segmentation via Boundary-aware Query Voting
·3348 words·16 mins
AI Generated Computer Vision Video Understanding 🏢 Stony Brook University
BaFormer: a novel boundary-aware Transformer network achieves efficient and accurate temporal action segmentation by using instance and global queries for segment classification and boundary predictio…
EEG2Video: Towards Decoding Dynamic Visual Perception from EEG Signals
·2189 words·11 mins
Computer Vision Video Understanding 🏢 Microsoft Research
EEG2Video reconstructs dynamic videos from EEG signals, achieving 79.8% accuracy in semantic classification and 0.256 SSIM in video reconstruction.
E-Motion: Future Motion Simulation via Event Sequence Diffusion
·4535 words·22 mins
AI Generated Computer Vision Video Understanding 🏢 Xidian University
E-Motion: Simulating future motion by combining event-camera sequences with video diffusion models.
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos
·2231 words·11 mins
Computer Vision Video Understanding 🏢 Carnegie Mellon University
DreamScene4D generates realistic 3D dynamic multi-object scenes from monocular videos via novel view synthesis, addressing limitations of existing methods with a novel decompose-recompose approach.
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
·1863 words·9 mins
Video Understanding 🏢 Carnegie Mellon University
Run-Length Tokenization (RLT) dramatically speeds up video transformer training and inference by efficiently removing redundant video tokens, matching baseline model performance with significant time …
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
·4229 words·20 mins
Video Understanding 🏢 University of Catania
This paper introduces a novel differentiable framework for learning task graphs from video demonstrations of procedural activities. By directly optimizing the weights of a task graph’s edges, the mod…
DeltaDEQ: Exploiting Heterogeneous Convergence for Accelerating Deep Equilibrium Iterations
·2888 words·14 mins
AI Generated Computer Vision Video Understanding 🏢 ETH Zurich
DeltaDEQ accelerates deep equilibrium model inference by 73-84% by exploiting a novel 'heterogeneous convergence' phenomenon, while maintaining accuracy.
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
·3396 words·16 mins
AI Generated Computer Vision Video Understanding 🏢 Tencent AI Lab
CV-VAE: A compatible video VAE enabling efficient, high-quality latent video generation by bridging the gap between image and video latent spaces.
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
·2236 words·11 mins
Computer Vision Video Understanding 🏢 Tsinghua University
COVE: Consistent high-quality video editing achieved by leveraging diffusion feature correspondence for temporal consistency.
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
·2588 words·13 mins
Computer Vision Video Understanding 🏢 Stanford University
Collaborative Video Diffusion (CVD) generates multiple consistent videos from various camera angles using a novel cross-video synchronization module, significantly improving consistency compared to ex…
bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction
·3565 words·17 mins
Computer Vision Video Understanding 🏢 Case Western Reserve University
bit2bit reconstructs high-quality videos from sparse, binary quanta image sensor data using self-supervised photon location prediction, significantly improving resolution and usability.
Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection
·2466 words·12 mins
Computer Vision Video Understanding 🏢 Chongqing University of Posts and Telecommunications
Moving beyond Euclidean spaces, Dual-Space Representation Learning (DSRL) enhances weakly supervised video violence detection by integrating Euclidean and hyperbolic geometries for superior discrimi…
Beyond Accuracy: Tracking more like Human via Visual Search
·2966 words·14 mins
Computer Vision Video Understanding 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
CPDTrack: Human-like Visual Search Boosts Object Tracking!
AverNet: All-in-one Video Restoration for Time-varying Unknown Degradations
·2558 words·13 mins
AI Generated Computer Vision Video Understanding 🏢 College of Computer Science, Sichuan University, China
AverNet: All-in-one video restoration that handles time-varying, unknown degradations.
ActAnywhere: Subject-Aware Video Background Generation
·1990 words·10 mins
Computer Vision Video Understanding 🏢 Stanford University
ActAnywhere, a novel video diffusion model, seamlessly integrates foreground subjects into new backgrounds by generating realistic video backgrounds tailored to subject motion, significantly reducing …
A Motion-aware Spatio-temporal Graph for Video Salient Object Ranking
·2245 words·11 mins
Computer Vision Video Understanding 🏢 School of Computer Science and Engineering, Southeast University
A motion-aware spatio-temporal graph model surpasses existing methods in video salient object ranking by jointly optimizing multi-scale spatial and temporal features, accurately prioritizin…
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
·1721 words·9 mins
Computer Vision Video Understanding 🏢 Snap Inc.
4Real: Photorealistic 4D scene generation from text prompts using video diffusion models, surpassing object-centric approaches in realism and efficiency.