Scene Understanding
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner
·2048 words·10 mins·
Computer Vision
Scene Understanding
🏢 String
Efficient Multi-Task Learning (EMTAL) transforms pre-trained Vision Transformers into efficient multi-task learners by using a MoEfied LoRA structure, a Quality Retaining optimization, and a router fa…
Multiview Scene Graph
·2365 words·12 mins·
Computer Vision
Scene Understanding
🏢 New York University
AI models struggle to understand 3D space like humans do. This paper introduces Multiview Scene Graphs (MSGs) – a new topological scene representation using interconnected place and object nodes buil…
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
·2610 words·13 mins·
Computer Vision
Scene Understanding
🏢 University of Illinois Urbana-Champaign
Lexicon3D: the first comprehensive study probing diverse visual foundation models for complex 3D scene understanding, revealing that unsupervised image models outperform others across various tasks.
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
·2348 words·12 mins·
AI Generated
Computer Vision
Scene Understanding
🏢 ReLER, AAII, University of Technology Sydney
DIFFUSIONHOI: A novel HOI detector using text-to-image diffusion models to improve compositional reasoning and handling of novel concepts, achieving state-of-the-art performance.
DiffSF: Diffusion Models for Scene Flow Estimation
·2351 words·12 mins·
Scene Understanding
🏢 Linköping University
DiffSF improves scene flow estimation accuracy and reliability by combining transformer networks with denoising diffusion models, offering state-of-the-art results and uncertainty quantificatio…
CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
·2627 words·13 mins·
Computer Vision
Scene Understanding
🏢 University of Arkansas
CYCLO: A novel cyclic graph transformer excels at multi-object relationship modeling in aerial videos.
Coherent 3D Scene Diffusion From a Single RGB Image
·2684 words·13 mins·
Computer Vision
Scene Understanding
🏢 Technical University of Munich
A novel image-conditioned 3D scene diffusion model generates coherent 3D scenes from a single RGB image, surpassing state-of-the-art methods.