Scene Understanding

Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner

26 September 2024·2048 words·10 mins

Computer Vision Scene Understanding 🏢 String

Efficient Multi-Task Learning (EMTAL) transforms pre-trained Vision Transformers into efficient multi-task learners by using a MoEfied LoRA structure, a Quality Retaining optimization, and a router fa…

Multiview Scene Graph

26 September 2024·2365 words·12 mins

Computer Vision Scene Understanding 🏢 New York University

AI models struggle to understand 3D space like humans do. This paper introduces Multiview Scene Graphs (MSGs) – a new topological scene representation using interconnected place and object nodes buil…

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

26 September 2024·2610 words·13 mins

Computer Vision Scene Understanding 🏢 University of Illinois Urbana-Champaign

Lexicon3D: a first comprehensive study probing diverse visual foundation models for 3D scene understanding, revealing that unsupervised image models outperform others across various tasks.

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models

26 September 2024·2348 words·12 mins

AI Generated Computer Vision Scene Understanding 🏢 ReLER, AAII, University of Technology Sydney

DIFFUSIONHOI: A novel HOI detector using text-to-image diffusion models to improve compositional reasoning and handling of novel concepts, achieving state-of-the-art performance.

DiffSF: Diffusion Models for Scene Flow Estimation

26 September 2024·2351 words·12 mins

Scene Understanding 🏢 Linköping University

DiffSF boosts scene flow estimation accuracy and reliability by combining transformer networks with denoising diffusion models, offering state-of-the-art results and uncertainty quantificatio…

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos

26 September 2024·2627 words·13 mins

Computer Vision Scene Understanding 🏢 University of Arkansas

CYCLO: A novel cyclic graph transformer excels at multi-object relationship modeling in aerial videos.

Coherent 3D Scene Diffusion From a Single RGB Image

26 September 2024·2684 words·13 mins

Computer Vision Scene Understanding 🏢 Technical University of Munich

Coherent 3D scenes are diffused from a single RGB image using a novel image-conditioned 3D scene diffusion model, surpassing state-of-the-art methods.