🏢 South China University of Technology
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
3719 words · 18 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
LSceneLLM improves understanding of large 3D scenes by using the LLM’s visual preferences to adaptively focus on task-relevant visual details, and it surpasses existing methods on multiple benchmarks.