Scene Understanding

Towards a Unified Copernicus Foundation Model for Earth Vision
·4400 words·21 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 Technical University of Munich
Unified Copernicus Foundation Model for Earth Vision: A multimodal approach to improve scalability, versatility, and adaptability of Earth observation (EO) models.
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
·3468 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 Hong Kong Center for Construction Robotics, the Hong Kong University of Science and Technology
Plane-DUSt3R: Leveraging pre-trained models for unposed sparse views room layout reconstruction, enhancing robustness and generalization.
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
·3707 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 MBZUAI
KITAB-Bench: A new multi-domain Arabic OCR benchmark to bridge the performance gap with English OCR technologies.
CrossOver: 3D Scene Cross-Modal Alignment
·5760 words·28 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 Stanford University
CrossOver: Flexible scene-level cross-modal alignment via modality-agnostic embeddings, unlocking robust 3D scene understanding.
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
·2585 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 MBZUAI
A new geolocation dataset and reasoning framework improve accuracy and interpretability by leveraging human gameplay data.