Scene Understanding
Towards a Unified Copernicus Foundation Model for Earth Vision
·4400 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 Technical University of Munich
Unified Copernicus Foundation Model for Earth Vision: a multimodal approach to improving the scalability, versatility, and adaptability of Earth observation (EO) models.
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
·3468 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 Hong Kong Center for Construction Robotics, the Hong Kong University of Science and Technology
Plane-DUSt3R: leveraging pre-trained models for room layout reconstruction from unposed sparse views, improving robustness and generalization.
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
·3707 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 MBZUAI
KITAB-Bench: a new multi-domain Arabic OCR benchmark aimed at closing the performance gap between Arabic and English OCR technologies.
CrossOver: 3D Scene Cross-Modal Alignment
·5760 words·28 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 Stanford University
CrossOver: flexible scene-level cross-modal alignment via modality-agnostic embeddings, enabling robust 3D scene understanding.
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
·2585 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 MBZUAI
A new geolocation dataset and reasoning framework that leverage human gameplay data to improve accuracy and interpretability.