🏢 MBZUAI
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models
4635 words · 22 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 MBZUAI
Video SimpleQA: A New Benchmark for Factuality Evaluation in Large Video Language Models.
Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia
2734 words · 13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 MBZUAI
Unlike humans, LLMs rely primarily on word form when reconstructing semantics, which suggests a need for context-aware mechanisms to improve their adaptability.
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
3707 words · 18 mins
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 MBZUAI
KITAB-Bench: A new multi-domain Arabic OCR benchmark to bridge the performance gap with English OCR technologies.
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
2585 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 MBZUAI
A new geolocation dataset and reasoning framework leverage human gameplay data to improve accuracy and interpretability.