Question Answering
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
·6574 words·31 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Yale NLP
MMVU: a new benchmark pushes multimodal video understanding to expert level, revealing limitations of current models and paving the way for more advanced AI.
GeAR: Generation Augmented Retrieval
·1952 words·10 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
GeAR, a new retrieval model, boosts accuracy by combining document retrieval with fine-grained information generation, leading to better understanding and improved localization.
MapQaTor: A System for Efficient Annotation of Map Query Datasets
·3496 words·17 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Computer Science and Engineering
MAPQATOR: a web app that streamlines creation of reproducible geospatial QA datasets, boosting annotation speed by 30x!
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
·3082 words·15 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
OmniEval: Automatic benchmark for evaluating financial RAG systems.
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
·4628 words·22 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
RetroLLM unifies retrieval & generation in LLMs, boosting accuracy and cutting costs.
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
·5666 words·27 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Computer Science, University of Oregon
MedRGB benchmark reveals current LLMs struggle with noisy medical data, emphasizing the need for robust RAG systems in healthcare AI.
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
·2696 words·13 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Singapore University of Technology and Design
M-LongDoc: a new benchmark and retrieval-aware tuning framework revolutionizes multimodal long document understanding, improving model accuracy by 4.6%.
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
·2200 words·11 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
HtmlRAG boosts RAG system accuracy by using HTML, not plain text, to model retrieved knowledge, improving knowledge representation and mitigating LLM hallucination.
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
·5467 words·26 mins·
loading
·
loading
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of California Santa Cruz
GRS-QA: New benchmark dataset reveals LLM reasoning limitations!