Question Answering
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
2478 words · 12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Johns Hopkins University
Test-time scaling + confidence = better QA!
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
3833 words · 18 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Artificial Intelligence, Chung-Ang University
SAFE-SQL boosts Text-to-SQL accuracy by intelligently generating and filtering self-augmented examples for in-context learning, surpassing existing methods in challenging scenarios.
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
4327 words · 21 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Intel Labs
SQuARE, a novel prompting technique, improves LLM reasoning by guiding the model to interrogate itself through sequential question answering, significantly outperforming traditional prompting methods.
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
1354 words · 7 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of Copenhagen
Fact-checkers need explainable AI: This study reveals how AI tools can better support fact-checkers by providing explanations tailored to their workflows, addressing unmet needs, and improving the eff…
Expect the Unexpected: FailSafe Long Context QA for Finance
2633 words · 13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 OpenAI
FailSafeQA benchmark rigorously evaluates LLMs’ resilience against diverse human-interaction variations, revealing critical weaknesses in even high-performing models, particularly regarding hallucinat…
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
8117 words · 39 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of British Columbia
ARR: A novel zero-shot prompting method significantly boosts LLM performance on diverse question-answering tasks by explicitly incorporating question analysis, information retrieval, and step-by-step …
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
2973 words · 14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
DeepRAG enhances LLM reasoning by modeling retrieval-augmented reasoning as a Markov Decision Process (MDP) that decides step by step when to retrieve, improving answer accuracy by 21.99% while also increasing retrieval efficiency.
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
228 words · 2 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Adobe Research
ChartCitor: A multi-agent LLM framework combats LLM hallucination in ChartQA by providing fine-grained visual citations, enhancing user trust and productivity.
Chain-of-Retrieval Augmented Generation
4155 words · 20 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
CoRAG, a novel Chain-of-Retrieval Augmented Generation model, dynamically refines queries for improved accuracy in multi-hop question answering, achieving state-of-the-art performance.
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
6574 words · 31 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Yale NLP
MMVU: a new benchmark pushes multimodal video understanding to expert level, revealing limitations of current models and paving the way for more advanced AI.
GeAR: Generation Augmented Retrieval
1952 words · 10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
GeAR, a new retrieval model, boosts accuracy by combining document retrieval with fine-grained information generation, improving both query-document understanding and the localization of relevant fine-grained information.
MapQaTor: A System for Efficient Annotation of Map Query Datasets
3496 words · 17 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Computer Science and Engineering
MapQaTor: a web app that streamlines the creation of reproducible geospatial QA datasets, speeding up annotation by 30x!
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
3082 words · 15 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
OmniEval: an automatic benchmark for evaluating RAG systems in the financial domain.
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
4628 words · 22 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
RetroLLM unifies retrieval & generation in LLMs, boosting accuracy and cutting costs.
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
5666 words · 27 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Department of Computer Science, University of Oregon
MedRGB benchmark reveals current LLMs struggle with noisy medical data, emphasizing the need for robust RAG systems in healthcare AI.
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
2696 words · 13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Singapore University of Technology and Design
M-LongDoc: a new benchmark and retrieval-aware tuning framework revolutionizes multimodal long document understanding, improving model accuracy by 4.6%.
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
2200 words · 11 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
HtmlRAG boosts RAG system accuracy by using HTML, not plain text, to model retrieved knowledge, improving knowledge representation and mitigating LLM hallucination.
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset
5467 words · 26 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of California Santa Cruz
GRS-QA: New benchmark dataset reveals LLM reasoning limitations!