Natural Language Processing

RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment

18 December 2024·4393 words·21 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences

First benchmark for RAG reward models reveals their limitations and the need for preference-aligned training.

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

18 December 2024·2716 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Surrey

Mix-LN combines Pre-LN and Post-LN to make the deeper layers of LLMs more effective.

AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

18 December 2024·2611 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Nanyang Technological University

Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

17 December 2024·3082 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Gaoling School of Artificial Intelligence, Renmin University of China

OmniEval: Automatic benchmark for evaluating financial RAG systems.

Are Your LLMs Capable of Stable Reasoning?

17 December 2024·2140 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

G-Pass@k & LiveMathBench: Evaluating the reasoning stability of LLMs.

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

16 December 2024·3747 words·18 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Self-play method SPaR enhances LLMs' instruction-following abilities, beating GPT-4 on IFEval.

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

16 December 2024·3575 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab

SepLLM compresses each segment into its separator, speeding up LLMs by over 50% without losing much accuracy.

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

16 December 2024·4628 words·22 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Renmin University of China

RetroLLM unifies retrieval & generation in LLMs, boosting accuracy and cutting costs.

Smaller Language Models Are Better Instruction Evolvers

15 December 2024·5507 words·26 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications

Smaller is better: SLMs outperform LLMs in evolving complex & diverse instructions for AI training.

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

13 December 2024·5380 words·26 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Corporation

New benchmark for evaluating long-context models finds sub-O(n) methods lacking in real-world use cases.

Byte Latent Transformer: Patches Scale Better Than Tokens

13 December 2024·4848 words·23 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington

BLT: a tokenizer-free LLM for efficiency and robustness.

Word Sense Linking: Disambiguating Outside the Sandbox

12 December 2024·2984 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Word Sense Disambiguation 🏢 Sapienza University of Rome

Word Sense Linking (WSL) revolutionizes word sense disambiguation by tackling its real-world limitations. It combines span identification and sense linking in plain text, offering better integration …

The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective

12 December 2024·1893 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 National Library of Norway

Norwegian researchers show that using copyrighted material improves LLMs but raises legal and ethical issues.

Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages

12 December 2024·1855 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Indian Institute of Technology Madras

Shiksha: A new multilingual translation dataset and model surpasses existing benchmarks for Indian languages, focusing on scientific, technical, and educational domains.

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

12 December 2024·3495 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Santa Barbara

RULEARENA, a new benchmark, rigorously evaluates large language models’ ability to apply complex, real-world rules across diverse scenarios, revealing significant shortcomings in current LLMs’ rule-gu…

Phi-4 Technical Report

12 December 2024·2630 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research

Phi-4: a 14B parameter LLM surpassing its teacher model (GPT-4) in STEM-focused QA through innovative synthetic data generation and post-training techniques.

JuStRank: Benchmarking LLM Judges for System Ranking

12 December 2024·13985 words·66 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research

JuStRank: LLM system ranker benchmark reveals critical judge qualities (decisiveness, bias) impacting ranking accuracy, highlighting instance-level performance doesn’t guarantee accurate system-level…

SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

11 December 2024·2774 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Saudi Data & Artificial Intelligence Authority

Fine-tuning small language models? Tweak the learning rate and batch size for a reasoning boost!

Granite Guardian

10 December 2024·4191 words·20 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research

Granite Guardian: Open-source risk detection models for LLMs, surpassing existing models in accuracy and offering comprehensive coverage across multiple risk dimensions, promoting safer AI.

Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation

10 December 2024·1928 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Pisa

Contextualized AI counterspeech significantly outperforms generic methods by adapting to the moderation context and user, improving persuasiveness without sacrificing other qualities.