Natural Language Processing
How to Synthesize Text Data without Model Collapse?
·5702 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Token-level editing prevents language model collapse from synthetic data by theoretically bounding test error and empirically improving model performance.
Fietje: An open, efficient LLM for Dutch
·3094 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KU Leuven
Fietje: an open-source, efficient Dutch language model outperforming larger models.
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
·3123 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA Research
AceMath achieves state-of-the-art results in mathematical reasoning by introducing highly effective instruction-tuned models and reward models.
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
·2677 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
AI agents tested in a simulated company reveal their ability to automate routine tasks and their shortcomings on complex workflows and interfaces.
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
·4393 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
First benchmark for RAG reward models reveals their limitations and the need for preference-aligned training.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
·2716 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Surrey
Mix-LN combines Pre-LN and Post-LN to unlock the potential of deeper layers in LLMs.
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·2611 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanyang Technological University
Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
·3082 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
OmniEval: Automatic benchmark for evaluating financial RAG systems.
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
G-Pass@k & LiveMathBench: evaluating the reasoning stability of LLMs.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3747 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Self-play method SPaR enhances LLMs' instruction-following abilities, beating GPT-4 on IFEval.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
·3575 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
SepLLM compresses segments into separator tokens, shrinking the KV cache by over 50% with little loss in accuracy.
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
·4628 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
RetroLLM unifies retrieval & generation in LLMs, boosting accuracy and cutting costs.
Smaller Language Models Are Better Instruction Evolvers
·5507 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing University of Posts and Telecommunications
Smaller is better: SLMs outperform LLMs at evolving complex and diverse instructions for AI training.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·5380 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Corporation
New benchmark for evaluating long-context models finds sub-O(n) methods lacking in real-world use cases.
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a tokenizer-free LLM built for efficiency and robustness.
Word Sense Linking: Disambiguating Outside the Sandbox
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Word Sense Disambiguation
🏢 Sapienza University of Rome
Word Sense Linking (WSL) revolutionizes word sense disambiguation by tackling its real-world limitations. It combines span identification and sense linking in plain text, offering better integration …
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
·1893 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National Library of Norway
Researchers at the National Library of Norway show that training on copyrighted material improves LLMs, but raises legal and ethical issues.
Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages
·1855 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Machine Translation
🏢 Indian Institute of Technology Madras
Shiksha: A new multilingual translation dataset and model surpasses existing benchmarks for Indian languages, focusing on scientific, technical, and educational domains.
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
·3495 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Santa Barbara
RULEARENA, a new benchmark, rigorously evaluates large language models’ ability to apply complex, real-world rules across diverse scenarios, revealing significant shortcomings in current LLMs’ rule-gu…
Phi-4 Technical Report
·2630 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Phi-4: a 14B parameter LLM surpassing its teacher model (GPT-4) in STEM-focused QA through innovative synthetic data generation and post-training techniques.