
Large Language Models

MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
·2482 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
MixLLM achieves state-of-the-art LLM compression by using mixed-precision quantization between output features, improving accuracy and system efficiency.
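A minimal sketch of the core idea, assigning different bit-widths across output features by salience. The row-norm salience score, the 10% split, and the function name are illustrative assumptions, not MixLLM's exact recipe:

```python
import numpy as np

def assign_bits_per_output_feature(weight: np.ndarray, high_frac: float = 0.1):
    """Toy mixed-precision assignment across output features: the most
    salient output channels (rows) get 8 bits, the rest 4 bits.
    Row L2 norm is an illustrative stand-in for MixLLM's salience estimate."""
    salience = np.linalg.norm(weight, axis=1)         # one score per output feature
    n_high = max(1, int(high_frac * weight.shape[0]))
    high = np.argsort(salience)[-n_high:]             # top-k most salient rows
    bits = np.full(weight.shape[0], 4, dtype=np.int8)
    bits[high] = 8
    return bits

# usage with a fake 16x64 linear weight
print(assign_bits_per_output_feature(np.random.randn(16, 64)))
```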
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
·11623 words·55 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 TU Darmstadt
M-ALERT, a new multilingual benchmark, reveals significant safety inconsistencies across languages in top LLMs.
How to Synthesize Text Data without Model Collapse?
·5702 words·27 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Token-level editing prevents language model collapse from synthetic data by theoretically bounding test error and empirically improving model performance.
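A minimal sketch of token-level editing under assumed details: keep each human-written token unless the model is already near-certain at that position, and resample only those. The threshold and sampling rule are assumptions, and `model`/`tokenizer` are assumed to follow the Hugging Face causal-LM interface:

```python
import torch

def token_level_edit(model, tokenizer, text: str, p_threshold: float = 0.99):
    """Toy token-level editing: human text stays as-is except where the
    model assigns the observed token very high probability; those "easy"
    positions are resampled from the model's distribution."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]   # (seq_len, vocab)
    probs = torch.softmax(logits[:-1], dim=-1)       # predictions for tokens 1..n-1
    edited = ids.clone()
    for pos in range(1, len(ids)):
        p = probs[pos - 1]                           # model's distribution at this slot
        if p[ids[pos]] > p_threshold:                # token is "easy" for the model
            edited[pos] = torch.multinomial(p, 1).item()
    return tokenizer.decode(edited)
```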
Fietje: An open, efficient LLM for Dutch
·3094 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 KU Leuven
Fietje: an open-source, efficient Dutch language model outperforming larger models.
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
·3123 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA Research
AceMath achieves state-of-the-art results in mathematical reasoning by introducing highly effective instruction-tuned models and reward models.
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
·2677 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
AI agents are tested in a simulated software company, revealing that they can automate simpler tasks but struggle with complex workflows and interfaces.
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
·4393 words·21 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
First benchmark for RAG reward models reveals their limitations and the need for preference-aligned training.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
·2716 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Surrey
Mix-LN combines Pre-LN and Post-LN so that the deeper layers of LLMs contribute effectively to learning.
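A minimal sketch of the idea, assuming Post-LN for the earliest layers and Pre-LN for the rest; the 25% split and the feed-forward-only block are simplifications, not the paper's exact configuration:

```python
import torch.nn as nn

class Block(nn.Module):
    """Toy residual block (feed-forward only) with switchable LN placement."""
    def __init__(self, d: int, post_ln: bool):
        super().__init__()
        self.post_ln = post_ln
        self.ln = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        if self.post_ln:                   # Post-LN: normalize after the residual add
            return self.ln(x + self.ff(x))
        return x + self.ff(self.ln(x))     # Pre-LN: normalize the branch input

def mix_ln_stack(n_layers: int, d: int, post_frac: float = 0.25):
    """Post-LN for the earliest post_frac of layers, Pre-LN for the rest."""
    cut = int(post_frac * n_layers)
    return nn.Sequential(*[Block(d, post_ln=(i < cut)) for i in range(n_layers)])
```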
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·2611 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Nanyang Technological University
Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
G-Pass@k and LiveMathBench: a metric and benchmark for evaluating the stability of LLM reasoning.
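A hedged reconstruction of what a G-Pass@k-style metric computes, assuming it is the hypergeometric probability that at least ⌈τ·k⌉ of k samples drawn from n generations (c of them correct) are correct; the exact estimator in the paper may differ:

```python
from math import ceil, comb

def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Probability that at least ceil(tau*k) of k samples drawn without
    replacement from n generations, c of which are correct, are correct."""
    m = ceil(tau * k)
    return sum(comb(c, j) * comb(n - c, k - j)
               for j in range(m, min(c, k) + 1)) / comb(n, k)

# e.g. 16 generations, 10 correct: how stably can we get >=75% of 8 draws right?
print(g_pass_at_k(n=16, c=10, k=8, tau=0.75))
```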
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3747 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Self-play method SPaR enhances LLMs' instruction-following abilities, beating GPT-4 on IFEval.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
·3575 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
SepLLM compresses each segment into its separator token, shrinking the KV cache by over 50% with little loss in accuracy.
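A minimal sketch of a SepLLM-style retention rule, assuming the cache keeps initial tokens, separator tokens (standing in for the segments they close), and a recent local window; the separator set and window sizes are illustrative assumptions:

```python
SEPARATORS = frozenset({".", ",", ";", "!", "?", "\n"})

def sepllm_keep_mask(tokens, n_initial: int = 4, n_recent: int = 64):
    """Keep KV entries for the first few tokens, every separator token,
    and a recent local window; drop everything in between."""
    n = len(tokens)
    return [i < n_initial or i >= n - n_recent or t in SEPARATORS
            for i, t in enumerate(tokens)]

# usage on a toy token stream
toks = ["The", "cat", "sat", ".", "It", "purred", ".", "Then", "it", "left", "."]
print(sepllm_keep_mask(toks, n_initial=1, n_recent=2))
```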
Smaller Language Models Are Better Instruction Evolvers
·5507 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
Smaller is better: SLMs outperform LLMs in evolving complex & diverse instructions for AI training.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·5380 words·26 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Corporation
New benchmark for evaluating long-context models finds sub-O(n) methods lacking in real-world use cases.
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
BLT: a tokenizer-free, byte-level LLM that groups bytes into dynamic patches for efficiency and robustness.
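A minimal sketch of entropy-based patching in the BLT spirit: a small byte-level model scores next-byte uncertainty, and a new patch starts where entropy spikes. The threshold and the interface (plain probability lists) are assumptions:

```python
import math

def entropy_patch_boundaries(byte_probs, threshold: float = 2.0):
    """Start a new patch wherever next-byte entropy exceeds the threshold,
    so unpredictable regions get more, smaller patches. byte_probs is any
    sequence of probability distributions over the 256 byte values."""
    boundaries = [0]
    for i, p in enumerate(byte_probs):
        h = -sum(q * math.log2(q) for q in p if q > 0)   # Shannon entropy (bits)
        if h > threshold:
            boundaries.append(i + 1)                     # patch starts after byte i
    return boundaries

# usage: alternating confident / uncertain next-byte distributions
flat = [1 / 256] * 256                 # maximum entropy (8 bits)
peaked = [0.99] + [0.01 / 255] * 255   # near-zero entropy
print(entropy_patch_boundaries([peaked, flat, peaked, flat]))
```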
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
·1893 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 National Library of Norway
Training on copyrighted material improves Norwegian LLMs, but raises legal and ethical issues.
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
·3495 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Santa Barbara
RuleArena, a new benchmark, rigorously evaluates large language models’ ability to apply complex, real-world rules across diverse scenarios, revealing significant shortcomings in current LLMs’ rule-guided reasoning.
Phi-4 Technical Report
·2630 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
Phi-4: a 14B parameter LLM surpassing its teacher model (GPT-4) in STEM-focused QA through innovative synthetic data generation and post-training techniques.
JuStRank: Benchmarking LLM Judges for System Ranking
·13985 words·66 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 IBM Research
JuStRank: an LLM system-ranker benchmark reveals critical judge qualities (decisiveness, bias) that impact ranking accuracy, highlighting that strong instance-level performance doesn’t guarantee accurate system-level ranking.
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
·2774 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Saudi Data & Artificial Intelligence Authority
Fine-tuning small language models? Tweak the learning rate and batch size for a reasoning boost!
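A minimal sketch of the knob the paper studies, holding the learning-rate-to-batch-size ratio fixed while scaling the batch; the ratio value below is illustrative, not SmolTulu's:

```python
def lr_for_batch_size(batch_size: int, lr_bs_ratio: float = 1e-6) -> float:
    """Derive the learning rate from a fixed lr-to-batch-size ratio."""
    return lr_bs_ratio * batch_size

for bs in (8, 32, 128):
    print(f"batch_size={bs:4d} -> lr={lr_for_batch_size(bs):.1e}")
```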