Natural Language Processing

SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators

10 February 2025·5896 words·28 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 AIRI

SynthDetoxM generates high-quality multilingual parallel data for text detoxification using LLMs, outperforming existing datasets and models in few-shot settings.

Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM

10 February 2025·3355 words·16 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Steel-LLM: A fully open-source, resource-efficient Chinese LLM trained with transparency, achieving competitive performance despite limited resources.

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

10 February 2025·2360 words·12 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Princeton University

ReasonFlux boosts LLM mathematical reasoning by using hierarchical thought templates, outperforming top LLMs like OpenAI’s 01-preview and DeepSeek V3.

Matryoshka Quantization

10 February 2025·9741 words·46 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

Matryoshka Quantization (MatQuant) boosts low-precision model accuracy by up to 10% through a novel multi-scale training approach. It leverages the nested structure of integer data types, allowing a …

Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning

10 February 2025·3104 words·15 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Université Paris-Saclay

Boosting RL fine-tuning efficiency in LLMs: A novel KL penalty modification prioritizes exploration on critical tokens, dramatically improving model performance on arithmetic tasks.

Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

10 February 2025·3376 words·16 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Amazon

Hephaestus-Forge, a new large-scale pre-training corpus, significantly boosts LLM agent capabilities in API function calling, reasoning, and adaptability through continual pre-training.

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

10 February 2025·1736 words·9 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

OREAL, a novel RL framework, achieves state-of-the-art mathematical reasoning in LLMs using only binary outcome rewards, demonstrating that a 7B model can match the performance of 32B models.

Expect the Unexpected: FailSafe Long Context QA for Finance

10 February 2025·2633 words·13 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 OpenAI

FailSafeQA benchmark rigorously evaluates LLMs’ resilience against diverse human-interaction variations, revealing critical weaknesses in even high-performing models, particularly regarding hallucinat…

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

10 February 2025·3884 words·19 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Smaller LLMs can outperform larger ones by strategically increasing computation during inference, defying conventional LLM scaling.

Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

9 February 2025·507 words·3 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

Language models learn effective social deduction strategies in a virtual game by using their goal to predict useful information as a dense reward signal, doubling win rates compared to standard RL.

The Curse of Depth in Large Language Models

9 February 2025·2429 words·12 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Medical Artificial Intelligence Laboratory, Westlake University

Deep layers in LLMs underperform due to Pre-Layer Normalization; LayerNorm Scaling resolves this by controlling output variance, significantly improving training efficiency.

LM2: Large Memory Models

9 February 2025·2722 words·13 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Convergence Labs Ltd

LM2: Large Memory Models enhance Transformers by adding an auxiliary memory module, significantly improving multi-step reasoning and long-context information synthesis.

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding

8 February 2025·6090 words·29 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Carnegie Mellon University

APE: a novel method significantly speeds up context-augmented generation (CAG). By using adaptive parallel encoding, APE achieves a 4.5x speedup and maintains high accuracy even with 128K length cont…

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

7 February 2025·5939 words·28 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Maryland

Boost LLM reasoning power at test time by recursively processing latent information, enabling dramatic performance gains with fewer parameters.

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations

7 February 2025·3320 words·16 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ISTA

QuEST enables stable, accurate LLM training using only 1-bit weights and activations, achieving Pareto-optimal performance compared to higher-precision models.

Generating Symbolic World Models via Test-time Scaling of Large Language Models

7 February 2025·2722 words·13 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology

LLMs excel at complex reasoning but struggle with planning; this paper introduces a test-time scaling approach that enhances LLMs’ PDDL reasoning, enabling high-quality PDDL domain generation, outperf…

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

7 February 2025·2622 words·13 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of California, Los Angeles

DuoGuard: a novel two-player RL framework generates high-quality synthetic data, improving multilingual LLM safety by outperforming state-of-the-art models with a significantly smaller model size and …

ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

7 February 2025·8117 words·39 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 University of British Columbia

ARR: A novel zero-shot prompting method significantly boosts LLM performance on diverse question-answering tasks by explicitly incorporating question analysis, information retrieval, and step-by-step …

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

6 February 2025·6016 words·29 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Brown University

Simple interactions can easily elicit harmful outputs from LLMs, which are often overlooked. The SPEAK EASY framework and HARMSCORE metric expose this vulnerability and provide tools for better safet…

PILAF: Optimal Human Preference Sampling for Reward Modeling

6 February 2025·2374 words·12 mins· loading · loading

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NYU

PILAF optimizes human feedback in reward modeling for better LLM alignment by using a novel response sampling strategy that aligns reward modeling with value optimization.