
Natural Language Processing

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
·4209 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 MIT
SelfCite: A self-supervised approach boosts LLM citation accuracy via context ablation. By removing or isolating cited text, SelfCite trains LLMs to generate high-quality citations without manual annotation.
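A minimal sketch of the context-ablation idea behind SelfCite (my illustration, not the authors' code): a citation scores well if dropping the cited sentences makes the response much less likely, while keeping only the cited sentences preserves it. The `logprob_of_response` callable is an assumed stand-in for any LLM scoring function.

```python
from typing import Callable, Sequence

def ablation_reward(
    logprob_of_response: Callable[[str, str], float],  # (context, response) -> log P(response | context)
    sentences: Sequence[str],   # source context split into sentences
    cited: set[int],            # indices of the sentences the model cited
    response: str,
) -> float:
    full_ctx = " ".join(sentences)
    without_cited = " ".join(s for i, s in enumerate(sentences) if i not in cited)
    only_cited = " ".join(s for i, s in enumerate(sentences) if i in cited)

    lp_full = logprob_of_response(full_ctx, response)
    necessity = lp_full - logprob_of_response(without_cited, response)   # drop when cited text is removed
    sufficiency = logprob_of_response(only_cited, response) - lp_full    # hold when only cited text remains
    return necessity + sufficiency  # higher is better; usable as a self-supervised reward
```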
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
·2116 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
MUDDFormer boosts Transformer performance by dynamically generating connection weights, improving cross-layer information flow and surpassing models trained with significantly more compute.
CRANE: Reasoning with constrained LLM generation
·2445 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Illinois Urbana-Champaign
CRANE: A novel constrained decoding algorithm boosts LLM reasoning accuracy by strategically alternating between unconstrained reasoning and constrained generation.
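To make the alternation concrete, here is a hedged sketch (not CRANE's implementation) of switching between free-form and grammar-constrained decoding: tokens are sampled unconstrained until an opening delimiter appears, then filtered by a grammar mask until the closing delimiter. `sample_token` and `grammar_mask` are hypothetical callables.

```python
def alternating_decode(sample_token, grammar_mask, max_tokens=256,
                       open_tag="<<", close_tag=">>", eos="<eos>"):
    """sample_token(prefix_tokens, mask) -> next token string; mask=None means unconstrained."""
    out, constrained = [], False
    for _ in range(max_tokens):
        mask = grammar_mask(out) if constrained else None
        tok = sample_token(out, mask)
        out.append(tok)
        text = "".join(out)
        if not constrained and text.endswith(open_tag):
            constrained = True        # entering a formal (e.g. symbolic) segment
        elif constrained and text.endswith(close_tag):
            constrained = False       # back to unconstrained reasoning
        if tok == eos:
            break
    return "".join(out)
```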
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
·3429 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 National University of Singapore
CoT-Valve dynamically adjusts reasoning chain lengths based on task difficulty, significantly reducing inference costs in large language models without substantial accuracy loss.
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
·3494 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 SCB 10X R&D
Low-resource language LLMs gain strong reasoning abilities by merging with a high-resource reasoning model, achieving performance comparable to state-of-the-art models while maintaining target language capabilities.
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
·2416 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
New benchmark COUNTERMATH enhances LLMs' mathematical reasoning using counterexample-driven proofs, revealing current models' limitations and paving the way for improved mathematical capabilities.
Better Embeddings with Coupled Adam
·2826 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 AI Sweden
Coupled Adam: A novel optimizer fixes anisotropic word embeddings in LLMs, boosting model performance.
We Can't Understand AI Using our Existing Vocabulary
·3226 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind
To understand AI, we need new words! This paper argues that developing neologisms, new words for human & machine concepts, is key to bridging the communication gap and achieving better AI control.
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!
·3137 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs can be effectively taught complex reasoning via efficient fine-tuning on demonstration data that emphasizes the structure, not the content, of the reasoning process.
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
·2654 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
LASP-2 revolutionizes linear attention training by achieving 36.6% faster speeds than Ring Attention via a novel sequence parallelism method, boosting efficiency for very long sequences.
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
·5174 words·25 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
CODEI/O: Condensing reasoning patterns from code into LLM training data for enhanced reasoning.
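As a rough, assumption-laden sketch of the input/output-prediction recipe (not the CODEI/O pipeline itself), one can execute a function on sampled inputs and emit two training prompts per run: predict the output given the input, and propose a feasible input given the output. `sample_input` and the toy `is_prime` function are illustrative.

```python
import inspect
import random

def make_examples(fn, sample_input, n=3):
    src = inspect.getsource(fn)
    examples = []
    for _ in range(n):
        x = sample_input()
        y = fn(x)
        examples.append({"code": src, "given": f"input={x!r}", "target": f"output={y!r}"})
        examples.append({"code": src, "given": f"output={y!r}", "target": "a feasible input"})
    return examples

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

print(make_examples(is_prime, lambda: random.randint(2, 50))[0])
```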
Auditing Prompt Caching in Language Model APIs
·5759 words·28 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University
Researchers expose widespread prompt caching in LLMs via novel timing attacks, highlighting significant privacy risks and model architecture leakage.
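The core timing side channel can be sketched as follows (a simplification of the audit, with a hypothetical `time_to_first_token` helper around an API client): send a long random prefix twice and compare latencies; if repeats are consistently much faster, the prefix was likely served from a prompt cache. The paper itself relies on proper statistical hypothesis tests rather than this crude threshold.

```python
import random
import statistics
import string

def looks_cached(time_to_first_token, trials=10, prefix_len=2000):
    cold, warm = [], []
    for _ in range(trials):
        prefix = "".join(random.choices(string.ascii_lowercase + " ", k=prefix_len))
        cold.append(time_to_first_token(prefix))   # first request: prefix not cached yet
        warm.append(time_to_first_token(prefix))   # immediate repeat: may hit the cache
    # crude decision rule: repeated prefixes answered in well under half the cold latency
    return statistics.median(warm) < 0.5 * statistics.median(cold)
```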
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators
·5896 words·28 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 AIRI
SynthDetoxM generates high-quality multilingual parallel data for text detoxification using LLMs, outperforming existing datasets and models in few-shot settings.
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
·3355 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Steel-LLM: A fully open-source, resource-efficient Chinese LLM trained with transparency, achieving competitive performance despite limited resources.
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
·2360 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Princeton University
ReasonFlux boosts LLM mathematical reasoning by using hierarchical thought templates, outperforming top LLMs like OpenAI's o1-preview and DeepSeek V3.
Matryoshka Quantization
·9741 words·46 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind
Matryoshka Quantization (MatQuant) boosts low-precision model accuracy by up to 10% through a novel multi-scale training approach. It leverages the nested structure of integer data types, allowing a single model to be served at multiple precisions.
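A tiny numerical illustration of the nested-integer idea (my sketch, not MatQuant's training code, and assuming a simple per-tensor scale): the most significant bits of an int8 code already form a valid int4 or int2 code, so a single set of weights can be sliced to lower precision.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8).astype(np.float32)

scale = np.abs(w).max() / 127.0
codes8 = np.clip(np.round(w / scale) + 128, 0, 255).astype(np.uint8)  # unsigned int8 codes

codes4 = codes8 >> 4   # top 4 bits: a nested int4 code
codes2 = codes8 >> 6   # top 2 bits: a nested int2 code

deq8 = (codes8.astype(np.float32) - 128) * scale
deq4 = ((codes4.astype(np.float32) + 0.5) * 16 - 128) * scale  # centre of each 4-bit bin
deq2 = ((codes2.astype(np.float32) + 0.5) * 64 - 128) * scale  # centre of each 2-bit bin
print(np.abs(w - deq8).mean(), np.abs(w - deq4).mean(), np.abs(w - deq2).mean())
```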
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
·3104 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Université Paris-Saclay
Boosting RL fine-tuning efficiency in LLMs: A novel KL penalty modification prioritizes exploration on critical tokens, dramatically improving model performance on arithmetic tasks.
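One way the idea could be realised (a hedged sketch, not the paper's exact rule) is to keep the per-token KL penalty where the reference model is confident and drop it on high-entropy "critical" tokens so exploration there is not penalised; the entropy threshold is an assumed criterion.

```python
import torch

def masked_kl_penalty(logp_policy, logp_ref, entropy_ref, entropy_threshold=2.0, beta=0.1):
    """Per-token tensors of shape (seq_len,): sampled-token log-probs under the policy
    and the reference model, plus the reference model's entropy at each position."""
    kl_per_token = logp_policy - logp_ref              # standard k1 estimate of the KL term
    critical = entropy_ref > entropy_threshold         # hypothetical criticality criterion
    return beta * torch.where(critical, torch.zeros_like(kl_per_token), kl_per_token)
```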
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
·3376 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Amazon
Hephaestus-Forge, a new large-scale pre-training corpus, significantly boosts LLM agent capabilities in API function calling, reasoning, and adaptability through continual pre-training.
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
·1736 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
OREAL, a novel RL framework, achieves state-of-the-art mathematical reasoning in LLMs using only binary outcome rewards, demonstrating that a 7B model can match the performance of 32B models.
Expect the Unexpected: FailSafe Long Context QA for Finance
·2633 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 OpenAI
FailSafeQA benchmark rigorously evaluates LLMs' resilience against diverse human-interaction variations, revealing critical weaknesses in even high-performing models, particularly regarding hallucinations.