Natural Language Processing
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
·2085 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Large language models (LLMs) often prematurely abandon promising reasoning paths, a phenomenon called ‘underthinking’. This paper introduces a novel metric to quantify the issue and proposes a decoding strategy that discourages premature switching between lines of thought.
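The decoding fix is easy to picture: before sampling each token, subtract a penalty from tokens that typically open a new line of thought. A minimal sketch in Python, assuming a hypothetical list of "switch" token ids (e.g. the id for "Alternatively"); the actual token set, penalty value, and application window come from the paper's tuning.

```python
import numpy as np

def penalize_thought_switches(logits: np.ndarray,
                              switch_token_ids: list[int],
                              penalty: float = 3.0) -> np.ndarray:
    """Discourage tokens that start a new reasoning thread by
    subtracting a fixed penalty from their logits before sampling.
    switch_token_ids is hypothetical and model-specific."""
    adjusted = logits.copy()
    adjusted[switch_token_ids] -= penalty
    return adjusted

# Usage inside a greedy decoding step:
logits = np.array([1.0, 0.2, 2.5, 0.1])     # toy next-token scores
adjusted = penalize_thought_switches(logits, switch_token_ids=[2])
next_token = int(np.argmax(adjusted))        # picks 0, not switch token 2
```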
GuardReasoner: Towards Reasoning-based LLM Safeguards
·5624 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National University of Singapore
GuardReasoner enhances LLM safety with reasoning-based guardrails, improving performance, explainability, and generalization on various benchmarks.
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
·3468 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Georgia Institute of Technology
Virus: A new attack method easily bypasses LLM guardrails, highlighting the inadequacy of current safety measures and underscoring the need for more robust defenses.
Large Language Models Think Too Fast To Explore Effectively
·3497 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Georgia Institute of Technology
Large language models underperform humans in open-ended exploration because they prioritize immediate choices over long-term strategic thinking, though newer models show promise.
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
·2552 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Critique Fine-Tuning (CFT) outperforms traditional supervised fine-tuning (SFT) in training language models, achieving comparable results with significantly less data and opening new avenues in AI.
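The contrast is easiest to see in the training data. A minimal sketch, with illustrative field names: SFT maps a query to a reference answer to imitate, while CFT maps a query plus a noisy candidate answer to a critique of that candidate.

```python
def make_sft_example(query: str, reference_answer: str) -> dict:
    """Standard SFT: imitate the reference answer."""
    return {"input": query, "target": reference_answer}

def make_cft_example(query: str, noisy_response: str, critique: str) -> dict:
    """Critique Fine-Tuning: given a query and a candidate response,
    the model learns to produce a critique of that response."""
    prompt = (f"Question: {query}\n"
              f"Candidate answer: {noisy_response}\n"
              f"Critique the candidate answer:")
    return {"input": prompt, "target": critique}

# Same raw data, two very different learning signals.
sft = make_sft_example("What is 7*8?", "56")
cft = make_cft_example("What is 7*8?", "54",
                       "Incorrect: 7*8 = 56, so 54 is off by 2.")
```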
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
·3663 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Berkeley
Reinforcement learning (RL) surpasses supervised fine-tuning (SFT) in fostering generalization in foundation models, while SFT aids RL’s stability, as a comparative study across text and visual domains shows.
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
·4043 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
SafeRAG: A new benchmark exposes critical security vulnerabilities in Retrieval-Augmented Generation (RAG) systems by introducing four novel attack types and a comprehensive evaluation dataset, revealing how vulnerable current RAG pipelines are to such attacks.
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
·3794 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Seed-Foundation-Model Team, ByteDance
Boosting Large Language Model (LLM) performance, researchers introduce Over-Tokenized Transformers, decoupling input and output vocabularies to improve language modeling. Scaling up the input vocabulary consistently improves performance.
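One way to picture a decoupled, enlarged input vocabulary is n-gram input embeddings: each position looks up an embedding for its (hashed) multi-token context on top of its regular token embedding, while the output softmax stays over the base vocabulary. A toy sketch under those assumptions; the hashing and sizes here are illustrative, not the paper's exact construction.

```python
import numpy as np

V, V_in, d = 1000, 50_000, 64   # output vocab, enlarged input vocab, dim
rng = np.random.default_rng(0)
uni_emb = rng.normal(scale=0.02, size=(V, d))      # shared with output head
bigram_emb = rng.normal(scale=0.02, size=(V_in, d))  # input-only table

def embed(tokens: list[int]) -> np.ndarray:
    """Input embedding over an enlarged vocabulary: each position adds
    a hashed 2-gram embedding to its unigram embedding. The output
    softmax still ranges over only the base V tokens."""
    out, prev = [], 0            # 0 acts as a hypothetical BOS id
    for t in tokens:
        bigram_id = (prev * V + t) % V_in   # hash the 2-gram into V_in slots
        out.append(uni_emb[t] + bigram_emb[bigram_id])
        prev = t
    return np.stack(out)

h = embed([5, 17, 941])          # (3, 64) context-enriched input embeddings
```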
Optimizing Large Language Model Training Using FP4 Quantization
·1562 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
First-ever FP4 training framework for LLMs achieves accuracy comparable to BF16 and FP8, enabling efficient ultra-low precision training.
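For intuition, FP4 in the E2M1 layout can represent only eight magnitudes per sign. A minimal round-to-nearest sketch with per-tensor absmax scaling; the paper's framework adds the training-specific machinery (e.g. differentiable gradient estimation and outlier handling) that this sketch omits.

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 format (per sign).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Absmax scaling: map the largest magnitude in x to 6.0, then
    round every value to the nearest FP4 grid point (x is 1-D here)."""
    scale = np.abs(x).max() / FP4_GRID[-1]
    scaled = x / scale
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx], scale

def dequantize_fp4(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

w = np.array([0.01, -0.3, 0.7, -1.2])
q, s = quantize_fp4(w)
w_hat = dequantize_fp4(q, s)     # [0.0, -0.3, 0.6, -1.2]
```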
Histoires Morales: A French Dataset for Assessing Moral Alignment
·8270 words·39 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Laboratoire Hubert Curien
HISTOIRESMORALES: a new French dataset tackles the crucial issue of aligning language models with human moral values, providing valuable resources for ethical AI research in a previously underserved language.
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
·2564 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Artificial Intelligence Institute, University of South Carolina
IndicMMLU-Pro: a new benchmark rigorously evaluates large language models’ multi-task language understanding capabilities across nine major Indian languages, pushing Indic language AI research forward.
Atla Selene Mini: A General Purpose Evaluation Model
·1893 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Atla
Atla Selene Mini: A state-of-the-art small LLM judge surpassing larger models in benchmark performance!
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
·1758 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
ARWKV: A novel RNN-attention-based language model, distilled from a larger model, achieves strong performance using significantly fewer resources, opening a new path in efficient language model development.
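A generic sketch of the kind of objective such a distillation can use (a common recipe, not necessarily the paper's exact losses): align the student's hidden states with the teacher's, then match the output distributions.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def distill_loss(student_h, teacher_h, student_logits, teacher_logits,
                 alpha: float = 0.5) -> float:
    """Match the student's hidden states to the teacher's (stage one)
    and its output distribution to the teacher's (stage two)."""
    hidden_mse = np.mean((student_h - teacher_h) ** 2)
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    kl = np.sum(p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9)))
    return alpha * hidden_mse + (1 - alpha) * kl

rng = np.random.default_rng(0)
loss = distill_loss(rng.normal(size=8), rng.normal(size=8),
                    rng.normal(size=5), rng.normal(size=5))
```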
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
·2423 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 The Chinese University of Hong Kong, Shenzhen
RealCritic: A new benchmark effectively evaluates language models’ critique abilities using a closed-loop methodology, showcasing advanced reasoning models’ superiority in self and iterative critique.
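The closed-loop idea fits in a few lines: a critique is judged not by how plausible it reads, but by whether applying it actually fixes the answer. A minimal sketch, where `llm` is a hypothetical stand-in for the model that revises the answer.

```python
def llm(prompt: str) -> str:
    """Hypothetical model call; replace with a real LLM."""
    return "56"

def critique_is_effective(question: str, answer: str,
                          critique: str, gold: str) -> bool:
    """Closed-loop evaluation: revise the answer using the critique,
    then check whether the revision reaches the gold answer."""
    revised = llm(f"Question: {question}\nAnswer: {answer}\n"
                  f"Critique: {critique}\nRevised answer:")
    return revised.strip() == gold

ok = critique_is_effective("What is 7*8?", "54",
                           "54 is wrong; 7*8 = 56.", gold="56")
```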
Humanity's Last Exam
·2314 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Center for AI Safety
Humanity’s Last Exam (HLE): a groundbreaking multi-modal benchmark pushing the boundaries of large language model (LLM) capabilities, revealing a significant gap between current LLMs and human experts.
Chain-of-Retrieval Augmented Generation
·4155 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
CoRAG, a novel Chain-of-Retrieval Augmented Generation model, dynamically refines queries for improved accuracy in multi-hop question answering, achieving state-of-the-art performance.
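The control flow behind chain-of-retrieval is a short loop. A sketch under stated assumptions: `llm` and `retrieve` below are hypothetical stubs for a language model and a retriever; the paper additionally trains the model on sampled retrieval chains and manages test-time compute, which this sketch omits.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real model."""
    return "DONE"

def retrieve(query: str) -> list[str]:
    """Hypothetical retriever; replace with a real search index."""
    return [f"passage about {query}"]

def chain_of_retrieval(question: str, max_steps: int = 4) -> str:
    """Iteratively reformulate sub-queries, retrieve evidence, and
    decide when enough has been gathered to answer."""
    evidence: list[str] = []
    for _ in range(max_steps):
        # Ask the model for the next sub-query given evidence so far.
        sub_query = llm(
            f"Question: {question}\nEvidence so far: {evidence}\n"
            "What should we search for next? Reply DONE if ready to answer."
        )
        if sub_query.strip() == "DONE":
            break
        evidence.extend(retrieve(sub_query))
    return llm(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```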
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
·8384 words·40 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
SIGMA, a novel large language model, achieves up to 33.36% faster inference speeds by using DiffQKV attention, which differentially optimizes query, key, and value components in the attention mechanism.
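The core intuition of differential rescaling fits in a single-head sketch (not the paper's exact architecture): keys only enter the score computation, so the key projection can be compressed much more aggressively than the value projection, shrinking the KV cache and speeding up decoding.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_qkv_attention(x, Wq, Wk, Wv):
    """Attention where K uses a smaller projection than V: scores only
    need q.k agreement, so keys can be compressed, while values keep
    more capacity for the output. Shapes are illustrative."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v               # (n, d_v)

rng = np.random.default_rng(0)
n, d, d_k, d_v = 4, 16, 4, 16                # K compressed 4x relative to V
x = rng.normal(size=(n, d))
out = diff_qkv_attention(x,
                         rng.normal(size=(d, d_k)),   # Wq
                         rng.normal(size=(d, d_k)),   # Wk (small)
                         rng.normal(size=(d, d_v)))   # Wv (full)
```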
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
·2154 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Intel Labs
Low-rank adapters combined with neural architecture search revolutionize LLM compression, enabling efficient fine-tuning and significantly reduced memory footprint.
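What makes low-rank adapters friendly to architecture search is that a trained rank-r adapter contains every rank r' < r sub-adapter as a slice, so a search can score smaller ranks without retraining. A minimal sketch with illustrative names:

```python
import numpy as np

class ElasticLoRA:
    """Low-rank adapter whose update B @ A can be truncated to any
    sub-rank, letting a search procedure try cheaper sub-networks."""
    def __init__(self, d_in: int, d_out: int, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))                     # trainable

    def delta(self, active_rank: int) -> np.ndarray:
        """Weight update using only the first `active_rank` components."""
        return self.B[:, :active_rank] @ self.A[:active_rank, :]

adapter = ElasticLoRA(d_in=8, d_out=8, rank=4)
# A NAS loop could score candidates like these without new training:
for r in (1, 2, 4):
    dW = adapter.delta(r)        # (8, 8) update at sub-rank r
```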
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
·2592 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Large language models (LLMs) are rapidly evolving, yet often struggle to adapt to human preferences quickly. This paper introduces Test-Time Preference Optimization (TPO), an innovative framework that aligns model outputs with human preferences at inference time through iterative textual feedback, without updating model weights.
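The on-the-fly loop is compact: sample a response, obtain textual feedback, revise, repeat; no gradient step touches the weights. A sketch with hypothetical `llm` and `critic` stubs standing in for the policy model and the feedback source:

```python
def llm(prompt: str) -> str:
    """Hypothetical policy model call; replace with a real LLM."""
    return "draft answer"

def critic(prompt: str, response: str) -> str:
    """Hypothetical feedback source, e.g. a preference signal
    verbalized as a textual critique."""
    return "be more concise and cite the key fact"

def tpo_align(prompt: str, rounds: int = 3) -> str:
    """Test-time preference optimization loop: iterative textual
    feedback refines the output; model weights never change."""
    response = llm(prompt)
    for _ in range(rounds):
        feedback = critic(prompt, response)
        response = llm(f"{prompt}\nDraft: {response}\n"
                       f"Feedback: {feedback}\nRevised answer:")
    return response

answer = tpo_align("Explain test-time alignment in one sentence.")
```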
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
·2172 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Pairwise RM, a novel reward model with knockout tournaments, significantly boosts large language model accuracy in test-time scaling by comparing solution pairs, eliminating arbitrary scoring inconsistencies.
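The selection procedure itself is a plain single-elimination bracket. A minimal sketch of best-of-N via knockout, assuming a hypothetical `pairwise_better` judge that returns the preferred of two candidates; in the paper, a trained pairwise reward model plays that role.

```python
import random

def pairwise_better(question: str, a: str, b: str) -> str:
    """Hypothetical pairwise judge; placeholder heuristic only."""
    return a if len(a) >= len(b) else b

def knockout_best_of_n(question: str, candidates: list[str]) -> str:
    """Run single-elimination rounds until one candidate remains.
    No absolute scores are assigned: only pairwise comparisons."""
    pool = candidates[:]
    random.shuffle(pool)              # random bracket seeding
    while len(pool) > 1:
        nxt = [pairwise_better(question, pool[i], pool[i + 1])
               for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2:             # odd one out gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]

best = knockout_best_of_n("Solve x^2 = 9", ["x=3", "x=3 or x=-3", "x=9"])
```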