Natural Language Processing
How to Synthesize Text Data without Model Collapse?
·5702 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Token-level editing prevents language model collapse from synthetic data by theoretically bounding test error and empirically improving model performance.
Fietje: An open, efficient LLM for Dutch
·3094 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KU Leuven
Fietje: an open-source, efficient Dutch language model outperforming larger models.
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
·3123 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA Research
AceMath achieves state-of-the-art results in mathematical reasoning by introducing highly effective instruction-tuned models and reward models.
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
·2677 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
AI agents tested in a simulated company reveal their ability to automate routine tasks and their shortcomings on complex workflows and interfaces.
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
·4393 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Artificial Intelligence, University of Chinese Academy of Sciences
First benchmark for RAG reward models reveals their limitations and the need for preference-aligned training.
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
·2716 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Surrey
Mix-LN combines Pre-LN and Post-LN to unlock the potential of deeper layers in LLMs.
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·2611 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanyang Technological University
Auto-built benchmark with up-to-date knowledge ensures contamination-free LLM evaluation.
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
·3082 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
OmniEval: Automatic benchmark for evaluating financial RAG systems.
Are Your LLMs Capable of Stable Reasoning?
·2140 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
G-Pass@k & LiveMathBench: evaluating the reasoning stability of LLMs.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3747 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Self-play method SPaR enhances LLMs' instruction-following abilities, beating GPT-4 on IFEval.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
·3575 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
SepLLM compresses segments into separator tokens, shrinking the KV cache by over 50% with little loss in accuracy.
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
·4628 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Renmin University of China
RetroLLM unifies retrieval & generation in LLMs, boosting accuracy and cutting costs.
Smaller Language Models Are Better Instruction Evolvers
·5507 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing University of Posts and Telecommunications
Smaller is better: SLMs outperform LLMs at evolving complex and diverse instructions for AI training.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·5380 words·26 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Corporation
New benchmark for evaluating long-context models finds sub-O(n) methods lacking in real-world use cases.
Byte Latent Transformer: Patches Scale Better Than Tokens
·4848 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a tokenizer-free LLM built for efficiency and robustness.
Word Sense Linking: Disambiguating Outside the Sandbox
·2984 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Word Sense Disambiguation
🏢 Sapienza University of Rome
Word Sense Linking (WSL) revolutionizes word sense disambiguation by tackling its real-world limitations. It combines span identification and sense linking in plain text, offering better integration …
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
·1893 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National Library of Norway
Researchers at the National Library of Norway show that training on copyrighted material improves LLMs, but raises legal and ethical issues.
Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages
·1855 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Machine Translation
🏢 Indian Institute of Technology Madras
Shiksha: A new multilingual translation dataset and model surpasses existing benchmarks for Indian languages, focusing on scientific, technical, and educational domains.
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
·3495 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 UC Santa Barbara
RULEARENA, a new benchmark, rigorously evaluates large language models’ ability to apply complex, real-world rules across diverse scenarios, revealing significant shortcomings in current LLMs’ rule-gu…
Phi-4 Technical Report
·2630 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Phi-4: a 14B parameter LLM surpassing its teacher model (GPT-4) in STEM-focused QA through innovative synthetic data generation and post-training techniques.