
Large Language Models

Large-Scale Data Selection for Instruction Tuning
·2665 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 University of Washington
RDS+ is the unsung hero for scaling instruction tuning data selection!
Forgetting Transformer: Softmax Attention with a Forget Gate
·4225 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Mila & Université de Montréal
Transformers get forgetful! This paper introduces the Forgetting Transformer (FoX), incorporating a forget gate into the attention mechanism for improved sequence modeling.
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
·2296 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 SandLogic Technologies Pvt Ltd
Shakti SLMs: Fine-tuning compact language models for efficient, domain-specific AI on edge devices.
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
·8404 words·40 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Huazhong University of Science and Technology
CROWDSELECT boosts instruction tuning by cleverly selecting synthetic data using multi-LLM wisdom, enhancing model performance across diverse tasks.
CodeArena: A Collective Evaluation Platform for LLM Code Generation
·1693 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Nanyang Technological University
CodeArena: Collective evaluation for LLM code generation.
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
·2236 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Fudan University
DuoDecoding: Accelerating LLM inference by strategically deploying draft & target models on CPU & GPU for parallel decoding and dynamic drafting.
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
·2242 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 DAMO Academy, Alibaba Group
Babel: an open multilingual LLM that supports over 90% of global speakers, filling the language coverage gap and setting new performance standards.
LongRoPE2: Near-Lossless LLM Context Window Scaling
·3732 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Microsoft
LongRoPE2: Extends LLM context windows while preserving performance and reducing training costs!
NeoBERT: A Next-Generation BERT
·2699 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Polytechnique Montréal
NeoBERT: A new encoder that enhances bidirectional language understanding with cutting-edge architecture, data, and training, achieving SOTA results with only 250M parameters.
Chain of Draft: Thinking Faster by Writing Less
·1398 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Zoom Communications
CoD: LLMs think faster by writing less! A novel prompting strategy cuts costs and latency while maintaining reasoning accuracy.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
·3779 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Hong Kong University of Science and Technology
GOAT: Adaptively boosts LoRA with SVD & MoE alignment, closing the gap with Full FT.
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
·9576 words·45 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Yonsei University
Test-time scaling isn't a universal cure-all for multilingual math reasoning the way pre-training scaling is, the MCLM benchmark shows.
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
·2673 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Vrije Universiteit Brussel
o3-mini achieves superior accuracy without longer reasoning chains, suggesting that 'thinking harder' matters more than 'thinking longer'.
LightThinker: Thinking Step-by-Step Compression
·1662 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
LightThinker: LLMs dynamically compress intermediate steps, reducing memory & boosting reasoning efficiency without sacrificing accuracy.
SurveyX: Academic Survey Automation via Large Language Models
·2720 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Renmin University of China
SURVEYX automates academic survey generation, enhancing content and citation quality.
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
·2645 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 AIRI
Packing new knowledge into LoRA adapters can harm LLMs! A delicate balance is needed to prevent performance decline.
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
·4876 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Korea University
LLMs have 'Temporal Heads' that process time-specific facts!
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
·3075 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Zhejiang University
LoRAM: train small, infer large. Memory-efficient LoRA training enables 70B-parameter model training on a 20GB-HBM GPU in place of an A100-80G, cutting parameter storage cost by 15.81×.
SIFT: Grounding LLM Reasoning in Contexts via Stickers
·3144 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Shanghai Jiao Tong University
SIFT: Grounds LLM reasoning with 'Stickers' to highlight context and improve accuracy without extra training.
REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models
·582 words·3 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Pohang University of Science and Technology
REFIND: Detects LLM hallucinations by directly leveraging retrieved documents, using a novel Context Sensitivity Ratio.