
Large Language Models

Large-Scale Data Selection for Instruction Tuning
·2665 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 University of Washington
RDS+ is the unsung hero for scaling instruction tuning data selection!
Forgetting Transformer: Softmax Attention with a Forget Gate
·4225 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Mila & Université de Montréal
Transformers get forgetful! This paper introduces the Forgetting Transformer (FoX), incorporating a forget gate into the attention mechanism for improved sequence modeling.
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
·2296 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 SandLogic Technologies Pvt Ltd
Shakti SLMs: Fine-tuning compact language models for efficient, domain-specific AI on edge devices.
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
·8404 words·40 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Huazhong University of Science and Technology
CROWDSELECT boosts instruction tuning by cleverly selecting synthetic data using multi-LLM wisdom, enhancing model performance across diverse tasks.
CodeArena: A Collective Evaluation Platform for LLM Code Generation
·1693 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Nanyang Technological University
CodeArena: Collective evaluation for LLM code generation.
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
·2236 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Fudan University
DuoDecoding: Accelerating LLM inference by strategically deploying draft & target models on CPU & GPU for parallel decoding and dynamic drafting.
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
·2242 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 DAMO Academy, Alibaba Group
Babel: an open multilingual LLM that supports over 90% of global speakers, filling the language coverage gap and setting new performance standards.
LongRoPE2: Near-Lossless LLM Context Window Scaling
·3732 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Microsoft
LongRoPE2: Extends LLM context windows while preserving performance and reducing training costs!
NeoBERT: A Next-Generation BERT
·2699 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Polytechnique Montréal
NeoBERT: A new encoder that enhances bidirectional language understanding with cutting-edge architecture, data, and training, achieving SOTA results with only 250M parameters.
Chain of Draft: Thinking Faster by Writing Less
·1398 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Zoom Communications
CoD: LLMs think faster by writing less! A novel prompting strategy cuts costs and latency while maintaining reasoning accuracy.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
·3779 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Hong Kong University of Science and Technology
GOAT: Adaptively boosts LoRA with SVD & MoE alignment, closing the gap with Full FT.
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
·9576 words·45 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Yonsei University
Test-time scaling isn't a universal cure-all for multilingual math reasoning the way pre-training scaling is, the MCLM benchmark shows.
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
·2673 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Vrije Universiteit Brussel
o3-mini achieves superior accuracy without longer reasoning chains, suggesting that 'thinking harder' matters more than 'thinking longer'.
LightThinker: Thinking Step-by-Step Compression
·1662 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
LightThinker: LLMs dynamically compress intermediate steps, reducing memory & boosting reasoning efficiency without sacrificing accuracy.
SurveyX: Academic Survey Automation via Large Language Models
·2720 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Renmin University of China
SURVEYX automates academic survey generation, enhancing content and citation quality.
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
·2645 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 AIRI
Packing new knowledge into LoRA adapters can harm LLMs! A delicate balance is needed to prevent performance decline.
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
·4876 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Korea University
LLMs have 'Temporal Heads' that process time-specific facts!
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
·3075 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Zhejiang University
LoRAM: train small, infer large. Memory-efficient LoRA training enables 70B-parameter model training on a 20GB-HBM GPU in place of an A100-80G, cutting parameter storage cost by 15.81×.
SIFT: Grounding LLM Reasoning in Contexts via Stickers
·3144 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Shanghai Jiao Tong University
SIFT: Grounds LLM reasoning with 'Stickers' to highlight context and improve accuracy without extra training.
REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models
·582 words·3 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏒 Pohang University of Science and Technology
REFIND: Detects LLM hallucinations by directly leveraging retrieved documents, using a novel Context Sensitivity Ratio.