
Natural Language Processing

SAGE: A Framework of Precise Retrieval for RAG
·3653 words·18 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Question Answering 🏒 Tsinghua University
SAGE: Precise RAG via semantic segmentation, adaptive chunking, and LLM feedback, boosting QA accuracy & cost-efficiency.
Liger: Linearizing Large Language Models to Gated Recurrent Structures
·4096 words·20 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Shanghai AI Laboratory
Liger: LLMs linearized to gated recurrent models, enabling efficient deployment via key matrix repurposing and LoRA fine-tuning.
Large-Scale Data Selection for Instruction Tuning
·2665 words·13 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 University of Washington
RDS+, a simple representation-based method, is the unsung hero of instruction tuning: it outperforms more complex approaches as data selection scales up.
Forgetting Transformer: Softmax Attention with a Forget Gate
·4225 words·20 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Mila & UniversitΓ© De MontrΓ©al
Transformers get forgetful! This paper introduces the Forgetting Transformer (FoX), incorporating a forget gate into the attention mechanism for improved sequence modeling.
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
·2296 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 SandLogic Technologies Pvt Ltd
Shakti SLMs: Fine-tuning compact language models for efficient, domain-specific AI on edge devices.
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
·8404 words·40 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Huazhong University of Science and Technology
CrowdSelect boosts instruction tuning by cleverly selecting synthetic data using multi-LLM wisdom, enhancing model performance across diverse tasks.
CodeArena: A Collective Evaluation Platform for LLM Code Generation
·1693 words·8 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Nanyang Technological University
CodeArena: Collective evaluation for LLM code generation.
SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking
·3011 words·15 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Question Answering 🏒 FPT Software AI Center, Viet Nam
SemViQA: A new approach to boost Vietnamese fact-checking with semantic understanding and efficient evidence retrieval.
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
·2236 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Fudan University
DuoDecoding: Accelerating LLM inference by strategically deploying draft & target models on CPU & GPU for parallel decoding and dynamic drafting.
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
·2242 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 DAMO Academy, Alibaba Group
Babel: An open multilingual LLM supporting over 90% of global speakers, filling the language-coverage gap and setting new performance standards.
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
·219 words·2 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Machine Translation 🏒 Huawei, China
R1-T1: RL-driven framework incentivizing translation capability in LLMs via reasoning learning, achieving superior performance in multiple languages & domains.
Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases
·2582 words·13 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Question Answering 🏒 University of Oregon
MoR: Adaptive knowledge retrieval by fusing structural and textual data for better question answering.
LongRoPE2: Near-Lossless LLM Context Window Scaling
·3732 words·18 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Microsoft
LongRoPE2: Extends LLM context windows while preserving performance and reducing training costs!
NeoBERT: A Next-Generation BERT
·2699 words·13 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Polytechnique MontrΓ©al
NeoBERT: A new encoder that enhances bidirectional language understanding with cutting-edge architecture, data, and training, achieving SOTA results with only 250M parameters.
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
·4298 words·21 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Text Generation 🏒 NLCo Lab, BIGAI
TokenSwift: Accelerates ultra-long LLM sequence generation up to 100K tokens with >3x speedup and lossless accuracy!
Exploring Rewriting Approaches for Different Conversational Tasks
·1596 words·8 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Question Answering 🏒 Adobe Research
The choice of rewriting approach is critical to conversational-assistant effectiveness.
Chain of Draft: Thinking Faster by Writing Less
·1398 words·7 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Zoom Communications
CoD: LLMs think faster by writing less! A novel prompting strategy cuts costs and latency while maintaining reasoning accuracy.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
·3779 words·18 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Hong Kong University of Science and Technology
GOAT: Adaptively boosts LoRA with SVD-based initialization & MoE optimization alignment, closing the gap with full fine-tuning.
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
·9576 words·45 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Yonsei University
The MCLM benchmark shows that test-time scaling, unlike pre-training scaling, is no universal fix for multilingual math reasoning.
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
·2937 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Question Answering 🏒 Southeast University
CTM: A new benchmark for assessing temporal reasoning in LLMs across Chinese dynastic history.