Natural Language Processing
SAGE: A Framework of Precise Retrieval for RAG
·3653 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Tsinghua University
SAGE: Precise RAG via semantic segmentation, adaptive chunking, and LLM feedback, boosting QA accuracy & cost-efficiency.
Liger: Linearizing Large Language Models to Gated Recurrent Structures
·4096 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Laboratory
Liger: LLMs linearized to gated recurrent models, enabling efficient deployment via key matrix repurposing and LoRA fine-tuning.
Large-Scale Data Selection for Instruction Tuning
·2665 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
RDS+: simple representation-based data selection outperforms costlier methods as instruction tuning scales.
Forgetting Transformer: Softmax Attention with a Forget Gate
·4225 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Mila & Université de Montréal
Transformers get forgetful! This paper introduces the Forgetting Transformer (FoX), incorporating a forget gate into the attention mechanism for improved sequence modeling.
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
·2296 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 SandLogic Technologies Pvt Ltd
Shakti SLMs: Fine-tuning compact language models for efficient, domain-specific AI on edge devices.
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom
·8404 words·40 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Huazhong University of Science and Technology
CROWDSELECT boosts instruction tuning by cleverly selecting synthetic data using multi-LLM wisdom, enhancing model performance across diverse tasks.
CodeArena: A Collective Evaluation Platform for LLM Code Generation
·1693 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanyang Technological University
CodeArena: Collective evaluation for LLM code generation.
SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking
·3011 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 FPT Software AI Center, Viet Nam
SemViQA: A new approach to boost Vietnamese fact-checking with semantic understanding and efficient evidence retrieval.
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
·2236 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Fudan University
DuoDecoding: Accelerating LLM inference by strategically deploying draft & target models on CPU & GPU for parallel decoding and dynamic drafting.
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers
·2242 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 DAMO Academy, Alibaba Group
Babel: An open multilingual LLM supports over 90% of global speakers, filling the language coverage gap and setting new performance standards.
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
·219 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Machine Translation
🏢 Huawei, China
R1-T1: RL-driven framework incentivizing translation capability in LLMs via reasoning learning, achieving superior performance in multiple languages & domains.
Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases
·2582 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of Oregon
MoR: Adaptive knowledge retrieval by fusing structural and textual data for better question answering.
LongRoPE2: Near-Lossless LLM Context Window Scaling
·3732 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft
LongRoPE2: Extends LLM context windows while preserving performance and reducing training costs!
NeoBERT: A Next-Generation BERT
·2699 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Polytechnique Montréal
NeoBERT: A new encoder that enhances bidirectional language understanding with cutting-edge architecture, data, and training, achieving SOTA results with only 250M parameters.
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
·4298 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 NLCo Lab, BIGAI
TokenSwift: Accelerate LLM ultra-long sequence generation up to 100K tokens with >3x speedup and lossless accuracy!
Exploring Rewriting Approaches for Different Conversational Tasks
·1596 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Adobe Research
The choice of query-rewriting approach is critical to conversational assistant effectiveness.
Chain of Draft: Thinking Faster by Writing Less
·1398 words·7 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Zoom Communications
CoD: LLMs think faster by writing less! A novel prompting strategy cuts costs and latency while maintaining reasoning accuracy.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
·3779 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
GOAT: Adaptively boosts LoRA with SVD & MoE alignment, closing the gap with Full FT.
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
·9576 words·45 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Yonsei University
The MCLM benchmark shows that test-time scaling, unlike pre-training scaling, is not a universal fix for multilingual math reasoning.
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
·2937 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Southeast University
CTM: A new benchmark for assessing temporal reasoning in LLMs across Chinese dynastic history.