Natural Language Processing

Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
·3405 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 UC San Diego
Cuckoo, a novel information extraction (IE) model, leverages LLM pre-training data, achieving superior performance in few-shot settings by reframing next-token prediction as token extraction.
Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
·2355 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Minzu University of China
XLM-SWCM: A novel framework efficiently adapts multilingual encoders for text generation in extremely low-resource languages by sharing weights between encoder and decoder, achieving superior performance.
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
·1603 words·8 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Northeastern University
This survey comprehensively analyzes methods for injecting domain-specific knowledge into LLMs, categorizing them into four key approaches and evaluating their trade-offs to enhance performance in specialized domains.
You Do Not Fully Utilize Transformer's Representation Capacity
·4126 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 T-Tech HSE University Moscow Institute of Physics and Technology
Layer-Integrated Memory (LIMe) enhances Transformers' representation capacity by enabling access to earlier layers' hidden states, significantly improving performance across various tasks.
Typhoon T1: An Open Thai Reasoning Model
·3148 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 SCB 10X R&D
Typhoon T1, an open Thai reasoning model, improves performance on complex tasks by generating long chains of thought; detailed methodology and open-source resources are provided.
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
·2201 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab
LLMs often fail to demonstrate true understanding of physical concepts, acting as 'stochastic parrots', a phenomenon quantified by the PHYSICO benchmark.
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
·4327 words·21 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Intel Labs
SQuARE, a novel prompting technique, enhances LLM reasoning by eliciting self-interrogation through sequential question answering, significantly outperforming traditional methods.
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
·1354 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 University of Copenhagen
Fact-checkers need explainable AI: this study reveals how AI tools can better support fact-checkers by providing explanations tailored to their workflows, addressing unmet needs and improving the efficiency of fact-checking.
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
·4209 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 MIT
SelfCite: A self-supervised approach boosts LLM citation accuracy via context ablation. By removing or isolating cited text, SelfCite trains LLMs to generate high-quality citations without manual annotation.
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
·2116 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
MUDDFormer boosts Transformer performance by dynamically generating connection weights, improving cross-layer information flow and surpassing models trained with significantly more compute.
CRANE: Reasoning with constrained LLM generation
·2445 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Illinois Urbana-Champaign
CRANE: A novel constrained decoding algorithm boosts LLM reasoning accuracy by strategically alternating between unconstrained reasoning and constrained generation.
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
·3429 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 National University of Singapore
CoT-Valve dynamically adjusts reasoning chain lengths based on task difficulty, significantly reducing inference costs in large language models without substantial accuracy loss.
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
·3494 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 SCB 10X R&D
Low-resource language LLMs gain strong reasoning abilities by merging with a high-resource reasoning model, achieving performance comparable to state-of-the-art models while maintaining target-language proficiency.
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
·2416 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
New benchmark COUNTERMATH enhances LLMs' mathematical reasoning using counterexample-driven proofs, revealing current models' limitations and paving the way for improved mathematical capabilities.
Better Embeddings with Coupled Adam
·2826 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 AI Sweden
Coupled Adam: A novel optimizer fixes anisotropic word embeddings in LLMs, boosting model performance.
We Can't Understand AI Using our Existing Vocabulary
·3226 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind
To understand AI, we need new words! This paper argues that developing neologisms (new words for human and machine concepts) is key to bridging the communication gap and achieving better AI control.
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!
·3137 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs can be effectively taught complex reasoning via efficient fine-tuning on demonstration data that focuses on the structure, not the content, of the reasoning process.
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
·2654 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
LASP-2 accelerates linear attention training with a novel sequence parallelism method, achieving 36.6% faster training than Ring Attention and boosting efficiency for very long sequences.
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
·5174 words·25 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
CodeI/O: Condensing reasoning patterns from code into LLM training data for enhanced reasoning.
Auditing Prompt Caching in Language Model APIs
·5759 words·28 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University
Researchers expose widespread prompt caching in LLM APIs via novel timing attacks, highlighting significant privacy risks and model-architecture leakage.