Natural Language Processing
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
·2220 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shenzhen Campus of Sun Yat-Sen University
O1-Pruner efficiently prunes long-thought reasoning in LLMs by harmonizing reasoning length and accuracy via fine-tuning, significantly reducing inference time without sacrificing performance.
Kimi k1.5: Scaling Reinforcement Learning with LLMs
·1386 words·7 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Moonshot AI
Kimi K1.5: A Multimodal LLM trained with RL achieves state-of-the-art reasoning by scaling long context RL training and improving policy optimization.
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
·2866 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 DeepSeek-AI
DeepSeek-R1 significantly improves LLM reasoning by using reinforcement learning, achieving performance comparable to OpenAI’s top models while addressing previous challenges of poor readability and language mixing.
Autonomy-of-Experts Models
·2476 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Autonomy-of-Experts (AoE) rethinks mixture-of-experts LLMs by letting individual expert modules autonomously select their own inputs, eliminating routers and boosting both efficiency and accuracy.
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
·6574 words·31 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Yale NLP
MMVU: a new benchmark pushes multimodal video understanding to expert level, revealing limitations of current models and paving the way for more advanced AI.
Debate Helps Weak-to-Strong Generalization
·2415 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tongyi Lab
Debate-enhanced weak supervision boosts AI alignment by combining strong and weak models, enabling safer and more reliable AI systems.
Redundancy Principles for MLLMs Benchmarks
·4576 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Shanghai AI Lab
This research proposes principles and a framework to tackle redundancy in MLLM benchmarks, enhancing efficiency and guiding future development.
Reasoning Language Models: A Blueprint
·3562 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 ETH Zurich
Democratizing advanced reasoning in AI, this blueprint introduces a modular framework for building Reasoning Language Models (RLMs), simplifying development and enhancing accessibility.
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
·2333 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
Mobile-Agent-E: A self-evolving mobile assistant conquering complex tasks with hierarchical agents and a novel self-evolution module, significantly outperforming prior approaches.
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
·4105 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Fudan University
Agent-R: A novel self-training framework enables language model agents to learn from errors by dynamically constructing training data that corrects erroneous actions, resulting in significantly improv…
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
·1691 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Dialogue Systems
🏢 Plurai
IntellAgent: a novel open-source framework automating diverse conversational AI evaluation via policy-driven graph modeling, event generation, and user-agent simulations, enabling fine-grained diagnos…
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
·704 words·4 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Meta GenAI
STEP-KTO: A novel training framework boosts LLMs’ mathematical reasoning by providing binary feedback on both intermediate steps and final answers. This ensures logical reasoning trajectories and impr…
PaSa: An LLM Agent for Comprehensive Academic Paper Search
·4507 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
PaSa: An LLM agent autonomously performs comprehensive academic paper searches, outperforming existing methods by efficiently combining search tools, paper reading, and citation analysis, optimized vi…
Evolving Deeper LLM Thinking
·7089 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Mind Evolution, a novel evolutionary search strategy, significantly boosts Large Language Model (LLM) problem-solving by generating, recombining, and refining candidate solutions via an LLM, outperfor…
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
·3933 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
ComplexFuncBench, a new benchmark, rigorously evaluates LLMs’ complex function-calling abilities across real-world scenarios involving multi-step processes, constraints, and long contexts.
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
·1945 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
This survey explores the emerging frontier of Large Reasoning Models (LRMs), focusing on how reinforcement learning and prompting techniques are boosting LLMs’ reasoning capabilities.
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
·1926 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanjing University of Aeronautics and Astronautics
LLM reasoning boosts self-confidence, even when answers are wrong, highlighting limitations in current evaluation metrics.
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators
·2252 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Dialogue Systems
🏢 Baichuan Inc.
AI-powered medical consultations often struggle with the inquiry phase. This paper presents a novel patient simulator trained on real interactions, revealing that effective inquiry significantly impacts diagnostic accuracy.
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs
·1632 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 M42 Health
Arabic LLMs struggle with medical tasks; this study reveals optimal language ratios in training data for improved performance, highlighting challenges in simply translating medical data for different …
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
·5724 words·27 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Princeton University
RLHS, a novel alignment algorithm, leverages simulated hindsight feedback to mitigate misalignment in RLHF, significantly improving AI’s alignment with human values and goals.