Large Language Models
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance
·1604 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
LLMs’ performance on language complexity tasks (LIX & ADD) reveals a strong correlation with general capabilities, suggesting complexity metrics as noisy zero-shot proxies for model evaluation.
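For context, LIX and ADD are classical complexity measures: LIX (läsbarhetsindex) combines average sentence length with the share of long words, while ADD is the average dependency distance between each token and its syntactic head. Below is a minimal Python sketch of both; the exact tokenization and parsing conventions used in the paper may differ, and the dependency arcs for ADD are assumed to come from an external parser.

```python
import re

def lix(text: str) -> float:
    """LIX readability: (words / sentences) + 100 * (long words / words),
    where a long word has more than 6 characters."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.:!?]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / max(len(sentences), 1) + 100 * len(long_words) / max(len(words), 1)

def add(arcs: list[tuple[int, int]]) -> float:
    """Average dependency distance: mean |head - dependent| over all
    dependency arcs (root arcs, head index 0, are skipped)."""
    dists = [abs(head - dep) for dep, head in arcs if head != 0]
    return sum(dists) / max(len(dists), 1)

# Example usage: arcs are hypothetical (dependent, head) index pairs from a parser.
print(lix("The quick brown fox jumps over the extraordinarily lazy dog."))
print(add([(1, 2), (2, 0), (3, 2), (4, 3)]))
```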
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
·2347 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
PoPilot, a novel proof-oriented programming LLM, outperforms GPT-4o by 64% under data scarcity by using synthetic data augmentation.
Atom of Thoughts for Markov LLM Test-Time Scaling
·2660 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Atom of Thoughts (AOT) revolutionizes LLM test-time scaling by decomposing complex reasoning into independent sub-questions, drastically reducing computation while maintaining high accuracy.
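As a rough illustration of the decompose-and-contract loop suggested by the summary above (not the authors’ implementation; `llm` is a hypothetical text-in, text-out callable and the prompts are placeholders):

```python
def atom_of_thoughts(question: str, llm, max_rounds: int = 3) -> str:
    """Sketch of an AoT-style loop: decompose the current question into
    sub-questions, answer the independent ones, then contract the answers
    back into a simpler question. Each contracted question acts as a
    Markov-style state that fully replaces the previous one."""
    state = question
    for _ in range(max_rounds):
        subqs = llm(f"Decompose into independent sub-questions:\n{state}")
        answers = [llm(f"Answer briefly:\n{q}") for q in subqs.splitlines() if q.strip()]
        state = llm(
            "Rewrite the question so the answered sub-questions are folded in "
            f"as known facts:\nQuestion: {state}\nKnown: {answers}"
        )
    return llm(f"Answer directly:\n{state}")
```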
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
·3486 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Sony Group Corporation
TalkHier, a novel framework for LLM multi-agent systems, uses structured communication and hierarchical refinement to achieve state-of-the-art performance on various tasks, improving collaboration and…
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
·2722 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 DeepSeek-AI
NSA: a novel sparse attention mechanism achieves efficient long-context modeling by combining algorithmic innovations with hardware-aligned optimizations, surpassing full attention models across vario…
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
·7040 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LLMs’ knowledge acquisition is unveiled through the lens of evolving knowledge circuits, revealing how new knowledge integration depends on relevance to existing knowledge, exhibiting distinct phases …
FinMTEB: Finance Massive Text Embedding Benchmark
·3630 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
FinMTEB: A new benchmark reveals that general-purpose embedding models struggle in the finance domain; domain-specific models excel, and surprisingly, simple BoW outperforms sophisticated models on ce…
Dyve: Thinking Fast and Slow for Dynamic Process Verification
·1995 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Dyve: A novel dynamic process verifier boosts LLM reasoning accuracy by cleverly combining fast, immediate checks with deeper, slower analyses for complex steps, achieving significant performance gain…
Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
·2355 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Minzu University of China
XLM-SWCM: A novel framework efficiently adapts multilingual encoders for text generation in extremely low-resource languages by cleverly sharing weights between encoder and decoder, achieving superior…
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
·1603 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Northeastern University
This survey paper comprehensively analyzes methods for injecting domain-specific knowledge into LLMs, categorizing them into four key approaches and evaluating their trade-offs to enhance performance …
You Do Not Fully Utilize Transformer's Representation Capacity
·4126 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 T-Tech, HSE University, Moscow Institute of Physics and Technology
Layer-Integrated Memory (LIMe) boosts Transformers by enhancing representation capacity through access to earlier layers’ hidden states, significantly improving performance across variou…
Typhoon T1: An Open Thai Reasoning Model
·3148 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 SCB 10X R&D
Typhoon T1, an open Thai reasoning model, improves performance on complex tasks by generating long chains of thought; a detailed methodology and open-source resources are provided.
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
·2201 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs often fail to demonstrate genuine understanding of physical concepts, acting as ‘stochastic parrots’, a phenomenon quantitatively demonstrated by the PHYSICO benchmark.
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
·4209 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 MIT
SelfCite: A self-supervised approach boosts LLM citation accuracy via context ablation. By removing or isolating cited text, SelfCite trains LLMs to generate high-quality citations without manual ann…
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
·2116 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing University of Posts and Telecommunications
MUDDFormer boosts Transformer performance by dynamically generating connection weights, improving cross-layer information flow and surpassing models trained with significantly more compute.
CRANE: Reasoning with constrained LLM generation
·2445 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
CRANE: A novel constrained decoding algorithm boosts LLM reasoning accuracy by strategically alternating between unconstrained reasoning and constrained generation.
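A hedged sketch of the alternation idea: free-form reasoning tokens are generated unconstrained, and decoding switches to a grammar-constrained mode only inside answer delimiters. The `generate_free` and `generate_with_grammar` methods and the `<< >>` delimiters are hypothetical stand-ins, not CRANE’s actual API.

```python
ANSWER_START, ANSWER_END = "<<", ">>"

def crane_style_decode(model, prompt: str, grammar) -> str:
    """Alternate between unconstrained reasoning and constrained generation:
    free-form text is produced until the answer delimiter appears, then the
    span inside the delimiters is decoded under the grammar."""
    reasoning = model.generate_free(prompt, stop=ANSWER_START)              # unconstrained reasoning
    answer = model.generate_with_grammar(prompt + reasoning + ANSWER_START,
                                         grammar=grammar,
                                         stop=ANSWER_END)                   # constrained answer span
    return reasoning + ANSWER_START + answer + ANSWER_END
```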
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
·3429 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 National University of Singapore
CoT-Valve dynamically adjusts reasoning chain lengths based on task difficulty, significantly reducing inference costs in large language models without substantial accuracy loss.
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging
·3494 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 SCB 10X R&D
Low-resource language LLMs gain strong reasoning abilities by merging with a high-resource reasoning model, achieving performance comparable to state-of-the-art models while maintaining target languag…
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
·2416 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
New benchmark COUNTERMATH enhances LLMs’ mathematical reasoning using counterexample-driven proofs, revealing current models’ limitations and paving the way for improved mathematical capabilities.
Better Embeddings with Coupled Adam
·2826 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 AI Sweden
Coupled Adam: A novel optimizer fixes anisotropic word embeddings in LLMs, boosting model performance.