Natural Language Processing
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
·2710 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 School of Computer Science, Fudan University
Contrary to popular belief, longer reasoning chains don’t always boost Large Language Model (LLM) accuracy; this research reveals that parallel scaling with shorter solutions outperforms sequential scaling.
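The distinction is easy to make concrete. Below is a minimal sketch of the parallel-scaling baseline the summary alludes to: sample several short solutions independently and aggregate by majority vote, rather than spending the same token budget on one long sequential chain. The `generate` callable is a hypothetical LLM API stand-in, not anything from the paper.

```python
from collections import Counter

def parallel_scale(generate, prompt, n=8, max_tokens=512):
    """Parallel test-time scaling: draw n short solutions independently
    and return the majority-vote answer. `generate` is a hypothetical
    LLM call that returns a final-answer string."""
    answers = [generate(prompt, max_tokens=max_tokens, temperature=0.8)
               for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n  # answer plus empirical agreement rate
```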
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
·2524 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Xi'an Jiaotong University
PhysReason benchmark evaluates physics-based reasoning in LLMs, revealing critical limitations and guiding future improvements.
Large Language Models and Mathematical Reasoning Failures
·397 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
Large language models struggle with mathematical word problems, demonstrating flaws in reasoning despite achieving high accuracy; a new study highlights these persistent gaps in generalization abilities.
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance
·1604 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KTH Royal Institute of Technology
LLMs’ performance on language complexity tasks (LIX readability and average dependency distance, ADD) reveals a strong correlation with general capabilities, suggesting complexity metrics as noisy zero-shot proxies for model evaluation.
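Of the two metrics, LIX is simple enough to state inline: words per sentence plus the percentage of long words (more than six letters). ADD requires a dependency parser, so only LIX is sketched below; the regex tokenization is a simplifying assumption, not the paper's exact preprocessing.

```python
import re

def lix(text: str) -> float:
    """LIX readability: words/sentences + 100 * long_words/words,
    where a long word has more than six letters."""
    words = re.findall(r"[A-Za-zÀ-ÿ']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentences + 100 * long_words / len(words)

print(lix("The cat sat on the mat. Readability metrics are straightforward."))  # 35.0
```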
Continuous Diffusion Model for Language Modeling
·1809 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Korea Advanced Institute of Science and Technology
RDLM: A novel continuous diffusion model for language modeling leverages the geometry of categorical distributions, outperforming existing discrete approaches and approaching autoregressive model performance.
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
·2347 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
PoPilot, a novel proof-oriented programming LLM, outperforms GPT-4o by 64% under data scarcity by using synthetic data augmentation.
Atom of Thoughts for Markov LLM Test-Time Scaling
·2660 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Atom of Thoughts (AOT) revolutionizes LLM test-time scaling by decomposing complex reasoning into independent sub-questions, drastically reducing computation while maintaining high accuracy.
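A rough sketch of the decompose-then-contract loop the summary describes, assuming a text-in/text-out `llm` callable (hypothetical, not the paper's interface): each round rewrites the problem into a simpler self-contained question, so later rounds never need earlier reasoning history. That Markov property is what keeps the context, and hence the computation, small.

```python
def atom_of_thoughts(llm, question, max_rounds=3):
    """Hedged sketch of an AoT-style loop: decompose the current question
    into independent sub-questions, answer them, then contract the answers
    into one simpler, self-contained question. Each round conditions only
    on the current question (Markov-style), not the full reasoning trace."""
    for _ in range(max_rounds):
        subs = [s for s in llm(
            f"List independent sub-questions, one per line:\n{question}"
        ).splitlines() if s.strip()]
        if len(subs) <= 1:
            break
        facts = "; ".join(llm(f"Answer briefly: {s}") for s in subs)
        question = llm(f"Using these facts: {facts}\n"
                       f"Rewrite as one self-contained question: {question}")
    return llm(f"Answer: {question}")
```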
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
·3486 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Sony Group Corporation
TalkHier, a novel framework for LLM multi-agent systems, uses structured communication and hierarchical refinement to achieve state-of-the-art performance on various tasks, improving collaboration and…
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
·2722 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 DeepSeek-AI
NSA: a novel sparse attention mechanism achieves efficient long-context modeling by combining algorithmic innovations with hardware-aligned optimizations, surpassing full attention models across various benchmarks.
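To make "sparse attention" concrete, here is a deliberately simplified NumPy sketch of a token-selection branch: score the query against cheap per-block key summaries, keep the top-k blocks, and attend densely only within them. NSA itself combines compression, selection, and sliding-window branches behind custom hardware-aligned kernels; none of that is reproduced here.

```python
import numpy as np

def topk_block_attention(q, K, V, block=64, k_blocks=4):
    """Attend only within the k key/value blocks whose mean key scores
    highest against the query -- a toy version of selection-based sparsity."""
    n, d = K.shape
    n_blocks = (n + block - 1) // block
    means = np.stack([K[i * block:(i + 1) * block].mean(0)
                      for i in range(n_blocks)])       # cheap block summaries
    top = np.argsort(means @ q)[-k_blocks:]            # highest-scoring blocks
    idx = np.concatenate([np.arange(i * block, min((i + 1) * block, n))
                          for i in top])
    scores = (K[idx] @ q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                       # softmax over kept tokens
    return w @ V[idx]

out = topk_block_attention(np.random.randn(64),
                           np.random.randn(4096, 64),
                           np.random.randn(4096, 64))
```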
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
·7040 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LLMs’ knowledge acquisition is unveiled through the lens of evolving knowledge circuits, revealing that the integration of new knowledge depends on its relevance to existing knowledge and proceeds through distinct phases of circuit formation and optimization.
FinMTEB: Finance Massive Text Embedding Benchmark
·3630 words·18 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
FinMTEB: A new benchmark reveals that general-purpose embedding models struggle in the finance domain; domain-specific models excel, and surprisingly, simple BoW outperforms sophisticated models on certain tasks.
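The BoW finding is easy to appreciate with a few lines of scikit-learn: a count-vector cosine similarity is the kind of "simple" baseline the summary contrasts with dense embedding models. The sentence pair below is illustrative only; FinMTEB's actual tasks and data differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = "Net interest margin widened in the third quarter."
b = "The bank's net interest margin expanded in Q3."

vec = CountVectorizer().fit([a, b])          # plain bag-of-words counts
sim = cosine_similarity(vec.transform([a]), vec.transform([b]))[0, 0]
print(f"BoW cosine similarity: {sim:.3f}")
```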
Dyve: Thinking Fast and Slow for Dynamic Process Verification
·1995 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
Dyve: A novel dynamic process verifier boosts LLM reasoning accuracy by cleverly combining fast, immediate checks with deeper, slower analyses for complex steps, achieving significant performance gains.
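The fast/slow split can be sketched as a confidence-gated cascade, assuming a hypothetical `llm` callable that returns a verdict together with a confidence score (Dyve's actual training and scoring differ): cheap single-token checks handle easy steps, and only uncertain ones pay for explicit chain-of-thought verification.

```python
def verify_step(llm, problem, step, threshold=0.9):
    """Confidence-gated verification cascade: fast "System 1" check first,
    slow "System 2" chain-of-thought check only when the fast one is unsure."""
    verdict, conf = llm(f"Is this step correct? Answer Yes or No.\n"
                        f"Problem: {problem}\nStep: {step}")
    if conf >= threshold:                # fast path: accept confident verdicts
        return verdict.strip() == "Yes"
    slow, _ = llm(f"Check this step carefully, reasoning line by line, then "
                  f"conclude Yes or No.\nProblem: {problem}\nStep: {step}")
    return slow.strip().endswith("Yes")  # slow path: deliberate verification
```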
Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest
·3405 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Information Extraction
🏢 UC San Diego
Cuckoo: a novel information extraction (IE) model leverages LLM pre-training data, achieving superior performance in few-shot settings by reframing next-token prediction as token extraction.
Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
·2355 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Minzu University of China
XLM-SWCM: A novel framework efficiently adapts multilingual encoders for text generation in extremely low-resource languages by cleverly sharing weights between encoder and decoder, achieving superior performance.
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
·1603 words·8 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Northeastern University
This survey paper comprehensively analyzes methods for injecting domain-specific knowledge into LLMs, categorizing them into four key approaches and evaluating their trade-offs to enhance performance in specialized domains.
You Do Not Fully Utilize Transformer's Representation Capacity
·4126 words·20 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 T-Tech, HSE University, Moscow Institute of Physics and Technology
Layer-Integrated Memory (LIMe) boosts Transformer representation capacity by letting each layer access earlier layers’ hidden states, significantly improving performance across various benchmarks.
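Schematically, the idea is a learned mixture over earlier layers' hidden states rather than reading only the immediately preceding layer. The PyTorch module below is a guess at the simplest possible form; the paper's per-head routing and the exact placement of the mixture in the block are not reproduced.

```python
import torch
import torch.nn as nn

class LayerMemoryMixer(nn.Module):
    """Schematic LIMe-like mixer: combine hidden states from all earlier
    layers with learned softmax weights before feeding attention."""
    def __init__(self, n_layers: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, past_states):                  # list of [B, T, d] tensors
        w = torch.softmax(self.logits[: len(past_states)], dim=0)
        stacked = torch.stack(past_states)           # [L, B, T, d]
        return (w.view(-1, 1, 1, 1) * stacked).sum(0)
```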
Typhoon T1: An Open Thai Reasoning Model
·3148 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 SCB 10X R&D
Typhoon T1, an open Thai reasoning model, improves performance on complex tasks by generating long chains of thought; detailed methodology and open-source resources are provided.
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
·2201 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
LLMs often fail to demonstrate true understanding of physical concepts, acting as ‘stochastic parrots’, a phenomenon quantified by the PHYSICO benchmark.
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models
·4327 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Intel Labs
SQuARE, a novel prompting technique, enhances LLM reasoning by prompting self-interrogation through sequential question answering, significantly outperforming traditional methods.
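In sketch form, SQuARE-style prompting just asks the model to pose and answer its own sub-questions before committing to a final answer. The template below is illustrative, not the paper's verbatim prompt, and `llm` is a hypothetical text-in/text-out callable.

```python
SQUARE_STYLE_PROMPT = """\
Question: {question}

Before answering, pose and answer {n} sub-questions that probe the
problem from different angles (Q1/A1, Q2/A2, ...).
Then give the final answer on a line starting with "Answer:".
"""

def square_style(llm, question, n=3):
    """Self-interrogation via sequential QA before the final answer."""
    out = llm(SQUARE_STYLE_PROMPT.format(question=question, n=n))
    return out.split("Answer:")[-1].strip()
```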
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
·1354 words·7 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 University of Copenhagen
Fact-checkers need explainable AI: this study reveals how AI tools can better support fact-checkers by providing explanations tailored to their workflows, addressing unmet needs and improving the effectiveness of automated fact-checking.