
Natural Language Processing

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
·2284 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Alibaba International Digital Commerce
Marco-o1, a novel large reasoning model, surpasses existing LLMs by using Chain-of-Thought, Monte Carlo Tree Search, and reflection mechanisms to excel in open-ended problem-solving, particularly in co…
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
·5261 words·25 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 ETH Zurich
LLMs’ hallucinations stem from entity recognition: SAEs reveal model ‘self-knowledge’, causally affecting whether it hallucinates or refuses to answer. This mechanism is even repurposed by chat finet…
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
·4011 words·19 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 National University of Singapore
AnchorAttention enhances long-context LLMs by mitigating BFloat16’s disruptive effects on RoPE, improving performance and speeding up training.
Patience Is The Key to Large Language Model Reasoning
·477 words·3 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Tsinghua University
Boosting Large Language Model (LLM) reasoning without massive datasets: A novel training method encourages ‘patient’ reasoning, improving accuracy by up to 6.7% on benchmark tasks.
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
·3437 words·17 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Text Generation 🏒 University of Sydney
ORID framework leverages organ-regional information to boost radiology report generation, achieving state-of-the-art accuracy by integrating multi-modal data and reducing noise from unrelated organs.
Hymba: A Hybrid-head Architecture for Small Language Models
·4219 words·20 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 NVIDIA
Hymba: Hybrid-head architecture boosts small language model performance, achieving an 11.67x cache size reduction and 3.49x higher throughput while surpassing existing models.
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
·2774 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 University College London
BALROG benchmark rigorously evaluates LLMs’/VLMs’ abilities in complex games, revealing their strengths and weaknesses in long-term planning and decision-making, highlighting the need for improved vis…
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
·2311 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Government Technology Agency Singapore
New data-free methodology creates effective, generalizable LLM guardrails against off-topic prompts, significantly improving LLM safety and responsible use.
Ultra-Sparse Memory Network
·5103 words·24 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 ByteDance
UltraMem, a novel ultra-sparse memory network, drastically speeds up LLM inference by 6x compared to MoE while maintaining performance, paving the way for efficient large-scale model deployment.
RedPajama: an Open Dataset for Training Large Language Models
·7625 words·36 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Stanford University
RedPajama releases two massive open-source datasets for training LLMs, improving transparency and facilitating the development of high-performing open-source models.
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages
·3728 words·18 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Assam Kaziranga University
SUTRA tokenizer outperforms other LLMs in Indian languages, improving efficiency and facilitating better model performance.
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
·2024 words·10 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Chinese Information Processing Laboratory
Verifier engineering: A new post-training paradigm for foundation models using automated verifiers to provide effective supervision signals, enhancing capabilities beyond traditional data-centric meth…
Drowning in Documents: Consequences of Scaling Reranker Inference
·273 words·2 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Information Retrieval 🏒 Databricks
Scaling reranker inference surprisingly degrades retrieval quality beyond a certain point, prompting the need for more robust reranking techniques.
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
·3206 words·16 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Tsinghua University
SageAttention2 achieves 4-bit accurate attention, boosting inference speed by 2x compared to FlashAttention2, while maintaining end-to-end accuracy across diverse models.
LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
·3133 words·15 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Center for Artificial Intelligence and Data Science
New German-only LLMs, LLäMmlein 120M & 1B, trained from scratch & openly released, show competitive performance and offer insights into efficient model training.
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
·2811 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Auburn University
SlimLM: Efficient small language models (SLMs) optimized for on-device document assistance on mobile, achieving comparable or superior performance to existing SLMs.
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
·2885 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Tsinghua University
LLaMA-Mesh: Unifying 3D mesh generation with LLMs by directly representing meshes as text, enabling efficient text-to-3D conversion within a single model.
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
·5666 words·27 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Question Answering 🏒 Department of Computer Science, University of Oregon
MedRGB benchmark reveals current LLMs struggle with noisy medical data, emphasizing the need for robust RAG systems in healthcare AI.
Adaptive Decoding via Latent Preference Optimization
·4975 words·24 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Meta AI
LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.
Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.