
Large Language Models

UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages
·2221 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Ajou University
UnifiedCrawl efficiently harvests massive monolingual datasets for low-resource languages from Common Crawl, enabling affordable LLM adaptation via QLoRA, significantly improving performance.
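
The recipe here pairs a 4-bit quantized base model with small trainable LoRA adapters, which is what keeps the adaptation affordable. A minimal QLoRA sketch using Hugging Face transformers, bitsandbytes, and peft; the model name and hyperparameters are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal QLoRA setup: 4-bit frozen base model + small trainable LoRA adapters.
# Model name and hyperparameters are illustrative, not the paper's exact config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapters on attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the tiny adapters are trained
```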
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
·2284 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Alibaba International Digital Commerce
Marco-o1: a novel large reasoning model that surpasses existing LLMs by combining Chain-of-Thought fine-tuning, Monte Carlo Tree Search, and reflection mechanisms to excel in open-ended problem-solving, particularly in co…
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
·5261 words·25 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 ETH Zurich
LLMs’ hallucinations stem from entity recognition: sparse autoencoders (SAEs) reveal a model’s ‘self-knowledge’, which causally affects whether it hallucinates or refuses to answer. This mechanism is even repurposed by chat finet…
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
·4011 words·19 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 National University of Singapore
AnchorAttention enhances long-context LLMs by mitigating BFloat16’s disruptive effects on RoPE, improving performance and speeding up training.
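
The precision issue is easy to reproduce: bfloat16 keeps only 8 significant bits, so large position indices round off before RoPE's rotation angles are ever computed. A toy illustration of that rounding (standard RoPE frequencies; this is not the paper's AnchorAttention code):

```python
# Round-trip a position index through bfloat16 and compare the resulting
# RoPE rotations against float32. The error is zero at small positions and
# grows sharply at long-context positions.
import torch

dim, base = 128, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

for pos in (10, 1_000, 100_000):
    p32 = torch.tensor(float(pos), dtype=torch.float32)
    p16 = p32.to(torch.bfloat16).to(torch.float32)  # bf16 rounding of the index
    err = (torch.cos(p32 * inv_freq) - torch.cos(p16 * inv_freq)).abs().max().item()
    print(f"pos={pos:>7}: max |Δcos| = {err:.2e}")
```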
Patience Is The Key to Large Language Model Reasoning
·477 words·3 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Tsinghua University
Boosting Large Language Model (LLM) reasoning without massive datasets: A novel training method encourages ‘patient’ reasoning, improving accuracy by up to 6.7% on benchmark tasks.
Hymba: A Hybrid-head Architecture for Small Language Models
·4219 words·20 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 NVIDIA
Hymba: a hybrid-head architecture for small language models that cuts cache size by 11.67x and raises throughput by 3.49x, surpassing existing models.
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
·2774 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 University College London
BALROG benchmark rigorously evaluates LLMs’/VLMs’ abilities in complex games, revealing their strengths and weaknesses in long-term planning and decision-making, highlighting the need for improved vis…
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
·2311 words·11 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Government Technology Agency Singapore
A new data-free methodology creates effective, generalizable LLM guardrails against off-topic prompts, significantly improving LLM safety and responsible use.
Ultra-Sparse Memory Network
·5103 words·24 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 ByteDance
UltraMem, a novel ultra-sparse memory network, drastically speeds up LLM inference by 6x compared to MoE while maintaining performance, paving the way for efficient large-scale model deployment.
RedPajama: an Open Dataset for Training Large Language Models
·7625 words·36 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Stanford University
RedPajama releases two massive open datasets for training LLMs, improving transparency and facilitating the development of high-performing open-source models.
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages
·3728 words·18 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Assam Kaziranga University
The SUTRA tokenizer outperforms the tokenizers of other LLMs across official Indian languages, improving token efficiency and enabling better model performance.
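
Tokenizer quality per language is commonly summarized by fertility: the average number of subword tokens per word, lower being better. A small sketch of that metric with Hugging Face tokenizers; the two models below are common baselines picked for illustration, and the SUTRA tokenizer would be scored the same way.

```python
# Fertility = subword tokens per word; a rough but standard efficiency metric.
from transformers import AutoTokenizer

sample = "ΰ€­ΰ€Ύΰ€°ΰ€€ ΰ€ΰ€• ΰ€΅ΰ€Ώΰ€΅ΰ€Ώΰ€§ ΰ€¦ΰ€Άΰ₯ ΰ€Ήΰ₯ˆ"  # Hindi: "India is a diverse country"
words = sample.split()

for name in ("google/muril-base-cased", "gpt2"):  # illustrative tokenizers
    tok = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tok.tokenize(sample))
    print(f"{name}: fertility = {n_tokens / len(words):.2f} tokens/word")
```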
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
·2024 words·10 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Chinese Information Processing Laboratory
Verifier engineering: A new post-training paradigm for foundation models using automated verifiers to provide effective supervision signals, enhancing capabilities beyond traditional data-centric meth…
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
·3206 words·16 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Tsinghua University
SageAttention2 achieves accurate 4-bit attention, boosting inference speed by 2x compared to FlashAttention2 while maintaining end-to-end accuracy across diverse models.
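
The core ingredient is low-bit quantization of the attention inputs. A toy per-block symmetric INT4 quantizer below shows the flavor of rounding involved; SageAttention2 itself adds smoothing and fused CUDA kernels, none of which is reproduced here.

```python
# Per-block symmetric INT4 quantize/dequantize: each block of 64 values
# shares one scale, and values are rounded into the range [-7, 7].
import torch

def quant_int4(x, block=64):
    xb = x.reshape(-1, block)
    scale = xb.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(xb / scale), -7, 7)
    return q, scale

x = torch.randn(8, 128)                     # stand-in for Q or K
q, scale = quant_int4(x)
x_hat = (q * scale).reshape(x.shape)        # dequantize
print("mean abs quantization error:", (x - x_hat).abs().mean().item())
```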
LLΓ€Mmlein: Compact and Competitive German-Only Language Models from Scratch
·3133 words·15 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Center for Artificial Intelligence and Data Science
New German-only LLMs, LLΓ€Mmlein 120M & 1B, trained from scratch & openly released, show competitive performance and offer insights into efficient model training.
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
·2811 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Auburn University
SlimLM: Efficient small language models (SLMs) optimized for mobile document assistance, achieving comparable or superior performance to existing SLMs.
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
·2885 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Tsinghua University
LLaMA-Mesh: Unifying 3D mesh generation with LLMs by directly representing meshes as text, enabling efficient text-to-3D conversion within a single model.
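
The representational trick is that a mesh is just text: OBJ-style vertex and face lines can sit directly in an LLM's prompt or output. A minimal sketch of that serialization (one triangle, without the coordinate-quantization refinements a real pipeline would add):

```python
# Serialize a mesh as OBJ-style plain text that an LLM can read or emit.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(1, 2, 3)]  # OBJ convention: 1-indexed vertex ids

lines = [f"v {x} {y} {z}" for x, y, z in vertices]
lines += [f"f {a} {b} {c}" for a, b, c in faces]
mesh_as_text = "\n".join(lines)
print(mesh_as_text)  # this string is ordinary text for the language model
```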
Adaptive Decoding via Latent Preference Optimization
·4975 words·24 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Meta AI
LLMs can dynamically adjust decoding temperature using Adaptive Decoding and Latent Preference Optimization, improving performance across creative and factual tasks.
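
Mechanically, a per-step temperature just rescales the logits before sampling; the paper learns that choice with Latent Preference Optimization. In the sketch below, pick_temperature is a hypothetical stand-in for the learned module, included only to show where the decision plugs into decoding.

```python
# Decoding with a per-step temperature. `pick_temperature` is hypothetical:
# in the paper a trained head predicts it from the model's hidden state.
import torch

def pick_temperature(hidden_state):
    return 0.8  # placeholder; a learned module would produce this value

def sample_next(logits, hidden_state):
    t = pick_temperature(hidden_state)
    probs = torch.softmax(logits / t, dim=-1)  # lower t => sharper distribution
    return torch.multinomial(probs, num_samples=1)

logits = torch.randn(32_000)   # fake next-token logits over the vocabulary
hidden = torch.randn(4_096)    # fake final hidden state
print(sample_next(logits, hidden).item())
```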
Cut Your Losses in Large-Vocabulary Language Models
·2958 words·14 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Apple
Cut Cross-Entropy (CCE) dramatically reduces the memory footprint of training large language models by cleverly computing the cross-entropy loss without materializing the full logit matrix.
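
The trick is that the per-token loss needs only the correct-token logit and a log-sum-exp over the vocabulary, both of which can be computed in a streaming fashion. A readable (non-fused) sketch of that idea, not Apple's actual kernel:

```python
# Cross-entropy without materializing the full [tokens x vocab] logit matrix:
# stream over vocabulary chunks, accumulating the log-sum-exp per token.
import torch

def chunked_ce(hidden, weight, targets, chunk=4096):
    # hidden: [n, d], weight: [V, d], targets: [n]
    lse = torch.full((hidden.shape[0],), float("-inf"))
    for start in range(0, weight.shape[0], chunk):
        logits = hidden @ weight[start:start + chunk].T  # only [n, chunk] lives
        lse = torch.logaddexp(lse, torch.logsumexp(logits, dim=-1))
    correct = (hidden * weight[targets]).sum(dim=-1)     # true-token logits
    return (lse - correct).mean()

h, W = torch.randn(8, 64), torch.randn(50_000, 64)
y = torch.randint(0, 50_000, (8,))
ref = torch.nn.functional.cross_entropy(h @ W.T, y)
print(torch.allclose(chunked_ce(h, W, y), ref, atol=1e-4))  # True
```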
Can sparse autoencoders be used to decompose and interpret steering vectors?
·2017 words·10 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 University of Oxford
Sparse autoencoders fail to accurately decompose and interpret steering vectors due to distribution mismatch and the inability to handle negative feature projections; this paper identifies these issue…
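
The clipping failure is visible directly in code: an SAE encoder applies a ReLU, so any component of a steering vector that projects negatively onto a feature direction is zeroed and lost. A toy sketch with random, untrained weights (shapes are placeholders):

```python
# Pass a steering vector through a toy SAE encoder and count how many
# feature projections the ReLU discards. Weights are random placeholders.
import torch

d_model, d_sae = 512, 4096
W_enc = torch.randn(d_sae, d_model) / d_model**0.5
W_dec = torch.randn(d_sae, d_model) / d_sae**0.5

steering_vec = torch.randn(d_model)
pre_acts = W_enc @ steering_vec
acts = torch.relu(pre_acts)        # negative projections are zeroed here
recon = acts @ W_dec               # reconstruction misses those directions

print(f"{(pre_acts < 0).float().mean().item():.0%} of projections clipped")
```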
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
·1996 words·10 mins
AI Generated πŸ€— Daily Papers Natural Language Processing Large Language Models 🏒 Inria, Paris, France
CamemBERT 2.0: Two new French language models (CamemBERTav2 & CamemBERTv2) outperform predecessors by addressing temporal concept drift via larger, updated datasets and enhanced tokenization, demonstr…